Data Backup and Recovery Considerations for Hadoop and Big Data

IBM says that 90% of the data in the world today has been created in the last two years alone. IBM also says that 80% of data captured today is unstructured. Sources of unstructured data include, among others, posts to social media sites, digital pictures and videos, and point-of-sale systems. All of this unstructured data can be termed Big Data.

Because of the wide-ranging benefits that small and medium size businesses can gain from Big Data in today’s competitive world, many are implementing a local Big Data strategy. To help businesses of all sizes manage Big Data, there is Hadoop. The Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using a simple programming model. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.

The Hadoop project has various elements. Below are a few of the more pertinent ones:

  • Hadoop Common – libraries and utilities
  • Hadoop Distributed File System (HDFS) – a distributed file system that stores data on commodity servers while providing high aggregate bandwidth across the cluster.
  • Hadoop MapReduce – The “Map” step takes the input, divides it into smaller sub-problems, and distributes them to worker nodes. Each worker node processes its smaller problem and passes the answer back to its master node. The “Reduce” step then collects the answers to all the sub-problems and combines them to form the output.
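
The Map and Reduce steps described above can be illustrated with a small word count. This is only a sketch of the idea in plain Python running on a single machine. It does not use Hadoop or any Hadoop API, and the function names (split_input, map_chunk, reduce_pairs, word_count) and the sample text are hypothetical, chosen just for this illustration.

```python
from collections import defaultdict


def split_input(lines, num_workers):
    """Divide the input into one chunk per (hypothetical) worker node."""
    return [lines[i::num_workers] for i in range(num_workers)]


def map_chunk(chunk):
    """Map step on one worker: emit a (word, 1) pair for every word in the chunk."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def reduce_pairs(mapped_chunks):
    """Reduce step: collect the pairs from all workers and sum the counts per word."""
    totals = defaultdict(int)
    for pairs in mapped_chunks:
        for word, count in pairs:
            totals[word] += count
    return dict(totals)


def word_count(lines, num_workers=3):
    chunks = split_input(lines, num_workers)
    # Each call below stands in for the work one worker node would do.
    mapped = [map_chunk(chunk) for chunk in chunks]
    return reduce_pairs(mapped)


sample = [
    "big data needs backup",
    "backup big data locally",
    "data stays local",
]
counts = word_count(sample)
print(counts)
```

In a real cluster the chunks would live on different servers and the map calls would run in parallel. Here they run one after another, which is enough to show that the combined result does not depend on how the input was divided.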

As Nathan Coutinho states in his CDW blog article “5 Ways to Future-Proof Your Data Center for Big Data,” “The whole point of Hadoop is to keep the data local on commodity servers and economical local storage…”

Small and medium size businesses (SMBs) find Hadoop attractive because of its ability to provide high availability to data on local commodity servers.

A data strategy is never complete without a Data Backup and Recovery strategy. A Big Data implementation using Hadoop calls for even more focus on the ability to recover quickly from a catastrophic event. However, the SMB is often not staffed or tooled to design and execute a backup strategy of this complexity. In addition, because the attraction of Hadoop is its use of local servers, the SMB needs a data backup and recovery strategy that can be managed remotely without requiring that the live data be transferred to, or run in, a cloud environment.

There are Data Backup/Recovery Managed Service Providers (DB/R MSPs) that provide remote management of the backup process, along with professional Disaster Backup and Recovery consultation. Contracting a DB/R MSP that uses the remote DB/R management model allows the SMB to maintain its data locally without the need to hire new staff or train existing staff in sophisticated data backup and recovery processes. Additionally, the SMB can have a comprehensive Data Backup and Recovery strategy while housing its Big Data locally.

How Does the SMB Execute Proper Data Backup While Data Types Are Increasing in Complexity?

The complexity of data is increasing. More types of data need to be included in business analytics for a company to best understand its place in a market. Business data is no longer only the traditional structured type that comes from ERP and CRM applications. To remain competitive, the SMB must include unstructured data from POS systems, social media and other sources.

This increased complexity strains the already stretched ability of the small and medium size business to handle data backup and recovery. The Spiceworks Voice of IT report says that “…while over half (55%) of respondents believe their company’s spend is sufficient, 30% believe it’s less than sufficient. So what’s getting the “backup” treatment? Not surprisingly, the majority (67%) of SMBs back up highly sensitive data. But, this is only slightly more than the 60% backing up moderately sensitive data. Nearly 40% of respondents back up low sensitivity data.”

An offshoot of the new data requirements, and an example of the increased complexity, is NoSQL, also referred to as Not Only SQL. NoSQL is a database design that is used for very large sets of data, also called Big Data.

A Search Data Management post states, “Arguably, the most popular NoSQL database is Apache Cassandra. Cassandra, which was once Facebook’s proprietary database, was released as open source in 2008. Other NoSQL implementations include SimpleDB, Google BigTable, Apache Hadoop, MapReduce, MemcacheDB, and Voldemort.” The post adds, “NoSQL is often mentioned in conjunction with other big data tools such as massive parallel processing, columnar-based databases and Database-as-a-Service (DaaS).”

The SMB has a limited budget and limited data backup know-how. Its market has become global, while its data has increased in complexity. And nothing has reduced the need for protection against unplanned catastrophic events.

Where does the small and medium size business turn to develop and execute a data backup and recovery strategy, one that will not reduce the company’s ability to compete but will increase its viability and sustainability?

Let’s look at the Salvus Cloud Management model. Salvus Data develops and remotely executes the SMB’s business-critical backup strategy, while the SMB’s data remains within the network walls of the SMB.

With the Salvus Cloud Management Data Backup/Recovery solution, the SMB gets the latest backup and monitoring solutions without the overhead and headaches of maintaining them. In addition, the SMB has:

  • access to a team that spends its time doing backups 24×7, 365 days a year.
  • the option to have Data Backup/Recovery efforts in-house or at the location of their choice.
  • an American partner who understands business needs within the United States.

Salvus Cloud Management centrally monitors and manages the SMB’s backup environments more cost-effectively and with increased service levels. Salvus Data Consultants manage the SMB’s backups and can assist the SMB in controlling costs and eliminating inefficiencies by aligning business unit priorities with IT capabilities, helping the business achieve the objectives that are essential for growth. For more information, contact Salvus Data at 903-201-7233.