Data Backup and Recovery Considerations for Hadoop and Big Data

IBM says that 90% of the data in the world today has been created in the last two years alone. IBM also says that 80% of data captured today is unstructured. Sources of unstructured data include posts to social media sites, digital pictures and videos, and point-of-sale systems. All of this unstructured data falls under the term Big Data.

Because of the wide-ranging benefits that small and medium-sized businesses (SMBs) can gain from Big Data in today’s competitive world, many are implementing a local Big Data strategy. Hadoop is one framework that helps businesses of all sizes manage Big Data. The Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using a simple programming model. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.

The Hadoop project has several modules. A few of the most pertinent are:

  • Hadoop Common – libraries and utilities
  • Hadoop Distributed File System (HDFS) – a distributed file-system that stores data on commodity servers while providing high aggregate bandwidth across the cluster.
  • Hadoop MapReduce – The “Map” step takes the input, divides it into smaller sub-problems, and distributes them to worker nodes. Each worker node solves its sub-problem and passes the answer back to its master node. The “Reduce” step then collects the answers to all the sub-problems and combines them to form the output.
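To make the Map and Reduce steps more concrete, here is a minimal single-machine sketch in Python of the classic word-count job. It is not Hadoop's API: threads stand in for worker nodes, and the function names (map_chunk, shuffle, reduce_group, word_count) are hypothetical, chosen only for illustration. It also shows the grouping ("shuffle") step that MapReduce frameworks perform between Map and Reduce, so each Reduce call sees every partial answer for one key.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_chunk(chunk):
    """Map step: turn one sub-problem (a list of lines) into (word, 1) pairs."""
    pairs = []
    for line in chunk:
        for word in line.lower().split():
            pairs.append((word, 1))
    return pairs


def shuffle(mapped_outputs):
    """Group all intermediate values by key, as the framework does between steps."""
    groups = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_group(key, values):
    """Reduce step: combine the partial answers for one key into a final count."""
    return key, sum(values)


def word_count(lines, workers=3):
    # Divide the input into roughly equal sub-problems, one per worker.
    chunks = [lines[i::workers] for i in range(workers)]
    # Threads play the role of worker nodes; this function plays the master.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_chunk, chunks))
    grouped = shuffle(mapped)
    return dict(reduce_group(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    text = [
        "big data needs backup",
        "backup needs recovery",
        "big data big plans",
    ]
    print(word_count(text))
    # → {'backup': 2, 'big': 3, 'data': 2, 'needs': 2, 'plans': 1, 'recovery': 1}
```

In a real cluster the chunks would be blocks of an HDFS file, and the framework would try to run each Map task on a node that already stores its block, which is the "keep the data local" idea quoted below.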

As Nathan Coutinho stated in his CDW blog article “5 Ways to Future-Proof Your Data Center for Big Data”: “The whole point of Hadoop is to keep the data local on commodity servers and economical local storage…”

Small and medium-sized businesses find Hadoop attractive because of its ability to provide high availability of data on local commodity servers.

A data strategy is never complete without a Data Backup and Recovery strategy. A Big Data implementation using Hadoop demands even more focus on the ability to recover quickly from a catastrophic event. However, SMBs are often neither staffed nor tooled to design and execute a backup strategy of this complexity. A further consideration: because Hadoop’s appeal lies in using local servers, the data backup and recovery strategy should be manageable remotely without requiring that live data be transferred to, or run in, a cloud environment.
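One common do-it-yourself pattern for backing up HDFS is to take a read-only HDFS snapshot and then copy it to a separate backup cluster with DistCp. The Python sketch below only assembles those commands (hdfs dfsadmin -allowSnapshot, hdfs dfs -createSnapshot, hadoop distcp) and, by default, prints them instead of running them. The paths, the backup cluster URI and the function names are hypothetical, and this is not the TSM-based process described at the end of this post.

```python
import subprocess
from datetime import datetime, timezone


def snapshot_backup_commands(source_dir, backup_uri, snapshot_name=None,
                             allow_snapshot=True):
    """Build the argv lists for a snapshot-then-copy HDFS backup.

    source_dir: HDFS directory to protect, e.g. "/data/sales" (hypothetical).
    backup_uri: filesystem URI of a separate backup cluster (hypothetical).
    """
    source_dir = source_dir.rstrip("/")
    if snapshot_name is None:
        # A UTC timestamp keeps successive snapshot names distinct.
        stamp = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
        snapshot_name = f"backup-{stamp}"
    # HDFS exposes each snapshot as a read-only path under ".snapshot".
    snapshot_path = f"{source_dir}/.snapshot/{snapshot_name}"
    target = f"{backup_uri.rstrip('/')}{source_dir}/{snapshot_name}"

    commands = []
    if allow_snapshot:
        # Admin step, normally done once per directory by an HDFS superuser.
        commands.append(["hdfs", "dfsadmin", "-allowSnapshot", source_dir])
    # Freeze a point-in-time view so the copy stays consistent while jobs keep writing.
    commands.append(["hdfs", "dfs", "-createSnapshot", source_dir, snapshot_name])
    # DistCp performs the copy as a MapReduce job, in parallel across the cluster.
    commands.append(["hadoop", "distcp", snapshot_path, target])
    return commands


def run_backup(commands, dry_run=True):
    """Print the commands, or execute them in order when dry_run is False."""
    for cmd in commands:
        if dry_run:
            print("would run:", " ".join(cmd))
        else:
            # check=True raises CalledProcessError and stops at the first failure.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    cmds = snapshot_backup_commands("/data/sales", "hdfs://backup-nn:8020/backups")
    run_backup(cmds)  # dry run: prints only, touches nothing
```

Even this sketch leaves out scheduling, snapshot retention and deletion, monitoring, and regular restore testing, which is part of why a backup strategy at this level takes real expertise to design and run.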

There are Data Backup/Recovery Managed Service Providers (DB/R MSPs) that provide remote management of the backup process, along with professional disaster backup and recovery consultation. Contracting a DB/R MSP under this remote-management model allows the SMB to keep its data local without hiring new staff or training existing staff in sophisticated data backup and recovery processes. The SMB thus gets a comprehensive Data Backup and Recovery strategy while still housing its Big Data locally.

Offsite Data Backup Should Not Be A Distraction

Executing an effective backup strategy should not distract from the IT department’s main focus of managing a quality environment. Data backups should be performed efficiently to ensure recovery, if ever needed, is quick and successful.

Why is backing up an enterprise’s data storage more complex than is sometimes expected? One reason is that not all data is the same, so the type of data being backed up must be considered.

Data backup and recovery is a learned discipline. Learning it can take excessive time away from the IT department’s main focus. Alternatively, shortcuts may be taken that cause severe problems when recovery is necessary. Refer to our post WHEN IS TIME MORE IMPORTANT THAN MONEY? – DURING DATA RECOVERY.

An offsite data backup process performed by an experienced managed service provider (MSP) offers efficiencies in the backup process and effective recovery of lost, deleted or corrupted data.

Salvus Data is a managed service provider offering an offsite data backup program for businesses. Salvus uses IBM’s Tivoli Storage Manager (TSM) to provide optimal results to its customers. Call us at 903-201-7233 to discuss your needs and see how we can help.