Data Backup and Recovery Considerations for Hadoop and Big Data

IBM says that 90% of the data in the world today has been created in the last two years alone. IBM also says that 80% of the data captured today is unstructured. Sources of unstructured data include posts to social media sites, digital pictures and videos, and point-of-sale systems. All of this unstructured data can be termed Big Data.

Because of the wide-ranging benefits that small and medium-sized businesses (SMBs) can gain from Big Data in today's competitive world, many are implementing a local Big Data strategy. Hadoop is one tool that helps businesses of all sizes manage Big Data. The Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using a simple programming model. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.

The Hadoop project has several components. A few of the most pertinent are:

  • Hadoop Common – libraries and utilities
  • Hadoop Distributed File System (HDFS) – a distributed file-system that stores data on commodity servers while providing high aggregate bandwidth across the cluster.
  • Hadoop MapReduce – a programming model for processing large data sets in parallel. The "Map" step takes the input, divides it into smaller sub-problems, and distributes them to worker nodes. Each worker node solves its sub-problem and passes the answer back to its master node. The "Reduce" step then collects the answers to all the sub-problems and combines them to form the output.
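To make the Map and Reduce steps concrete, here is a minimal single-process sketch in Python of the classic word-count job. It is not Hadoop's own (Java) MapReduce API: the function names `map_phase`, `shuffle`, `reduce_phase`, and `map_reduce` are hypothetical, the "worker nodes" are ordinary function calls, and the grouping of intermediate pairs by key, which Hadoop performs between the two steps, is simulated with a dictionary.

```python
from collections import defaultdict
from itertools import chain


def map_phase(chunk):
    """Map step: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_pairs):
    """Group intermediate pairs by key, as the framework does between Map and Reduce."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all partial answers for one key into a final result."""
    return key, sum(values)


def map_reduce(splits):
    # On a real cluster each split would be mapped on a different worker node;
    # here the "workers" are sequential calls in a single process.
    mapped = chain.from_iterable(map_phase(s) for s in splits)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


# Hypothetical input splits, chosen only for illustration.
splits = ["point of sale data", "social media posts", "social media video"]
print(map_reduce(splits))
# {'point': 1, 'of': 1, 'sale': 1, 'data': 1, 'social': 2, 'media': 2, 'posts': 1, 'video': 1}
```

Because each split is mapped independently and each key is reduced independently, both steps can be spread across many machines, which is what lets Hadoop scale out on commodity servers.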

As Nathan Coutinho states in his CDW blog article "5 Ways to Future-Proof Your Data Center for Big Data," "The whole point of Hadoop is to keep the data local on commodity servers and economical local storage…"

Small and medium-sized businesses find Hadoop attractive because of its ability to provide high availability of data on local commodity servers.

A data strategy is never complete without a Data Backup and Recovery strategy. A Big Data implementation using Hadoop demands even more focus on the ability to recover quickly from a catastrophic event. However, SMBs are often not staffed or tooled to design and execute a backup strategy this complex. There is a further consideration: because Hadoop's appeal lies in using local servers, the backup and recovery strategy must be manageable remotely without requiring the live data to be transferred to, or run in, a cloud environment.
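As one illustration of a backup that stays on local hardware yet can be driven remotely, the Python sketch below builds (but does not run) the commands for a snapshot-and-copy job using two real Hadoop features, HDFS snapshots and DistCp. The article does not prescribe this approach; it is an assumed example. The function name `build_backup_commands`, the paths `/data/sales` and `/backups/sales`, and the backup NameNode address `backup-nn:8020` are all hypothetical, and the sketch assumes snapshots were already enabled on the source directory with `hdfs dfsadmin -allowSnapshot`.

```python
import shlex
from datetime import datetime, timezone


def build_backup_commands(source_dir, backup_namenode, backup_root, now=None):
    """Return the commands for one snapshot-and-copy backup run, without running them.

    Assumptions (illustrative, not from the article): snapshots are already
    enabled on source_dir, and backup_namenode is the NameNode address of a
    second, on-premises HDFS cluster used only for backups.
    """
    now = now or datetime.now(timezone.utc)
    snapshot = "backup-" + now.strftime("%Y%m%d-%H%M%S")
    # HDFS exposes each read-only, point-in-time snapshot under <dir>/.snapshot/<name>.
    snapshot_path = f"{source_dir.rstrip('/')}/.snapshot/{snapshot}"
    target = f"hdfs://{backup_namenode}{backup_root.rstrip('/')}/{snapshot}"
    return [
        # 1. Freeze a consistent view of the live directory.
        ["hdfs", "dfs", "-createSnapshot", source_dir, snapshot],
        # 2. Copy the frozen view to the local backup cluster; DistCp runs as a MapReduce job.
        ["hadoop", "distcp", snapshot_path, target],
    ]


# Hypothetical paths and host name, chosen only for illustration.
run_time = datetime(2024, 1, 31, 2, 0, tzinfo=timezone.utc)
for cmd in build_backup_commands("/data/sales", "backup-nn:8020", "/backups/sales", now=run_time):
    print(shlex.join(cmd))
# hdfs dfs -createSnapshot /data/sales backup-20240131-020000
# hadoop distcp /data/sales/.snapshot/backup-20240131-020000 hdfs://backup-nn:8020/backups/sales/backup-20240131-020000
```

Taking the snapshot first means DistCp copies an unchanging view even while jobs keep writing to the live directory. Because the target is a second local cluster, the data never leaves the premises; a remote administrator only needs to schedule the commands (for example with `subprocess.run`) and monitor the results.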

There are Data Backup/Recovery Managed Service Providers (DB/R MSPs) that manage the backup process remotely and also offer professional disaster backup and recovery consulting. Contracting a DB/R MSP under this remote-management model lets the SMB keep its data local without hiring new staff or training existing staff in sophisticated data backup and recovery processes. The SMB thus gets a comprehensive Data Backup and Recovery strategy while housing its Big Data locally.

Why is the SMB relying more on Big Data?

Big Data is becoming more critical to the SMB.

Why is the SMB relying on Big Data that was once important only to the large enterprise? The SMB has recognized the need to analyze the factors that affect its ability to remain competitive in a global market. Those factors are hidden deep within structured and unstructured data.

To remain competitive, the SMB must understand sentiment, cyclical headwinds, and other related influences. These non-traditional factors are contained in unstructured data from social media and news sources. As XO states about unstructured data: "A massive surge in unstructured data creates tough challenges for IT. Businesses need more intelligent network infrastructure to support their Big Data plans."

Analyzing structured data alone, such as the customer and sales records in CRM and ERP applications, would be challenging enough. In the post "'Big Data' Means More than Just Lots of Data," a white paper from Mint Jutras suggests that "businesses of all sizes need to be concerned about Big Data."

More SMBs now understand that they need to analyze structured and unstructured data as a single source. That is Big Data.

As high-volume, high-velocity data grows in importance, reliable data backup and the ability to recover data quickly after a disaster become critical. Let us know how these trends have affected your business. If you have questions about strategy, call us at 903-201-7233 or email us at info@salvusdata.com.