Saturday, October 29, 2011

Big Data Starts with ABCs














If you haven't noticed, Big Data has created a lot of buzz lately.  Much of the buzz comes from the sheer wow factor of how big "big" is.  With the number of mobile phones nearing 6 billion, all creating content, Facebook generating over 30 billion pieces of content a month, and data expected to grow at 40% year on year, it's easy to imagine that big really is BIG.
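
To put the 40% figure in perspective, here is a rough sketch of what compound growth at that rate looks like. It assumes the rate stays constant every year, which is a simplification, and the function names are only illustrative.

```python
import math

ANNUAL_GROWTH = 0.40  # the 40% year-on-year figure quoted above


def growth_factor(years, rate=ANNUAL_GROWTH):
    """Multiple of today's data volume after `years` of compound growth."""
    return (1 + rate) ** years


def doubling_time(rate=ANNUAL_GROWTH):
    """Years needed for the data volume to double at a constant rate."""
    return math.log(2) / math.log(1 + rate)


if __name__ == "__main__":
    for years in (1, 5, 10):
        print(f"after {years:2d} years: {growth_factor(years):.1f}x today's volume")
    print(f"doubling time: {doubling_time():.2f} years")
```

At a steady 40% a year, data roughly doubles every two years and grows more than fivefold in five.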

In fact, the digital universe has recently broken the zettabyte barrier. A zettabyte is equal to a thousand exabytes, or a billion terabytes.  How big is that?  To give you an idea of scale, it would take everyone on the planet posting to Twitter 24x7 for 100 years to generate a zettabyte.
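
The unit conversions are easy to check, and the Twitter comparison can be turned into a per-person rate. The sketch below uses decimal (SI) units; the world population of 7 billion and the 365-day year are assumptions made for illustration, not figures from this post.

```python
# Decimal (SI) storage units, in bytes
TERABYTE = 10**12
EXABYTE = 10**18
ZETTABYTE = 10**21

# Assumptions for illustration only (not from the post)
WORLD_POPULATION = 7 * 10**9
SECONDS_PER_YEAR = 365 * 24 * 3600


def bytes_per_person_per_second(total_bytes, people, years):
    """Average rate each person must sustain, nonstop, to produce total_bytes."""
    return total_bytes / (people * years * SECONDS_PER_YEAR)


if __name__ == "__main__":
    print("exabytes per zettabyte: ", ZETTABYTE // EXABYTE)
    print("terabytes per zettabyte:", ZETTABYTE // TERABYTE)
    rate = bytes_per_person_per_second(ZETTABYTE, WORLD_POPULATION, 100)
    print(f"bytes per person per second: {rate:.1f}")
```

Under those assumptions, every person on the planet would have to post about 45 bytes every second, around the clock, for a century.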

So you get the idea - it’s really big. 

As an IT organization you may be thinking that your own data growth will soon be stretching the limits of your infrastructure. A way to define big data is to look at your existing infrastructure, the amount of data you have now, and the amount of growth you're experiencing.  Is it starting to break your existing processes? If so, where?

“Big” refers to a size that's beyond the ability of your current tools to affordably capture, store, manage, and analyze your data. This is a practical definition, since “big” might be a different number for each person who is trying, but unable, to extract business advantage from their data.


When we talk to our customers, we find that their existing infrastructure is breaking on three major axes:

  1. Complexity.  Data is no longer just about text and numbers; it includes real-time events and shared infrastructure. Data is now linked at high fidelity and includes multiple types. The sheer complexity of data is skyrocketing, and applying the usual algorithms for search, storage and categorization becomes much harder.
  2. Speed.  How fast is the data coming at you? High definition video streaming over the Internet to storage devices and player devices, full motion video for surveillance – all of these have very high ingestion rates. You have to be able to keep up with the data flow. You need the compute, network and storage to deliver high definition to thousands of people at once, with good viewing quality. For high performance computing you need systems that can perform trillions of operations per second and store petabytes of data.
  3. Volume.  All of the data you are collecting and generating has to be stored securely and kept available forever. IT teams today are having to make decisions about what is “too much data”. They might flush all data each week and start again. But there are certain applications, like healthcare, where you can never delete the data. It has to live forever.

These trends in data growth are something we at NetApp have been following for quite a while now.  We’ve been enhancing ONTAP to handle the scale of large data repositories, and we have also made strategic acquisitions anticipating the need for high-density, high-performance storage (Engenio) and infinite content repositories (Bycast).

In conversations with our customers dealing with the onslaught of data, we have noticed three important use cases that are stretching the limits of their existing infrastructure.

We’ve named these use cases the ABCs of Big Data.

  • Analytics - Analysis of extremely large data sets to take advantage of that digital universe and turn it into information, giving you the insight about your business that you need to make better decisions.
  • Bandwidth - Performance for data-intensive workloads at really high speeds.
  • Content - Boundless, secure, scalable data storage that allows you to keep it forever.