How Big is Big Data?

I have been doing a lot of behind-the-scenes work lately (talking on the phone or in person with lots and lots of people, and also being sick and unable to write or talk, but thinking).  There are many good things that will come out of this past month of misery and agony (OK, not that bad, but I've gotta keep up the drama queen attitude so my daughters continue to have a role model).

In the middle of all this work, I was able to corral some interesting thoughts, especially as I dive deeper into Analytics and Big Data. I am sure you have heard about the agent of doom that is hanging over our heads (if the Mayans were wrong and the world does not end in December of 2012, that is).  Big Data, now measured in zettabytes and numbers that were never imagined, is the looming dark cloud for organizations.  We will never be able to master so much data, let alone process it and do something with it.  We will definitely not be able to get value out of it.

Well, I am not so sure…

Big Data is nothing new.

We have had tons of data to manage for a very long time.  If you really think it through, the problem is not Big Data.  Social, and its cousin UGC (user-generated content), create tons of information every day (nay, every second).  I won’t bore you with numbers you can find elsewhere (like we generate the equivalent of 25,000,000 Libraries of Congress of content every nanosecond, or whatever the killer stat du jour is), but the reality is that we are generating lots and lots of noise.  Not all that comes from Twitter is actual data (seriously, have you taken a look outside of what you traditionally cover?), nor is all user-generated content related to you and your situation.

No, we did not grow the amount of data we handle by leaps and bounds overnight, but we did grow our abilities to process it faster and more efficiently (partly thanks to in-memory processing, partly thanks to better data manipulation and storage techniques, and partly due to increases in horsepower for computers; think Moore’s law).  The problem for virtually everyone is not how to handle and manage data or what to store (well, maybe this is a partial problem; more later), but what constitutes data to us.  Indeed, the greater challenge for organizations is not how to manage Big Data, but rather how to separate data from noise, handle just the data, and discard the noise.

You don’t need a new analytics strategy; you need a new filtering strategy.
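
To make "filtering" a bit more concrete, here is a minimal sketch in Python of what separating data from noise could look like.  Everything in it is an assumption made up for illustration: the `Post` type, the `RELEVANT_TERMS` list, and the simple keyword rule are hypothetical, and a real filter would likely be far more involved.

```python
from dataclasses import dataclass

# Hypothetical terms that make a post relevant to "us"; purely illustrative.
RELEVANT_TERMS = {"acme", "acmesupport", "refund", "outage"}


@dataclass
class Post:
    source: str
    text: str


def is_data(post: Post, terms: set[str] = RELEVANT_TERMS) -> bool:
    """Treat a post as data only if it mentions at least one relevant term."""
    words = {word.strip(".,!?#@").lower() for word in post.text.split()}
    return bool(words & terms)


def filter_noise(posts: list[Post]) -> list[Post]:
    """Keep the posts that count as data and discard the rest as noise."""
    return [post for post in posts if is_data(post)]


# Made-up sample posts, not real data.
posts = [
    Post("twitter", "Just had the best coffee ever!"),
    Post("twitter", "@AcmeSupport my order never arrived, I want a refund."),
    Post("forum", "Anyone watching the game tonight?"),
    Post("forum", "Is there an Acme outage right now?"),
]

kept = filter_noise(posts)
print(f"kept {len(kept)} of {len(posts)} posts")
```

The point of the sketch is only that the filter runs before anything gets stored or analyzed, so whatever comes afterwards sees the smaller, relevant set.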

The title of this post makes reference to many conversations I had with seasoned practitioners lately where we discussed analytics and social data.  The consensus, and these are some of the largest and most active organizations in the world, was that around 10% of their data now comes from Social.

Not what you expected – right?  I mean, as little as a month ago I was giving a speech and (mea culpa) I said that social data will increase the amount of data an organization needs to handle by 20x-100x.

Of course, revising that today (although, in my defense, I did mention that most of that was noise), I’d say that organizations get bombarded by Big Noise, not Big Data; data is what is filtered out of that noise.  The resulting data is not something you need to fret about handling; year-over-year data growth for a business is not far from ten percent.

Good time to shift strategies from panicked, knee-jerk mode to calculated, strategic mode, don’t you think?