How Big is Big Data?

I have been doing a lot of behind the scenes work lately (talking on the phone or in person with lots and lots of people – and also being sick and unable to write or talk, but thinking).  There are many good things that will come out of this past month of misery and agony (OK, not that bad — but gotta keep up the drama queen attitude so my daughters continue to have a role model).

In the middle of all this work, I was able to corral some interesting thoughts, especially as I dive deeper into Analytics and Big Data – I am sure you have heard about the agent of doom (if the Mayans were wrong and the world does not end in December of 2012, that is) that is hanging over our heads.  Big Data, now measured in zettabytes and numbers that were never imagined, is the looming dark cloud for organizations.  We will never be able to master so much data, let alone process it and do something with it.  We will definitely not be able to get value out of it.

Well, I am not so sure…

Big Data is nothing new.

We have had tons of data to manage for a very long time.  If you really think it through, the problem is not Big Data.  Social, and its cousin UGC (user-generated content), create tons of information every day – nay, every second.  I won’t bore you with numbers you can find elsewhere (like we generate the equivalent of 25,000,000 Libraries of Congress of content every nanosecond, or whatever the killer stat du jour is), but the reality is that we are generating lots and lots of noise.  Not all that comes from Twitter is actual data (seriously, have you taken a look outside of what you traditionally cover?), nor is all user-generated content related to you and your situation.

No, we did not grow the amount of data we handle by leaps and bounds overnight, but we did grow our ability to process it faster and more efficiently (partly thanks to in-memory processing, partly thanks to better data manipulation and storage techniques, and partly due to increases in horsepower for computers – think Moore’s law).  The problem for virtually everyone is not how to handle and manage data or what to store (well, maybe this is a partial problem – more later), but what constitutes data to us.  Indeed, the greater challenge for organizations is not how to manage Big Data, but rather how to separate data from noise, handle the data, and discard the noise.

You don’t need a new analytics strategy, you need a new filtering strategy.
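
To make "filter first, analyze second" a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration only: the `Post` type, the `RELEVANT_TERMS` set, and the sample posts are all made up, and a plain keyword match is a deliberately naive stand-in for whatever filtering rules actually fit your business.

```python
from dataclasses import dataclass


@dataclass
class Post:
    source: str  # where the content came from, e.g. "twitter"
    text: str    # the raw content


# Hypothetical relevance terms; in practice these would come from business goals.
RELEVANT_TERMS = frozenset({"acme", "refund", "outage"})


def is_signal(post, terms=RELEVANT_TERMS):
    """Keep a post only if it mentions at least one relevant term."""
    # Lowercase each word and strip common punctuation and social prefixes.
    words = {w.strip(".,!?#@").lower() for w in post.text.split()}
    return not terms.isdisjoint(words)


def split_signal_from_noise(posts):
    """Return the posts worth analyzing and a count of what was discarded."""
    signal = [p for p in posts if is_signal(p)]
    return signal, len(posts) - len(signal)


# Made-up sample stream: two posts are about "us", two are just noise.
posts = [
    Post("twitter", "Just had the best coffee ever!"),
    Post("twitter", "@acme my order never arrived, I want a refund."),
    Post("forum", "Anyone else seeing an outage this morning?"),
    Post("twitter", "Monday again..."),
]

signal, noise_count = split_signal_from_noise(posts)
print(f"kept {len(signal)} posts, discarded {noise_count} as noise")
```

The point of the sketch is only the ordering: the analytics step sees `signal`, never the raw stream, so the volume it has to handle is whatever survives the filter.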

The title of this post makes reference to many conversations I had with seasoned practitioners lately where we discussed analytics and social data.  The consensus, and these are some of the largest and most active organizations in the world, was that around 10% of their data now comes from Social.

Not what you expected – right?  I mean, as little as a month ago I was giving a speech and (mea culpa) I said that social data will increase the amount of data an organization needs to handle by 20x-100x.

Of course, revising that today (although, in my defense, I did mention that most of that was noise) I’d say that organizations get bombarded by Big Noise, not Big Data – data is what remains once that noise is filtered out.  The resulting data is not something you need to fret about how to handle; year-over-year data growth for a business is not that different from ten percent.

Good time to shift strategies from panicked, knee-jerk mode to calculated, strategic mode – don’t you think?

33 thoughts on “How Big is Big Data?”

  1. “You don’t need a new analytics strategy, you need a new filtering strategy” – Agree. It is about Big Noise, not Big Data. Convergys spoke about this during the MundoContact event in Mexico… They called it Social Decisioning Cloud…. see this: http://www.slideshare.net/mundocontact/14-b-vvalle – slide 18.

    However, you still need some sort of new data design (B2b2c, p2p, persons, customer experience, multichannels, ecommerce, etc) to manage big data with your analytics…

    1. do you need new data design or do you need someone who can understand the data and help you use it properly? i’d put money on #2 as long as the data model being used is functional. if not, does not matter what you do with big data – without a working data model in place it is all about just more noise

  2. Great post. Couldn’t agree more. A simple look at many of the twitter lists I follow can show how little is important to us for reading and how BIG is the NOISE which we DO NOT need.

    Finding an efficient way to filter the good content out of all this noise I follow is important for me, in order to:
    1. Reduce time I waste on internet
    2. Get meaningful info (knowledge) out of what is available to me

    The same is true for any organization that is looking at big data. Try to spend more time filtering out the good data than worrying about how big the data is.

    – @ColdPC

  3. Hi E, that 10% social sounds about right, as long as we’re talking about the data generated in social media and UGC etc. etc.

    All the data in the world means squat without a great hypothesis to test right? and someone who can infer how meaningful both the question and the results are (i.e. a “traditional analyst type”).

    The caveat I would put on this is that once all the data in the organisation becomes “freely available” (internally) and “on-net” then your data issue becomes more important again. It would be interesting to see under the hood at Salesforce.com to get early insights here, methinks.

    1. Paul,

      Did you read Wim’s post? he is saying basically the same things you are saying, sorta. Must be a European worry, testing the right hypothesis and all that jazz… we just shoot from the hip over here, you know? In all seriousness, I totally agree with you – but i find the second part of your comment most intriguing: when and how does the data issue become important again? do you mean the size of the data set? am i missing something there?

      Thanks for the read and comment!

  4. I don’t think you’re wrong, I just think you’re observing the problem from the wrong end of the telescope. Quantity of data isn’t a new problem, there has always been too much, the only thing a business can decide on is who decides what’s important. Dept A filters one set of data one way, Dept B filters different data a different way.

    The only business I’ve seen address this in a completely consistent way is WL Gore, where all decisions are made in the same way: deciding everything through multi-discipline project teams on the basis of ‘Can we win it and is it worth it?’ Career progress is determined by who gets asked to be on which project teams – so if all you get to decide is the brand of coffee in the vending machine then maybe your skillsets are best applied elsewhere. In this way the biggest decisions are made by the best people. That’s a real answer to the problem of dealing with big data, little data and all points in between.

    Best wishes

    1. Nigel,

      I am not familiar with WL Gore, but I am concerned by any organization that must use the same rules and processes to make data decisions for different parts of the same organization. I’d endorse, I imagine, using what works best for each part of the org, not just using a similar process for the sake of having the same one across the organization.

      I may not be understanding what you are presenting, but if I am correct in understanding that a single process and set of rules apply to all decision-making resolution, then I am seriously concerned.

      Thanks for the comment, looking forward to hearing more.

      1. Where I disagree with your original post is that presenting it as a choice between better analytics and better filters seems to be a false choice between technological solutions, when data is only meaningful once it gets into the hands of humans, who deal with it in inconsistent ways. The best scalpel in the world is only as good as the surgeon using it.
        There are certainly downsides, as you say, to having too rigid a way of approaching data-based decisions, which is the attraction of WL Gore’s approach: it is elegant, simple to grasp and rewards good decision making in a transparent, democratic way.

        1. Nigel,

          I am not sure I presented a choice between one or the other – I don’t recall saying that at any time. I said that analysis w/o filtering is useless insofar as it analyzes bad data and thus produces bad results (and if I did not say it before, I am saying that now – hard to keep track of all said in past posts and now, sorry). I totally, absolutely, positively agree with you about people being the weak link and I did a series of posts last year on that same topic – we don’t have people who know what data is, let alone enough people who know how to use it. No amount of filtering or analysis will make up for that – actually, filtering and analysis are biased by the people implementing them, so w/o knowledge of what is being done, they are likely to be biased in one way or another.

          However, I am still not certain that a “one solution to all problems” approach is the way to do it, but then again — i may be misunderstanding the model you suggest.

          Thanks for engaging, great convo (sorry for the latency in it, trying to get caught up).

  5. Spot on, what we need is efficient filtering indeed.
    In fact, that is what is missing in structured “regular” data too for the most part. Well that, and smart visualization. It kills me that in 2012, most SW cannot intelligently suggest some useful visualization options from a given data set to users.

    The 10% data that comes from social – we need to ask what decisions get driven by it, and what is the $$ impact. If it is low impact – and I suspect it is, then the right solution might be to do nothing for now, and wait for the big data analytics to get more mature and cheaper.

    1. Vijay,

      Thanks for stopping by and reading. I am with you on the problems you indicate for SW – I don’t think it has managed to keep pace with data innovation models and processes. However, you do bring up an excellent point – there is no point in doing anything with metrics and data without the ability to correlate it with KPIs and prove the optimal results. This is where I fear we are lacking the most in our approach to understanding and using data.

      Time will tell, but I am hoping that these correlations will be found soon enough.

      What say you?

  6. As I think about your call for a “filtering strategy,” three key, but interrelated filters come to my mind:
    – Marketing (or Business) goals/objectives filters: these set the stage for the type of data to be collected
    – Content filters: these would come out of the goals and objectives and constitute the value of the data (who, what, why, when, etc.)
    – Analytic filters: these would be the enablers or tools that would extract the data from the noise (search algorithms, key words, predictive analytics, profitability analysis, risk assessment, etc.)

    The more these three elements are aligned and integrated, the better the chances of homing in and acting on the data, while discarding the noise. Thoughts?

    Thanks for a thought-provoking post, Esteban!

    1. Wilson,

      Very interesting model. I cannot say I disagree at first (which only means try harder to find where to disagree — just kidding). I am not sure I fully get the differences, maybe I need to learn more before giving you an opinion. Give me some time to sleep, we can chat later about this.

      Thanks for the great comment!

  7. Very thoughtful. In my view it all feeds into the customization theme. We have to help brands collect and use the right data in order to customize the consumer brand experience.

    1. Unfortunately, no.

      Most, virtually all actually, use keyword, phrase, or NLP tools with complex rules set-once-and-unlikely-to-change that are not very accurate in their attempts.

      Thanks for the read and the question. Not even sure where to recommend you find some of that software yet.

  8. Great comments.
    Much needed: data convergence & social media consolidation, information aggregation, curation and great filtering tools (grouping, listing, ranking, prioritizing & selection means), as many said previously.

    Thanks for sharing Esteban!

    1. Thanks for the read and comment Maria,

      I find that the data convergence between social and corporate is not going so well, I wish there was more emphasis on that side of the world — but then, again, w/o filtering you cannot find the right social data to add to corporate data… somewhat of a catch-22 I guess.

      Thanks

    1. Kevin,

      Isn’t contextual evaluation of data part of filtering? Origin is definitely not the first level of filtering, but certainly lives up there in the filtering world from where i sit – and when I say origin, i am summarizing for the most common contextual inference, but will include all contextual metadata in that summary.

      Thanks for the read and comment!

  9. Heh Esteban – you really must have been poorly. Euan Spence has talked about effective filtering for many years. He recognised this as a problem while working at the BBC. That goes back around 10 years. Worth following his thoughts – he’s a very smart guy.

    At the analog level, I reckon we are becoming very good at filtering. I remember some 6 years ago, my (then 15-year-old) son showed me how he could flip between different social networks (at that time dominated by MySpace) while simultaneously playing games AND doing work. He didn’t skip a beat or miss a thing. I wonder if anyone has thought about that, studied to see how the younger generation operates? It’s us old farts that have the bigger problem methinks.

    1. Thanks Dennis,

      I had a crappy December, let’s just say that. I follow Euan, I love his work as well.

      I think that the diff between the new generations and the older ones (I agree with you on their skills on digital) is that because of positions in the enterprise for the next 8-10 years, we (older) are the ones “entrusted” with solving the ugly problems. Alas, I would say that if we were to put the young’ums in charge, they would look at the problem totally differently and more than likely solve it. We did when compared to preceding generations, mostly related to use and abuse of technology.

      Thanks for the read,

  10. I’m guilty of not reading all the comments so this may have been said before… but while useful social data is only 10% of total (which is probably actually quite high in some industries like travel or retail where massive amounts of transaction data are gathered every day) it will probably take up a much higher % of time, energy and resources to create any value from it because it is unstructured and lives in multiple locations that aren’t owned by the brand.

    1. Michael,

      Don’t think it has been said before in this post, but I agree with you and some smart people have said it before as well. Structuring the unstructured is a complicated thing, but w/o filtering, it gets even more complicated. Then again, chicken-egg: should we filter before structuring? or structure before filtering? I say #1, some will say #2.

      Thanks for the read, btw I do agree with you 100%

  11. My take on what constitutes “big data”: Big data is merely data that’s an order of magnitude greater than you’re accustomed to…Grasshopper. –Doug Laney, VP Research, Gartner, @doug_laney
