Big data is simply another name for complicated business int
Source: Matt Asay
Big data vendors like Cloudera talk about how Hadoop and other technologies "democratize data" for users. One way they do this, Cloudera's Justin Kestelyn insists, is by giving data analysts plenty of ways to access data: "Analysts can use BI tools, SAS, SQL command line, or even free-text search to access Hadoop now. Many options for all kinds of users."
Well, sort of -- really what Kestelyn showcases is "many options for a certain class of user."
Of course, there are good reasons for big data to be a big pain for all but the data science experts. As Mitchell Sanders highlights, the best data scientists blend domain knowledge, programming talent, and math/statistical analysis skills. As much as we may want to democratize access to data, doing something meaningful with it is hard.
Or as MongoDB's Joe Drumgoole wryly notes, "Some things can't be simplified for the mass market, e.g., flying a plane or doing analytics."
Hadoop remains complex even for data scientists. Still, as DataStax's Alex Popescu suggests, this complexity can be forgiven because Hadoop "allows experimenting and trying out new ideas, while continuing to accumulate and storing your data." It's open source and free, making trial-and-error affordable.
Yet I can't help but feel that big data doesn't go far enough if it remains a tool for the data elite.
Democratizing big data
Kestelyn nails the problem when he argues that "both BI and Hadoop have/had the same challenge, and it's not tech. Rather, it's how to sell enterprises on becoming 'data-driven.'"
If true -- I believe it is -- then wouldn't it be more powerful if more than a select few had the ability to query that data? It's hard to be "data-driven" if you can't access the data, and often the person with the most insight into a company's business is not the person who groks Spark or Hive.
Unfortunately, modern BI and big data leave the mainstream user out, as Serendipity's Mare Lucas posits:
        For years, the BI and data analytics conversation was framed around how to aggregate massive volumes of data and then unleash the data scientists to find the value. Today, despite the information deluge, enterprise decision makers are often unable to access the data in a useful way. The tools are designed for those who speak the language of algorithms and statistical analysis. It's simply too hard for the everyday user to "ask" the data any questions -- from the routine to the insightful. The end result? The speed of big data moves at a slower pace ... and the power is locked in the hands of the few.
As industry expert Peter Goldmacher explains, "The biggest winners in the big data world aren't the big data technology vendors, but rather the companies that will leverage big data technology to create entirely new businesses or disrupt legacy businesses."
But it's hard to imagine this happening so long as the ability to decipher that data is hoarded by data science priests and priestesses.
Big data and you
A new generation of data visualization tools like Tableau, Clearstory, and Domo aims to unlock enterprise data for a broader audience than before. Such companies deliver interactive dashboards that pull from a variety of data sources -- Hadoop or Spark clusters; Teradata EDWs; MongoDB, MySQL, Cassandra or Oracle databases; and more -- and make it all accessible to business users, no doctorate required.
The market for making big data simple is much bigger than the market for peddling big data infrastructure. As such, and given the difficulty of selling support contracts for open source infrastructure, it will be interesting to see if today's Hadoop vendors, flush with IPO and venture cash, will buy the Clearstorys and SlamDatas of the world to truly democratize data.
For now, however, most enterprises should be paying close attention to the data visualization vendors. In most cases, these won't be yesterday's BI vendors (all of which struggle to deal with unstructured data), but rather modern BI startups that understand that today's data is messy but can be made to tell stories with the right visualization.