How To Invest In A Big Data Platform
Source: Manny Puentes
“Big Data” is no longer a buzzword. Businesses big and small that don’t invest in big data technologies now risk being left behind as the marketplace becomes more and more data-driven. In fact, a recent McKinsey & Company report suggested that companies that invest in big data and analytics consistently outperform their peers in both productivity and revenue.
The advent of the Internet of Things means there is going to be more data collected and transmitted than ever before. Even small local businesses will need to invest in ways to collect, store, and process data in order to understand more about their customers and their products.
But where to start? Investing in big data infrastructure doesn’t have to be an overwhelming proposition. Here are a few things you need to know when making the leap from antiquated databases to a modern big data platform.
Start small (and free). Hadoop is the open-source software framework of choice for many in the big data game. It’s built to scale from a single server to thousands of machines, and it is designed to detect and handle failures at the application level rather than relying on the hardware to do so. Though Hadoop itself is an open-source project maintained by Apache, commercial distributions built for business use can make it easier to get off the ground.
When I was building out the framework for one of the first real-time bidding engines in digital advertising, I took advantage of the free version of MapR and used it successfully until we outgrew it and needed to invest in a larger deployment to support our exponentially growing business. It was the perfect low-risk way to try out a platform without needing to spend hundreds of thousands of dollars. I learned how to use and manage the service, so that when we invested in the higher-level enterprise platform, there were no surprises.
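For readers who want a feel for what "starting small" looks like, here is a minimal sketch of the map, shuffle, and reduce steps that Hadoop's MapReduce processing model is built around, run on a single machine in plain Python. It is an illustration only, not Hadoop code and not the setup described above; the function names and the sample records are made up for the example.

```python
from collections import defaultdict


def map_phase(records):
    """Emit one (word, 1) pair for every word in every record."""
    for record in records:
        for word in record.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Group the emitted values by key, as a framework would do
    between the map and reduce steps."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Collapse each key's list of values into a single total."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(records):
    """Run the three phases in order on an in-memory list of records."""
    return reduce_phase(shuffle_phase(map_phase(records)))


if __name__ == "__main__":
    # Hypothetical sample data, small enough to check by hand.
    sample = ["big data big platform", "data platform"]
    print(word_count(sample))
```

The point of the pattern is that each phase works on independent pieces of data, which is what lets a framework spread the same logic across many machines once a single server is no longer enough.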
Get familiar with the ecosystem. The big data world can be a daunting place for those new to the game, but the architecture around your platform can make a big difference in how effective and efficient your business can be. On top of a big data distribution (MapR, Cloudera, and Hortonworks being some of the most used in the space), Hadoop is integrated with a number of tools that make it easier to manage, understand, and use data. These include Apache Drill for querying large data sets, Apache Hive for ad-hoc queries and data summarization, and Cascading, a Java framework for building machine-learning and other data processing applications.
To manage this suite of tools most effectively, look for a provider who packages these different solutions and makes them easy to experiment with and test. Ask about any training materials or support available to make the most of these types of tools.
At Altitude Digital, we often experiment with newer tools like Spark (a new data processing engine) on a small set of data, and when a tool proves to be valuable, we expand it to our platform more broadly.
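One way to carve out a small, repeatable slice of data for this kind of trial is to pick records by hashing their IDs. The sketch below is a generic illustration in plain Python, not a description of Altitude Digital's actual process; the `in_sample` and `pilot_subset` helpers, the `id` field, and the percentages are all assumptions made for the example.

```python
import hashlib


def in_sample(record_id: str, percent: int) -> bool:
    """Return True if this record belongs to the pilot sample.

    Hashing the ID (rather than picking at random) means the same
    records are selected on every run, so results are comparable.
    """
    digest = hashlib.sha256(record_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket from 0 to 99
    return bucket < percent


def pilot_subset(records, percent, key="id"):
    """Keep only the records whose ID falls inside the pilot sample."""
    return [r for r in records if in_sample(str(r[key]), percent)]


if __name__ == "__main__":
    # Hypothetical records; a real data set would come from storage.
    records = [{"id": i} for i in range(1000)]
    small = pilot_subset(records, 5)
    print(f"{len(small)} of {len(records)} records selected for the pilot")
```

Because a record's bucket never changes, raising the percentage later only adds records to the sample, which makes it straightforward to widen a trial gradually before rolling a tool out across the whole platform.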