Smart Infrastructure & Smart Applications for the Smart Business – Infrastructure & Application Performance Monitoring
Data Warehouse Automation & Real-time Data – Reducing Time to Value in a Distributed Analytical Environment
Two weeks ago I attended the popular Strata conference in Santa Clara California where, frankly the momentum behind Big Data was nothing short of unstoppable. 3100 delegate poured into the Santa Clara Convetion Centre to see a myriad of big data technologies. Things that stood out for me include the massive interest in the new Hadoop 2.0 Apache Spark in-memory framework that runs on top of YARN and the SQL-on-Hadoop wars that broke out with every vendor claiming they were faster than everyone else. The momentum behind Spark was impressive with vendors including Cloudera and Hortonworks now running Spark on their Hadoop distributions.
The tables below show the SQL on Hadoop initiatives
Some of the sessions on the SQL on Hadoop were a little disappointing as they focussed on far too much on query benchmarks rather than the challenges of using SQL to access complex data such as JSON data, text data or log data. Log data is of course very much in demand at present to provide insight into on-line behaviour. In addition what about multiple concurrent users access Hadoop data via SQL? It is clear that the in-memory Shark on Spark (Hive on Spark) initiative coming out or AMPlab at UC Berkeley is looking to address this.
Pureplay Hadoop vendors Cloudera, Hortonworks and MapR were all out in force with like In addition to the above there were new self-service data management tools like Paxata and Trifacta who are aiming there products at data scientists and business analysts. This is fuelling a trend where users of these tools and self-service BI tools are now getting the power to clean and prepare their own data rather than using enterprise data management platforms from vendors like IBM, Informatica, SAP, SAS, GlobalIDs and Oracle. I have already blogged about this in my last blog. Also new visualization vendors like Zoomdata dazzed everyone with the ‘Minority Report’ demo and virtual reality demos. Then of course there are the giants. IBM, Oracle, SAP, Microsoft. Microsoft has integrated Excel with it HDInsights Hadoop distribution and its HDInsights on Windows Azure. Meanwhile IBM’s recent Watson Announcement shows Big Blue’s commitment to not just run analytics and BI tools against big data but to move beyond that into cognitive technologies on top of its big data platform with the emergence of Watson Foundations, Watson Explorer, Watson Analytics and Watson Discovery Server.
With all this technology (apologies to other vendors not mentioned), it is not surprising people are feeling somewhat overwhelmed when it comes to putting together a big data strategy. Of course it is not just about technology. Questions about business case, roles, skills, architecture, new technology components, data governance, best practices, integration with existing technology, pitfalls to avoid and much more all need answered. Therefore please join me on Twitter on Wednesday March 5th on IBM’s Big Data Tweetchat at 12:00 ET to discuss “Creating a Big Data Strategy”