Several surveys this year have indicated a significant rise in the importance of analytics. These include the BI Survey 2014, the 2014 Wisdom of Crowds® Advanced and Predictive Analytics Market Study and the InformationWeek 2014 Analytics, BI, and Information Management Survey, which highlighted the top two factors driving interest in advanced analytics as:
- The desire to optimize business operations (sales, pricing, profitability)
- The desire to identify business risk (e.g. customer churn, fraud, default etc.)
In this survey 67% cited business optimization while 50% cited risk.
Much of the use of advanced and predictive analytics is on data stored in relational DBMSs or in the Hadoop Distributed File System (HDFS). This is sometimes referred to as data at rest. While this is extremely useful, many companies are also striving to analyse data in real time to prevent risks from occurring or to continually optimise and re-optimise business operations. Analysing real-time data is sometimes referred to as analysing data in motion.
Wikipedia describes Data-in-Motion as “data that is traversing a network or temporarily residing in a computer memory to be read or updated”. A key point about this is that with data at rest, we capture data, store it and analyse it. With data-in-motion, we capture data, analyse it and then perhaps store some of it. The two approaches are therefore quite different.
Using real-time analytics on real-time data in this way first started in investment banking, where banks needed to analyse markets data as trades occurred in order to predict and recommend buy and sell activity so that they could respond in a timely manner. This first-generation software was referred to as complex event processing (CEP).
Today we are well beyond just markets data and structured data. Sensor networks, smart products, live streaming video and clickstream are relatively new types of data that companies want to analyse in real time. There is no doubt that the velocity and volume of data from these new data sources, along with the fact that the data is also semi-structured or unstructured, has re-classified much of the real-time analytics requirement as very much a big data workload.
Deployment of sensors in industry is growing rapidly. GE and Accenture’s Industrial Internet Insights report for 2015 shows that 73% of companies are already investing over 20% of IT budget on big data analytics, with most of those surveyed indicating they are currently stronger in monitoring and connecting equipment than in predicting and optimising operations. However, there is no question that the desire is there to go further and leverage real-time stream processing to prevent and optimise.
Prevention includes areas like fraud, security breaches, compliance violations, shrinkage, stockouts, machine failures etc.
Optimisation includes optimisation of supply chains, manufacturing production lines, routing, pricing, personalised recommendations (including location based advertising), field service preventative maintenance, traffic, resources to dynamically respond to demand and much more.
Stream processing applications typically connect to data streams (and also data at rest), filter the data, clean and integrate it, analyse it and then act if a pattern is found and/or filter data of interest for further analysis offline. This is shown in the figure below. Stream processing is about finding and analysing event correlations in an un-sequenced event storm, whether that be millions of clicks, sensor events, market trades, machine-generated log data or live streaming video. The potential business benefits are significant.
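To make the connect, filter, clean, integrate, analyse and act steps more concrete, here is a minimal sketch in plain Python. It is an illustration only and does not use the API of any product listed in this post. The event fields (card, country, amount, ts), the reference table CARD_HOME_COUNTRY, the fraud-style rule (three foreign transactions on one card within 60 seconds) and all function names are assumptions made up for the example. It also assumes events arrive in timestamp order, which a real event stream would not guarantee.

```python
from collections import defaultdict, deque

# Hypothetical reference data "at rest" (e.g. loaded from a database).
CARD_HOME_COUNTRY = {"card-1": "GB", "card-2": "FI"}

# Hypothetical sample of events "in motion"; ts is seconds since start.
SAMPLE_STREAM = [
    {"card": "card-1", "country": "GB", "amount": "10", "ts": 0},
    {"card": "card-1", "country": "US", "amount": "20", "ts": 5},
    {"card": "card-1", "country": "US", "amount": "30", "ts": 10},
    {"card": None, "country": "US", "amount": "99", "ts": 12},  # malformed
    {"card": "card-1", "country": "US", "amount": "40", "ts": 20},
    {"card": "card-1", "country": "US", "amount": "5", "ts": 200},
]


def clean(events):
    """Drop events with missing fields and normalise the amount to a float."""
    for e in events:
        if any(e.get(k) is None for k in ("card", "country", "amount", "ts")):
            continue
        yield {**e, "amount": float(e["amount"])}


def enrich(events, reference):
    """Integrate each event with data at rest (the card's home country)."""
    for e in events:
        # Cards missing from the reference data get home "unknown",
        # so every transaction on them counts as foreign.
        yield {**e, "home": reference.get(e["card"], "unknown")}


def detect(events, window_seconds=60, threshold=3):
    """Yield an alert when a card has `threshold` or more foreign
    transactions whose timestamps fall within `window_seconds`."""
    recent = defaultdict(deque)  # card -> timestamps of foreign transactions
    for e in events:
        if e["country"] == e["home"]:
            continue  # filter: domestic transactions are not of interest
        times = recent[e["card"]]
        times.append(e["ts"])
        # Slide the window: discard timestamps that are too old.
        while times and e["ts"] - times[0] > window_seconds:
            times.popleft()
        if len(times) >= threshold:
            yield {"card": e["card"], "ts": e["ts"], "count": len(times)}


def run(stream, reference):
    """Chain the stages; the 'act' step here simply collects the alerts."""
    return list(detect(enrich(clean(stream), reference)))


if __name__ == "__main__":
    print(run(SAMPLE_STREAM, CARD_HOME_COUNTRY))
    # prints [{'card': 'card-1', 'ts': 20, 'count': 3}]
```

Each stage is a generator, so events are processed one at a time as they arrive and nothing is stored beyond the small per-card window. In a real deployment the "act" step would trigger an alert or an automated response rather than build a list, and a stream processing engine would handle out-of-order events, scaling and fault tolerance.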
Today, there are many technologies capable of analysing data in real time. Products include:
- Apache Storm and Kafka (Storm is now capable of running on a Hadoop cluster)
- Informatica RulePoint
- IBM InfoSphere Streams
- Microsoft StreamInsight
- Software AG Apama
- SAP BusinessObjects Event Insight
- Spark Streaming (part of the Spark framework commercialised by Databricks, shipping with many Hadoop distributions)
- TIBCO StreamBase
- ThinkAnalytics Intelligent Enterprise Server
You can find more information about real-time analytics and operational BI on our website. Please join me and other leading speakers to learn more about real-time analytics and listen to real case studies at the Real-time Analytics Conference on January 29th in London.