Last week I was in Munich to present at the annual TDWI (The Data Warehousing Institute) conference on “Business Intelligence and Data Management in a Cloud Computing Environment”. It was a very well-attended conference with some great speakers and sessions. My session focused on the following:
- What is Cloud Computing and why use it as a deployment option?
- Why Cloud BI? – What are the requirements for a public cloud or externally hosted BI system?
- Understanding what is on offer – The Cloud BI Marketplace
- Getting data into a cloud based BI system
- Managing access to cloud based BI systems and analytic applications
- Integrating cloud based BI systems with on-premise systems
- Pros and cons of deploying on the cloud
- Getting started with cloud-based BI
Bear in mind that both public and private cloud-based BI were under discussion, even though the hype seems to be all about public cloud or externally hosted BI systems. Looking at these points, for me the clear inhibitor to cloud-based BI adoption is the third one: understanding what is on offer. In other words, there is a lack of understanding of exactly what is available. And there is a lot on offer. On the public cloud we have everything from plain Infrastructure as a Service (IaaS) all the way through to Software as a Service (SaaS) packaged analytic applications. On the private cloud, several BI platforms already run on virtualisation software such as VMware and/or Microsoft Hyper-V. However, there seems to be very little best-practice advice on the do’s and don’ts of deploying BI systems in a virtualised private cloud environment.
In total, I came up with six options, the last of which is simply where many of us are today, i.e. BI systems not deployed on a cloud, whether public or private. The options are as follows:
1. Public cloud based IaaS for a BI system
2. Public cloud or externally hosted BI/DW PaaS for building your own cloud-based BI system
- Multi-vendor or single-vendor BI PaaS offerings
3. Public cloud or externally hosted SaaS BI packaged analytical applications
4. Public cloud or externally hosted SaaS BI for operational reporting on cloud based operational data
5. Private cloud based BI system running internally
6. Dedicated hardware based BI system (this is what most companies have today)
Option 1 is simply subscribing to an IaaS vendor such as Savvis, Amazon, Rackspace or GoGrid, paying as you go for hardware and systems software, and then buying and deploying your own ETL, DBMS and BI software (assuming the vendor places no restrictions on what it will support). I am not sure that this is attractive enough on its own without a BI/DW Platform as a Service (PaaS) as well.
Option 2 is a BI/DW Platform as a Service (PaaS), either on a public cloud or externally hosted. Here, however, you face another choice: should you opt for a multi-vendor DW/BI PaaS or a single-vendor offering? An example of a multi-vendor option is the RightScale/Talend/Vertica/Jaspersoft PaaS offering on Amazon EC2. Examples of single-vendor PaaS offerings (there are several) include GoodData and SAP BusinessObjects On-Demand; others include Birst, Indicee and PivotLink. A key question here is going to be “Is data integration included?” Clearly, the multi-vendor offering above includes an ETL tool, namely Talend. With BI/DW PaaS vendors, data integration is very much file-based, i.e. you upload data files, which are then processed and loaded into the PaaS DW/BI database. Several single-vendor PaaS offerings give you only fairly lightweight data integration once data is uploaded, and certainly not the full-blown ETL with built-in data quality that you might be used to in your own data centre. In fact, if you are looking for full-blown data quality (DQ), you are going to be disappointed in most cases. The ‘get out’ clause is that you can add your own scripts, but what about metadata lineage and auditability once the script writer has left for a better job? SAP, mentioned earlier, does offer ETL (SAP BusinessObjects Data Integrator), but only if you subscribe to the Advanced Edition of SAP BusinessObjects On-Demand (one of three editions on offer). I was even more surprised to see that SAP’s BI/DW PaaS offering uses Microsoft SQL Server as the database rather than BW. I would expect that to change to Sybase IQ fairly soon. GoodData, on the other hand, have refreshingly recognised that you may want to go beyond the data integration you get out of the box on subscription, and have gone the extra mile to provide pre-built integration with cloud-based data integration tools such as Informatica Cloud, SnapLogic and Boomi. This means you can use these tools to integrate your data before passing the data sets to GoodData. The alternative to all of this is to do the lion’s share of the data integration in-house before uploading data files.
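As a rough illustration of that in-house route, here is a minimal sketch of lightweight cleansing done before a file is uploaded. It assumes the PaaS accepts a plain CSV upload; the column names (`customer_id`, `country`, `revenue`), the country-code mapping and the cleansing rules are all hypothetical, and the vendor-specific upload step is deliberately left out. Rejected rows are kept rather than silently dropped, which goes some way towards the auditability concern raised above.

```python
import csv
import io

# Hypothetical target layout for the upload file (an assumption, not a vendor format).
FIELDS = ["customer_id", "country", "revenue"]

# Hypothetical mapping used to standardise free-text country values to ISO codes.
COUNTRY_CODES = {
    "uk": "GB",
    "united kingdom": "GB",
    "germany": "DE",
    "deutschland": "DE",
}


def clean_rows(rows):
    """Standardise, validate and de-duplicate rows.

    Returns (clean, rejected); rejected rows are retained for audit
    rather than silently discarded.
    """
    seen = set()
    clean, rejected = [], []
    for row in rows:
        cid = (row.get("customer_id") or "").strip()
        raw_country = (row.get("country") or "").strip()
        country = COUNTRY_CODES.get(raw_country.lower(), raw_country.upper())
        try:
            revenue = round(float(row.get("revenue")), 2)
        except (TypeError, ValueError):
            rejected.append(row)  # missing or non-numeric amount
            continue
        if not cid or cid in seen:
            rejected.append(row)  # missing key or duplicate customer
            continue
        seen.add(cid)
        clean.append({"customer_id": cid, "country": country, "revenue": revenue})
    return clean, rejected


def to_upload_csv(rows):
    """Serialise cleaned rows to CSV text, ready to be uploaded as a file."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    raw = [
        {"customer_id": "C1", "country": "uk", "revenue": "100.5"},
        {"customer_id": "C1", "country": "UK", "revenue": "100.5"},  # duplicate
        {"customer_id": "C2", "country": "Deutschland", "revenue": "n/a"},  # bad amount
    ]
    clean, rejected = clean_rows(raw)
    print(to_upload_csv(clean))
    print(f"{len(rejected)} row(s) rejected")
```

In practice the rules would live in whatever in-house ETL tool you already use; the point is simply that the file arriving at the PaaS is already standardised and de-duplicated.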
Option 3 is a fast-growing market with many relatively new vendors (e.g. Cloud9 Analytics, Rosslyn Analytics, Lixto) as well as traditional mainstream vendors (e.g. SAS and IBM Cognos). The attraction here is a pre-built solution that is ready to go. These will clearly appeal to small and medium-sized businesses (SMBs) and even to lines of business in some large organisations. While today we see horizontal applications covering Salesforce.com data, spend analysis and pricing (to name a few), I predict that vertical, industry-specific analytic applications will appear on the cloud.
Option 4 is simply using a cloud-based reporting system on operational data, typically from a cloud-based transaction processing system such as Salesforce.com. In fact, it would seem that Salesforce.com dominates this space. An example here is SAP BusinessObjects CrystalReports.com for Salesforce.
Option 5 is private cloud-based BI systems. The largest private cloud-based BI system I know of is IBM’s internal Blue Insight, which is based on IBM System z and IBM Cognos 8 BI. An estimated 200,000 IBMers use it. IBM have since launched the Smart Analytics Cloud, a private cloud offering for large enterprises based on the same technologies. However, it is still early days for BI deployments on internal private clouds. At present, more support appears to be coming from developer forums than from vendors. From what I can see, companies are taking a ‘toe in the water’ approach to deploying on virtualised environments. No doubt confidence will grow over time. But does everything need to move to a private cloud? Many companies with very large EDW initiatives may be reluctant to move to private clouds until those clouds prove their scalability and lower TCO. The issue here is: should the ETL, DBMS and BI platform all be on the same virtual servers? Should each have its own virtual server configuration? What should that configuration be? Can I adjust it? And so on. I don’t think there will be a mad rush to put a 100TB DW on virtual servers.
I do like the fact that vendors like MicroStrategy have given this some serious consideration and have released a private cloud enterprise edition of MicroStrategy 9. In this edition, MicroStrategy components are packaged as virtual appliances tuned for the expected load. These virtual appliances contain fully configured software components, and the number of running appliances can be adjusted to meet specific performance goals. This is a damn sight better than just telling a customer, “It’s up to you: just deploy it and figure out the virtual server configuration yourself.” What MicroStrategy have done is allow you to adjust the underlying physical resources assigned to the environment to satisfy performance demands, and they have provided administrative facilities to control the virtualised MicroStrategy environment.
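To make the “how many appliances?” question a little more concrete, here is a minimal sketch of one way to size a pool of identical virtual appliances for a peak load. It assumes, hypothetically, that your own load testing has established how many concurrent users one appliance can serve; the function name, the 20% headroom and the two-appliance minimum are illustrative choices, not MicroStrategy settings or guidance.

```python
def appliances_needed(peak_users: int, users_per_appliance: int,
                      headroom_pct: int = 20, minimum: int = 2) -> int:
    """Return how many identical virtual appliances to run (hypothetical rule).

    Covers peak concurrent users plus a percentage of headroom, and never
    returns fewer than `minimum` appliances (e.g. for availability).
    Integer arithmetic avoids floating-point rounding surprises.
    """
    if users_per_appliance <= 0:
        raise ValueError("users_per_appliance must be positive")
    if peak_users < 0 or headroom_pct < 0:
        raise ValueError("peak_users and headroom_pct must be non-negative")
    demand = peak_users * (100 + headroom_pct)
    capacity = 100 * users_per_appliance
    return max(minimum, -(-demand // capacity))  # integer ceiling division


if __name__ == "__main__":
    # Hypothetical per-appliance capacity, e.g. taken from your own load tests.
    per_appliance = 120
    for peak in (50, 400, 500):
        print(peak, "users ->", appliances_needed(peak, per_appliance), "appliances")
```

With these hypothetical figures, a peak of 400 concurrent users needs 4 appliances; if the peak grows to 500, the answer becomes 5, which is the kind of adjustment described above. Real sizing would also have to consider memory, I/O and the split between ETL, DBMS and BI workloads.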
It is early days for cloud-based BI. I recommend looking at your requirements first and then matching the available options to your needs.
I would be interested to hear from any of you who have experience in this area: do’s and don’ts, what works and what doesn’t. Please share them by posting a comment.