At TDWI in the US recently, ParAccel became the latest vendor to announce its arrival in the Data Warehouse Appliance market. This market already has players such as DatAllegro, Dataupia, GreenPlum, HP, Netezza and Teradata, to name a few.
ParAccel’s new Massively Parallel Analytic Database runs on commodity hardware (the first available release is on Sun, with releases on Dell, HP and IBM hardware to follow) and comes in two flavours:
ParAccel MAVERICK – a standalone analytic platform using ANSI SQL
ParAccel AMIGO – a drop-in acceleration platform with query routing, synchronisation and syntax coverage for existing Microsoft SQL Server and Oracle instances
Interestingly, this product works on columns rather than rows during its parallel query processing, which raises some observations. For a start, in many of the DW reviews that I have conducted over the years I have seen the well-known practice of “mini dimensions”, in which popular columns in a large dimension (e.g. Customer) are isolated into their own separate mini-dimension table. This is a performance ‘trick’ that can often speed up joins between large dimensions and large fact tables: if a query uses only the popular columns to qualify metrics in the fact table, the join occurs between the mini dimension and the fact table instead of between the much larger ‘real’ dimension and the fact table. As a result, join processing is faster.
Mini dimensions have three drawbacks, however. They complicate the key structures of fact tables. All BI tool business views (e.g. SAS Information Maps, MS Report Models, Business Objects Universes) need to know about them in order to generate the right join SQL. And tracking history across changes to columns in the main dimension and the mini dimensions can be complex. The column approach to parallel query processing taken by ParAccel may well remove the need for mini dimensions as a performance tuning mechanism in many star schemas, thereby simplifying design. If you are in the process of selecting a DW Appliance product, it would certainly be worth investigating this point and running a benchmark test.
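To make the mini-dimension trade-off concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (dim_customer, dim_customer_mini, fact_sales, age_band and so on) are hypothetical and chosen purely for illustration; they do not come from any particular warehouse or from ParAccel. Note how the fact table has to carry a second key for the same customer, which is exactly the complication described above.

```python
import sqlite3

# Hypothetical star schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- The large 'real' dimension: one wide row per customer.
    CREATE TABLE dim_customer (
        customer_key INTEGER PRIMARY KEY,
        name         TEXT,
        address      TEXT
    );
    -- The mini dimension: one row per combination of the popular columns.
    CREATE TABLE dim_customer_mini (
        mini_key    INTEGER PRIMARY KEY,
        age_band    TEXT,
        income_band TEXT
    );
    -- The fact table now needs two keys for the same business entity.
    CREATE TABLE fact_sales (
        customer_key INTEGER REFERENCES dim_customer(customer_key),
        mini_key     INTEGER REFERENCES dim_customer_mini(mini_key),
        amount       REAL
    );
    INSERT INTO dim_customer VALUES
        (1, 'Ann', '1 High St'), (2, 'Bob', '2 Low Rd'), (3, 'Cy', '3 Mid Ave');
    INSERT INTO dim_customer_mini VALUES
        (10, '18-30', 'low'), (11, '31-50', 'high');
    INSERT INTO fact_sales VALUES
        (1, 10, 20.0), (2, 11, 50.0), (3, 11, 30.0), (1, 10, 5.0);
""")


def sales_by_age_band(conn):
    """Qualify the fact metric using only the small mini dimension."""
    return conn.execute("""
        SELECT m.age_band, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_customer_mini AS m ON m.mini_key = f.mini_key
        GROUP BY m.age_band
        ORDER BY m.age_band
    """).fetchall()


print(sales_by_age_band(conn))
```

The query never touches dim_customer, which is the source of the speed-up on large tables. The cost is that every BI tool's business view must know to route this kind of query through mini_key rather than customer_key.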
In addition, SQL advanced analytic aggregate functions all operate on columns. It would therefore be worth benchmarking these in ParAccel versus other DW Appliance products, as parallel processing at the column level may well boost performance here too. I should add that I have NOT personally benchmarked this; judging simply by the way the product is architected, it would certainly warrant a look for specific analytic applications.
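The intuition behind the column point can be shown with a toy in-memory model. This is only a sketch of the general row-versus-column idea, with made-up data and hypothetical function names; it says nothing about how ParAccel actually stores or scans data, and it is no substitute for a benchmark.

```python
# Toy data, for illustration only.
rows = [
    {"customer": "Ann", "region": "North", "amount": 20.0},
    {"customer": "Bob", "region": "South", "amount": 50.0},
    {"customer": "Cy", "region": "North", "amount": 30.0},
]

# Column layout: one list per column instead of one record per row.
columns = {name: [r[name] for r in rows] for name in rows[0]}


def sum_row_store(rows, column):
    """Visit every full record even though only one field is needed."""
    return sum(record[column] for record in rows)


def sum_column_store(columns, column):
    """Read just the one column that the aggregate needs."""
    return sum(columns[column])


print(sum_row_store(rows, "amount"), sum_column_store(columns, "amount"))
```

Both functions return the same total. The difference is in how much data each layout has to pass over to get there, which grows with the width of the table in the row layout but not in the column layout.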
The ParAccel Analytic Database software is available now. ParAccel offers two licensing options:
$1,000 per gigabyte for all-in-memory systems, beginning at 100GB
$40,000 per node plus $10,000 per terabyte for disk-based systems, beginning at 5 nodes
Subscription licensing is also available starting as low as $5,000 per month.
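For a rough feel of what the two per-gigabyte and per-node list prices quoted above mean in practice, here is a small sketch. The function names are hypothetical, and two assumptions are mine rather than ParAccel's: that the "beginning at" figures are hard minimums, and that the per-terabyte charge applies to whatever capacity figure the vendor meters. Subscription pricing is left out because only its starting price is given.

```python
def in_memory_license_usd(gigabytes):
    """$1,000 per gigabyte; assumes 100GB is a hard minimum."""
    if gigabytes < 100:
        raise ValueError("all-in-memory systems begin at 100GB")
    return 1_000 * gigabytes


def disk_license_usd(nodes, terabytes):
    """$40,000 per node plus $10,000 per terabyte; assumes a 5-node minimum."""
    if nodes < 5:
        raise ValueError("disk-based systems begin at 5 nodes")
    return 40_000 * nodes + 10_000 * terabytes


# Smallest all-in-memory system, and a 5-node disk-based system with 10TB.
print(in_memory_license_usd(100))
print(disk_license_usd(5, 10))
```

On these assumptions the entry point is $100,000 for an all-in-memory system and $200,000 plus storage for a disk-based one, so the 5-node, 10TB example comes to $300,000.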