First of all let me apologise to all my readers for not having blogged for a while. This year has turned out to be manic – crazily busy. I also confess to having become addicted to Twitter – a “tweetaholic” – where I have been micro-blogging. If you want to see my tweets you can do so here. So I return to my blog the day after “Arthur Day” – 250 years ago yesterday a certain young Irishman named Arthur Guinness started a beer making company in Dublin.
My topic today is that exciting topic of Enterprise Data Governance. From research I did in a survey it was clear that many companies at the end of 2008 were not fully underway with Enterprise Data Governance in terms of getting their data under control and into a trusted, well managed state. Many still had more to do to get organised and to put the necessary technology and processes in place. But the question I get asked the most is: how do you know how well or poorly your company is governing its data? There are a few questions you can ask that will give you a good inkling.
These are as follows:
- Do you know what data exists in your enterprise?
- Do you have an inventory of data items in use?
- How many names have you got for the same data item?
- How many metrics have the same name but different formulae?
- Do your Excel metrics formulae, DBMS metrics formulae, BI tool metrics formulae, ETL tool calculations, … all agree?
If the answer is no, or you simply don’t know, for any of these questions, what chance do you stand of remaining compliant or of trusting your data? If you don’t know how many different variations of a data item exist in your enterprise, how can you govern your data? Some other questions to ask here from a business perspective are:
- How many times do your core processes break because of dirty data?
- Has your company ever messed up an order and angered a customer because of dirty data?
- In terms of compliance, do you trust your data enough to tell it to a judge?
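The inventory questions in the first checklist (how many names for the same data item, how many metrics with the same name but different formulae) can be made concrete with a small sketch. This is only an illustration under my own assumptions: the inventory rows, system names and formulae below are made up, and a real inventory would come from your metadata or data discovery tooling rather than a hand-typed list.

```python
from collections import defaultdict

# Hypothetical inventory rows (assumed for illustration):
# (system, item name, business meaning, formula or None)
inventory = [
    ("Excel", "gross_margin", "Gross margin", "(revenue - cogs) / revenue"),
    ("BI tool", "gross_margin", "Gross margin", "(revenue - cogs) / cogs"),
    ("ETL", "GM_PCT", "Gross margin", "(revenue - cogs) / revenue"),
    ("DBMS", "cust_nm", "Customer name", None),
    ("CRM", "customer_name", "Customer name", None),
]


def names_per_meaning(rows):
    """Collect the distinct item names used for each business meaning."""
    names = defaultdict(set)
    for _system, name, meaning, _formula in rows:
        names[meaning].add(name)
    return {meaning: sorted(found) for meaning, found in names.items()}


def conflicting_formulae(rows):
    """Return metric names that are defined with more than one formula."""
    formulae = defaultdict(set)
    for _system, name, _meaning, formula in rows:
        if formula is not None:
            # Strip whitespace and case so trivial differences are not counted
            formulae[name].add("".join(formula.split()).lower())
    return {name: sorted(found) for name, found in formulae.items() if len(found) > 1}


if __name__ == "__main__":
    print(names_per_meaning(inventory))
    print(conflicting_formulae(inventory))
```

In this made-up inventory, "Gross margin" turns up under two names and `gross_margin` is calculated two different ways, which is exactly the kind of disagreement the checklist is meant to expose.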
In my opinion what companies actually need is an interactive data map so that you can press a button and see where your customer data is or where your order data is. In order to be able to do this you need to have common data definitions for your customer data attributes, for your order data attributes and so on. In fact you need to have a common set of enterprise wide definitions for all core entity data, transaction data and metrics.
Having established this, the next step is to discover where your data actually is. Data discovery technology (albeit a new area in data management) is therefore critical to help do this. In fact I would go as far as to say that without data discovery technology it is very difficult to get data under control. Increasingly, therefore, we are seeing vendors acquire or build this kind of software. Once your data is located you need to map the disparate data definitions for the same data to common enterprise wide definitions to be able to see where data is. Physical column names, data models, BI tool semantic layers, reports, SPREADSHEETS, files, XML schema, Access databases… If you can’t tie all these to the same corporate definitions, how do you govern data? This is where data dictionaries and business glossaries are key. Lineage matters.
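To show what tying physical names back to common definitions might look like, here is a minimal sketch of a data map built from a glossary. Everything in it is an assumption for illustration: the source names, the physical column names and the glossary term `customer.name` are hypothetical, and a real data discovery tool would find and match these far more cleverly than an exact, case-insensitive lookup.

```python
from collections import defaultdict

# Hypothetical output of a data discovery scan: (source, physical name)
discovered = [
    ("crm_db.customers", "cust_nm"),
    ("orders.xlsx", "Customer Name"),
    ("bi_semantic_layer", "CustomerName"),
    ("billing.xml", "acct_ref"),
]

# Hypothetical business glossary: common definition -> known physical synonyms
glossary = {
    "customer.name": {"cust_nm", "customer name", "customername"},
}


def build_data_map(discovered, glossary):
    """Map each discovered physical name to its common enterprise definition.

    Returns (data_map, unmapped): data_map lists where each glossary term
    lives; unmapped lists the items that still need a definition.
    """
    # Invert the glossary so each synonym points at its common definition
    lookup = {syn: term for term, syns in glossary.items() for syn in syns}
    data_map = defaultdict(list)
    unmapped = []
    for source, physical in discovered:
        term = lookup.get(physical.strip().lower())
        if term is None:
            unmapped.append((source, physical))
        else:
            data_map[term].append((source, physical))
    return dict(data_map), unmapped


if __name__ == "__main__":
    data_map, unmapped = build_data_map(discovered, glossary)
    print(data_map)   # where the customer name lives
    print(unmapped)   # what still has no corporate definition
```

The unmapped list is arguably the more useful output, because it is the to-do list for the glossary: every entry is data you have found but cannot yet govern.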
Ultimately the objective is to get consistency across the enterprise. You have to unravel your spaghetti ball. This means you need to get organised correctly, get the right technologies in place and get the right processes in place for enterprise data governance.
Master data is also part of the program. You need to find out where your master data is maintained. How is it synchronised? What screens on what applications are used to update it? Do you know? MDM is not as simple as it looks. It is often a multi-year investment. So ask yourself: do you want to start with read-only or read-write? If you buy an MDM system and you start updating master data centrally, what happens if you are also still updating it in other applications? Companies need a well thought out strategy for MDM as part of the Data Governance program. I will be addressing this in further blogs in the near future.
For now though, let me salute my fellow Irish countryman, Arthur Guinness. If you are stressed out on Data Governance at the end of a hard week there is nothing like a decent pint to help you unwind. Cheers!