Data Vault is emerging as a highly credible and successful technique for representing data in modern analytics solutions. Properly implemented, it solves many of the problems inherent in traditional Star Schema techniques, and can provide far more flexibility along with ongoing productivity gains.
Like any nascent technique, however, Data Vault can be adopted for the wrong reasons by inexperienced practitioners, and that can end in a disastrous mess. There are several examples of high-profile Data Vault projects close to home that have foundered. (If you’ve worked in a large data environment in Wellington in the last couple of years, chances are you’re familiar with the concept!)
So, let’s look at Data Vault in more detail.
The rise of the Data Vault
We’re seeing a significant rise in the popularity of Data Vault methodology, and rightly so. I’m a Data Vault advocate, because a well-designed and implemented Data Vault solves a lot of issues that regular Kimball Star Schemas struggle with.
It does this by separating out the concerns of the Data Warehouse: the vault stores raw, auditable history keyed on business keys, while business rules, measures and KPIs are applied downstream. That separation reduces the need for endless debates about things like slowly-changing dimensions, implementation of business rules and definition of measures.
A Data Vault should exist between your source systems and Kimball star schemas (or whatever presentation layer you choose).
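To make that layering concrete, a Data Vault decomposes each source entity into three table shapes: hubs (one row per unique business key), links (relationships between hubs) and satellites (descriptive attributes, with history tracked by load date). The sketch below models those shapes as Python dataclasses purely for illustration; the entity and column names are my assumptions, not a prescription.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class HubCustomer:
    """One row per unique business key, ever."""
    customer_hash_key: str       # surrogate key derived from the business key
    customer_business_key: str   # the key the business actually uses
    load_date: datetime
    record_source: str           # which source system supplied the row


@dataclass
class LinkCustomerOrder:
    """One row per relationship between hub entries."""
    link_hash_key: str
    customer_hash_key: str
    order_hash_key: str
    load_date: datetime
    record_source: str


@dataclass
class SatCustomerDetails:
    """Descriptive attributes; every change inserts a new row."""
    customer_hash_key: str
    load_date: datetime          # part of the key: this is the history axis
    record_source: str
    name: str
    email: str
```

Note how slowly-changing-dimension debates dissolve here: satellites simply insert a new row on every change, and the presentation layer decides later how to surface that history.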
Dan Linstedt has written many posts criticising Star Schemas. This irritated some Star Schema advocates, who felt their livelihood was under threat and pushed back. The argument should not be “Data Vault versus Star Schemas,” however. It should be “How can Data Vault make Star Schemas better?”
A Data Vault on its own is not fit for business consumption; a presentation layer is still required. An analytics solution must serve its user community. Ralph Kimball refers to this need using the “publishing metaphor”:
The publishing metaphor underscores the need to focus outward on your customers rather than merely focusing inward on products and processes. Although you use technology to deliver the DW/BI system, the technology is at best a means to an end. As such, the technology and techniques used to build the system should not appear directly in your top job responsibilities.
Why do people choose Data Vault?
Every BI professional who lies awake at night and thinks “surely there must be a better way” has probably stumbled upon Data Vault. It promises great things like speedy delivery, auditability, always-on history tracking, simplicity, automation, parallel loads, scalability and extensibility.
These are great attributes but, unfortunately, some organisations choose Data Vault for dubious reasons — for example, because it gives the BI team something to do without having to talk to the business.
This might be done under the guise of “future-proofing” or “agility”. But avoiding a conversation with the business generally results in the following outcomes:
- Delayed or extended delivery times. (Imagine a DV2 initiative running for more than 18 months without reaching production.)
- Inability to communicate scope to stakeholders.
- Dozens and dozens of tables that get built but are never used.
- Incorrect data in the Data Vault due to incorrect assumptions. A common example is assuming source systems never delete data, so deletions are never captured.
Dan Linstedt emphasises the need to work closely with the business to define business keys. He also requires that Data Vault implementations start with requirements and scope. Involving the business is a key component of the Data Vault methodology.
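One reason those business-key conversations matter is that DV2-style implementations typically derive a hub’s surrogate hash key deterministically from the business key, so the same entity lands on the same hub row regardless of source. A minimal sketch — the trim/upper-case normalisation rules and the choice of MD5 here are illustrative assumptions that each team must agree with the business, not a standard:

```python
import hashlib


def hub_hash_key(business_key: str) -> str:
    """Derive a deterministic hub hash key from a business key.

    Trimming and upper-casing before hashing is one common
    normalisation convention; the rules must be agreed with the
    business so that 'abc-123' and '  ABC-123  ' resolve to the
    same customer across every source system.
    """
    normalised = business_key.strip().upper()
    return hashlib.md5(normalised.encode("utf-8")).hexdigest()


# Two raw values representing the same business entity hash identically.
assert hub_hash_key("abc-123") == hub_hash_key("  ABC-123  ")
```

Get the normalisation rules wrong — because nobody asked the business what the key really is — and the vault quietly splits one customer into two, which is exactly the “incorrect assumptions” failure mode above.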
Valid reasons to choose Data Vault
We all love being busy, and building a Data Vault that has the potential to deliver future BI solutions seems like a noble cause. But a successful Data Vault implementation is a matter of organisational maturity and timing.
If you don’t have suitable governance, engagement and delivery structures in place then choosing Data Vault may initially feel like a step in the right direction – but it will most likely end with dissatisfied users.
Senior executives need to know what they stand to gain from success – the WIFM (“what’s in it for me”) factor. They need to know who is responsible for what, and how the programme is tracking, without delving into the boring minutiae of analytics project management.
People sometimes make the mistake of thinking governance constructs exist to create a nice paper trail for when things go wrong. In fact, governance ought to minimise the risk of things going wrong in the first place. In my experience, failure to address governance, engagement or delivery is a typical reason projects fail.
Get these fundamentals right, however, and you will start seeing all those wonderful benefits that Data Vault promises, and stakeholders will be delighted with the programme.
Then, when you reflect and bask in your agility and success, be sure to credit the governance, engagement and delivery structures right alongside the Data Vault.