Data lineage is becoming more important for financial services organisations. It is increasingly hard-wired into regulations and into data quality frameworks such as the European Central Bank’s (ECB) Targeted Review of Internal Models (TRIM), and ultimately all of this relates to the need for ‘explainability’. If a bank values a position at £25 million, for example, it needs to explain why the position is valued at that amount, how it arrived at that figure and which data points it used. All of this context, and more, needs to be tracked.
However, banks and other financial services organisations can also apply data lineage, and the related concept of ‘explainability’, in other contexts. When changes are made to existing processes, for example, data lineage can support diagnostics that improve data quality. It can also be key for data licensing: an organisation that licenses content, and must therefore abide by the associated restrictions, needs to know which data is derived from another piece of data.
Finally, data lineage can be key to managing client records better and to taking greater care over where client data is moved and how it is used in future. After all, if a client exercises the right to erasure under GDPR and demands that their data be deleted, the bank or other financial services organisation needs to know everywhere that client’s details have ended up before it can comply.
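Answering an erasure request amounts to a downstream walk over recorded lineage: start from the client record and follow every recorded copy or feed until nothing new is reached. The sketch below assumes lineage has already been captured as a simple adjacency map; the location names and the `LINEAGE_EDGES` structure are hypothetical and chosen only for illustration.

```python
from collections import deque

# Hypothetical lineage edges: each key is a data location, each value lists the
# locations its data is copied or fed into. All names are illustrative only.
LINEAGE_EDGES = {
    "crm.customer:42": ["warehouse.customers", "kyc.profiles"],
    "warehouse.customers": ["reports.monthly_aum", "marketing.mailing_list"],
    "kyc.profiles": ["archive.kyc_2023"],
    "reports.monthly_aum": [],
    "marketing.mailing_list": [],
    "archive.kyc_2023": [],
}


def downstream_locations(start: str, edges: dict[str, list[str]]) -> list[str]:
    """Return every location reachable from `start`, in breadth-first order."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in edges.get(node, []):
            if child not in seen:  # guard against cycles and shared paths
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


if __name__ == "__main__":
    # Every place the client's details may need to be erased from.
    for location in downstream_locations("crm.customer:42", LINEAGE_EDGES):
        print(location)
```

The result is only as complete as the lineage behind it: a copy that was never recorded as an edge will not be found, which is why the tracking has to happen as data moves rather than after an erasure request arrives.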
To do all this efficiently and well, banks and other financial services firms will effectively need to implement two different kinds of data lineage: horizontal and vertical.
Horizontal data lineage traces the journey of a piece of data as it moves through systems from source to destination. It could be a piece of content originally taken from a prospectus, a company’s articles of incorporation or an exchange. That content subsequently finds its way into a data product, is licensed by a financial institution and lands in an end database, where it is examined and possibly augmented before it flows through to a report or business application. In this way, horizontal data lineage tracks the journey of a specific item of data, typically across systems and reports.
Vertical data lineage, in contrast, describes the transformations that happen to a piece of data on that journey. The data could be an element feeding into a calculation, such as one of the inputs to a bond curve. Lineage in this case means being able to ‘go back’ from the bond curve and see which individual bonds formed part of the input at a specific point in time.
In short, horizontal data lineage traces data back to its original source, while vertical data lineage reverse-engineers the transformations that happen along the way. Some of those transformations are simple, such as cross-referencing between the different taxonomies that exist for financial instruments or industry classifications. Vertical data lineage might, for example, track the cross-referencing between industry classification schemes from the European Union and the US Federal Government, as well as proprietary industry classifications from data providers and index providers such as Morgan Stanley Capital International (MSCI) and the Financial Times Stock Exchange (FTSE).
Often, to compare like with like, the organisation will want to express an issuer or counterparty within a single taxonomy. If one taxonomy labels a segment ‘IT’ and another calls the same segment ‘computer systems’, for example, it might want to ensure that the same label is used for both. That is an example of a simple transformation, or cross-reference, but there are far more complicated transformations that may involve multiple pieces of data feeding into a single calculation.
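A cross-reference like the ‘IT’ / ‘computer systems’ case can be kept as a lookup table, with each mapped value carrying a small record of what it was derived from. This is a minimal sketch: the taxonomy names, the house labels and the `to_house_label` helper are all hypothetical, not real classification codes or a vendor API.

```python
# Hypothetical cross-reference table: (source taxonomy, source label) -> house label.
# Taxonomy names and labels are illustrative, not real classification codes.
CROSS_REFERENCE = {
    ("taxonomy_a", "IT"): "Information Technology",
    ("taxonomy_b", "computer systems"): "Information Technology",
    ("taxonomy_b", "banks"): "Financials",
}


def to_house_label(taxonomy: str, label: str) -> dict:
    """Map a label into the house taxonomy and keep a record of how it was derived."""
    key = (taxonomy, label)
    if key not in CROSS_REFERENCE:
        raise KeyError(f"no cross-reference for {label!r} in {taxonomy!r}")
    return {
        "value": CROSS_REFERENCE[key],
        # The lineage record: which input and which rule produced the value.
        "derived_from": {"taxonomy": taxonomy, "label": label},
        "rule": "cross_reference_lookup",
    }


if __name__ == "__main__":
    a = to_house_label("taxonomy_a", "IT")
    b = to_house_label("taxonomy_b", "computer systems")
    print(a["value"] == b["value"])  # both segments now compare like for like
```

Raising on an unmapped label, rather than passing it through unchanged, keeps a gap in the cross-reference table visible instead of silently mixing taxonomies.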
Meeting the Challenge
The specific challenge in reverse-engineering a transformation is that the organisation must keep track of three things: its input data sources and their values at the time the transformation took place; all the parameters that fed into the calculation and their values at the time it was done; and the algorithm that was used. These are some of the capabilities and pre-conditions that must be in place before vertical data lineage can be carried out at all.
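The three pre-conditions map naturally onto a record written every time a calculation runs: a snapshot of the input values, a snapshot of the parameters, and the name and version of the algorithm. The sketch below assumes a hypothetical registry of versioned algorithms; the ‘curve point’ is a simple average yield plus a spread, standing in for a real bond curve build, and all identifiers are illustrative.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from statistics import mean

# Hypothetical registry of versioned algorithms. A real curve build is far more
# involved; the simple average below only stands in for "some calculation".
ALGORITHMS = {
    ("avg_yield", "1.0"): lambda yields, params: round(
        mean(yields) + params["spread_bp"] / 10_000, 6
    ),
}


@dataclass(frozen=True)
class CalculationRecord:
    """Everything needed to reverse-engineer one calculation later."""
    inputs: dict       # input identifier -> value at calculation time
    parameters: dict   # calculation parameters at calculation time
    algorithm: tuple   # (name, version) of the algorithm used
    result: float
    run_at: datetime


def run_and_record(inputs: dict, parameters: dict, algorithm: tuple) -> CalculationRecord:
    """Run the named algorithm and capture inputs, parameters and version."""
    fn = ALGORITHMS[algorithm]
    result = fn(list(inputs.values()), parameters)
    # Copy the dicts so later changes to the live data do not alter the record.
    return CalculationRecord(dict(inputs), dict(parameters), algorithm, result,
                             datetime.now(timezone.utc))


def replay(record: CalculationRecord) -> bool:
    """Re-run the recorded algorithm on the recorded inputs and compare results."""
    fn = ALGORITHMS[record.algorithm]
    return fn(list(record.inputs.values()), record.parameters) == record.result


if __name__ == "__main__":
    bond_yields = {"BOND_A": 0.041, "BOND_B": 0.043}
    record = run_and_record(bond_yields, {"spread_bp": 5}, ("avg_yield", "1.0"))
    bond_yields["BOND_A"] = 0.050   # the live data changes later...
    print(record.inputs["BOND_A"], replay(record))  # ...but the record does not
```

Keying algorithms by version matters as much as snapshotting inputs: if the calculation logic is later changed in place, old results can no longer be reproduced or explained.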
Horizontal data lineage focuses on the ostensibly more straightforward task of keeping track of the data the business has consumed, where it subsequently went and who touched it on its journey. Its end objective is to trace the journey of the data back upstream, whereas vertical data lineage involves the ability to reverse-engineer the manipulations that have been applied to the data in the past.
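Tracing data upstream, including who touched it, can be sketched as walking recorded ‘hops’ backwards from a report to the original source. This assumes each hop is logged with its source, target and the system or person responsible, and, for simplicity, that every node has a single parent; real lineage is usually a graph with multiple parents. The `Hop` type and all location and actor names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hop:
    """One step of a datum's journey: where it came from and who handled it."""
    source: str
    target: str
    actor: str  # system or person that moved or augmented the data


# Hypothetical journey of a single data item, from vendor feed to report.
HOPS = [
    Hop("vendor_feed.prices", "staging.prices", "ingest_job"),
    Hop("staging.prices", "golden_copy.prices", "data_steward_jsmith"),
    Hop("golden_copy.prices", "risk_report.var", "risk_engine"),
]


def trace_upstream(node: str, hops: list[Hop]) -> list[Hop]:
    """Walk back from `node` towards its original source, one hop at a time."""
    by_target = {hop.target: hop for hop in hops}  # assumes one parent per node
    path = []
    seen = set()
    while node in by_target and node not in seen:  # stop at the source or a cycle
        seen.add(node)
        hop = by_target[node]
        path.append(hop)
        node = hop.source
    return path


if __name__ == "__main__":
    for hop in trace_upstream("risk_report.var", HOPS):
        print(f"{hop.target} <- {hop.source} (by {hop.actor})")
```

The last hop in the returned path names the original source; the `actor` fields give the ‘who touched it’ part of the audit trail.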
Financial services organisations need both types of data lineage to show the provenance of financial, regulatory and customer reporting. The two are equally important and effectively two sides of the data lineage coin: organisations can’t have one without the other.
Most businesses across the financial services sector do not fully understand the multi-faceted nature of data lineage outlined above, and within larger firms there is little consensus about what the term means. Often they know that the audit trail is broken within a specific application, but they typically lack an overarching view, the ability to follow data on its journey across the organisation, or an efficient way to document that journey. The ECB’s TRIM programme, referred to above, takes a much closer look at the journey of data from source to internal model and has brought the issue to the forefront.
To address the data lineage challenge, firms need bi-temporality: recording both when a value applied and when the firm recorded it, so that they know what value the data had, as far as they knew at the time, when a calculation ran. They need to track metadata and keep cross-reference tables between different taxonomies and classification schemes up to date. They also need a clear administrative process that documents who can access data, where it goes, where it came from and what its sources were.
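A bi-temporal query takes two dates: the date the value applies to, and the date by which the firm must have known about it. The sketch below is simplified (it models only a `valid_from` date, not a closing `valid_to`), and the price history, including the correction recorded on 3 January, is hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Version:
    value: float
    valid_from: date   # when the value applies in the real world
    recorded_at: date  # when the firm recorded it


# Hypothetical history for one price: a correction recorded on 3 Jan
# changed the value that applies from 1 Jan.
PRICE_HISTORY = [
    Version(100.0, valid_from=date(2024, 1, 1), recorded_at=date(2024, 1, 1)),
    Version(101.5, valid_from=date(2024, 1, 1), recorded_at=date(2024, 1, 3)),
]


def as_of(history: list[Version], valid_on: date, known_on: date) -> Optional[float]:
    """Value applicable on `valid_on`, using only what was recorded by `known_on`."""
    candidates = [v for v in history
                  if v.valid_from <= valid_on and v.recorded_at <= known_on]
    if not candidates:
        return None
    # Latest valid_from wins; among equal valid_from, the latest recording wins.
    best = max(candidates, key=lambda v: (v.valid_from, v.recorded_at))
    return best.value


if __name__ == "__main__":
    # What a calculation run on 2 Jan actually saw, versus today's corrected view.
    print(as_of(PRICE_HISTORY, date(2024, 1, 2), known_on=date(2024, 1, 2)))
    print(as_of(PRICE_HISTORY, date(2024, 1, 2), known_on=date(2024, 1, 5)))
```

The first query explains a result produced on 2 January using the value available then; the second shows the corrected value. Without the second time axis, the correction would overwrite the evidence of what the original calculation used.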
Moreover, they need a sourcing hierarchy, so that the process for selecting data sources is clearly documented and accessible to everybody who needs it. There has to be a clear understanding of what the data sources are, where the business goes for specific kinds of data, how the data elements are typically defined, and what the calculation parameters and other metadata are.
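A sourcing hierarchy can be expressed as an ordered list of preferred sources per data element, resolved by taking the first source that actually supplies a value. This is a minimal sketch; the element names, the source names (`prospectus`, `vendor_a` and so on) and the `resolve` helper are hypothetical.

```python
# Hypothetical sourcing hierarchy: for each data element, sources in order of
# preference. Source and element names are illustrative only.
SOURCING_HIERARCHY = {
    "bond.coupon": ["prospectus", "vendor_a", "vendor_b"],
    "equity.close_price": ["exchange_feed", "vendor_a"],
}


def resolve(element: str, available: dict[str, dict[str, object]]) -> tuple[str, object]:
    """Return (source, value) from the highest-priority source that has the element."""
    for source in SOURCING_HIERARCHY.get(element, []):
        value = available.get(source, {}).get(element)
        if value is not None:
            return source, value
    raise LookupError(f"no configured source supplies {element!r}")


if __name__ == "__main__":
    feeds = {
        "vendor_a": {"bond.coupon": 0.05},
        "vendor_b": {"bond.coupon": 0.051},
    }
    # No prospectus value is loaded, so the next source in the hierarchy is used.
    print(resolve("bond.coupon", feeds))
```

Returning the chosen source alongside the value means the sourcing decision is itself recorded, which feeds directly into horizontal lineage.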
It’s a complex undertaking, but the organisations that understand the requirement and put the right combination of processes and technology in place to support it will be best placed both to meet the multi-faceted data lineage requirements of today’s financial services sector and to gain a real edge over the competition.