By Reuben Vandeventer, Chief Data Officer in Residence at Infogix
As the role of data in organizations, both large and small, has rapidly evolved, the era of big data has ushered in data sprawl. This means that organizations must look for ways to effectively manage, govern and derive value from data that is increasingly complex and cumbersome. Analytics is a critical tool for turning raw data into actionable insights, but organizations must first understand the form and function of their data, as well as the flow of their data supply chain. A data catalog is the foundational data governance tool that allows organizations to effectively leverage the right data for the right job at the right time to gain competitive advantage.
A data catalog synthesizes all the details of an enterprise’s data supply chain, often contained in multiple data dictionaries, and translates them into layman’s terms that business users can easily understand. A data catalog provides transparency into data definitions, synonyms and critical business attributes to ensure that the appropriate data assets are being used. Because the catalog identifies data owners, stewards and subject matter experts, it assures a common understanding and streamlines communication across diverging lines of business. Business users have a reliable reference about who is accountable, should questions or issues arise.
Business lineage is another critical feature of a data catalog as it provides a clear and concise understanding of the flow and dependencies of data. This includes data origination within the data supply chain, transformations, uses, and processes. Business lineage may link to data quality scores, access parameters and usage restrictions. This clear view of critical data assets across the enterprise gives business users business intelligence at their fingertips. This, in turn, enables collaboration and allows business users to leverage the right data for maximum impact.
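To make these elements concrete, here is a minimal Python sketch of how a single catalog entry and its business lineage might be modeled. Every name in it (CatalogEntry, trace_lineage, find_by_term and the sample assets, owners and scores) is hypothetical and chosen only for illustration; it does not reflect the data model of any particular product.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class CatalogEntry:
    """One business-facing record in a hypothetical data catalog."""
    name: str
    definition: str                      # plain-language meaning of the asset
    owner: str                           # accountable business owner
    steward: str                         # day-to-day point of contact
    synonyms: List[str] = field(default_factory=list)
    upstream: List[str] = field(default_factory=list)  # assets this one is built from
    quality_score: Optional[float] = None               # 0.0 to 1.0, if measured


def trace_lineage(catalog: Dict[str, CatalogEntry], name: str) -> List[str]:
    """Return every upstream asset of `name`, nearest first, without repeats."""
    seen: List[str] = []
    queue = list(catalog[name].upstream)
    while queue:
        current = queue.pop(0)
        if current in seen:
            continue
        seen.append(current)
        if current in catalog:           # assets outside the catalog end the trail
            queue.extend(catalog[current].upstream)
    return seen


def find_by_term(catalog: Dict[str, CatalogEntry], term: str) -> List[str]:
    """Look up entries by name or synonym, ignoring case."""
    wanted = term.strip().lower()
    return [
        entry.name
        for entry in catalog.values()
        if wanted == entry.name.lower()
        or wanted in (s.lower() for s in entry.synonyms)
    ]


# Illustrative sample data only.
catalog = {
    "crm_accounts": CatalogEntry(
        "crm_accounts", "Raw account records from the CRM system",
        owner="Sales Operations", steward="A. Steward"),
    "billing_ledger": CatalogEntry(
        "billing_ledger", "Invoices and payments",
        owner="Finance", steward="B. Steward", quality_score=0.97),
    "customer_master": CatalogEntry(
        "customer_master", "One agreed record per customer",
        owner="Customer Data Office", steward="C. Steward",
        synonyms=["client list"], upstream=["crm_accounts"]),
    "monthly_revenue_report": CatalogEntry(
        "monthly_revenue_report", "Revenue by customer, per month",
        owner="Finance", steward="B. Steward",
        upstream=["customer_master", "billing_ledger"]),
}

print(trace_lineage(catalog, "monthly_revenue_report"))
print(find_by_term(catalog, "Client List"))
```

In this sketch, a business user who searches for "client list" lands on the customer_master entry, sees who owns it, and can trace a report back to the assets it depends on.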
The Do’s and Don’ts of Creating and Maintaining a Data Catalog
The continued survival of any business depends on the data it collects, shares and deploys. Although data is vital, too much data can create a myriad of problems, including confusion around how users might find what they need, discover what’s new and what’s unusable, and determine what contains legacy knowledge.
Managing and preparing massive amounts of data for a data catalog is typically the job of the IT department. However, if business users are not involved, this approach is likely to confuse and confound anyone who is not versed in IT’s technical jargon. Another risk with large, disjointed organizations is that they will have so many data catalogs that typical users won’t know where to start.
Organizations can’t afford to leave business users reliant upon IT for the data they need if they want to garner a competitive advantage and make quick decisions in a fast-paced business environment.
On the one hand, IT doesn’t understand data from a business perspective. On the other hand, business users lack the expertise to understand IT’s technical lingo and translate it into a business advantage.
Instead, businesses must formally align disparate lines of business, technologies and processes through data governance to create one all-inclusive data catalog. With a team approach, businesses can organize their data catalog around the technical details of their data assets and metadata to create a well-defined, meaningful, searchable catalog of business assets. This approach enables consistent understanding among all business users.
Creating a Data Catalog Through Data Governance
Building a data catalog is difficult because data is, in most organizations, varied and complex. Various employees in various roles across various departments often have contrasting views about data. Establishing agreement on data definitions based on individual perspectives across departments is a monumental task.
A successful data catalog requires a thoughtful data governance program to streamline communication and collaboration. To simplify and automate the creation and maintenance of a comprehensive data catalog, a data governance tool offering a comprehensive suite of capabilities is the ideal solution. With unified capabilities for data governance, data quality and analytics, the tool can merge data lineage, data definitions and data quality with a variety of business terms to create an all-encompassing data catalog.
The tool must enable business users to access and understand data quality scores, and it must have a mechanism to measure the impact of data quality efforts across the data supply chain. It should provide business users with clarity in terms of data ownership and accountability, and it should facilitate collaboration and communication between the data owners, stewards and consumers.
The tool should also incorporate automatic discovery capabilities that allow the business to capture, track and monitor changes to data. When changes are uncovered, the responsible parties can promptly investigate them and, ultimately, speed up time to insight. With the appropriate set of capabilities, businesses can successfully create a comprehensive data catalog that will enable a complete understanding of their data landscape and foster a culture of data knowledge among their business users.
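As a rough illustration of what automatic discovery of changes can involve, the following Python sketch compares two snapshots of a data asset’s structure and drafts a note for the responsible steward. It assumes, purely for illustration, that each snapshot is a simple mapping of column name to data type; the function names, column names and steward name are all hypothetical, and a real governance tool would track far more than this.

```python
from typing import Dict, List


def diff_schemas(previous: Dict[str, str], current: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare two {column: type} snapshots and report what changed."""
    shared = set(previous) & set(current)
    return {
        "added": sorted(set(current) - set(previous)),
        "removed": sorted(set(previous) - set(current)),
        "retyped": sorted(c for c in shared if previous[c] != current[c]),
    }


def draft_alerts(asset: str, steward: str, changes: Dict[str, List[str]]) -> List[str]:
    """Turn detected changes into plain-language notes for the steward."""
    return [
        f"{steward}: column '{column}' was {kind} in {asset}"
        for kind, columns in changes.items()
        for column in columns
    ]


# Illustrative snapshots only.
yesterday = {"customer_id": "int", "email": "str", "region": "str"}
today = {"customer_id": "str", "email": "str", "signup_date": "date"}

changes = diff_schemas(yesterday, today)
alerts = draft_alerts("customer_master", "C. Steward", changes)
for line in alerts:
    print(line)
```

The point of the sketch is the workflow rather than the code itself: changes are detected automatically, then routed to a named, accountable person who can investigate them.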
About the Author
Reuben Vandeventer has been driving progress and helping to shape the data space for more than a decade, across many industries including medical device, pharmaceutical, insurance and asset management. As the data space has rapidly evolved, his background in science, statistics and finance has created a robust foundation to create real economic value for organizations in new and innovative ways. Throughout his career, he has served in key strategic leadership roles for some of the top financial services organizations in the world. In these roles, he has developed a repeatable way to analyze, study and design data communities (people that are accountable and care about data), revolutionizing the respective organization’s approach to pragmatic data strategy.