Businesses run on data. Metadata is the foundation for making that asset usable.
The simplest way to define metadata is that it’s data about data; it helps organizations find, maintain and compare data. As Gartner explains, metadata describes the various facets of an information asset that can improve its usability throughout its lifecycle.
“Data about your data lets you understand its importance, accuracy and relevance to the business,” says Jay Limburn, director of product offering management and distinguished engineer with IBM Data and AI.
More Big Data and a growing number of data consumers mean there’s increased urgency to adopt or improve metadata management for organizing information assets. According to Gartner, the increasing need for data governance, risk and compliance, data analysis, and data value is driving the growth of metadata management solutions.
“Metadata management is getting more attention because it is now part of the business conversation,” says Reetika Fleming, research vice president at HFS Research. The global enterprise metadata management market is projected to reach $9 billion by 2023.
What Are the Types of Metadata?
Metadata puts into context the data, along with the associated processes, services, rules and policies, that support an organization’s information systems. In data management environments, the main types of metadata are:
- Descriptive metadata, which describes a resource for purposes such as discovery and identification, including elements such as title, abstract, author, and keywords. It’s business metadata, where descriptions make sense to the business end users who want to locate and build their own data collections and data visualizations.
- Structural metadata, which is data about the structure of content. It indicates how compound objects are put together — for example, how pages are ordered to form chapters.
- Administrative metadata, which is necessary to manage and use information resources, such as when and how a resource was created and its file type. It may include information about rights and reproduction, as well as about archiving and preserving a resource. A subtype of administrative metadata is technical metadata, which is the information necessary for decoding and rendering files.
By also defining who gets access to the data, administrative metadata “directly informs organizational policy and vice-versa, and helps with governance,” says Jeffrey Pomerantz, associate professor of practice at the School of Library and Information Science at Simmons University and the author of “Metadata.”
- Provenance metadata, which indicates the relationship between two versions of data objects and is generated whenever a new version of a dataset is created. This metadata is critical for trust, Pomerantz says, providing data history, including who and what organizations touched a piece of data over its lifecycle, to show how the data set has changed over time.
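To make the four types concrete, here is a minimal sketch of how one asset's metadata might be grouped in a single record. The `MetadataRecord` class, its field names and the sample values are all hypothetical, chosen for illustration rather than taken from any particular catalog product.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MetadataRecord:
    """Hypothetical container grouping the four metadata types for one asset."""
    # Descriptive: supports discovery and identification.
    title: str
    author: str
    keywords: List[str] = field(default_factory=list)
    # Structural: how the parts of a compound object are ordered.
    parts_in_order: List[str] = field(default_factory=list)
    # Administrative: creation details, file type and access rights.
    created: str = ""
    file_type: str = ""
    allowed_roles: List[str] = field(default_factory=list)
    # Provenance: the earlier version this one was derived from, if any.
    derived_from: Optional[str] = None


def can_access(record: MetadataRecord, role: str) -> bool:
    """Use administrative metadata to decide whether a role may read the asset."""
    return role in record.allowed_roles


# Illustrative values only.
report = MetadataRecord(
    title="Q3 Sales Report",
    author="Finance Team",
    keywords=["sales", "quarterly"],
    parts_in_order=["summary", "regional breakdown", "appendix"],
    created="2023-10-02",
    file_type="application/pdf",
    allowed_roles=["analyst", "finance"],
    derived_from="q3-sales-report-v1",
)
```

Calling `can_access(report, "analyst")` shows the point Pomerantz makes about administrative metadata: the access rule lives alongside the data it governs.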
What Is Metadata Management?
Companies have been accumulating data for years in multiple systems with inconsistent associated metadata. The integration of social media data and its associated metadata creates additional complications.
Business intelligence is one of the most important use cases for metadata management, Fleming says. Business users and analysts often confront data in a report that doesn’t make sense or is incomplete, or find that the same report generated in two different reporting systems shows two different results.
Metadata makes it possible to locate and verify data, and metadata management makes it possible to trace the lineage of that data so that its quality can be assessed before it is loaded into an analytics tool. “If we can understand the quality of our data and where it’s being used, we can expose rich data to the business community,” Limburn says.
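Tracing lineage amounts to walking backward from a report to everything that fed it. The sketch below assumes lineage has already been captured as a simple mapping from each dataset to its direct sources; the `LINEAGE` map, the dataset names and the `trace_upstream` function are hypothetical.

```python
# Hypothetical lineage map: each dataset lists the datasets it was built from.
LINEAGE = {
    "sales_dashboard": ["sales_clean"],
    "sales_clean": ["crm_export", "erp_orders"],
    "crm_export": [],
    "erp_orders": [],
}


def trace_upstream(dataset, lineage):
    """Return every upstream source of a dataset, nearest first, without repeats."""
    seen = []
    queue = list(lineage.get(dataset, []))
    while queue:
        current = queue.pop(0)
        if current in seen:
            continue  # already visited; also guards against cycles
        seen.append(current)
        queue.extend(lineage.get(current, []))
    return seen


sources = trace_upstream("sales_dashboard", LINEAGE)
print(sources)
```

With a list like this in hand, a steward can check the quality of each upstream source before the dashboard's data is loaded into an analytics tool.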
In the past, metadata management has been a long, laborious and expensive process, requiring many data curators and stewards to keep up with an ever-growing and changing body of data. Automating the discovery and data movement processes makes metadata management easier. Automatically discovering metadata from sources such as databases and reporting tools, rather than manually tracing the data journey, saves time and reduces errors and costs.
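As a small illustration of automated discovery, the sketch below harvests table and column metadata directly from a database's own catalog instead of documenting it by hand. It uses Python's built-in `sqlite3` module and an in-memory database as a stand-in for a real source system; the `discover_table_metadata` function and the sample tables are assumptions made for the example.

```python
import sqlite3


def discover_table_metadata(conn):
    """Harvest table and column metadata from a SQLite database's own catalog."""
    catalog = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        catalog[table] = [
            {"column": name, "type": col_type, "required": bool(notnull)}
            for _cid, name, col_type, notnull, _default, _pk in columns
        ]
    return catalog


# In-memory database standing in for a real source system.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)

catalog = discover_table_metadata(conn)
for table, columns in catalog.items():
    print(table, columns)
```

Commercial tools do far more than this, but the principle is the same: the source system already knows its own structure, so the catalog can be populated by querying it.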
Automated tagging and creation of metadata content using technologies like machine learning have made a lot of progress, according to Fleming. Companies can apply their metadata tags (the terms and descriptions that are most relevant and contextually important for data classification) to digital assets, then feed those tagged assets to a machine learning model that autotags subsequent ones.
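The idea of learning from already-tagged assets can be shown with a deliberately simple stand-in: a nearest-neighbor lookup based on word overlap, not a production machine learning model. The `TAGGED` examples, the tag names and the `autotag` function below are hypothetical.

```python
def tokens(text):
    """Lowercase a description and split it into a set of words."""
    return set(text.lower().split())


def jaccard(a, b):
    """Share of words two sets have in common, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def autotag(description, tagged_examples):
    """Copy the tags of the most similar already-tagged asset."""
    words = tokens(description)
    best = max(tagged_examples, key=lambda ex: jaccard(words, tokens(ex[0])))
    return best[1]


# Hypothetical assets that data stewards have already tagged by hand.
TAGGED = [
    ("monthly customer churn by region", ["customer", "retention"]),
    ("supplier invoice totals by quarter", ["finance", "procurement"]),
]

suggested = autotag("customer churn by product line", TAGGED)
print(suggested)
```

A real system would train on many more examples and richer features, but the workflow matches the one described above: people tag a first set of assets, and the model proposes tags for the rest.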
“It’s very important to get value of machine learning efforts and analytics,” she says.
There’s an opportunity to let business users have a direct role in metadata management too. “We can push control of data through a crowdsource model so that people can create and add metadata they think is appropriate to data,” Limburn says. They can rank metadata contributions too. “Everyone in the business can enrich metadata across the business,” he says.
Metadata Management vs. Master Data Management: What’s the Difference?
Metadata management should always come before any other data disciplines, says Philip Russom, senior research director for data management at TDWI. “Just about all we do in data management or data access needs metadata and its descriptions of the data,” he says.
Metadata management and master data management are different disciplines, though the terms are sometimes confused. Master data is used to create a single view of a business entity.
Affiliated with that is reference data — the data about the types of transactions conducted with that business entity.
Customer-facing businesses, such as financial services and insurance, can particularly benefit from maintaining customer master data. They are among the verticals that require master data management tools to help resolve data errors, overlaps and redundancies. “You need metadata management for customer service reps to find a client’s record,” Russom says.
Another vertical with a strong need for master data management is manufacturing, where having a single and accurate view of products bought and sold is vital.
“The practice of master data management really acknowledges just how important metadata management is for doing any type of asset management,” Pomerantz says.
Top Metadata Management Tools
Vendors such as Oracle and IBM provide solutions for metadata management automation. As companies move to the cloud, including modernizing their reporting tools, metadata management platforms can map how data flows around their BI tools to streamline cloud migration, Pomerantz says.
IBM stresses the importance of data cataloging being tightly integrated with a data governance program. Its Watson Knowledge Catalog automates catalog creation starting from the data discovery phase, where machine learning is used to classify data and derive data quality.
“Proper governance needs metadata management to collect, understand, organize and enrich metadata so that data can be used correctly and be available for a consumer catalog,” Limburn says.
Oracle leverages its metadata management solutions as part of its overall data management value proposition, Gartner points out. Oracle sees metadata management as a foundation for integrating critical core capabilities such as business continuity, data movement, data transformation, data governance, catalogs, analytics and streaming data solutions.
Metadata management will always be a work in progress. But given the increased focus on compliance, and the fact that companies can derive value that differentiates them from competitors by smartening up their data, it’s work that matters in a big way.