The documents, spreadsheets, images, audio files, emails, and other files that most of us create every day are “human-generated data.” Not only do these files take up the majority of digital storage capacity in most organizations, but there is also an enormous amount of metadata associated with them.
While this human-generated content is big, the metadata is bigger. Metadata about a file might include who created it, what type of file it is (spreadsheet, presentation), what folder it is stored in, who has been reading it, who has access to it, or who sent it in an email to someone else. Over a file’s lifespan, its metadata grows so large that if you collect and store it all in raw form, before long it will dwarf the files themselves. The metadata associated with this content is human-generated Big Data.
Just as analyzing machine-generated data has practical applications for business, analyzing the “big metadata” associated with human-generated content has enormous potential. More than that, harnessing big metadata has become essential to managing and protecting human-generated content. Now that technology such as advanced metadata frameworks exists to listen to the heartbeat of our organizations, we would be remiss not to use it.
Some of the fundamental questions that we can start to answer include:
- Who is creating the most content?
- Who is accessing the most data?
- Where is my sensitive data stored?
- Where is my sensitive data over-exposed?
- Which servers aren’t being utilized?
- Is there anything abnormal going on?
Once you start combining metadata streams, the insights become that much more powerful, from data management and protection perspectives, as well as from the perspective of how, when, and with whom we collaborate.
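To make the idea of combining metadata streams concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the three streams (access events, permissions, and a sensitivity classification), the sample users and paths, the "everyone" group, and the helper names `most_active_users` and `over_exposed` are all hypothetical and do not describe any particular metadata framework.

```python
from collections import Counter

# Hypothetical access-event stream: (user, file path, action).
access_events = [
    ("alice", "/finance/payroll.xlsx", "read"),
    ("bob", "/finance/payroll.xlsx", "read"),
    ("alice", "/marketing/plan.pptx", "write"),
    ("alice", "/finance/budget.xlsx", "read"),
]

# Hypothetical permissions stream: file path -> groups that can access it.
permissions = {
    "/finance/payroll.xlsx": {"finance", "everyone"},
    "/finance/budget.xlsx": {"finance"},
    "/marketing/plan.pptx": {"marketing"},
}

# Hypothetical classification stream: paths flagged as sensitive.
sensitive_files = {"/finance/payroll.xlsx", "/finance/budget.xlsx"}


def most_active_users(events):
    """Count events per user, most active first ("Who is accessing the most data?")."""
    return Counter(user for user, _path, _action in events).most_common()


def over_exposed(sensitive, perms, broad_groups=frozenset({"everyone"})):
    """Return sensitive paths that are open to an overly broad group.

    This joins the classification stream with the permissions stream
    ("Where is my sensitive data over-exposed?").
    """
    return sorted(path for path in sensitive if perms.get(path, set()) & broad_groups)


top_users = most_active_users(access_events)
exposed = over_exposed(sensitive_files, permissions)
```

Neither stream answers the over-exposure question alone: the classification says which files matter, and the permissions say who can reach them. Only the join of the two surfaces the file that is both sensitive and widely accessible.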