Over the past several years, we’ve graduated from the general buzz surrounding the concept of big data to actually being able to implement big data projects. Unfortunately, data center IT teams tasked with implementing these business-driven initiatives are only now beginning to realize the depth of complexity involved in achieving real big data integration.

Big data means large volumes of data in a variety of formats, living across multiple, disconnected platforms, applications, and devices. The sheer amount of structured and unstructured content often makes it extremely difficult for users to access and analyze the information they need.

The modern data center tends to be a complex system of interconnected servers and devices that store, process, and distribute large amounts of information to and from a variety of sources. But smart big data integration, in reshaping traditional IT systems, can ease the struggle of aggregating and analyzing information from geo-dispersed sites and even other data centers.

If a data center is the brain of an organization, think of its data sources as the nerves and cells feeding information back to it. Smart big data integration means the organization’s “nervous system” quickly communicates information and coordinates agile movement throughout your enterprise, a vital role for the modern business ecosystem. But it also means data center managers get the security, quality, control, and management they seek for accurate and efficient data processing.

Where to Start

The goal of any big data project is to gain better outcomes – including immediate real-time insights and long-term perspectives based on recurring patterns – but first you must overcome the early integration challenges. Ask yourself:

  • Where is all your critical data coming from?
  • How can your organization aggregate and move all of that data quickly?
  • Is there value in analyzing that available data?
  • How can your enterprise maximize that value through investments in technology and infrastructure?

Ultimately, big data integration is ingesting, preparing, and delivering data, no matter what the source. This includes leveraging every type of data in an enterprise, including the complex, often-unstructured machine-generated kind, and often requires a more converged data center infrastructure.

So one of the first steps – and arguably the most important step – is to integrate all the available data. Here are three key areas to identify before your big data integration project gets off the ground.

Reliable Data Flows

Ingesting big data into a platform like Apache Hadoop is a no-brainer, but it’s not enough to simply spin up a Hadoop cluster, pour all types of data into it, and play with it until groundbreaking new insights show themselves. Big data players release new tools and upgrades seemingly every week, and even one technology introduced into your stack that doesn’t play nice can render your entire platform obsolete.

It’s common to experience data flow and data degradation issues between enterprise applications and the Hadoop cluster. The typical reaction is to hand-code a fix or throw some other technology at the problem in a desperate attempt to make it work. Often, it’s a solution. But it’s not the solution.

A secure, agile integration platform that focuses on pipelines for mobilizing the actual flow of data into and out of the data center ensures reliable information exchange in increasingly complex workplace ecosystems.
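
To make that concrete, here is a minimal sketch of a reliable ingest pipeline, written in Python against plain files. The source export, the landing-zone directory standing in for a Hadoop path, the batch size, and the retry policy are all hypothetical stand-ins for this example, not features of any particular product. The point is that batching, retries, and failure handling live in the pipeline itself rather than in scattered, hand-coded scripts.

    import json
    import time
    from pathlib import Path

    SOURCE_FILE = Path("crm_export.jsonl")   # hypothetical upstream export (one JSON record per line)
    LANDING_ZONE = Path("landing/crm")       # stands in for an HDFS landing directory
    BATCH_SIZE = 1000
    MAX_RETRIES = 3

    def write_batch(batch, batch_num):
        """Write one batch to the landing zone, retrying transient failures."""
        target = LANDING_ZONE / f"batch_{batch_num:06d}.jsonl"
        for attempt in range(1, MAX_RETRIES + 1):
            try:
                target.write_text("\n".join(json.dumps(r) for r in batch))
                return
            except OSError:
                if attempt == MAX_RETRIES:
                    raise                    # surface the failure instead of silently dropping data
                time.sleep(2 ** attempt)     # back off before retrying

    def ingest():
        """Read the source in batches and land each batch reliably."""
        LANDING_ZONE.mkdir(parents=True, exist_ok=True)
        batch, batch_num = [], 0
        with SOURCE_FILE.open() as src:
            for line in src:
                batch.append(json.loads(line))
                if len(batch) >= BATCH_SIZE:
                    write_batch(batch, batch_num)
                    batch, batch_num = [], batch_num + 1
        if batch:                            # flush the final partial batch
            write_batch(batch, batch_num)

    if __name__ == "__main__":
        ingest()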

Scalability

There are major integration, governance, and security issues that need to be addressed at various levels of big data initiatives, especially within the data center. The scale at which we’re doing business today – the sheer volume of information – is what makes data “big data.” Trying to manage big data across geographies and data centers with older, traditional tools severely underestimates modern needs.

As businesses evolve and new data sources come into play, the need for different technologies will increase and your system will invariably have to adapt. And if you’re throwing all this hand-coding at the problem now (as mentioned above), what will you have to throw at it later when you’re trying to scale?

Simply putting more people or code on the problem is not a scalability strategy, nor will it solve the complicated transfer issues that come with big data. There needs to be a solid data integration and management platform underneath the business intelligence tools, one that scales easily and works with numerous big data tools and sources without disruption.

Data Quality, Classification, Governance

Structured data coming out of CRM and ERP applications often lines up nicely in enterprise analysis, but it’s the unstructured kind that’s more difficult to manage. Enterprises must somehow govern information chaos, because even the smallest problems with data quality can create massive errors. Successful companies do this at the metadata level.

Defining information through metadata is crucial: it provides structure to what becomes big data, and it helps classify and organize that information so it can easily be located later. As information flows into your data lakes, some kind of classification must be performed so the data you’re running analytics on is actually accurate.

Companies waste technology cycles on bad data, and that’s especially costly today, when analysis runs against 10 terabytes rather than, say, 1 gigabyte. All of this quality and classification work has to be done at some point, but it should happen earlier, ideally during the integration cycle. Enterprises that think about data quality early get better, more valuable analytics.
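
As a rough illustration of doing this work during ingestion rather than at analysis time, the sketch below tags each record with metadata and quarantines records that fail a basic quality check before they ever reach the lake. The required fields, the “crm” source tag, and the naive PII flag are assumptions made for the example, not a prescribed schema.

    import json
    from datetime import datetime, timezone

    REQUIRED_FIELDS = {"customer_id", "event_type", "timestamp"}   # hypothetical minimum schema

    def classify(record):
        """Attach simple metadata so the record can be located and governed later."""
        record["_meta"] = {
            "source": "crm",                                       # assumed source system tag
            "ingested_at": datetime.now(timezone.utc).isoformat(),
            "contains_pii": "email" in record,                     # naive flag for governance rules
        }
        return record

    def is_valid(record):
        """Reject records with missing required fields before analytics ever sees them."""
        return REQUIRED_FIELDS.issubset(record)

    def process(lines):
        """Split incoming records into an accepted, classified set and a quarantine set."""
        accepted, quarantined = [], []
        for line in lines:
            record = json.loads(line)
            (accepted if is_valid(record) else quarantined).append(record)
        return [classify(r) for r in accepted], quarantined

The same checks could run inside whatever integration platform you choose; the value is in where they run – at ingest – not in this particular code.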

Summary

Every organization will become a data organization or get left behind. What makes a company unique is the data it has and how that data is used. Thus, a successful big data project ultimately depends on an organization’s ability to capture its data.

Rapid ingestion and processing of big data requires a reliable integration infrastructure that can easily scale to accommodate massive data volumes, drive real-time access, and support every request for analysis. Leveraging information to gain a competitive edge for your organization sounds great, but that only comes after building a usable data lake that reliably and accurately integrates all of your data sources.

Big data integration delivers its full value when the right information gets to the right person so it can be understood and acted upon. But only when companies support their big data investment with a reliable integration platform underneath will they reap the rewards every enterprise seeks.

Senior Product Manager Arvind Venugopal manages enterprise integration and big data gateway solutions at Cleo.