Real Time Replication
– Robert Hodges, CEO of Continuent, says:
The open source community, and all those who thrive in the MySQL marketplace, are looking closely at new technology approaches that will help them achieve real-time integration between operational database systems and Hadoop. Rapid loading of data into Hadoop enables effective analytics and is becoming increasingly crucial as Hadoop itself raises the bar for real-time query through technology like the Cloudera-sponsored Impala project, Spark, and HBase.
Herein lies the problem: how to find a high-performance, low-impact process that transfers data from multiple upstream systems into Hadoop and handles billions of daily transactions in order to derive demonstrable business value. Many have already tried traditional ETL tools and slow data-scraping techniques, but these put a heavy load on operational systems and, worse, are often unable to meet demand. Recognizing that fast decision-making depends on real-time data movement, organizations are looking for a solution that can quickly and flexibly move data out of operational databases into Hadoop, where they can run analytics.
Chris Schneider, Database Architect at Groupon, summed it up best when he said, “Hadoop has clearly changed the landscape of data management by providing a central data hub that receives data from across the business; however, there is now a wide range of services that need to move OLTP data efficiently into Hadoop. Using real time replication techniques, we can quickly and flexibly move data out of operational databases into Hadoop where we run analytics that answer important business questions on timelines, matching the needs of our users.”
IT teams need to replicate transactions from databases such as MySQL™, MariaDB and Oracle® to Hadoop in real time, in order to produce carbon-copy tables and enable the execution of analytic views in Hadoop within multiple environments, including Hive. By automatically reading the DBMS transaction log and forwarding transactions as soon as they commit, organizations can reduce the load on operational systems, minimize the amount of data that needs to move between locations, and enable transactions to enter the data warehouse quickly.
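The idea of forwarding transactions only once they commit can be illustrated with a minimal sketch. It is not the API of any particular replication product: the log entry layout, the function names `committed_transactions` and `apply_changes`, and the in-memory `replica` dictionary are all assumptions made for illustration. Changes are buffered per transaction and released to the replica only when a commit marker appears, so rolled-back or still-open work never reaches the copy.

```python
from collections import defaultdict

def committed_transactions(log):
    """Yield (txn_id, changes) for each transaction once its COMMIT is seen.

    `log` is a hypothetical, simplified stand-in for a DBMS transaction log.
    """
    pending = defaultdict(list)  # changes buffered per open transaction
    for entry in log:
        txn = entry["txn"]
        if entry["op"] == "COMMIT":
            yield txn, pending.pop(txn, [])
        elif entry["op"] == "ROLLBACK":
            pending.pop(txn, None)  # discard work that never committed
        else:
            pending[txn].append(entry)

def apply_changes(replica, changes):
    """Apply row changes to a replica table keyed by primary key."""
    for change in changes:
        if change["op"] == "DELETE":
            replica.pop(change["key"], None)
        else:  # INSERT or UPDATE both store the latest row image
            replica[change["key"]] = change["row"]

# Hypothetical log: txn 2 rolls back, txn 4 is still open at the end.
log = [
    {"txn": 1, "op": "INSERT", "key": 101, "row": {"status": "new"}},
    {"txn": 2, "op": "INSERT", "key": 102, "row": {"status": "new"}},
    {"txn": 1, "op": "COMMIT"},
    {"txn": 2, "op": "ROLLBACK"},
    {"txn": 3, "op": "UPDATE", "key": 101, "row": {"status": "paid"}},
    {"txn": 3, "op": "COMMIT"},
    {"txn": 4, "op": "INSERT", "key": 103, "row": {"status": "new"}},
]

replica = {}
for txn_id, changes in committed_transactions(log):
    apply_changes(replica, changes)

print(replica)
```

Because the reader only consumes the log, the operational database does no extra query work on behalf of the replica, which is the low-impact property the paragraph above describes.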
When using real-time replication in conjunction with Apache Sqoop™, businesses can provision information and then replicate transactions incrementally in real time. This provides continuous data loading, so both existing data and subsequent changes can easily be applied to Hadoop. Real-time replication from operational databases into Hadoop is a technology worth watching, as it will prove to be a big move for the MySQL community.
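The provision-then-replicate pattern might be sketched as follows. This is an illustrative assumption, not Sqoop's or any replicator's actual interface: `provision` stands in for a bulk load of existing rows, `replicate_incrementally` stands in for the change feed, and the sequence number used to line the two up is a hypothetical device. The key point is that changes already contained in the snapshot are skipped, so nothing is applied twice.

```python
def provision(source_rows, snapshot_seqno):
    """Bulk-copy existing rows and record the log position of the snapshot.

    Hypothetical stand-in for an initial bulk load.
    """
    return dict(source_rows), snapshot_seqno

def replicate_incrementally(target, last_applied, change_stream):
    """Apply only the changes that occurred after the snapshot position."""
    for seqno, op, key, row in change_stream:
        if seqno <= last_applied:
            continue  # already reflected in the provisioned snapshot
        if op == "DELETE":
            target.pop(key, None)
        else:  # INSERT or UPDATE
            target[key] = row
        last_applied = seqno
    return last_applied

# Snapshot taken after change 2; changes 3 to 5 arrive afterwards.
snapshot = {1: "alice", 2: "bob"}
changes = [
    (1, "INSERT", 1, "alice"),
    (2, "INSERT", 2, "bob"),
    (3, "UPDATE", 2, "bobby"),
    (4, "DELETE", 1, None),
    (5, "INSERT", 3, "carol"),
]

target, position = provision(snapshot, snapshot_seqno=2)
position = replicate_incrementally(target, position, changes)

print(target, position)
```

Tracking the last applied position also makes the incremental step safe to rerun over the same stream, since earlier changes are simply skipped.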
About The Author
Robert Hodges is the CEO of Continuent (www.continuent.com), a leading provider of database clustering and replication, enabling enterprises to run business-critical applications on cost-effective open source software.