The Pillars of a Modern Data Prep Platform

– Nenshad Bardoliwalla, Co-founder and VP of Products for Paxata, says:

The five pillars of a modern data prep platform – performance, elasticity, efficiency, connectivity and scalability – all underscore the need for a solution that can deliver interactive performance on data preparation tasks at massive volumes. Elasticity should be achieved through a utility-based model that dynamically scales costs and capacity up or down as workloads and users increase or decrease. Efficiencies are gained by performing massive data prep projects with no coding, sampling or pre-defined models required. Connectivity options should provide seamless access to machine and interaction data stored in Hadoop clusters and relational databases. Finally, scalability needs to be addressed through automation of data prep tasks, project repeatability and data extraction with a new REST API toolkit.

Paxata, the first Adaptive Data Preparation™ application and platform, delivered its Spring ’15 release at Strata + Hadoop World to address each of these pillars. The new release helps business analysts, data scientists, developers, data curators, and IT teams automate, collaborate on, and dynamically govern the data integration, data quality and enrichment process in a self-service fashion.

The Paxata platform was built with a data management layer that persists data inside the Hadoop Distributed File System (HDFS) and a real-time, columnar, parallelized, in-memory pipeline data prep engine powered by Intellifusion™. The data prep engine wraps Apache Spark v1.2 with additional functionality built to optimize Spark performance and responsiveness. This unified design allows for a seamless transition between interactive and batch execution models, supported by advanced compilation and caching techniques.

According to Forrester Research, “Vendors are building a new generation of data preparation tools that use Apache Hadoop and Spark to run machine-learning algorithms in support of data preparation tasks. These new tools provide users with more predictive and intelligent assistance as they structure data, identify types, script transformations, and clean or enrich data.” The report goes on to say: “With Paxata, analysts work in spreadsheet-like interface on top of a machine-learning engine that shapes and improves the data automatically. They can source data, reports and pivot tables. Paxata learns from acceptance and additional shaping of the data by analysts from other projects.”

My colleague, Co-founder and CEO Prakash Nanduri, commented, “The power center of data preparation has moved away from legacy IT-only, relational and batch processing, into the self-service, interactive, poly-structured, elastic world.”

The Paxata Adaptive Data Preparation platform is available as a multi-tenant cloud service as well as an on-premise deployment. Cloud customers can access the Paxata architecture without the additional cost or burden of maintenance. On-premise or private cloud customers can deploy Paxata within a dedicated Hadoop environment or as part of their existing Hadoop cluster. Regardless of the deployment model selected, all customers benefit from a modern data preparation solution built for data prep at scale.

About the Author: Nenshad Bardoliwalla is Co-founder and VP of Products for Paxata. An executive and thought leader with a proven track record of success leading product strategy, product management, and development in business analytics, Nenshad is also the lead author of Driven to Perform: Risk-Aware Performance Management From Strategy Through Execution. For more information, visit www.paxata.com.
