Deirdre Mahon, vice president of marketing at RainStor, says:

Probably the single most challenging part of proactively managing the data center is the strategy and planning around IT infrastructure: how much capacity is required to retain existing enterprise data, and how much will be needed for future storage. Most organizations today retain enterprise data for many years, and in fact many never actually delete it – once transacted, it is retained from then on. This places a burden on IT, which must keep data online and available for continuous query and analysis while also providing fast access for external regulators that govern how long data must be retained.

Typically, IT keeps data in the system where it was originally transacted until that system falls out of use and becomes legacy, even though the data itself still needs to be retained and accessed. Increasing demands from the business to query this data force IT to keep it in expensive systems that require costly DBA resources to maintain over time. More diligent information life-cycle management is needed: enforcing policies around how long data is retained in enterprise production environments ultimately makes IT far more efficient and satisfies both business needs and the IT budget. Offloading large volumes of transactional data from production to a dedicated online archive is key to retaining Big Data at the lowest possible cost and at efficient scale.

IT needs to be more rigorous with data management and infrastructure technology choices and the resulting expenditures. Gone are the days when traditional relational or analytical environments were the only option for keeping data secure, available and online for business query; there is no longer a one-size-fits-all approach to managing enterprise data. In the last decade there has been tremendous innovation in data management, with rapid adoption of NoSQL, in-memory, columnar and Hadoop/MapReduce technologies as ways to corral the ever-growing volume of multi-structured enterprise data. While IT struggles to transform this data into actionable information for the business, it is important not to lose sight of the overall cost of storing and retaining it, which will become even more pronounced as volumes continue to escalate.

IT needs to focus on a right-tiering approach to how data is managed and stored, deploying best-of-breed, purpose-built technologies to satisfy each specific business need.

Analysts continue to report that Big Data is on the rise. IDC says the amount of data will grow 44 times by 2020, and the amount of digital information created and replicated rose by 62 percent in 2010 to nearly 800,000 petabytes, which would fill a stack of DVDs reaching from the earth to the moon and back. By 2020, that pile of DVDs would stretch halfway to Mars.

In terms of RainStor’s rank in overall data center priorities, it’s high, given the speed of enterprise data growth. As our world continues to become more digital, the Big Data deluge will drive an increasing need for additional data center storage across all industries, including communications, healthcare, financial services, SmartGrid utilities, security, etc. This will place new levels of stress on our data centers, systems and infrastructures.

Central to RainStor’s unique product capabilities is the ability to compress and de-duplicate large data sets, enabling reduction ratios that are typically 40:1, rising to 100:1 with some data, through the use of four distinct, yet complementary, techniques. With RainStor’s data reduction capabilities, organizations can significantly reduce overall storage costs and enable a data center to run much more efficiently.

The four techniques are field-level de-duplication, pattern-level de-duplication, algorithmic compression and byte-level compression. None of these results in any loss of detail; instead, RainStor stores each record as a series of pointers to the location of a single instance of a data value or pattern of data values.

RainStor offers a new class of Big Data repository, focused on long-term Big Data retention with continuous query access. With RainStor, data centers can go on a “Big Data Diet” – in other words, reduce the storage capacity and cost needed to keep large volumes of data online. For example, you can offload data older than 180 days from production to RainStor as your online archive while retaining query and analysis capabilities via standard SQL and various BI tools, at a much lower cost per terabyte stored. With virtually unlimited amounts of data online and available, you eliminate the need for tape archive, and with it the time delay and manual effort of retrieving data from tape – a risky process, especially when data sets are large and schemas have changed since the data was offloaded.

Data center and IT managers should carefully consider a tiered infrastructure and data management strategy to retain and store critical enterprise data for both business and external regulatory requirements. RainStor’s patented technology is primarily focused on reducing the amount of data stored, which significantly reduces overall storage costs, and it runs on low-cost commodity hardware, lowering the total cost of retained data. Let’s look at the key benefits of RainStor’s unique capabilities.

RainStor benefits enterprise data centers in the following ways:

  • Dramatically reduces the cost and complexity of storing large volumes of historical structured and semi-structured data compared to traditional databases
  • Provides continuous access to historical data, which enables organizations to meet compliance regulations and to give business users access to broader data sets for ongoing analytics and BI
  • Allows organizations to retain historical data, on-premise, via public or private cloud and hybrid storage
  • Enables you to better control your data assets by auto-deleting records based on compliance retention rules
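To illustrate the idea behind rule-based auto-deletion, here is a hedged Python sketch; the record classes, retention periods and the `purge_expired` helper are hypothetical examples, not RainStor's actual policy engine:

```python
# Hypothetical retention-rule purge: each record class carries a
# retention window, and records past their window are auto-deleted.
# Class names and periods are illustrative assumptions.
from datetime import date, timedelta

RETENTION_RULES = {
    "trade": timedelta(days=7 * 365),   # e.g. regulatory: keep 7 years
    "log":   timedelta(days=365),       # e.g. internal: keep 1 year
}

def purge_expired(records, today):
    """Keep only records still inside their retention window."""
    kept = []
    for rec in records:
        limit = RETENTION_RULES[rec["class"]]
        if today - rec["created"] <= limit:
            kept.append(rec)
    return kept

archive = [
    {"class": "trade", "created": date(2004, 1, 1)},   # past 7 years
    {"class": "trade", "created": date(2010, 1, 1)},   # still retained
    {"class": "log",   "created": date(2009, 6, 1)},   # past 1 year
]
remaining = purge_expired(archive, today=date(2011, 1, 1))
```

Automating the purge in this way is what lets data be deleted at exactly the right time rather than kept indefinitely.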

Most large organizations today retain data for many years – a 2011 DBTA survey found that many never delete data at all. Such organizations will benefit from the following capabilities:

  • Support for specific use cases including compliance data retention, query and reporting, and archiving legacy application data from systems being retired due to consolidation or modernization efforts
  • Continuous online access to larger and broader data sets that are query-able through standard SQL or BI tools whereby you can re-instate older data into production analytics environments for better results
  • Ability to compress or reduce data sets to a smaller, manageable footprint (~40 to 1 or greater) in order to reduce overall storage costs and scale as data volumes inevitably grow
  • Ability to retain specific data sets by pre-configured business rules, which allow organizations to easily purge data at exactly the right time. (Keeping data longer than required makes little sense and can in some cases be risky so automating this keeps data retention costs down.)
  • Ability to run on a broad range of hardware and operating systems, which ensures future flexibility
  • Ability to compress and reduce data by up to 95 percent, meaning a smaller storage footprint that delivers significant savings for on-premise data center deployments and is even more economically attractive in cloud deployments

Big Data presents a challenge for IT that is particularly pronounced in key industries – communications, financial services, utilities and healthcare – because they are governed by external regulatory requirements for retaining and providing quick access to historical data for audits, reports and business analysis. IT must select the best technology solutions available to keep data for extended periods of time and, more importantly, to do so in the most cost-effective way. For large global organizations, keeping and storing large volumes of data is an unavoidable cost, and doing so in the most efficient way is critical to staying ahead of the competition.

Investing in technology that compresses data at a high rate, satisfies stringent compliance and government regulations, scales easily and keeps data query-able is critical for these organizations. RainStor addresses this by delivering a unique technology capability that dramatically reduces the data footprint, cutting costs by up to 10x compared with a traditional database approach. Operational systems often become bloated over time with historical data sets, which can be offloaded to a RainStor archive for continuous data access – a safer option than putting data on tape, where re-instating the data to the original system is challenging, especially if it is voluminous. Data warehouse repositories can likewise offload large data sets to RainStor, and that historical data can later be pulled back into the core BI system if deeper analysis is required.

RainStor’s IP lies in its unique compression capabilities: it uses a tree-based structure, or “binary tree,” to store data, linking the various instances of values and patterns together to form data records. This means the original records can be reconstituted at any time. It also means that the bigger the data set, the higher the probability that values and patterns repeat, and the greater the level of compression that can be achieved on load.
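As a simplified illustration of how a binary tree of shared nodes can both de-duplicate repeated patterns and losslessly reconstitute records, here is a Python sketch; the interning scheme, function names and sample data are assumptions for illustration, not RainStor's actual on-disk format:

```python
# Sketch: intern each record as a binary tree of shared nodes, so a
# repeated pattern of values (e.g. the same date + city prefix) is
# stored only once and referenced by pointer thereafter.

def intern_tree(fields, pool):
    """Recursively intern `fields` as a tree; return a pointer id."""
    if len(fields) == 1:
        key = ("leaf", fields[0])
    else:
        mid = len(fields) // 2
        key = ("node",
               intern_tree(fields[:mid], pool),
               intern_tree(fields[mid:], pool))
    if key not in pool:
        pool[key] = len(pool)      # first occurrence: assign a pointer
    return pool[key]               # repeats reuse the existing pointer

def expand(ptr, pool):
    """Reconstitute the original fields from a pointer id."""
    rev = {v: k for k, v in pool.items()}
    key = rev[ptr]
    if key[0] == "leaf":
        return (key[1],)
    return expand(key[1], pool) + expand(key[2], pool)

pool = {}
a = intern_tree(("2011-03-01", "NYC", "ACME", 100), pool)
b = intern_tree(("2011-03-01", "NYC", "ACME", 250), pool)
# Both records share the ("2011-03-01", "NYC") subtree; it is stored once.
assert expand(a, pool) == ("2011-03-01", "NYC", "ACME", 100)
```

The second record adds only the nodes that differ, which is why compression improves as the data set grows and patterns recur more often.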

Take a look at this video by RainStor’s Chief Architect, which explains how extreme data compression is achieved to deliver significant reduction in storage footprint for cost-efficient Big Data retention: