– Deirdre Mahon, vice president of marketing at RainStor (www.rainstor.com), says:
Probably the single most challenging part of proactively managing the data center is the strategy and planning around IT infrastructure: how much capacity is required to retain existing enterprise data, and how much will be needed for future storage growth. Most organizations today retain enterprise data for many years, and many never actually delete it – once transacted, data is kept from then on. This places a burden on IT, which must keep the data online and available for continuous query and analysis while also providing fast access for external regulators that govern how long data must be retained.
Typically, IT keeps the data in the system in which it was originally transacted until that system is no longer used and becomes legacy, even though the data still needs to be retained and accessed. Increasing demands from the business to query this data force IT to keep it in expensive systems that require costly DBA resources to maintain over time. More diligent information life-cycle management is required, with policies governing how long data is retained in production environments; this ultimately makes IT far more efficient and satisfies both the business needs and the IT budget. Offloading large volumes of transactional data from production to a dedicated online archive is key to retaining Big Data at the lowest possible cost and at efficient scale.
IT needs to be more rigorous about data management and infrastructure technology choices and the resulting expenditures. Gone are the days when traditional relational or analytical environments were the only option for keeping data secure, available and online for business query. There is no longer a one-size-fits-all approach to managing enterprise data. In the last decade there has been tremendous innovation in data management, and we have witnessed rapid adoption of NoSQL, in-memory, columnar and Hadoop/MapReduce technologies as ways to corral the ever-growing volume of multi-structured enterprise data. While IT struggles to transform this data into actionable information for the business, it is important not to lose sight of the overall cost of storing and retaining the data, which will become even more pronounced as volumes continue to escalate.
IT needs to focus on a right-tiering approach to how data is managed and stored, deploying best-of-breed, purpose-built technologies to satisfy each specific business need.
Analysts continue to report that Big Data is on the rise. IDC says the amount of data will grow 44 times by 2020, and the amount of digital information created and replicated rose by 62 percent in 2010 to nearly 800,000 petabytes, which would fill a stack of DVDs reaching from the earth to the moon and back. By 2020, that pile of DVDs would stretch halfway to Mars.
In terms of RainStor’s rank in overall data center priorities, it’s high, given the speed of enterprise data growth. As our world continues to become more digital, the Big Data deluge will drive an increasing need for additional data center storage across all industries, including communications, healthcare, financial services, SmartGrid utilities, security, etc. This will place new levels of stress on our data centers, systems and infrastructures.
Central to RainStor’s unique product capabilities is the ability to compress and de-duplicate large data sets, enabling reduction ratios that are typically 40:1, rising to 100:1 with some data, through the use of four distinct, yet complementary, techniques. With RainStor’s data reduction capabilities, organizations can significantly reduce overall storage costs and enable a data center to run much more efficiently.
The four techniques are field-level de-duplication, pattern-level de-duplication, algorithmic compression and byte-level compression. None of them results in any loss of detail; instead, RainStor stores each record as a series of pointers to the location of a single instance of a data value or pattern of data values.
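To make the pointer-based approach concrete, here is a minimal sketch of field-level de-duplication in Python. It is illustrative only, not RainStor’s actual implementation; the record layout and function names are assumptions. Each distinct field value is stored once, and every record becomes a list of integer pointers into that single-instance value store, so the original data can be rebuilt without any loss of detail.

```python
# Minimal illustration of field-level de-duplication (hypothetical, not RainStor's code):
# each distinct value is stored once; records become lists of integer pointers.

def deduplicate(records):
    """Encode records as pointers into a single-instance value store."""
    value_store = []    # each distinct value appears exactly once
    value_index = {}    # value -> position in value_store
    encoded_records = []

    for record in records:
        pointers = []
        for value in record:
            if value not in value_index:
                value_index[value] = len(value_store)
                value_store.append(value)
            pointers.append(value_index[value])
        encoded_records.append(pointers)

    return value_store, encoded_records


def reconstruct(value_store, encoded_records):
    """Losslessly rebuild the original records from the pointers."""
    return [[value_store[p] for p in pointers] for pointers in encoded_records]


if __name__ == "__main__":
    rows = [
        ["2012-01-01", "NYC", "completed"],
        ["2012-01-01", "NYC", "pending"],
        ["2012-01-02", "NYC", "completed"],
    ]
    store, encoded = deduplicate(rows)
    assert reconstruct(store, encoded) == rows   # no loss of detail
    print(store)     # ['2012-01-01', 'NYC', 'completed', 'pending', '2012-01-02']
    print(encoded)   # [[0, 1, 2], [0, 1, 3], [4, 1, 2]]
```

The more repetitive the data, the fewer distinct values the store holds relative to the total number of fields, which is where the large reduction ratios come from.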
RainStor offers a new class of Big Data repository focused on long-term Big Data retention with continuous query access. With RainStor, data centers can go on a “Big Data Diet,” or in other words, reduce the storage capacity and cost required to keep large volumes of data online. For example, you can offload data older than 180 days from production to RainStor as your online archive while retaining query and analysis capabilities via standard SQL and various BI tools, at a much lower cost per terabyte stored. With virtually unlimited amounts of data online and available, you eliminate the need for a tape archive, and with it the time delay and manual effort of retrieving data from tape, which is risky, especially if data sets are large and schemas have changed since the data was offloaded.
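As a rough illustration of the offload pattern described above, the following sketch moves rows older than 180 days from a production store to an archive store that remains queryable with standard SQL. SQLite stands in for both tiers here, and the `orders` table and its columns are hypothetical assumptions, not part of RainStor’s product.

```python
# Hypothetical sketch of a 180-day offload policy: move rows older than the
# cutoff from a "production" store to an "archive" store, where they remain
# queryable with standard SQL. SQLite stands in for both tiers; table and
# column names are illustrative assumptions.

import sqlite3
from datetime import date, timedelta

CUTOFF = (date.today() - timedelta(days=180)).isoformat()

production = sqlite3.connect(":memory:")
archive = sqlite3.connect(":memory:")

for db in (production, archive):
    db.execute("CREATE TABLE orders (id INTEGER, order_date TEXT, amount REAL)")

production.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "2011-01-15", 120.0), (2, date.today().isoformat(), 75.5)],
)

# Copy aged rows into the archive tier, then remove them from production.
aged_rows = production.execute(
    "SELECT id, order_date, amount FROM orders WHERE order_date < ?", (CUTOFF,)
).fetchall()
archive.executemany("INSERT INTO orders VALUES (?, ?, ?)", aged_rows)
production.execute("DELETE FROM orders WHERE order_date < ?", (CUTOFF,))

# Archived data stays online and queryable via SQL.
print(archive.execute("SELECT COUNT(*), SUM(amount) FROM orders").fetchone())
```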
Data center and IT managers should carefully consider a tiered infrastructure and data management strategy to retain and store critical enterprise data for both business and external regulatory requirements. RainStor’s patented technology is primarily focused on reducing the amount of data stored, which significantly reduces overall storage costs, and it runs on low-cost commodity hardware, further lowering the total cost of retained data. Let’s look at the key benefits of RainStor’s unique capabilities.
RainStor benefits enterprise data centers in the following ways:
- Dramatically reduces the cost and complexity of storing large volumes of historical structured and semi-structured data compared to traditional databases
- Provides continuous access to historical data, which enables organizations to meet compliance regulations and to give business users access to broader data sets for ongoing analytics and BI
- Allows organizations to retain historical data, on-premise, via public or private cloud and hybrid storage
- Enables you to better control your data assets by auto-deleting records based on compliance retention rules (a minimal sketch of this kind of rule-driven purge follows this list)
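The sketch below illustrates rule-driven purging. The data classes, retention periods and record fields are hypothetical assumptions used only to show the idea of auto-deleting records once their compliance retention window expires.

```python
# Hypothetical sketch of rule-driven purging: each data class carries a
# pre-configured retention period, and records are auto-deleted once that
# period expires. The rules and record fields are illustrative assumptions.

from datetime import date, timedelta

RETENTION_RULES = {                 # data class -> required retention in days
    "call_detail_records": 365 * 7,
    "web_logs": 90,
}

def purge_expired(records, today=None):
    """Keep only records still inside their retention window."""
    today = today or date.today()
    kept = []
    for record in records:
        retention_days = RETENTION_RULES[record["data_class"]]
        expires_on = record["created"] + timedelta(days=retention_days)
        if today < expires_on:
            kept.append(record)
    return kept

records = [
    {"data_class": "web_logs", "created": date(2011, 1, 1)},
    {"data_class": "call_detail_records", "created": date(2011, 1, 1)},
]
print(purge_expired(records, today=date(2012, 1, 1)))
# Only the call detail record survives; the 90-day web log is purged on time.
```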
Most large organizations today retain data for many years, and a 2011 DBTA survey found that many retain data indefinitely. These organizations will benefit from the following capabilities:
- Specific use cases include compliance data retention, query and reporting, and archiving legacy application data from systems being retired due to consolidation or modernization efforts
- Continuous online access to larger and broader data sets that are queryable through standard SQL or BI tools, so that you can reinstate older data into production analytics environments for better results
- Ability to compress or reduce data sets to a smaller, more manageable footprint (roughly 40:1 or greater) in order to reduce overall storage costs and scale as data volumes inevitably grow
- Ability to retain specific data sets according to pre-configured business rules, which allow organizations to easily purge data at exactly the right time. (Keeping data longer than required makes little sense and can in some cases be risky, so automating this keeps data retention costs down.)
- Ability to run on a broad range of hardware and operating systems, which ensures future flexibility
- Compressing and reducing data by 95 percent or more means a smaller storage footprint, which provides significant savings for on-premise data center deployments and is even more economically attractive with cloud deployments (the quick arithmetic below shows how the ratios quoted earlier translate into these percentages).
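For reference, the following arithmetic maps the compression ratios quoted earlier to percentage reductions in storage footprint: 20:1 corresponds to 95 percent, 40:1 to 97.5 percent and 100:1 to 99 percent.

```python
# Relating compression ratios to the percentage reductions quoted above:
# a 20:1 ratio is a 95% reduction, 40:1 is 97.5%, and 100:1 is 99%.

for ratio in (20, 40, 100):
    reduction = (1 - 1 / ratio) * 100
    print(f"{ratio}:1 compression -> {reduction:.1f}% smaller footprint")
```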