Monday, August 13, 2012

Bigger, Faster and Cheaper - PetaStore Case Study

In 2010, I architected the Petascale Active Archive solution for the University of Oklahoma, which was implemented as the PetaStore system in 2011. It is a combined disk-tape active archive solution. The system has been in production for about a year, and IBM published a case study about it today.

Since 1890, the University of Oklahoma (OU) has provided higher education and valuable research through its academic programs. With its involvement in science, technology, engineering and mathematics, the university has increased its focus on high performance computing (HPC) to support data-centric research. In service of OU’s education and research mission, the OU Supercomputing Center for Education & Research (OSCER), a division of OU Information Technology, supports research projects by providing HPC infrastructure for the university.
 
Rapid data growth in academic research
 
In a worldwide trend that spans the full spectrum of academic research, explosive data growth has directly affected university research programs. Across data sources as diverse as gene sequencing and astronomy, datasets have grown rapidly, in some cases to multiple petabytes (millions of gigabytes).
 
One ongoing research project that produces massive amounts of data is conducted by OU’s Center for Analysis and Prediction of Storms. Each year, this project ranks among the world’s largest storm forecasting endeavors, frequently producing terabytes of data per day. Much of this real-time data is shared with professional forecasters, but a large amount is stored for later analysis. Long-term storage of this data has strong scientific value and, in many cases, is required by research funding agencies. Understandably, storage space had become a major issue for the university.
 
Need for an onsite storage system
 
In the past, OU lacked the capacity to store large amounts of data on campus for projects like storm forecasting, so much of the data had to be stored offsite at national supercomputing centers. This not only created performance and management problems for the university but also forced researchers to reduce the amount of data they stored offsite, risking the loss of information that could be valuable for future analysis.
 
Henry Neeman, director of OSCER, realized that to continue supporting many of the university’s research projects, and to retain funding, OU would need a large-scale archival storage system that enabled long-term data storage while containing deployment and operating costs.
 
With a clear vision for the new storage system, OU began reviewing bids from multiple vendors. Neeman noticed that while most proposed solutions were technically capable, the IBM solution met the technical requirements while also staying within budget. Ultimately, it offered the best value to the university and went on to establish a powerful new business model for storing research data.
 
High-capacity, cost-effective data archive
 
By combining disk- and tape-based storage, OU established a storage system known as the Oklahoma PetaStore, which is capable of handling petabytes (PB) of data. For high-capacity disk storage, OU selected the IBM System Storage DCS9900, which scales up to 1.7 PB. For longer-term data storage, OU chose the System Storage TS3500 Tape Library, with an initial capacity of up to 4.3 PB, expandable to over 60 PB. Six IBM System x3650 class servers run these storage systems, using IBM General Parallel File System (GPFS™) on the disk system and IBM Tivoli Storage Manager on the tape library to automatically move or copy data to tape.
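The case study says only that Tivoli Storage Manager automatically moves or copies data from the GPFS disk tier to tape; it does not describe the migration policy. Purely as an illustration, here is a minimal Python sketch of one common hierarchical storage approach: watermark-based migration of the least recently accessed files. The `FileEntry` class, the `select_for_migration` function and the 90%/70% thresholds are hypothetical and are not the PetaStore's actual configuration.

```python
from dataclasses import dataclass


@dataclass
class FileEntry:
    """A file resident on the disk tier (hypothetical model, not a GPFS API)."""
    path: str
    size_tb: float
    last_access: int  # e.g. days since some epoch; smaller means accessed longer ago


def select_for_migration(files, capacity_tb, high=0.90, low=0.70):
    """Choose files to move from the disk tier to tape using two watermarks.

    Migration starts only when disk usage exceeds the high watermark; it then
    picks the least recently accessed files until usage drops to or below the
    low watermark. The default thresholds are illustrative assumptions.
    """
    used = sum(entry.size_tb for entry in files)
    if used <= high * capacity_tb:
        return []  # below the high watermark: leave everything on disk
    target = low * capacity_tb
    chosen = []
    # Oldest access time first, so "cold" data leaves the disk tier first.
    for entry in sorted(files, key=lambda e: e.last_access):
        if used <= target:
            break
        chosen.append(entry.path)
        used -= entry.size_tb
    return chosen


if __name__ == "__main__":
    disk = [
        FileEntry("/archive/storm_2010.nc", 40, 10),
        FileEntry("/archive/storm_2011.nc", 30, 200),
        FileEntry("/archive/genome.bam", 25, 50),
    ]
    # 95 TB used on a 100 TB tier exceeds the 90% watermark.
    print(select_for_migration(disk, capacity_tb=100))
```

Using two watermarks rather than one keeps the tier from thrashing: after a migration run, usage sits well below the trigger point, so new data can land on disk for a while before the next batch goes to tape.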
 
Neeman says one of the main reasons OU chose IBM was the cost-effectiveness of the tape solution. Many other tape solutions impose additional costs, such as tape cartridge slot activation upcharges and per-capacity software upcharges, which could be prohibitive for researchers; the TS3500 and Tivoli Storage Manager avoid these. The TS3500 Tape Library also offers a flexible upgrade path, so users can expand the initial capacity easily and affordably. These savings even enabled OU to implement a mechanism to access and manage backup data through extensible interfaces.

OU has adopted an innovative business model under which storage costs are shared among stakeholders. In this model, a grant from the National Science Foundation pays for the hardware, software and initial support; OU covers the space, power, cooling, labor and longer-term support costs; and the researchers purchase storage media (tape cartridges and disk drives) to archive their datasets, which OSCER deploys and maintains without usage upcharges.
 
Storage that impresses on many levels
 
The PetaStore provides research teams with a highly expandable archive system in which researchers choose among several data duplication policies. Its connectivity also makes the data accessible not only within the university but also to collaborators at other institutions.
 
Although capacity was a higher priority than speed when designing the PetaStore, the IBM solution has shown strong performance, with tape drives operating close to peak speed. Another key benefit is cost-effectiveness, not only in hardware but also in reduced labor costs for researchers. Neeman has noticed these benefits: “Without the PetaStore, several very large scale, data-centric research projects would be considerably more difficult, time consuming and expensive to undertake—some of them so much so as to be impractical.”
 
Continued innovation with IBM
 
By choosing the IBM solution for the PetaStore project, the University of Oklahoma has positioned itself for continued innovation in academic research. The system supports storage across the entire lifecycle of research data, and it can continue operating and expanding at very low cost. This is critical for continued funding: the solution’s built-in cost efficiency demonstrates to research funding agencies that the university can operate the storage system within budget.

Overall, the university and research teams have seen numerous advantages from the IBM solution and expect it to expand seamlessly along with their storage needs. According to Neeman, "We only needed three things: bigger, faster and cheaper," and the IBM solution delivered on all fronts. Neeman predicts that data storage solutions like the Oklahoma PetaStore will become increasingly common at research institutions across the country and worldwide.

Source: 
