The event was covered in this article from Bio-IT World.
Research Using Supercomputing
- Miron Livny discussed OSG (the Open Science Grid) and its application in the search for the Higgs boson. Future challenges, Livny said, included what he called the “portability challenge” and the “provisioning challenge.” The former was how to make sure a job running on a desktop can also run on as many “foreign” resources as possible; the latter was being addressed by using targeted spot instances in the Amazon cloud, with prices dropping below 2 cents/hour. “Use it when the price is right, get out as fast as possible when the price is wrong,” Livny advised (a minimal sketch of such a price-aware strategy appears after this list).
- Jason Stowe (Cycle Computing) reviewed Cycle’s successes in spinning up high-performance computers with 50,000 cores on Amazon, such as a project with Schrodinger and Nimbus Discovery to screen a cancer drug target.
- Victor Ruotti (Morgridge Institute) is about halfway through an ambitious experiment using the cloud to conduct an extensive pairwise comparison of RNA-seq signatures from 124 embryonic stem cell samples. By performing a total of some 15,000 alignments (see the arithmetic sketch after this list), Ruotti intends to create a sequence-based index to facilitate the precise identification of unknown ES cell samples.
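For a sense of where Ruotti's figure of some 15,000 alignments comes from, the snippet below assumes an all-against-all comparison of the 124 samples; the counting is an assumption for illustration, not a description of his actual pipeline.

```python
# Back-of-envelope count for an all-vs-all comparison of 124 RNA-seq samples.
# Whether self-comparisons or both orderings are included is an assumption;
# the all-vs-all total lands near the ~15,000 figure cited in the talk.
n_samples = 124

all_vs_all = n_samples * n_samples              # 15,376: every sample against every index
unordered = n_samples * (n_samples - 1) // 2    # 7,626: unique unordered pairs

print(f"all-vs-all alignments: {all_vs_all:,}")
print(f"unique unordered pairs: {unordered:,}")
```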
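As a rough illustration of Livny's advice to use spot capacity only while the price is right, here is a minimal sketch of a price-aware provisioning check using the boto3 EC2 API (which postdates the talk); the instance type, region, and 2-cents/hour ceiling are illustrative assumptions, not details from the presentation.

```python
# Sketch: check the current EC2 spot price and only keep capacity while it
# stays under a ceiling -- "use it when the price is right, get out as fast
# as possible when the price is wrong." Assumes AWS credentials are
# configured; instance type, region, and ceiling are hypothetical.
import datetime
import boto3

PRICE_CEILING = 0.02          # dollars per hour ("below 2 cents/hour")
INSTANCE_TYPE = "m3.medium"   # hypothetical worker instance type
REGION = "us-east-1"          # hypothetical region

def current_spot_price(ec2):
    """Return the most recent spot price for the chosen instance type."""
    history = ec2.describe_spot_price_history(
        InstanceTypes=[INSTANCE_TYPE],
        ProductDescriptions=["Linux/UNIX"],
        StartTime=datetime.datetime.utcnow(),
    )
    return float(history["SpotPriceHistory"][0]["SpotPrice"])

def main():
    ec2 = boto3.client("ec2", region_name=REGION)
    price = current_spot_price(ec2)
    if price <= PRICE_CEILING:
        print(f"${price:.4f}/hr <= ceiling: keep (or request) spot workers")
    else:
        print(f"${price:.4f}/hr > ceiling: drain jobs and release instances")

if __name__ == "__main__":
    main()
```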
HPC Cloud for Research
- Mirko Buholzer (Complete Genomics) presented a new “centralized cloud solution” that Complete Genomics is developing to expedite the digital delivery of genome sequence data to customers, rather than the current system of shipping hard drives, fulfilled by Amazon, via FedEx or UPS. 100 genomes sequenced to 40x coverage consume about 35 TB of data, or a minimum of 12 hard drives, said Buholzer. The ability to download those data was appealing in principle, but to where exactly? And who would have access? Complete plans to give customers direct access to their data in the cloud, providing information such as sample ID, quality-control metrics, and a timeline or activity log. For a typical genome, the reads and mappings make up about 90% of the total data, or 315 GB. (Evidence and variants make up 31.5 GB and 3.5 GB, respectively.) Customers will be able to download the data or push it to an Amazon S3 bucket (a minimal sketch of the S3 option follows below). The system is currently undergoing select testing, but Buholzer could not say whether anyone had agreed to forgo their hard drives just yet.
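To make the delivery options concrete, here is a minimal sketch of the push-to-S3 path using the boto3 SDK, with the per-genome sizes Buholzer quoted noted in the comments; the bucket, sample ID, and file names are hypothetical and do not describe Complete Genomics' actual interface.

```python
# Sketch: push one genome's deliverables to a customer-owned S3 bucket.
# Per the talk, a ~350 GB genome breaks down roughly as reads/mappings
# ~315 GB (90%), evidence ~31.5 GB, and variants ~3.5 GB.
# The bucket, prefix, and file names below are hypothetical.
import boto3

BUCKET = "customer-genomes"     # hypothetical destination bucket
PREFIX = "sample-GS000012345"   # hypothetical sample ID

FILES = [
    "reads_and_mappings.tar",   # ~315 GB: the bulk of the delivery
    "evidence.tar",             # ~31.5 GB
    "variants.vcf.gz",          # ~3.5 GB
]

def push_genome(files, bucket, prefix):
    """Upload each deliverable under s3://bucket/prefix/ (boto3 handles multipart)."""
    s3 = boto3.client("s3")
    for name in files:
        key = f"{prefix}/{name}"
        s3.upload_file(name, bucket, key)
        print(f"uploaded {name} -> s3://{bucket}/{key}")

if __name__ == "__main__":
    push_genome(FILES, BUCKET, PREFIX)
```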
Dealing with the Data Challenge
- Gary Stiehr (The Genome Institute at Washington University) described the construction of The Genome Institute’s new data center, required because of the unrelenting growth of next-generation sequencing data. “The scale of HPC wasn’t the challenge—but the time scale was caused by rapid, unrelenting growth,” said Stiehr. The new data center required more power and cooling capacity and had to support data transfers reaching 1 PB/week (see the arithmetic sketch after this list). The issue, said Stiehr, was whether to move the data to the compute nodes or to bring the analysis to the data, using the nodes’ internal storage and processing the data where it already resides.
- Robert Sinkovits (San Diego Supercomputer Center) described Gordon, a supercomputer that makes extensive use of flash memory and is available to all academic users on a competitive basis. “It’s very good for lots of I/O,” said Sinkovits. A great Gordon application, he said, will, among other things, make good use of the flash storage for scratch/staging, require the large logical shared memory (approximately 1 TB of DRAM), run as a threaded application that scales to a large number of cores, and need a high-bandwidth, low-latency inter-processor network. The Gordon team will turn away applications that don’t fully meet these requirements, he said, but singled out computational chemistry as one particularly good match. Gordon features 64 dual-socket I/O nodes (using Intel Westmere processors) and a total of 300 TB of flash memory. Other features include a dual-rail 3D torus InfiniBand (40 Gbit/s) network and a 4-PB Lustre-based parallel file system capable of delivering up to 100 GB/s into the machine.
- Weijia Xu (Texas Advanced Computing Center/TACC) introduced the Stampede supercomputer, which should be online early next year. It features 100,000 conventional Intel processor cores, and a total of some 500,000 cores counting its coprocessors, along with 14 PB of disk, more than 272 TB of RAM, and a 56-Gbit/s FDR InfiniBand interconnect.
- Nan Li (National Supercomputer Center in Tianjin) described Tianhe-1A (TH-1A), the top-ranked supercomputer in China, with a peak performance of 4.7 PFlops. (The computer was ranked the fastest in the world two years ago.) Applications range from geology and video rendering to engineering, and also include a number of biomedical research functions; users include BGI and a major medical institute in Shanghai. Li indicated the resource could also be made available to the pharmaceutical industry.
- Makoto Taiji (RIKEN) highlighted Japan’s K Computer, located in Kobe, Japan; the project began in 2006, and its cost has been estimated at $1.25 billion. For that, one gets 80,000 nodes (640,000 cores), memory capacity exceeding 1 PB (16 GB/node), and 10.51 PFlops (3.8 PFlops sustained performance); a quick arithmetic check on these figures follows below. Its 3D-torus network provides 6 GB/s of bidirectional bandwidth in each of six directions. Power consumption is about 20 MW, with power efficiency roughly half that of Blue Gene. Taiji said the special features of the K Computer include high bandwidth and low latency. Anyone, in academia or industry, can use the K Computer for free if results are published. Life sciences applications make up about 25% of K Computer usage, including protein dynamics in cellular environments, drug design, large-scale bioinformatics analysis, and integrated simulations for predictive medicine.
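As a quick consistency check on the K Computer figures Taiji quoted, the arithmetic below relates the node count, core count, and per-node memory; it is only an illustration of the numbers above.

```python
# Relating the quoted K Computer figures: 80,000 nodes, 640,000 cores,
# 16 GB of memory per node, and a total memory capacity exceeding 1 PB.
nodes = 80_000
cores = 640_000
gb_per_node = 16

print(f"cores per node: {cores // nodes}")                  # 8
print(f"total memory: {nodes * gb_per_node / 1e6:.2f} PB")  # 1.28 PB (> 1 PB)
```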
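Returning to Stiehr's figure of data transfers reaching 1 PB/week, the snippet below works out the sustained bandwidth that implies; it is simple arithmetic on the quoted number, not a description of The Genome Institute's actual network.

```python
# Sustained bandwidth implied by moving ~1 PB of sequencing data per week.
PB = 10**15                   # bytes (decimal petabyte)
seconds_per_week = 7 * 24 * 3600

bytes_per_second = PB / seconds_per_week
print(f"~{bytes_per_second / 1e9:.2f} GB/s sustained")        # ~1.65 GB/s
print(f"~{bytes_per_second * 8 / 1e9:.1f} Gbit/s sustained")  # ~13.2 Gbit/s
```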
Update:
- 2012.09.14 - original post