Applying the Hadoop framework and the MapReduce programming model to research computing remains a challenge, despite their success and adoption by Web 2.0 companies and by other industries such as finance and retail. The developer of the following software comes from a research computing background dominated by traditional HPC tools such as MPI. Here they made a clever observation about the shortcomings of Hadoop/MapReduce and found ways to improve on it. The end result, "Twister", is a much-enhanced tool that is well suited to research computing methods such as matrix multiplication.
"Twister," a software tool released by Indiana University, supports
faster execution of many data mining applications implemented as MapReduce
programs. Developed by researchers at the Community Grids Lab of the Pervasive Technology Institute at
IU, the tool extends the functionality of MapReduce, a distributed programming
technique patented by Google for large-scale data processing in datacenter
environments.
Applications that initially use Twister include K-means clustering, Google's
PageRank, breadth-first graph search, matrix multiplication, and
multidimensional scaling. Twister also builds on the SALSA team's work with
existing MapReduce runtimes, including Microsoft's Dryad and the open-source
Hadoop.
Twister provides the following features to support MapReduce computations:
- Distinction between static and variable data
- Configurable long running (cacheable) map/reduce tasks
- Pub/sub messaging based communication/data transfers
- Efficient support for iterative MapReduce computations (much faster than Hadoop or Dryad/DryadLINQ)
- Combine phase to collect all reduce outputs
- Data access via local disks
- Lightweight (~5600 lines of Java code)
- Support for typical MapReduce computations
- Tools to manage data
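To illustrate how the static/variable data split, cacheable map tasks, and combine phase fit together in an iterative MapReduce job, here is a minimal single-process sketch of K-means, one of the applications listed above. It is not Twister's API (Twister itself is written in Java); the names `CachedMapTask`, `reduce_cluster`, and `kmeans` are hypothetical, and the sketch runs sequentially with no pub/sub messaging or local-disk data access. The input points are the static data, loaded once into long-running map tasks, while the centroids are the variable data passed in on every iteration.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, ...]
Partial = Tuple[List[float], int]  # (coordinate sum, point count) for one cluster


def sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


class CachedMapTask:
    """Hypothetical long-running map task that keeps its static data in memory."""

    def __init__(self, partition: List[Point]) -> None:
        self.points = partition  # static data: loaded once, reused every iteration

    def map(self, centroids: List[Point]) -> Dict[int, Partial]:
        # Variable data (the current centroids) arrives fresh each iteration.
        partial: Dict[int, Partial] = {}
        for p in self.points:
            k = min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))
            s, n = partial.get(k, ([0.0] * len(p), 0))
            partial[k] = ([a + b for a, b in zip(s, p)], n + 1)
        return partial


def reduce_cluster(values: List[Partial], previous: Point) -> Point:
    # Merge partial sums from all map tasks into one new centroid.
    n = sum(count for _, count in values)
    if n == 0:
        return previous  # empty cluster: keep the old centroid
    total = [sum(s[d] for s, _ in values) for d in range(len(previous))]
    return tuple(x / n for x in total)


def kmeans(partitions: List[List[Point]], centroids: List[Point],
           max_iter: int = 20, tol: float = 1e-9) -> List[Point]:
    tasks = [CachedMapTask(p) for p in partitions]  # configured once, not per iteration
    for _ in range(max_iter):
        # Map phase: each cached task sees the same broadcast centroids.
        grouped: Dict[int, List[Partial]] = {k: [] for k in range(len(centroids))}
        for task in tasks:
            for k, v in task.map(centroids).items():
                grouped[k].append(v)
        # Reduce phase: one reduce call per cluster key.
        reduced = {k: reduce_cluster(vs, centroids[k]) for k, vs in grouped.items()}
        # Combine phase: collect all reduce outputs into the next variable data.
        new_centroids = [reduced[k] for k in range(len(centroids))]
        shift = max(sq_dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= tol:
            break
    return centroids
```

For example, `kmeans([[(0.0, 0.0), (0.0, 1.0)], [(10.0, 10.0), (10.0, 11.0)]], [(0.0, 0.0), (10.0, 10.0)])` converges to `[(0.0, 0.5), (10.0, 10.5)]`. The point of the design is that only the small centroid list moves between iterations; in a classic Hadoop job, each iteration would typically be a new job that re-reads the point data.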
Link: