Thursday, April 5, 2012

jTool: "Twister" - MapReduce for Research Computing

Applying the Hadoop framework and the MapReduce programming model to research computing remains a challenge, despite their success and adoption by web 2.0 companies and by other industries such as finance and retail. The developers of the following software come from a research computing background dominated by traditional HPC tools such as MPI. They made clever observations about the shortcomings of Hadoop/MapReduce and found ways to improve on them. The end result is a much enhanced tool -- "Twister" -- that is well suited to research computing methods such as matrix multiplication.

"Twister," a software tool released by Indiana University, supports faster execution of many data mining applications implemented as MapReduce programs. Developed by researchers at the Community Grids Lab of the Pervasive Technology Institute at IU, the tool extends the functionality of MapReduce, a distributed programming technique patented by Google for large-scale data processing in datacenter environments.



Applications that initially use Twister include K-means clustering, Google's PageRank, breadth-first graph search, matrix multiplication, and multidimensional scaling. Twister also builds on the SALSA team's work related to commercial MapReduce runtimes, including Microsoft Dryad software and open source Hadoop software.

Twister provides the following features to support MapReduce computations:
  • Distinction between static and variable data
  • Configurable long running (cacheable) map/reduce tasks
  • Pub/sub messaging based communication/data transfers
  • Efficient support for iterative MapReduce computations (much faster than Hadoop or Dryad/DryadLINQ)
  • Combine phase to collect all reduce outputs
  • Data access via local disks
  • Lightweight (~5600 lines of Java code)
  • Support for typical MapReduce computations
  • Tools to manage data
Twister is developed as part of Jaliya Ekanayake's Ph.D. research.
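
To make the iterative pattern in the feature list more concrete, here is a minimal single-process sketch in Python of a K-means loop that separates static data (the input partitions, loaded once and reused) from variable data (the centroids, resent every iteration) and uses a combine step to gather all reduce outputs. This is not Twister's API: every function name here (map_task, reduce_task, combine, iterative_kmeans) is hypothetical, the data is one-dimensional for brevity, and a real runtime would run the map and reduce tasks on separate workers.

```python
from collections import defaultdict


def map_task(partition, centroids):
    """Map: assign each point of a cached (static) partition to its
    nearest centroid and emit partial [sum, count] per centroid index."""
    partial = defaultdict(lambda: [0.0, 0])
    for x in partition:
        nearest = min(range(len(centroids)), key=lambda i: abs(x - centroids[i]))
        partial[nearest][0] += x
        partial[nearest][1] += 1
    return partial


def reduce_task(pairs):
    """Reduce: merge the partial [sum, count] pairs for one centroid index."""
    total = sum(s for s, _ in pairs)
    count = sum(c for _, c in pairs)
    return total, count


def combine(reduced, old_centroids):
    """Combine: collect all reduce outputs into the next centroid list.
    A centroid that received no points keeps its old value."""
    new_centroids = list(old_centroids)
    for idx, (total, count) in reduced.items():
        if count:
            new_centroids[idx] = total / count
    return new_centroids


def iterative_kmeans(partitions, centroids, max_iters=100, tol=1e-9):
    """Driver loop: partitions are static, centroids are the variable data."""
    for _ in range(max_iters):
        # The same in-memory partitions are reused in every iteration;
        # only the small centroid list changes between rounds.
        map_outputs = [map_task(p, centroids) for p in partitions]

        # Shuffle: group map outputs by centroid index.
        grouped = defaultdict(list)
        for out in map_outputs:
            for idx, pair in out.items():
                grouped[idx].append(pair)

        reduced = {idx: reduce_task(pairs) for idx, pairs in grouped.items()}
        new_centroids = combine(reduced, centroids)

        # Stop once no centroid moved by more than the tolerance.
        if max(abs(a - b) for a, b in zip(new_centroids, centroids)) < tol:
            return new_centroids
        centroids = new_centroids
    return centroids
```

In this sketch, a call such as iterative_kmeans([[1.0, 2.0, 3.0], [10.0, 11.0, 12.0]], [0.0, 5.0]) never reloads the partitions between iterations. A classic MapReduce job chain, by contrast, would typically reread its input from storage on every pass, which is the overhead the cacheable long-running tasks are meant to avoid.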


Link: 
