XSEDE15 has ended
Monday, July 27 • 1:30pm - 4:30pm
Tutorial: Accelerating Big Data Applications with Hadoop, Spark, and Memcached on Modern HPC Clusters

The explosive growth of 'Big Data' has caused many industrial firms to adopt HPC technologies to meet the requirements of the huge amounts of data to be processed and stored. According to a 2013 IDC study, 67% of high-performance computing systems were running High-Performance Data Analysis (HPDA) workloads. Apache Hadoop and Spark are increasingly being used on modern high-performance computing clusters to process HPDA workloads. Similarly, Memcached is becoming important for large-scale query processing in Web 2.0 environments.

Recent studies have shown that default Apache Hadoop, Spark, and Memcached cannot efficiently leverage the features of modern high-performance computing clusters, such as Remote Direct Memory Access (RDMA) enabled high-performance interconnects and high-throughput, large-capacity parallel storage systems. These middleware stacks are traditionally written with sockets and do not deliver the best performance on HPC clusters with modern high-performance networks. In this tutorial, we will provide an in-depth overview of the architecture of Hadoop components (HDFS, MapReduce, RPC, HBase, etc.), Spark, and Memcached. We will examine the challenges in re-designing the networking and I/O components of these middleware stacks for modern interconnects and protocols with RDMA (such as InfiniBand, iWARP, RoCE, and RSocket) and for modern storage architectures. Using the publicly available software packages from the High-Performance Big Data (HiBD, http://hibd.cse.ohio-state.edu) project, we will provide case studies of the new designs for several Hadoop/Spark/Memcached components and their associated benefits. Through these case studies, we will also examine the interplay between high-performance interconnects, storage systems (HDD and SSD), and multi-core platforms to achieve the best solutions for these components and for Big Data applications on modern HPC clusters.

Monday July 27, 2015 1:30pm - 4:30pm CDT
Majestic C
