Wednesday, July 29 • 4:00pm - 4:30pm
Optimizing Codes on the Xeon Phi: A Case-study with LAMMPS

Abstract: Intel’s Xeon Phi co-processor has the potential to deliver an impressive 4 GFlops/Watt, and it promises users that they need only recompile their code to run it on the accelerator. This paper reports our experience running LAMMPS, a widely used molecular dynamics code, on the Xeon Phi and the steps we took to optimize its performance on the device. Using performance analysis tools to pinpoint bottlenecks in the code, we achieved a speedup of 2.8x for the optimized code on the Xeon Phi relative to the original code on the host processors. The same optimizations also improved LAMMPS performance on the host, speeding up execution by 7x.

Majestic F

