XSEDE15 has ended
Tuesday, July 28 • 11:00am - 11:30am
Storage Utilization in the Long Tail of Science


The expansion of computation into non-traditional domain sciences has driven a growing demand for research cyberinfrastructure suited to small- and mid-scale jobs. The computational needs of these emerging communities are coming into focus and are being addressed through the deployment of several new XSEDE resources that feature easy on-ramps, customizable software environments through virtualization, and interconnects optimized for jobs that use only hundreds or thousands of cores; however, the data storage requirements of these communities remain far less well characterized.
To this end, we examined the distribution of file sizes on two of the Lustre file systems within the Data Oasis storage system at the San Diego Supercomputer Center (SDSC). We found a very strong preference for small files among SDSC's users: 90% of all files are smaller than 2 MB. Furthermore, 50% of all file system capacity is consumed by files smaller than 2 GB, and these distributions are consistent across both the scratch and projects storage file systems. Because parallel file systems such as Lustre and GPFS are optimized for parallel I/O to large, wide-striped files, these findings suggest that parallel file systems may not be the most suitable storage solution when designing cyberinfrastructure to meet the needs of emerging communities.
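The two headline metrics above (share of files below a size threshold, and share of total capacity held by files below a threshold) can be illustrated with a short script. This is a minimal sketch, not the authors' tooling: it assumes a POSIX directory tree walked with Python's `os.walk`, counts only regular files (symlinks are not followed), uses apparent size (`st_size`) rather than allocated blocks, and assumes binary units (1 MB = 2^20 bytes). The function names are hypothetical.

```python
import os
import stat
from typing import List

# Assumption: binary units; the abstract does not say whether MB/GB are decimal or binary.
MB = 1024 ** 2
GB = 1024 ** 3


def collect_file_sizes(root: str) -> List[int]:
    """Return apparent sizes (bytes) of all regular files under root.

    Symlinks are neither followed nor counted; unreadable entries are skipped.
    """
    sizes: List[int] = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or is inaccessible; skip it
            if stat.S_ISREG(st.st_mode):
                sizes.append(st.st_size)
    return sizes


def fraction_of_files_below(sizes: List[int], threshold: int) -> float:
    """Fraction of files (by count) strictly smaller than threshold bytes."""
    if not sizes:
        return 0.0
    return sum(1 for s in sizes if s < threshold) / len(sizes)


def fraction_of_capacity_below(sizes: List[int], threshold: int) -> float:
    """Fraction of total bytes held by files strictly smaller than threshold."""
    total = sum(sizes)
    if total == 0:
        return 0.0
    return sum(s for s in sizes if s < threshold) / total
```

For example, `sizes = collect_file_sizes("/path/to/scratch")` (a hypothetical path) followed by `fraction_of_files_below(sizes, 2 * MB)` and `fraction_of_capacity_below(sizes, 2 * GB)` yields the two kinds of figures quoted in the abstract. On a production Lustre file system with many millions of files, a serial walk like this would be slow and would load the metadata server; a real survey would more likely rely on file-system-native scanning tools.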

Tuesday July 28, 2015 11:00am - 11:30am CDT
Majestic F
