XSEDE15 has ended
All dates, times and locations of tech program events or other scheduled plans are subject to change. Please check back regularly to ensure you view the most up-to-date version of the schedule.
Tuesday, July 28
 

10:00am CDT

Publishing and Consuming GLUE v2.0 Resource Information in XSEDE
XSEDE users, science gateways, and services need a variety of accurate information about XSEDE resources so that they can use those resources effectively. They need information to decide which resources to use, to track their usage of resources, and to provide a variety of services to their users. To provide this information, XSEDE is deploying new software and services to gather and publish static and dynamic resource information. This paper describes the software XSEDE uses to gather resource information, the GLUE v2.0 schema used to format that information, and the messaging system used to distribute it quickly, and it provides several examples of using this resource information.


Tuesday July 28, 2015 10:00am - 10:30am CDT
Majestic F

10:30am CDT

Using Data Science to Understand Tape-Based Archive Workloads
Data storage needs continue to grow in most fields, and the cost per byte for tape remains lower than the cost for disk, making tape storage a good candidate for cost-effective long-term storage. However, the workloads suitable for tape archives differ from those for disk file systems, and archives must handle internally generated workloads that can be more demanding than those generated by end users (e.g., migration of data from an old tape technology to a new one). To better understand the variegated workloads, we have followed the first steps in the data science methodology. For anyone considering the use or deployment of a tape-based data archive or for anyone interested in details of data archives in the context of data science, this paper describes key aspects of data archive workloads.


Tuesday July 28, 2015 10:30am - 11:00am CDT
Majestic F

11:00am CDT

Storage Utilization in the Long Tail of Science
The expansion of computation into non-traditional domain sciences has increased the demand for research cyberinfrastructure suited to small- and mid-scale job sizes. The computational aspects of these emerging communities are coming into focus and being addressed through the deployment of several new XSEDE resources that feature easy on-ramps, customizable software environments through virtualization, and interconnects optimized for jobs that only use hundreds or thousands of cores; however, the data storage requirements of these emerging communities remain much less well characterized.
 
To this end, we examined the distribution of file sizes on two of the Lustre file systems within the Data Oasis storage system at the San Diego Supercomputer Center. We found that there is a very strong preference for small files among SDSC's users, with 90% of all files being less than 2 MB in size. Furthermore, 50% of all file system capacity is consumed by files under 2 GB in size, and these distributions are consistent on both scratch and projects storage file systems. Because parallel file systems like Lustre and GPFS are optimized for parallel I/O to large, wide-stripe files, these findings suggest that parallel file systems may not be the most suitable storage solutions when designing cyberinfrastructure to meet the needs of emerging communities.
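
The two statistics quoted above (the share of files below a size threshold, and the share of capacity those files consume) can be illustrated with a minimal Python sketch. This is not the authors' analysis tooling; the function name `size_distribution` and the sample sizes are hypothetical, and the sketch assumes the file sizes have already been collected into a list of byte counts.

```python
from bisect import bisect_left
from itertools import accumulate


def size_distribution(sizes, threshold):
    """Return (fraction of files, fraction of bytes) strictly below `threshold`.

    `sizes` is an iterable of file sizes in bytes (hypothetical input; how the
    sizes are gathered from the file system is outside this sketch).
    """
    ordered = sorted(sizes)
    if not ordered:
        return 0.0, 0.0
    cumulative = list(accumulate(ordered))  # running total of bytes, smallest files first
    total_bytes = cumulative[-1]
    k = bisect_left(ordered, threshold)     # number of files smaller than the threshold
    bytes_below = cumulative[k - 1] if k else 0
    byte_fraction = bytes_below / total_bytes if total_bytes else 0.0
    return k / len(ordered), byte_fraction


# Made-up sample: nine 1 kB files and one 91 kB file.
sample = [1_000] * 9 + [91_000]
file_fraction, byte_fraction = size_distribution(sample, threshold=2_000)
print(file_fraction, byte_fraction)
```

In this made-up sample most files fall below the threshold while holding only a small share of the bytes, which is the same kind of skew the abstract reports for Data Oasis.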


Tuesday July 28, 2015 11:00am - 11:30am CDT
Majestic F
 