The project is called the Aristotle Cloud Federation, and its principal investigator hopes it will serve as a proof of concept for other colleges and universities, which could either join the federation's cloud or build their own.
“We intend for this to be a model for academic institutions to share compute and data analysis resources over time,” says David Lifka, lead of the project and director of the Center for Advanced Computing at Cornell University in New York.
The news is exciting in the world of IaaS cloud computing because it represents a federated “community cloud” – a cloud built of shared resources for a specific subset of users.
The three universities initially involved are Cornell, the University at Buffalo and the University of California, Santa Barbara. Each has built and hosts its own private cloud based on Eucalyptus, software that is both an open source project and a commercial product now owned by Hewlett Packard Enterprise.
“Let’s say a researcher has been using one of these local clouds, but they have a really big simulation,” explains Lifka, who built Cornell’s Red Cloud. “They could use resources from across all three sites, or two sites.” Workloads and applications can be moved across any of the sites because they’re all running Eucalyptus.
A key feature of the Eucalyptus software is its strong compatibility with Amazon Web Services’ IaaS public cloud, so there could be opportunities to burst workloads into AWS if needed.
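For readers who want a concrete picture of what that portability looks like, here is a rough sketch. Because Eucalyptus exposes an EC2-compatible API, the same client code can target any federation site, or AWS itself, just by switching the endpoint. The site names and URLs below are invented for illustration; they are not Aristotle's real endpoints.

```python
# Hypothetical endpoint map for an Aristotle-style federation. A value of
# None means "use boto3's default public AWS endpoint" (the burst target).
SITE_ENDPOINTS = {
    "cornell": "https://ec2.redcloud.example.edu:8773/",   # assumed URL
    "buffalo": "https://ec2.ccr.example.edu:8773/",        # assumed URL
    "ucsb":    "https://ec2.aristotle.example.edu:8773/",  # assumed URL
    "aws":     None,
}

def client_config(site: str) -> dict:
    """Return keyword arguments for an EC2-compatible client at `site`."""
    if site not in SITE_ENDPOINTS:
        raise ValueError(f"unknown site: {site}")
    cfg = {"service_name": "ec2", "region_name": "us-east-1"}
    endpoint = SITE_ENDPOINTS[site]
    if endpoint is not None:
        cfg["endpoint_url"] = endpoint  # point the client at a private cloud
    return cfg

# With boto3 installed, launching on any site would then look like:
#   import boto3
#   ec2 = boto3.client(**client_config("buffalo"))
#   ec2.run_instances(ImageId="emi-12345678", MinCount=1, MaxCount=1)
```

The point of the sketch is that only the endpoint changes; the workload itself, and the code that launches it, stay the same whether the job lands on a campus cloud or bursts to AWS.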
One important part of the project is a system for tracking the use of compute and storage resources at each of the three sites. The University at Buffalo has a program named XD Metrics on Demand (XDMoD), while UCSB has a system that measures queuing of workloads. Researchers will collect empirical operational data to study overall cloud performance and individual workloads.
“Efficient use of federated clouds requires the ability to make predictions about where a workload will run best,” said Rich Wolski, professor of computer science at UCSB, and former Chief Technology Officer of Eucalyptus Systems, the company. “Using data and cloud-embedded performance monitors, (our system) will make it possible to predict the effects of federated work-sharing policies on user experience, both in the (on-site) clouds and in the Amazon Web Services cloud.”
The overall goal of creating Aristotle is to give scientists access to the compute and storage resources they need to get their research done as quickly as possible. One of the first use cases could be helping to store and process astronomy data. No one site has enough storage to hold all of the data, so researchers are working to create data analysis tools that can run at each of the sites. Another researcher at Cornell plans to use Aristotle to run simulations of drought conditions in various cities and towns, localized to each location.
Lifka says he doesn't expect Aristotle to replace the supercomputers that some colleges and universities use for crunching massive data sets. Those supercomputers are often in high demand, though, so researchers may have to wait up to a week to use them. In a cloud environment, resources can be spun up on demand. A batch-processing job may run longer in the cloud than on a supercomputer, but it can still finish sooner if researchers don't have to wait in a queue for it to start. Lifka says the metrics the Aristotle team is collecting will be critical in determining exactly what the best use cases for Aristotle are.
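The arithmetic behind that point is simple enough to sketch. The hours below are made up, but they show how a slower on-demand run can beat a faster run stuck behind a week-long queue:

```python
# Turnaround = time in the queue + time actually computing (hours).
# Figures are illustrative, not measurements from Aristotle.

supercomputer = {"queue_wait": 7 * 24, "runtime": 2}   # a week in line, 2 h run
cloud         = {"queue_wait": 0,      "runtime": 12}  # on demand, 6x slower

def turnaround(job: dict) -> int:
    return job["queue_wait"] + job["runtime"]

print(turnaround(supercomputer))  # -> 170 hours
print(turnaround(cloud))          # -> 12 hours
```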