Top distributed computing projects still hard at work fighting the world's worst health issues

10.03.2015
This past fall saw the worst Ebola outbreak on record ravage West Africa, and while medical researchers are trying to find a drug to treat or prevent the disease, the process is long and complicated. With a virus like Ebola, you can't simply snap your fingers and produce a drug. What's needed is a massive amount of trial and error to find chemical compounds that can bind with the proteins in the virus and inhibit replication. In labs, that can take years or decades.

Thanks to thousands of strangers, Ebola researchers are getting the help and computing power they need to shave years off the time it takes to find new drugs.

Distributed computing is not a new concept, but as it is constituted today, it's an idea born of the Internet. Contributors download a small app that runs in the background and uses spare PC compute cycles to work on a piece of a larger research problem.

When you are running a PC and using it for Word, Outlook and browsing, you are using a pittance of the compute power in a modern CPU, maybe 5% total, and that's only in bursts. Distributed computing programs use the other 95%, or less if you specify, and if you need more compute power for work, the computing clients dial back their work and let you have the CPU power you need. If you leave the PC on when not using it, the application goes full out.
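
That throttling behavior is simple to picture in code. The sketch below is a hypothetical illustration of the idea, not the actual BOINC or World Community Grid client; fetch_work_unit(), crunch(), report_result(), and the thresholds are invented stand-ins.

```python
import os
import time

CPU_BUDGET = 0.95       # fraction of spare CPU the agent may consume (user-configurable)
IDLE_THRESHOLD = 0.20   # 1-minute load average below which the PC counts as idle

def machine_is_idle() -> bool:
    # os.getloadavg() is Unix-only; a Windows agent would need a different probe.
    one_minute_load, _, _ = os.getloadavg()
    return one_minute_load < IDLE_THRESHOLD

def run_agent(fetch_work_unit, crunch, report_result):
    # Crunch downloaded jobs while the machine is idle, and back off
    # the moment the owner needs the CPU again.
    while True:
        if machine_is_idle():
            unit = fetch_work_unit()                      # download a small, independent job
            result = crunch(unit, cpu_budget=CPU_BUDGET)  # do the science
            report_result(result)                         # upload the answer
        else:
            time.sleep(60)                                # the user needs the CPU; yield to them
```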

There is a wide variety of programs, and one of them is aimed at finding drugs to help stop Ebola. It's part of the World Community Grid (WCG), run by IBM and using software developed at the University of California at Berkeley.

The WCG has almost 700,000 members with three million devices signed up to crunch away on its projects, according to Dr. Viktors Berstis, architect and chief scientist for the WCG at IBM. All told, WCG is running nearly 30 research projects.

WCG uses software developed at Berkeley called BOINC, the Berkeley Open Infrastructure for Network Computing. Past distributed projects, and even current ones such as Folding@home, use their own client to do the work. BOINC is a National Science Foundation-funded project to create one distributed computing client that any project can use, sparing researchers the effort of reinventing the wheel and letting them focus on their research rather than the client.

The software simulates how a potential compound interacts with a target, such as a protein the virus needs in order to survive. With distributed computing, WCG can run millions of compounds against any given target and cut the research time dramatically compared with doing the same work in a lab.

"It's applicable to anything that takes a lot of CPU time and can be split up into millions of independent-running jobs. The only difference between World Community Grid and a supercomputer is the supercomputer processors can talk to each other," said Dr. Berstis.

The Ebola hunt

In the case of Ebola, WCG has partnered with The Scripps Research Institute, a biomedical research organization in La Jolla, Calif., to launch Outsmart Ebola Together. The project will target multiple hemorrhagic fever viruses in the Ebola family, according to Dr. Erica Saphire, the researcher heading the program at Scripps.

The project will target one specific protein that the virus uses to attach to healthy human cells so it can replicate. That protein was chosen because, unlike other proteins in these viruses, its attachment site cannot change without making the virus nonviable. "The site they are targeting is the way the virus finds its way into the cell. So if it changes too much in any way, it's not viable. It's one of the few places [in the virus] that can't change. It has to keep that the same. So that makes an ideal drug target," said Dr. Saphire.

Dr. Saphire said the FightAIDS@Home group at Scripps often gets done in a few months what would have taken 10 years otherwise and wants to put the three million devices of WCG to work. "With this massive computational power, we're asking what can we understand that we've never understood before," she said. "It's the most fundamentally important thing my lab has ever done. It's also the biggest."

Scripps is a large, well-funded institute and could easily afford supercomputers, but Dr. Saphire said WCG is a better option. "It turns out that having hundreds of thousands of computers in parallel accelerates things more than having a supercomputer here," she said.

Success stories

Dr. Art Olson, a professor in the Department of Integrative Structural and Computational Biology at Scripps, has used WCG for the FightAIDS@Home project since 2005, and before that worked with a now-defunct company called United Devices, starting in 2000, on one of the first distributed biomedical computing projects.

The first papers published by Dr. Olson's group examined the mutation of HIV protease, the viral enzyme essential to replication: how the protease can be targeted to stop replication, and how it develops drug resistance. "That gave us a set of targets to try and find drugs that could be effective against that spanning set of mutants," said Dr. Olson.

Like Dr. Saphire, he prefers the massive number of CPUs available via WCG over an in-house supercomputer. "We have very good computing resources here, but we're not the only people who use the computing resources at Scripps. We can only get 300 CPUs at any given time, whereas on the World Community Grid we can get tens of thousands of CPUs to use at any given time. So it's a major boost. We would never even try to do the scope of the kinds of dockings we do using just our local institutional resources," he said.

There have been other WCG successes besides Scripps. Dr. Berstis cited simulations of carbon nanotubes, which showed that water flows through the tubes 10,000 times more efficiently than previously thought. That finding has prompted experiments to find less expensive ways of filtering or desalinating water than the very expensive reverse osmosis filters used today.

A recently disclosed project from the Help Fight Childhood Cancer group at WCG identified compounds to treat childhood neuroblastoma, a cancer of the nervous system. Working with a research group in Japan, the team found seven drug candidates with a 95% likelihood of curing the cancer.

Finally, there was a cancer project that used machine optical scanning to examine images of biopsies. Eventually an algorithm was developed to analyze those images and determine whether cancer cells are present. "They are as good as humans now so it will help identify if there is cancer present or not much faster," said Dr. Berstis.

DIY distributed computing

The concept of using idle CPU cycles instead of investing millions in supercomputers is not lost on IT departments or companies with big processing tasks. Anecdotal stories of firms setting up their own internal distributed computing networks have been around for several years, although most firms will not discuss them out of concern for giving away a competitive advantage.

CDx Diagnostics, which develops equipment to detect cancer at its earliest stage, was willing to discuss its efforts. The company built a data center dedicated to processing and also harnesses idle CPU cycles on employee computers, creating an internal grid computing environment that analyzes digitized microscope-slide data for cellular changes that would indicate cancerous and precancerous cells.

CDx needed an inexpensive system that could process the 590GB of image data generated per pathology slide (patients can have multiple slides) in less than four minutes. On a single PC, the analysis would normally take four hours. And it's still no replacement for human eyes: slides are still reviewed by people, but the grid can flag anomalies, or note that there are none.
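
Some back-of-the-envelope arithmetic, implied by those figures rather than supplied by CDx, shows why a grid is needed: cutting roughly four hours on one PC down to under four minutes requires about a 60-fold speedup, which means on the order of 60 or more fully utilized machines, plus the bandwidth to keep 590GB of slide data flowing to them.

```python
# Rough speedup arithmetic from the figures above (illustrative only).
single_pc_seconds = 4 * 60 * 60   # ~4 hours to analyze one slide on one PC
deadline_seconds = 4 * 60         # target: under 4 minutes
machines_needed = single_pc_seconds / deadline_seconds
print(machines_needed)            # 60.0, before scheduling and I/O overhead
```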

Employees leave their computers on when they go home at night. The client PCs report their computing capabilities to the servers, and the servers decide which computers get which workloads. Faster computers get higher priority for the next task, said Robert Tjon, vice president of engineering and developer of the grid.
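
That capability-aware scheduling can be sketched as a simple priority queue: clients register with a benchmark score, and the fastest idle machine is handed the next slide. This is a hypothetical illustration of the approach Tjon describes, not CDx's code; the machine names and scores are invented.

```python
import heapq

class GridScheduler:
    def __init__(self):
        self._idle = []   # max-heap of (negated benchmark score, client name)

    def register(self, name: str, benchmark_score: float) -> None:
        # Called when a client PC reports its computing capabilities.
        heapq.heappush(self._idle, (-benchmark_score, name))

    def assign(self, task: str) -> tuple[str, str]:
        # The fastest idle machine gets the next workload.
        _, name = heapq.heappop(self._idle)
        return name, task

# Example: the faster workstation is handed the next slide before the older laptop.
scheduler = GridScheduler()
scheduler.register("workstation-12", benchmark_score=3.8)
scheduler.register("laptop-03", benchmark_score=1.9)
print(scheduler.assign("slide-0042"))   # ('workstation-12', 'slide-0042')
```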

Tjon said the best price-performance comes from commodity hardware, which is robust, highly reconfigurable and scalable, as long as a centralized system manages the resources efficiently so the computers are constantly fed data.

"One hundred percent utilization of the computer resources will keep the cost of the overall grid down in terms of space, heat, power, and manpower to keep the system up. We also like the fact that Intel invests billions to make the computer cheaper and faster and we only have to pay the price of a regular, popular consumer item," he said.

So it could be that your idle PC may one day save your life.

(www.itworld.com)

Andy Patrizio