Aiming for SEC's big data project, Sungard and Google bet on the cloud
The work is being done to compete for a U.S. Securities and Exchange Commission contract to build what is called the Consolidated Audit Trail (CAT). The SEC's goal is to build a system that provides more transparency into financial markets, a response, in part, to the computer-driven 2010 "flash crash" that briefly cratered U.S. stock prices.
"The CAT is a huge undertaking," said Neil Palmer, Sungard's chief technology officer for its consulting services practice. "It is the biggest big data problem in the financial industry today."
Palmer described the prototype Friday at the Google Next user conference in New York. Sungard, a provider of financial software and services, is one of six finalists for the work, and has partnered with Google for technology infrastructure.
The flexibility of cloud computing is what allows Sungard to pursue such an ambitious job, Palmer told a group of reporters after the keynote.
With an in-house build, "there are just too many unknowns," he said, referring to the intense hardware and operational demands that would come with running this work on an on-premises system.
The system will cost anywhere from $350 million to $1 billion to build, the SEC has estimated.
Once operational, CAT will generate a tremendous amount of data, Palmer said. The system must record every quote and every trade from every financial company participating in the public U.S. markets. The companies must submit their data on a daily basis, and the system must keep this data for six years.
Each day, the system will ingest about 50 terabytes of data, comprising about 100 billion events. Over the six-year window in which records are actively kept, the stored data will amount to about 30 petabytes, Sungard estimated.
All this data must be validated, indexed, and posted within four hours.
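As a rough illustration of what those figures imply, the short Python sketch below works out the average event size and the sustained rate needed to handle a full day's volume inside the four-hour window. It assumes decimal units (1 terabyte = 10^12 bytes), a perfectly even processing rate, and that the whole day's data falls inside a single window; the function and constant names are illustrative and do not come from Sungard or the SEC.

```python
# Figures quoted in the article; decimal units are an assumption.
DAILY_BYTES = 50 * 10**12       # about 50 terabytes per day
DAILY_EVENTS = 100 * 10**9      # about 100 billion events per day
WINDOW_SECONDS = 4 * 60 * 60    # four-hour processing window


def average_event_size(daily_bytes: int, daily_events: int) -> float:
    """Average bytes per event implied by the daily totals."""
    return daily_bytes / daily_events


def sustained_rate(total: int, window_seconds: int) -> float:
    """Average per-second rate needed to finish `total` within the window."""
    return total / window_seconds


if __name__ == "__main__":
    size = average_event_size(DAILY_BYTES, DAILY_EVENTS)
    events_per_sec = sustained_rate(DAILY_EVENTS, WINDOW_SECONDS)
    gb_per_sec = sustained_rate(DAILY_BYTES, WINDOW_SECONDS) / 10**9
    print(f"average event size: {size:.0f} bytes")
    print(f"events per second: {events_per_sec:,.0f}")
    print(f"gigabytes per second: {gb_per_sec:.2f}")
```

Under those assumptions, the daily totals work out to roughly 500 bytes per event, and a four-hour window calls for close to 7 million events, or about 3.5 gigabytes, every second.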
Tools must also be available to query all this data. "There is no point storing that much data and not being able to generate any actionable information from it," Palmer said.
Beyond the SEC, cloud-accessible financial data could also be of immense value to the financial firms themselves, Palmer noted. A comprehensive, centralized copy of all financial trading information would reduce the need for such firms to store that data in-house. They could test algorithms against the historical market data to see how well they can predict upcoming changes.
Sungard assembled the infrastructure for the prototype using a variety of Google Cloud Platform components.
The prototype uses Google Cloud Storage to hold the data and Google Bigtable to structure the information. Google's Cloud Dataflow service can validate the data. Google's BigQuery can be used to publish the data and provide a way for users to analyze it, either directly or through third-party software such as Microsoft Excel or various business intelligence (BI) packages.
In the first round of tests, the prototype was able to process 10 billion events an hour, or about three gigabytes of data per second.
Best of all, the prototype was built in six weeks.
"There is no way we could have done that, even with similar technologies, if we had to stand up our own infrastructure," Palmer said.
The team still has work to do, Palmer said. The system must be able to ingest data at four times the speed of the current prototype.
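A back-of-the-envelope Python sketch shows why the gap matters. It uses only the figures quoted in this article (about 100 billion events a day, 10 billion events an hour in the first test round, a four-hour processing window) and assumes a constant throughput; the names are illustrative, not Sungard's.

```python
DAILY_EVENTS = 100 * 10**9               # about 100 billion events per day
PROTOTYPE_EVENTS_PER_HOUR = 10 * 10**9   # first-round test result
TARGET_MULTIPLIER = 4                    # "four times the speed"
WINDOW_HOURS = 4                         # processing deadline


def hours_to_process(events: int, events_per_hour: int) -> float:
    """Hours needed to work through `events` at a constant hourly rate."""
    return events / events_per_hour


if __name__ == "__main__":
    current = hours_to_process(DAILY_EVENTS, PROTOTYPE_EVENTS_PER_HOUR)
    target = hours_to_process(
        DAILY_EVENTS, PROTOTYPE_EVENTS_PER_HOUR * TARGET_MULTIPLIER
    )
    print(f"prototype rate: {current:.1f} hours for one day's events")
    print(f"target rate: {target:.1f} hours for one day's events")
    print(f"target fits the window: {target <= WINDOW_HOURS}")
```

At the prototype's rate, one day's events would take about 10 hours to process. At four times that rate, the same volume would take about 2.5 hours, which fits inside the four-hour window with some headroom.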
If Sungard wins the contract, or if another team with a cloud-based approach wins, it will represent a significant step towards the acceptance of the cloud computing model in the U.S. financial industry, which to date has been reluctant to embrace the approach.
Sungard's account was in tune with the overall theme of the keynote, which was how cloud computing can give companies the resources to forge into new markets that they otherwise wouldn't have had access to.
Carl Schachter, Google vice president of the cloud platform, said that companies like Uber and Airbnb have used cloud computing to disrupt traditional markets. "Markets that have been previously thought established are now re-inventable," he said.
The other finalists for the CAT contract are EPAM Systems; Thesys; the Financial Industry Regulatory Authority (FINRA); a team comprising AxiomSL and Computer Sciences Corporation; and a consortium of companies including Hewlett-Packard and Booz Allen.
The SEC has not offered a date when it expects to award the contract.
Joab Jackson covers enterprise software and general technology breaking news for The IDG News Service.