In-Memory Technology Speeds Up Data Analytics

26.06.2013
There's fast, and then there's mind-numbingly fast.

Just ask AdJuggler, an Alexandria, Va.-based company that runs a Software-as-a-Service ad-serving platform. The company's ad serving business was always fast-paced, but the advent of real-time bidding has taken speed to a new level.

In real-time bidding, a publisher sends an ad impression to an online exchange that puts out a request for bids. When a user arrives on a particular Web page, advertisers tender bids, and the highest bidder's ad is placed on the page. The digital ad sale happens quickly; according to Ben Lindquist, vice president of technology at AdJuggler, a buyer in a real-time bidding scenario has a 100 millisecond window to bid on a given impression.
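A rough sketch of what that constraint looks like in code, assuming a Java-based bidder (the class, the thread pool size and the computeBid stub are hypothetical, not AdJuggler's implementation): the bid calculation is simply abandoned if it cannot finish inside the window.

import java.util.concurrent.*;

public class BidWindow {
    // The 100 ms figure comes from the article; everything else here is illustrative.
    private static final long BID_DEADLINE_MS = 100;
    private static final ExecutorService pool = Executors.newFixedThreadPool(8);

    // Placeholder for a real bid calculation against in-memory campaign and audience data.
    static double computeBid(String impressionId) {
        return 0.42;
    }

    static double bidOrPass(String impressionId) {
        Future<Double> bid = pool.submit(() -> computeBid(impressionId));
        try {
            // If no answer arrives inside the window, pass on the impression.
            return bid.get(BID_DEADLINE_MS, TimeUnit.MILLISECONDS);
        } catch (TimeoutException | InterruptedException | ExecutionException e) {
            bid.cancel(true);
            return 0.0; // no bid
        }
    }

    public static void main(String[] args) {
        System.out.println(bidOrPass("imp-123"));
        pool.shutdown();
    }
}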

It's that kind of speed requirement that led AdJuggler to purchase an in-memory data management product, Terracotta's BigMemory. The in-memory technology is set to debut in a limited use case later this month as part of AdJuggler's next-generation ad-serving platform.

Such deployments move the database from its traditional home in disk storage, placing it instead in memory. This approach boosts database query response times, since the trip from memory to processing core is much faster than searching for data housed on disk.

Mike Allen, vice president of product management at Terracotta, a wholly owned subsidiary of Software AG, says that, as a rule of thumb, memory is 1,000 times faster than disk. For AdJuggler, with its ultra-narrow bidding window and high transaction volume, that speed difference tipped the balance in favor of in-memory.

"Bidders can't afford to spend a significant portion of that window doing disk seeks," Lindquist says. "That is what it has come down to."

Moving to In-memory From Disk Means Less Database Tuning

AdJuggler's current platform, which matches ads to places on Web pages at a clip of 20,000 transactions per second, includes a MySQL database. The database houses configuration data on customers' campaigns, essentially ad placements on various websites. Lindquist says all that configuration data will move from the disk-based MySQL data store to Terracotta's in-memory technology. AdJuggler will also add multiple terabytes of anonymized audience data.

"We will end up with a record in there for every user who goes to a piece of content that can be served an ad through our system," Lindquist says, adding that the user data will amount to hundreds of millions of records.

That data store will further multiply since AdJuggler customers will be permitted to place their own, proprietary audience data into the Terracotta data management system. As for throughput, the new platform will be able to grow to support at least 1 million transactions per second, Lindquist notes.

The in-memory shift expands the possibilities for a database involved in real-time decision making, Lindquist says. Previously, getting a database to perform at the now-required level would call for a significant amount of tuning: configuring memory and carving out a data cache in RAM to improve performance.

A cache hit is quicker than going back to disk for data, but a cache typically represents a small portion of the data stored in a database. Lindquist notes that MySQL performance depends on having the right piece of data in memory at the right time.

Why not put all the crucial data there? "We decided it's all got to be in memory," Lindquist explains, "so you don't have to worry about the tremendous amounts of database tuning you typically would have to do." AdJuggler will run a Terracotta cluster, using the distributed version of the company's BigMemory data management software.
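A minimal sketch of that shift, using a plain ConcurrentHashMap as a stand-in for a distributed store such as BigMemory (the CampaignStore class and its record shape are hypothetical): once every record is loaded, a lookup is a pure memory access, with no cache-miss path back to disk.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for an in-memory configuration store: every campaign
// record lives in RAM, so a lookup never falls back to a disk-backed database.
public class CampaignStore {
    record Campaign(String id, String placement, double maxBid) {}

    private final Map<String, Campaign> byId = new ConcurrentHashMap<>();

    public void load(Iterable<Campaign> campaigns) {
        // Bulk-load all configuration data up front, as the article describes.
        for (Campaign c : campaigns) {
            byId.put(c.id(), c);
        }
    }

    public Campaign lookup(String campaignId) {
        // Pure memory access; no disk seek on a cache miss.
        return byId.get(campaignId);
    }
}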

In-memory Delivers Better Fraud Detection to USPS

The United States Postal Service, meanwhile, has made a similar decision when it comes to such tasks as detecting fraud and improving mail routing. USPS uses Silicon Graphics International hardware and Oracle TimesTen in-memory database software. The organization uses many instances of TimesTen, rather than a single image, to increase the parallelism of the data loads.

Moving to in-memory technology eliminates significant disk management software overhead, storage fabric latency and the limitation of disk spindle speeds, according to the Postal Service.

"The main business benefits of in-memory databases are in their ability to provide very quick, near real-time, answers while looking across vast amounts of data," says Dan Houston Jr., manager of data management services at USPS, and Scot Atkins, product information specialist at USPS, in an email.

In-memory helps the Postal Service flag fraud and quickly determine whether a mail piece has correct postage. The spokesmen cite in-memory databases as one of the technologies that make it possible for the Postal Service to take on such tasks as dynamic routing, same-day delivery and predictive routing.

"In-memory databases are allowing us to do things in real-time that would have taken hours or days in the past," Houston and Atkins say.

Interest Moving Beyond Finance, Telco Industries

In-memory technology isn't particularly new. Roger Gaskell, chief technology officer at Kognitio, an in-memory analytics platform vendor, says the first system was built in the late 1980s for the London brokerage firm Savory Milln. The broker wanted to be able to calculate trading exposure risk at virtually any point in time, Gaskell notes.

Before the in-memory system, the exposure calculations were done overnight. In-memory "was the only way we could get at the data fast enough to allow us to bring enough CPU power to bear to meet the use case criteria," Gaskell says.

What's new, nearly a quarter century later, is the sharp rise in interest now surrounding in-memory technology. "The last two years have been the biggest change for us. In-memory has become really hot and...in terms of applications, it has just exploded," Gaskell says.

Financial services and telecommunication firms had been Kognitio's bread and butter, but now in-memory demand is surfacing in markets such as retail, he notes.

Terracotta's Allen says he has seen interest in in-memory in financial services, logistics, ecommerce, government and healthcare, among other sectors. "That lightbulb is going off everywhere. People are saying, 'How do I leverage this?'" he says.

As demand grows, the number of vendors offering in-memory technology has also increased. In May, for example, Teradata introduced its Intelligent Memory offering, which it says lets customers exploit memory through capabilities built into the company's data warehouses.

"There's no need for a separate appliance," said Alan Greenspan, a spokesman for Teradata. The technology tracks the temperate of data, he adds, moving hot, frequently used data into memory.

Processing, Indexing Challenges With In-memory Technology

In-memory databases have the potential to produce dramatic results when organizations need to crunch a lot of data in short order. However, the field is not without a few wrinkles. Misconceptions regarding the nature of in-memory technology are among the issues.

Industry executives say an in-memory deployment calls for more than dumping data in memory. They say the data management software must be designed to work with memory.

"It's not just about putting all the data into memory," said Chris Hallenbeck, vice president of data warehouse solutions and the HANA platform at SAP. "It's about rewriting the entire database from the ground up to use memory as its primary method of storage, as opposed to disk." (The SAP HANA real-time platform includes an in-memory database.)

Another issue: The speed of in-memory technology places heavier demands on processors. As a consequence, organizations must parallelize the code that will access the data and deploy load balancing across the cluster, Lindquist says. "Load balancing becomes a critical piece of your ability to take advantage of the in-memory database."

AdJuggler has created a pull-based load balancing system, using commodity hardware and in-house developed software. Each instance of AdJuggler's transaction processing engine will pull work from the load balancing component, complete the task and then go back for more work, Lindquist says. The system brings up more instances if additional capacity is needed.
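A minimal sketch of that pull model, assuming a shared Java work queue (the queue capacity, task type and names are assumptions, not AdJuggler's software): each worker takes a task off the queue, completes it and goes back for more, so a slow instance simply pulls less work.

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class PullBalancer {
    // Shared work queue: the "load balancing component" in this sketch.
    private static final BlockingQueue<String> work = new LinkedBlockingQueue<>(10_000);

    public static void main(String[] args) throws InterruptedException {
        int workers = Runtime.getRuntime().availableProcessors();
        for (int i = 0; i < workers; i++) {
            int id = i;
            Thread t = new Thread(() -> {
                try {
                    while (true) {
                        String task = work.take();   // each instance pulls its next task
                        process(id, task);           // ...completes it, then goes back for more
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            t.setDaemon(true);
            t.start();
        }

        // Producer side: impressions arriving from the exchange.
        for (int i = 0; i < 100; i++) {
            work.put("impression-" + i);
        }
        Thread.sleep(500);                           // let the workers drain the queue in this demo
    }

    static void process(int workerId, String task) {
        System.out.println("worker " + workerId + " handled " + task);
    }
}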

Organizations with in-memory products must also take care when it comes to database indexes. Businesses using a traditional database can afford to devote a large amount of disk space to indexes. But in-memory databases call for greater precision.

"If you're using the in-memory store like a database-with searches-you have to index for performance," Lindquist says. "You have to be more precise with it, because RAM is more expensive and limited."

The volatile nature of RAM presents another issue for in-memory adopters. Should a system fail, the data must be reloaded. This can prove time-consuming.

At USPS, Houston and Atkins say data protection is one of the greatest challenges of using in-memory databases. USPS currently performs all its heavy processing in-memory, then feeds the relevant results back to a relational database. The Postal Service also maintains a checkpoint file of the transactions running in-memory, so some limited recovery may be performed should an outage occur.
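A stripped-down sketch of that checkpoint idea in Java (the tab-separated file format and class names are assumptions, not the Postal Service's design): every write is appended to a log, and on restart the log is replayed to rebuild the volatile in-memory state.

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative checkpointing: writes are appended to a log file, and on
// restart the log is replayed to repopulate the in-memory store.
public class CheckpointedStore {
    private final Map<String, String> data = new ConcurrentHashMap<>();
    private final Path checkpoint;

    public CheckpointedStore(Path checkpoint) throws IOException {
        this.checkpoint = checkpoint;
        if (Files.exists(checkpoint)) {
            // Recovery path: replay the checkpoint file into memory.
            for (String line : Files.readAllLines(checkpoint, StandardCharsets.UTF_8)) {
                String[] kv = line.split("\t", 2);
                if (kv.length == 2) data.put(kv[0], kv[1]);
            }
        }
    }

    public void put(String key, String value) throws IOException {
        data.put(key, value);
        Files.writeString(checkpoint, key + "\t" + value + "\n",
                StandardCharsets.UTF_8, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    public String get(String key) {
        return data.get(key);
    }
}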

"We have reasonable assurances that the most important data to us is protected," the officials say.

The task of recovering an in-memory system from the checkpoint file, however, takes some doing.

"As you can imagine, reading back in 16TB can take considerable time from traditional storage media," Houston and Atkins note, referring to the size of the Postal Service's in-memory data store. "To address this issue, we are currently exploring adding flash card technology closer to processing in hopes of changing our reload time from hours to minutes."

John Moore has written on business and technology topics for more than 20 years. His areas of focus include mobile app development, health IT, cloud computing, government IT and distribution channels.

(www.cio.com)

John Moore