How big data is changing the database landscape for good
A whole crop of new contenders is now vying for a piece of the enterprise database market, and while their approaches are diverse, most share one thing: a razor-sharp focus on big data.
Much of what's driving this new proliferation of alternatives is what's commonly referred to as the "three V's" underlying big data: volume, velocity and variety.
Essentially, data today is coming at us faster and in greater volumes than ever before; it's also more diverse. It's a new data world, in other words, and traditional relational database management systems weren't really designed for it.
"Basically, they cannot scale to big, or fast, or diverse data," said Gregory Piatetsky-Shapiro, president of KDnuggets, an analytics and data-science consultancy.
That's what Harte Hanks recently found. Up until 2013 or so, the marketing services agency was using a combination of different databases including Microsoft SQL Server and Oracle Real Application Clusters (RAC).
"We were noticing that with the growth of data over time, our systems couldn't process the information fast enough," said Sean Iannuzzi, the company's head of technology and development. "If you keep buying servers, you can only keep going so far. We wanted to make sure we had a platform that could scale outward."
Minimizing disruption was a key goal, Iannuzzi said, so "we couldn't just switch to Hadoop."
Instead, the company chose Splice Machine, which essentially puts a full SQL database on top of the popular Hadoop big-data platform and allows existing applications to connect with it, he said.
Harte Hanks is now in the early stages of implementation, but it's already seeing benefits, Iannuzzi said, including improved fault tolerance, high availability, redundancy, stability and "performance gains overall."
There's a sort of perfect storm propelling the emergence of new database technologies, said Carl Olofson, a research vice president with IDC.
First, "the equipment we're using is much more capable of handling large data collections quickly and flexibly than in the past," Olofson noted.
In the old days, such collections "pretty much had to be put on spinning disk" and the data had to be structured in a particular way, he explained.
Now there's 64-bit addressability, making it possible to set up larger memory spaces, as well as much faster networks and the ability to string multiple computers together to act as single, large databases.
"Those things have opened up possibilities that weren't available before," Olofson said.
Workloads, meanwhile, have also changed. Whereas 10 years ago websites were largely static, for example, today we have live Web service environments and interactive shopping experiences. That, in turn, demands new levels of scalability, he said.
Companies are using data in new ways as well. Whereas traditionally most of our focus was on processing transactions -- recording how much we sold, for instance, and storing that data in a place where it could be analyzed -- today we're doing more.
Application state management is one example.
Say you're playing an online game. The technology must record each session you have with the system and connect them together to present a continuous experience, even if you switch devices or the various moves you make are processed by different servers, Olofson explained.
That data must be made persistent so that companies can analyze questions such as "why no one ever crosses the crystal room," for example. In an online shopping context, a counterpart might be why more people aren't buying a particular brand of shoe after they click on the color choices.
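As a rough illustration of the application state management described above, here is a minimal Python sketch. Everything in it is hypothetical: the `SessionStore` class, its method names and the room names are invented for the example, and an in-memory dictionary stands in for what would be a persistent, distributed store in a real system.

```python
from collections import defaultdict


class SessionStore:
    """Hypothetical in-memory stand-in for a persistent state store."""

    def __init__(self):
        # One ordered event list per user, regardless of device or server.
        self._events = defaultdict(list)

    def record(self, user_id, device, server, room):
        # Each move is appended to the user's history, so sessions from
        # different devices and servers are stitched into one timeline.
        self._events[user_id].append(
            {"device": device, "server": server, "room": room}
        )

    def history(self, user_id):
        # The continuous experience: every room the user visited, in order.
        return [event["room"] for event in self._events[user_id]]

    def rooms_never_entered(self, all_rooms):
        # The analysis question: which rooms has no player ever entered?
        visited = {
            event["room"]
            for events in self._events.values()
            for event in events
        }
        return sorted(set(all_rooms) - visited)


store = SessionStore()
store.record("alice", device="phone", server="s1", room="lobby")
store.record("alice", device="laptop", server="s2", room="armory")
store.record("bob", device="tablet", server="s3", room="lobby")

print(store.history("alice"))
print(store.rooms_never_entered(["lobby", "armory", "crystal room"]))
```

Because every move is kept rather than overwritten, the same records serve both the player's continuous session and the later analysis.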
"Before, we weren't trying to solve those problems, or -- if we were -- we were trying to squeeze them into a box that didn't quite fit," Olofson said.
Hadoop is a heavyweight among today's new contenders. Though it's not a database per se, it's grown to fill a key role for companies tackling big data. Essentially, Hadoop is a data-centric platform for running highly parallelized applications, and it's very scalable.
By allowing companies to scale "out" in distributed fashion rather than scaling "up" via additional expensive servers, "it makes it possible to very cheaply put together a large data collection and then see what you've got," Olofson said.
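The idea of scaling "out" can be sketched in a few lines of Python. This is only an illustration of spreading records across several machines by hashing a key; it is not how Hadoop itself places data (its file system distributes blocks of files across nodes), and the function names `node_for` and `partition` are made up for the example.

```python
import hashlib


def node_for(key, num_nodes):
    # A stable hash, so the same key always lands on the same node.
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_nodes


def partition(keys, num_nodes):
    # One list per node; each key is assigned to exactly one of them.
    nodes = [[] for _ in range(num_nodes)]
    for key in keys:
        nodes[node_for(key, num_nodes)].append(key)
    return nodes


orders = ["order-1001", "order-1002", "order-1003", "order-1004"]
for index, keys in enumerate(partition(orders, num_nodes=3)):
    print(f"node {index}: {keys}")
```

Adding capacity in this model means adding nodes rather than buying a bigger server, although a plain modulo scheme like this one would reshuffle most keys whenever the node count changes.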
Other new RDBMS alternatives include the NoSQL family of offerings, among them MongoDB -- currently the fourth most popular database management system, according to DB-Engines -- and MarkLogic.
"Relational has been a great technology for 30 years, but it was built in a different era with different technological constraints and different market needs," said Joe Pasqua, MarkLogic's executive vice president for products.
Big data is not homogeneous, he said, yet many traditional technologies still require data to be homogeneous.
"Imagine the only program you had on your laptop was Excel," Pasqua said. "Imagine you want to keep track of a network of friends -- or you're writing a contract. Those don't fit into rows and columns."
Combining data sets can be particularly tricky.
"Relational says that before you bring all these data sets together, you have to decide how you're going to line up all the columns," he added. "We can take in any format or structure and start using it immediately."
NoSQL databases don't use a relational data model, and they typically have no SQL interface. Whereas many NoSQL stores compromise consistency in favor of speed and other factors, MarkLogic pitches its own offering as a more consistency-minded option tailored for enterprises.
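Pasqua's point about lining up columns can be illustrated with a small Python sketch. It does not use MarkLogic's or any other vendor's API; the sample records, the `COLUMNS` tuple and the `to_row` and `find` helpers are all hypothetical, and plain lists and dictionaries stand in for real databases.

```python
# Two data sets with different shapes (invented sample data).
crm_records = [{"name": "Ada", "email": "ada@example.com"}]
web_records = [{"user": "ada", "clicks": [{"page": "/shoes", "color": "red"}]}]

# Relational style: the columns are fixed up front, and every record
# must be forced into them before it can be stored.
COLUMNS = ("name", "email")


def to_row(record):
    # Fields that are not in COLUMNS are dropped; missing ones become None.
    return tuple(record.get(column) for column in COLUMNS)


rows = [to_row(record) for record in crm_records + web_records]

# Document style: records are stored as they arrive, nested fields included.
documents = crm_records + web_records


def find(docs, field):
    # Query by whichever fields a document happens to have.
    return [doc for doc in docs if field in doc]


print(rows)
print(find(documents, "clicks"))
```

In the relational-style rows, the web record collapses to empty values because its fields were never mapped to a column, while the document-style list keeps the nested click data intact and queryable.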
There's considerable growth in store for the NoSQL market, according to Market Research Media, but not everyone thinks it's the right approach -- at least, not in all cases.
NoSQL systems "solved many problems with their scale-out architecture, but they threw out SQL," said Monte Zweben, Splice Machine's CEO. That, in turn, poses a problem for existing code.
Splice Machine is an example of a different class of alternatives known as NewSQL -- another category expecting strong growth in the years ahead.
"Our philosophy is to keep the SQL but add the scale-out architecture," Zweben said. "It's time for something new, but we're trying to make it so people don't have to rewrite their stuff."
Deep Information Sciences has also chosen to stick with SQL, but it takes yet another approach.
The company's DeepSQL database uses the same application programming interface (API) and relational model as MySQL, meaning that no application changes are required in order to use it. But it addresses data in a different way, using machine learning.
DeepSQL can automatically adapt for physical, virtual or cloud hosts using any workload combination, the company says, thereby eliminating the need for manual database optimization.
Among the results are greatly increased performance as well as the ability to scale "into the hundreds of billions of rows," said Chad Jones, the company's chief strategy officer.
An altogether different approach comes from Algebraix Data, which says it has developed the first truly mathematical foundation for data.
Whereas computer hardware is modeled mathematically before it's built, that's not the case with software, said Algebraix CEO Charles Silver.
"Software, and especially data, has never been built on a mathematical foundation," he said. "Software has largely been a matter of linguistics."
Following five years of R&D, Algebraix has created what it calls an "algebra of data" that taps mathematical set theory for "a universal language of data," Silver said.
"The dirty little secret of big data is that data still sits in little silos that don't mesh with other data," Silver explained. "We've proven it can all be represented mathematically, so it all integrates."
Equipped with a platform built on that foundation, Algebraix now offers companies business analytics as a service. Improved performance, capacity and speed are all among the benefits Algebraix promises.
Time will tell which new contenders succeed and which do not, but in the meantime, longtime leaders such as Oracle aren't exactly standing still.
"Software is a very fashion-conscious industry," said Andrew Mendelsohn, executive vice president for Oracle Database Server Technologies. "Things often go from popular to unpopular and back to popular again."
Many of today's startups are "bringing back the same old stuff with a little polish or spin on it," he said. "It's a new generation of kids coming out of school and reinventing things."
SQL is "the only language that lets business analysts ask questions and get answers -- they don't have to be programmers," Mendelsohn said. "The big market will always be relational."
As for new types of data, relational database products evolved to support unstructured data back in the 1990s, he said. In 2013, Oracle's namesake database added support for JSON (JavaScript Object Notation) in version 12c.
Rather than a need for a different kind of database, it's more a shift in business model that's driving change in the industry, Mendelsohn said.
"The cloud is where everybody is going, and it's going to disrupt these little guys," he said. "The big guys are all on the cloud already, so where is there room for these little guys?
"Are they going to go on Amazon's cloud and compete with Amazon?" he added. "That's going to be hard."
Oracle has "the broadest spectrum of cloud services," Mendelsohn said. "We're feeling good about where we're positioned today."
Rick Greenwald, a research director with Gartner, is inclined to take a similar view.
"The newer alternatives are not as fully functional and robust as traditional RDBMSes," Greenwald said. "Some use cases can be addressed with the new contenders, but not all, and not with one technology."
Looking ahead, Greenwald expects traditional RDBMS vendors to feel increasing price pressure, and to add new functionality to their products. "Some will freely bring new contenders into their overall ecosystem of data management," he said.
As for the new guys, a few will survive, he predicted, but "many will either be acquired or run out of funding."
Today's new technologies don't represent the end of traditional RDBMSes, "which are rapidly evolving themselves," agreed IDC's Olofson. "The RDBMS is needed for well-defined data -- there's always going to be a role for that."
But there will also be a role for some of the newer contenders, he said, particularly as the Internet of Things and emerging technologies such as Non-Volatile Dual In-line Memory Module (NVDIMM) take hold.
There will be numerous problems requiring numerous solutions, Olofson added. "There's plenty of interesting stuff to go around."