Big data gets runtime specification
"This is the culmination of this whole year's work," says John Mertic, senior manager of ODPi.
The nonprofit ODPi formed last year in an effort to reduce the amount of complexity surrounding the Hadoop and big data environment. The idea was to provide a big data kernel in the form of a tested reference core of Apache Hadoop, Apache Ambari and related Apache source artifacts.
The kernel, called ODPi Core, would be used to simplify upstream and downstream qualification efforts — a "test once, use everywhere" core platform that could eliminate the growing fragmentation in the space. Applications and tools built on the reference platform should integrate with and run on any compliant system.
In September of last year, ODPi officially became a collaborative project of the Linux Foundation.
Mertic explains that ODPi is an effort to bring together constituents from all the various "party lines" with a stake in the big data ecosystem.
"What we really wanted to do was to make sure we could have the community well represented," he says. "The biggest feedback that we got was that each distro does things slightly differently; they name their files differently; their APIs behave differently."
The new runtime specification descends from Apache Hadoop 2.7 and features HDFS, YARN and MapReduce components. Mertic says the test framework and self-certification align closely with the Apache Software Foundation by leveraging Apache Bigtop for comprehensive packaging, testing and configuration. More than half the code in the latest Bigtop release originated in ODPi. The ODPi Runtime-Compliance tests are linked directly to lines in the ODPi Runtime Specification. To assist with compliance, ODPi has also provided a reference build.
The organization says the published specification includes rules and guidelines on how to incorporate additional, non-breaking features, which are allowed provided source code is made available through relevant Apache community processes.
"It was a little over a year ago that ODPi was formed, and we have already proved beneficial to upstream ASF projects (Hadoop, Bigtop, Ambari)," says Roman Shaposhnik, director of Open Source at Pivotal, and an Apache Hadoop and Bigtop committer and ASF member. "This is why the first release of the ODPi Runtime Specification and test suite is so exciting. It is a big step toward realizing our goal of accelerating the delivery of business outcomes through big data solutions by driving interoperability on an enterprise-ready core platform."
"Big data is the key to enterprises welcoming the cognitive era and there's a need across the board for advancements in the Hadoop ecosystem to ensure companies can get the most out of their deployments in the most efficient ways possible," Rob Thomas, vice president of product development, IBM Analytics, added in a statement Monday. "With the ODPi Runtime Specification, developers can write their application once and run it across a variety of distributions — ensuring more efficient applications that can generate the insights necessary for business change."
With the Runtime Specification out the door, Mertic says the next focus will be the ODPi Operations Specification to help enterprises improve installation and management of Hadoop and Hadoop-based applications. It covers Apache Ambari, which is used for provisioning, managing and monitoring Hadoop clusters. Mertic expects the Operations Specification will be ready this summer.
ODPi is also getting ready to decide what it will focus on after that. Mertic explains that each ODPi member, regardless of size or investment, has exactly one vote. Some possibilities include work around Spark, Kafka, HBase and Hive.