Databricks unveils commercial support for Apache Spark 2.0
The company, founded out of the UC Berkeley AMPLab by the team that created Apache Spark, says this latest release builds on what the community has learned in the past two years. It marks the first major release of open source Spark since the Spark 1.6 release in 2015.
"Since the release of Spark 1.0, we've spent countless hours listening to members of the Spark community and Databricks users to learn from a mix of praises and complaints," Reynold Xin, Databricks' chief architect and co-founder, said in a statement Tuesday. "Spark 2.0 builds on what the community has learned, doubling down on what users love and improving on what users lament."
Spark, a top-level Apache project, has become an increasingly popular alternative to MapReduce as the compute engine behind big data applications. It uses in-memory primitives to outperform MapReduce for certain applications, and it is well-suited to machine learning algorithms and interactive analytics.
The company launched a preview release of Apache Spark 2.0 on Databricks two months ago, and says 10 percent of clusters on the platform are already using the latest release.
The company outlined some of the major new features:
"One of the things that's really exciting for me as a developer of Apache Spark is seeing how quickly users start to use new features and APIs we introduce, and in turn, offer almost instantaneous feedback, so that we can improve on them," Matei Zaharia, CTO and co-founder of Databricks and creator of Apache Spark, said in a statement Tuesday.
Spark 2.0 is immediately available to Databricks users. The company says users can create Spark 2.0 clusters by selecting the release from the Databricks menu. Additionally, Databricks says Spark 2.0 is compatible with Spark 1.6, meaning migrating code will require minimal effort.