Databricks Open Sources Unity Catalog, Creating the Industry’s Only Universal Catalog for Data and AI
SAN FRANCISCO, June 12, 2024 /PRNewswire/ — Databricks, the Data and AI company, today announced that it is open sourcing Unity Catalog, the industry’s only unified solution for data and artificial intelligence (AI) governance across clouds, data formats and data platforms. This initiative builds on Databricks’ commitment to open ecosystems, ensuring customers have the flexibility and control they need without vendor lock-in. Databricks is ushering in a new era for open catalog standards for data and AI with support from Amazon Web Services (AWS), Google Cloud, Microsoft, NVIDIA, Salesforce, and more.
Unity Catalog OSS offers a universal interface that supports any data format and compute engine, including the ability to read tables with Delta Lake, Apache Iceberg™, and Apache Hudi™ clients via Delta Lake UniForm. It also supports the Iceberg REST Catalog and Hive Metastore (HMS) interface standards. Additionally, Unity Catalog OSS provides for unified governance across tabular, non-tabular data, and AI assets, such as machine learning (ML) models and generative AI tools, letting organizations simplify management at scale.
Unity Catalog: The Leading Data and AI Catalog
Databricks introduced Unity Catalog in 2021 to meet customer demand: organizations need an interoperable catalog for their data and AI workloads. Historically, organizations relied on multiple different single-purpose solutions, creating silos between platforms and between data and AI assets. These silos made it difficult to build modern data and AI applications, which combine tabular data in multiple table formats, unstructured data, ML models, vector indices, and AI tools. Customers created complex webs to manage metadata silos, copied data into different places or different formats to enable access by various engines, or maintained DIY solutions to sync metadata between catalogs. Ultimately, this led to increased costs and complexity, as well as weak governance and fragmented access control. Unity Catalog breaks down those silos for over 10,000 organizations.
“Our customers love Unity Catalog. It lets them manage all their data objects — tabular data, unstructured data, and AI and ML assets — in a single source of truth within the Databricks Data Intelligence Platform, versus gluing together multiple single-purpose solutions,” said Ali Ghodsi, Co-founder and CEO at Databricks. “Our platform is the only major data platform in the industry where all data is in an open format by default — now, metadata and governance are open as well, giving enterprises the governance solution they need in today’s data and AI landscape. We’re excited to open source Unity Catalog and release the code. We’ll continue to evolve the open standard in close collaboration with our partners.”
Unity Catalog OSS is the industry’s only universal catalog for data and AI. Key features include:
Interoperability: Unity Catalog OSS offers a universal interface that supports any data format and compute engine, including the ability to read tables with Delta Lake, Apache Iceberg™, and Apache Hudi™ clients via Delta Lake UniForm. It also supports the Iceberg REST Catalog and Hive Metastore (HMS) interface standards. Unity Catalog OSS is interoperable with all major cloud platforms, including Microsoft Azure, AWS, GCP, and Salesforce; compute engines like Apache Spark™, Presto, Trino, DuckDB, Daft, PuppyGraph, and StarRocks; and data and AI platforms including dbt Labs, Confluent, Eventual, Fivetran, Granica, Immuta, Informatica, LanceDB, LangChain, Tecton, and Unstructured.
Unified governance: Unity Catalog OSS enables unified governance across tabular data, non-tabular data, and AI assets, such as ML models and generative AI tools, letting organizations simplify management, discovery and development at scale.
Openness: With its open APIs and Apache 2.0 licensed open source server, Unity Catalog OSS maximizes flexibility and customer choice by enabling broad interoperability across various engines, tools, and platforms.
“AT&T is committed to making our data interoperable with our platforms. With the announcement of Unity Catalog’s open sourcing, we are encouraged by Databricks’ step to make lakehouse governance and metadata management possible through open standards. The flexibility to utilize interoperable tools with our data and AI assets, with consistent governance, is core to the AT&T data platform strategy,” said Matt Dugan, VP Data Platforms, AT&T.
“Nasdaq is proud to leverage Databricks’ Unity Catalog as part of our holistic data management strategy,” said Lenny Rosenfeld, Vice President, Capital Access Platforms, Nasdaq. “Databricks’ decision to open source Unity Catalog provides a solution that helps eliminate data silos and we look forward to further scaling our platform, enhancing our governance, and modernizing our data applications as we continue to deliver for our clients.”
“At Rivian, the adoption of the Databricks Data Intelligence Platform has given us the ability to use Data and AI in building our next-gen EAVs. We are excited about Databricks open sourcing Unity Catalog and releasing Open APIs to bring interoperability across our data landscape without any concerns of vendor lock-in. Combined with support for all our data assets — structured and unstructured data, ML models, and Gen AI tools — it was an easy decision to standardize on Unity Catalog,” said Jason Shiverick, Director of AI Platforms, Rivian.
Supporting Cloud Partner Quotes
“AWS welcomes Databricks’ move to open source Unity Catalog. AWS is committed to working with the industry on open source solutions that enable choice and interoperability for customers,” said Chris Grusz, Managing Director of Technology Partnerships, AWS.
“Google is committed to open, flexible solutions that empower customers to maximize the value of their data. Databricks’ strategy to open up the Unity Catalog standard for data and AI aligns very well with our strategy,” said Ritika Suri, Director, Data and AI Technology Partnerships, Google Cloud.
“Microsoft is committed to the open-source community and empowering customers with choice. Databricks has been a strategic partner for years and it’s great to see them open-sourcing Unity Catalog. We believe truly open standards with broad industry participation are in customers’ best interests. Our collaboration with Databricks continues to elevate Microsoft Azure as the best choice for data and AI workloads,” said Jessica Hawk, CVP, Data, AI, Digital Applications, Microsoft.
“Salesforce Data Cloud is built from the ground up on Open Standards with Apache Parquet and Apache Iceberg. Our zero copy innovations enable customers to unlock data, derive insights and orchestrate actions across the Customer 360. Databricks’ embrace of Apache Iceberg via UniForm and Unity Catalog addresses key interoperability challenges between Delta Lake and Iceberg. We are excited to have Databricks as a member of our Zero Copy Partner Network and look forward to joint innovations with the new open Unity Catalog, delivering compelling customer value in structured data, unstructured data and AI models,” said Ravi Loganathan, EVP, Salesforce.
Supporting Data and AI Partner Quotes
“Confluent’s mission is to set data in motion and enable organizations to take advantage of their data everywhere. We’re excited to see Databricks make a significant contribution to an open data ecosystem with Unity Catalog becoming open sourced. Tableflow on Confluent Cloud will enable easy delivery of real-time data to places like a data lake by turning data streams into Iceberg tables with a single click. By combining our industry-leading streaming capabilities with Databricks’ robust data management solutions, customers will be able to put their data to work more effectively than ever,” said Shaun Clowes, Chief Product Officer, Confluent.
“Together, Databricks and dbt Cloud help users break down data silos to collaborate effectively, simplify ETL to lower TCO with Delta Lake, and unify governance with Unity Catalog. We are thrilled to announce our support for Unity Catalog OSS and the open APIs. This partnership underscores our commitment to providing a unified data experience, empowering our community to achieve greater insights and drive innovation,” said Mark Porter, Chief Technology Officer at dbt Labs.
“Delta Kernel has greatly simplified building the DuckDB Delta Extension, enabling easy access to Delta Lake from DuckDB. We are thrilled to partner with Databricks on Delta Kernel and the Unity Catalog open standard for data and AI. This collaboration represents a significant step forward in open source innovation and the development of open data lakehouses,” said Hannes Mühleisen, CEO at DuckDB Labs.
“At Eventual, we have built Daft, the leading open source distributed query engine for multimodal data. We believe that unifying compute for tabular and unstructured data is not enough and that a multimodal catalog is crucial to build GenAI data lakehouses. We are excited to partner with Databricks and other AI innovators to develop the Unity Catalog open standard for modern data and AI workloads,” said Sammy Sidhu, CEO at Eventual.
“We are thrilled to see Databricks open source Unity Catalog as an open standard for data and AI. This move will provide our customers with greater choice and flexibility in their data ecosystem, ensuring seamless integration and maximizing interoperability with Fivetran’s platform as they ingest critical data to Databricks,” said Anjan Kundavaram, Chief Product Officer at Fivetran.
“At Granica, we champion data democratization and freedom from vendor lock-in. Our Safe Room technology ensures privacy, trust, and safety in generative AI workflows while supporting open standards like Unity Catalog and Apache Iceberg. Unity Catalog’s vendor-neutral architecture and robust governance solutions align with our vision of providing customers with flexibility and control over their data. We are excited to contribute to this open ecosystem, driving innovation and enabling customers to seamlessly work with their data across best-of-breed platforms,” said Rahul Ponnala, Co-founder and CEO at Granica.
“The exposure of native access patterns within Unity Catalog has transformed how our business is able to streamline access to data and apply governance rules at scale — with no performance impact. Databricks’ continued investment in a community to accelerate services to make data controls easier to build allows our customers to govern with greater ease and manage the massive volume of new data consumers being onboarded in the age of AI,” said Matthew Carroll, CEO at Immuta.
“We are excited to see the opportunity for our joint customers as Databricks open-sources Unity Catalog as an open standard for data and AI. With Unity Catalog OSS and the Informatica Intelligent Data Management Cloud, customers can gain greater choice, flexibility and interoperability in their data ecosystems,” said Brett Roscoe, GM and SVP Cloud Data Governance and Cloud Operations at Informatica.
“Databricks’ decision to open source Unity Catalog is an exciting development for the data and AI community. We’re excited to partner with Databricks to integrate Unity Catalog with LangChain, which allows our shared users to build advanced agents using Unity Catalog functions as tools,” said Harrison Chase, CEO at LangChain.
“Enterprise data is essential to developing accurate generative AI applications. NVIDIA works closely with our partner ecosystem to support open-source offerings like Databricks Unity Catalog, which can help customers curate efficient and powerful development pipelines,” said Pat Lee, VP of Strategic Enterprise Partnerships at NVIDIA.
“Open sourcing Unity Catalog is a pivotal step towards a more collaborative and innovative data ecosystem. By making this technology accessible, Databricks is fostering an environment where the entire community can contribute to and benefit from enhanced data governance and management capabilities. This move aligns with our vision at Onehouse and Apache XTable (Incubating) to support open format interoperability that drives progress and innovation for all,” said Vinoth Chandar, Founder and CEO at Onehouse.
“Unstructured is the leading unstructured data ETL solution for LLMs – helping organizations transform their data from raw to RAG-ready. Our partnership with Unity Catalog OSS makes perfect sense, as we break down data silos and accelerate AI/ML development in enterprises. We are excited to partner with Databricks to develop this open standard for AI use cases and to standardize metadata for unstructured data – helping our customers operate at the cutting edge of AI,” said Brian Raymond, CEO at Unstructured.
With today’s announcement, Databricks continues to lead the way in data and AI governance, encouraging an ecosystem of interoperable tools, universal support for data and AI assets, and built-in security.
Availability
Unity Catalog OSS will be available at the Data + AI Summit.
About Databricks
Databricks is the Data and AI company. More than 10,000 organizations worldwide — including Block, Comcast, Condé Nast, Rivian, Shell and over 60% of the Fortune 500 — rely on the Databricks Data Intelligence Platform to take control of their data and put it to work with AI. Databricks is headquartered in San Francisco, with offices around the globe, and was founded by the original creators of Lakehouse, Apache Spark™, Delta Lake and MLflow. To learn more, follow Databricks on LinkedIn, X and Facebook.
Contact: [email protected]