As attendees at both events, we wanted to summarize some of the key emerging trends we saw come up repeatedly across these in-person gatherings.
Before we dive in, let’s take a brief look at the two companies. Both Databricks and Snowflake offer data lakehouses – combining the features of data warehouses and data lakes to help companies efficiently store, process, and harness their massive influx of data. Databricks started on the processing side of the lakehouse, analyzing big data at scale. Conversely, Snowflake began by modernizing data storage and warehousing: providing a large-scale, high-performance SQL-based query platform for reporting and analysis. Now, both are converging toward the hybrid data lakehouse model, providing a unified large-scale data ecosystem where analytics can be run and AI and machine learning models can be trained, developed, and deployed.1 We will compare these platforms in more detail in an upcoming article.
At their respective summits, there was one clear and unified message: the importance of democratizing data, AI, and machine learning to build fit-for-future organizations. Spotlighted below are a selection of themes that particularly stood out for our attendees under this umbrella.
Generative AI for All
Generative AI is a type of artificial intelligence that can create new content, like images, text, or music. It does this by learning from existing data and then generating new things that are similar to its corpus of training data. And, with the advent of the neural network architecture known as the transformer2, we can now tap into unstructured datasets more easily than ever, enabling us to uncover more insights than we could have ever imagined.
Both conferences focused heavily on the potential of generative AI for enterprises. At the Databricks Summit, speakers drew a helpful analogy: the first computer was developed over a hundred years ago, but it wasn’t until a PC was in everyone’s home in the 80s that computing truly permeated society. Similarly, while the internet was invented in the early 70s, it was only when the web browser allowed for widespread public access in the 90s that it became commonplace across the globe. Now, in 2023, we are beginning to democratize the deep neural networks behind the breakthroughs of 2012 through foundation models (reusable AI models that can be applied to just about any domain or industry task)3 like ChatGPT and Google’s Bard. The Snowflake Summit also held many thought-provoking discussions on what the future holds for generative AI and how it will impact organizations, from unleashing creativity to analyzing unstructured data.
With the advent of generative AI, people can plug and play large language models (LLMs) without needing massive computing power, representing a significant new opportunity for enterprises. While ChatGPT was trained on internet data up to 2021, pre-trained models like it can now be tailored to an organization’s own documents and data rather than remaining generic; in the past it took a whole team of data scientists and engineers to build these models, but now the public can build on pre-trained models – adding immense business value. For example, an enterprise can train an LLM on its process diagrams, org charts, and customer profiles along with traditional structured data, producing a model with a deep understanding of the business’ operations.
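As a simplified, hypothetical sketch of this “ground a model in your own documents” pattern, the retrieval step can be illustrated with plain bag-of-words similarity. The document names and contents below are invented for illustration; a real enterprise platform would use learned embeddings and a vector store rather than word counts:

```python
from collections import Counter
import math

# Hypothetical internal documents an enterprise might index
# alongside its structured data (names are illustrative only).
DOCUMENTS = {
    "org_chart": "engineering reports to the cto and sales reports to the cro",
    "refund_policy": "customers may request a refund within 30 days of purchase",
    "deploy_runbook": "deploy to staging then run smoke tests before production",
}

def vectorize(text: str) -> Counter:
    """Bag-of-words term counts; real systems use learned embeddings."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question: str) -> str:
    """Return the name of the most relevant document; in a
    retrieval-augmented setup it would be prepended to the LLM prompt."""
    q = vectorize(question)
    return max(DOCUMENTS, key=lambda name: cosine(q, vectorize(DOCUMENTS[name])))

print(retrieve("how do customers get a refund"))  # -> refund_policy
```

The retrieved text, not the raw corpus, is what gets handed to the pre-trained model at question time – which is why no large-scale retraining is needed.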
Both summits focused on this ability to process customer data in place. Snowflake announced a new partnership with NVIDIA, which will allow data science tools to run directly on customers’ data – in the past, data had to be sent out to an external service to be processed. Similarly, Databricks revealed its recent acquisition of MosaicML, likewise allowing customers to deploy LLMs on proprietary datasets. By bringing processing power to the data already in their platforms, customers are empowered to train their own models, imbued with their own data, wisdom, and creativity, rather than having this capability centralized in a few generic models.
The Rise of Data Marketplaces
The concept of monetizing data has become increasingly popular in the last decade, and many industry leaders are prioritizing their data as an organizational asset for revenue generation. However, there are challenges: sellers want to ensure that their valuable datasets reach the right people, and buyers want to verify the quality of the data they purchase. A data marketplace is an online store that facilitates the buying and selling of data, built to solve these challenges.4
Snowflake Marketplace connects customers to over 430 providers, offering more than 1,800 live, ready-to-use datasets, data services, and Snowflake Native Apps.5 It enables customers to discover, evaluate, and purchase data, data services, and applications from some of the world’s leading data and solution providers – meaning more development, more ways to offer your own data as a service, and more ways to buy from others.
Databricks Marketplace is an open marketplace for exchanging data products such as datasets, notebooks, and dashboards. Databricks announced last month that it has also updated its Marketplace to allow enterprises to share ML/AI models while monetizing them. This move to allow AI model sharing on the marketplace echoes what Snowflake is doing in its own marketplace.6
The Shift to Data Observability
As regular attendees at these annual conferences, it was interesting for us to see the shift toward “data observability”. At the Snowflake Summit five years ago, exhibitors were mainly showcasing extract, transform, and load (ETL) platforms and the easy integration of data from legacy platforms into Snowflake. Now that the data is already in the platform, the majority of partners work inside the customer’s Snowflake environment – including data observability companies. Data observability is an umbrella term for the activities and technologies that allow businesses to identify and resolve data issues in real time. For example, data observability company Monte Carlo announced that it had achieved elite tier partner status. We also saw this trend at Databricks, where a range of breakout sessions was dedicated to data observability, such as “Simplifying Lakehouse Observability: Databricks Key Design Goals and Strategies”.
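To make the idea concrete, here is an illustrative sketch (not any vendor’s actual API) of the kinds of checks a data observability tool automates – freshness, volume, and completeness – expressed as a single function over a table extract; the thresholds and field names are assumptions for the example:

```python
from datetime import datetime, timedelta

def observe(rows, loaded_at, *, max_age=timedelta(hours=24),
            min_rows=100, max_null_rate=0.05, now=None):
    """Run basic freshness, volume, and completeness checks on a table
    extract (a list of dicts); returns {check name: passed?}."""
    now = now or datetime.utcnow()
    nulls = sum(1 for row in rows for value in row.values() if value is None)
    cells = sum(len(row) for row in rows) or 1
    return {
        "freshness": now - loaded_at <= max_age,        # loaded recently?
        "volume": len(rows) >= min_rows,                # expected row count?
        "completeness": nulls / cells <= max_null_rate, # null rate in bounds?
    }

# A table with a fresh load and acceptable volume, but a null value
# in half its cells, fails only the completeness check.
checks = observe([{"id": 1, "amount": None}], datetime.utcnow(), min_rows=1)
print(checks)  # -> {'freshness': True, 'volume': True, 'completeness': False}
```

Real platforms run checks like these continuously against live tables and alert on failures, rather than evaluating a one-off extract.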
It’s clear that applied observability will help businesses in the future to democratize their data, enabling everyone to make an informed choice based on accurate and timely information.
In the coming weeks, we will do a deep dive comparison article on the Snowflake and Databricks platforms to help organizations choose the right solution for their business needs. In the meantime, for further discussion on key takeaways from either event, please get in touch!