Evaluating Snowflake Competitors - Which to Choose in 2024?

Snowflake is one of the leading cloud data warehouse platforms in 2024. It was one of the first to decouple storage from compute, offering nearly unlimited compute power that can scale both vertically and horizontally to meet query complexity and concurrency demands, irrespective of your storage requirements. Its virtual warehouses, the isolated compute clusters required for running queries, come in 'T-shirt sizes' (ranging from XS up to 6XL) and can be auto-scaled, started, or stopped almost instantly. Snowflake is easy to get started with thanks to its intuitive interface, and it offers a broad set of features, such as Snowpark, Time Travel, Data Sharing, Zero Copy Cloning, Apache Iceberg support, and Dynamic Tables. Snowflake runs on all three major cloud infrastructure platforms: AWS, Microsoft Azure, and Google Cloud (GCP).

Snowflake is also one of the fastest-growing database engines according to db-engines.com:

Top 10 database ranking according to db-engines.

Image source: db-engines

The red line represents Snowflake, currently ranked 9th across all databases (not only analytics). The ranking method is documented on db-engines.com (tl;dr: mentions on websites, Google searches, job offerings, LinkedIn profiles, number of tweets).

The 8 most popular Snowflake competitors and alternatives in 2024 are:

Cloud Data Warehouses
- Google BigQuery
- Databricks
- Amazon Redshift
- Azure Synapse
- Clickhouse
- Motherduck

Query Engines
- Trino / Starburst
- Amazon Athena

It is worth mentioning that several other Snowflake alternatives are available and are not covered in this article, such as Teradata, Firebolt, Rockset, SingleStore, Imply, Dremio, StarRocks, Pinot, or Oracle.

Cloud data warehouses

Google BigQuery

Image source: k21academy.com

Snowflake and Google BigQuery are two of the leading cloud data warehouses. The main difference lies in how compute is provisioned, which also impacts pricing. Snowflake has you provision virtual warehouses that run on top of separately billed storage, while BigQuery is serverless. As a result, Snowflake charges for the time a warehouse is running, while BigQuery's on-demand pricing charges for the amount of data scanned by each query, not the data returned.
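The two billing models can be compared with some back-of-the-envelope arithmetic. The sketch below is illustrative only: the function names are made up for this example, and both prices (`PRICE_PER_CREDIT`, `PRICE_PER_TIB`) are hypothetical placeholders, since actual rates depend on edition, region, and contract. It assumes Snowflake's credits-per-hour doubling with each warehouse size and a 60-second minimum charge per warehouse start, and it models BigQuery's on-demand pricing, which bills by bytes processed.

```python
# Credits consumed per hour by warehouse size (doubles with each size step).
CREDITS_PER_HOUR = {"XS": 1, "S": 2, "M": 4, "L": 8, "XL": 16}

# Hypothetical prices, for illustration only; check your own contract.
PRICE_PER_CREDIT = 3.00   # USD per Snowflake credit (assumption)
PRICE_PER_TIB = 6.25      # USD per TiB scanned on BigQuery on-demand (assumption)


def snowflake_cost(size: str, runtime_seconds: float, price_per_credit: float) -> float:
    """Time-based billing: credits per hour x hours running x price per credit."""
    billed_seconds = max(runtime_seconds, 60)  # assumes a 60-second minimum per start
    return CREDITS_PER_HOUR[size] * billed_seconds / 3600 * price_per_credit


def bigquery_on_demand_cost(bytes_scanned: int, price_per_tib: float) -> float:
    """Scan-based billing: TiB processed x price per TiB."""
    return bytes_scanned / 2**40 * price_per_tib


# A Medium warehouse running for 10 minutes vs. a query scanning 2 TiB.
time_based = snowflake_cost("M", 600, PRICE_PER_CREDIT)
scan_based = bigquery_on_demand_cost(2 * 2**40, PRICE_PER_TIB)
print(f"Snowflake (time-based): ${time_based:.2f}")
print(f"BigQuery (scan-based):  ${scan_based:.2f}")
```

The takeaway is the shape of the two formulas rather than the numbers: one cost grows with how long compute runs, the other with how much data each query touches, so the cheaper option depends on the workload.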

One advantage of BigQuery is that users don’t need to worry about managing compute resources. BigQuery’s compute layer is based on a slot-based model. Users are allocated a certain number of slots and can run as many queries as they want as long as they have enough slots available. Slots are automatically scaled up or down as needed.

Both platforms offer a wide range of features, such as support for structured, semi-structured, and unstructured data, transactions, data sharing, and time travel.

Snowflake is available on all major platforms, while BigQuery is available only on Google Cloud. However, using BigQuery Omni, you can also run analytics on Amazon S3 or Azure Blob Storage.

💡 Decision: If you are already running on the Google Cloud Platform, BigQuery may be a good choice, as it integrates seamlessly with other GCP services. However, if your data lives primarily or exclusively on AWS or Azure (for example, in Amazon S3 or Azure Blob Storage), Snowflake may be a better choice. Performance-wise, they are pretty similar and vary based on the use case.

Databricks

Image source: Datashift

Databricks and Snowflake have a long history together. They started as partners focused on different types of workloads: Snowflake on data warehousing and Databricks on enabling organizations to use Apache Spark for ML workloads. Since their beginnings roughly a decade ago (circa 2012-2013), their use cases have converged, and they now offer similar features. Snowflake added Snowpark, which makes it easier to migrate Spark workloads, and Apache Iceberg support, so you can query a data lake directly from Snowflake without ingesting the data. Databricks, on the other hand, added Photon (its query engine) and SQL support to expand its data warehousing capabilities. Both platforms offer a wide range of data ingestion, transformation, governance, and dashboarding capabilities, both natively and through integrations with third-party tools.

💡 Decision: If you are migrating from an on-prem SQL-first data warehouse, Snowflake might be a better choice due to its gentle learning curve and intuitive user interface. Both offer excellent governance capabilities (Snowflake: Snowflake Horizon; Databricks: Unity Catalog). Currently, Databricks has a slight edge for ML/AI use cases, but Snowflake is quickly catching up with features like Snowpark and Cortex for GenAI. If you are already using Spark and are considering traditional data warehouse use cases, you might want to stick with Databricks, given its investment in the data warehouse space over the last few years, including Databricks SQL Serverless built on top of the lakehouse architecture.

One thing to keep in mind is that Snowflake abstracts away many of the configuration knobs, like setting up indexes, with great under-the-hood query optimizations and reasonable defaults. On the other hand, if you are part of an engineering-driven organization that likes to fine-tune jobs and handle very large pipelines, Spark/Databricks offers more control and might suit your needs better. However, this usually also comes at a greater total cost of ownership when you factor in engineering costs as well.

Amazon Redshift

Image source: AWS

Amazon Redshift was the first cloud data warehouse. Unlike Snowflake, Redshift didn’t initially offer separate storage and compute. Its newer RA3 nodes now allow compute to scale somewhat independently of storage and cache frequently accessed data locally, yet workloads on a cluster still share the same compute resources. Redshift also tends to require more maintenance. While Redshift pioneered the move of the data warehouse to the cloud in 2012 and enabled companies to launch data warehouse instances in minutes, a growing number of companies have migrated to Snowflake due to its elastic nature and ease of use.

Azure Synapse

Image source: Azure

Synapse is Azure’s data warehouse platform and is tightly integrated within the Microsoft ecosystem, similar to Redshift's integration within the Amazon ecosystem. Although it might seem like a natural fit if you are already a Microsoft shop moving from SQL Server, Synapse is a more complicated platform to operate than Snowflake.

What do we mean by complexity? This comment sums it up perfectly:

Having worked with both Synapse and Snowflake, the following is the checklist you need to follow to receive optimal performance on both the data warehouses:

Synapse:
- Decide the type of partitioning for the table.
- Decide the type of indexing for the table.
- Turn table stats on/off.
- Create workload groups in workload management in Azure Portal for the Synapse DWH.
- Assign resource class to each workload group.

Snowflake:
- Create a table.

One other popular option besides Snowflake on Azure is Databricks.

Clickhouse

Image source: Clickhouse

Clickhouse is open source and, as such, can be deployed anywhere. The Clickhouse Cloud (along with others, such as Tinybird) offers a managed version of Clickhouse, a SaaS product like Snowflake. Clickhouse Cloud also features a decoupled compute and storage architecture, writes to immutable files, and supports semi-structured data with core integrations such as Kafka or S3.

Clickhouse offers more flexibility in terms of configuration by allowing users to specify primary and secondary indexes, which can lead to better data pruning and lower query latency. Although Clickhouse Cloud makes it easier to leverage Clickhouse OSS, there is a steeper learning curve compared to the Snowflake (ANSI) SQL syntax.
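To see why a well-chosen primary index can prune data, here is a toy sketch of a sparse index over sorted rows. It is a simplified illustration, not Clickhouse's actual implementation: the function names and the tiny `GRANULE_SIZE` are assumptions made for the example. The idea is that rows are stored sorted by key, only the first key of each block ("granule") is kept in the index, and a range query uses that small index to skip blocks that cannot contain matching rows.

```python
from bisect import bisect_left, bisect_right

GRANULE_SIZE = 4  # hypothetical; real systems group thousands of rows per granule


def build_sparse_index(sorted_keys, granule_size=GRANULE_SIZE):
    """Keep only the first key of each granule (block of consecutive sorted rows)."""
    return [sorted_keys[i] for i in range(0, len(sorted_keys), granule_size)]


def granules_to_scan(index, low, high):
    """Return the granule numbers that may contain keys in [low, high].

    The lower bound is conservative: it also includes the granule just before
    the first index entry >= low, because that granule may end with `low`.
    """
    first = max(bisect_left(index, low) - 1, 0)
    last = bisect_right(index, high) - 1
    return list(range(first, last + 1))


# 20 rows sorted by an event timestamp, stored in 5 granules of 4 rows each.
keys = list(range(20))
index = build_sparse_index(keys)

# A range filter on the sort key only needs to read a subset of the granules.
selected = granules_to_scan(index, 9, 13)
print(f"index entries: {index}")
print(f"granules read: {selected} of {len(index)}")
```

A filter on a column that is not part of the sort key gets no such benefit and has to read every granule, which is why choosing the index to match your query patterns matters.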

Clickhouse is a great option for log and event data analysis, time series data, or real-time analytics.

💡 Decision: If you are willing to learn some of the SQL nuances introduced by Clickhouse to fine-tune your queries, such as choosing the compression codec or setting up indexes, Clickhouse might deliver better results at scale on massive datasets. On the other hand, if you prioritize code portability, less maintenance, an easier ramp-up for your analysts with good defaults, and a richer ecosystem, Snowflake is the clear choice.

Motherduck

Image source: Motherduck

Motherduck turns DuckDB multiplayer. DuckDB is an open-source, in-process analytical database. It runs anywhere with zero dependencies and is extremely fast for analytical queries. If you haven’t tried DuckDB, you should definitely give it a go. With over 16k GitHub stars and 5k Discord members, DuckDB is one of the hottest up-and-coming data warehouse alternatives in the space.

Motherduck built a serverless cloud-based platform around DuckDB. Through Hybrid Execution, Motherduck lets you combine data locally (DuckDB) with data in the cloud to take advantage of (powerful and unused) personal laptops to crunch data and be cost-efficient.

Query engines

Trino / Starburst

Image source: Starburst docs

Trino is a distributed SQL query engine, formerly named PrestoSQL, a fork of the Presto project originally developed at Facebook. Trino doesn’t have a storage layer like Snowflake, BigQuery, or other data warehouses. Instead, it is designed to query large data sets distributed over one or more data sources. Trino is also used at LinkedIn, Airbnb, Netflix, Salesforce, and others.

Advantages of Trino:

- Simplicity: Trino can query data from multiple sources without moving data into a central repository.
- Open source.
- Runs on-prem and in cloud environments.
- It is a distributed query engine built from the ground up for efficient, low-latency analytics.

The primary use cases for Trino are:

- Interactive data analytics.
- Querying data in HDFS or in object storage such as S3 with SQL.
- Query federation: querying many disparate data sources, such as relational databases, object storage, or NoSQL systems, from the same system with SQL.
- Speeding up ETL processes.
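Query federation is easier to picture with a small example. The sketch below is only an analogy built on Python's standard `sqlite3` module: two independent in-memory databases stand in for two separate source systems, and a single SQL statement joins across them without copying either table into the other. Trino does this at a much larger scale through connectors, addressing tables as `catalog.schema.table`; the table names and data here are made up for illustration.

```python
import sqlite3

# The default "main" database stands in for one source (e.g. an orders system).
conn = sqlite3.connect(":memory:")
# A second, independent database stands in for another source (e.g. a CRM).
conn.execute("ATTACH DATABASE ':memory:' AS crm")

conn.execute("CREATE TABLE main.orders (customer_id INTEGER, amount REAL)")
conn.execute("CREATE TABLE crm.customers (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO main.orders VALUES (?, ?)",
    [(1, 20.0), (1, 5.0), (2, 7.5)],
)
conn.executemany(
    "INSERT INTO crm.customers VALUES (?, ?)",
    [(1, "Ada"), (2, "Grace")],
)

# One query joins data that lives in two different databases.
rows = conn.execute(
    """
    SELECT c.name, SUM(o.amount) AS total
    FROM main.orders AS o
    JOIN crm.customers AS c ON c.id = o.customer_id
    GROUP BY c.name
    ORDER BY c.name
    """
).fetchall()
print(rows)
```

The data stays where it is; only the query engine reaches into both sources, which is the property that makes federation attractive when moving data is expensive or impractical.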

Starburst is the enterprise, managed version of Trino.

Amazon Athena

Image source: AWS

Amazon Athena, similar to Trino, is a query engine that can help you analyze data in Amazon S3 using SQL without the need to copy data. Athena is serverless, so there is no infrastructure to set up or manage. This makes Athena an excellent solution for running quick ad-hoc queries on S3. Athena supports several data formats, including CSV, JSON, ORC, Parquet, and more. However, unlike Trino, which can query various data sources and is open source, Athena can only query a subset of data sources (primarily hosted on AWS). Furthermore, Athena, an Amazon-managed service, cannot be self-hosted like Trino or customized.

⏩ The simplicity of using a query engine to analyze data without moving it comes at the expense of performance. However, both Trino and Athena perform well in benchmarks.

Conclusions

The data warehouse space is becoming more competitive, which leads to great innovation. What was once a clear choice between Snowflake vs Databricks has become less clear over time as both are pushing into each other's initial territories. Choosing the right data warehouse vendor for your analytics may sometimes require selecting multiple vendors and using each one's strengths for specific use cases.

If all your operations are on the Google Cloud Platform, sticking to BigQuery might make sense, given its seamless integration with the other GCP services. If you have already invested heavily in Spark and Databricks, it makes sense to continue that investment. If you don’t want to move data around and need an analytical solution to access data directly from the transactional databases or filesystems, consider Trino or Amazon Athena. If you are dealing with massive datasets and want more control over compression or the option to add indexes, exploring Clickhouse might be worthwhile. Choose Motherduck if you want to benefit from DuckDB's raw power on a production-ready serverless platform.

However, if you are looking for a solution that just works, is easy to learn, offers features like Time Travel, Zero Copy Cloning, and Dynamic Tables, is cloud-agnostic, and is actively expanding into the ML space through Snowpark, Snowflake remains the top choice in 2024.
