Warehouse Public Preview Launch
Intro - an ode to the past
Data Warehouses suck.
If you’re a developer, you’ve probably heard the names - Databricks, Snowflake, BigQuery. If you’re new here, a data warehouse lets you do three things:
- Load (maybe big?) data
- Transform data into something humans (or agents) can understand using SQL
- Monetize your now useful data on a marketplace
Warehouses are among the largest data businesses in public and private markets, and yet most customers of these products complain about the same things:
- Costs - “my Snowflake bill is going to be $300,000 this quarter, and I barely scaled new jobs”
- Lock-In - “I wish I could take my data into / out of platform more easily, but Snowflake makes this SO hard”
- Autonomy - “I just got locked out of my account and have no recourse”
- Discovery / Access - “The datasets I care about are not on the marketplace - I have to manually ingest data from my vendor and pipe it into the warehouse”
We built Chakra to address these customer pain points. The job is certainly not finished, but we’re happy to share the first version of our product with the public.
We hope to build the first data warehouse that doesn’t suck.
Costs
If you don’t believe me on costs - go check out Snowflake’s Gartner Report and Ctrl+F “expensive.”
You’ll find a laundry list of startups complaining about how expensive the product is.
But why is it so expensive?
1. The infrastructure itself is expensive, and there’s no way to pick the cheapest option dynamically.
2. Snowflake earns a 76% product gross margin on top of the infrastructure cost.
3. Snowflake assumes everyone has “big” data.
On (1) and (2), we’ve built Chakra to be flexible.
Shopping for storage and compute should be like booking flights. Compare all options in one place and book your favorite option.
We’ve got your centralized options covered, and if you’re interested in venturing into the land of community (dare I say decentralized) infrastructure, we’ve got even cheaper options for you.
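To make the flight-booking analogy concrete, here is a minimal sketch of comparing options and picking the cheapest one. Every provider name and price below is made up for illustration, and `Option`, `customer_price`, and `cheapest` are hypothetical helpers, not Chakra’s API. The only number taken from this post is the 76% gross margin, and the sketch assumes margin is measured as a fraction of the customer price, so price = cost / (1 - margin).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    """One storage/compute offer. All names and prices here are made up."""
    provider: str
    infra_cost_per_hour: float  # what the vendor pays for the hardware (USD)
    gross_margin: float         # fraction of the customer price kept as margin


def customer_price(option: Option) -> float:
    # gross margin = (price - cost) / price  =>  price = cost / (1 - margin)
    return option.infra_cost_per_hour / (1.0 - option.gross_margin)


def cheapest(options: list[Option]) -> Option:
    # "Book your favorite option": here, simply the lowest customer price.
    return min(options, key=customer_price)


options = [
    Option("hypothetical-incumbent", infra_cost_per_hour=1.00, gross_margin=0.76),
    Option("hypothetical-cloud-a", infra_cost_per_hour=1.10, gross_margin=0.30),
    Option("hypothetical-community", infra_cost_per_hour=0.80, gross_margin=0.20),
]

# Show every option side by side, cheapest first.
for opt in sorted(options, key=customer_price):
    print(f"{opt.provider:24s} ${customer_price(opt):.2f}/hour")

best = cheapest(options)
```

Under that assumption, a 76% margin turns $1.00 of infrastructure into a bit over $4 of customer price, which is why the margin matters as much as the hardware underneath.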
(3) is the most interesting property. Snowflake and Databricks are powerful tools at extreme scale because they employ “massively parallel processing” or MPP. Ultimately, this means your query gets cut up into chunks and then fed to several different machines instead of one.
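If “cut up into chunks” feels abstract, here is a toy sketch of the idea. It is not how Snowflake or Databricks actually implement MPP: threads stand in for separate machines, a plain list stands in for a table, and the “query” is just a sum. The function names are made up for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(rows, n_chunks):
    """Cut the input into roughly equal contiguous chunks."""
    size = max(1, -(-len(rows) // n_chunks))  # ceiling division
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def partial_sum(chunk):
    """The work one 'machine' does: aggregate only its own chunk."""
    return sum(chunk)


def parallel_sum(rows, n_workers=4):
    chunks = split_into_chunks(rows, n_workers)
    # Threads stand in for separate machines in this toy example.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(partial_sum, chunks))
    # A coordinator combines the partial results into the final answer.
    return sum(partials)


amounts = list(range(1, 1001))  # stand-in for a column in a large table
total = parallel_sum(amounts)
print(total)
```

The split, the workers, and the final combine step are all overhead. That overhead pays for itself when the data cannot fit on one machine, and is pure cost when it can.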
Is this good? For huge datasets on the order of terabytes (TBs) and petabytes (PBs) - yes!
But at what cost?
The truth is most data is not big. I mean no offense.
Our friends at MotherDuck did an excellent study and found that enterprise data size follows a power law: the vast majority of customers are operating at <1TB of aggregate data.
Chakra is built to serve the enterprise and serve it without the exponential scaling costs of MPP.
We’re built on DuckDB from the ground up - the fastest-growing data engine primitive. DuckDB runs all its queries on a single beefy box and, as a consequence, is materially cheaper computationally than running MPP.
How much cheaper? Chakra is 75% cheaper than competing products.
We’re confident that 97% of customers will find our performance more than sufficient, and in many cases better than the incumbents’.
Here’s uploading 8M rows into the Chakra DB in ~8 seconds.
It is that easy to go from a pandas DataFrame to a table in our warehouse.
Here’s selecting rows from the 8M row table with <50ms latency. It is so fast, most people don’t believe us.
Obviously, these benchmarks are primitive, but we’re continuing to invest in performance and will share more sophisticated results when we’re ready.
Lock-In / Autonomy
Not your keys, not your data.
One of the biggest concerns CTOs have when choosing a data platform is “vendor lock-in.”
Annual commitments and proprietary storage formats serve to entrench the cost problem, and while this has served Snowflake well in terms of stickiness, it comes at a cost: your customers lose trust.
With Chakra, you have full custodial keys to your storage environments without any of the maintenance complexity.
We provision object storage for you, but if you ever want access to pull all of your data, you have the keys to do so.
Discovery / Access
I’ve been harsh on the incumbents - they are great (albeit expensive) products. And one of the best parts about them really is the marketplace. Snowflake’s latest earnings report shows that 36% of customers use the data marketplace to consume or share data.
That has grown rapidly over the last few years - developers love having access to all the data they need in one place.
Except - that’s not always the case. Provisioning a Snowflake account takes time, effort, and often speaking to a Snowflake rep. Setting up a datashare requires domain expertise and many engineering hours.
The time to value is too high. Here’s how fast it is to get a data share live and running on Chakra.
Beyond decreasing time to value, we’re excited to experiment in Chakra with what an incentivized data marketplace could look like: a place where developers need not choose between top-of-funnel and revenue, but can grow with our platform.
Okay, but what does this mean for AI?
Next time.
For now, try out our product - it’s free to use. Give us feedback on Discord, and yell at us if something doesn’t match your expectations or you want new features.
Chakra Labs is here to serve you.