What does a database have to do with AI, anyway?After a truly new technology arrives, it makes the future a lot harder to predict. The one thing you can be sure of is that you’re probably not going to continue in the same straight line that you’ve been traveling. The truly impactful destinations are often just on the other side of a mountain that you can’t yet see the top of. This is also what makes technology so terrifying: once the mist clears you might find yourself in a totally new landscape without a map.

At MotherDuck, we’re excited about ways that AI can be used to help give people superpowers to understand their data. Someone with access to modern Google search would have looked like a wizard to people just a few decades ago; now we take it for granted that you can instantly settle any bet about how old is Morgan Freeman or when was the last time the Seattle Mariners won the World Series. Similarly, AI has the potential to divide the world into “things you did before AI” and “things you did afterwards.”

It was pretty clear to us that AI was already changing how people interact with their data when one of our early users mentioned they were spending a lot of their time cutting and pasting between ChatGPT and the MotherDuck query UI. That seems super inefficient, and since then we’ve been trying to figure out how to shorten feedback loops and make data practitioners better at their jobs. Any time you have to leave the query you’re writing to check documentation, it distracts you from all of the details you’re keeping track of in your head.

Two weeks ago, in order to help analysts stay focused on their SQL, we launched “FixIt,” a feature that can pinpoint which line in your query has an error and suggest a fix. While “FixIt” is pretty simple, it can be surprisingly helpful. Instead of having to look up syntax for things like window functions with trailing averages or timestamp differencing, I can just write the SQL I think should work; if I get the ordering of arguments wrong, misspell something, or use the wrong quote type, “FixIt” will automatically write it correctly.

This week we’re taking the next step; in conjunction with Numbers Station, we’re open sourcing a DuckDB specific text-to-SQL LLM. Our goal here is to give back to the DuckDB community and help seed interesting DuckDB applications. For the moment, we’ve chosen to trade off some expressivity for faster and less expensive inference by using a small-ish model size. If this turns out to be an interesting area we will follow up more.

We hope that you’ll come along with us as we continue to explore the ways that AI can make it easier to solve problems with data.

About DuckDB-NSQLWe currently provide text-to-SQL functionality within MotherDuck, using OpenAI’s most powerful models, that are doing exceptionally well on text-to-SQL benchmarks and have been proven useful in practice. We do, however, see a need for more lightweight models that enable DucKDB SQL assistance features at lower latency. Upon reviewing existing open models for text-to-SQL, we came to the realization that existing models and benchmarks primarily focus on analytical queries / SELECT statements.

Beyond fast analytical querying using regular SQL, a significant part of DuckDB’s appeal lies in its friendly SQL syntax, support for nested types, varied data import options, and its diverse ecosystem of extensions. Among others, extensions for querying Postgres, SQLite, and Iceberg tables, and support for JSON and GeoSpatial types.

We believe that text-to-SQL in the context of DuckDB is particularly useful if the model can help users leverage the full power of DuckDB, without having to go forth-and-back between the DuckDB documentation and the SQL shell. We’ve all been there!

With DuckDB-NSQL-7B, we’re now releasing a text-to-SQL model that is aware of all documented features in DuckDB 0.9.2, including official extensions! Think
Read More