Hacker News | ksbeking's comments

TL;DR: SQLFluff rewritten in Rust: about a 10x speed improvement, and portable.

At [Quary](https://www.quary.dev/), we're big fans of [SQLFluff](https://sqlfluff.com/)! It's the most comprehensive SQL formatter/linter around: it outputs great-looking code and has great checks for writing high-quality SQL.

That said, it can often be slow, and in some CI pipelines, we've seen it be the slowest step. To help us and our customers, we decided to rewrite it in Rust to get faster performance and portability so that we could run it anywhere.

Sqruff currently supports the following dialects: ANSI, BigQuery, and Postgres; Snowflake and ClickHouse are next.

In terms of performance, we tend to see about a 10x speed improvement for a single file when run in the sqruff repo:

    time sqruff lint crates/lib/test/fixtures/dialects/ansi/drop_index_if_exists.sql
    0.01s user 0.01s system 42% cpu 0.041 total
            
    time sqlfluff lint crates/lib/test/fixtures/dialects/ansi/drop_index_if_exists.sql
    0.23s user 0.06s system 74% cpu 0.398 total
And for a whole directory of files, we see roughly an 8x wall-clock improvement, depending on what you measure (sqruff parallelizes across cores, hence the 735% CPU figure):

    time sqruff lint crates/lib/test/fixtures/dialects/ansi    
    4.23s user 1.53s system 735% cpu 0.784 total
        
    time sqlfluff lint crates/lib/test/fixtures/dialects/ansi
    5.44s user 0.43s system 93% cpu 6.312 total
Both tests were run on an M1 Mac.
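Working the quoted timings through (using the wall-clock "total" column from the `time` output above), the two speedup ratios come out to roughly 9.7x and 8.1x:

```python
# Speedup ratios computed from the quoted `time` output.
# Numerator: sqlfluff wall time; denominator: sqruff wall time.
single_file = 0.398 / 0.041   # one file
directory   = 6.312 / 0.784   # whole fixtures directory
print(f"single file: {single_file:.1f}x, directory: {directory:.1f}x")
# → single file: 9.7x, directory: 8.1x
```

Note that on user (CPU) time alone the gap is much smaller; most of the directory-level win comes from sqruff spreading work across cores.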


Appreciate the feedback! We'll keep this in mind.

There is nothing to host/provision, so it's simple in that sense. You just run it locally with your credentials and connect directly to your database.

It's definitely not the easiest to set up, especially for teams, so we'll keep that in mind.


Superset is a beautiful tool focused on self-serve with amazing visualizations. I won't take anything away from them!

Our thesis is that self-serve is much less important than people think, and we find people often make a mess of never-ending dashboards. Current BI tools struggle to prevent that. We solve this problem with a core of software engineering practices.


If you're targeting use within software engineering teams, that thesis may be right. If you're targeting adoption across whole businesses, I think the thesis is pretty wrong and will end up hampering adoption. To broadly bucket BI challenges: first there's the challenge of getting people to use the thing, then the challenges that come when everyone is using the thing. Tech types seem to underrate the challenge of getting people to even use a BI tool in the first place.

I've found self-serve to be a really effective tool for driving engagement with BI. My onboarding for new non-tech BI users was always to have them build a basic dashboard for the business process they were most focused on, maybe set an alert or create a scheduled report delivery. By the end of a 15- or 30-minute onboarding session, you'd see the click as they realized what they could do with it.

That mess of never-ending dashboards has another name: BI engagement. Though a product can help, having core dashboards and KPIs is a social and analytics-leadership problem, not a technical one.

Though I have issues with Looker (their dev experience is crappy), their approach to this is effective: make it difficult for self-serve users to get incorrect or nonsense answers, and make it easy for analytics admins to designate core dashboards and jockey a few hundred custom dashboards and reports as the underlying data models change. Every business unit got pretty attached to what they'd built for themselves.


You're spot on that BI adoption is largely a social challenge. Our thesis is that by defining the entire journey from source to viz as code, we create a structured foundation that LLMs can build upon, democratizing access to the transformation layer for non-engineers in a way that point-and-click BI tools can't.


Can you please elaborate on how you see LLMs building upon this model/journey?


LLMs would generate the code/definitions underlying these dashboards; presumably a model could be trained for the task. I'll argue it trades one version of the sprawl problem for another: unless this generated code is easy to debug and comprehend alongside other generated code, it will still be a spaghetti mess at scale.


While that's somewhat true, our CLI can push transformations back to your warehouse. We and some of our customers use Quary for their "data warehouse purposes" as well. We think the integrated flow makes the E2E experience very quick.


Ben here from Quary.

We love Grafana! It's fab for building dashboards, but it's focused on dashboarding/alerts and on pulling from various data sources, not just SQL.

Quary is purely focused on SQL, and crucially, it allows you to build up and develop more complex transformations.
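The "build up more complex transformations" idea — each model is a SELECT that can reference the models before it — can be sketched tool-agnostically with plain SQL views. The table and view names here are illustrative, with SQLite standing in for a real warehouse:

```python
import sqlite3

# Sketch of layered SQL transformations: each "model" is a view built
# on top of the previous layer — the pattern that tools like Quary
# manage as version-controlled SQL files.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE raw_orders (id INTEGER, amount REAL, status TEXT);
INSERT INTO raw_orders VALUES
    (1, 10.0, 'paid'), (2, 5.0, 'refunded'), (3, 7.5, 'paid');

-- staging model: clean and filter the raw source
CREATE VIEW stg_orders AS
SELECT id, amount FROM raw_orders WHERE status = 'paid';

-- mart model: aggregate on top of the staging model
CREATE VIEW orders_summary AS
SELECT COUNT(*) AS n_orders, SUM(amount) AS revenue FROM stg_orders;
""")
print(conn.execute("SELECT n_orders, revenue FROM orders_summary").fetchone())
# → (2, 17.5)
```

Because each layer only references the layer below it, changes to the raw source can be absorbed in the staging model without touching every dashboard query downstream.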


thanks!


Hey, Ben here from Quary. Very valid comments like the one copied below made us rethink our strategy a little. We want to be open source, but think we need a little protection.

"Hate to derail the conversation, but is Quary something I could easily whitelabel to embed BI into my product for my customers? (Passively) looking for solutions in that that don't feel dumbed down."


You mean protection as in protection from intellectual property (patent) lawsuits?


Yep, I meant protection in terms of intellectual property.

