Hacker News | kalendos's comments

You can use your ChatGPT subscription with Pi!


Oh wow! No way! Thank you!


Totally! I've been using this pattern a lot and recently wrote about it.

https://davidgasquez.com/community-level-open-data-infrastru...


Thank you for this excellent post! I've been developing [my own platform](https://github.com/MattTriano/analytics_data_where_house) that curates a data warehouse, mostly of Census and Socrata datasets, but I haven't really had a good way to share the products with anyone as it's a bit too heavyweight. I've been trying to find alternative solutions to that issue (I'm currently building out a much smaller [platform](https://github.com/MattTriano/fbi_cde_data) to process the FBI's NIBRS datasets), and your post has given me a few great implementations to study and experiment with.

Thanks!


Nice! I know a couple of projects that have been using this pattern.

- https://bsky.app/profile/jakthom.bsky.social/post/3lbarcvzrc...

- https://bsky.app/profile/jakthom.bsky.social/post/3lb4y65z24...

- https://skyfirehose.com

Love this distribution pattern. Users can go to the Parquet files or attach to your "curated views" on a small DuckDB database file.


Perhaps the most interesting thing I've done is to teach Cursor Agent some tricks like using `uvx` / `npx`, or searching the web with `ddgs`.

Wrote about it recently (https://davidgasquez.com/cursor-agent-tricks/), although it is now outdated! You can find the latest setup in my Dotfiles repository (https://github.com/davidgasquez/dotfiles/tree/main/cursor).


You might need to adjust the filters to do an apples-to-apples comparison.

https://benchmark.clickhouse.com/#eyJzeXN0ZW0iOnsiQWxsb3lEQi...


It's not clear why someone would need to give up on the native DuckDB format if it is much faster.


Because it means you need to keep another copy of your data in a special format just for DuckDB. The point of Parquet is that it’s an open format queryable by multiple tools. You don’t need to wait to load every table into a new format, you don’t need to retain multiple copies, and you don’t need to keep them in sync.

If DuckDB is the only query engine in your analytics stack, then it makes sense to use its specialized format. But that’s not the typical Lakehouse use case.


> But that’s not the typical Lakehouse use case.

That benchmark is also not a typical lakehouse use case: all the data is hosted locally, so it doesn't test a significant component of the stack.


Yeah, that’s one of many issues with ClickBench. It’s also a single table, so it can’t test joins.

TPC-H is okay but not Lakehouse-specific. I’m not aware of any benchmarks that specifically test the performance of engines under common setups like external storage or scalable compute. It would be hard to design one that’s easily reproducible. (And in fairness to ClickBench, it’s intentionally simple for that exact reason: to generate a baseline score for any query engine that can query tabular data.)


I can only imagine. Many ETLs are already messy in companies with better tooling and processes.

Would love to read more about your experience with Open Data. Any place where I can reach out?


Here's something about ShotSpotter data in Chicago: https://x.com/foiachap/status/1775296597850480663

And this one makes some rounds: https://mchap.io/that-time-the-city-of-seattle-accidentally-...

Feel free to reach out!


I've been working with this stack (building Open Data Portals¹) for a few months and am super happy with how well everything plays together.

¹ https://github.com/davidgasquez/gitcoin-grants-data-portal


Hi, I am working on something similar and was looking for ways to host my open data. The approach seems interesting. Can I reach out to you somewhere to discuss this further?


Sure! You can find my contact details on GitHub: https://github.com/davidgasquez.


I keep a list of public company handbooks here: https://publish.obsidian.md/davidgasquez/Company+Handbooks.


David, out of curiosity, what is the link about (besides handbooks)?

Is it just a collection of random stuff? Apologies, I did not dig around sufficiently.

p.s: I found this interesting: https://publish.obsidian.md/davidgasquez/Rationality . Somebody tried to explain what 'rationality' is, without tossing around the word loosely as most people do.


> Is it just a collection of random stuff?

Indeed. It is mostly a way for me to keep track of interesting things and have a quick way to re-learn something.

> I found this interesting: https://publish.obsidian.md/davidgasquez/Rationality . Somebody tried to explain what 'rationality' is, without tossing around the word loosely as most people do.

That note should be named "Thinking", to be honest. The goal is to collect a bunch of bullet points that I can read at any given time to remind me how to think better and which common gotchas to avoid.


Not oc but this appears to be a reference to rationality as practiced by the "rationalist movement". You can read more about it at these places:

https://www.lesswrong.com/tag/rationalist-movement https://www.lesswrong.com/posts/46qnWRSR7L2eyNbMA/the-lens-t...

I like the definition "rationalism is the belief that Eliezer Yudkowsky is the rightful caliph."


Great link. Highly recommend adding Material UI's handbook - it's exceptionally well put together. https://mui-org.notion.site/Handbook-f086d47e10794d5e839aef9...


This whole site has a lot of fascinating links generally. Thanks for sharing this


It really does. I may have to start doing something like this myself, it's got a different feel than a blog.


David, I'm going down the rabbit hole of this today and have already discovered some cool nuggets. Thanks for sharing.


Wow thanks for sharing! What a well-organized note.


Thank you so much!


They go by several names. I like to call mine a Personal Handbook[1], but I've also seen this referred to as a Digital Garden or a Personal Knowledge Base.

I keep a list of open knowledge bases here: https://publish.obsidian.md/davidgasquez/Personal+Handbooks

[1] https://publish.obsidian.md/davidgasquez/


This is mine: https://publish.obsidian.md/davidgasquez

Recently moved to Obsidian.md and couldn't be happier!

