
I was hoping Bash would go away or get replaced at some point. It's starting to look like it's going to be another 20 years of Bash but with AI doodads.


Nushell scratches the itch for me 95% of the time. I haven't yet convinced anybody else to make the switch, but I'm trying. I also haven't yet fixed the most problematic bug for my usage, but I'm working on that too.

What are you doing to help kill bash?


How fast is it?


I've never had that great of a memory. The upside is that you can have a bad memory and good note-taking skills and be more effective than the 'good memory' people. Really, it's just that I forget in a day what other people forget in a week, so it's not that big of a gap. But some considerations:

1. Put everything in the issue tracker that you can. This includes notes on what actually happened when you did the work. Include technical details.

2. Try to push everyone else to use the issue tracker. It also makes you sound like the professional in the room.

3. Have a very lightweight note-taking mechanism and use it as much as possible. I'm good at vim, so I use the Voom plugin (which just treats markdown headings as an outline, but that's enough to store a ton of notes in a single .md file). Don't try to make these notes good enough to share, as that adds too much overhead.

4. Always take your own notes in a meeting.

5. I will revisit my notes on a project from time to time, and sometimes walk through all of them, but I'm not really treating them like flashcards to memorize. I'm just looking for things that might need some renewed attention. Same with the backlog.

6. In general, I don't try to improve my memory because I don't know what I need to know for a week vs. what I won't look at again for a year. So I focus on being systematic about having good-enough notes on everything and don't really expect to remember anything. (I do remember some things but it's random.)


> Have a very lightweight note-taking mechanism and use it as much as possible... Don't try to make these notes good enough to share, as that adds too much overhead.

Second this. I use Sublime Text almost exclusively for this purpose. I have one file called daily_notes.md that has everything from meeting notes to formal writing to pasted error messages and code.

Each day gets an h1 but that is the extent of formal organization. I’m actually decently organized (at work, at least) but the simplicity is all about lowering the overhead of jotting stuff down. Keeping everything in one doc makes for very easy search.

Otherwise, I try to write reminders right away with whatever is handy. Mainly: Post-its, slack reminders, and Gmail scheduled sends to myself.


Yes and: My life mgmt project notebook also has a habit tracker section, for all the life maintenance stuff.

Inspired by Seinfeld's "don't break the chain" calendar, but a lot more information-dense. It's a big grid: tasks by day of the month.

I make a hash mark for every completed task. The boxes are big enough for multiple hashes (e.g. walking the dog 2x daily) and for entering values (e.g. body weight).


And the implication is that the 'quality' of engineers at the two companies is actually reversed: the top performers at Dropbox are the ones struggling and leaving, while at FANG it's the underperformers who are struggling and leaving.


Another nuisance is that unencrypted port 80 must be open to the outside world to do the ACME negotiation (LE servers must be able to talk to your ACME client running at the subdomain that wants a cert). They also intentionally don't publish a list of IPs that Let's Encrypt might be coming from [1]. So opening firewall ports on machines that are specifically internal hosts has to be part of any renewal scripts that run every X days. Kinda sucks IMO.

[1] https://letsencrypt.org/docs/faq/#what-ip-addresses-does-let...

UPDATE: Apparently there is a DNS based solution that I wasn't aware of.


As these are internal hostnames, you're probably doing a DNS-01 challenge rather than HTTP-01. With DNS-01 you don't need to open up any ports for incoming HTTP connections; you just need to place a TXT record in the DNS for the domain.
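
If you want to sanity-check that record from Python before the CA does (a minimal sketch using the dnspython package; the domain is a made-up placeholder):

    import dns.resolver

    # DNS-01: the CA looks for a TXT record at _acme-challenge.<domain>
    # holding the value your ACME client computed. No inbound firewall
    # ports need to be open for this.
    name = "_acme-challenge.internal.example.com"
    for rdata in dns.resolver.resolve(name, "TXT"):
        print(rdata.strings)  # should include the challenge value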


That's not true. You can validate domains using DNS-01, without exposing hosts.


And even with the HTTP challenge you don't have to expose the host directly; you can, e.g., copy the challenge response to a public webserver from the internal host or from a coordinator server.


But it is possible to have initial certificates without opening anything: https://gruchalski.com/posts/2021-06-04-letsencrypt-certific...

From there, it’s possible to use HTTPS negotiation.


This looks kind of interesting. I might try this. Thanks.


Only true if you're using HTTP validation. Use DNS validation instead and this isn't an issue.


Fair enough. Although that seems rather complicated for those of us just trying to get a quick cert for an internal host. The LetsEncrypt forums are full of this discussion:

[1] https://community.letsencrypt.org/t/whitelisting-le-ip-addre...
[2] https://community.letsencrypt.org/t/whitelist-hostnames-for-...
[3] https://community.letsencrypt.org/t/letsencrypt-ip-addresses...


There are lots of simple things that are normally easier to do in the web framework that suddenly become easier to do in the database (with the side effect that DB optimizations get much easier too).

But the other consideration is that you likely need to do a lot with a reverse proxy like Traefik to have much control over what you are really exposing to the outside world. PostgREST is not Spring; it doesn't give you explicit control over every little thing, so you're likely to need something in front of it. Anyway, the point is that having a simple Flask server with a few endpoints running wouldn't complicate the architecture very much, b/c you are better off with something in front of it doing routing already (and SSL termination, etc.).
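
To make that concrete, here's a minimal sketch of the kind of side service I mean (the endpoints and the /misc prefix are made up; the real split between PostgREST and this service would live in the reverse proxy config):

    from flask import Flask, jsonify

    app = Flask(__name__)

    # Hypothetical endpoints for odd jobs PostgREST doesn't cover;
    # the reverse proxy would route /api/* to PostgREST and /misc/* here.
    @app.route("/misc/health")
    def health():
        return jsonify(status="ok")

    @app.route("/misc/export/<int:report_id>")
    def export(report_id):
        # e.g. kick off a long-running export that doesn't map to a table
        return jsonify(report_id=report_id, queued=True)

    if __name__ == "__main__":
        app.run(port=5000)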


I'm on a POC project that's using PostgREST, and it's been extremely fast to get a big, complicated data model working with an API in front of it. But I don't really get how to use this thing in production. What does devops look like? Do you have sophisticated db migrations with every deploy? Is all the SQL in version control?

I also don't really get where the users get created in postgres that have all the row-level permissions. The docs are all about auth for users that are already in there.


This is my personal experience with using PostgREST (I haven't had the full supabase experience yet):

> What does devops look like?

I usually spin PostgREST workers up in some kind of managed container service, like Google Compute Engine. PostgREST is stateless, so other than upgrades, you never really need to cycle the services. As for resources, PostgREST is extremely lean; I usually try to run 4 to 8 workers per gigabyte of RAM.

> Do you have sophisticated db migrations with every deploy?

You can use whatever migration tool you want. Sqitch is quite popular. I've even worked on projects that were migrated by Django while PostgREST did the API service.

> Is all the SQL in version control?

Yes, this is a good approach, but it means needing a migration tool to apply the migrations in the right order. This is what Sqitch does, and many ORM-ish libraries have migrations sort of half-baked in.

It's worth noting that, because many of the objects PostgREST deals with are views, which have no persistent state, the migration of the views can be decoupled from the migration of persistent objects like tables. Replacing a view (with CREATE OR REPLACE VIEW) can be done very quickly without locking tables, as long as you don't change the view's schema.
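
As a sketch of what that decoupled view deploy can look like from a migration script (using psycopg2; the schema, table, and column names are hypothetical):

    import psycopg2

    # Redeploy just the API-facing view, independent of table migrations.
    conn = psycopg2.connect("dbname=app")
    with conn, conn.cursor() as cur:  # commits on success
        cur.execute("""
            CREATE OR REPLACE VIEW api.orders AS
            SELECT id, customer_id, total
            FROM private.orders;
        """)
    conn.close()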


In Supabase we use a separate Auth server [0]. This stores the user in an `auth` schema, and these users can log in to receive a JWT. Inside the JWT is a "role", which is, in fact, a PostgreSQL role ("authenticated") that has certain grants associated with it, plus the user ID (a UUID).

Inside your RLS Policies you can use anything stored inside the JWT. My cofounder made a video [1] on this which is quite concise. Our way of handling this is just an extension of the PostgREST Auth recommendations: https://postgrest.org/en/v9.0/auth.html

[0] Auth server: https://github.com/supabase/gotrue

[1] RLS Video: https://supabase.com/docs/learn/auth-deep-dive/auth-row-leve...
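
As a rough sketch of the shape of such a token (using PyJWT; the claims and secret here are placeholders, not Supabase's exact schema):

    import jwt  # PyJWT

    # "role" must match a PostgreSQL role with the right grants;
    # RLS policies can read the other claims (e.g. the user's UUID).
    claims = {
        "role": "authenticated",
        "sub": "00000000-0000-0000-0000-000000000000",  # placeholder user id
    }
    token = jwt.encode(claims, "your-jwt-secret", algorithm="HS256")
    print(token)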


That's what I want... this would force me to make a different account for every topic I might comment or post on, and they can have their own local networks. If it's a topic that I know a lot about (e.g. what I do at my day job), it would force a fresh start every few years.

This is in contrast to my Twitter account, which is such a mess that I don't like posting, b/c "most" people who will see it followed me for some other topic.


Ok but how do I know I should trust Cure53?


Cure53 has a pretty solid track record [1] and has some smart people working for them [2]. I get your concern, but considering this, I'd trust them.

[1]: https://cure53.de/#publications

[2]: https://cure53.de/#team


Slight tangent, but that's a cool-looking website. Gives off a mid-'00s private tracker vibe. Loads pretty fast too.


Read the report and see if the findings are super basic or super advanced. If you can't tell, then this audit report is not of value to you, similar to how my mom has no use for open source software yet I would still say it's valuable to have open source software in the world.


Because they have an excellent reputation and do good work.


It's turtles all the way down.


How do you make PNG encoding much faster? I'm working with large medical images, and after a bit of work we can do all the needed processing in under a second (numpy/scipy methods). But then the encoding to PNG takes 9-15 seconds. As a result we have to pre-render all possible configurations and put them on S3, b/c we can't do the processing on demand in a web request.

Is there a way to use multiple threads or a GPU to encode PNGs? I haven't been able to find anything. The images are 3500x3500px and compress from roughly 50MB to 15MB with maximum compression (so don't say to use lower compression).


I've spent some time on this problem -- classic space vs. time tradeoff. Usually if you're spending a lot of time on PNG encoding, you're spending it compressing the image content. PNG compression uses the DEFLATE format, and many software stacks leverage zlib here. It sounds like you're not simply looking to adjust the compression level (space vs. time balance), so we'll skip that.

Now zlib specifically is focused on correctness and stability, to the point of ignoring some fairly obvious opportunities to improve performance. This has led to frustration, and this frustration has led to performance-focused zlib forks. The guys at AWS published a performance-focused survey [1] of the zlib fork landscape fairly recently. If your stack uses zlib, you may be able to find a way to swap in a different (faster) fork (a quick way to verify the swap is sketched below). If your stack does not use zlib, you may at least be able to find a few ideas for next steps.

[1] https://aws.amazon.com/blogs/opensource/improving-zlib-cloud...
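
If you do swap in a fork (e.g. zlib-ng built in compat mode and loaded via LD_PRELOAD), it's worth verifying the running process actually picked it up; from Python that's a two-liner:

    import zlib

    # Compile-time vs. runtime zlib versions. With a fork like zlib-ng
    # (compat mode) preloaded, the runtime string changes, e.g. to
    # something like "1.2.11.zlib-ng".
    print("compiled against:", zlib.ZLIB_VERSION)
    print("running with:", zlib.ZLIB_RUNTIME_VERSION)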


I have no experience in PNG encoding, but I found https://github.com/brion/mtpng. The author mentions "It takes about 1.25s to save a 7680×2160 desktop screenshot PNG on this machine; 0.75s on my faster laptop.", which makes me think your slower performance on smaller images comes either from using the max compression setting or from hardware with worse single-threaded performance.

Although these don't directly solve the PNG encoding performance problem, maybe some of these ideas could help?

* if users will be using the app in an environment with plenty of bandwidth and you don't mind paying for server bandwidth, could you serve up PNGs with less compression? Max compression takes 15s and saves 35MB. If the users have 50Mbit internet, it only takes 5.6s to transmit the extra 35MB, so you could come out roughly 10s ahead by not compressing. (Yes, I see your comment about "don't say to use lower compression", but there's no reason to be killed by compression CPU cost if the bandwidth is available.)

* initially show the user a lossy image (could be a downsized PNG; see the sketch after this list) that can be generated quickly. You could then upgrade to full quality once you finish encoding the PNG, or, if server bandwidth/CPU usage is an issue, only upgrade if the user clicks a "high-quality" button or something. If server CPU usage is an issue, the low-then-high-quality approach could let you turn down the compression setting and save some CPU at the cost of bandwidth and user latency.
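
Assuming a Python/Pillow stack (an assumption; file names and sizes below are placeholders), the preview idea might look like:

    from PIL import Image

    # Fast preview first, full-quality PNG in the background.
    img = Image.open("slice_raw.tiff")

    preview = img.copy()
    preview.thumbnail((875, 875))  # quarter the linear size of 3500px
    preview.save("slice_preview.png", compress_level=1)  # fast encode

    img.save("slice_full.png", compress_level=9)  # slow, max compression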


Are you required to use PNG, or could you save the files in an alternative lossless format like TIFF [1]? If you're stuck with PNG, mtpng [2], mentioned earlier, seems to be significantly faster with multithreading (>40% reduction in encoding times). If you're publishing for the web, TIFF or WebP (via cwebp) might also be possibilities, with the -mt (multithreading) and -q 25 (lower compression and a larger file size, but faster) flags, or an experimental GPU implementation [3]. (A rough Pillow sketch of these options follows the links.)

[1] https://blender.stackexchange.com/questions/148231/what-imag...

[2] https://github.com/brion/mtpng

[3] https://emmaliu.info/15418-Final-Project/
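
If the stack is Python/Pillow (an assumption), both options are roughly a one-line save call; note the WebP path only fits if squeezing 16-bit data down to 8 bits is acceptable:

    import numpy as np
    from PIL import Image

    # Stand-in for a real 16-bit grayscale slice.
    arr = np.random.randint(0, 2**16, (3500, 3500), dtype=np.uint16)

    # Lossless TIFF with deflate compression (needs Pillow built with libtiff):
    Image.fromarray(arr).save("slice.tiff", compression="tiff_deflate")

    # Lossless WebP; Pillow's WebP encoder wants 8-bit modes:
    img8 = Image.fromarray((arr >> 8).astype(np.uint8))
    img8.save("slice.webp", lossless=True, method=4)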


GPGPU is the way to go.

Not terribly hard if you only need 1-2 formats supported, e.g. RGBA8 only. You don't need to port the complete codec, only some initial portion of the pipeline, and then stream the data back from the GPU; the last steps, the lossless compression of the stream, aren't a good fit for GPUs.

If you want the code to run on a web server, then after you debug the encoder your next problem is where to deploy. Nvidia Teslas are frickin' expensive. If you wanna run on public clouds, I'd consider their VMs with AMD GPUs.


Thanks, I hadn't heard of that and I will look into it. This is a research setting with plenty of hardware we can request and not a huge number of users so that part doesn't worry me.


> This is a research setting with plenty of hardware we can request and not a huge number of users

If you don't care about cost of ownership, use CUDA. It only runs on Nvidia GPUs, but the API is nice. I like it better than vendor-agnostic equivalents like DirectCompute, OpenCL, or Vulkan Compute.


I solved a similar problem last year. As others have said, your bottleneck is the compression scheme that PNG uses. Turning down the level of compression will help. If you can build a custom intermediate format, you'll see huge gains.

Here's what that custom format might look like.

(I'm guessing these images are grayscale, so the "raw" format is uint16 or uint32.)

First, take the raw data and delta-encode it. This is similar to PNG's concept of "filters": little preprocessors that massage the data a bit to make it more compressible. Then, since most of the compression algorithms operate on unsigned ints, you'll need to apply zigzag encoding (this is superior to allowing integer underflow, as benchmarks will show).

Then take a look at some of the dedicated integer compression algorithms. Examples: FastPFor (or TurboPFor), BP32, Snappy, Simple8b, and good ol' run-length encoding. These are blazing fast compared to gzip.

In my use case, I didn't care how slow compression was, so I wrote an adaptive compressor that would try all compression profiles and select the smallest one.

Of course, benchmark everything.
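
A minimal numpy sketch of the delta + zigzag steps described above (the integer compressor at the end is whichever one benchmarks best for you):

    import numpy as np

    def delta_zigzag(img: np.ndarray) -> np.ndarray:
        """Delta-encode a grayscale image, then zigzag-map the signed
        deltas to unsigned ints for an integer compressor."""
        flat = img.astype(np.int64).ravel()
        deltas = np.diff(flat, prepend=0)  # first output = first sample
        # Zigzag: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
        # (>> is an arithmetic shift on signed numpy ints)
        return ((deltas << 1) ^ (deltas >> 63)).astype(np.uint64)

    # e.g. feed the result to python-snappy:
    #   import snappy; blob = snappy.compress(delta_zigzag(img).tobytes())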


> Is there a way to use multiple threads or GPU

Maybe you could write the PNG without compression, compress chunks of the image in parallel using 7z, then reconstitute and decompress on the client side.
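
A sketch of that idea using zlib and a process pool instead of 7z (chunk count is arbitrary; the client decompresses the chunks and concatenates them in the same order):

    import zlib
    from multiprocessing import Pool

    def compress_chunk(chunk: bytes) -> bytes:
        return zlib.compress(chunk, level=6)

    def parallel_compress(raw: bytes, n_chunks: int = 8) -> list:
        # Split the uncompressed pixel data into roughly equal chunks
        # and compress them in parallel.
        size = -(-len(raw) // n_chunks)  # ceiling division
        chunks = [raw[i:i + size] for i in range(0, len(raw), size)]
        with Pool() as pool:
            return pool.map(compress_chunk, chunks)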


This is on our list of possibilities. It would take a little more time than I'd like to spend on this problem but it would work.


I would also be interested in knowing the answer to this. Currently we use OpenSeadragon to generate a map tiling of whole-slide images (~4 GB per image), then stitch together and crop tiles of a particular zoom layer to produce PNGs of the desired resolution.


I'm unsure if this will help, but the new image format JPEG XL (.jxl) is coming soon to replace JPEG. It has both lossless and lossy modes. It claims to be faster than JPEG.

Another neat feature is that it's designed to be progressive, so you could host a single 10MB original file, and the client can download just the first 1MB (up to the quality they are comfortable with).

Take a look: https://jpegxl.info/


This is a research university that moves very slowly, so waiting two years for something better is actually a possibility (and pre-rendering to S3 works OK for now). I'll keep this bookmarked.


Since this is Python, which encoder are you using? I'd make sure it's in C, not Python. You might also be spending a lot of time converting numpy arrays to Python arrays.
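
For example, if Pillow is the encoder (an assumption), going straight from the numpy array keeps the conversion out of Python:

    import numpy as np
    from PIL import Image

    arr = np.zeros((3500, 3500), dtype=np.uint16)  # stand-in for real data

    # Image.fromarray reads the numpy buffer directly (no per-pixel
    # Python-level conversion), and Pillow's PNG encoder runs in C.
    Image.fromarray(arr).save("slice.png", compress_level=6)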


Also check out FPGA cards (ask Xilinx, Altera/Intel, ...).

