
S3 has much bigger fish in its sights than the measly vector db space. If you look at the subtle feature improvements S3 has received in recent years, it's clear as day, at least to me, that they're going after the whale that is Databricks. And they're doing it the best way possible - slowly and silently eating away at their moat.

AWS Athena hasn't received as much love for some reason. In the next two years I expect major updates and/or improvements. They should kill off Redshift.


> … going after the whale that is Databricks.

Databricks is tiny compared to AWS, maybe 1/50th the revenue. But they’re both chasing a big and fast-growing market. I don’t think it’s so much that AWS is going after Databricks as that Databricks happens to be in a market that AWS is interested in.


I agree, Databricks is one of many in the space. If S3 makes Databricks redundant, then it makes others like Databricks redundant too.


I just did a wget of the site and noticed the following line at the end.

> <script async src="https://www.googletagmanager.com/gtag/js?xxxxxxx"></script>

I am going to use this for sure, but it is a little ironic.


I'd say that's where we're headed. A big model that's trained from the start to use tools: to know when to reach for a given tool and how to use it. Like us :)

I wouldn't be surprised if someone's building a dataset for tool use examples.

The newer gen reasoning models are especially good at knowing when to do web search. I imagine they'll slowly get better at other tools.

At current levels of performance, LLMs having the ability to get well curated information by themselves would increase their scores by a lot.


To be fair, I'd put finding literal string diffs in the category of asking LLMs to do rote arithmetic.

The attention mechanism does far too much complex thinking for such a dumb task. This is precisely where you need to dumb down and focus and be disciplined rather than do high level next token prediction.

You'd benefit from actually asking the LLM to list the full document and compare, kind of like reasoning, and similar to how LLMs perform better when they break down arithmetic or algebra tasks into smaller steps.

Also, my guess would be that the models that perform well are MoE models, where there may be an Expert or two that does well on tasks that need focus rather than intuition. So without knowing anything about Gemini Flash, my guess would be that it's an MoE model.


As far as I can tell, the paper covers text documents only. Therefore your example doesn't quite apply.

It is well known that LLMs have a ways to go when it comes to processing images like they process text or audio.

I don't think there's any well-performing multimodal model that accepts image pixels directly. Most vision capabilities are hacks or engineered in. An image undergoes several processing steps and each processor's outputs are fed to the transformer as tokens. This may happen in one network, but there are non-transformer networks involved. Examples of preprocessing:

* OCR

* CNNs (2D pattern recognizers) with different zooms, angles, slices, etc.

* Others maybe too?
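
For what it's worth, here's a minimal numpy sketch (purely illustrative, not any particular model's actual pipeline) of what I mean by preprocessing outputs being fed to the transformer as tokens: slice the image into patches and flatten each patch into a vector that the model would consume as one "token".

    import numpy as np

    def image_to_patch_tokens(image: np.ndarray, patch: int = 16) -> np.ndarray:
        # Cut the image into patch x patch tiles and flatten each tile
        # into a single vector ("token") for a downstream transformer.
        h, w, c = image.shape
        h, w = h - h % patch, w - w % patch  # drop ragged edges
        tokens = (
            image[:h, :w]
            .reshape(h // patch, patch, w // patch, patch, c)
            .transpose(0, 2, 1, 3, 4)
            .reshape(-1, patch * patch * c)
        )
        return tokens  # shape: (num_tokens, patch*patch*c)

    print(image_to_patch_tokens(np.zeros((224, 224, 3), dtype=np.float32)).shape)  # (196, 768)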


> Log-linear attention replaces the fixed-size hidden state with a logarithmically growing set of hidden states

Does this mean the models can be smaller too (on top of the primary benefit of being faster)?


Reduced memory consumption for context perhaps, but hidden state is different from weights. I don't think this would improve the model's capability per model parameter (but as with everything in ML, I wouldn't bet against it until it's been tested).
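
To make the distinction concrete, a back-of-the-envelope sketch (the state size here is made up; this only shows how per-context memory scales, none of which touches the weight/parameter count):

    import math

    def context_memory_entries(n_tokens: int, state_size: int = 1024) -> dict:
        # Rough scaling comparison of per-layer context memory, in "entries".
        return {
            "full attention KV cache": n_tokens * state_size,             # O(n)
            "linear attention fixed state": state_size,                   # O(1)
            "log-linear attention": (math.floor(math.log2(n_tokens)) + 1) * state_size,  # O(log n)
        }

    for n in (1_000, 100_000, 10_000_000):
        print(n, context_memory_entries(n))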


Don't take this the wrong way, your opinion is also vibes.

Let's ground that a bit.

Have a look at ARC AGI 1 challenge/benchmark. Solve a problem or two yourself. Know that ARC AGI 1 is practically solved by a few LLMs as of Q1 2025.

Then have a look at the ARC AGI 2 challenge. Solve a problem or two yourself. Note that as of today, it is unsolved by LLMs.

Then observe that the "difficulty" of ARC AGI 1 and 2 for a human is roughly the same, but challenge 2 is much harder for LLMs than 1.

ARC AGI 2 is going to be solved *within* 12 months (my bet is on 6 months). If it's not, I'll never post about AI on HN again.

There's only one problem to solve, i.e. "how to make LLMs truly see like humans do". Right now, any vision-based features the models exhibit come from maximizing the use of engineering (i.e. applying CNNs on image slices, chunks, maybe zooming and applying OCR, vector search, etc.); it isn't vision like ours and isn't a native feature for these models.

Once that's solved, then LLMs or a new algo will be able to use a computer perfectly by feeding it screen captures. End of white collar jobs 2-5 years after (as we know it).

Edit - added "(as we know it)". And fixed missing word.


Speaking of vibes.

As long as AI is guessing answers based on what it has seen before, it's not happening.

I'm sorry. It doesn't matter how many bazillions you would cash in if it did, still not happening.

It's all wishful thinking.


I thought to myself, imagine something you've never imagined before. My first thought was: what if there is a universe inside of every vegetable that is vegetable-themed, with anthropomorphic vegetable characters, where all the atoms and molecules are somehow veggified and everything is a vegetable. And then I wondered if an AI could ever come up with that, with infinite time and resources, without a prompt, and then I thought about monkeys and typewriters.


If you listen to interviews with Francois, it'll be clear to you that "vision" in the way you refer to it has very little to do with solving ARC.

And more to do with "fluid, adaptable intelligence, that learns on the fly"


That's fair. I care about the end result.

The problem is about taking in information in 2D/3D space and solving the problem. Humans solve these things through vision. LLMs or AI can do it using another algorithm and internal representation that's way better.

I spent a long time thinking about how to solve the ARC AGI 2 puzzles "if I were an LLM" and I just couldn't think of a non-hacky way.

People who're blind use braille or touch to extract 2D/3D information. I don't know how blind people represent 2D/3D info once it's in their brain.


>AI can do it using another algorithm and internal representation that's way better

AI famously needs a boat load of energy and computation to work. How would you describe that as "way better" than a human brain that will be able to solve them faster, with practically zero energy expenditure?


>I'll never post about AI on HN again

Saving this. One less overconfident AI zealot, the better.


I've read the link and the GitHub readme page.

I'm sure I'm in the top 1% of software devs for number of timestamps parsed. [1]

DST is not the problem in Python; parsing string timestamps is. All libraries are bad at it, including this one, except Pandas. Pandas does great at DST too, btw.

And I'm not shilling for Pandas either. I'm a Polars user who helicopters Pandas in whenever there's a timestamp that needs to be parsed.

Pandas has great defaults. Here are the string timestamps I expect to be parsed by default (there's a quick pandas sketch after the list). I'm willing to pass a timezone in case of naive timestamps:

* All ISO 8601 formats and all its weird mutant children that differ by a tiny bit.

* 2025-05-01 (parsed not as date, but as timestamp)

* 2025-05-01 00:00:00 (or 00.0 or 00.000 or 0.000000 etc)

* 2025-05-01 00:00:00z (or uppercase Z or 00.0z or 00.000z or 0.000000z)

* 2025-05-01 00:00:00+02:00 (I don't need this converted to some time zone. Store offset if you must or convert to UTC. It should be comparable to other non naive timestamps).

* 2025-03-30 02:30:00+02:00 (This is a non existent timestamp wrt European DST but a legitimate timestamp in timestamp representation, therefore it should be allowed unless I specify CET or Europe/Berlin whatever)

* There are other timestamp formats that are non-standard but obvious. Allow for a Boolean parameter called accept_sensible_string_parsing and then parse the following:

  * 2025-05-01 00:00 (HH:mm format)

  * 2025-05-01 00:00+01:00 (HH:mm format)
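
A quick pandas sketch of the behaviour I mean (assuming a recent pandas 2.x; each string below parses individually with pd.to_datetime's defaults):

    import pandas as pd

    samples = [
        "2025-05-01",                  # date-only, parsed as a timestamp
        "2025-05-01 00:00:00",
        "2025-05-01 00:00:00.000",
        "2025-05-01 00:00:00Z",        # UTC
        "2025-05-01 00:00:00+02:00",   # fixed offset, comparable to other aware timestamps
        "2025-05-01 00:00",            # HH:mm
    ]
    for s in samples:
        print(s, "->", repr(pd.to_datetime(s)))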

[1] It's not a real statistic, it's just that I work with a lot of time series and customer data.

Disclaimer: I'm on the phone and on the couch so I wasn't able to test the lib for its string parsing before posting this comment.


Author here. It's indeed a hard problem to parse "All ISO 8601 formats and all its weird mutant children that differ by a tiny bit." Since the ISO standard is so expansive, every library needs to decide for itself what to support. The ISO standard allows all sorts of weird things, like 2-digit years, fractional months, disallowing -00:00 offset, ordinal days, etc.

Javascript's big datetime redesign (Temporal) has an interesting overview of the decisions they made [1]. Whenever is currently undergoing an expansion of ISO support as well, if you'd like to chime in [2].

[1] https://tc39.es/proposal-temporal/#sec-temporal-iso8601gramm... [2] https://github.com/ariebovenberg/whenever/issues/204#issueco...


Thanks for the reply and apologies for the general cynicism. It's not lost on me that it's people like you that build tools that make the work tick. I'm just a loud potential customer and I'm just forwarding the frustration that I have with my own customers onto you :)

Your customers are software devs like me. When we're in control of generating timestamps, we know we must use standard ISO formatting.

However, what do I do when my customers give me access to an S3 bucket with 1 billion timestamps in an arbitrary (yet decipherable) format?

In the GitHub issue you seem to have undergone an evolution from purity to pragmatism. I support this 100%.

What I've also noticed is that you seem to try to find grounding or motivation for "where to draw the line" from what's already been done in Temporal or Python stdlib etc. This is where I'd like to challenge your intuitions and ask you instead to open the flood gates and accept any format that is theoretically sensible under ISO format.

Why? The damage has already been done. Any format you can think of, already exists out there. You just haven't realized it yet.

You know who has accepted this? The Pandas devs (I assume; I don't know them). The following are legitimate timestamps under Pandas (2.2.x); a quick check script follows the examples:

* 2025-03-30T (nope, not a typo)

* 2025-03-30T01 (HH)

* 2025-03-30 01 (same as above)

* 2025-03-30  01 (two or more spaces is also acceptable)

In my opinion Pandas doesn't go far enough. Here's an example from real customer data I've seen in the past that Pandas doesn't parse.

* 2025-03-30+00:00 (this is very sensible in my opinion. Unless there's a deeper theoretical regex pattern conflicts with other parts of the ISO format)

Here's an example that isn't decipherable under a flexible ISO interpretation and shouldn't be supported.

* 2025-30-03 (theoretically you can infer that 30 is a day, and 03 is month. BUT you shouldn't accept this. Pandas used to allow such things. I believe they no longer do)
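
If you want to check those claims against your own pandas version (I'm not asserting the outcomes here, just showing how to test them):

    import pandas as pd

    candidates = [
        "2025-03-30T",
        "2025-03-30T01",
        "2025-03-30 01",
        "2025-03-30  01",    # two spaces
        "2025-03-30+00:00",
        "2025-30-03",        # should be rejected
    ]
    for s in candidates:
        try:
            print(repr(s), "->", pd.to_datetime(s))
        except ValueError as exc:
            print(repr(s), "-> rejected:", exc)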

I understand writing these flexible regexes or if-else statements will hurt your benchmarks and will be painful to maintain. Maybe release them under a new call like `parse_best_effort` (or even `youre_welcome`) and document pitfalls and performance degradation. Trust me, I'd rather use a reliable, generic but slow parser than spend hours writing a god-awful regex that I will only use once (I've spent literal weeks writing regexes and fixes in the last decade).

Pandas has been around since 2012 dealing with customer data. They have seen it all and you can learn a lot from them. ISOs and RFCs don't mean squat when it comes to timestamps. If possible, try to make Whenever useful rather than fast or pure. I'd rather use a slimmer, faster alternative to Pandas for parsing timestamps if one were available, but there isn't one at the moment.

If time permits I'll try to compile a non-exhaustive list of real-world timestamp formats and post it in the issue.

Thank you for your work!

P.S. seeing BurntSushi in the GitHub issue gives me imposter syndrome :)


Because you pinged me... Jiff also generally follows in Temporal's footsteps here. Your broader point of supporting things beyond the specs (ISO 8601, RFC 3339, RFC 9557, RFC 2822 and so on) has already been absorbed into the Temporal ISO 8601 extensions. And that's what Jiff supports (and presumably, whenever, although I don't know enough about whenever to be absolutely precise in what it supports). So I think the philosophical point has already been conceded by the Temporal project itself. What's left, it seems, is a measure of degree. How far do you go in supporting oddball formats?

I honestly do not know the answer to that question myself. But I wouldn't necessarily look to Pandas as the shining beacon on a hill here. Not because Pandas is doing anything wrong per se, but because it's a totally different domain and use case. On the one hand, you have a general purpose library that needs to consider all of its users for all general purpose datetime use cases. On the other hand, you have a data scienc-y library designed for trying to slurp up and make sense of messy data at scale. There may be things that make sense in the latter that don't in the former.

In particular, a major gap in your reasoning, from what I can see, is that constraints beget better error reporting. I don't know how to precisely weigh error reporting versus flexible parsing, but there ought to be some deliberation there. The more flexible your format, the harder it is to give good error messages when you get invalid data.

Moreover, "flexible parsing" doesn't actually have to be in the datetime library. The task of flexible parsing is not, in and of itself, overtly challenging. It's a tedious task that can be build on top of the foundation of a good datetime library. I grant that this is a bit of a cop-out, but it's part of the calculus when designing ecosystem libraries like this.

Speaking for me personally (in the context of Jiff), something I wouldn't mind so much is adding a dedicated "flexible" parsing mode that one can opt into. But I don't think I'd want to make it the default.
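
To illustrate what I mean by building it on top, a rough Python sketch using only the stdlib (the normalization rules are just the ones from this thread, not anything a real library ships):

    import re
    from datetime import datetime

    def parse_flexible(s: str) -> datetime:
        # Normalize a few of the oddball shapes discussed above, then hand
        # off to a strict ISO parser (datetime.fromisoformat, Python 3.11+).
        s = re.sub(r"\s+", " ", s.strip())                     # collapse repeated spaces
        s = re.sub(r"^(\d{4}-\d{2}-\d{2})T$", r"\1", s)        # "2025-03-30T" -> date only
        s = re.sub(r"^(\d{4}-\d{2}-\d{2})([+-]\d{2}:\d{2})$",  # "2025-03-30+00:00" -> midnight + offset
                   r"\1T00:00:00\2", s)
        return datetime.fromisoformat(s)

    print(parse_flexible("2025-03-30T"))
    print(parse_flexible("2025-03-30  01:00"))
    print(parse_flexible("2025-03-30+00:00"))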


Is there a terminology battle happening in some circles? And if so, what are the consequences of being wrong and using the wrong terminology?

I follow the R&D and progress in this space and I haven't heard anyone make a fuss about it. They are all LLMs or transformers or neural nets, but they can be trained or optimized to do different things. For sure, there are terms like Reasoning models or Chat models or Instruct models, and yes, they're all LLMs.

But you can now start combining them to have hybrid models too. Are Omni models that handle audio and visual data still "language" models? This question is interesting in its own right for many reasons, but not to justify or bemoan the use of term LLM.

LLM is a good term, it's a cultural term too. If you start getting pedantic, you'll miss the bigger picture and possibly even the singularity ;)


So there is a language war going on in the industry, and some of it is justified and some of it is not. Take 'agents' as an example. I have seen a case where a low-code / no-code service dropped an LLM node into a 10+ year old product, started calling themselves an 'agent platform', and jacked up their price by a large margin. This is probably a case where a debate as to what qualifies as an 'agent' is appropriate.

Alternatively, I have seen debates as to what counts as a 'Small Language Model' that probably are nonsensical. Particularly because in my personal language war the term 'small language model' shouldn't even exist (no one knows what the threshold is, and our 'small' language models are bigger than the 'large' language models from just a few years ago).

This is fairly typical of new technology. Marketing departments will constantly come up with new terms or try to take over existing terms to push agendas. Terms with defined meanings will get abused by casual participants and lose all real meaning. Individuals new to the field will latch on to popular misuses of terms as they try to figure out what everyone is talking about and perpetuate definition creep. Old hands will overly focus on hair-splitting exercises that no one else really cares about and sigh in dismay as their carefully cultivated taxonomies collapse under expansion of interest in their field.

It will all work itself out in 10 years or so.


There is a reason why cars and computers are sold with specs. 0-60 time, fuel efficiency...

People need to know the performance they can expect from LLMs or agents. What are they capable of?


A 2009 Honda Civic can get an under-5-second 0-60 easily... however it does involve a high cliff.

Result specs (as in measuring output/experimental results) need strict definitions to be useful, and I think the current ones we have for LLMs are pretty weak (mostly benchmarks that model one kind of interaction, and usually not any sort of useful interaction).


Seems like a textbook example of https://en.m.wikipedia.org/wiki/No_true_Scotsman


Well, I don't see why we need to mangle the jargon. "Language model" has an old meaning from NLP (which still applies): a computer model of language itself. Most commonly, a joint probability distribution over words or sequences of words, which is what LLMs are too. Prompted replies are literally conditional probability densities conditioned on the context you give it. "Foundation model" is a more general term I see a lot.
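
Concretely, that's just the standard autoregressive factorization (nothing specific to any one model; the w_t are tokens and the prompt is simply part of the conditioning context):

    P(w_1, \dots, w_n) = \prod_{t=1}^{n} P(w_t \mid w_1, \dots, w_{t-1})

so a "prompted reply" is sampled one token at a time from P(w_t | prompt, w_1, ..., w_{t-1}).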

To say a model is "just a LLM" is to presumably complain that it has no added bells or whistles that someone thinks is required beyond the above statistical model. And maybe I missed the point, but the author seems to be saying "yes it's just a LLM, but LLMs are all you need".


How does it compare with s5cmd [1]? s5cmd is my goto tool for fast s3 sync and they have the following at the top of their Github page:

> For uploads, s5cmd is 32x faster than s3cmd and 12x faster than aws-cli. For downloads, s5cmd can saturate a 40Gbps link (~4.3 GB/s), whereas s3cmd and aws-cli can only reach 85 MB/s and 375 MB/s respectively.

[1] https://github.com/peak/s5cmd


I have not yet compared it against this tool, but the given numbers are for download and not for syncing files.

For an S3->S3 sync using a c6gn.8xlarge instance, I got up to 800 MB/s using 64 workers, but the files were on average only around 50 MB. And the bigger the file, the higher the MB/s.

Also, from my short look into it, s5cmd does not support syncing between S3 providers (S3 -> Cloudflare).

