
jfim · 2026-06-08T23:06:59 1780960019

A pile of various tools:

A self hosted web archiving tool with support for extensible processing pipelines (e.g. extract article -> translate -> summarize -> generate tags, download video -> split audio track -> transcribe -> summarize), which led me to make a managed chromium browser with extensions and warc support for archiving, and an RSS feed synthesizer (take random article listing page that doesn't have RSS and generate a feed for it) so that I can plug it into my archiver. An active learning loop for a model to clean up articles by removing junk like native ads and sponsored blocks.
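A minimal sketch of what such an extensible pipeline could look like, assuming each step is a plain function that takes a document dict and returns an enriched copy. The step names and their stand-in bodies are hypothetical placeholders, not the actual tool's code; real steps would call readability, a translation model, an LLM, and so on.

```python
from typing import Callable

# A document is a dict that each step reads from and adds keys to.
Doc = dict[str, object]
Step = Callable[[Doc], Doc]


def run_pipeline(doc: Doc, steps: list[Step]) -> Doc:
    """Apply each step in order, passing the growing document along."""
    for step in steps:
        doc = step(doc)
    return doc


def extract_article(doc: Doc) -> Doc:
    # Stand-in extractor: strips <p> tags instead of running readability.
    text = str(doc["html"]).replace("<p>", "").replace("</p>", "")
    return {**doc, "text": text}


def summarize(doc: Doc) -> Doc:
    # Stand-in summary: the first sentence of the extracted text.
    return {**doc, "summary": str(doc["text"]).split(". ")[0]}


def generate_tags(doc: Doc) -> Doc:
    # Stand-in tagger: lowercase words longer than six characters.
    words = str(doc["summary"]).lower().split()
    tags = sorted({w.strip(".,") for w in words if len(w) > 6})
    return {**doc, "tags": tags}


# Adding a new stage means adding one function to this list.
ARTICLE_PIPELINE: list[Step] = [extract_article, summarize, generate_tags]

result = run_pipeline({"html": "<p>Archiving matters. Links rot.</p>"}, ARTICLE_PIPELINE)
```

Keeping every step to the same dict-in, dict-out shape is what makes the pipeline easy to extend or reorder.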

A tabbed terminal with project management features like launching the database, app server, and claude code in different tabs with one click, and split browser/terminal panes (eg. opening a browser automatically at the correct URL when the terminal reads http://localhost:4000/).
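The "open a browser when the terminal prints a local URL" part could be sketched roughly like this, assuming the terminal output is available line by line. The function name and the regular expression are illustrative assumptions, not the actual implementation.

```python
import re
from typing import Optional

# Matches http(s) URLs on localhost or 127.0.0.1, with optional port and path.
# The negative lookahead rejects hosts like "localhost.example.com".
LOCAL_URL = re.compile(
    r"https?://(?:localhost|127\.0\.0\.1)(?![\w.-])(?::\d+)?(?:/\S*)?"
)


def find_local_url(line: str) -> Optional[str]:
    """Return the first local dev-server URL in a line of output, if any."""
    match = LOCAL_URL.search(line)
    return match.group(0) if match else None


url = find_local_url("[info] Running at http://localhost:4000/ (http)")
# A real terminal would now hand `url` to its browser pane; from a plain
# script, the standard library's webbrowser.open(url) does the same job.
```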

A modular MCP server with an MCP proxy and OAuth2 DCR so that I can easily add new random ideas for MCP servers in a few minutes with Claude and deploy them so that they're available to Claude by refreshing the tool list.

A small tool to render Claude conversations so that I can link to them from my obsidian vault with something like convo://claude-code/-home-jfim-projects-foo/<guide>

And overall just deploying docker containers for my self hosted setup

Most of it is on GitHub, in various states of readiness.

flutas · 2026-06-08T23:12:44 1780960364

Several of these sound interesting to me, gonna check them out tonight!

jfim · 2026-06-08T23:37:29 1780961849

Cool! Just a heads-up that some of this is in a pretty rough state, but shoot me an email if you have any questions or issues.

seriocomic · 2026-06-08T23:59:05 1780963145

Sounds like we're walking the same path - but moved most of the self-hosted stuff to self-hosted Forgejo vs Github...

jfim · 2026-06-07T09:43:18 1780825398

The Epic store is horrendously slow though. I bought a few games there but in practice the client is just so slow that I avoid it if I can.

jfim · 2026-06-06T22:40:13 1780785613

The article makes a lot of points about cost viability, but says nothing about what happens at the end of life for space datacenters.

On Earth, the materials and equipment in the datacenter can be repurposed, recycled, or properly disposed of. In space, EOL'ed stuff either stays in orbit, burns up in the atmosphere on reentry, or gets moved out of useful orbits.

I'm not sure I'm thrilled at the idea of more space junk in orbit or more aerosolized metals in the stratosphere.

jfim · 2026-06-06T05:06:47 1780722407

Indeed. It's pretty interesting to realize after implementing GPT-2 that the frontier models are scaled up versions of that, with various tweaks to improve performance, model-wise.

The secret sauce though is all the datasets, RL training, knowledge of what works from doing all kinds of ablation experiments, and a massive compute moat.

gobdovan · 2026-06-06T07:00:31 1780729231

The secret sauce is also having the necessary 'creativity' to not get ceased and desisted into oblivion and jail from all the copyrighted material you trained your model on. Btw, not making a moral judgement, [0] shows Michael and Dalton from YC discussing why Ilya Sutskever had to leave Google to pursue what's now ChatGPT

[0] https://youtu.be/E8pvgN1j-Ck?t=748

root-parent · 2026-06-06T13:53:53 1780754033

There is a whole moral judgement to be made here...let's hope Ilya won't get too pissed off if somebody leaks the work of his new initiative...information wants to be free and all that...

Also would love to know if the same Legal team advised on Gemini...

miltonlost · 2026-06-06T15:02:18 1780758138

He's a massive massive thief that people who have stolen far less from a convenience store have gone to prison for. The man is a villain.

someguyiguess · 2026-06-06T14:27:30 1780756050

And to make anyone who threatened to expose them “commit suicide”

achrono · 2026-06-06T06:16:44 1780726604

How do we know that today's frontier models are merely scaled up versions of that? Genuine question, since the labs have narrowed what they share over the years to now almost nothing, in terms of how the model was trained and how it works under the hood.

HarHarVeryFunny · 2026-06-06T14:16:09 1780755369

We know for sure the architecture of the open weights models since llama.cpp understands the architecture it needs to build to plug the weights into to run them. It's always possible that the latest closed model is doing something architecturally different than the open weights ones we know about, but judging by how close the large open weight models such as DeepSeek are to SOTA performance, this seems unlikely. When OpenAI first came out with their near-mythical "Strawberry" (aka "o1") thinking model there was all sorts of speculation that they had made some sort of architectural breakthrough, but then DeepSeek replicated the capability and published how they did it, proving that it was just better training, not any architectural change.

There have been minor changes to the architecture over the years, but these are basically all efficiency tweaks such as various types of attention (some pioneered in the open by DeepSeek) that better scale to large context lengths, and the confusingly named "mixture of experts" architecture, but what's more notable really is how little the architecture has changed. The capability gains have been coming from better training and better data.
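To illustrate why mixture of experts is usually filed under efficiency, here is a toy sketch of top-k routing: only k of the n experts run for a given input, so compute per token stays roughly flat as experts are added. It assumes the router scores are passed in directly (in a real model a learned layer computes them from the token) and uses scalar toy functions as experts; none of this is any specific model's code.

```python
import math
from typing import Callable, Sequence

Expert = Callable[[float], float]


def moe_forward(
    x: float,
    router_logits: Sequence[float],
    experts: Sequence[Expert],
    k: int = 2,
) -> tuple[float, list[int]]:
    """Run only the k highest-scoring experts and mix their outputs."""
    # Indices of the k experts with the largest router scores.
    order = sorted(range(len(experts)), key=lambda i: router_logits[i], reverse=True)
    top = order[:k]

    # Softmax over the selected scores only, so the weights sum to 1.
    peak = max(router_logits[i] for i in top)
    raw = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(raw)

    # Experts outside `top` are never called: that is the compute saving.
    output = sum((w / total) * experts[i](x) for w, i in zip(raw, top))
    return output, top


toy_experts: list[Expert] = [
    lambda v: v + 1.0,
    lambda v: v * 2.0,
    lambda v: -v,
    lambda v: v * 10.0,
]
mixed, chosen = moe_forward(3.0, [0.0, 1.0, 1.0, -5.0], toy_experts, k=2)
```

With four experts and k=2, half of the expert parameters are skipped for this input, which is the sense in which the total parameter count can grow faster than the per-token cost.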

gobdovan · 2026-06-06T07:10:00 1780729800

DeepSeek research:

- V3 https://arxiv.org/abs/2412.19437

- V2 https://arxiv.org/abs/2405.04434

- R1 https://arxiv.org/abs/2501.12948 (RL applied to ML models was well-known beforehand, but they show it in the open, at scale, on big models)

Then, there's the incentive analysis. If you can see that these models empirically get better with scale, why would you swap the main architecture? Those events will be pretty rare. I'm not saying there's no one cooking a new architecture, just that it is a pretty rare event. And it would have to come from some researchers that would be happy to not publish their findings, which is not really what a sizable portion of elite researchers (obviously not all) are incentivized to do.

Of course, it's a bit of a verbal compression to claim simply 'scaled up'. They are recognisably scaled-up transformers, and most new models come with a few tricks, but we're at the point where those tricks are usually not an architectural rewrite; they're added to solve an explicit problem, like hallucination, not for big new capability gains.

swyx · 2026-06-06T18:57:29 1780772249

> If you can see that these models empirically get better with scale, why would you swap the main architecture? Those events will be pretty rare

cf. the hardware lottery https://arxiv.org/abs/2009.06489

matusp · 2026-06-06T07:11:59 1780729919

There are thousands of people working in top level labs. Somebody would leak it

ai_slop_hater · 2026-06-06T06:27:15 1780727235

No they are clearly not just scaled up versions of gpt 2; there are different LLM architectures like mixture of experts etc that appeared relatively recently. I am not an expert though, far from it.

otabdeveloper4 · 2026-06-06T06:35:53 1780727753

MoE and such are basically performance enhancements, they don't make the model smarter.

yababa_y · 2026-06-06T07:23:30 1780730610

separately trained experts can surpass performance in their activated regime and DOES result in a smarter model, the Claude system cards talk about this and eg there is https://openreview.net/forum?id=iydmH9boLb to read...

jmalicki · 2026-06-06T13:16:03 1780751763

Performance enhancements are huge though.

If you can make the existing model faster, you can then save your inference budget to then make your model bigger, which then makes it smarter.

A lot of how smart the models can be comes down to budget. If you can make your existing thing cheaper, you can instead make it bigger for the same price.

TheHalfDeafChef · 2026-06-06T13:42:13 1780753333

Not really “smarter” though? It’s just a big probability engine.

(Not trying to flame bait or anything. I just wouldn’t call LLM as exhibiting intelligence. It is great at making connections based on probability but doesn’t have a semantic understanding of what it is doing)

stevenhuang · 2026-06-07T17:29:29 1780853369

You do realize modern neuroscience considers the human brain as "just" a probability engine and that intelligence may well be the ability for an organism to predict well.

> doesn’t have a semantic understanding of what it is doing

I hope you realize this is an area of open, active research.

Chu4eeno · 2026-06-08T08:06:19 1780905979

Didn't neuroscience have some big scandals about bad statistics and overstating their findings (in addition to normal issues like replication)? Look up at least the "dead salmon study" (hint: it's related to fMRI, and you can probably guess its conclusions from its nickname). The "Voodoo Correlations" and "Cluster Failure" papers are also a bit eye-opening.

In general we (humans) need to be humble about the limitations of our knowledge about how we function, it's an insanely complicated problem.

jmalicki · 2026-06-08T19:46:39 1780947999

> In general we (humans) need to be humble about the limitations of our knowledge about how we function, it's an insanely complicated problem.

We do.

Which is why we shouldn't be assuming we're more than just probability engines, or be assuming we have more consciousness than a neural network.

otabdeveloper4 · 2026-06-06T14:15:04 1780755304

> to then make your model bigger, which then makes it smarter

There's diminishing returns and at some point making a model bigger makes it dumber.

lobocinza · 2026-06-08T18:11:32 1780942292

Maybe due to lack of data and dimensions other than words.

fizx · 2026-06-06T20:59:00 1780779540

Performance enhancements are what allow you to train a bigger model.

locknitpicker · 2026-06-06T15:01:30 1780758090

> The secret sauce though is all the datasets, RL training, knowledge of what works from doing all kinds of ablation experiments, and a massive compute moat.

ReAct loops and tool-calling are the critical development feature. They turn a model from something that generates text into something that can independently influence the world around them.

Without agent features, you have just a chatbot.

galaxyLogic · 2026-06-06T19:59:55 1780775995

The big breakthrough is we can interact with the agents using natural language - because of the LLM.

It is the combination of LLM and agent-harnesses that make it look really smart. Agent-harness is a programmatic device that lets us tap into the vast knowledge in the LLM.

It is probably true that many TV-commentators fail to appreciate this fact and therefore think LLMs are super-intelligent. No, it is the combination of LLM and the programmatic agent-harness that is the breakthrough.

An interesting thought is that the LLM could in theory code the agent-harness and start it running every time we interact with it. Currently the agent-harness is pretty static, I think. In theory it could be dynamically created for every task. Would that make it better? I don't know.

locknitpicker · 2026-06-07T10:27:21 1780828041

> The big breakthrough is we can interact with the agents using natural language - because of the LLM.

Without ReAct and tool calling, all you have is a chatbot. That's useful, but it's just a chatbot.

ReAct loops and tool calling are what unblock high value use cases. They enable systems to actually address free-form problem statements, gather data that is not a part of their training set, inspect the current state of services, and trigger actions in external systems. This goes well beyond mere chatbots.

> It is the combination of LLM and agent-harnesses that make it look really smart.

It's really not about "smart". It's about autonomous systems, and being able to consume and analyze new data, and trigger actions in external systems.
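For readers who haven't seen one, a ReAct-style loop can be sketched in a few lines: the model emits either an action or a final answer, the harness runs the named tool, and the observation is appended to the transcript for the next turn. This is a sketch under stated assumptions: `scripted_model` is a hard-coded stand-in for an LLM call, and the `Action: name[arg]` text format and the `service_status` tool are hypothetical.

```python
from typing import Callable

# Hypothetical tool registry; real tools would query external systems.
TOOLS: dict[str, Callable[[str], str]] = {
    "service_status": lambda name: f"{name}: healthy",
}


def scripted_model(transcript: list[str]) -> str:
    """Stand-in for an LLM: acts once, then answers from the observation."""
    observations = [t for t in transcript if t.startswith("Observation: ")]
    if not observations:
        return "Action: service_status[billing]"
    return "Final: " + observations[-1].removeprefix("Observation: ")


def react_loop(
    question: str,
    model: Callable[[list[str]], str],
    max_steps: int = 5,
) -> str:
    transcript = [f"Question: {question}"]
    for _ in range(max_steps):
        reply = model(transcript)
        transcript.append(reply)
        if reply.startswith("Final:"):
            return reply.removeprefix("Final:").strip()
        if reply.startswith("Action:"):
            # Parse "Action: tool_name[argument]".
            call = reply.removeprefix("Action:").strip()
            name, _sep, arg = call.partition("[")
            tool = TOOLS.get(name)
            result = tool(arg.rstrip("]")) if tool else f"unknown tool {name!r}"
            # The observation is new data the model never saw in training.
            transcript.append(f"Observation: {result}")
    return "gave up: step limit reached"


answer = react_loop("Is the billing service up?", scripted_model)
```

The step limit is there because nothing else guarantees the model ever emits a final answer.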

Chu4eeno · 2026-06-08T08:18:31 1780906711

It's not very novel, though, it's a fairly obvious step once you get something that can operate iteratively and largely independent, there were a ton of people trying to get LLMs to loop on their own even before deepseek r1.

And I remember talking about goal directed behavior (which what people are calling "agents" now don't seem to properly have) and autonomous operation decades ago in the intelligent agent course at uni, including react loops.

So no, the huge step with LLMs really was just that attention mechanism from that translation paper everyone forgot until Google brought its marketing to it, everything else is either just optimization/scaling, more money or old ideas suddenly relevant.

locknitpicker · 2026-06-08T15:49:28 1780933768

> It's not very novel, though (...)

I completely disagree. The rollout of agentic tools, and even support for agent mode in IDEs, is the whole value proposition of AI code assistant services.

Otherwise you'd just have a glorified search engine in a chat window.

> (...) it's a fairly obvious step once you get something that can operate iteratively and largely independent,

There's some confusion in your reply. A ReAct loop is exactly what this "operate iteratively and largely independently" represents.

jfim · 2026-06-05T22:42:46 1780699366

The good thing is since the feature was cheap to implement, you can just say "this was a bad idea" and remove it, as long as adding that feature wasn't a one way decision. People are typically more reticent to remove things that were hard to implement, even if that's the right thing to do.

overgard · 2026-06-06T01:28:58 1780709338

That's the problem, even with an LLM, removing a feature two weeks later can be a nightmare because things have grown to depend on it. In a way it's even harder because the velocity of stuff piling in is much greater.

fg137 · 2026-06-06T12:11:32 1780747892

> you can just say "this was a bad idea" and remove it

Have you ever actually done that in a "serious" product, or just made it up?

In any product with actual customers, especially those from other (big) companies, features don't just go away at the snap of a finger. Otherwise, have fun discovering users moving to your competitors.

Anecdotally, a product from another company removed or significantly changed important features our users rely on every day, several times and with short notice. We didn't hesitate to migrate to another service, and the migration was completed within about a month.

kibwen · 2026-06-05T23:34:26 1780702466

> People are typically more reticent to remove things that were hard to implement, even if that's the right thing to do.

Careful. The sunk cost fallacy isn't just about time, it's also about money, and people may naturally be reluctant to remove bad features that cost them a lot of tokens, especially if the act of removal itself is going to cost even more tokens.

jfim · 2026-06-06T05:10:09 1780722609

That's a pretty good point, and I assume at $work they wouldn't appreciate throwing away $n dollars worth of code.

jfim · 2026-06-01T05:57:41 1780293461

That's what they're working on, in theory, with Windows K2.

fhn · 2026-06-01T06:24:54 1780295094

I would never trust Microsoft. Their next drama is revoking Office 2019 perpetual licenses https://www.youtube.com/watch?v=KRnno9VIZx0. It never ends with them because they know they have you by the balls.

twilo · 2026-06-01T06:59:55 1780297195

I trust them on a daily basis. No issues thus far..

jfim · 2026-05-31T19:47:17 1780256837

Prototypes aren't only for UX though, sometimes they're for exploring whether something is technically possible, or what are the unknown unknowns in a particular area.

For example, for personal projects, I've been wondering if it's possible to automatically create RSS feeds for pages that don't have them (yes), what are the challenges when building an archive-style page dumping system (need to dump CSSOM alongside getOuterHTML, remove/rewrite remote content, walk iframes, automate Chrome, scroll to load lazily loaded content, etc.), and if training a model to remove native ads from markdown coming from readability is possible (no, at least not with my current approach, but using the dom might work).
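As a rough illustration of the RSS part, here is a standard-library sketch that turns a listing page into an RSS 2.0 document. It assumes every entry is a link inside an `<article>` element, which is a simplification; real pages need per-site selectors or heuristics, and the class and function names are made up for this example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects (href, text) for every <a href> inside an <article>."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[tuple[str, str]] = []
        self._depth = 0  # number of <article> elements we are inside
        self._href: str | None = None
        self._text: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "article":
            self._depth += 1
        elif tag == "a" and self._depth > 0:
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "article" and self._depth > 0:
            self._depth -= 1
        elif tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def synthesize_feed(page_url: str, html: str, title: str) -> str:
    """Build an RSS 2.0 string with one <item> per collected link."""
    parser = LinkCollector()
    parser.feed(html)
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = page_url
    ET.SubElement(channel, "description").text = f"Synthesized feed for {page_url}"
    for href, text in parser.links:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = text
        # Listing pages often use relative links; feeds need absolute ones.
        ET.SubElement(item, "link").text = urljoin(page_url, href)
    return ET.tostring(rss, encoding="unicode")


page = '<nav><a href="/about">About</a></nav><article><a href="/posts/1">First post</a></article>'
feed_xml = synthesize_feed("https://example.com/blog/", page, "Example blog")
```

This only covers static HTML; pages that load their listing lazily would need the automated-browser approach mentioned above before parsing.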

dakolli · 2026-05-31T22:21:02 1780266062

Why wouldn't you use Archive Box?

https://github.com/archivebox/archivebox

jfim · 2026-06-01T05:09:14 1780290554

A few reasons. Learning is one of them, since I don't normally deal much with browser and web related technologies, so it's a good way to learn more about them.

I also think there are a few interesting things you can explore that go beyond a simple carbon copy of what's on the Internet. Ideas that I've implemented are things like automatic extraction of audio tracks, transcription, and summarization, loading a page or podcast transcript into the context window of a LLM to discuss the arguments or factuality of the claims being made, automatically turning articles to reader view using readability/trafilatura, etc.

Directions I'd like to explore would be things like multimodal search ("that page I read six months ago about computer security with neon green text on a black background", or give me a list of fitness related pages I've read in the last twelve months), personal statistics (how is the mix of topics I've been reading about changing over time), annotating pages instead of just passively reading them, maybe even P2P archiving or discussions about pages, and all kinds of other things.

But installing archivebox would be easier indeed.

theshrike79 · 2026-06-04T19:42:58 1780602178

Mostly because I only need to get a site into an RSS feed, I don't need a massive archival solution to do that.

abalashov · 2026-06-01T02:05:48 1780279548

I was today years old when I learned about this. Thank you!

jfim · 2026-05-28T06:55:42 1779951342

Claude is https://claude.ai/new?q=%s
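The `%s` is the placeholder that a browser's custom search engine setting replaces with the typed query. A small sketch of that substitution, assuming plain percent-encoding (browsers may encode spaces differently, and the function name is made up here):

```python
from urllib.parse import quote


def expand_search_url(template: str, query: str) -> str:
    """Fill a browser-style search template, replacing %s with the query."""
    # safe="" also encodes "/" and "&" so the query stays a single parameter.
    return template.replace("%s", quote(query, safe=""))


claude_url = expand_search_url("https://claude.ai/new?q=%s", "rust & go")
```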

jfim · 2026-05-26T14:40:02 1779806402

Archived version: This repository is a mirror of the version currently found on SourceForge, duplicated here in an effort to help preserve this tool into the future. I, Julia Desmazes, have not contributed in any way to the development of this tool, all credit belongs to the original author.

In addition to being a command line tool, this tool also used to be available interactively through the now-defunct OutputLogic website.

jfim · 2026-05-26T05:20:26 1779772826

It's bumping to manager level, except without the 1:1s, quarterly/yearly planning, headcount and budget reviews, org/reorg discussions, performance calibration, and OKR planning. No complaints about the last review cycle or about the upcoming one.

baq · 2026-05-26T07:44:24 1779781464

All the ceremony must be replaced with process optimization, skill extraction, harness development and new model evals.

Still better than dealing with people, but only just.

darkwater · 2026-05-26T06:00:52 1779775252

Totally! But you know what? There are many, oh so many developers that are not ready for, don't like, and probably are not even cut out for this kind of position.