Hacker News | ramesh1994's comments

Hey this sounds pretty cool! If this is public would you mind sharing a link?


Not the OP, but I've actually made something like that to help people figure out where to live in London based on the criteria they care about -- here it is https://findmyarea.co.uk/?search_type=areas

Would be great to see similar tools for other cities!


This is amazing! I'm curious: what's the algorithm behind the scenes?

And how easy would it be to make this work for any city?


Thanks! It's a bit outdated now but I wrote up an explanation a while back that gives the general idea behind the scoring algorithm: https://findmyarea.co.uk/blog/how-findmyarea-works/

In principle it should be possible to do the same for any city. The hardest part is getting the data and massaging it into shape, which can be somewhat tedious. London is a fairly "easy" city for this due to the large amount of open data that's readily available from the ONS and the like.


This is really cool! Thanks for sharing the work and the explanation.

You mentioned massaging the data into shape as one of the hard parts, and I think that's one of the best applications of LLMs. Creating a pipeline that feeds a data source (still hard) into an LLM which outputs JSON with just the fields of interest would be amazing.
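To sketch the "LLM outputs JSON with the fields of interest" half of such a pipeline: the flaky part in practice is parsing and validating what the model returns. Here's a minimal, hedged example (the field names and the sample reply are made up for illustration; the actual LLM call is out of scope):

```python
import json
import re


def extract_json_fields(llm_output: str, required: list[str]) -> dict:
    """Parse an LLM reply into a dict, tolerating markdown code fences,
    and fail loudly if any field of interest is missing."""
    text = llm_output.strip()
    # Strip a ```json ... ``` fence if the model wrapped its answer in one.
    m = re.search(r"```(?:json)?\s*(.*?)\s*```", text, re.DOTALL)
    if m:
        text = m.group(1)
    record = json.loads(text)  # raises if the model produced invalid JSON
    missing = [k for k in required if k not in record]
    if missing:
        raise ValueError(f"LLM output missing fields: {missing}")
    return record


# Hypothetical model reply for an area-data feed
reply = '```json\n{"area": "Hackney", "avg_rent_gbp": 1850, "green_space_pct": 23}\n```'
row = extract_json_fields(reply, ["area", "avg_rent_gbp"])
```

Retrying the LLM call when this raises (invalid JSON or missing keys) tends to get you most of the way there.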


The code never advanced beyond some hacky scripts and a Jupyter notebook. Cleaning it up and perhaps making it into a web app has been on my todo list for the past 2-3 years. Maybe this reminder will be the kick I need to actually do something about it.


Sorry, I don't know how Hacker News notifications work, so I didn't see this reply. But if you see this, here's another reminder to clean this up into a public demo :)


I just found out about this project from this comment; absolutely excited to try it out.

As someone who's never used any of the infrastructure tools, I'm thinking of pyinfra as a way to run shell commands and install dependencies (declaratively?) on a bunch of hosts via SSH.

And the inventory is sort of a self-defined list of the hosts to target?

One final question on usage: would it be possible to sync or reference files from the machine running pyinfra onto the remote hosts? Or would that have to be done indirectly by running shell commands to sync?


Missed this comment, apologies!

- ops are (mostly) declarative, but some (e.g. server.shell) will always execute the command given

- inventory is just that, basically a list of hosts to target plus associated data, docs page: https://docs.pyinfra.com/en/2.x/inventory-data.html

- absolutely for syncing files, check out the files.put and files.template operations (and the files ops in general): https://docs.pyinfra.com/en/2.x/operations/files.html
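To make the last two points concrete, here's a minimal sketch of a deploy file using files.put alongside server.shell (hostnames and paths are invented; check the docs above for the real signatures):

```python
# deploy.py -- a minimal pyinfra 2.x sketch; run as: pyinfra inventory.py deploy.py
# inventory.py next to it could be as simple as:
#   web_servers = ["web1.example.com", "web2.example.com"]
from pyinfra.operations import files, server

# Declarative: only uploads when the remote file's contents differ
files.put(
    name="Sync nginx config from the local machine",
    src="files/nginx.conf",
    dest="/etc/nginx/nginx.conf",
)

# server.shell is the non-declarative escape hatch: it always runs
server.shell(
    name="Reload nginx",
    commands=["systemctl reload nginx"],
)
```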


I think distillation in the original sense isn't being done anymore, but finetuning on outputs from larger models like GPT-4 is a form of distillation (the top-1 logit instead of the full logit distribution, and a curated synthetic dataset instead of the original dataset).

On quantization, though, it's still weird how only the weights are quantized in methods like GPTQ / int8 while other methods quantize the activations as well. There's also the matter of the KV cache still being in the original 16-bit precision regardless, which is also unsolved here. Do you have any thoughts or insights on this?


It’s not clear to me what’s happening on the distillation front. I agree no one is doing it externally, but I suspect that the foundation model companies are doing it internally, performance is just too good.

There’s a bunch of recent work that quantizes the activations as well, like fp8-LM. I think that this will come. Quantization support in PyTorch is pretty experimental right now, so I think we’ll see a lot of improvements as it gets better support.
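As a concrete illustration of the weight-only side, here's a minimal symmetric absmax int8 round-trip in NumPy. It's a sketch of the basic idea only, not GPTQ itself (which additionally does error-compensated rounding):

```python
import numpy as np


def quantize_int8(w: np.ndarray):
    """Symmetric absmax quantization: map [-max|w|, max|w|] onto [-127, 127]."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale


w = np.random.default_rng(0).standard_normal((4, 8)).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
# Rounding error is bounded by half a quantization step (scale / 2)
max_err = float(np.abs(w - w_hat).max())
```

Weights are a static tensor, so one pass picks a good scale; activations are harder because their ranges are data-dependent and outlier-heavy, which is part of why weight-only methods shipped first.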

The KV cache piece is tied to the activations, imo: once those start getting quantized effectively, the KV cache will follow.


1) Any particular reasoning behind estimating OpenAI’s margins at 60%?

2) How much does human preference diverge from benchmark scores in your experience?

3) Do woodpeckers stop attacking houses when it’s winter in Alberta?


1) I actually think that’s too high; I bet it’s more like 30%. My logic is that they have to have _some_ margin, but LLMs are too expensive to have typical software margins. Total speculation though.

2) It generally tracks pretty well unless the model is gaming the metric (training on the test set, overfit to the specific source of data, etc). The relative rankings will typically match in both.

3) Alas, not with the mild winter North America’s having. They only stop below -5C or so. I am lucky though: the woodpecker stopped attacking my house and started attacking my neighbor’s. Even worse, it used to be a downy woodpecker, and it’s now been replaced by a pileated one (think: Woody).


Head over to the eleuther.ai Discord and discuss with some of the folks there. Tons of small experiments with LLMs could use the $10k in compute.


It prohibits anything that competes with OpenAI services, i.e. as long as you're not literally providing an LLM API commercially you should be fine.


Does it compete with them if you stop paying for their API?


I think parts of the write-up are great.

There are some strong assumptions being made in parts of the gist:

> 10: Cost Ratio of OpenAI embedding to Self-Hosted embedding

> 1: Cost Ratio of Self-Hosted base vs fine-tuned model queries

I don't know how useful these numbers are if you take away the assumption that self-hosted will work as well as the API.

> 10x: Throughput improvement from batching LLM requests

I see that the write-up mentions memory as a caveat to this, but it also depends on the card specs. The memory bandwidth and TFLOPs offered by, say, a 4090 are superior to a 3090's, while both have the same amount of VRAM. The caveat about token length mentioned in the gist itself makes the 10x claim not a useful rule of thumb.
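A toy roofline model shows where a "10x from batching" figure comes from and why it isn't a universal rule. All the numbers below are made-up illustrations (roughly a 7B fp16 model and a ~1 TB/s card), not measurements of any specific GPU:

```python
# Toy roofline model of batched autoregressive decoding.
# Assumed illustrative numbers: 14 GB of fp16 weights, 1 TB/s memory
# bandwidth, 2 FLOPs/param/token, 70 TFLOP/s peak compute.
def step_time(batch, weight_bytes=14e9, mem_bw=1e12,
              flops_per_token=1.4e10, peak_flops=7e13):
    t_mem = weight_bytes / mem_bw                    # weights streamed once per step
    t_compute = batch * flops_per_token / peak_flops  # compute grows with batch
    return max(t_mem, t_compute)                     # the slower resource wins


def throughput(batch, **kw):
    return batch / step_time(batch, **kw)


# While memory-bandwidth-bound, throughput scales linearly with batch size:
speedup = throughput(10) / throughput(1)
```

With these numbers the crossover to compute-bound is around batch 70; past that, extra batching buys nothing. Where that crossover sits depends on the card's bandwidth/FLOPs ratio and on sequence length (KV cache traffic, ignored here), which is exactly why a flat 10x doesn't generalize.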


> This means it is way cheaper to look something up in a vector store than to ask an LLM to generate it. E.g. “What is the capital of Delaware?” when looked up in a neural information retrieval system costs about 5x less than if you asked GPT-3.5-Turbo. The cost difference compared to GPT-4 is a whopping 250x!

That holds in the narrow use case of a strict look-up, but it seems to exaggerate the cost difference while the two approaches have completely different trade-offs.


I've been looking for a course like this! Especially great given how much of the recent progress in training large models has been made possible by FlashAttention and fused kernels.


Was it for seeding/hosting torrents or from just downloading them? How long did the whole thing take to play out?

I've always assumed that consuming torrents has been low-stakes to the point where it's not worth any enforcement.


They go after the site operators if they can.

In my case it was link sharing, no torrents, and at that time that didn't constitute "public publishing", as the link pointed to some third-party host and it was also user-generated content.

FWIW they've only won some fringe cases where they scared the person into accepting the charges.

But I'm not 15 anymore and wouldn't risk it, tbh.


Yea, the key point is link sharing. Spanish torrent sites have been shut down at the ISP level and sued on the grounds of profiting from ads.


It is a pretty fun game https://wiki-race.com/


The term "Chinchilla" predates LLaMA/Alpaca. It doesn't map directly to a specific model, but rather to a family of compute-optimal models.

