Ask HN: Devs using LLMs, how are you keeping costs low for LLM calls locally?
5 points by spruce_tips on July 18, 2024 | 8 comments
My project has a multi-step LLM flow using gpt-4o.

While developing new features and testing locally, the LLM flow runs frequently and burns through a lot of tokens, so my OpenAI bill spikes.

I've made some efforts to stub LLM responses (roughly the approach sketched below), but it adds a decent bit of complexity and work. I don't want to run a model locally with ollama because I need the output to be high quality and fast.

Curious how others are handling similar situations.
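
For reference, a minimal sketch of the stubbing approach, assuming the OpenAI Python SDK; the STUB_LLM flag, step names, and canned responses are illustrative, not from the post:

    import os
    from openai import OpenAI

    client = OpenAI()

    # Canned responses per flow step (hypothetical step names).
    CANNED = {
        "extract": '{"items": []}',
        "summarize": "stub summary",
    }

    def chat(step: str, messages: list[dict]) -> str:
        # When STUB_LLM=1, skip the API entirely and return a canned response.
        if os.getenv("STUB_LLM") == "1":
            return CANNED.get(step, "stubbed response")
        resp = client.chat.completions.create(model="gpt-4o", messages=messages)
        return resp.choices[0].message.content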




Options if I'm not using LangChain?



Options if I'm not using Cloudflare?


There's an open-source AI gateway: https://github.com/Portkey-AI/gateway
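
As a rough sketch, a gateway like this exposes an OpenAI-compatible endpoint, so the client just needs its base URL pointed at it. The local port (8787) and the x-portkey-provider header below are assumptions to verify against the repo's README:

    from openai import OpenAI

    # Point the OpenAI client at a locally running gateway instead of api.openai.com.
    # Port and provider header are assumptions; check the gateway README.
    client = OpenAI(
        base_url="http://localhost:8787/v1",
        default_headers={"x-portkey-provider": "openai"},
    )

    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "ping"}],
    )
    print(resp.choices[0].message.content)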


You can do this easily with object caching / function memoization patterns in any modern language, which should fit your desired solution. Best of luck!
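
A minimal version of that idea in Python: memoize on a hash of the request and persist to disk so responses survive restarts. The cache location and helper name are illustrative:

    import hashlib
    import json
    from pathlib import Path
    from openai import OpenAI

    client = OpenAI()
    CACHE_DIR = Path(".llm_cache")  # illustrative location
    CACHE_DIR.mkdir(exist_ok=True)

    def cached_chat(model: str, messages: list[dict]) -> str:
        # Key the cache on the full request so identical dev runs reuse prior answers.
        key = hashlib.sha256(
            json.dumps({"model": model, "messages": messages}, sort_keys=True).encode()
        ).hexdigest()
        path = CACHE_DIR / f"{key}.json"
        if path.exists():
            return json.loads(path.read_text())
        resp = client.chat.completions.create(model=model, messages=messages)
        out = resp.choices[0].message.content
        path.write_text(json.dumps(out))
        return out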


Here's a mega guide on keeping costs low with LLMs - https://portkey.ai/blog/implementing-frugalgpt-smarter-llm-u...

tl;dr:

- Keep prompts short, combine prompts, or write more detailed prompts and move to a smaller model
- Use simple and semantic cache lookups
- Classify tasks and route them to the best LLM using an AI gateway (rough routing sketch below)

Portkey.ai could help with a lot of this
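
A rough sketch of the routing idea without a gateway: classify each step up front and only send the hard ones to gpt-4o. The step names and model split are illustrative, not from the guide:

    from openai import OpenAI

    client = OpenAI()

    # Hypothetical router: cheap/simple steps go to a smaller model,
    # and gpt-4o is reserved for the steps that genuinely need it.
    CHEAP_STEPS = {"classify", "extract_keywords"}

    def route_model(step: str) -> str:
        return "gpt-4o-mini" if step in CHEAP_STEPS else "gpt-4o"

    def run_step(step: str, messages: list[dict]) -> str:
        resp = client.chat.completions.create(model=route_model(step), messages=messages)
        return resp.choices[0].message.content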


Came across this guide earlier; valuable insights. Thanks for sharing!



