Hacker News | thenameless7741's comments

A recent example: a law firm hired this person [0] to build a private AI system for document summarization and Q&A.

[0] https://xcancel.com/glitchphoton/status/1927682018772672950


If you install llama.cpp via Homebrew, llama-mtmd-cli is already included. So you can simply run `llama-mtmd-cli <args>`


Oh even better!!


> it'll probably take a year for the FOSS community to implement and digest it completely

The local community seems to have converged on a few wrappers: Open WebUI (general-purpose), LM Studio (proprietary), and SillyTavern (for role-playing). Now that llama.cpp ships an OpenAI-compatible server (llama-server), there are a lot more options to choose from.

I've noticed there really aren't many active FOSS wrappers these days - most have either been abandoned or aren't shipping releases with the frequency we saw when the OpenAI API first launched. So it would be awesome if you could share your wrapper with us at some point.
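Since llama-server speaks the OpenAI chat-completions protocol, any generic HTTP client can talk to it. A minimal stdlib-only sketch (the port 8080 default and the placeholder model name are assumptions; adjust for your setup):

```python
import json
import urllib.request

def build_chat_request(prompt, model="local", base_url="http://localhost:8080/v1"):
    """Build an OpenAI-compatible chat completion request for llama-server."""
    payload = {
        # llama-server serves whatever model it was started with,
        # so this field is mostly informational
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

# To actually send it, start `llama-server -m model.gguf` locally, then:
# with urllib.request.urlopen(build_chat_request("Hello!")) as resp:
#     print(json.loads(resp.read())["choices"][0]["message"]["content"])
```

Because the endpoint shape matches OpenAI's, the official `openai` client also works by just overriding `base_url`.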


I think OP means that the FOSS community hasn't digested many of Phi-4-multimodal's modalities, such as audio input (STT) and audio output (TTS); image input also isn't well supported in many FOSS projects.


AFAIK, Phi-4-multimodal doesn't support TTS, but I understand OP's point.

The recent Qwen release is an excellent example of model providers collaborating with the local community (which includes inference engine developers and model quantizers). It would be nice if this collaboration extended to wrapper developers as well, so that end users can enjoy a great UX from day one of any model release.


Hah, ty, I badly misunderstood the release materials


I've been happier with LibreChat over Open WebUI. Mostly because I wasn't a fan of the `pipelines` stuff in Open WebUI and lack of MCP support (probably has changed now?). But then I don't love how LibreChat wants to push its (expensive) code runner service.


Kobold.cpp is still my preference for a gui. Single portable exe with good flexibility in configuration if you want it, no need if not.


Oobabooga is still good as a Swiss Army knife sort of wrapper for a single user trying out new models


it's mentioned in the main thread: https://nitter.net/athyuttamre/status/1899511569274347908

> [Q] Does the Agents SDK support MCP connections? So can we easily give certain agents tools via MCP client server connections?

> [A] You're able to define any tools you want, so you could implement MCP tools via function calling

in short, we need to do some plumbing work.

relevant issue in the repo: https://github.com/openai/openai-agents-python/issues/23
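The plumbing is mostly schema translation: an MCP tool description maps almost one-to-one onto an OpenAI function-calling tool definition, since MCP's `inputSchema` is already JSON Schema. A rough sketch (the dict shapes and the `get_weather` example are illustrative, not the actual SDK API):

```python
def mcp_tool_to_function(mcp_tool):
    """Translate an MCP-style tool description into an OpenAI
    function-calling tool definition (field names per each spec)."""
    return {
        "type": "function",
        "function": {
            "name": mcp_tool["name"],
            "description": mcp_tool.get("description", ""),
            # MCP's inputSchema is JSON Schema, the same format the
            # `parameters` field of a function tool expects
            "parameters": mcp_tool.get(
                "inputSchema", {"type": "object", "properties": {}}
            ),
        },
    }

# A tool as an MCP server might advertise it via tools/list:
weather = {
    "name": "get_weather",
    "description": "Get current weather for a city",
    "inputSchema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}
tool_def = mcp_tool_to_function(weather)
```

The remaining work is dispatching the model's tool calls back to the MCP client's call-tool method and returning the result as a tool message.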


Interesting. In the official API [1], there's no way to prefill reasoning_content:

> Please note that if the reasoning_content field is included in the sequence of input messages, the API will return a 400 error. Therefore, you should remove the reasoning_content field from the API response before making the API request

So the best I can do is pass the reasoning as part of the context (which means starting over from the beginning).

[1] https://api-docs.deepseek.com/guides/reasoning_model
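In practice the docs' requirement just means stripping the field from each assistant turn before appending the history back into the next request. A small helper (message shape follows the OpenAI-style dicts the DeepSeek API returns):

```python
def strip_reasoning(messages):
    """Remove reasoning_content from messages so the conversation
    history can be re-sent to the API without triggering a 400."""
    return [
        {k: v for k, v in m.items() if k != "reasoning_content"}
        for m in messages
    ]

history = [
    {"role": "user", "content": "What is 2+2?"},
    {"role": "assistant", "content": "4",
     "reasoning_content": "The user asks for 2+2, which is 4..."},
]
clean = strip_reasoning(history)  # safe to send as the next request's messages
```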



That's different text.


I've just started experimenting on an AI wrapper that blends companion and assistant into one (think Replika meets Claude), but with an anime-style avatar for the main interface.

As I'm still very early (still in the ideation and prototyping phase), I'd love to hear about experiences that have stuck with you, or any works that got you excited about the possibilities.


Before anyone reads too much into this, here's what an Anthropic staff member said on Discord:

> i don't write the docs, no clue

> afaik opus plan same as its ever been


Maybe he's not a high-level enough employee to have any say in the product roadmap, and he's behind on leadership planning?


As a technical person who recently taught myself frontend from scratch, I found https://web.dev/learn way more structured and thorough. The CSS lesson covers all the essentials and actually made me enjoy working with CSS.

web.dev doesn't get as much love as MDN, but it totally should!


Blog updates:

- Introducing the Realtime API: https://openai.com/index/introducing-the-realtime-api/

- Introducing vision to the fine-tuning API: https://openai.com/index/introducing-vision-to-the-fine-tuni...

- Prompt Caching in the API: https://openai.com/index/api-prompt-caching/

- Model Distillation in the API: https://openai.com/index/api-model-distillation/

Docs updates:

- Realtime API: https://platform.openai.com/docs/guides/realtime

- Vision fine-tuning: https://platform.openai.com/docs/guides/fine-tuning/vision

- Prompt Caching: https://platform.openai.com/docs/guides/prompt-caching

- Model Distillation: https://platform.openai.com/docs/guides/distillation

- Evaluating model performance: https://platform.openai.com/docs/guides/evals

Additional updates from @OpenAIDevs: https://x.com/OpenAIDevs/status/1841175537060102396

- New prompt generator on https://playground.openai.com

- Access to the o1 model is expanded to developers on usage tier 3, and rate limits are increased (to the same limits as GPT-4o)

Additional updates from @OpenAI: https://x.com/OpenAI/status/1841179938642411582

- Advanced Voice is rolling out globally to ChatGPT Enterprise, Edu, and Team users. Free users will get a sneak peek of it (except in the EU).


> Advanced Voice is rolling out globally to ChatGPT Enterprise, Edu, and Team users. Free users will get a sneak peek of it.

So regular paying users from EU are still left out in the cold.


It's probably stuck in legal limbo in the EU. The recently passed EU AI Act prohibits "AI systems aiming to identify or infer emotions", and Advanced Voice definitely does infer the user's emotions.

(There is an exemption for "AI systems placed on the market strictly for medical or safety reasons, such as systems intended for therapeutical use", but Advanced Voice probably doesn't benefit from that exemption.)


Apparently this prohibition only applies to "situations related to the workplace and education", and, in this context, "That prohibition should not cover AI systems placed on the market strictly for medical or safety reasons"

So it seems to be possible to use this in a personal context.

https://artificialintelligenceact.eu/recital/44/

> Therefore, the placing on the market, the putting into service, or the use of AI systems intended to be used to detect the emotional state of individuals in situations related to the workplace and education should be prohibited. That prohibition should not cover AI systems placed on the market strictly for medical or safety reasons, such as systems intended for therapeutical use.


This is true, though it may not make sense commercially for them to offer an API that can't be used for workplace (business) applications or education.


I see what you mean, but I think that "workplace" specifically refers to the context of the workplace, so that an employer cannot use AI to monitor the employees, even if they have been pressured to agree to such a monitoring. I think this is unrelated to "commercially offering services which can detect emotions".

But then I don't get the spirit of that limitation, as it should be just as applicable to TVs listening in on your conversations and trying to infer your emotions. Then again, I guess that for these cases there are other rules in place which prohibit doing this without the explicit consent of the user.


> I think that

> I think this

> I don't get the spirit of that limitation

> I guess that

In a nutshell, this uncertainty is why firms are going to slow-roll EU rollout of AI and, for designated gatekeepers, other features. Until there is a body of litigated cases to use as reference, companies would be placing themselves on the hook for tremendous fines, not to mention the distraction of the executives.

Which, not making any value judgement here, is the point of these laws: to slow down innovation so that society, government, and regulation can digest new technologies. This is the intended effect, and the laws are working.


Companies like OpenAI definitely have the resources to let some lawyers analyze the situation and at this point it should be clear to them if they can or can't do this. It's far more likely that they're holding back because of limitations in hardware resources.

I use those words because I've never read any of the points in the EU AIA.


They definitely do have the resources, but laws and regulations are frequently ambiguous. This is one reason the outcome of litigation is often unpredictable.

I would wager this -- OpenAI's lawyers have looked at the situation. They have not been able to credibly say "yes, this is okay", and so management makes the decision to wait. Obviously, they would prefer to compete in Europe if it were a no-brainer decision.

It may be possible that the path to get to "yes, definitely" includes some amount of discussion with the relevant EU authorities and/or product modification. These things will take time.


Yes, but it works with a vpn and the change in latency isn’t big enough to have a noticeable impact on usability.

