
I work at Microsoft, though not in AI. This describes Copilot to a T. The demos are spectacular and get you so excited to go use it, but the reality is so underwhelming.


Copilot isn't underwhelming, it's shit. What's impressive is how Microsoft managed to gut GPT-4 to the point of near-uselessness. It refuses to do work even more than OpenAI models refuse to advise on criminal behavior. In my experience, the only thing it does well is scan documents on corporate SharePoint. For anything else, it's better to copy-paste to a proper GPT-4 yourself.

(Ask Office Copilot in PowerPoint to create you a slide. I dare you! I double dare you!!)

The problem with demos is that they're staged; they showcase integrations that were never delivered and probably never existed. But you know what's not hype and fluff? The models themselves. You could hack together a more useful Copilot with AutoHotkey, today.
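To illustrate what I mean - a minimal sketch in Python rather than AutoHotkey, assuming the openai, pynput and pyperclip packages; the hotkey and system prompt are my own invention:

    # A toy DIY "Copilot": press a hotkey, the clipboard contents get sent
    # to GPT-4o, and the reply replaces the clipboard contents.
    # Needs OPENAI_API_KEY set in the environment.
    import pyperclip
    from openai import OpenAI
    from pynput import keyboard

    client = OpenAI()

    def ask_gpt():
        prompt = pyperclip.paste()
        resp = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": "You are a terse, helpful assistant."},
                {"role": "user", "content": prompt},
            ],
        )
        pyperclip.copy(resp.choices[0].message.content)

    # Ctrl+Alt+G: send clipboard to the model, paste the answer back.
    with keyboard.GlobalHotKeys({"<ctrl>+<alt>+g": ask_gpt}) as hotkeys:
        hotkeys.join()

Copy some text, hit the hotkey, paste the answer wherever you like - no Ribbon integration required, and it already beats what ships in the product.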

I have GPT-4o hooked up as a voice assistant via Home Assistant, and what a breeze that is. Sure, every interaction costs me some $0.03 due to inefficient use of context (HA generates too much noise by default in its map of available devices and their state), but I can walk around the house and turn devices on and off by casually chatting with my watch, and it works, works well, and works faster than it takes to invoke Google Assistant.

So no, I honestly don't think AI advances are oversold. It's just that companies large and small race to deploy "AI-enabled" features, no matter how badly made they are.


Basically, functional AI interactions are prohibitively resource intensive and expensive. Microsoft's non-coding Copilots are shit due to resource constraints.


Basically, yes. My last 4 days of playing with this voice assistant cost me some $3.60 for 215 requests to GPT-4o, amounting to a little under 700 000 tokens. It's something I can afford[0], but with costs like this, you can't exactly hand out GPT-4 access to people for free. This cost structure doesn't work. It doesn't work with GPT-4o, and it worked even less with earlier model iterations that cost more than twice as much. And yet, that is what you need if you want a general-purpose Copilot or Assistant-like system. GPT-3.5-Turbo ain't gonna cut it. Llamas ain't gonna cut it either[1].

In a broad sense, Microsoft lied. But they didn't lie about the capability of the technology itself - they just lied about being able to afford to deliver it for free.

--

[0] - Extrapolated to a hypothetical subscription, this would be ~$27 per month. I've seen more expensive and worse subscriptions. Still, it's a big motivator to go dig into the code of that integration and make it use ~2-4x fewer tokens by encoding "exposed entities" differently, and much more concisely.
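For the curious, the back-of-the-envelope math behind that figure, using my numbers from above:

    # Extrapolating 4 days of usage to a hypothetical monthly subscription.
    days, requests, cost_usd, tokens = 4, 215, 3.60, 700_000

    per_request = cost_usd / requests     # ~$0.017 per interaction
    tokens_per_req = tokens / requests    # ~3,250 tokens each
    monthly = cost_usd / days * 30        # ~$27/month
    print(f"${per_request:.3f}/req, {tokens_per_req:.0f} tok/req, ${monthly:.0f}/mo")

Those ~3,250 tokens per request are what the entity-dump trimming would attack.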

[1] - Maybe Llama 3 could, but IIRC the license prevents it, plus it's how many days old now?


> they just lied about being able to afford to deliver it for free.

But they never said it would be free - I'm pretty sure it was always advertised as a paid add-on subscription. With that being the case, why would they not just offer multiple Copilot tiers, using different models or credit limits?


Contrary to what the corporations want you to believe -- no, you can't buy your way out of every problem. Most of the modern AI tools are mostly oversold and underwhelming, sadly.


whoa that's very cool. can you share some info about how you set up the integration in ha? would love to explore doing something like this for myself


With the most recent update, it's actually very simple. You need three things:

1) Add the OpenAI Conversation integration - https://www.home-assistant.io/integrations/openai_conversati... - and configure it with your OpenAI API key. In there, you can control part of the system prompt (HA will add some stuff around it) and configure the model to use. With the newest HA, there's now an option to enable "Assist" mode (under the "Control Home Assistant" header). Enable this.

2) Go to "Settings/Voice assistants". Under "Assist", you can add a new assistant. You'll be asked to pick a name, a language to use, then choose a conversation model - here you pick the one you configured in step 1) - and Speech-to-Text and Text-to-Speech models. I have a subscription to Home Assistant Cloud, so I can choose "Home Assistant Cloud" models for STT and TTS; it would be great to integrate third-party ones here, but I'm not sure if or how.

3) Still in "Settings/Voice assistants", look for a line saying "${some number} entities exposed", under the "Add assistant" button. Click that, and curate the list of devices and sensors you want "exposed" to the assistant - "exposed" here means that HA will make a large YAML dump out of the selected entities and paste it into the conversation for you[0]. There's also other stuff (I heard the docs mention "intents") that you can expose, but I haven't looked into it yet[1].
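If you want to sanity-check the pipeline from a script before going hands-free, something like this should work - a sketch, with hostname and token as placeholders for your setup; I believe HA's /api/conversation/process endpoint routes to the default assistant when no agent_id is given:

    # Poke the assistant via Home Assistant's REST API.
    import requests

    HA_URL = "http://homeassistant.local:8123"
    TOKEN = "YOUR_LONG_LIVED_ACCESS_TOKEN"  # Profile -> Long-lived access tokens

    resp = requests.post(
        f"{HA_URL}/api/conversation/process",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={"text": "Turn off the living room lights", "language": "en"},
    )
    print(resp.json())  # the assistant's reply and what it did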

That's it. You can press the Assist button and start typing. Or, for a much better experience, install HA's mobile app (and if you have a smartwatch, the watch companion app), and configure Home Assistant as your voice assistant on the device(s). That's how you get the full experience of randomly talking to your watch, "oh hey, make the home feel more like a Borg cube", and witnessing lights turning green and climate control pumping heat.

I really recommend giving this a try if you can. It's a night-and-day difference compared to Siri, Alexa or Google Now. It finally fulfills those promises of voice-activated interfaces.

(I'm seriously considering making a Home Assistant to Tasker bridge via HA app notifications, just to enable the assistant to do things on my phone - the experience is just that good, and I bet it'll work better out of the box than Google's stuff.)

--

[0] - That's the inefficient token waster I mentioned in the previous comment. I have some 60 entities exposed, and best I can tell, it generates a couple thousand tokens' worth of YAML, most of which is noise like entity IDs and YAML structure. This could be cut down significantly if you named your devices and entities cleverly (and concisely), but I think my best bet is to dig into the code and trim it down. And/or create synthetic entities that stand for multiple entities representing a single device or device group - e.g. one "A/C" entity that combines multiple sensor entities from all A/C units.
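You can get a feel for the potential savings by counting tokens yourself - a sketch, where the YAML is my stand-in for what HA actually generates, and assuming a recent tiktoken version that knows gpt-4o:

    # Compare token cost of a verbose entity dump vs. a compact encoding.
    import tiktoken

    verbose = """
    - entity_id: climate.living_room_ac_unit_1
      name: Living Room A/C
      state: cool
      attributes:
        current_temperature: 24.5
    """
    compact = "Living Room A/C: cooling, 24.5C"

    enc = tiktoken.encoding_for_model("gpt-4o")
    print(len(enc.encode(verbose)), "vs", len(enc.encode(compact)))

Multiply that ratio by 60 entities per message and the ~2-4x savings estimate looks plausible.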

[1] - Outside the YAML dump that goes with each message (and a preamble with the current date/time), which is how the assistant knows the current state of every exposed entity, there's also an extra schema exposing controls via the "function calling" mechanism of the OpenAI API, which is how the assistant is able to control devices at home. I assume those "intents" go there. I'll be looking into it today, because there's a bunch of interactions I could simplify if I could expose automation scripts to the assistant.
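For reference, this is the general shape of OpenAI function calling; the function name and fields below are illustrative, not the schema HA actually sends:

    # The model doesn't run anything itself - it returns a structured
    # "tool call" that the integration then executes against HA.
    from openai import OpenAI

    client = OpenAI()
    tools = [{
        "type": "function",
        "function": {
            "name": "turn_on",  # hypothetical; HA exposes its own intents
            "description": "Turn on a device or entity",
            "parameters": {
                "type": "object",
                "properties": {
                    "entity_id": {"type": "string", "description": "e.g. light.kitchen"},
                },
                "required": ["entity_id"],
            },
        },
    }]

    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Make the kitchen bright"}],
        tools=tools,
    )
    print(resp.choices[0].message.tool_calls)  # the call HA would execute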


lol I can’t help but assume that people who think copilot is shit have no idea what they are doing.


I have it enabled company-wide at enterprise level, so I know what it can and can't do in day-to-day practice.

Here's an example: I mentioned PowerPoint in my earlier comment. You know what the correct way to use AI to make PowerPoint slides is? A way that actually works? It's to not use the O365 Copilot inside PowerPoint, but rather to ask GPT-4o in the ChatGPT app to use Python and pandoc to make you a PowerPoint.
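The trick in script form, roughly - requires pandoc on your PATH, and the deck contents are obviously just an example:

    # Write Markdown, let pandoc turn it into a .pptx.
    import subprocess

    deck = """\
    % Quarterly Review

    # Results

    - Revenue up 12%
    - Churn down 3%

    # Next Steps

    - Ship the thing
    """

    with open("deck.md", "w") as f:
        f.write(deck)

    # Each top-level heading becomes a slide.
    subprocess.run(["pandoc", "deck.md", "-o", "deck.pptx"], check=True)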

I literally demoed that to a colleague the other day. The difference is like night and day.


I've gone back to using GitHub Copilot with reveal.js [0]. It's much nicer to work with, and I'd recommend it unless you specifically need something from PowerPoint's advanced features.

[0] https://revealjs.com/


GitHub (which is owned by Microsoft) Copilot or Microsoft Copilot?


It's a lot like AR before the Vision Pro. The demos and reality didn't match. I'm not trying to claim the Vision Pro is perfect, but it seems to do AR in the real world without the circumstances needing to be absolutely ideal.


The Vision Pro is not doing well. Apple has cancelled the next version.[1] As Carmack says, AR/VR will be a small niche until the headgear gets down to swim goggle size, and will not go mainstream until it gets down to eyeglass size.

[1] https://www.msn.com/en-us/lifestyle/shopping/apple-shelves-n...


It was always the plan for Apple to release a cheaper version of the Vision Pro next. That the next version of the Pro has been postponed isn't a huge sign. It just seems that the technology isn't evolving quickly enough to warrant a new version any time soon.


> swim goggle size

The "Bigscreen Beyond" [0] is quite close, but doesn't have cameras - so at this stage it's only really good for watching movies and the like.

[0] https://store.bigscreenvr.com/products/bigscreen-beyond


That one does have 6DoF tracking; it's just based on the Valve Lighthouse system. The upside of that system is that it's more privacy-respecting.


Which it probably won't, because real-life physics isn't aware of roadmaps and corporate ads.


What physics are you talking about? Limits on power? Display? Sensor size? I ask because I’ve had similar feelings about things like high speed mobile Internet or mobile device screen size (over a couple of decades) and lived to see all my intuition blown away, so I really don’t believe in limits that don’t have explicit physical constraints behind them.


Lens diffraction limits. VR needs lenses that are small and thin enough while still being powerful enough to bend the needed light towards the eyes. Modern lenses need more distance between the screen and the eyes and they’re quite thick.

Theoretically future lenses may make it possible, but the visible light metamaterials needed are still very early research stage.


Apple approved ALVR a few days ago too; clearly they're having issues, at least with getting developer attention.

1: https://apps.apple.com/us/app/alvr/id6479728026


Your article states this differently. The development has not been canceled fully, but refocused:

“...and now hopes to release a more standard headset with fewer abilities by the end of next year.”


That's marketing-speak for "cancelled".


I think both hardware and software in AR have to become unobtrusive for people to adopt it. And then it will be a specialized tool for stuff like maintenance. Keeping large amounts of information in context without requiring frequent changes in context. But I also think that the information overload will put a premium on non-AR time. Once it becomes a common work tool, people using it will be very keen to touch grass and watch clouds afterwards.

I don't think it will ever become the mainstream everyday carry proponents want it to be. But only time will tell...


Until there is an interface for it that allows you to effectively touch-type (or equivalent), 99% of jobs won't be able to use it away from a desk anyway. Speech-to-text would be good enough for writing (non-technical) documentation, but probably not for things like filling spreadsheets or programming.


But does what Apple has shown in its demos of the Vision Pro actually meet reality? Does it provide any value at all?

In my eyes, it's exactly the same as AI. The demos work. You can play around with it, and it's impressive for an hour. But there's just very little value.


The value would come if it was something you would feel comfortable wearing all day. So it would need perfect pass through, be much much lighter and more comfortable. If they achieved that and can do multiple high resolution virtual displays then people would use it.

The R&D required to get to that point is vast though.


> can do multiple high resolution virtual displays

In most applications, it would then need to compete on price with multiple high-resolution displays, and undercut them quite significantly to break the inertia of the old tech (which has various other advantages - like not having to wear something all day, and other people being able to look at what you have on your screen).


I take your point but living in a London flat I don't have the room for multiple high resolution displays. Nor are they very portable, I have a MBP rather than an iMac because mobility is important.

I do think we're 4+ years away from it reaching the 'iPhone 1' level of utility though, so we'll see how committed Apple are to it.


That's what all these companies are peddling, though. The question is - do humans actually NEED a display before their eyes for all their waking hours? Or even most of them? Maybe, but today I have some doubts.


Given how we as a society are now having significant second thoughts about the net utility of everybody having a display in their pocket for all their waking hours, I also have some doubts.


It's very sad, because it's a sort of so-near-yet-so-far situation.

It would be valuable if it could do multimonitor, but it can't. It would be valuable if it could run real apps but it only runs iPad apps. It would be valuable if Apple opened up the ecosystem, and let it easily and openly run existing VR apps, including controllers - but they won't.

In fact the hardware itself crosses the threshold to where the value could be had, which is something that couldn't be said before. But Apple deliberately crimped it based on their ideology, so we are still waiting. There is light at the end of the tunnel though.


> But Apple deliberately crimped it based on their ideology

It's in a strange place, because Apple definitely also crimped it by not even writing enough software for it in-house.

Why can't it run Mac apps? Why can't you share your "screen configuration" and its contents with other people wearing a Vision Pro in the same room as you?


It is not really AR. Reality is not just augmented, but captured first with cameras. That can make someone dizzy.


It's the opposite of AR, it's VR augmented with real imagery.


I never considered this angle. (Yeah, I am a sucker -- I know.) Are you saying that they cherry pick the best samples for the demo? Damn. I _still_ have high hopes for something like Copilot. I work on CRUD apps. There are so many cases where I want Copilot to provide some sample code to do X.


Sorry, I didn't mean GitHub Copilot. Code generation is definitely one of the better use cases for AI. I meant the "Copilot" brand that Microsoft has trotted out into pretty much every one of its products and rolled together in this generic "Copilot" app on Windows.


They absolutely do. Check out this video https://youtu.be/tNmgmwEtoWE



