
This is very cool! I’ve wanted something like CodeMic for a long time.

Back when I was at Twitter, we used Review Board for code reviews (this was in 2009, before GH was a thing for most companies). It was tough to thoughtfully review large branches, especially for parts of the codebase that I wasn’t familiar with. I remember thinking, if I could somehow record the development process for a PR I was reviewing, it would be easier to understand what the submitter was trying to accomplish and how they went about doing so. I found myself reviewing code style more than functionality, architecture, or design.

I watched most of the intro video, but didn’t go deeper on the site. Does CM integrate easily into the code review/PR process? I suppose I could just attach a link in any PR description?

Great work!


Thanks a lot! I have thought about it being useful for learning, for fun, as a new kind of documentation, or even for onboarding new hires, but the use case for code review didn't occur to me. That's great. I can think of three ways to share sessions:

- attach a link as you said

- once the web player is ready, it could perhaps be integrated into the code review tool. It'll be like embedding a YouTube video

- the entire recorded session can be exported as a zip file and attached to the PR or shared privately


As the reviewer, I would like to annotate the replay session along the way and have those annotations available to other reviewers and the PR author.


Yeah, that would be really handy too.


Haven’t played Codenames in a long while, but I made this 8 years ago to play with family and friends on TVs. Just in time for the holidays!

demo: https://dsa.github.io

code: https://github.com/dsa/dsa.github.io


Amazing! Thanks for sharing


You got it! Hope you have some fun with it. :)


This is super cool. One neat idea: when I'm in offline mode, I can clone my voice, provide some context data/sources, and have my AI clone answer calls for me. It can give me a summary of conversations it had each day and allow me to follow up.


We’re definitely heading in that direction and currently experimenting with LiveKit’s Agents framework. I’m guessing you’re Russ from LiveKit? If so, I’m a huge fan of what you’re doing! Would love to connect and explore ideas further: [email protected]


Haha yup, I’m that Russ. Really appreciate your kind words. <3

I’ll shoot you an email and let’s chat!


Field CTO — hi @Sean-Der :wave:

Fractional CTO sounds like a disaster lol


My bad, he was Field CTO.


Which components feel ad hoc?

In most real applications, the agent has additional logic (function calling, RAG, etc.) beyond simply relaying a stream to the model server. In those cases, you want it to be a separate service/component that can be independently scaled.
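
A rough, framework-agnostic sketch of that split, with entirely hypothetical names (fetch_context, call_tool, call_model are placeholders, not the LiveKit Agents API): the agent runs as its own service, does the RAG lookup and any tool calls, and only then relays to the model server, so the agent fleet can scale independently of the media transport.

    import asyncio

    # Hypothetical agent service sitting between the realtime transport and
    # the model server. All names here are placeholders for illustration.

    async def fetch_context(utterance: str) -> str:
        # RAG step: look up documents relevant to the user's utterance.
        return f"[docs relevant to: {utterance!r}]"

    async def call_tool(name: str, args: dict) -> str:
        # Function-calling step: run a tool on the agent's side.
        return f"{name}({args}) -> ok"

    async def call_model(prompt: str) -> str:
        # Relay the assembled prompt to the model server (e.g. a realtime API).
        await asyncio.sleep(0.05)  # stand-in for network latency
        return f"model response to: {prompt[:40]}..."

    async def handle_turn(utterance: str) -> str:
        # One conversational turn: RAG -> tool call -> model.
        context = await fetch_context(utterance)
        tool_result = await call_tool("lookup_account", {"user": "demo"})
        prompt = f"{context}\n{tool_result}\nUser: {utterance}"
        return await call_model(prompt)

    if __name__ == "__main__":
        print(asyncio.run(handle_turn("What's the status of my order?")))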


Essentially, I think the LiveKit value is an SFU that works, with signalling, plus SDKs that exist. My experience is that people radically overstate how hard signalling is and underestimate SFU complexity, especially with fast failover.

In terms of being a higher-level API, it is arguably doomed to failure, thanks to the madness of the domain. (The part that sticks in my mind is audio device switching on Android.) WebRTC products seem to always end up with the consumer needing to know way more of the internals than is healthy. As such, I think once you are sufficiently good at using LiveKit, you are less likely to pick it for your next product, because you will be able to roll your own far more easily. That is, unless the value you were getting from it was actually the SFU infrastructure and not the SDKs.

The OpenAI case is so point-to-point that doing WebRTC for that is, honestly, really not hard at all.


You really don’t need to know about WebRTC at all when you use LiveKit. That’s largely thanks to the SDKs abstracting away all the complexity. Having good SDKs that work across every platform with consistent APIs is more valuable than the SFU imo. There are other options for SFUs and folks like Signal have rolled their own. Try to get WebRTC running on Apple Vision Pro or tvOS and let me know if that’s no big deal.


> Try to get WebRTC running on Apple Vision Pro or tvOS and let me know if that’s no big deal.

[EDIT: I probably shouldn't mention that]. I have some experience getting WebRTC up on new platforms, and it's not as bad as all that. libwebrtc is a remarkably solid library, especially given the domain it's in.

I obviously do not share your opinion of the SDKs.


Heh, actually I'm pretty sure I've come across your X profile before. :) You're definitely in a small minority of folks with a deep(er) understanding of WebRTC.


There’s Ultravox as well (from one of the creators of WebRTC): https://github.com/fixie-ai/ultravox

Their model builds a speech-to-speech layer into Llama. Last I checked they have the audio-in part working and they’re working on the audio-out piece.


There’s a ton of complexity under the “relatively simple use case” when you get to a global, 200M+ user scale.


80% of the time I experience choppy audio on my iPhone 15 Pro Max (18.1b) in Voice Mode (both Standard and Advanced). My internet connection is FTTH with a state-of-the-art WiFi 7 router.

I wonder if this is because of bugs or the crazy load LiveKit may be under, given the popularity of ChatGPT's voice modes right now.


Doesn’t sound right. I’d love to dig into this some more. Would you mind shooting me a DM on X? @dsa


We had our playground (https://playground.livekit.io) up for a few days using our key. Def racked up a $$$$ bill!


How much is it per minute of talking?


50% human speaking at $0.06/minute of tokens

50% AI speaking at $0.24/minute of tokens

we (LiveKit Cloud) charge ~$0.0005/minute for each participant (in this case there would be 2)

So blended is $0.151/minute
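
For anyone checking the arithmetic, here's the back-of-envelope version (rates from the lines above; the 50/50 talk split is an assumption about an average call):

    # Back-of-envelope cost per minute of conversation.
    HUMAN_RATE = 0.06    # $ per minute of audio tokens while the human speaks
    AI_RATE = 0.24       # $ per minute of audio tokens while the AI speaks
    TRANSPORT = 0.0005   # $ per participant-minute on LiveKit Cloud
    PARTICIPANTS = 2     # the caller and the AI agent

    blended = 0.5 * HUMAN_RATE + 0.5 * AI_RATE + PARTICIPANTS * TRANSPORT
    print(f"${blended:.3f}/minute")  # -> $0.151/minute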


It shakes out to around $0.15 per minute for an average conversation. If history is a guide though, this will get a lot cheaper pretty quickly.


This is cheaper than old cellular calls, inflation adjusted


I had no idea! <3 Thank you for sharing this, made my weekend.


It's using the same model/engine. I don't have knowledge of the internals, but there's a different subsystem/set of dedicated resources for API traffic versus first-party apps.

One thing to note: there is no separate TTS phase here; it happens internally within GPT-4o, in both the Realtime API and Advanced Voice.


Thanks

