This is very cool! I’ve wanted something like CodeMic for a long time.
Back when I was at Twitter, we used Review Board for code reviews (this was in 2009, before GH was a thing for most companies). It was tough to thoughtfully review large branches, especially for parts of the codebase that I wasn’t familiar with. I remember thinking that if I could somehow record the development process for a PR I was reviewing, it would be easier to understand what the submitter was trying to accomplish and how they went about doing so. As it was, I found myself reviewing code style more than functionality, architecture, or design.
I watched most of the intro video, but didn’t go deeper on the site. Does CM integrate easily into the code review/PR process? I suppose I could just attach a link in any PR description?
Thanks a lot! I had thought about it being useful for learning, for fun, as a new kind of documentation, or even for onboarding new hires, but the use case of code review hadn't occurred to me. That's great. I can think of three ways to share sessions:
- attach a link as you said
- once the web player is ready, it could perhaps be integrated directly into the code review tool, like embedding a YouTube video (sketched after this list)
- the entire recorded session can be exported as a zip file and attached to the PR or shared privately
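For the embed option, I imagine something as simple as this. Purely a sketch, and the player URL scheme is hypothetical since the web player doesn't exist yet:

```typescript
// Hypothetical embed of a CodeMic session in a code-review page,
// by analogy with a YouTube iframe embed. The URL is a placeholder.
function embedSession(container: HTMLElement, sessionId: string): void {
  const iframe = document.createElement("iframe");
  iframe.src = `https://codemic.example/embed/${sessionId}`; // made-up scheme
  iframe.width = "800";
  iframe.height = "450";
  iframe.allow = "autoplay"; // let the session start playing on click
  container.appendChild(iframe);
}
```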
This is super cool. One neat idea: when I'm in offline mode, I can clone my voice, provide some context data/sources, and have my AI clone answer calls for me. It can give me a summary of conversations it had each day and allow me to follow up.
We’re definitely heading in that direction and currently experimenting with LiveKit’s Agent framework. I’m guessing you’re Russ from LiveKit? If so, I’m a huge fan of what you’re doing! Would love to connect and explore ideas further: [email protected]
In most real applications, the agent has additional logic (function calling, RAG, etc.) beyond simply relaying a stream to the model server. In those cases, you want it to be a separate service/component that can be scaled independently.
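To sketch what I mean (all names here are illustrative, not any particular framework's API), the agent sits between the client and the model server and does real work of its own rather than just forwarding bytes:

```typescript
// Hedged sketch of an agent as its own service, not a dumb relay.
// MODEL_WS_URL, lookupDocs, and the message shapes are all invented
// for illustration.
import WebSocket from "ws";

const MODEL_WS_URL = "wss://model.example/realtime"; // placeholder

// Pretend RAG step: fetch context relevant to the user's query.
async function lookupDocs(query: string): Promise<string> {
  // ...vector search, keyword search, etc.
  return `Top snippets for "${query}"`;
}

export function runAgent(client: WebSocket): void {
  const model = new WebSocket(MODEL_WS_URL);

  // Client -> agent: enrich the request before forwarding to the model.
  client.on("message", async (raw) => {
    const msg = JSON.parse(raw.toString());
    if (msg.type === "user_text") {
      const context = await lookupDocs(msg.text); // the RAG hop
      model.send(JSON.stringify({ type: "prompt", text: msg.text, context }));
    }
  });

  // Model -> agent: handle tool calls locally instead of relaying them.
  model.on("message", (raw) => {
    const msg = JSON.parse(raw.toString());
    if (msg.type === "tool_call") {
      // Run the function, then feed the result back to the model.
      model.send(JSON.stringify({ type: "tool_result", id: msg.id, result: "..." }));
    } else {
      client.send(raw.toString()); // plain text/audio deltas pass through
    }
  });
}
```

Because the RAG/tool work has its own latency and CPU profile, it makes sense to scale it separately from whatever is moving media around.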
Essentially, I think the LiveKit value proposition is an SFU that works, with signalling, and SDKs that already exist. My experience is that people radically overstate how hard signalling is and underestimate SFU complexity, especially with fast failover.
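To make "signalling is easy" concrete: a signalling server is just a relay for opaque SDP/ICE blobs between peers. A toy version using the Node `ws` package (rooms and auth deliberately elided) fits in a screenful:

```typescript
// Minimal WebRTC signalling sketch: relay SDP offers/answers and ICE
// candidates between the peers in a "room". The hard parts of WebRTC
// live elsewhere (the media path, the SFU), not here.
import { WebSocketServer, WebSocket } from "ws";

const rooms = new Map<string, Set<WebSocket>>();
const wss = new WebSocketServer({ port: 8080 });

wss.on("connection", (ws, req) => {
  const room =
    new URL(req.url ?? "/", "http://x").searchParams.get("room") ?? "lobby";
  const peers = rooms.get(room) ?? new Set<WebSocket>();
  peers.add(ws);
  rooms.set(room, peers);

  // Forward every signalling message to the other peers in the room.
  ws.on("message", (msg) => {
    for (const peer of peers) {
      if (peer !== ws && peer.readyState === WebSocket.OPEN) {
        peer.send(msg.toString());
      }
    }
  });

  ws.on("close", () => peers.delete(ws));
});
```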
In terms of being a higher-level API, arguably it is doomed to failure, thanks to the madness of the domain. (The part that sticks in my mind is audio device switching on Android.) WebRTC products seem to always end up with the consumer needing to know far more of the internals than is healthy. As such, I think once you are sufficiently good at using LiveKit, you are less likely to pick it for your next product, because by then you will be able to roll your own far more easily. That is, unless the value you were getting from it actually was the SFU infrastructure and not the SDKs.
The OpenAI case is so point-to-point that doing WebRTC for that is, honestly, really not hard at all.
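For example, here's roughly the entire browser side. Standard WebRTC APIs throughout; the OpenAI endpoint, model name, and ephemeral-key flow are from memory of their docs and may have drifted:

```typescript
// Point-to-point WebRTC with the OpenAI Realtime API: one peer
// connection, one offer/answer exchange done over a single HTTPS POST.
async function connect(ephemeralKey: string): Promise<RTCPeerConnection> {
  const pc = new RTCPeerConnection();

  // Send the mic upstream; play whatever audio track comes back.
  const mic = await navigator.mediaDevices.getUserMedia({ audio: true });
  pc.addTrack(mic.getTracks()[0]);
  pc.ontrack = (e) => {
    const audio = new Audio();
    audio.srcObject = e.streams[0];
    void audio.play();
  };

  // Classic offer/answer, except the "signalling server" is one POST.
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  const resp = await fetch(
    "https://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview",
    {
      method: "POST",
      headers: {
        Authorization: `Bearer ${ephemeralKey}`,
        "Content-Type": "application/sdp",
      },
      body: offer.sdp,
    },
  );
  await pc.setRemoteDescription({ type: "answer", sdp: await resp.text() });
  return pc;
}
```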
You really don’t need to know about WebRTC at all when you use LiveKit. That’s largely thanks to the SDKs abstracting away all the complexity. Having good SDKs that work across every platform with consistent APIs is more valuable than the SFU imo. There are other options for SFUs and folks like Signal have rolled their own. Try to get WebRTC running on Apple Vision Pro or tvOS and let me know if that’s no big deal.
> Try to get WebRTC running on Apple Vision Pro or tvOS and let me know if that’s no big deal.
[EDIT: I probably shouldn't mention that]. I have some experience getting WebRTC up on new platforms, and it's not as bad as all that. libwebrtc is a remarkably solid library, especially given the domain it's in.
I obviously do not share your opinion of the SDKs.
Heh, actually I'm pretty sure I've come across your X profile before. :) You're definitely in a small minority of folks with a deep(er) understanding of WebRTC.
About 80% of the time I experience choppy audio on my iPhone 15 Pro Max (iOS 18.1 beta) in Voice Mode (both Standard and Advanced). My internet connection is FTTH with a state-of-the-art Wi-Fi 7 router.
I wonder if this is because of bugs or the crazy load LiveKit may be under given the popularity of ChatGPT's voice modes right now.
It's using the same model/engine. I don't have knowledge of the internals, but there is a different subsystem/set of dedicated resources for API traffic versus first-party apps.
One thing to note: there is no separate TTS phase here; the audio is generated internally within GPT-4o, in both the Realtime API and Advanced Voice.
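You can see this in how the output arrives: over the WebSocket transport you get audio deltas straight from the model, with no TTS request anywhere in the flow. A sketch (event and field names are from memory of the docs, so treat them as approximate):

```typescript
// Consuming Realtime API output over WebSocket. There is no TTS call
// anywhere; the audio deltas come straight out of GPT-4o.
import WebSocket from "ws";

const url = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview";
const ws = new WebSocket(url, {
  headers: {
    Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    "OpenAI-Beta": "realtime=v1",
  },
});

const pcmChunks: Buffer[] = [];

ws.on("open", () => {
  // Ask for a spoken (and textual) response in one shot.
  ws.send(
    JSON.stringify({
      type: "response.create",
      response: { modalities: ["audio", "text"] },
    }),
  );
});

ws.on("message", (raw) => {
  const event = JSON.parse(raw.toString());
  if (event.type === "response.audio.delta") {
    // Base64-encoded PCM16 audio, generated by the model itself.
    pcmChunks.push(Buffer.from(event.delta, "base64"));
  } else if (event.type === "response.done") {
    console.log(`Got ${pcmChunks.length} audio chunks, no TTS step involved.`);
    ws.close();
  }
});
```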