> How do you break down the segments/sections? Is it just fixed time? What happens if there are more than one topic discussed in the segment?
Currently it's just a dumb fixed-time rule: segments are 3 or 5 minutes long, depending on the maximum video length. I played around a bit, and it's the easiest thing to implement that still works remarkably well. If there are multiple topics in a segment, there are a few branching paths in the code, but a lot of it comes down to trusting the LLM's ability to make sense of it. I've got some ideas for improving this, but they would need a bunch of work to implement well.
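
For illustration, here is a minimal sketch of what a fixed-time rule like this could look like. It is not the actual implementation: the `Cue` type, the `split_fixed` function, and the sample cues are all hypothetical, and it assumes the transcript arrives as timestamped lines. The segment length is left as a parameter because the exact rule for choosing between 3 and 5 minutes isn't stated.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    """One timestamped transcript line (hypothetical structure)."""
    start: float  # start time in seconds
    text: str


def split_fixed(cues, segment_seconds=180.0):
    """Group cues into consecutive fixed-length windows by start time.

    Windows that contain no cues are skipped.
    """
    windows = {}
    for cue in cues:
        index = int(cue.start // segment_seconds)
        windows.setdefault(index, []).append(cue.text)
    return [" ".join(windows[i]) for i in sorted(windows)]


# Made-up sample data, split into 3-minute segments.
cues = [
    Cue(0.0, "intro"),
    Cue(95.0, "first point"),
    Cue(200.0, "second point"),
    Cue(400.0, "wrap up"),
]
segments = split_fixed(cues, segment_seconds=180.0)
```

Each resulting segment would then be summarised on its own, which is why a segment that straddles two topics relies on the LLM to sort things out.
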
> Are you using both chatgpt and mistral? Do you use them for different tasks?
There's a degree of A/B testing (well, "A/B testing", since we're not collecting feedback): some of the summaries are GPT, some are Mistral, mixed together within the same video. Because Mistral is superbly fast, it's also really useful for supporting the branching logic in the code, or for cleaning up the video transcripts. For example, something I'm working on right now is an entirely different summarisation style when a video is about sports. A logistic regression would detect that pretty well, but it's not particularly robust, and it won't tell me which sport it is if the transcript is full of typos.
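
As a rough sketch of that kind of branching (again hypothetical, not the real code): a fast model is asked to label the sport, and the label picks the summarisation prompt. The `classify` callable stands in for whatever LLM call is used, the prompt strings are invented, and the convention that it returns `"none"` for non-sports text is an assumption made for this example.

```python
from typing import Callable

# Invented prompt templates, for illustration only.
SPORT_PROMPT = "Summarise this sports segment; the sport is {label}:\n{text}"
DEFAULT_PROMPT = "Summarise this segment:\n{text}"


def build_prompt(text: str, classify: Callable[[str], str]) -> str:
    """Pick a summarisation prompt based on a classifier's sport label.

    `classify` is assumed to return a sport name, or "none" (or an
    empty string) when the text is not about sports.
    """
    label = classify(text).strip().lower()
    if label in ("", "none"):
        return DEFAULT_PROMPT.format(text=text)
    return SPORT_PROMPT.format(label=label, text=text)


def fake_classify(text: str) -> str:
    """Stand-in for the fast LLM call, so the sketch runs offline."""
    return "Football" if "goal" in text.lower() else "none"


sports_prompt = build_prompt("What a goal!", fake_classify)
plain_prompt = build_prompt("Quarterly earnings", fake_classify)
```

Passing the classifier in as a function keeps the routing logic testable without network access, and makes it easy to swap one model for another.
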