I could go pretty deep here, so let me know if I should elaborate on anything.
The backend is a Ruby on Rails application that serves the frontend app's API. It interfaces with the database (user tables and the like) and handles all the "state" of the app.
The serverless stuff has changed over the months, but primarily it handles the stuff I don't want Rails to handle: file uploads, video processing and transcription.
First, huge props to the Mux (https://mux.com) team and product. I cannot express how easy it has been to build video (and audio) products. File uploads go to AWS/GCP (depending on a few things) and then trigger a serverless callback to Mux.com. Mux was the fastest way we found to turn an arbitrary video file (mp4/mov/etc) into HLS format for quick streaming.
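To make that concrete, here's a rough sketch of what that serverless callback might look like: once the upload lands in the bucket, the function asks Mux to pull and ingest the file via its asset-creation endpoint. The bucket URL, env var names, and `'public'` playback policy are my assumptions, not the author's actual setup.

```javascript
// Build the body for Mux's POST /video/v1/assets call.
// Mux fetches the file itself from the URL you hand it.
function buildMuxAssetRequest(fileUrl) {
  return {
    input: fileUrl,
    playback_policy: ['public'], // use 'signed' if playback should be gated
  };
}

// The actual request: HTTP basic auth with a Mux token ID/secret pair.
async function createMuxAsset(fileUrl) {
  const auth = Buffer.from(
    `${process.env.MUX_TOKEN_ID}:${process.env.MUX_TOKEN_SECRET}`
  ).toString('base64');
  const res = await fetch('https://api.mux.com/video/v1/assets', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Basic ${auth}`,
    },
    body: JSON.stringify(buildMuxAssetRequest(fileUrl)),
  });
  return res.json(); // response includes the asset id and playback ids
}
```

From there Mux handles the transcode to HLS and you just store the playback ID it hands back.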
Then once the video is uploaded, we have another serverless callback that sends the video for transcription using Assembly AI (https://assemblyai.com). There are a ton of transcription services, and they vary dramatically in quality depending on the media content. I believe Google/Amazon services were largely built around the need to process phone calls, so unless you pay for their "enhanced" models, the quality is surprisingly bad (and surprisingly slow).
I *highly highly* recommend Mux and Assembly AI if you are doing any video/transcription work.
To get an immediate update to the end user, we actually process two transcript requests - one that is just the first 60 seconds, and then the remainder of the video. This lets us render a preview transcript in the first 15-20 seconds.
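A sketch of that two-request split, assuming AssemblyAI's `audio_end_at` / `audio_start_from` parameters (both in milliseconds) are what's used to slice the media; the env var name and URLs are placeholders:

```javascript
const PREVIEW_MS = 60_000; // first 60 seconds

// Preview request: just the opening minute, so a transcript renders fast.
function buildPreviewRequest(mediaUrl) {
  return { audio_url: mediaUrl, audio_end_at: PREVIEW_MS };
}

// Remainder request: everything after the first minute.
function buildRemainderRequest(mediaUrl) {
  return { audio_url: mediaUrl, audio_start_from: PREVIEW_MS };
}

// Both go to the same AssemblyAI endpoint; transcription is async,
// so each response is just { id, status: 'queued', ... }.
async function submitTranscript(body) {
  const res = await fetch('https://api.assemblyai.com/v2/transcript', {
    method: 'POST',
    headers: {
      authorization: process.env.ASSEMBLYAI_API_KEY,
      'content-type': 'application/json',
    },
    body: JSON.stringify(body),
  });
  return res.json();
}
```

Firing both requests at upload time means the preview transcript can be shown while the longer job is still running.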
We also have a serverless pipeline for generating the videos, but I won't go into that unless you're interested. In short, a serverless function kicks off a Docker instance running on ECS.
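For the curious, the "function kicks off a Docker instance on ECS" step might look roughly like this with the AWS SDK's `RunTask` call. Every name here (cluster, task definition, subnet, container) is a placeholder I've invented for the sketch:

```javascript
// Params for an ECS RunTask call launching a one-off Fargate task.
function buildRunTaskParams(jobId) {
  return {
    cluster: 'video-jobs',             // placeholder cluster name
    taskDefinition: 'video-generator', // placeholder task definition
    launchType: 'FARGATE',
    networkConfiguration: {
      awsvpcConfiguration: {
        subnets: ['subnet-0123456789abcdef0'], // placeholder subnet
        assignPublicIp: 'ENABLED',
      },
    },
    overrides: {
      containerOverrides: [
        { name: 'generator', environment: [{ name: 'JOB_ID', value: jobId }] },
      ],
    },
  };
}

async function startVideoJob(jobId) {
  // Required lazily so the params builder above works without the SDK installed.
  const { ECSClient, RunTaskCommand } = require('@aws-sdk/client-ecs');
  const ecs = new ECSClient({});
  return ecs.send(new RunTaskCommand(buildRunTaskParams(jobId)));
}
```

The nice part of this pattern is that the Lambda returns immediately and the heavy lifting happens in a container with no serverless time limits.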
The serverless apps (mostly Node) call back to the Rails app, which then updates the end-user state using websockets (which are very easy to use with Rails' ActionCable).
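The callback side of that might be as simple as the sketch below: the Node function POSTs its result to a Rails webhook, and Rails broadcasts the change over ActionCable. The `/webhooks` path, shared-secret header, and payload shape are all assumptions, not the author's actual API:

```javascript
// Shape of the payload the serverless function reports back to Rails.
function buildCallbackPayload(videoId, status, extra = {}) {
  return { video_id: videoId, status, ...extra };
}

// POST the result to a (hypothetical) Rails webhook endpoint.
async function notifyRails(videoId, status, extra) {
  await fetch('https://app.example.com/webhooks/video_events', {
    method: 'POST',
    headers: {
      'content-type': 'application/json',
      'x-webhook-secret': process.env.RAILS_WEBHOOK_SECRET, // simple shared-secret auth
    },
    body: JSON.stringify(buildCallbackPayload(videoId, status, extra)),
  });
}

// On the Rails side the controller would then push to the browser, e.g.
// ActionCable.server.broadcast("video_#{video_id}", { status: status })
```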
Interested to hear more about your pipeline and infrastructure for processing and delivering video. I'm working with processing short videos at the moment for my current startup, though I didn't use Mux (I figured it was a core competency we needed to develop). It's just a queue using FFMPEG to convert from MP4 to HLS.
I have horror stories about FFMPEG that I won't go into here.
In short, I'm just one person building this - so I'm sticking to what I know best. I want video to "just work" without having to worry about some video format/extension/containers that I have no idea about.
There are a number of video processing services, but Mux really is the best. The API is simple. They have a ton of really nice helper features that I use a lot (like timestamped thumbnails, preview GIFs, and VTT storyboard generation), which I could easily spend a few days building and then countless hours maintaining.
I don't doubt that building video infra is a good idea, but just as I'm not about to train my own speech-to-text model, I'm not going to build out video infra.
At least for me, I'm more worried about the end user experience, and the more I can focus on that, the better the overall product will be.
I'm in the same boat - I'm the one building it, and my focus is on the user experience, but the business model won't tolerate the amount of video on someone else's service. :(
I haven't had FFMPEG nightmares yet, but I've done relatively little with it so far.
Any video apps I should look out for? I'm also pursuing a content creation angle that I've yet to spec out, so I'm always curious as to how others have approached the problem.
Hey! Jon from Mux here. Curious about this comment:
> the business model won't tolerate the amount of video on someone else's service
Does that mean you aren't using S3/EC2 or the like, or is there something about how we've built our cloud platform that doesn't work for your business model? We've designed Mux to be a low-level primitive for video, like Twilio is for SMS, so I'd be interested if we're doing something that makes this harder for you.
Hey Jon! I've looked at Mux (mainly the careers page), and it's a great platform. It would be a great fit technologically, but I'm not sure that my business model (which is tentative admittedly) would cover infra costs for processing, hosting and consuming the amount of video I'm eventually expecting, as I'm running on a shoestring budget at the moment.
Plus, it's a good chance for me to learn the ins and outs of video. It's not reflective of the quality of your platform, just a choice I've made early in the piece for curiosity's sake.