Steph here - each image takes roughly 250 ms on a small single-board computer like an Nvidia Orin Nano; on something larger like an RTX 4080 GPU it's under 100 ms. Because we're running big models we can't really just spin up more threads ourselves - we hand the work to the GPU (or deep learning accelerator, depending on the platform) and the driver's internal scheduler decides how to get it done.
In a robotic packaging scenario most of the time is spent by the robot picking up objects and moving them, so in a 30-second cycle we usually get less than a second to take multiple pictures and make a decision about the part. A smaller number of images - say 4 - is pretty easy to handle on cheap hardware like an Orin Nano or Orin NX. With more images (say 8) and a tight time budget (under 2 seconds) we'd usually just bump up the hardware: a higher tier in Nvidia's Orin line, or a box with an RTX 4080 GPU or equivalent in it.
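The back-of-envelope math here is just per-image latency times image count against the cycle's vision budget. A minimal sketch, using the rough latency figures quoted above (these numbers and the assumption of sequential inference are mine, not measured specs):

```python
# Rough per-image inference latencies (ms), taken from the figures above.
# Treat these as ballpark assumptions, not benchmarks.
LATENCY_MS = {
    "orin_nano": 250,  # ~250 ms/image on an Orin Nano
    "rtx_4080": 100,   # <100 ms/image on an RTX 4080 (using the upper bound)
}

def fits_budget(platform: str, num_images: int, budget_ms: float) -> bool:
    """Assume images are inferred sequentially; True if they fit the budget."""
    return LATENCY_MS[platform] * num_images <= budget_ms

# 4 images on an Orin Nano in a 1-second slice: 4 * 250 = 1000 ms, just fits.
print(fits_budget("orin_nano", 4, 1000))   # True
# 8 images in 2 seconds: 8 * 250 = 2000 ms on the Nano - no headroom,
# while the 4080 does it in ~800 ms with room to spare.
print(fits_budget("orin_nano", 8, 2000))   # True, but at the limit
print(fits_budget("rtx_4080", 8, 2000))    # True
```

In practice batching or pipelined inference can beat this sequential estimate, which is why the driver-side scheduling mentioned above matters; the sketch is just the worst-case sizing check.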