XNect: Real-Time Multi-Person 3D Motion Capture with a Single RGB Camera (mpg.de)
62 points by nabla9 on July 23, 2020 | 20 comments


This is a great way to capture and preserve various disciplines in sports, dance, martial arts, etc. A great first step. Adding support for additional programming languages, say Python, could open this up to a wider audience of DIY mocap.


Unfortunately, it appears this technology is made available only for research purposes¹, i.e. only to people with an institutional email address.

Which means that the best way for DIY mocap that I know of is still getting hold of an increasingly rare Kinect One.

¹) https://gvv-assets.mpi-inf.mpg.de/xnect/?page_id=12


Is there a reason to limit distribution to other researchers only?

It feels even more out of place considering most of the authors are affiliated with public institutes and likely benefited from public funding.

I actually thought this would be super neat as a poor man's version of the MS Kinect and for hobby gaming.


I'm always extremely skeptical of claims like this. For something with input this simple, it could be embedded into a web page and take arbitrary video to prove that it works as well as they claim.


I was thinking this too. Perhaps the researchers have a buyer lined up but were obliged to publish. So this way, they at least have a place to start if the tech leaks into the public domain or finds its way into some product.


So many algorithms insist on the "single RGB camera" approach when it would be much more practical to use two cameras.


The power is the existing giant corpus of video. When I'm in front of a computer I'm going to run this up against the dancing of James Brown and Michael Jackson. Should be interesting.

Maybe an Olympic gymnast as well.


Practical for an algorithm implementer, maybe - deeply impractical for real-world use. Stereo cameras are rare and nontrivial (you have to synchronize shutters). Monocular algorithms can be applied to the millions of hours of existing footage, or used with the billions of cameras, smartphones, robots, drones, and fancy doorbells that already exist right now.


If you can calibrate the time-delay between the two cameras, can you not just interpolate one or the other signal backward or forward in time so that it aligns with the other? (By "interpolation", here, I mean the sort of thing the Oculus does on the display side, generating frames "between" frames, to smooth motion during head rotation. Take one real frame from one camera, and build an interpolated frame between two real frames from the other camera to match it.)
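A rough sketch of that idea applied to 2D keypoint tracks rather than whole frames (Python/numpy; the function name and array shapes are my own assumptions):

    import numpy as np

    def align_stream(times_a, times_b, keypoints_b):
        # Resample camera B's keypoints at camera A's frame timestamps by
        # linearly interpolating between B's two nearest real frames.
        #   times_a:     (N,) camera A timestamps, seconds, increasing
        #   times_b:     (M,) camera B timestamps, seconds, increasing
        #   keypoints_b: (M, K, 2) pixel coords of K joints per B frame
        m, k, _ = keypoints_b.shape
        flat = keypoints_b.reshape(m, -1)                  # (M, 2K)
        out = np.stack([np.interp(times_a, times_b, flat[:, c])
                        for c in range(flat.shape[1])], axis=1)
        return out.reshape(len(times_a), k, 2)

Interpolating sparse keypoints is far cheaper than synthesising whole in-between frames the way the Oculus does, and for mocap the keypoints are what you care about anyway.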


For a “slow”-moving object like the human body, does a synchronised shutter matter that much, and if so, are there any tricks to compensate if synchronisation is not possible?


You can do reasonable software sync with identical cameras and threading - it gets you to within a few milliseconds.

Even for slow objects it's a problem because being a few pixels off might make the difference between matching and not.
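Something along these lines (OpenCV; the device indices and frame count are made up):

    import threading, time
    import cv2

    def capture(index, frames, n=300):
        # Grab frames from one camera, tagging each with a wall-clock
        # timestamp taken right after the read returns.
        cap = cv2.VideoCapture(index)
        while len(frames) < n:
            ok, img = cap.read()
            if ok:
                frames.append((time.monotonic(), img))
        cap.release()

    streams = ([], [])
    threads = [threading.Thread(target=capture, args=(i, s))
               for i, s in enumerate(streams)]
    for t in threads: t.start()
    for t in threads: t.join()

    # Pair each camera-0 frame with the nearest-in-time camera-1 frame;
    # the residual offset is what's left for interpolation to absorb.
    for ts0, img0 in streams[0]:
        ts1, img1 = min(streams[1], key=lambda f: abs(f[0] - ts0))
        print(f"pair offset: {abs(ts1 - ts0) * 1000:.1f} ms")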


There are dozens of 360 cameras on the market, so I think shutter synchronization is not that difficult to implement.


360 cameras produce spherical panoramas from a single point, so for capturing a person on a stage they offer no advantage over a regular camera.


But 360° cams try to do something entirely different than, say, a Kinect.

I would be surprised if all 360° cameras had synchronized shutters.


Only for near-field. Further out than roughly an arm-span, your brain itself doesn't use binocular vision for 3D estimation because there's not that much information in the parallax.


Do you have a source for that? Because my source (one closed eye) tells me very clearly that I do use binocular vision for 3D, 3D being how far away something is.

You don't need 'much' information. You know the distance between the eyes, and then you have two lines, one from each eye to the object. That's called a triangle. Do you know how to calculate the height of a triangle? Because that's your distance.
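Concretely (all numbers assumed):

    import math

    b = 0.065                  # distance between the eyes, metres
    theta = math.radians(1.0)  # angle between the two sight lines

    # The two sight lines plus the baseline form an isosceles triangle;
    # the object's distance is that triangle's height.
    d = (b / 2) / math.tan(theta / 2)
    print(f"{d:.2f} m")        # ~3.72 m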


If you read my comment again, possibly all of it this time, you'll see that I'm not saying that we don't have or use binocular vision. I'm saying that there's a limit to how far out it's useful. That means adding a second camera is only going to be useful for a small number of tasks.

Messing about with triangles is called convergence. The accuracy falls off quite quickly with distance, and it's completely useless out past about 10m. Your brain has much better sources of depth cues before that point.
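To put numbers on that falloff (the baseline and angle error are my assumptions):

    import math

    b = 0.065                   # eye baseline, metres
    noise = math.radians(0.02)  # assumed small error in the vergence angle

    for d in [0.5, 2.0, 10.0, 50.0]:
        theta = 2 * math.atan((b / 2) / d)              # true vergence angle
        d_est = (b / 2) / math.tan((theta - noise) / 2) # noisy estimate
        print(f"true {d:5.1f} m -> estimated {d_est:7.2f} m")

At half a metre the error is negligible; at 10 m it's already over 5%, and at 50 m the same tiny angle error puts the estimate nearly 20 m off.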

There are at least a dozen other mechanisms humans use for depth perception, only one (arguably two) of which uses both eyes. I'll let you do the googling.


I don't need to 'google' what closing one eye does; that's the whole point of my comment. You claim it's unreasonable when people use their eyes for basic information. I say that having to google what your own eyes tell you is what's unreasonable.


Does the video have interlacing problems? How does that happen?


How long until technology such as this is used for automatic "crime" detection with surveillance cameras?

Not looking forward to having that in the West, too. This stuff is the wet dream of every authoritarian looking to effectively control a populace.

We really ought to amend our constitutions, adding protections against this, while we still can.



