XNect: Real-Time Multi-Person 3D Motion Capture with a Single RGB Camera (mpg.de)
62 points by nabla9 on July 23, 2020 | 20 comments


This is a great way to capture and preserve various disciplines in sports, dance, martial arts, etc. A great first step. Adding support for additional programming languages, say Python, could open this up to a wider audience of DIY mocap.


Unfortunately, it appears this technology is made available only for research purposes¹, i.e. only to people with an institutional email address.

Which means that the best way for DIY mocap that I know of is still getting hold of an increasingly rare Kinect One.

¹) https://gvv-assets.mpi-inf.mpg.de/xnect/?page_id=12


Is there a reason to limit distribution to other researchers only?

It feels even more out of place considering most of the authors are affiliated with public institutes and likely benefited from public funding.

I actually thought this would be super neat as a poor man's version of the MS Kinect and for hobby gaming.


I'm always extremely skeptical of claims like this. For something with input this simple, it could be embedded into a web page and take arbitrary video to prove that it works as well as they claim.


I was thinking this too. Perhaps the researchers have a buyer lined up but were obliged to publish. So this way, they at least have a place to start if the tech leaks into the public domain or finds its way into some product.


So many algorithms insist on the "single RGB camera" approach when it would be much more practical to use two cameras.


The power is the existing giant corpus of video. When I'm in front of a computer I'm going to run this up against the dancing of James Brown and Michael Jackson. Should be interesting.

Maybe an Olympic gymnast as well.


Practical for an algorithm implementer, maybe - deeply impractical for real-world use. Stereo cameras are rare and nontrivial (you have to synchronize shutters). Monocular algorithms can be applied to the millions of hours of existing footage, or used with the billions of cameras, smartphones, robots, drones, and fancy doorbells that already exist right now.


If you can calibrate the time-delay between the two cameras, can you not just interpolate one or the other signal backward or forward in time so that it aligns with the other? (By "interpolation", here, I mean the sort of thing the Oculus does on the display side, generating frames "between" frames, to smooth motion during head rotation. Take one real frame from one camera, and build an interpolated frame between two real frames from the other camera to match it.)
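A rough sketch of that idea applied to 2D keypoint tracks rather than whole frames (Python/numpy; the function name and array shapes are my own assumptions):

    import numpy as np

    def align_stream(times_a, times_b, keypoints_b):
        # Resample camera B's keypoints at camera A's frame timestamps by
        # linearly interpolating between B's two nearest real frames.
        #   times_a:     (N,) camera A timestamps, seconds, increasing
        #   times_b:     (M,) camera B timestamps, seconds, increasing
        #   keypoints_b: (M, K, 2) pixel coords of K joints per B frame
        m, k, _ = keypoints_b.shape
        flat = keypoints_b.reshape(m, -1)                  # (M, 2K)
        out = np.stack([np.interp(times_a, times_b, flat[:, c])
                        for c in range(flat.shape[1])], axis=1)
        return out.reshape(len(times_a), k, 2)

Interpolating sparse keypoints is far cheaper than synthesising whole in-between frames the way the Oculus does, and for mocap the keypoints are what you care about anyway.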


For a “slow”-moving object like the human body, does a synchronised shutter matter that much, and if so, are there any tricks to compensate if synchronisation is not possible?


You can do reasonable software sync with identical cameras and threading - it gets you to within a few milliseconds.

Even for slow objects it's a problem because being a few pixels off might make the difference between matching and not.
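Something along these lines (OpenCV; the device indices and frame count are made up):

    import threading, time
    import cv2

    def capture(index, frames, n=300):
        # Grab frames from one camera, tagging each with a wall-clock
        # timestamp taken right after the read returns.
        cap = cv2.VideoCapture(index)
        while len(frames) < n:
            ok, img = cap.read()
            if ok:
                frames.append((time.monotonic(), img))
        cap.release()

    streams = ([], [])
    threads = [threading.Thread(target=capture, args=(i, s))
               for i, s in enumerate(streams)]
    for t in threads: t.start()
    for t in threads: t.join()

    # Pair each camera-0 frame with the nearest-in-time camera-1 frame;
    # the residual offset is what's left for interpolation to absorb.
    for ts0, img0 in streams[0]:
        ts1, img1 = min(streams[1], key=lambda f: abs(f[0] - ts0))
        print(f"pair offset: {abs(ts1 - ts0) * 1000:.1f} ms")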


There are dozens of 360 cameras on the market, so I think shutter synchronization is not that difficult to implement.


360 cameras produce spherical panoramas from a single point, so for capturing a person on a stage they offer no advantage over a regular camera.


But 360° cams try to do something entirely different than, say, a Kinect.

I would be surprised if all 360° cameras had synchronized shutters.


Only for near-field. Further out than roughly an arm-span, your brain itself doesn't use binocular vision for 3D estimation because there's not that much information in the parallax.


Do you have a source for that? Because my source (one closed eye) tells me very clearly that I do use binocular vision for 3D, 3D being how far away something is.

You don't need 'much' information. You know the distance between the eyes, and then you have two lines, one from each eye to the object. That's called a triangle. Do you know how to calculate the height of a triangle? Because that's your distance.
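Concretely (all numbers assumed):

    import math

    b = 0.065                  # distance between the eyes, metres
    theta = math.radians(1.0)  # angle between the two sight lines

    # The two sight lines plus the baseline form an isosceles triangle;
    # the object's distance is that triangle's height.
    d = (b / 2) / math.tan(theta / 2)
    print(f"{d:.2f} m")        # ~3.72 m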


If you read my comment again, possibly all of it this time, you'll see that I'm not saying that we don't have or use binocular vision. I'm saying that there's a limit to how far out it's useful. That means adding a second camera is only going to be useful for a small number of tasks.

Messing about with triangles is called convergence. The accuracy falls off quite quickly with distance, and it's completely useless out past about 10m. Your brain has much better sources of depth cues before that point.
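To put numbers on that falloff (the baseline and angle error are my assumptions):

    import math

    b = 0.065                   # eye baseline, metres
    noise = math.radians(0.02)  # assumed small error in the vergence angle

    for d in [0.5, 2.0, 10.0, 50.0]:
        theta = 2 * math.atan((b / 2) / d)              # true vergence angle
        d_est = (b / 2) / math.tan((theta - noise) / 2) # noisy estimate
        print(f"true {d:5.1f} m -> estimated {d_est:7.2f} m")

At half a metre the error is negligible; at 10 m it's already over 5%, and at 50 m the same tiny angle error puts the estimate nearly 20 m off.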

There are at least a dozen other mechanisms humans use for depth perception, only one (arguably two) of which uses both eyes. I'll let you do the googling.


I don't need to 'google' what closing one eye does; that's the whole point of my comment. You claim it's unreasonable when people use their eyes for basic information. I say that having to google what your own eyes tell you is what's unreasonable.


Does the video have interlacing problems? How does that happen?


How long until technology such as this is used for automatic "crime" detection with surveillance cameras?

Not looking forward to having that in the West, too. This stuff is the wet dream of every authoritarian looking to effectively control a populace.

We really ought to amend our constitutions, adding protections against this, while we still can.



