I totally agree with you that this tech needs to get better, but I really want to see an apples-to-apples comparison. I would expect Tesla to also stop if a child was running across the movement path in broad daylight.
The night example looks to be specifically stacked against autopilot. Tesla vision is notoriously bad at detecting stationary objects, and it needs a lot of light to function well. Lidar/radar are significantly better than cameras at detecting straight-ahead obstacles in low-light conditions. I would really like to hear Tesla defend their decision not to use them.
In any case, this testing is great because it lets us know when the autopilot requires extra supervision.
> but I really want to see an apples-to-apples comparison.
EDIT: Luminar's car is in the other lane, and there's also a balloon-child in Luminar's lane. You can see Luminar's car clearly stop in the head-to-head test.
There's also the "advanced" test here, where the kid moves out from behind an obstacle. Luminar's tech does well:
This "tech" can't even see a firetruck in broad daylight. Why do you think it can see a child?
This isn't a one-off freak accident either. "crashing into stopped emergency vehicles with flashing lights in broad daylight" is common enough that NHTSA has opened up an investigation into this rather specific effect: https://static.nhtsa.gov/odi/inv/2021/INOA-PE21020-1893.PDF
I'm in Sweden, and the sun shining directly into your eyes from barely above the horizon, while the road is wet or covered with snow and reflects that sun at you, is a regular occurrence during the winter months. I doubt Tesla's cameras will be able to see anything.
This is why a single camera is not capable of being the sole source of information for a self-driving system. The technology currently available for camera systems does not capture a high enough dynamic range to be able to see details in darkness when the Sun is in frame. You could use multiple cameras, each with a different sensitivity to light, and combine them, but it's going to be very difficult.
I really don't see what's difficult. You don't even need multiple cameras, you can simply use very short exposures and combine short exposure shots into a longer exposure one when needed. Multiple cameras are useful to handle glare though.
Why would it be very difficult? You can split the same light beam after the lens and send it to two cameras with different diaphragm settings or sensitivities. You'd then synthesize a perfectly aligned HDR picture.
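For what it's worth, here's a minimal sketch of the kind of merge I have in mind (Python; it assumes two already-aligned linear frames and a known exposure ratio, and all names and numbers are made up for illustration, not any real camera pipeline):

    import numpy as np

    def merge_exposures(short_frame, long_frame, ratio, sat=0.95):
        # Both frames are linear sensor values normalised to [0, 1];
        # `ratio` = long exposure time / short exposure time.
        # Assumes the frames are already aligned (true for a split beam,
        # approximately true for back-to-back short exposures).
        short_frame = short_frame.astype(np.float64)
        long_frame = long_frame.astype(np.float64)
        # Bring the short exposure onto the long exposure's radiance scale.
        short_scaled = short_frame * ratio
        # Where the long exposure is blown out, trust the scaled short one.
        return np.where(long_frame >= sat, short_scaled, long_frame)

    # e.g. merged = merge_exposures(frame_1ms, frame_8ms, ratio=8.0)

You lose some precision in the shadows compared to a true long exposure, but you recover the highlights around the Sun, which is the failure mode being discussed.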
It's because Tesla cars are regularly causing "phantom braking" events.
Tesla is trapped between a rock and a hard place. Their "phantom braking" events are causing a lot of dismay to their drivers (https://electrek.co/2021/11/15/tesla-serious-phantom-braking...). But if they reduce phantom-braking, they increase the chance of hitting that child on the road.
Elon claims that the radar was the primary source of phantom braking. He said that matching up a high fidelity sensor (the cameras) with a lower fidelity sensor (the radar) was proving near impossible. I also suspect the supply chain pains massively factored into his decision to remove the radar from all vehicles since roughly late January of last year.
Anyone in the car industry would know this is obviously false. Radar-based emergency braking has been available, and working really well, in many cars for 5+ years.
Radar was removed in May 2021, which predates the article I quoted by multiple months.
I'm sure Elon was blaming Radar for phantom braking in the April / May time period. We can give a few months for the cars to update to the newest version as well.
But by November 2021, RADAR was no longer a relevant excuse. I think you may be mistaken about what Elon said and when. You gotta keep the dates in mind.
Respectfully, you're incorrect on the date of the Tesla Vision-only hardware release. My wife got a Model Y in early Feb 2021 and it was in the first batch of Tesla Vision vehicles that did not ship with a radar. It was manufactured in January, as that's when we got the VIN. This is first-hand experience, not hearsay. Elon announced it after they'd been shipping those vehicles for a bit. I was both amused and surprised. She was pissed off that Autopilot was nerfed for max speed compared to my 2018 Model 3, as they were working out bugs in the Tesla Vision branch of the code.
I also never gave a date for when Elon said those things in my comment, but I now understand what you mean about post-Vision. But the FSD Beta and Autopilot codebases are so different that I'm not sure I'd compare them for phantom braking (though recent FSD Beta appears to have way less of this occurrence).
But maybe I’m biased. We have two Teslas, one with, and one without a radar. We’ve seen much more phantom braking with my radar equipped model 3. Anecdotally, I find it happening less in the Y. Also, I didn’t click the article originally as Fred is a click diva and generally disliked by the Tesla community for his questionable reporting. Electrek is an EV fan blog, not much else.
WashPo reports a huge spike of federal complaints from Tesla owners starting in Oct 2021, well into the Vision-only era of Tesla's technology.
These are some pretty respectable sources. Federal complaints are public.
> “We primarily drove the car on two-lane highways, which is where the issues would show themselves consistently,” he said in an email. “Although my 2017 Model X has phantom braked before, it is very rare, the vision-based system released May 2021 is night and day. We were seeing this behavior every day.”
So we have Electrek, Washington Post, and the official NHTSA Federal registry in agreement over these phantom braking events spiking in October / November timeframe of 2021. I don't think this is an issue you can brush off with anecdotal evidence or anti-website kind of logic.
That's totally fair. I'm not pretending it isn't a problem. Phantom braking is scary as hell when you're on the highway. I misread your comment on the date and thought that was the thing you were really focused on, when it wasn't. You're right. This is a serious problem.
Tesla partnered with Luminar by the way and even tested their LiDAR on a model 3 last year. I guess they weren't impressed though, since they seem to still be all-in on passive optical recognition.
> I guess they weren't impressed though, since they seem to still be all-in on passive optical recognition.
That's one take - the other take is that they have been selling cars while claiming they are capable of FSD without Lidar, and have been selling FSD as a $5k bolt-on, so swapping to Lidar at this point would be a PR nightmare even if it were a better solution....
That's the cynical view though... (Although I also wouldn't want to be the one to tell the people that have spent lots of money on Autopilot that they have bought total vaporware - or be the CFO that announces they are retrofitting Lidar sensors). Once you are all-in on 'lidar is shit' it makes it hard to reverse the trend, despite rapidly falling costs.
>Once you are all-in on 'lidar is shit' it makes it hard to reverse the trend
It can be done, if there's good cause. Just partner with your lidar oem of choice, get them to do a white paper about how the latest point increase version of hardware or firmware is "revolutionary!" and then claim that your earlier criticisms of lidar have been fully addressed by the groundbreaking new lidar tech.
I've actually been suspecting this will happen once solid state LIDAR technology crosses a certain threshold.
Traditional old-school LIDAR units with spinning scan heads are why quite a few self-driving cars have those odd bumps and protrusions on them. It's very easy to see someone who wants to make a "cool car" looking at those protrusions, deciding "lidar is shit", and doing everything possible to avoid it. There are some good engineering reasons to avoid traditional lidar units.

Meanwhile, solid state LIDAR tech has only been on the market for a few years and is still quite expensive compared to traditional LIDAR models, but it's definitely superior in a lot of places where people want to use LIDAR, or where LIDAR would be an excellent competitor to technology currently in use, such as 3D depth mapping and Time of Flight cameras. I briefly looked into some of this stuff when considering work on an "art game" using VR and various 3D scanning technologies in order to make a "fake" Augmented Reality experience as part of the project's deliberate aesthetic choices.
Solid state LIDAR will definitely be pushed forward by market demand for wider fields of view, lower costs, and smaller module sizes. All of which will eventually lead to a situation where it would be stupid not to augment the self-driving tech with it, given the massive benefits with essentially zero downsides.
One way out of the LIDAR PR dead end for Tesla would be to:
1.) When solid state LIDAR is ready, re-brand it as something like SSL technology (Solid State LIDAR) and put it on new high-end Teslas.
2.) Wait for all 'camera only' enabled Teslas with FSD beta to age out of service and upsell the owners on a heavily discounted FSD subscription for their brand new Teslas with SSL.
A third path would be to frame the addition of solid state LiDAR as purely an enhancement to their existing cameras, framing it as a camera upgrade instead of a new separate sensor.
That's straight out of Apple's playbook. I recall how Tim Apple ridiculed OLED displays, until they became impossible to ignore. So I guess it can be done.
> The accusations could be valid or totally baseless
Read the listed report. All 11 accidents were confirmed to be:
1. Tesla vehicles
2. Confirmed to be on autopilot / full self driving.
3. Against a stopped emergency vehicle with flashing lights or road flares.
These facts are not in dispute. The accusations aren't "baseless", the only question remaining is "how widespread" is this phenomenon.
These 11 accidents have resulted in 1 fatality and 11 injuries.
--------
We are _WAY_ past the "validity" of the claims. We're at "let's set up demos at CES to market ourselves using Tesla as a comparison point", because Tesla is provably that unreliable at stopping in these conditions.
I'm fine doing away with Uber's self-driving as well. Although I think Tesla's is the worst of the lot, I'm not confident in or thrilled by any self-driving tech on public roads in the next decade.
The exact situation where "uber self driving" killed a pedestrian was: the driver was literally watching a movie at her job, while she was supposed to be driving a car and training a self driving system.
Sure, but this was supposed to be fully autonomous. Nobody is arguing the human didn’t make a mistake. The autonomous system, however, definitely also did.
This may be technically true (I actually don't know what the driver's full intended purpose at the time was), but it doesn't negate some extremely sketchy software practices on a safety-critical system, like "action suppression" to avoid nuisance braking.
As in most accidents of this nature, there is a chain of mistakes. It's bad practice to ignore some mistakes simply because we can also point to other failures in the chain of events.
Volvo's emergency braking system, which detected it and would have braked in time, had been restricted by Uber to not be able to take any action.
Uber's system was set up so that an "unidentified object in my way" didn't trigger an immediate slow-down, but instead a "sleep it off for a second and check again". It saw the impact coming, and instead of notifying the driver it decided to wait a full second first, because it was programmed to do so - which any programmer can recognize as the "turn it off and on again" idiom that tells us their system was misidentifying lots of things.
What the driver did or did not do once notified doesn't change that. That car would have killed someone, somewhere, sometime, because it was programmed to not avoid it.
Wasn't this not a pedestrian, but a cyclist crossing in a completely inappropriate place? Granted an SDC should still react to this while many humans in the same situation would not.
Pedestrian slowly walking a bike, with basically no reflective clothing, on a dark night. This is exactly how humans kill pedestrians with cars all the time.
Sure, but that's also why that specific car was equipped with radar-based obstacle detection... which the company specifically disabled. There's a very good chance that this system would have saved that person's life. Also, while yes, humans are crap at this, it's very rare that you'd just plow into someone at full speed without even attempting to slow down or swerve - which is exactly what the car did.
Tesla didn't use LIDAR because it is more expensive [0]. Quoting Musk:
> Anyone relying on LIDAR is doomed. Doomed. Expensive sensors that are unnecessary. It’s like having a whole bunch of expensive appendices... you’ll see.
Cost is not the only point he was making. The problem you need to solve is not just “Is there something?”, but also “What is it? And where is it going to move?”. LIDAR cannot do that. Or at least if you get LIDAR to do that, then you would have also been able to get it done with a camera, in which case you wouldn’t have needed LIDAR in the first place.
LIDAR certainly is the low-hanging fruit when it comes to the former question though (i.e. what is there in my path right now).
One question I've always had about Tesla's sensor approach: why not use binocular forward facing vision? Seems like it would be a simple and cheap way to get reliable depth maps, which might help performance in the situations which currently challenge the ML. Detecting whether a stationary object (emergency vehicle or child or whatever) is part of the background would be a lot easier with an accurate depth map, or so it seems to me.
Plus using the same cameras would help prevent the issues with sensor fusion of the radar described by Tesla due to the low resolution of the radar.
I know the b-pillar cameras exist, but I don't think their FOV covers the entire forward view, and I don't think they have the same resolution as the main forward cameras (partly due to wide FOV).
Sure, but they're not getting that 3D map from binocular vision. The forward camera sensors are within a few mm of each other and have different focal lengths.
And the tweet thread you linked confirms it's a ML depth map:
> Well, the cars actually have a depth perceiving net inside indeed.
My speculation was that a binocular system might be less prone to error than the current net.
Sure. You're suggesting that Tesla could get depth perception by placing two identical cameras several inches apart from each other, with an overlapping field of view.
I'm just wondering if using cameras that are close to each other, but use different focal lengths, doesn't give the same results.
It seems to me that this is how modern phones are doing background removal: The lenses are very close to each other, very unlike the human eye. But they have different focal lengths, so depth can be estimated based on the diff between the images caused by the different focal lengths.
Also, wouldn't turning a multitude of views into a 3D map require a neural net anyway?
Whether the images differ because of different focal lengths or because of different positions seems to be essentially the same training task. In both cases, the model needs to learn "This difference in those two images means this depth".
I think with the human eye, we do the same thing. That's why some optical illusions work that confuse your perception of which objects are in front and which are in the back.
And those illusions work even though humans actually have an advantage over cheap fixed-focus cameras, in that focusing the lens on the object itself gives an indication of the object's distance. Much like you could use a DSLR as a measuring device by focusing on the object and then checking the distance markers on the lens's focus ring. Tesla doesn't have that advantage. They have to compare two "flat" images.
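For the curious, that DSLR trick is just the thin-lens equation rearranged; a toy version (Python, with made-up numbers, nothing to do with Tesla's hardware):

    # Thin lens: 1/f = 1/d_object + 1/d_image, so once the lens is focused,
    # the lens-to-sensor distance d_image pins down the object distance.
    def object_distance_m(focal_length_mm, image_distance_mm):
        d = 1.0 / (1.0 / focal_length_mm - 1.0 / image_distance_mm)
        return d / 1000.0  # mm -> metres

    print(object_distance_m(50.0, 51.0))   # ~2.55 m
    print(object_distance_m(50.0, 50.5))   # ~5.05 m

That extra cue is exactly what a fixed-focus camera gives up.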
> I'm just wondering if using cameras that are close to each other, but use different focal lengths, doesn't give the same results
I can see why it might seem that way intuitively, but different focal lengths won't give any additional information about depth, just the potential for more detail. If no other parameters change, an increase in focal length is effectively the same as just cropping in from a wider FOV. Other things like depth of field will only change if e.g. the distance between the subject and camera are changed as well.
The additional depth information provided by binocular vision comes from parallax [0].
> Also, wouldn't turning a multitude of views into a 3D map require a neural net anyway?
Not necessarily, you can just use geometry [1]. Stereo vision algorithms have been around since the 80s or earlier [2]. That said, machine learning also works and is probably much faster. Either way the results should in theory be superior to monocular depth perception through ML, since additional information is being provided.
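As a rough illustration of the geometry involved (Python; the focal length and baseline below are invented, not Tesla's actual camera parameters):

    # Classic pinhole stereo: depth = focal_length * baseline / disparity.
    focal_length_px = 1400.0   # focal length expressed in pixels
    baseline_m = 0.10          # spacing between the two cameras, metres

    def depth_from_disparity(disparity_px):
        # Depth of a point whose images are `disparity_px` pixels apart.
        return focal_length_px * baseline_m / disparity_px

    for d_px in (70.0, 7.0, 1.4):
        print(f"{d_px:5.1f} px disparity -> {depth_from_disparity(d_px):6.1f} m")
    # 70 px -> 2 m, 7 px -> 20 m, 1.4 px -> 100 m

Disparity shrinks quickly with range, which is why a wider baseline (further-apart cameras) helps, and why a few millimetres of separation buys you almost nothing.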
> It seems to me that this is how modern phones are doing background removal: The lenses are very close to each other, very unlike the human eye. But they have different focal lengths, so depth can be estimated based on the diff between the images caused by the different focal lengths.
Like I said, there isn't any difference when changing focal length other than 'zooming'. There's no further depth information to get, except for a tiny parallax difference I suppose.
Emulation of background blur can certainly be done with just one camera through ML, and I assume this is the standard way of doing things although implementations probably vary. Some phones also use time-of-flight sensors, and Google uses a specialised kind of AF photosite to assist their single sensor -- again, taking advantage of parallax [3]. Unfortunately I don't think the Tesla sensors have any such PDAF pixels.
This is also why portrait modes often get small things wrong, and don't blur certain objects (e.g. hair) properly. Obviously such mistakes are acceptable in a phone camera, less so in an autonomous car.
> And those illusions work even though humans actually have an advantage over cheap fixed-focus cameras, in that focusing the lens on the object itself gives an indication of the object's distance
If you're referring to differences in depth of field when comparing a near vs far focus plane, yeah that information certainly can be used to aid depth perception. Panasonic does this with their DFD (depth-from-defocus) system [4]. As you say though, not practical for Tesla cameras.
> different focal lengths won't give any additional information about depth, just the potential for more detail.
This is also why some people will optimize each eye for a different focal distance when getting laser eye surgery. When your lens is too stiff from age, it won't provide any additional depth perception, but it will give you more detail at different distances.
Wow. Ok. I did not know that. I thought that there is depth information embedded in the diff between the images taken at different focal lengths.
I'm still wondering. As a photographer, you learn that you always want to use a focal length of 50mm+ for portraits. Otherwise, the face will look distorted. And even a non-photographer can often intuitively tell a professional photo from an iPhone selfie. The wider angle of the iPhone selfie lens changes the geometry of the face. It is very subtle. But if you took both images and overlayed them, you see that there are differences.
But, of course, I'm overlooking something here. Because if you take the same portrait at 50mm and with, say, 20mm, it's not just the focal length of the camera that differs. What also differs is the position of each camera. The 50mm camera will be positioned further away from the subject, whereas the 20mm camera has to be positioned much closer to achieve the same "shot".
So while there are differences in the geometry of the picture, these are there not because of the difference in the lenses being used, but because of the difference in the camera-subject distance.
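A toy version of that argument (Python, with made-up face geometry): projected size goes as 1/distance, so the nose-to-ear size ratio depends only on how far away the camera is, not on the lens.

    # Assume the nose tip sits 0.10 m closer to the camera than the ears.
    def nose_vs_ear_magnification(camera_to_ears_m, nose_offset_m=0.10):
        camera_to_nose_m = camera_to_ears_m - nose_offset_m
        return camera_to_ears_m / camera_to_nose_m

    print(nose_vs_ear_magnification(0.5))  # ~1.25: selfie distance, big nose
    print(nose_vs_ear_magnification(2.5))  # ~1.04: portrait distance, natural look

Same face, same sensor; only the camera-subject distance changed.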
So now I'm wondering, too, why Tesla decided against stereo vision.
It does seem, though, that they are getting that depth information through other means:
Perhaps it helps that the vehicle moves? That is, after all, very close to having the same scene photographed by cameras positioned at different distances. Only that Tesla uses the same camera, but has it moving.
Also, among the front-facing cameras, the two outermost are at least a few centimeters apart. I haven't measured it, but it looks like a distance not unlike between a human's eyes [0]. Maybe that's already enough?
> But, of course, I'm overlooking something here. Because if you take the same portrait at 50mm and with, say, 20mm, it's not just the focal length of the camera that differs. What also differs is the position of each camera. The 50mm camera will be positioned further away from the subject, whereas the 20mm camera has to be positioned much closer to achieve the same "shot".
Yep, totally.
> Perhaps it helps that the vehicle moves? That is, after all, very close to having the same scene photographed by cameras positioned at different distances.
I think you're right, they must be taking advantage of this to get the kind of results they are getting. That point cloud footage is impressive, it's hard to imagine getting that kind of detail and accuracy just from individual 2d stills.
Maybe this also gives some insight into the situations where the system seems to struggle. When moving forward in a straight line, objects in the peripheral will shift noticeably in relative size, position and orientation within the frame, whereas objects directly in front will only change in size, not position or orientation. You can see this effect just by moving your head back and forth.
So it might be that the net has less information to go on when considering stationary objects directly in, or slightly adjacent to, the vehicle's path -- which seems to be one of the scenarios where it makes mistakes in the real world, e.g. with stationary emergency vehicles. I'm just speculating here though.
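A back-of-the-envelope way to see why (Python, my own numbers, purely illustrative): for a stationary object dead ahead, the only cue forward motion gives you is how fast it grows in the image, and that signal gets tiny at range.

    # Driving `travel_m` metres toward an object at `distance_m` scales its
    # apparent size by distance / (distance - travel).
    def growth_per_metre(distance_m, travel_m=1.0):
        return (distance_m / (distance_m - travel_m) - 1.0) * 100.0

    for z in (10.0, 50.0, 150.0):
        print(f"{z:5.0f} m away -> grows {growth_per_metre(z):.2f}% per metre driven")
    # 10 m: ~11%, 50 m: ~2%, 150 m: ~0.7%

At highway ranges that growth is a fraction of a pixel per frame, whereas objects off to the side also shift position and orientation, which is a much stronger cue.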
> Also, among the front-facing cameras, the two outermost are at least a few centimeters apart. I haven't measured it, but it looks like a distance not unlike between a human's eyes [0]. Maybe that's already enough?
Maybe. The distance between the cameras is pretty small from memory, less than in human eyes I would say. It would also only work over a smaller section of the forward view due to the difference in focal length between the cams. I can't help but think that if they really wanted to take advantage of binocular vision, they would have used more optimal hardware. So I guess that implies that the engineers are confident that what they have should be sufficient, one way or another.
Because Tesla have demonstrated that it's unnecessary. The depth information they are getting from the forward-facing camera is exceptional. Their vision stack now produces depth information that is dramatically superior to that from a forward-facing radar.
(It's also worth noting that depth information can be validated when the vehicle is in motion, because a camera in motion has the ability to see the scene from multiple angles, just like a binocular configuration. This is how Tesla trains the neural networks to determine depth from the camera data.)
It makes intuitive sense, since you can, say, play video games with one eye closed. Yes, you lose field of view. Yes, you lose some depth perception. But you don't need to be able to touch your fingertips together, and all your ability to make predictive choices and scan for things in your one-eyed field of view remains intact.
In fact, we already have things with remote human pilots.
So increasing the field of view with a single camera should intuitively work as long as the brains of the operation was up to the task.
What I was talking about largely doesn't apply to the Autopilot legacy stack currently deployed to most Tesla cars.
Personally I wish Tesla would spend a couple of months cleaning up their current beta stack and deploying it specifically for AEB. But I don’t know if that’s even feasible without affecting the legacy stack.
> Their vision stack now produces depth information that is dramatically superior to that from a forward-facing radar.
RADAR is lower fidelity though: blocky, slow, and it doesn't handle changes in direction or dimension very well. RADAR isn't as good as humans at depth. The only benefit of RADAR is that it works well in weather/at night and at near range, as it is slower to bounce back than lasers. I assume the manholes and bridges that confuse RADAR are due to the low-fidelity / blocky feedback.
LiDAR is very high fidelity and probably more precise than the pixels. LiDAR is better than humans at depth and at distance. LiDAR isn't as good in weather; neither is computer vision. Great for 30m-200m: precise depth, dimension, direction, and size of an object, in motion or stationary.
See the image at the top of this page and the overview on it [1].
> High-end LiDAR sensors can identify the details of a few centimeters at more than 100 meters. For example, Waymo's LiDAR system not only detects pedestrians but it can also tell which direction they’re facing. Thus, the autonomous vehicle can accurately predict where the pedestrian will walk. The high-level of accuracy also allows it to see details such as a cyclist waving to let you pass, two football fields away while driving at full speed with incredible accuracy.
> Because Tesla have demonstrated that it's unnecessary. The depth information they are getting from the forward-facing camera is exceptional.
Sure! Here's a Tesla using its exceptional cameras to decide to drive into a couple of trucks. For some strange reason the wretched human at the wheel disagreed with the faultless Tesla:
That was an issue with the path planner, not depth perception, as demonstrated by the visualisation on screen. The challenge of path planning is underrated, and it's not a challenge that gets materially easier with the addition of LIDAR or HD maps. At best it allows you to replace one set of boneheaded errors with another set of boneheaded errors.
No! It was an issue with the trucks! They shouldn't have been in the way in the first place! Don't they know a Tesla is driving through? They mustn't have been able to see it since they lack exceptional cameras.
Because the software running in release mode is a much, much older legacy stack. (Do we know if the vehicle being tested was equipped with radar or vision only?)
But AI and ML aren't as good as a human brain, or maybe any brain. I imagine the gap has to be closed with better and multiple sensors, or by making fundamental leaps in computing technology.
I've never understood their reasoning. It sounds like a Most Interesting Man in the World commercial: "I don't always tackle the hardest AI problems known to mankind, but when I do, I tie one hand behind my back by not fusing data from every possible sensor I can find in the DigiKey catalog."
IR lidar would be pretty useful in rain and fog, I'd think. But I'd rather have all three -- lidar, radar, and visual. Hell, throw in ultrasonic sonar too. That's what Kalman filters are for. Maybe then the system will notice that it's about to ram a fire truck.
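To make that concrete, here's a minimal sketch of the variance-weighted fusion at the heart of a Kalman measurement update, stripped of the dynamics (Python; the sensor noise figures are invented for illustration):

    # Fuse independent range estimates, each weighted by 1 / variance.
    def fuse(estimates):
        # estimates: list of (range_m, std_dev_m) pairs
        weights = [1.0 / (sigma ** 2) for _, sigma in estimates]
        total = sum(weights)
        fused_range = sum(w * r for w, (r, _) in zip(weights, estimates)) / total
        fused_sigma = (1.0 / total) ** 0.5
        return fused_range, fused_sigma

    readings = [(120.0, 30.0),   # camera: misses the stopped truck, very noisy
                (42.0, 2.0),     # radar
                (41.5, 0.3)]     # lidar
    print(fuse(readings))        # ~ (41.5 m, 0.3 m): the confident sensors dominate

Even this crude version fails in the safer direction: the worst case is braking for nothing rather than sailing into a fire truck.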
The puzzle piece you are missing is that sensor fusion is not an easy problem either. The Tesla perspective is that adding N different sensors into the mix means you now have N*M problems instead of M.
I hope that's not their perspective, because that perspective would be wrong. There are entire subdisciplines of control theory devoted to sensor fusion, and it's not particularly new. Rule 1: More information is better. Rule 2: If the information is unreliable (and what information isn't?), see rule 1.
Some potential improvements are relatively trivial, even without getting into the hardcore linear algebra. If the camera doesn't see an obstacle but both radar and lidar do, that's an opportunity to fail relatively safely (potential false braking) rather than failing in a way that causes horrific crashes.
Bottom line: if you can't do sensor fusion, you literally have no business working on leading-edge AI/ML applications.
Every Pro iPhone has one. So it already got pretty cheap by now. Looking at Mercedes' Level 3 Autopilot tech you can also see how well you can integrate the sensors into the front of a car.
At the time of comment, a LiDAR rig would cost around $10,000. A few years before that, they were more like $100,000. Presumably the cameras are much cheaper.
I would be willing to bet that production efficiencies will be found that will eventually drive that cost down significantly.
To be fair, it's not a computer performance benchmark being gamed here. If nighttime is problematic, autopilot shouldn't be running at night, because if I'm a pedestrian, the odds are stacked against me in any physical encounter with a vehicle. Fairness in the scenario setup shouldn't really be part of the conversation unless the scenario goes beyond the claims of the manufacturer, i.e., if Tesla had said "autopilot does not function in these conditions and should not be used at those times" and then "nighttime" was one of those conditions listed. If Tesla hasn't said that a scenario is outside the scope of Autopilot, then the scenario is an appropriate test & comparison point.
> if Tesla had said "autopilot does not function in these conditions and should not be used at those times" and then "nighttime" was one of those conditions listed. If Tesla hasn't said that a scenario is outside the scope of Autopilot, then the scenario is an appropriate test & comparison point.
I'd go further and say add "and set the software not to engage this feature at nighttime". Simple disclaimers are not enough when lives are at stake.
I'd never trust a "self-driving" car without lidar. It should be a requirement. There's tons of research on how easy it is to completely fool neural nets with images.
> The night example looks to be specifically stacked against autopilot.
I don't think so. If the autopilot can't be used at night, I - who live in Norway - just can't use it during the winter as there isn't enough light. I don't even live above the arctic circle and am lucky enough to get 4-5 hours of (somewhat dimmed) daylight during the darkest times.
If it doesn't do night, it is simply a gimmick to lure you into paying for a promise of a car.