Full disclosure, I work at Foxglove right now. Before joining, I spent over seven years consulting and had more than 50 clients during that period. Here are some thoughts:
* Combing through the syslogs to find issues is an absolute nightmare, even more so if you are told that the machine broke at some point last night
* Even if you find the error, it's not necessarily when something broke; it could have happened way before, but you just discovered it because the system hit a state that required it
* If combing through syslog is hard, try rummaging through multiple mcap files by hand to see where a fault happened
* The hardware failing silently is a big PITA - this is especially true for things that read analog signals (think PLCs)
Many of the above issues can be solved with the right architecture or tooling, but often the teams I joined didn't have it, and lacked the capacity to develop it.
At Foxglove, we make it easy to aggregate and visualize the data and have some helper features (e.g., events, data loaders) that can speed up workflows. However, I would say that having good architecture, procedures, and an aligned team goes a long way in smoothing out troubleshooting, regardless of the tools.
This is super insightful, thank you for laying it out so clearly. Your point about the error surfacing way after it first occurred is exactly the sort of issue we’re interested in tackling. Foxglove is doing a great job with visualization and aggregation; what we’re thinking is more of a complementary diagnostic layer that:
• Correlates syslogs with mcap/bag file anomalies automatically
• Flags when a hardware failure might have begun (not just when it manifests)
• Surfaces probable root causes instead of leaving teams to manually chase timestamps
From your experience across 50+ clients, which do you think is the bigger timesink: data triage across multiple logs/files or interpreting what the signals actually mean once you’ve found them?
Our current thinking is to focus heavily on automating triage across syslogs and bag/mcap files, since that’s where the hours really get burned, even for experienced folks. For interpretation, we see it more as an assistive layer (e.g., surfacing “likely causes” or linking to past incidents), rather than trying to replace domain expertise.
Do you think there are specific triage workflows where even a small automation (say, correlating error timestamps across syslog and bag files) would save meaningful time?
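To make the triage idea concrete, here's a minimal sketch of what that syslog/bag correlation could look like. It assumes you've already extracted error lines from syslog and a sorted list of per-topic anomaly timestamps from an mcap/bag file; the data shapes and helper names are illustrative, not any existing tool's API:

```python
import bisect
from datetime import datetime, timezone

def parse_syslog_epoch(line, year=2024):
    # Classic syslog lines ("Jan 15 03:42:17 host proc: msg") omit the year,
    # so one has to be assumed; RFC 5424 syslog carries a full timestamp.
    stamp = datetime.strptime(line[:15], "%b %d %H:%M:%S")
    return stamp.replace(year=year, tzinfo=timezone.utc).timestamp()

def correlate(syslog_errors, bag_anomalies, window_s=2.0):
    """Pair each syslog error with bag anomalies within +/- window_s seconds.

    syslog_errors: list of (epoch_seconds, message)
    bag_anomalies: time-sorted list of (epoch_seconds, topic, description)
    """
    times = [t for t, _, _ in bag_anomalies]
    pairs = []
    for err_t, msg in syslog_errors:
        lo = bisect.bisect_left(times, err_t - window_s)
        hi = bisect.bisect_right(times, err_t + window_s)
        for t, topic, desc in bag_anomalies[lo:hi]:
            pairs.append((err_t, msg, t, topic, desc))
    return pairs
```

Even something this simple turns "grep two files and eyeball timestamps" into a ranked list of candidate (error, anomaly) pairs, which is the part that burns hours when done by hand.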
One area that comes to mind is checking the timestamps across sensors and other topics. Two cases in particular:
* I was setting up an Ouster lidar to use GPS time; I don't remember the details now, but it was reporting time ~32 seconds in the past (probably some leap second setting?)
* I had a ROS node misbehaving in some weird ways - it turned out there was a service call inserting something into a DB, and for some reason the DB started taking 5+ minutes to complete, which really wasn't appropriate for a blocking call
I think the timing is one thing that needs to be consistently done right on every platform. The other issues I came across were very application specific.
I have 11 years of experience making software for robots. For the past 7 years, I've been working as a consultant on various projects. I have experience building software for hardware platforms, from autonomous mobile robots through drones to heavy-metal industrial robots. These days, I have a slight preference for consulting projects but can consider a full-time position if there is a good match.
Anyone have any stories about companies overusing AI? I've had some very frustrating encounters already, where non-technical people tried to help by sending an AI-generated solution to the issue that made no sense at all. I liked how the researchers in this work [1] propose calling LLM output "Frankfurtian BS". I think it's very fitting.
I'm a robotics software developer with experience leading teams and integrating software on various platforms (custom industrial robots in smelters, UAVs, rovers, USVs). I have experience across the robotics full stack, from low-level interfaces for actuators, sensors, etc., to deploying navigation and behaviour trees. I'm mostly experienced with ROS, but did a fair bit of work with the PX4 autopilot. My superpower is putting things together and diagnosing difficult issues.
I’ve been consulting in robotics for close to 7 years now; robotics being a niche field, I guess it has been a bit easier for me to find clients. I never had a period without a client and never had to do cold outreach.
Here are some things I learned:
* You can usually charge way more on a per-project basis than by the hour, but it’s crazy difficult to price R&D projects right
* Most of my clients came to me through my blog
* 2 years ago UpWork was surprisingly good for finding projects; these days there’s lots of AI stuff coming from freelancers when they apply, and I’m not so sure anymore
* Discuss price as early as possible to not waste time
"2 years ago UpWork was surprisingly good for finding projects, these days there is lots of AI stuff coming from freelancers when they apply and I’m not so sure anymore "
I get one good client on Upwork a year. 99% of the clients I've found are either unwilling to pay market rates, are subcontracting out before they've actually landed the project (lots of initial meetings and then suddenly they didn't get the project, so the final meeting fell through), or are trying to build up a team first, with no clients (I've had 2 of these already this year).
They canned the up! - there was not enough money for them to make in that market.
I currently have a 2017 Polo, and there is nothing below it in their model lineup. When I look at the replacement Polo, it is bigger and more expensive, and now the size of a Golf.
Either people are getting bigger, or the concept of an entry level affordable car escapes car manufacturers.
It is as you wrote: there's very little profit in small city cars. They (and others) decided to not service that market anymore and focus on segments that have higher profit margins.
Electrification compounds this issue: customers want range, which means kilos of battery, which further cuts into the profit on those small models.
I also think that at 20k the up! just wasn't a good car, which was probably one of the reasons it didn't get a new model.
>Seems like they are walking in circles.
Why? They are gradually releasing EVs for different segments of the market. The e-up! and e-Golf were very clearly meant to test designing, developing, and building EVs.
One thing I learned really quickly renting in Europe is to take pictures of the rental before driving off. Last time I rented a car there was a huge scratch on the side that wasn’t included in their pickup form. When I returned the car they were ready to bill me for it until I showed the picture and their response was: “OK, no worries then, all is good”. From the reviews I’ve been reading people often get charged this way and I’m wondering how many people end up paying for the same damage.
I just had this in Hawaii (Big Island). The Dollar car rental dude checked my car when I returned it and decided some scratch on the fender was new (I'm pretty sure it was not). I had full insurance, so I told him he could figure it out with the insurance company, that I was in a rush to catch my flight, and he could do whatever he wanted. They didn't do anything. They're just looking for some sucker they can charge extra on some random thing.
I've had a lot of rental car experiences. Usually it's fine. Taking photos sounds like a good idea.
Hey!
I love Nix, and I've been using it as my daily driver for more than 1 year.
There are a lot of people putting a lot of energy into documenting and explaining, but the current recommendation is _suffer_.
For Docker, you could start with this HN thread [1]; for NixOS and flakes, there is a video series and git repo [2] I used at the beginning, which I liked.
I wanted something a bit more 'complete', so if you'd like, you can read through my nix repo [3].
I have built Poetry and Go applications (that I'll push to nixpkgs at some point) and GCP images for VMs, so if you need a reference for that, just ask :)
I maintain a list of interesting robotics projects (https://github.com/msadowski/awesome-weekly-robotics). You will find a simulator section there but whether they are useful for you will depend on the details of your project. I don’t think it’s there but you could also check out MuJoCo.