Agree that “data realism” is the quiet differentiator in mature visual generation domains.
Floor plans / technical drawings feel a lot less mature though — we don’t really have generators that are “good” in the sense that they preserve the constraints that matter (scale, closure, topology, entrances, unit stats, cross-floor consistency, etc.). A lot of outputs can look plausible but fall apart the moment you treat them as geometry for downstream tasks.
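To make "falls apart as geometry" concrete, here's a minimal sanity-check sketch (Python + shapely; the room outline and the area tolerance are made up for illustration):

```python
# A minimal sketch of the kind of geometric sanity check that many
# generated plans fail. The input outline and the 1e-3 tolerance are
# illustrative assumptions, not a standard.
from shapely.geometry import Polygon

def check_room_geometry(coords: list[tuple[float, float]]) -> list[str]:
    """Return a list of geometry problems for one room outline."""
    problems = []
    if len(coords) < 3:
        return ["degenerate: fewer than 3 vertices"]
    if coords[0] != coords[-1]:
        problems.append("ring not closed")
    poly = Polygon(coords)
    if not poly.is_valid:
        problems.append("self-intersecting or otherwise invalid polygon")
    if poly.area < 1e-3:  # square metres; assumes a scale-normalized plan
        problems.append("near-zero area")
    return problems

# A visually plausible room that silently self-intersects (a "bowtie"):
print(check_room_geometry([(0, 0), (4, 0), (0, 3), (4, 3), (0, 0)]))
```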
That’s why I’ve been pushing the idea that simplistic generators are kind of doomed without a context graph (spatial topology + semantics + building/unit/site constraints, ideally with environmental context). Otherwise you’re generating pretty pictures, not usable plans.
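Rough sketch of what I mean by a context graph (Python/networkx; the node/edge schema here is an illustrative assumption, not a proposed standard):

```python
# Rooms as nodes with semantics + stats, spatial topology as edges.
# Attribute names and the connectivity check are illustrative only.
import networkx as nx

G = nx.Graph()
G.add_node("kitchen", kind="room", area=12.5, floor=1, unit="A")
G.add_node("living", kind="room", area=24.0, floor=1, unit="A")
G.add_node("corridor", kind="circulation", area=6.0, floor=1, unit="A")
G.add_node("entrance", kind="entrance", floor=1, unit="A")
G.add_edge("entrance", "corridor", via="door")
G.add_edge("corridor", "kitchen", via="door")
G.add_edge("corridor", "living", via="opening")

# One constraint a raster image can't express: every room in a unit
# must be reachable from that unit's entrance.
unit_rooms = [n for n, d in G.nodes(data=True) if d.get("unit") == "A"]
reachable = nx.node_connected_component(G, "entrance")
assert all(r in reachable for r in unit_rooms), "unreachable room in unit A"
```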
Also: I’m a bit surprised how few researchers have used these datasets for basic EDA. Even before training anything, there’s a ton of value in just mapping distributions, correlations, biases, and failure modes. Feels like we’re skipping the “understand the data” step far too often.
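Even a pass as short as this surfaces a lot (pandas; the file and column names are hypothetical and will differ per dataset):

```python
# Minimal 'understand the data' pass: distributions, a bias probe,
# and a cheap failure-mode screen. Schema is assumed, not real.
import pandas as pd

rooms = pd.read_csv("rooms.csv")  # hypothetical export: one row per room

# Distributions: do room areas look physically plausible per type?
print(rooms["area_m2"].describe())
print(rooms.groupby("room_type")["area_m2"].median())

# Simple bias probe: are some room types over-represented per region?
print(pd.crosstab(rooms["region"], rooms["room_type"], normalize="index"))

# Failure-mode screen: rooms too small or too large to be real.
suspicious = rooms[(rooms["area_m2"] < 1.0) | (rooms["area_m2"] > 200.0)]
print(f"{len(suspicious)} suspicious rooms out of {len(rooms)}")
```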
Totally agree that for floor plans the bottleneck is usually label/geometry quality, not model architecture. We looked at CV early on, but real plan archives are a pretty adversarial input: ~100-year-old drawings mixed with modern exports, lots of drafting styles/implicit ontologies, low-res scans + distortion, and sometimes multiple conflicting “truths” for the same plan (revisions, partial updates, different sources). Even with decent models, you still pay heavily in expert cleanup.
So we optimized against the real baseline: manual CAD-style annotation. The “data-centric” work for us was making manual annotation cheap and auditable: limited ontology, a web editor that enforces structure (scale normalization, closed rooms, openings must attach to walls, etc.), plus hard QA gates against external numeric truth (client index / measured areas, room counts). Typical QA tolerance is ~3%; in Swiss Dwellings we report median area deviation <1.2% with a hard max of 5%. Once we could hit those bounds at <~1/10th the prevailing manual cost, CV stopped being a clear value add for this stage.
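The area gate is roughly this (thresholds are the ones quoted above; the function itself is a simplified sketch, not our production code):

```python
# Compare annotated areas against external numeric truth
# (client index / measured areas) and gate on relative deviation.
TYPICAL_TOL = 0.03  # flag for review above this (~3% typical tolerance)
HARD_MAX = 0.05     # reject outright above this (5% hard max)

def qa_area_gate(annotated_m2: float, measured_m2: float) -> str:
    dev = abs(annotated_m2 - measured_m2) / measured_m2
    if dev > HARD_MAX:
        return f"REJECT (deviation {dev:.1%})"
    if dev > TYPICAL_TOL:
        return f"REVIEW (deviation {dev:.1%})"
    return f"PASS (deviation {dev:.1%})"

print(qa_area_gate(annotated_m2=81.0, measured_m2=80.0))  # PASS (1.2%)
print(qa_area_gate(annotated_m2=84.5, measured_m2=80.0))  # REJECT (5.6%)
```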
On ambiguity (doors vs windows, stairs vs ramps): we try not to “guess” — we push it into constraints + consistency checks (attachment to walls, adjacency, unit connectivity, cross-floor consistency) and flag conflicts for review. On generalization: I don’t think this is zero-shot across styles; the goal is bounded adaptation (stable primitives + QA gates, small mapping/rules layer changes). Trade-off is less expressiveness, but for geometry-sensitive downstream tasks small errors compound fast.
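The "flag, don't guess" pattern for opening attachment looks roughly like this (shapely; the tolerance and the data shapes are illustrative assumptions):

```python
# Every door/window must lie on some wall segment within a tolerance;
# anything detached is routed to human review instead of auto-fixed.
from shapely.geometry import LineString, Point

ATTACH_TOL = 0.05  # metres; assumed value, tune per drawing quality

def flag_detached_openings(openings, walls):
    """openings: list of (id, Point); walls: list of LineString."""
    flagged = []
    for oid, pt in openings:
        if min(w.distance(pt) for w in walls) > ATTACH_TOL:
            flagged.append(oid)  # send to reviewer, don't guess
    return flagged

walls = [LineString([(0, 0), (5, 0)]), LineString([(5, 0), (5, 4)])]
openings = [("door_1", Point(2.5, 0.0)), ("window_7", Point(1.0, 1.0))]
print(flag_detached_openings(openings, walls))  # ['window_7']
```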
I am currently working on a paper in this field, focusing on the capitalisation of expertise (analogous to Marx) within the dynamics of the culture industry (Adorno, Horkheimer). It integrates the theories of Piketty and Luhmann. It is rather theoretical, with a focus on European theory (instead of Adorno you could theoretically also reference Chomsky). Is this something you would be interested in? I can share the link, of course.
Be careful, merely mentioning Marx, Chomsky or Piketty is a thoughtcrime in the new US. Many will shut down rather than engage with what you are saying.
Here are my 2¢:
Leverage platforms that offer mock technical interviews (e.g., Pramp, Interviewing.io, probably there are others too). This approach lets you simulate the interview experience in a risk-free environment, getting you accustomed to the format and the pressure. It’s crucial to receive feedback, and these platforms pair you with industry professionals who can provide just that. This method is effective because it targets your interview skills directly, allows for rapid iteration based on feedback, and builds your confidence in a more controlled setting than actual interviews.
Aside from just technical skills, these mock interviews can help you articulate your thought process clearly, which is often as important as the solutions themselves in ML roles. Remember, it’s not just about getting the right answer, but also showing how you approach problems.
A side note based on a pattern I've observed so far: candidates who practice like this tend to perform better not just in technical assessments but also in explaining their past projects and teamwork experiences, which are equally critical parts of the interview process.
Hope this helps. Dive in, get that feedback, and refine your approach. Good luck!
There are also plenty of virtual groups around that offer p2p interviews (each side interviews the other).
These might be better for initial warmup, since the only cost is an extra hour of your time (and you also get to experience the interview from the interviewer's perspective).