Using sequential transformers to generate raw code feels like a brute-force approach to UI generation.
The "obvious" path forward for frontend assistants is to move away from raw code generation toward a domain-specific language. UI is inherently structural: it should be expressed through component hierarchies that implement predefined design guidelines, with colors, margins, border radii, etc. defined outside the core model and applied as a kind of theme to the model's output.
If we define the UI as a composition of VStacks/HStacks and predefined components, we can use diffusion models to generate the layout and consistently apply the 'theme' afterwards. That's a much cleaner abstraction than asking an LLM to hallucinate valid CSS classes.
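To make the layout-plus-theme split concrete, here is a minimal TypeScript sketch. The names (`LayoutNode`, `Theme`, `applyTheme`) and the exact shape of the tree are hypothetical illustrations, not any particular library's API; the point is that the model only ever emits the structural tree, while all styling lives in a theme applied in a separate pass:

```typescript
// The model's output: pure structure, no colors or spacing anywhere.
// (All type and function names here are hypothetical.)
type LayoutNode =
  | { kind: "VStack"; children: LayoutNode[] }
  | { kind: "HStack"; children: LayoutNode[] }
  | { kind: "Button"; label: string }
  | { kind: "Text"; content: string };

// The theme: styling decisions defined outside the model.
interface Theme {
  spacing: number;       // gap between stack children, in px
  borderRadius: number;  // applied to leaf components like Button
  primaryColor: string;
}

// Walk the structural tree and emit themed style objects.
function applyTheme(node: LayoutNode, theme: Theme): any {
  switch (node.kind) {
    case "VStack":
    case "HStack":
      return {
        display: "flex",
        flexDirection: node.kind === "VStack" ? "column" : "row",
        gap: theme.spacing,
        children: node.children.map((c) => applyTheme(c, theme)),
      };
    case "Button":
      return {
        label: node.label,
        background: theme.primaryColor,
        borderRadius: theme.borderRadius,
      };
    case "Text":
      return { content: node.content };
  }
}

// Example "model output": structure only.
const layout: LayoutNode = {
  kind: "VStack",
  children: [
    { kind: "Text", content: "Welcome" },
    { kind: "HStack", children: [{ kind: "Button", label: "OK" }] },
  ],
};

// Theming is a separate, deterministic pass.
const styled = applyTheme(layout, {
  spacing: 8,
  borderRadius: 4,
  primaryColor: "#0066ff",
});
```

Swapping in a different `Theme` restyles the whole tree without the model regenerating anything, which is exactly the property you lose when the model emits raw CSS inline.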