Constrained generation makes models somewhat less intelligent, though it shouldn't be an issue in thinking mode, since the model can prepare an unconstrained response and then fix it up to match the schema.
Not true, and citation needed. Whatever you cite, there are competing papers claiming that structured and constrained generation does zero harm to output diversity/creativity (within a schema).
That is clearly not possible. Imagine if you asked a model yes/no questions with a schema that didn't contain "yes".
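A toy sketch of that failure mode, assuming a simplified constrained decoder that just masks disallowed tokens and picks greedily (the function and token probabilities are hypothetical, not any real library's API):

```python
def constrained_sample(probs, allowed):
    """Greedy sampling restricted to the tokens the schema allows."""
    # Mask out every token the schema forbids, then pick the best survivor.
    masked = {t: p for t, p in probs.items() if t in allowed}
    if not masked:
        raise ValueError("schema admits none of the model's candidate tokens")
    return max(masked, key=masked.get)

# The model is 90% sure the answer is "yes", but the schema forbids it:
print(constrained_sample({"yes": 0.9, "no": 0.1}, allowed={"no", "maybe"}))  # -> "no"
```

However confident the model is, the constrained output here can only ever be "no" or "maybe".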
In general you can break any model by using a sampler that chooses bad enough tokens sometimes. I don't think it's well studied how well different models respond to this.
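One way to make that concrete, as a sketch: a sampler that, with some probability epsilon, ignores the model's distribution entirely and picks a token uniformly at random (names and structure are illustrative, not from any real sampler implementation):

```python
import random

def noisy_sample(probs, epsilon, rng=random):
    """Sample a token, but with probability epsilon pick uniformly at random."""
    tokens = list(probs)
    if rng.random() < epsilon:
        # Deliberately "bad" choice: ignore the model's probabilities.
        return rng.choice(tokens)
    # Normal case: sample in proportion to the model's probabilities.
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]
```

Sweeping epsilon from 0 upward would give one crude measure of how gracefully a given model degrades under bad token choices.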
I mean, that example is too reductionist: taken exactly it proves the point, but if you're not being that literal it's not a worry in practice.
Even asking for JSON (without constrained sampling) sometimes degrades output. And the names and order of the keys can affect performance, or even act as a form of structured thinking.
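To illustrate the key-order point, here are two hypothetical response schemas with the same fields (field names made up for the example). Because generation is left-to-right, putting "reasoning" before "answer" lets the model write its working before committing to an answer, while answer-first forces the commitment up front:

```python
import json

# Same fields, different order -- only the generation order differs.
answer_first = {"answer": "<to fill>", "reasoning": "<to fill>"}
reasoning_first = {"reasoning": "<to fill>", "answer": "<to fill>"}

# Reasoning-first is the ordering that can act as "structured thinking".
print(json.dumps(reasoning_first, indent=2))
```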
At the end of the day, current models have enough problems with generalization that you should establish an unconstrained baseline and measure from there.
But also, Gemini supports constrained generation, which can't fail to match a schema, so why not use that instead of prompting?