Hacker News

My take is that they should always be committed, but never generated by the dev; instead, generated and pushed when necessary by CI. The problem with generating those files yourself is that, in many cases, the output is nondeterministic and nonreproducible. In an ideal world those tools would just generate those files deterministically, but until then, committing them from CI is an acceptable stopgap for me.


My preference is to do both. Have them generated by a dev, committed, and also generated in CI. The latter gets compared with the checked in contents to ensure the results match the expected value.

This speeds up CI (the generation path can be done in parallel) and most local development.

The one catch is that it relies on mostly trusting whoever has a commit bit. But if you don’t have that and any part of the build involves scripts that are part of the repo itself, then you’ve already lost.
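The generate-in-CI-and-compare step described above can be sketched as a small shell function. The generation command and output path passed in are placeholders, not any particular project's setup:

```shell
# Sketch of a CI step: re-run generation, then fail if the result
# differs from what the dev committed. Arguments are hypothetical:
#   $1 - the project's generation command (e.g. "make generate")
#   $2 - the path holding generated files (e.g. "generated/")
regen_and_compare() {
  sh -c "$1"

  # --exit-code makes git diff return non-zero when the regenerated
  # files differ from HEAD, which fails the CI job and flags a stale
  # (or tampered) commit.
  git diff --exit-code -- "$2"
}
```

Because the comparison is against the committed contents, the rest of CI can consume the checked-in files in parallel without waiting on the generation step.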


> The one catch is that it relies on mostly trusting whoever has a commit bit.

Would the comparison not show that the person you're trusting goofed or is being malicious?


In either case it would prompt closer examination.

If the dev goofed, then good thing it got caught.

If the dev is not trustworthy, then you have evidence of such untrustworthiness.


> My preference is to do both. Have them generated by a dev, committed, and also generated in CI. The latter gets compared with the checked in contents to ensure the results match the expected value.

Bingo. This is what I am working towards convincing people to adopt at my current job. It's a long road.


Would you happen to know of a documented workflow? Or blog posts that present solutions like this.

I would be very interested in seeing how other people are doing it.

Thanks!


The generation bits would be highly specific to the project, but the final check in CI is as simple as checking the git diff/status of the generated targets to see if they match the ref. Any deviation indicates that it was missed by the patch submitter (likely inadvertently in the case of honest actors).

The real work is being able to transform the generation task into a reproducible step that can be run consistently anywhere. Containerizing those steps can help, but it's not strictly required, nor is it enough if the "inputs" are a non-seeded random value or the current time.
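Pinning the usual sources of nondeterminism before running the generator is often the bulk of that work. A sketch (the specific values are arbitrary; `SOURCE_DATE_EPOCH` is the reproducible-builds convention honored by many tools, and the Python variable applies only if the generator happens to be Python):

```shell
# Fix the "current time" seen by timestamp-embedding tools.
export SOURCE_DATE_EPOCH=1700000000
# Fix locale-dependent sort order and encoding.
export LC_ALL=C
# Fix the time zone.
export TZ=UTC
# Example of seeding a tool-specific RNG (Python-based generators).
export PYTHONHASHSEED=0
```

With the environment pinned like this, the same generator run in a container or on a dev machine has a much better chance of producing byte-identical output.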


I have a simple script that asserts a clean working directory here https://github.com/mnahkies/openapi-code-generator/blob/main... which I use to check generated output hasn't changed after running the generation step in CI.

It relies on your generated artifacts being deterministic, which is a design goal of that particular project, so it works fine there.
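Without reproducing the linked script, a minimal clean-working-directory assertion along those lines might look like this; it treats any modified or untracked file after regeneration as a failure:

```shell
# Fails if the working tree differs from HEAD in any way.
# --porcelain prints one line per changed or untracked file, so empty
# output means the regenerated files match what is committed.
assert_clean_tree() {
  if [ -n "$(git status --porcelain)" ]; then
    echo "generated output is out of date; re-run the generator and commit" >&2
    return 1
  fi
}
```

Run in CI immediately after the generation step; a non-zero exit fails the job.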


No, they should be generated by either dev or something like pre-commit and then checked if they match what's generated by CI.

And yes, those have to be deterministic with regards to inputs, it does not make sense otherwise.


No unauditable generated code for me, either manually or automatically, thanks.


Why would generated code be unauditable?

The inputs and the generation will obviously be defined.


That's not the case for autotools output, or flex and bison output.

If the generated files are what you say? Well, just embed the generation step into the build system. A simple approach like that is easily made reproducible, and we avoid introducing noise into the repository.


The blog post does explain why some of the generation is done separately. But yes, that is also a viable approach.



