Hacker News

My take is that they should always be committed, but never generated by the dev; instead, generated and pushed when necessary by CI. The problem with generating those files yourself is that, in many cases, the output is nondeterministic and nonreproducible. In an ideal world those tools would just generate those files deterministically, but until then, committing them from CI is an acceptable stopgap for me.


My preference is to do both. Have them generated by a dev, committed, and also generated in CI. The latter gets compared with the checked in contents to ensure the results match the expected value.

This speeds up CI (the generation path can be done in parallel) and most local development.

The one catch is that it relies on mostly trusting whoever has a commit bit. But if you don’t have that and any part of the build involves scripts that are part of the repo itself, then you’ve already lost.
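The generate-in-CI-and-compare step described above can be sketched as a small shell function. The generation command and output path passed in are placeholders, not any particular project's setup:

```shell
# Sketch of a CI step: re-run generation, then fail if the result
# differs from what the dev committed. Arguments are hypothetical:
#   $1 - the project's generation command (e.g. "make generate")
#   $2 - the path holding generated files (e.g. "generated/")
regen_and_compare() {
  sh -c "$1"

  # --exit-code makes git diff return non-zero when the regenerated
  # files differ from HEAD, which fails the CI job and flags a stale
  # (or tampered) commit.
  git diff --exit-code -- "$2"
}
```

Because the comparison is against the committed contents, the rest of CI can consume the checked-in files in parallel without waiting on the generation step.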


> The one catch is that it relies on mostly trusting whoever has a commit bit.

Would the comparison not show that the person you're trusting goofed or is being malicious?


In either case it would prompt closer examination.

If the dev goofed, then good thing it got caught.

If the dev is not trustworthy, then you have evidence of such untrustworthiness.


> My preference is to do both. Have them generated by a dev, committed, and also generated in CI. The latter gets compared with the checked in contents to ensure the results match the expected value.

Bingo. This is what I am working towards convincing people to adopt at my current job. It's a long road.


Would you happen to know of a documented workflow? Or blog posts that present solutions like this.

I would be very interested in seeing how other people are doing it.

Thanks!


The generation bits would be highly specific to the project, but the final check in CI is as simple as checking the git diff/status of the generated targets to see if they match the ref. Any deviation indicates that it was missed by the patch submitter (likely inadvertently in the case of honest actors).

The real work is being able to transform the generation task into a reproducible step that can be run consistently anywhere. Containerizing those steps can help, but it's not strictly required, nor is it enough if the "inputs" are a non-seeded random value or the current time.
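Pinning the usual sources of nondeterminism before running the generator is often the bulk of that work. A sketch (the specific values are arbitrary; `SOURCE_DATE_EPOCH` is the reproducible-builds convention honored by many tools, and the Python variable applies only if the generator happens to be Python):

```shell
# Fix the "current time" seen by timestamp-embedding tools.
export SOURCE_DATE_EPOCH=1700000000
# Fix locale-dependent sort order and encoding.
export LC_ALL=C
# Fix the time zone.
export TZ=UTC
# Example of seeding a tool-specific RNG (Python-based generators).
export PYTHONHASHSEED=0
```

With the environment pinned like this, the same generator run in a container or on a dev machine has a much better chance of producing byte-identical output.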


I have a simple script that asserts a clean working directory here https://github.com/mnahkies/openapi-code-generator/blob/main... which I use to check generated output hasn't changed after running the generation step in CI.

It relies on your generated artifacts being deterministic, which is a design goal of that particular project, so it works fine there.
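Without reproducing the linked script, a minimal clean-working-directory assertion along those lines might look like this; it treats any modified or untracked file after regeneration as a failure:

```shell
# Fails if the working tree differs from HEAD in any way.
# --porcelain prints one line per changed or untracked file, so empty
# output means the regenerated files match what is committed.
assert_clean_tree() {
  if [ -n "$(git status --porcelain)" ]; then
    echo "generated output is out of date; re-run the generator and commit" >&2
    return 1
  fi
}
```

Run in CI immediately after the generation step; a non-zero exit fails the job.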


No, they should be generated by either dev or something like pre-commit and then checked if they match what's generated by CI.

And yes, those have to be deterministic with regards to inputs, it does not make sense otherwise.


No unauditable generated code for me, either manually or automatically, thanks.


Why would generated code be unauditable?

The inputs and the generation will obviously be defined.


That's not the case for autotools output, or flex and bison output.

If the generated files are what you say? Well, just embed the generation step into the build system. A simple approach like that is easily made reproducible, and we avoid introducing noise into the repository.


The blog post does explain why some of the generation is done separately. But yes, that is also a viable approach.



