Type: Enhancement
Resolution: Unresolved
Priority: P4
In our current GHA workflows, we only run workflows on branches in personal forks. GHA isolation rules allow workflow caches from parent branches to be used by descendant branches. For our branches, the usual parent is "master". Since we do not run workflows on "master", every new branch starts with logically empty caches; only the next trigger on the same branch can use the caches saved from the first workflow run.
This puts additional load on shared infrastructure: pulling JDKs, building jtreg (and pulling its dependencies), bootstrapping sysroots, etc. All these steps also fail intermittently every so often. It also means everyone carries lots of caches around, segregated by branch (look at your https://github.com/<id>/jdk/actions/caches), relying only on cache cleanups when usage starts to hit the 10 GB limit. With 200+ contributors, this is easily 2 TB of cloud space we effectively waste in GHA clouds.
We can make all this more reliable if we manage to run a master-branch workflow that bootstraps all required dependencies and caches them. These caches can then be used by PR branches, since the "master" branch is their effective parent. I tested this locally by flushing GHA caches, running the full workflow on my "master" branch, and seeing all subsequent jobs use the caches from the "master" branch.
It remains to be seen what a reasonable strategy for running this workflow is. It might be OK to run it on a schedule every N days or so, provided that running the workflow on already populated caches is effectively a no-op. Or we could run it on every push to master, which means it would run on every sync of a personal fork's master to upstream.
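Both strategies could be combined in one workflow. A minimal sketch of what such a cache-warming workflow might look like (the file name, schedule interval, cache path/key, and bootstrap command are illustrative assumptions, not what the actual JDK workflows use):

```yaml
# .github/workflows/warm-caches.yml (hypothetical name)
name: Warm master caches

on:
  schedule:
    - cron: '0 3 */3 * *'    # every 3 days; "N days" is a tunable assumption
  push:
    branches: [ master ]      # also refresh on every sync of fork's master

jobs:
  warm:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # On a populated cache this step restores and the bootstrap below is
      # skipped, making the whole run an effective no-op.
      - uses: actions/cache@v4
        id: jtreg
        with:
          path: jtreg/installed
          key: jtreg-${{ runner.os }}-${{ hashFiles('make/conf/version-numbers.conf') }}  # illustrative key
      - name: Bootstrap jtreg
        if: steps.jtreg.outputs.cache-hit != 'true'
        run: echo "build jtreg here"   # placeholder for the actual bootstrap
```

Because GHA scopes cache reads to the current branch and its parents, PR branches forked from master would see these entries as warm hits on their very first run.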