While looking at our Skara bot deployment, I noticed that the PullRequestBot uses a lot more disk than any of the other bots. Looking closer, I found that every WorkItem or Command that needs a repository clone creates it in a different subdirectory of the scratch dir. This wastes disk space and is IMO completely unnecessary. When a WorkItem runs, it is assigned a scratch dir that no other process or thread will touch for the duration of the run. I think we should be able to trust Git repositories and the HostedRepositoryPool enough to reuse clones across WorkItems, at least as long as no WorkItem is doing weird things with them.
As an example, I could count 5 instances of the main jdk repository in a single scratch dir:
pr-external/scratch/scratch-0/pr/commit-comments/openjdk/jdk
pr-external/scratch/scratch-0/pr/check/openjdk/jdk
pr-external/scratch/scratch-0/pr/command/sponsor/openjdk/jdk
pr-external/scratch/scratch-0/pr/command/integrate/openjdk/jdk
pr-external/scratch/scratch-0/pr/labeler/openjdk/jdk
We run 32 worker threads for this bot runner, which creates 32 separate scratch dirs. With 5 clones in each, that's 160 clones of the jdk repo. Then multiply that by all the other repos we are running Skara for. Reducing most (if not all) repo clones to a single instance per scratch dir would go a long way towards reducing this disk usage.
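To make the arithmetic concrete, here is a rough sketch that counts distinct clone locations under the layout observed above and under a possible shared layout. It is not Skara code: the class `ScratchLayout`, its methods, and the shared `repos` directory name are all hypothetical, chosen only for illustration.

```java
import java.nio.file.Path;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ScratchLayout {
    // Layout as observed today: <scratch dir>/<work item path>/<repo name>
    static Path perWorkItemPath(Path scratch, String workItem, String repoName) {
        return scratch.resolve(workItem).resolve(repoName);
    }

    // Hypothetical shared layout: <scratch dir>/repos/<repo name>
    // ("repos" is an assumed directory name, not an existing convention)
    static Path sharedPath(Path scratch, String repoName) {
        return scratch.resolve("repos").resolve(repoName);
    }

    // Counts the distinct clone locations across all scratch dirs.
    static int countClones(int workers, List<String> workItems, String repoName, boolean shared) {
        Set<Path> clones = new HashSet<>();
        for (int i = 0; i < workers; i++) {
            Path scratch = Path.of("pr-external", "scratch", "scratch-" + i);
            for (String workItem : workItems) {
                clones.add(shared
                        ? sharedPath(scratch, repoName)
                        : perWorkItemPath(scratch, workItem, repoName));
            }
        }
        return clones.size();
    }

    public static void main(String[] args) {
        // The five work item paths listed in the example above
        List<String> workItems = List.of(
                "pr/commit-comments", "pr/check", "pr/command/sponsor",
                "pr/command/integrate", "pr/labeler");
        System.out.println(countClones(32, workItems, "openjdk/jdk", false)); // prints 160
        System.out.println(countClones(32, workItems, "openjdk/jdk", true));  // prints 32
    }
}
```

With one shared location per scratch dir, the number of jdk clones would drop from 160 to 32 for this bot runner, assuming every WorkItem can safely use the same clone.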