Type: Enhancement
Resolution: Fixed
Priority: P3
Fix Version/s: 1.0
Affects Version/s: None
Component/s: bots
Labels: None

While looking at our Skara bot deployment, I noticed that the PullRequestBot uses a lot more disk space than any of the other bots. Looking closer, I noticed that every WorkItem or Command that needs a repository clone creates it in a different subdirectory of the scratch dir. This wastes disk space and is IMO completely unnecessary. When a WorkItem runs, it is assigned a scratch dir that no other process or thread will touch for the duration of the run. I think we should be able to trust Git repositories and the HostedRepositoryPool enough to reuse the same clone across different WorkItems, at least as long as no WorkItem does anything unusual with it.

As an example, I counted 5 clones of the main jdk repository in a single scratch dir:

pr-external/scratch/scratch-0/pr/commit-comments/openjdk/jdk
pr-external/scratch/scratch-0/pr/check/openjdk/jdk
pr-external/scratch/scratch-0/pr/command/sponsor/openjdk/jdk
pr-external/scratch/scratch-0/pr/command/integrate/openjdk/jdk
pr-external/scratch/scratch-0/pr/labeler/openjdk/jdk

We run 32 worker threads for this bot runner, which creates 32 separate scratch dirs. With 5 clones in each, that is 160 clones of the jdk repo. Then multiply that by all the other repos we run Skara for. Reducing most (if not all) repo clones to a single instance per scratch dir would go a long way towards reducing this disk usage.
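
The clone counts above can be checked with a small sketch. It is an illustration only, not Skara code: the `repos/` directory name and the `shared_clone_path` helper are hypothetical, and the five paths are the ones listed in this report (relative to one scratch dir).

```python
from pathlib import PurePosixPath

# Per-WorkItem clone paths listed in the report, relative to one scratch dir.
CLONE_PATHS = [
    "pr/commit-comments/openjdk/jdk",
    "pr/check/openjdk/jdk",
    "pr/command/sponsor/openjdk/jdk",
    "pr/command/integrate/openjdk/jdk",
    "pr/labeler/openjdk/jdk",
]

WORKER_THREADS = 32  # one scratch dir per worker thread, as stated in the report


def shared_clone_path(path: str) -> str:
    """Hypothetical mapping: keep only the trailing <org>/<repo> components,
    so every WorkItem type resolves to the same clone in a scratch dir."""
    parts = PurePosixPath(path).parts
    return str(PurePosixPath("repos", *parts[-2:]))


def clone_count(paths, threads):
    """Total clones across all scratch dirs: distinct paths per dir times dirs."""
    return len(set(paths)) * threads


current = clone_count(CLONE_PATHS, WORKER_THREADS)
shared = clone_count([shared_clone_path(p) for p in CLONE_PATHS], WORKER_THREADS)
print(current, shared)
```

Under these assumptions the current layout yields 160 clones of the jdk repo, while one shared clone per scratch dir yields 32.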

links to

Commit openjdk/skara/c91bf6a5

Review openjdk/skara/1491

Assignee: Zhao Song

Reporter: Erik Joelsson

Votes: 0

Watchers: 3

Created: 2023-01-05 13:13

Updated: 2023-04-28 08:52

Resolved: 2023-04-28 08:52
