Reduce disk usage in PR bot


    • Type: Enhancement
    • Resolution: Fixed
    • Priority: P3
    • Fix Version/s: 1.0
    • Affects Version/s: None
    • Component/s: bots
    • Labels: None

      While looking at our Skara bot deployment, I noticed that the PullRequestBot uses a lot more disk space than any of the other bots. Looking closer, I found that every WorkItem or Command that needs a repository clone creates it in its own subdirectory of the scratch dir. This wastes a lot of disk space and is, IMO, completely unnecessary. When a WorkItem runs, it is assigned a scratch dir that no other process or thread will touch for the duration of that run. I think we should be able to trust Git repositories and the HostedRepositoryPool enough to reuse repos across different WorkItems, at least as long as no WorkItem does anything unusual with them.

      For example, I counted 5 clones of the main jdk repository in a single scratch dir:

      pr-external/scratch/scratch-0/pr/commit-comments/openjdk/jdk
      pr-external/scratch/scratch-0/pr/check/openjdk/jdk
      pr-external/scratch/scratch-0/pr/command/sponsor/openjdk/jdk
      pr-external/scratch/scratch-0/pr/command/integrate/openjdk/jdk
      pr-external/scratch/scratch-0/pr/labeler/openjdk/jdk

      We run 32 worker threads for this bot runner, which creates 32 separate scratch dirs; with 5 clones in each, that's 160 clones of the jdk repo. Then multiply that by all the other repos we run Skara for. Reducing most (if not all) repo clones to a single instance per scratch dir would go a long way towards reducing this disk load.
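      To make the arithmetic concrete, here is a minimal Java sketch under simplified assumptions. The class and method names (ScratchLayout, perWorkItemPath, sharedPath, totalClones) are hypothetical and not part of Skara, and the shared layout <scratch>/pr/<repo> is only an assumed illustration of "one clone per scratch dir". It maps the five WorkItem types listed above to clone paths under both layouts, counts the distinct clones, and scales that by 32 worker threads, assuming each thread has its own scratch dir.

```java
import java.nio.file.Path;
import java.util.List;

// Hypothetical sketch: contrasts the current per-WorkItem clone layout with a
// shared one. None of these names exist in Skara; they are illustrative only.
class ScratchLayout {
    // Current layout: each WorkItem/Command type gets its own subdirectory,
    // so the same repository is cloned once per type in each scratch dir.
    static Path perWorkItemPath(Path scratchDir, String itemType, String repoName) {
        return scratchDir.resolve("pr").resolve(itemType).resolve(repoName);
    }

    // Assumed shared layout: one clone per repository per scratch dir,
    // reused by every WorkItem that runs with that scratch dir.
    static Path sharedPath(Path scratchDir, String repoName) {
        return scratchDir.resolve("pr").resolve(repoName);
    }

    // Clones of one repository across all worker threads, assuming one
    // scratch dir per worker thread.
    static int totalClones(int workerThreads, int clonesPerScratchDir) {
        return workerThreads * clonesPerScratchDir;
    }

    public static void main(String[] args) {
        Path scratch = Path.of("pr-external/scratch/scratch-0");
        List<String> types = List.of("commit-comments", "check", "command/sponsor",
                "command/integrate", "labeler");

        // Count distinct clone directories for openjdk/jdk under each layout.
        long current = types.stream()
                .map(t -> perWorkItemPath(scratch, t, "openjdk/jdk"))
                .distinct()
                .count();
        long shared = types.stream()
                .map(t -> sharedPath(scratch, "openjdk/jdk"))
                .distinct()
                .count();

        System.out.println(current + " -> " + shared);
        System.out.println(totalClones(32, (int) current) + " -> " + totalClones(32, (int) shared));
    }
}
```

      Running main prints "5 -> 1" and then "160 -> 32": the current layout keeps one clone per WorkItem type in every scratch dir, while the shared layout keeps one per repository, so the total only grows with the number of worker threads.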

            Assignee:
            Zhao Song
            Reporter:
            Erik Joelsson
            Votes:
            0
            Watchers:
            3
