A common operation for several WorkItems is to materialize a handful of repos. Most of the time, there are already local copies available that just need to fetch the latest changes. This seems like a reasonably efficient strategy. However, before using a local copy, the materialize method will first check Repository::isHealthy. For Git repos this will run:
'git fsck --connectivity-only'
Depending on the repo this is run on, it can take a lot of time. Luckily, our JDK repos usually take <10s. The mlbridge archive repo doesn't fare so well however, currently taking between 45s and 1m15s, and this time is only expected to increase as we send more and more emails that get archived. In the JDK (or other product) repo case, I don't think we can get around this, as operating on the target repos is Skara's main job. However, using repositories for metadata storage and treating them the same way was not a great design decision, especially for data where each user/WorkItem only cares about a very small part of the data but still has to keep a complete copy of all of it up to date. This just doesn't scale well.
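For reference, the cost of this health check can be measured directly on any local clone (the repo path below is a placeholder):

```shell
# Time the same connectivity check that Repository::isHealthy runs for Git repos.
# /path/to/repo stands in for a local clone of, e.g., the mlbridge archive repo.
time git -C /path/to/repo fsck --connectivity-only
```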
Changing the backend storage model for these metadata repos is probably too much work at this point, but we can do things to improve the current situation. For the mlbridge archive repo specifically, each ArchiveWorkItem only reads and writes a single file in the repo. So instead of requiring a local clone of the complete repo and spending a whole minute making sure it's healthy, we could just fetch the contents of the single file we are interested in from the HostedRepository and then write back the new version in the same way, using the REST API. This should speed up mailing operations quite a bit.
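As a rough sketch of what single-file access could look like, the helper below builds the read and update requests for one file. The endpoint and payload shapes follow the GitHub repository contents REST API; the class and method names are hypothetical, not existing Skara code:

```java
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper sketching single-file access through a forge's REST API.
// Request shapes follow the GitHub contents API; not actual Skara code.
public class SingleFileAccess {
    private final String apiBase;   // e.g. "https://api.github.com"
    private final String repo;      // "owner/name" placeholder

    public SingleFileAccess(String apiBase, String repo) {
        this.apiBase = apiBase;
        this.repo = repo;
    }

    // GET this URI to read the current contents (and blob sha) of one file,
    // without ever cloning or fsck-ing the repo locally.
    public URI readUri(String path, String branch) {
        return URI.create(apiBase + "/repos/" + repo + "/contents/" + path
                          + "?ref=" + branch);
    }

    // A PUT to the same path updates the file; the body carries the new
    // contents base64-encoded plus the sha of the version being replaced,
    // so concurrent updates are detected by the server.
    public String updateBody(String message, String newContent, String previousSha) {
        var encoded = Base64.getEncoder()
                            .encodeToString(newContent.getBytes(StandardCharsets.UTF_8));
        return "{\"message\":\"" + message + "\","
             + "\"content\":\"" + encoded + "\","
             + "\"sha\":\"" + previousSha + "\"}";
    }
}
```

An ArchiveWorkItem could then issue one GET and one PUT per archived mail, instead of materializing and health-checking the whole archive repo.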