-
Enhancement
-
Resolution: Duplicate
-
P3
-
None
-
None
-
None
The TestInfoBot handles propagating of "checks" (e.g. Github actions runs) from the source repository of a PR to the PR itself. This adds some unique requirements on how to poll PRs efficiently without missing important updates. The current implementation is trying to handle this but fails pretty badly in some cases. This is causing TestInfoBot to have multiple WorkItems active at all times.
It always queries for both open and closed PRs that have been updated in the last day (24h). This provides a hard limit for how long any PR is later periodically re-evaluated. This result is then filtered. If the PR, at the current head hash, has been evaluated before, it may have been given an expire time, if so then only let it through if that expire time has passed. Otherwise, process all PRs without expire times.
What this means is that unless the TestInfoBotWorkItem assigns an expire time, then the PR will always be re-evaluated in the next round. There are several situations in which a PR does not get an expire time, here is how it currently works:
1. If no Github Actions are configured on the source repo. On the very first time this happens, an expire time of 2m is set and an informational message is posted. After that it's unlimited re-evaluation.
2. If no Gitlab jobs are enabled, no expire time is set.
3. If checks are configured, but either not started or not ready yet, an expire time of 2m is set. This makes sense as we need to keep polling to see when the checks are done, and this would not trigger an update on the PR on its own.
4. If checks are configured and the checks are done, an expire of 30m is set. I'm not sure why we need this, but also 30m is not that much of a load on the system.
For case 1 and 2, they seem a bit weird to handle separately, as in practice they just represent configurations that are unique to either Github or Gitlab. In either case, it's unlikely that the user will go and change the checks config on the source repo, so even 2m seems like an excessive amount of checking. The unlimited re-evaluations is of course just a bug. Maybe setting it to 30m after posting the notice (which we currently only do for github). If not posting a notice, we should just stop looking.
Case 3 seems reasonable.
Case 4 seems like some kind of fallback catch all just in case the checks config on the source repo was to change. I'm not convinced we need it, but perhaps it should match case 1 after notice posting and 30m is ok. We could also say that 30m isn't helping anyone enough and we shouldn't bother. The PR would be revisited when touched in some other way anyway.
The main issue here is when should we stop looking after setting an expire time? The current implementation will (I assume by accident) stop looking after 24h of inactivity in the PR, based on the initial query in getPeriodicItems. However, this isn't guaranteed as the query may be truncated if there are lots of results (PRs for a given repo), in which case we may seemingly randomly stop checking certain PRs earlier. I think we need a more explicitly configured max age on a PR where we no longer keep checking, or we stop further checking when a PR is closed.
I don't think I will attempt fixing this beforeSKARA-1565 is done, as I would want to leverage the new functionality from that change.
It always queries for both open and closed PRs that have been updated in the last day (24h). This provides a hard limit for how long any PR is later periodically re-evaluated. This result is then filtered. If the PR, at the current head hash, has been evaluated before, it may have been given an expire time, if so then only let it through if that expire time has passed. Otherwise, process all PRs without expire times.
What this means is that unless the TestInfoBotWorkItem assigns an expire time, then the PR will always be re-evaluated in the next round. There are several situations in which a PR does not get an expire time, here is how it currently works:
1. If no Github Actions are configured on the source repo. On the very first time this happens, an expire time of 2m is set and an informational message is posted. After that it's unlimited re-evaluation.
2. If no Gitlab jobs are enabled, no expire time is set.
3. If checks are configured, but either not started or not ready yet, an expire time of 2m is set. This makes sense as we need to keep polling to see when the checks are done, and this would not trigger an update on the PR on its own.
4. If checks are configured and the checks are done, an expire of 30m is set. I'm not sure why we need this, but also 30m is not that much of a load on the system.
For case 1 and 2, they seem a bit weird to handle separately, as in practice they just represent configurations that are unique to either Github or Gitlab. In either case, it's unlikely that the user will go and change the checks config on the source repo, so even 2m seems like an excessive amount of checking. The unlimited re-evaluations is of course just a bug. Maybe setting it to 30m after posting the notice (which we currently only do for github). If not posting a notice, we should just stop looking.
Case 3 seems reasonable.
Case 4 seems like some kind of fallback catch all just in case the checks config on the source repo was to change. I'm not convinced we need it, but perhaps it should match case 1 after notice posting and 30m is ok. We could also say that 30m isn't helping anyone enough and we shouldn't bother. The PR would be revisited when touched in some other way anyway.
The main issue here is when should we stop looking after setting an expire time? The current implementation will (I assume by accident) stop looking after 24h of inactivity in the PR, based on the initial query in getPeriodicItems. However, this isn't guaranteed as the query may be truncated if there are lots of results (PRs for a given repo), in which case we may seemingly randomly stop checking certain PRs earlier. I think we need a more explicitly configured max age on a PR where we no longer keep checking, or we stop further checking when a PR is closed.
I don't think I will attempt fixing this before
- duplicates
-
SKARA-1659 Fix polling and retries in testinfo bot
- Resolved
- is blocked by
-
SKARA-1565 Only poll for updated MRs from GitLab
- Resolved