It can happen that an ArchiveWorkItem fails to retrieve the current contents of an mbox when evaluating a PR. When that happens, it will end up resending all emails. This recently happened to https://git.openjdk.org/jdk/pull/18787. My conclusion from investigating this incident gave that Skara didn't really do anything wrong. It received a 404, which must have been a temporary glitch, when requesting the current mbox content
Looking at the GitLab API for writing a file, I remembered that creating a new file uses POST and updating a file uses PUT. In our implementation we just try one and fall back on the other. Similarly for GitHub, updating a file requires including the current commit hash to avoid overwriting the wrong thing, which we also just bypass. We could change this and add explicit methods for create new and for overwrite. The ArchiveWorkItem knows if it found any existing content in the mbox archive, so can pick the right method to call. If it falsely failed to get the existing content from OraHub, it will call create new file, which should fail, forcing a retry of the WorkItem, which will hopefully succeed in getting the existing content.
If we found any content in the file, we should still be able to trust that, it's just the case when the file appeared to not exist that we need to handle differently.
It's unfortunate to not be able to trust the return codes from servers, but that's also the reality of things. The proposed strategy should be more robust and handle things better in more situations.
Looking at the GitLab API for writing a file, I remembered that creating a new file uses POST and updating a file uses PUT. In our implementation we just try one and fall back on the other. Similarly for GitHub, updating a file requires including the current commit hash to avoid overwriting the wrong thing, which we also just bypass. We could change this and add explicit methods for create new and for overwrite. The ArchiveWorkItem knows if it found any existing content in the mbox archive, so can pick the right method to call. If it falsely failed to get the existing content from OraHub, it will call create new file, which should fail, forcing a retry of the WorkItem, which will hopefully succeed in getting the existing content.
If we found any content in the file, we should still be able to trust that, it's just the case when the file appeared to not exist that we need to handle differently.
It's unfortunate to not be able to trust the return codes from servers, but that's also the reality of things. The proposed strategy should be more robust and handle things better in more situations.
- links to
-
Commit(master) openjdk/skara/4666739a
-
Review(master) openjdk/skara/1696