After more monitoring and investigation with JFR, I've come to the conclusion that all the bots are leaking memory to some extent. In most cases the leak is very slow, but in the worst cases we are looking at an OOME at least once a week.
It looks like the culprit is the RestRequestCache. After a quick inspection of the source, I can't find any code responsible for cleaning out outdated cache entries. It seems to only add (and update) entries and never remove anything. I think we need to either add a service thread that periodically clears out old data, or amortize the cleanup by running it every so often as part of regular cache calls. It shouldn't be that expensive to check whether entries are older than a certain threshold and, if so, remove them.
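To make the amortized option concrete, here is a minimal sketch of what I have in mind. It is not the actual RestRequestCache code: the class name `ExpiringCache`, its fields, and the two thresholds are all hypothetical, and I'm assuming entries can carry a last-updated timestamp. Every `put` checks whether enough time has passed since the last sweep and, if so, removes everything older than the maximum age.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Hypothetical stand-in for the real cache; names and thresholds are assumptions.
final class ExpiringCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final long updatedAtMillis;

        Entry(V value, long updatedAtMillis) {
            this.value = value;
            this.updatedAtMillis = updatedAtMillis;
        }
    }

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long maxAgeMillis;
    private final long sweepIntervalMillis;
    private final LongSupplier clock; // injectable so the behavior can be tested
    private final AtomicLong lastSweepMillis;

    ExpiringCache(long maxAgeMillis, long sweepIntervalMillis, LongSupplier clock) {
        this.maxAgeMillis = maxAgeMillis;
        this.sweepIntervalMillis = sweepIntervalMillis;
        this.clock = clock;
        this.lastSweepMillis = new AtomicLong(clock.getAsLong());
    }

    // Adds or updates an entry, then sweeps if the sweep interval has elapsed.
    void put(K key, V value) {
        long now = clock.getAsLong();
        entries.put(key, new Entry<>(value, now));
        maybeSweep(now);
    }

    // Returns the cached value, or null if it is missing or older than maxAgeMillis.
    V get(K key) {
        Entry<V> entry = entries.get(key);
        if (entry == null) {
            return null;
        }
        if (clock.getAsLong() - entry.updatedAtMillis > maxAgeMillis) {
            entries.remove(key, entry); // only removes if not replaced meanwhile
            return null;
        }
        return entry.value;
    }

    int size() {
        return entries.size();
    }

    // Amortized cleanup: at most one full pass per sweep interval.
    private void maybeSweep(long now) {
        long last = lastSweepMillis.get();
        if (now - last < sweepIntervalMillis) {
            return;
        }
        // Only the thread that wins the CAS performs the sweep.
        if (!lastSweepMillis.compareAndSet(last, now)) {
            return;
        }
        entries.entrySet().removeIf(e -> now - e.getValue().updatedAtMillis > maxAgeMillis);
    }
}
```

In production the clock would just be `System::currentTimeMillis`. The service-thread variant would move the `removeIf` pass into a task scheduled on a `ScheduledExecutorService` instead of calling it from `put`; the trade-off is an extra thread versus an occasional slower `put`.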