Resolved
The situation is resolved.
Monitoring
We're monitoring the situation. Errors appear to have stopped and metrics look good on our end, though this could coincide with the workday ending in Europe, which means fewer opportunities for errors to occur. Further work will continue tomorrow unless the situation changes drastically later today.
Identified
Another change has been made to mitigate the issue and further reduce error frequency.
We have also identified a way to properly fix the underlying issue, and discovered why jsrunner is affected by an unrelated service. Both the fix and the jsrunner patch will be implemented.
Identified
We have taken two steps to mitigate the underlying issue. These should reduce the frequency of errors, but will not eliminate them. We are working on a more permanent fix.
The underlying issue only affects our internal observability and does not negatively impact any customer.
Identified
An issue is affecting a microservice called "jsrunner", which we use for running connector scripts (Custom Javascripts in the connectors). It occasionally causes connectors to fail, which in turn causes the case/email to enter a failed state.
The issue isn't directly related to jsrunner; instead, another service is crashing and taking jsrunner down with it.
So far we've seen it affect the West Europe and South UK regions.