Start: 2026-04-29 22:24 UTC
End: 2026-04-29 22:48 UTC
Endpoints: All (due to /login outage)
Impact: Approximately 100% of API requests failed because of the authentication outage.
Cause: Data processing jobs triggered out-of-memory events, which in turn blocked a distributed cache that serves authentication. With the cache blocked, all logins failed.
Fix: Automated internal alerts notified the team when the incident began. We recovered by scaling down the authentication service to avoid the distributed cache. We then stopped using a distributed cache.
Prevention: We right-sized our data jobs and set default memory requests for them to prevent a recurrence of the memory issues.
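
The two prevention measures can be sketched as follows. This is an illustrative Python sketch only, not the actual job configuration: the report does not name the scheduler or tooling involved, so `JobSpec`, `right_size`, `apply_default`, the 512 MiB default, and the 25% headroom are all hypothetical.

```python
import math
from dataclasses import dataclass, replace
from typing import Optional, Sequence

# Hypothetical values chosen for illustration; not taken from the incident.
DEFAULT_MEMORY_REQUEST_MIB = 512
HEADROOM = 1.25


@dataclass(frozen=True)
class JobSpec:
    """Hypothetical description of a data job and its memory request."""
    name: str
    memory_request_mib: Optional[int] = None


def right_size(observed_peaks_mib: Sequence[float], headroom: float = HEADROOM) -> int:
    """Return a memory request that covers the highest observed peak plus headroom."""
    if not observed_peaks_mib:
        raise ValueError("need at least one observed peak")
    return math.ceil(max(observed_peaks_mib) * headroom)


def apply_default(job: JobSpec, default_mib: int = DEFAULT_MEMORY_REQUEST_MIB) -> JobSpec:
    """Give a job the default memory request if it does not declare one."""
    if job.memory_request_mib is None:
        return replace(job, memory_request_mib=default_mib)
    return job
```

In this sketch, a job submitted without a memory request receives the default, while a job with observed peaks of 900, 1200, and 1100 MiB would be right-sized to 1500 MiB.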