API Incident: /login outage

Incident Report for WattTime API

Postmortem

Start: 2026-04-29 22:24 UTC

End: 2026-04-29 22:48 UTC

Endpoints: All (due to /login outage)

Impact: Approximately 100% of API requests failed because of the authentication outage.

Cause: Data processing jobs caused out-of-memory events, which in turn blocked a distributed cache that serves authentication. With the cache blocked, all logins failed.

Fix: Automated internal alerts notified the team at the time of the incident. We recovered by downscaling the authentication service to avoid the distributed cache. We then removed the use of a distributed cache.
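
For illustration only: the fix above removed the distributed cache entirely. A different, general way to keep a stalled cache from failing every login is to bound the wait on the cache and fall back to the authoritative store. The sketch below assumes hypothetical `cache_get` and `source_get` callables and a hypothetical 200 ms timeout; none of these names or values come from the WattTime implementation.

```python
import concurrent.futures
from typing import Callable, Optional

# Shared worker pool: a cache call that blocks occupies a worker thread,
# not the request that is waiting on it.
_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def lookup_with_fallback(
    key: str,
    cache_get: Callable[[str], Optional[str]],   # hypothetical cache read
    source_get: Callable[[str], Optional[str]],  # hypothetical authoritative read
    cache_timeout_s: float = 0.2,                # hypothetical timeout
) -> Optional[str]:
    """Return the cached value if the cache answers in time, else ask the source."""
    future = _POOL.submit(cache_get, key)
    try:
        value = future.result(timeout=cache_timeout_s)
        if value is not None:
            return value
    except Exception:
        # Covers both a timeout and an error raised by the cache itself.
        pass
    # Cache miss, cache error, or cache too slow: use the authoritative store.
    return source_get(key)
```

With this shape, a blocked cache costs each request at most the timeout rather than failing it. The trade-off is extra load on the authoritative store while the cache is unhealthy.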

Prevention: We configured our data jobs to prevent memory issues by both right-sizing them and setting default memory requests.
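
As a rough sketch of what right-sizing with a default can look like, the function below derives a memory request from a job's observed peak usage and falls back to a default when no history exists. The default, headroom factor, and rounding step are hypothetical values chosen for the example, not the settings used for the WattTime data jobs.

```python
import math
from typing import List, Optional

DEFAULT_REQUEST_MIB = 512  # hypothetical default for jobs with no usage history
HEADROOM = 1.25            # hypothetical margin above the observed peak
STEP_MIB = 64              # hypothetical rounding step for requests


def memory_request_mib(peak_samples_mib: Optional[List[float]]) -> int:
    """Pick a memory request (MiB) from observed peaks, or use the default."""
    if not peak_samples_mib:
        return DEFAULT_REQUEST_MIB
    target = max(peak_samples_mib) * HEADROOM
    # Round up to the next multiple of STEP_MIB.
    return int(math.ceil(target / STEP_MIB) * STEP_MIB)


print(memory_request_mib([300, 410, 380]))  # 576
print(memory_request_mib(None))             # 512
```

A job that declares a request sized this way can be scheduled with enough memory for its known peak, and a job with no history still gets an explicit request instead of none.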

Posted May 14, 2026 - 13:11 PDT

Resolved

We've confirmed the /login endpoint and all other endpoints are operating normally, and we are continuing to monitor the API closely.
Our post-mortem investigation will seek to prevent recurrence.
We will provide more details in the incident post-mortem in the coming days.
Posted Apr 29, 2026 - 16:26 PDT

Monitoring

We implemented a fix, and the API seems to have recovered.
Monitoring shows /login is UP as of Apr 29, 2026 22:48:02 (UTC), after 19 minutes 2 seconds of being DOWN.
We are monitoring for additional issues as we continue investigating to confirm that the root cause has been addressed.
We'll provide another update within one hour.
Posted Apr 29, 2026 - 15:58 PDT

Identified

We saw increased errors on the /login endpoint.
This started at approximately 22:29 UTC.
An initial fix seems to have restored /login, but we are still working to understand the issue and ensure the API is stable.
We'll provide another update within 30 minutes.
Posted Apr 29, 2026 - 15:56 PDT

Investigating

We've observed an increase in errors from the API.
We are investigating further and will provide an update within 30 minutes.
Posted Apr 29, 2026 - 15:35 PDT
This incident affected: WattTime API.