Outage during security upgrade

Incident Report for WattTime API

Postmortem

What: An API outage during unplanned maintenance

Outage Start: 2025-03-26 08:43 UTC

Outage End: 2025-03-26 09:29 UTC

Duration: 46 minutes

Endpoints: All

What went wrong: In response to news the day before about a Kubernetes (K8s) ingress-nginx vulnerability, WattTime prepared a patch and deployed it during an unplanned maintenance window. The upgrade changed how some of our configurations behaved in an unexpected way. As a result, all traffic received failed responses, including many unexpected HTTP 302 redirects.

How we fixed it: We rolled back the patch to restore traffic. We then deployed an upgrade that was compatible with our configurations. This second upgrade fixed the security vulnerability and caused no downtime.

We apologize for any disruption caused by the outage during this unplanned maintenance. We are committed to maintaining a secure and reliable API, and although we're proud of its track record, we continue to look for ways to improve.

Posted Mar 26, 2025 - 13:04 PDT

Resolved

The incident has been resolved, and the API has been back online for 30 minutes. We'll continue to monitor the API and investigate the cause of the issue, and we'll provide further details in the postmortem report.
Posted Mar 26, 2025 - 02:59 PDT

Monitoring

We've been investigating an issue and are monitoring a fix that we've put in place.
Posted Mar 26, 2025 - 02:39 PDT
This incident affected: WattTime API.