Application Loading Issues

Incident Report for Air

Postmortem

Overview

Incident name: Ineffective planning of isolated database queries led to a degraded app experience
Date and time: 2025-10-02 13:47–14:01 ET
Affected areas: Air’s web app
Status: Resolved

Customer impact

  • Some users experienced slow loading or were unable to load the Air app for roughly 15 minutes. The issue was fully resolved the same day.
  • Time window: Approximately 1:47–2:01 PM ET on October 2, 2025
  • Data and security: No data loss or security exposure occurred

What happened

Multiple long-running queries caused contention on the primary database, which led to app unavailability and elevated error rates. Service recovered once mitigation reduced load and query performance returned to normal.

Root cause

  • Primary cause: On the writer (the primary database instance), the query planner did not use an index on the clip table for a frequently executed query, which resulted in full table scans and spills to disk under load.
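
To illustrate this failure mode, the sketch below walks a query plan in the shape PostgreSQL emits for EXPLAIN (ANALYZE, FORMAT JSON) and flags sequential scans and sorts that spilled to disk. It assumes the database is PostgreSQL, which the autovacuum item in this report suggests but does not state. The sample plan, its numbers, and the helper names walk_plan and find_plan_risks are hypothetical and are not taken from the incident.

```python
from typing import Any, Dict, Iterator, List


def walk_plan(node: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield a plan node and all of its descendants, depth first."""
    yield node
    for child in node.get("Plans", []):
        yield from walk_plan(child)


def find_plan_risks(explain_json: List[Dict[str, Any]]) -> List[str]:
    """Return warnings for sequential scans and sorts that used disk."""
    warnings: List[str] = []
    root = explain_json[0]["Plan"]  # FORMAT JSON wraps the plan in a one-item list
    for node in walk_plan(root):
        if node.get("Node Type") == "Seq Scan":
            warnings.append(f"seq scan on {node.get('Relation Name', '?')}")
        if node.get("Sort Space Type") == "Disk":
            used = node.get("Sort Space Used", "?")
            warnings.append(f"sort spilled to disk ({used} kB)")
    return warnings


# Hypothetical plan, hand-written for illustration only (not incident data).
SAMPLE_PLAN = [
    {
        "Plan": {
            "Node Type": "Sort",
            "Sort Method": "external merge",
            "Sort Space Used": 204800,
            "Sort Space Type": "Disk",
            "Plans": [{"Node Type": "Seq Scan", "Relation Name": "clip"}],
        }
    }
]

if __name__ == "__main__":
    for warning in find_plan_risks(SAMPLE_PLAN):
        print(warning)
```

A check along these lines could run against captured plans for frequently executed queries, so that a switch from an index scan to a sequential scan is noticed before it causes contention.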

Timeline (high level)

  • 13:47: Degradation reported; app fails to load.
  • 13:48: Error rate confirmed elevated; incident channel started.
  • 14:00: Error rates decrease and app loads.
  • 14:01: Huddle begins; root cause investigation continues.
  • 2025-10-03 04:56: Database parameter change applied to reduce spill risk for heavy queries.

Preventative actions

  • Immediate fixes completed

    • Reduced load and stabilized the app within minutes
    • Completed database maintenance and configuration updates to improve query planning and performance
  • Near-term improvements (planned or in progress)

    • Enhance autoscaling of database readers
    • Configure and tune autovacuum to reduce bloat
    • Configure timeouts for Lambda functions
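
As background for the autovacuum item, the sketch below shows why tuning matters on large tables. It assumes PostgreSQL, whose documented rule is that autovacuum vacuums a table once dead tuples exceed autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * (number of tuples), with stock defaults of 50 and 0.2. The 10-million-row table and the tuned scale factor of 0.01 are hypothetical; this report does not state which settings will be changed.

```python
def autovacuum_vacuum_trigger(
    reltuples: float,
    base_threshold: int = 50,
    scale_factor: float = 0.2,
) -> float:
    """Dead-tuple count above which autovacuum vacuums a table.

    Mirrors the documented PostgreSQL rule:
        threshold = autovacuum_vacuum_threshold
                    + autovacuum_vacuum_scale_factor * number of tuples
    The default arguments are PostgreSQL's stock values.
    """
    return base_threshold + scale_factor * reltuples


def needs_vacuum(dead_tuples: float, reltuples: float, **settings) -> bool:
    """True when the dead-tuple count exceeds the trigger point."""
    return dead_tuples > autovacuum_vacuum_trigger(reltuples, **settings)


if __name__ == "__main__":
    rows = 10_000_000  # hypothetical table size, not from the incident
    default_trigger = autovacuum_vacuum_trigger(rows)
    tuned_trigger = autovacuum_vacuum_trigger(rows, scale_factor=0.01)
    print(f"default settings: vacuum after ~{default_trigger:,.0f} dead tuples")
    print(f"scale factor 0.01: vacuum after ~{tuned_trigger:,.0f} dead tuples")
```

With the defaults, a table of this size accumulates roughly two million dead tuples before autovacuum runs. A lower per-table scale factor triggers vacuuming earlier, which limits bloat.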

Frequently asked questions

  • Was any customer data lost?

    • No. We confirmed no data loss or security exposure.
  • Do customers need to take any action?

    • No. Systems recovered with no lingering degradation. If anything looks off, contact us and we will investigate immediately.
  • Could this happen again?

    • The underlying query inefficiency has been mitigated, and additional monitoring and runbook improvements are in progress to reduce the risk of recurrence.

Need help?

If you notice anything unexpected, please reach out to your Air contact or reply to your most recent support thread and we’ll follow up immediately.

Posted Oct 09, 2025 - 22:12 UTC

Resolved

This incident has been resolved.
Posted Oct 02, 2025 - 17:59 UTC

Investigating

We are currently investigating reports that users are unable to load the application. Our team is actively looking into the issue and we will provide an update as soon as we have more information.
Posted Oct 02, 2025 - 17:42 UTC
This incident affected: Web App.