Runbooks

Short answers to the recurring "how do I…" and "why is this red?" moments between commit and production: composite indexes, red CI on code you didn't touch, a poisoned build cache, adding a secret, a webhook that won't fire, and reading prod logs. These are pointers to get you unstuck fast; the code and each AGENTS.md carry the full context.

A query needs a composite index

  1. Ship the query. At runtime it fails with failed-precondition.
  2. The error contains a direct create-index link — open it and create the index in the Firebase console.
  3. Note the index in your PR's deploy step. Don't re-add it to firestore.indexes.json — that file is intentionally empty; indexes are console-managed.

CI is red on code I didn't touch

  • Because the main job uses --continue, a failure elsewhere still surfaces in your run. Check whether the failing file is in git diff --name-only origin/main...HEAD — if not, it's not yours to fix.
  • Proto drift → run pnpm protoc:build and commit the regenerated tree.
  • A stale branch behind main can fail required checks on untouched code (Turbo cache miss + reviewdog). Merge origin/main to clear it.
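
The ownership check in the first bullet can be scripted. A minimal sketch, assuming you run it from the repo root with origin/main fetched; in_changed_files is a hypothetical helper name, not something that exists in the repo, and the path in the usage comment is a placeholder:

```shell
# Hypothetical helper: succeeds if the path in $1 appears, as a whole line,
# in the newline-separated file list read from stdin.
in_changed_files() {
  grep -Fxq -- "$1"
}

# Usage in a checkout (the path is an illustrative placeholder):
#   git diff --name-only origin/main...HEAD \
#     | in_changed_files "path/to/failing_file.ts" \
#     && echo "touched by this branch" \
#     || echo "not in this branch's diff"
```

The whole-line, fixed-string match (-F -x) avoids false positives when the failing file's name is a substring of another changed path.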

Busting the Go build cache

The Go build cache lives in gs://perkup-nix-cache, keyed by (sha256(backend/app/go.mod)[:16], UTC-date) with a 30-day TTL. If a poisoned cache entry is causing bad builds, delete the day's object:

gcloud storage rm gs://perkup-nix-cache/<key>

The next build repopulates it from scratch.
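
To work out which object to delete, the two key components can be computed locally. This is a sketch only: the hash prefix follows the keying rule above, but the date format (YYYY-MM-DD) and the way the two parts are joined into an object name are assumptions, so list the bucket and match on the hash prefix rather than trusting a constructed name. go_cache_key_parts is a hypothetical helper.

```shell
# Hypothetical helper: prints "<hash16> <utc-date>" for a go.mod file.
# Uses sha256sum (GNU coreutils); on macOS substitute `shasum -a 256`.
go_cache_key_parts() {
  mod_file="${1:-backend/app/go.mod}"
  # First 16 hex characters of the file's SHA-256 digest.
  hash16=$(sha256sum "$mod_file" | cut -c1-16)
  # Current date in UTC; the YYYY-MM-DD format is an assumption.
  day=$(date -u +%Y-%m-%d)
  printf '%s %s\n' "$hash16" "$day"
}

# Usage from the repo root:
#   set -- $(go_cache_key_parts)
#   gcloud storage ls gs://perkup-nix-cache/ | grep "$1"
```

Matching on the hash prefix alone also shows entries from other days for the same go.mod, which helps if the bad entry was written before midnight UTC.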

Adding a Firebase secret

  1. Declare it with defineSecret('MY_SECRET') in functions/src/consts/secrets.ts.
  2. Add it to bin/setup-firebase-secrets.sh.
  3. Add an op:// reference in .env.local.default.
  4. Flag it in the PR: KEVIN provisions the real value in Google Secret Manager.
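
Before flagging the PR, a quick grep can confirm that steps 1 to 3 all landed. A sketch assuming the secret's name appears verbatim in each file, which may not hold if the env file uses a different variable name; check_secret_wired is a hypothetical helper, not an existing script:

```shell
# Hypothetical helper: $1 is the secret name, remaining args are files.
# Prints each file that does not mention the name; fails if any is missing.
check_secret_wired() {
  name=$1
  shift
  missing=0
  for f in "$@"; do
    if ! grep -Fq -- "$name" "$f"; then
      echo "missing in $f"
      missing=1
    fi
  done
  return "$missing"
}

# Usage from the repo root:
#   check_secret_wired MY_SECRET \
#     functions/src/consts/secrets.ts \
#     bin/setup-firebase-secrets.sh \
#     .env.local.default
```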

Webhook isn't firing locally

  • functions/.secret.local must hold the real secret, not the dummy placeholder; an "Invalid credentials" error means the dummy value is still in place.
  • Inspect the inbound request at the ngrok dashboard (localhost:4040).
  • Signature mismatch → the verification is computed over the raw request body; see Integrations for the shape and the code for the exact scheme.
  • OAuth callback failing → the redirect URI must match the provider config exactly.
  • Slack token decrypt failing → check SLACK_ENCRYPTION_SECRET.
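
For signature mismatches, it can help to recompute a signature by hand over the exact bytes that arrived. The sketch below assumes a hex-encoded HMAC-SHA256 over the raw body, which is a common scheme but not necessarily the one a given provider uses; the code remains the reference for the real scheme. hmac_sha256_hex is a hypothetical helper, the secret and payload are made up, and openssl must be on PATH.

```shell
# Hypothetical helper: hex HMAC-SHA256 of stdin, keyed by $1.
# Assumes the provider signs the raw body this way; verify against the code.
hmac_sha256_hex() {
  # openssl prints "<label>= <hex>"; keep only the last field.
  openssl dgst -sha256 -hmac "$1" | awk '{print $NF}'
}

# The same JSON with different whitespace yields a different signature,
# which is why a parsed-and-reserialized body fails verification.
sig_raw=$(printf '%s' '{"id":1}' | hmac_sha256_hex "test-secret")
sig_reserialized=$(printf '%s' '{ "id": 1 }' | hmac_sha256_hex "test-secret")
[ "$sig_raw" != "$sig_reserialized" ] && echo "signatures differ"
```

Passing the key on the command line exposes it to other local processes, so keep this to throwaway debugging with a local secret.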

Reading prod logs

Services run on Cloud Run in projects perkup-app (production) and perkup-app-test (staging). Key services: v2services, v2lazy, eventrouter, frontend-proxy, amazon-punchout, slack-notifier. Start broad, then narrow:

gcloud logging read "severity>=ERROR" \
  --limit=50 --format=json --freshness=1h \
  --project=perkup-app

Aggregated errors land in GCP Cloud Error Reporting (with a Slack alert on deploy failure); frontend errors are in LogRocket. Use the trace field to stitch a single request across services.
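
To narrow from the broad query, filter by service or follow one trace. A sketch using standard Cloud Logging filter fields for Cloud Run (resource.type, resource.labels.service_name, trace); the helper names are hypothetical and <TRACE_ID> is a placeholder for the ID taken from a log entry's trace field:

```shell
# Hypothetical helpers that build Cloud Logging filter strings.

# Errors from one Cloud Run service, e.g. v2services.
cloud_run_error_filter() {
  printf 'resource.type="cloud_run_revision" AND resource.labels.service_name="%s" AND severity>=ERROR' "$1"
}

# All entries for one request; $1 is the project, $2 the trace ID.
trace_filter() {
  printf 'trace="projects/%s/traces/%s"' "$1" "$2"
}

# Usage:
#   gcloud logging read "$(cloud_run_error_filter v2services)" \
#     --limit=50 --format=json --freshness=1h --project=perkup-app
#   gcloud logging read "$(trace_filter perkup-app <TRACE_ID>)" \
#     --format=json --freshness=1h --project=perkup-app
```

The trace query deliberately omits the severity and service filters so that entries from every service the request touched show up together.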

Incident postmortems

No recent postmortems are mirrored here. The only incident docs in Notion date to October 2022 — older than 18 months, so they're skipped per the content-age policy. Current incident-response process lives under People → Security & Compliance (SOC 2 program).

Source: docs/integrations/AGENTS.md · CLAUDE_CICD.md · functions/AGENTS.md · perkup-app/CLAUDE.md. Compiled 2026-06-07.