Failure Protocols

We do not ask if a system will fail. We ask when. These are the standard operating procedures for the four most common failure modes.

Risk: High

Missing fields, malformed JSON, or duplicates.

PROTOCOL: Validate at entry -> Reject Invalid Payload -> Log Error. Bad data never enters the logic stream.
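
A minimal sketch of the validate, reject, log sequence might look like the following. The required fields (`id`, `email`), the logger name, and the in-memory `seen_ids` set are assumptions made for illustration; they are not part of any real schema.

```python
import json
import logging

logger = logging.getLogger("intake")  # hypothetical logger name
REQUIRED_FIELDS = {"id", "email"}     # hypothetical schema


def validate_payload(raw, seen_ids):
    """Return (payload, None) when valid, otherwise (None, reason)."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, f"malformed JSON: {exc.msg}"
    if not isinstance(payload, dict):
        return None, "payload is not a JSON object"
    missing = REQUIRED_FIELDS - payload.keys()
    if missing:
        return None, f"missing fields: {sorted(missing)}"
    if not isinstance(payload["id"], (str, int)):
        return None, "id must be a string or integer"
    if payload["id"] in seen_ids:
        return None, f"duplicate id: {payload['id']}"
    seen_ids.add(payload["id"])
    return payload, None


def accept(raw, seen_ids):
    """Gate in front of the workflow: log and drop anything invalid."""
    payload, reason = validate_payload(raw, seen_ids)
    if reason is not None:
        logger.error("rejected payload: %s", reason)
        return None
    return payload
```

Only a non-None return value from `accept` is passed on to the workflow logic. In production the duplicate check would need a persistent store instead of a set held in memory.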

Risk: Medium

Rate limits (429), Timeouts (504), or Auth Failures (401).

PROTOCOL: Exponential Backoff Retry (x3) -> Circuit Breaker -> Human Alert. We assume APIs are unreliable.
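
The retry, breaker, alert chain could be sketched roughly as below. Everything named here is an assumption: `request` stands for any zero-argument callable that returns an HTTP status code, the 0.5 second base delay and the threshold of two failed calls are illustrative, and `alert` is a placeholder for whatever pages a human. The half-open recovery state of a full circuit breaker is left out for brevity.

```python
import time


class CircuitOpen(Exception):
    """Raised when the breaker refuses to call the API."""


class CircuitBreaker:
    def __init__(self, failure_threshold=2, alert=print):
        self.failure_threshold = failure_threshold
        self.alert = alert  # placeholder for a paging hook
        self.consecutive_failures = 0
        self.open = False

    def call(self, request, retries=3, base_delay=0.5, sleep=time.sleep):
        """Call request(); retry with exponential backoff on status >= 400."""
        if self.open:
            raise CircuitOpen("circuit is open; skipping call")
        for attempt in range(retries + 1):  # one initial try plus the retries
            status = request()
            if status < 400:
                self.consecutive_failures = 0
                return status
            if attempt < retries:
                sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s
        # Every attempt failed: count it against the breaker.
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.failure_threshold:
            self.open = True
            self.alert(f"circuit opened after status {status}")
        return status
```

This sketch treats every status of 400 or above the same way. In practice a 401 rarely heals on retry, so a real implementation might refresh credentials or alert immediately instead of backing off.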

Risk: Critical

The workflow stops without crashing, often due to a 'ghost' logic path.

PROTOCOL: Heartbeat Monitors + 'Inactivity Threshold' Alerts. If a scheduled job doesn't run, we know.
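
One way to picture a heartbeat monitor with an inactivity threshold is the sketch below. The class, its method names, and the job names in the usage note are hypothetical; the clock is injectable only so the behavior can be checked without waiting.

```python
import time


class HeartbeatMonitor:
    def __init__(self, inactivity_threshold, clock=time.monotonic):
        self.inactivity_threshold = inactivity_threshold  # seconds
        self.clock = clock
        self.last_beat = {}  # job name -> time of last heartbeat

    def register(self, job):
        """Start the timer at registration so a job that never runs still goes stale."""
        self.last_beat[job] = self.clock()

    def beat(self, job):
        """Called by the job each time it completes a run."""
        self.last_beat[job] = self.clock()

    def stale_jobs(self):
        """Return, in sorted order, jobs silent for longer than the threshold."""
        now = self.clock()
        return sorted(
            job
            for job, seen in self.last_beat.items()
            if now - seen > self.inactivity_threshold
        )
```

A separate watcher process would call `stale_jobs()` on a timer and alert on any non-empty result. Under these assumptions, a job such as `"nightly-sync"` that stops quietly shows up in that list once the threshold passes, even though nothing crashed.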

Risk: High

Race conditions or duplicate writes creating 'zombie' records.

PROTOCOL: Idempotency Keys -> Read-After-Write Verification. We check state before mutating it and confirm the write afterward.
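
As a rough illustration, idempotency keys combined with read-after-write verification might be sketched like this. The `RecordStore` class and the key format are hypothetical, and an in-memory dict with a lock stands in for a real database, where a unique constraint or conditional write would make the check-and-insert step atomic.

```python
import threading


class RecordStore:
    def __init__(self):
        self.records = {}  # idempotency key -> stored record
        self._lock = threading.Lock()

    def write(self, key, record):
        """Return (stored_record, created). A repeated key never writes twice."""
        with self._lock:
            # Check state before mutating: a known key means the write already happened.
            existing = self.records.get(key)
            if existing is not None:
                return existing, False
            self.records[key] = dict(record)
            # Read-after-write: confirm the store holds what we meant to save.
            stored = self.records.get(key)
            if stored != record:
                raise RuntimeError(f"verification failed for key {key!r}")
            return stored, True
```

A caller that retries with the same key, for example `store.write("order-42", {"amount": 10})` sent twice, gets the original record back with `created` set to `False` and no second record is created.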