1.3 KiB
1.3 KiB
worker-self-healing-governance Specification
Purpose
TBD - created by archiving change p2-ops-self-healing-runbook. Update Purpose after archive.
Requirements
Requirement: Automated Worker Recovery SHALL Use Bounded Policy Guards
Automated worker restart behavior MUST enforce cooldown periods and bounded restart attempts within a configurable time window.
Scenario: Repeated worker degradation within short window
- WHEN degradation events exceed configured restart-attempt budget
- THEN automated restarts MUST pause and surface a blocked-recovery signal for operator intervention
Requirement: Restart-Churn Protection SHALL Prevent Recovery Storms
The runtime MUST classify restart churn and prevent uncontrolled restart loops.
Scenario: Churn threshold exceeded
- WHEN restart count crosses churn threshold in active window
- THEN watchdog MUST enter guarded mode and require explicit manual override before further restart attempts
Requirement: Recovery Decisions SHALL Be Audit-Ready
Every auto-recovery decision and manual override action MUST be recorded with structured metadata.
Scenario: Worker restart decision emitted
- WHEN system executes or denies a restart action
- THEN structured logs/events MUST include reason, thresholds, actor/source, and resulting state