Files
DashBoard/openspec/specs/worker-self-healing-governance/spec.md

1.3 KiB

worker-self-healing-governance Specification

Purpose

TBD - created by archiving change p2-ops-self-healing-runbook. Update Purpose after archive.

Requirements

Requirement: Automated Worker Recovery SHALL Use Bounded Policy Guards

Automated worker restart behavior MUST enforce cooldown periods and bounded restart attempts within a configurable time window.

Scenario: Repeated worker degradation within short window

  • WHEN degradation events exceed configured restart-attempt budget
  • THEN automated restarts MUST pause and surface a blocked-recovery signal for operator intervention

Requirement: Restart-Churn Protection SHALL Prevent Recovery Storms

The runtime MUST classify restart churn and prevent uncontrolled restart loops.

Scenario: Churn threshold exceeded

  • WHEN restart count crosses churn threshold in active window
  • THEN watchdog MUST enter guarded mode and require explicit manual override before further restart attempts

Requirement: Recovery Decisions SHALL Be Audit-Ready

Every auto-recovery decision and manual override action MUST be recorded with structured metadata.

Scenario: Worker restart decision emitted

  • WHEN system executes or denies a restart action
  • THEN structured logs/events MUST include reason, thresholds, actor/source, and resulting state