Harden runtime diagnostics for milestone 3

Make the milestone 3 runtime story predictable instead of treating doctor, self-check, and startup failures as loosely related surfaces.

Split doctor and self-check into distinct read-only flows, add tri-state diagnostic status with stable IDs and next steps, and reuse that wording in CLI output, service logs, and tray-triggered diagnostics.

Add non-mutating config/model probes, a make runtime-check gate, and public recovery/validation docs for the X11 GA roadmap.

Validation: make runtime-check; PYTHONPATH=src python3 -m unittest discover -s tests -p 'test_*.py'; python3 -m py_compile src/*.py tests/*.py; PYTHONPATH=src python3 -m aman doctor --help; PYTHONPATH=src python3 -m aman self-check --help.

Leave milestone 3 open in the roadmap until the manual X11 validation rows are filled.
2026-03-12 17:41:23 -03:00
commit ed1b59240b
parent a3368056ff
16 changed files with 1298 additions and 248 deletions
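
Reviewer sketch: the commit message describes a tri-state diagnostic status with stable IDs and next steps, and the README hunk below states that exit code `0` covers `ok`/`warn` while `2` signals any `fail`. The following is a minimal illustration of that contract, not the code in this commit. The names `Status`, `DiagnosticResult`, `overall_status`, and `exit_code` are hypothetical, and reporting `warn` as the overall status when nothing failed is an assumption (the README example only shows `overall: fail`).

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    """Tri-state outcome of one diagnostic check (hypothetical names)."""
    OK = "ok"
    WARN = "warn"
    FAIL = "fail"


@dataclass(frozen=True)
class DiagnosticResult:
    check_id: str        # stable ID such as "config.load"
    status: Status
    message: str
    next_step: str = ""  # empty when no follow-up is needed

    def format_line(self) -> str:
        # Mirrors the README example: "[WARN] id: message | next_step: ..."
        line = f"[{self.status.name}] {self.check_id}: {self.message}"
        if self.next_step:
            line += f" | next_step: {self.next_step}"
        return line


def overall_status(results):
    # Assumption: the worst individual status wins.
    if any(r.status is Status.FAIL for r in results):
        return Status.FAIL
    if any(r.status is Status.WARN for r in results):
        return Status.WARN
    return Status.OK


def exit_code(results):
    # Per the README hunk: 0 for ok/warn only, 2 if any check failed.
    return 2 if overall_status(results) is Status.FAIL else 0


if __name__ == "__main__":
    results = [
        DiagnosticResult("config.load", Status.OK, "loaded config"),
        DiagnosticResult("service.state", Status.FAIL,
                         "user service is installed but failed to start",
                         "inspect `journalctl --user -u aman -f`"),
    ]
    for result in results:
        print(result.format_line())
    print(f"overall: {overall_status(results).value}")
    raise SystemExit(exit_code(results))
```

Keeping the line format in one method is one way to get the shared wording across CLI output, service logs, and tray diagnostics that the commit message asks for.
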
--- a/README.md
+++ b/README.md
@@ -103,6 +103,31 @@ When Aman does not behave as expected, use this order:
 3. Inspect `journalctl --user -u aman -f`.
 4. Re-run Aman in the foreground with `aman run --config ~/.config/aman/config.json --verbose`.

+See [`docs/runtime-recovery.md`](docs/runtime-recovery.md) for the failure IDs,
+example output, and the common recovery branches behind this sequence.
+
+## Diagnostics
+
+- `aman doctor` is the fast, read-only preflight for config, X11 session,
+  audio runtime, input resolution, hotkey availability, injection backend
+  selection, and service prerequisites.
+- `aman self-check` is the deeper, still read-only installed-system readiness
+  check. It includes every `doctor` check plus managed model cache, cache
+  writability, service unit/state, and startup readiness.
+- The tray `Run Diagnostics` action runs the same deeper `self-check` path and
+  logs any non-`ok` results.
+- Exit code `0` means every check finished as `ok` or `warn`. Exit code `2`
+  means at least one check finished as `fail`.
+
+Example output:
+
+```text
+[OK] config.load: loaded config from /home/user/.config/aman/config.json
+[WARN] model.cache: managed editor model is not cached at /home/user/.cache/aman/models/Qwen2.5-1.5B-Instruct-Q4_K_M.gguf | next_step: start Aman once on a networked connection so it can download the managed editor model, then rerun `aman self-check --config /home/user/.config/aman/config.json`
+[FAIL] service.state: user service is installed but failed to start | next_step: inspect `journalctl --user -u aman -f` to see why aman.service is failing
+overall: fail
+```
+
 ## Runtime Dependencies

 - X11
@@ -319,6 +344,8 @@ Service notes:
  setup, support, or debugging.
 - Start recovery with `aman doctor`, then `aman self-check`, before inspecting
  `systemctl --user status aman` and `journalctl --user -u aman -f`.
+- See [`docs/runtime-recovery.md`](docs/runtime-recovery.md) for the expected
+  diagnostic IDs and next steps.

 ## Usage

@@ -354,6 +381,7 @@ make package
 make package-portable
 make package-deb
 make package-arch
+make runtime-check
 make release-check
 ```

@@ -398,6 +426,7 @@ make run
 make run config.example.json
 make doctor
 make self-check
+make runtime-check
 make eval-models
 make sync-default-model
 make check-default-model