Harden runtime diagnostics for milestone 3

Make the milestone 3 runtime story predictable instead of treating doctor, self-check, and startup failures as loosely related surfaces.

Split doctor and self-check into distinct read-only flows, add tri-state diagnostic status with stable IDs and next steps, and reuse that wording in CLI output, service logs, and tray-triggered diagnostics. Add non-mutating config/model probes, a make runtime-check gate, and public recovery/validation docs for the X11 GA roadmap.

Validation: make runtime-check; PYTHONPATH=src python3 -m unittest discover -s tests -p 'test_*.py'; python3 -m py_compile src/*.py tests/*.py; PYTHONPATH=src python3 -m aman doctor --help; PYTHONPATH=src python3 -m aman self-check --help. Leave milestone 3 open in the roadmap until the manual X11 validation rows are filled.
This commit is contained in:
Thales Maciel 2026-03-12 17:41:23 -03:00
parent a3368056ff
commit ed1b59240b
No known key found for this signature in database
GPG key ID: 33112E6833C34679
16 changed files with 1298 additions and 248 deletions

48
docs/runtime-recovery.md Normal file
View file

@ -0,0 +1,48 @@
# Runtime Recovery Guide
Use this guide when Aman is installed but not behaving correctly.
## Command roles
- `aman doctor --config ~/.config/aman/config.json` is the fast, read-only preflight for config, X11 session, audio runtime, input device resolution, hotkey availability, injection backend selection, and service prerequisites.
- `aman self-check --config ~/.config/aman/config.json` is the deeper, still read-only readiness check. It includes every `doctor` check plus the managed model cache, cache writability, installed user service, current service state, and startup readiness.
- Tray `Run Diagnostics` uses the same deeper `self-check` path and logs any non-`ok` results.
## Reading the output
- `ok`: the checked surface is ready.
- `warn`: the checked surface is degraded or incomplete, but the command still exits `0`.
- `fail`: the supported path is blocked, and the command exits `2`.
Example output:
```text
[OK] config.load: loaded config from /home/user/.config/aman/config.json
[WARN] model.cache: managed editor model is not cached at /home/user/.cache/aman/models/Qwen2.5-1.5B-Instruct-Q4_K_M.gguf | next_step: start Aman once on a networked connection so it can download the managed editor model, then rerun `aman self-check --config /home/user/.config/aman/config.json`
[FAIL] service.state: user service is installed but failed to start | next_step: inspect `journalctl --user -u aman -f` to see why aman.service is failing
overall: fail
```
## Failure map
| Symptom | First command | Diagnostic ID | Meaning | Next step |
| --- | --- | --- | --- | --- |
| Config missing or invalid | `aman doctor` | `config.load` | Config is absent or cannot be parsed | Save settings, fix the JSON, or rerun `aman init --force`, then rerun `doctor` |
| No X11 session | `aman doctor` | `session.x11` | `DISPLAY` is missing or Wayland was detected | Start Aman from the same X11 user session you expect to use daily |
| Audio runtime or microphone missing | `aman doctor` | `runtime.audio` or `audio.input` | PortAudio or the selected input device is unavailable | Install runtime dependencies, connect a microphone, or choose a valid `recording.input` |
| Hotkey cannot be registered | `aman doctor` | `hotkey.parse` | The configured hotkey is invalid or already taken | Choose a different hotkey in Settings |
| Output injection fails | `aman doctor` | `injection.backend` | The chosen X11 output path is not usable | Switch to a supported backend or rerun in the foreground with `--verbose` |
| Managed editor model missing or corrupt | `aman self-check` | `model.cache` | The managed model is absent or has a bad checksum | Start Aman once on a networked connection, or clear the broken cache and retry |
| Model cache directory is not writable | `aman self-check` | `cache.writable` | Aman cannot create or update its managed model cache | Fix permissions on `~/.cache/aman/models/` |
| User service missing or disabled | `aman self-check` | `service.unit` or `service.state` | The service was not installed cleanly or is not active | Reinstall Aman or run `systemctl --user enable --now aman` |
| Startup still fails after install | `aman self-check` | `startup.readiness` | Aman can load config but cannot assemble its runtime without failing | Fix the named runtime dependency, custom model path, or editor dependency, then rerun `self-check` |
## Escalation order
1. Run `aman doctor --config ~/.config/aman/config.json`.
2. Run `aman self-check --config ~/.config/aman/config.json`.
3. Inspect `journalctl --user -u aman -f`.
4. Re-run Aman in the foreground with `aman run --config ~/.config/aman/config.json --verbose`.
If you are collecting evidence for a release or support handoff, copy the first
non-`ok` diagnostic line and the first matching `journalctl` failure block.