aman/docs/x11-ga/03-runtime-reliability-and-diagnostics.md
Thales Maciel b4a3d446fa
2026-03-12 20:29:42 -03:00


Milestone 3: Runtime Reliability and Diagnostics

Why this milestone exists

Once Aman is installed, the next GA risk is not feature depth. It is whether the product behaves predictably, fails loudly, and tells the user what to do next. This milestone turns diagnostics and recovery into a first-class product surface.

Problems it closes

  • Startup readiness and failure paths are not yet shaped into one user-facing recovery model.
  • Diagnostics exist, but their roles are not clearly separated.
  • Audio, hotkey, injection, and model-cache failures can still feel like implementation details instead of guided support flows.
  • The release process does not yet require restart, recovery, or soak evidence.

In scope

  • Define aman doctor as the fast preflight check for config, runtime dependencies, hotkey validity, audio device resolution, and service prerequisites.
  • Define aman self-check as the deeper installed-system readiness check, including managed model availability, writable cache locations, and end-to-end startup prerequisites.
  • Make diagnostics return actionable messages with one next step, not generic failures.
  • Standardize startup and runtime error wording across CLI output, service logs, tray-triggered diagnostics, and docs.
  • Cover recovery paths for:
    • broken config
    • missing audio device
    • hotkey registration failure
    • X11 injection failure
    • model download or cache failure
    • service startup failure
  • Add repeated-run validation, restart validation, and offline-start validation to release gates, and manually validate them on at least one representative distro family for milestone closeout.
  • Treat journalctl --user -u aman and aman run --verbose as the default support escalations after diagnostics.
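A minimal sketch of what "actionable messages with one next step" could look like as a data shape. Everything here is an assumption for illustration: `CheckResult`, `Status`, `render`, `exit_code`, the check identifiers, and the output layout are hypothetical names, not Aman's actual diagnostics API or output format.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    OK = "ok"
    WARN = "warn"
    FAIL = "fail"


@dataclass(frozen=True)
class CheckResult:
    """One diagnostic finding (hypothetical shape, not Aman's real type)."""

    check_id: str        # hypothetical identifier such as "audio.device"
    status: Status
    message: str         # what was observed
    next_step: str = ""  # the single suggested action; empty only for OK

    def __post_init__(self):
        # Enforce the "one next step, not generic failures" rule at construction.
        if self.status is not Status.OK and not self.next_step.strip():
            raise ValueError(f"{self.check_id}: non-OK results need a next step")


def render(result: CheckResult) -> str:
    """Format a result the same way for CLI output, logs, and docs."""
    line = f"[{result.status.value.upper()}] {result.check_id}: {result.message}"
    if result.next_step:
        line += f"\n  next step: {result.next_step}"
    return line


def exit_code(results) -> int:
    """Return 1 if any check failed, otherwise 0 (warnings do not fail the run)."""
    return 1 if any(r.status is Status.FAIL for r in results) else 0
```

Rejecting a warning or failure that has no next step at construction time is one way to keep generic failures from reaching the user; sharing a single `render` function is one way to keep wording identical across surfaces.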

Out of scope

  • New dictation features unrelated to supportability.
  • Remote telemetry or cloud monitoring.
  • Non-X11 backends.

Dependencies

  • Milestone 1 support contract.
  • Milestone 2 portable install layout and service lifecycle.
  • Existing diagnostics commands and systemd service behavior.

Definition of done: objective

  • doctor and self-check have distinct documented roles.
  • The main end-user failure modes each produce an actionable diagnostic result or service-log message.
  • No known failure on a supported happy path is silent.
  • Restart after reboot and restart after service crash are part of the validation matrix and are manually validated on at least one representative distro family for milestone closeout.
  • Offline start with already-cached models is part of the validation matrix and is manually validated on at least one representative distro family for milestone closeout.
  • Release gates include repeated-run and recovery scenarios, not only unit tests.
  • Support docs map each common failure class to a matching diagnostic command or log path.
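The mapping from failure class to diagnostic could be kept as a small table that a docs check verifies for completeness. The sketch below is illustrative only: the failure classes and the two escalation commands come from this milestone, but the assignment of each class to `aman doctor` or `aman self-check` is an assumption inferred from the scope descriptions (the entry for X11 injection failure in particular is a placeholder), and `unmapped` and `support_path` are hypothetical helper names.

```python
# Failure classes listed under "In scope".
FAILURE_CLASSES = (
    "broken config",
    "missing audio device",
    "hotkey registration failure",
    "X11 injection failure",
    "model download or cache failure",
    "service startup failure",
)

# Default support escalations after diagnostics, in order.
ESCALATIONS = ("journalctl --user -u aman", "aman run --verbose")

# ASSUMPTION: which command covers which class is inferred, not confirmed.
FIRST_DIAGNOSTIC = {
    "broken config": "aman doctor",
    "missing audio device": "aman doctor",
    "hotkey registration failure": "aman doctor",
    "X11 injection failure": "aman self-check",  # placeholder assignment
    "model download or cache failure": "aman self-check",
    "service startup failure": "aman self-check",
}


def unmapped(classes=FAILURE_CLASSES, mapping=FIRST_DIAGNOSTIC):
    """Return failure classes that have no documented first diagnostic."""
    return [c for c in classes if c not in mapping]


def support_path(failure_class):
    """Return the ordered steps: first diagnostic, then the escalations."""
    return (FIRST_DIAGNOSTIC[failure_class],) + ESCALATIONS
```

A release gate could fail whenever `unmapped()` is non-empty, which keeps the support docs from drifting behind newly identified failure classes.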

Definition of done: subjective

  • When Aman fails, the user can usually answer "what broke?" and "what should I try next?" without reading source code.
  • Daily use feels predictable even when the environment is imperfect.
  • The support story feels unified instead of scattered across commands and logs.

Evidence required to close

  • Updated command help and docs for doctor and self-check, including a public runtime recovery guide.
  • Diagnostic output examples for success, warning, and failure cases.
  • A release validation report covering restart, offline-start, and representative recovery scenarios. One real distro pass is sufficient for milestone closeout; full four-family coverage is deferred to milestone 5 GA signoff.
  • Manual support runbooks that use diagnostics first and verbose foreground mode second.
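The difference between milestone closeout and GA signoff can be expressed as two checks over the same validation matrix. This is a sketch under stated assumptions: the matrix shape (family name mapped to scenario results), the scenario labels, and the function names are hypothetical, and the four family names are taken from the closeout wording for milestones 2 and 3.

```python
# Scenario labels are illustrative; the real matrix may name them differently.
SCENARIOS = (
    "restart after reboot",
    "restart after service crash",
    "offline start",
    "repeated run",
)

FAMILIES = ("Arch", "Debian/Ubuntu", "Fedora", "openSUSE")


def family_passes(matrix, family):
    """True only if every scenario was recorded as passing for this family."""
    results = matrix.get(family, {})
    return all(results.get(scenario) is True for scenario in SCENARIOS)


def milestone_closeout_ok(matrix):
    """Closeout needs one fully validated representative family."""
    return any(family_passes(matrix, family) for family in FAMILIES)


def ga_signoff_ok(matrix):
    """GA signoff needs full coverage across all four families."""
    return all(family_passes(matrix, family) for family in FAMILIES)
```

With only a complete Arch pass recorded, `milestone_closeout_ok` returns `True` while `ga_signoff_ok` returns `False`, which matches the "complete for now" status described for this milestone.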