Thales Maciel 359b5fbaf4 Land milestone 4 first-run docs and media

Make the X11 user path visible on first contact instead of burying it under config and maintainer detail.

Rewrite the README around the supported quickstart, expected tray and dictation result, install validation, troubleshooting, and linked follow-on docs. Split deep config and developer material into separate docs, add checked-in screenshots plus a short WebM walkthrough, and add a generator so the media assets stay reproducible.

Also fix the CLI discovery gap by letting `aman --help` show the top-level command surface while keeping implicit foreground `run` behavior, and align the settings, help, and about copy with the supported service-plus-diagnostics model.

Validation: `PYTHONPATH=src python3 -m unittest tests.test_aman_cli tests.test_config_ui`; `PYTHONPATH=src python3 -m unittest discover -s tests -p 'test_*.py'`; `python3 -m py_compile src/*.py tests/*.py scripts/generate_docs_media.py`; `PYTHONPATH=src python3 -m aman --help`.

Milestone 4 stays open in the roadmap because `docs/x11-ga/first-run-review-notes.md` still needs a real non-implementer walkthrough.

2026-03-12 18:30:34 -03:00

5.1 KiB


Config Reference

Use this document when you need the full Aman config shape and the advanced behavior notes that are intentionally kept out of the first-run README path.

Example config

{
  "config_version": 1,
  "daemon": { "hotkey": "Cmd+m" },
  "recording": { "input": "0" },
  "stt": {
    "provider": "local_whisper",
    "model": "base",
    "device": "cpu",
    "language": "auto"
  },
  "models": {
    "allow_custom_models": false,
    "whisper_model_path": ""
  },
  "injection": {
    "backend": "clipboard",
    "remove_transcription_from_clipboard": false
  },
  "safety": {
    "enabled": true,
    "strict": false
  },
  "ux": {
    "profile": "default",
    "show_notifications": true
  },
  "advanced": {
    "strict_startup": true
  },
  "vocabulary": {
    "replacements": [
      { "from": "Martha", "to": "Marta" },
      { "from": "docker", "to": "Docker" }
    ],
    "terms": ["Systemd", "Kubernetes"]
  }
}

config_version is required and currently must be 1. Legacy unversioned configs are migrated automatically on load.
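The version rule above can be sketched roughly as follows. This is a minimal illustration, not Aman's loader: `load_config` and the "stamp the current version" migration step are hypothetical, and the real migration may rewrite more than this.

```python
import json

CURRENT_VERSION = 1  # the only version this document describes


def load_config(text: str) -> dict:
    """Parse config JSON, treating a missing config_version as a legacy file."""
    cfg = json.loads(text)
    if "config_version" not in cfg:
        # Hypothetical migration: stamp the legacy config with the current version.
        cfg = {"config_version": CURRENT_VERSION, **cfg}
    if cfg["config_version"] != CURRENT_VERSION:
        raise ValueError(
            f"config_version must be {CURRENT_VERSION}, got {cfg['config_version']!r}"
        )
    return cfg
```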

Recording and validation

recording.input can be a device index (preferred) or a substring of the device name.
If recording.input is explicitly set and cannot be resolved, startup fails instead of falling back to a default device.
Config validation is strict: unknown fields are rejected with a startup error.
Validation errors include the exact field and an example fix snippet.
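As an illustration of these rules, the sketch below resolves `recording.input` against a list of device names and rejects unknown fields. The helper names are hypothetical, the case-insensitive substring match is an assumption, and the sketch assumes a non-empty `spec`.

```python
def resolve_input(spec: str, devices: list) -> int:
    """Return the index of the device selected by spec (index or name substring)."""
    if spec.isdigit():
        index = int(spec)
        if index < len(devices):
            return index
    else:
        for index, name in enumerate(devices):
            # Assumption: substring matching ignores case.
            if spec.lower() in name.lower():
                return index
    # An explicit setting that cannot be resolved is an error, not a fallback.
    raise ValueError(f"recording.input={spec!r} did not match any input device")


def reject_unknown(section: dict, allowed: set, path: str) -> None:
    """Raise on the first field that is not part of the known schema."""
    unknown = sorted(set(section) - allowed)
    if unknown:
        raise ValueError(f"unknown field {path}.{unknown[0]}")
```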

Profiles and runtime behavior

ux.profile=default: baseline cleanup behavior.
ux.profile=fast: lower-latency AI generation settings.
ux.profile=polished: same cleanup depth as default.
safety.enabled=true: enables fact-preservation checks (names/numbers/IDs/URLs).
safety.strict=false: fall back to the safer aligned draft when fact checks fail.
safety.strict=true: reject output when fact checks fail.
advanced.strict_startup=true: keep fail-fast startup validation behavior.

Transcription language:

stt.language=auto enables Whisper auto-detection.
You can pin the language with Whisper codes such as en, es, pt, ja, or zh, or with common names such as English / Spanish.
If a pinned language hint is rejected by the runtime, Aman logs a warning and retries with auto-detect.
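A rough sketch of the pin-then-retry behavior follows. The name table is an illustrative subset, `transcribe` stands for any callable that accepts a `language` keyword, and treating a rejected hint as a `ValueError` is an assumption about the runtime.

```python
import logging
from typing import Optional

# Illustrative subset of common names; the real mapping is not shown in this document.
LANGUAGE_NAMES = {"english": "en", "spanish": "es"}


def normalize_language(value: str) -> Optional[str]:
    """Map a config value to a Whisper code; None means auto-detect."""
    key = value.strip().lower()
    if key == "auto":
        return None
    return LANGUAGE_NAMES.get(key, key)


def transcribe_with_fallback(transcribe, audio, language: Optional[str]):
    """Try the pinned language first, then retry once with auto-detect."""
    try:
        return transcribe(audio, language=language)
    except ValueError:
        if language is None:
            raise  # already auto-detecting, nothing left to retry
        logging.warning("language hint %r rejected; retrying with auto-detect", language)
        return transcribe(audio, language=None)
```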

Hotkey notes:

Use one key plus optional modifiers, for example Cmd+m, Super+m, or Ctrl+space.
Super and Cmd are equivalent aliases for the same modifier.
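The hotkey shape can be illustrated with a small parser. `parse_hotkey` is a hypothetical helper; only Cmd, Super, and Ctrl appear in this document, so the Alt and Shift entries are assumptions.

```python
# Cmd and Super normalize to the same modifier. Alt and Shift are assumed here.
MODIFIER_ALIASES = {
    "cmd": "super",
    "super": "super",
    "ctrl": "ctrl",
    "alt": "alt",
    "shift": "shift",
}


def parse_hotkey(spec: str):
    """Split 'Mod+Mod+key' into (frozenset of modifiers, key)."""
    parts = [part.strip().lower() for part in spec.split("+")]
    *modifiers, key = parts
    if not key or key in MODIFIER_ALIASES:
        raise ValueError(f"hotkey {spec!r} needs exactly one non-modifier key")
    unknown = [m for m in modifiers if m not in MODIFIER_ALIASES]
    if unknown:
        raise ValueError(f"unknown modifier {unknown[0]!r} in hotkey {spec!r}")
    return frozenset(MODIFIER_ALIASES[m] for m in modifiers), key
```

With this normalization, `parse_hotkey("Cmd+m")` and `parse_hotkey("Super+m")` compare equal.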

Managed versus expert mode

Aman-managed mode is the canonical supported UX: Aman handles model lifecycle and safe defaults for you.
Expert mode is opt-in and exposes a custom Whisper model path for advanced users.
Editor model/provider configuration is intentionally not exposed in config.
Custom Whisper paths are only active with models.allow_custom_models=true.
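The gating between the two modes might look like the sketch below. `effective_whisper_model` and `managed_default` are hypothetical names used only for illustration.

```python
def effective_whisper_model(models: dict, managed_default: str) -> str:
    """Return the custom path only when expert mode is explicitly enabled."""
    path = models.get("whisper_model_path", "")
    if models.get("allow_custom_models", False) and path:
        return path
    # Managed mode: a configured path is ignored unless the opt-in flag is set.
    return managed_default
```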

Compatibility note:

ux.show_notifications remains in the config schema for compatibility, but it is not part of the current supported first-run X11 surface and is not exposed in the settings window.

Cleanup and model lifecycle

AI cleanup is always enabled and uses the locked local Qwen2.5-1.5B-Instruct-Q4_K_M.gguf model downloaded to ~/.cache/aman/models/ during daemon initialization.

Prompts use semantic XML tags for both system and user messages.
Cleanup runs in two local passes:
- pass 1 drafts cleaned text and labels ambiguity decisions (correction/literal/spelling/filler)
- pass 2 audits those decisions conservatively and emits final cleaned_text
Aman stays in dictation mode: it does not execute editing instructions embedded in transcript text.
Before Aman reports ready, the local editor runs a tiny warmup completion so the first real transcription is faster.
If warmup fails and advanced.strict_startup=true, startup fails fast.
With advanced.strict_startup=false, Aman logs a warning and continues.
Model downloads use a network timeout and SHA256 verification before activation.
Cached models are checksum-verified on startup; mismatches trigger a forced redownload.
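The checksum step can be sketched with the standard library. The function names and the `expected_sha256` parameter are hypothetical; the pinned digest itself is not given in this document.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large model files are not loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_redownload(path: Path, expected_sha256: str) -> bool:
    """True when the cached model is missing or its checksum does not match."""
    return not path.is_file() or sha256_of(path) != expected_sha256.lower()
```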

Verbose logging and vocabulary

-v/--verbose enables DEBUG logs, including recognized/processed transcript text and llama:: logs.
Without -v, logs stay at INFO level.
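The flag-to-level mapping is simple enough to show with `argparse`. This is a standalone sketch and not Aman's actual parser, which also exposes subcommands.

```python
import argparse
import logging


def log_level(argv: list) -> int:
    """Return DEBUG when -v/--verbose is present, otherwise INFO."""
    parser = argparse.ArgumentParser(prog="aman")
    parser.add_argument("-v", "--verbose", action="store_true")
    args = parser.parse_args(argv)
    return logging.DEBUG if args.verbose else logging.INFO
```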

Vocabulary correction:

vocabulary.replacements applies deterministic corrections (from -> to).
vocabulary.terms is a preferred spelling list used as hinting context.
Wildcards are intentionally rejected (*, ?, [, ], {, }) to avoid ambiguous rules.
Rules are deduplicated case-insensitively; conflicting replacements are rejected.
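The replacement rules above can be checked along these lines. `validate_replacements` is a hypothetical helper, and applying the wildcard check only to the `from` side is an assumption.

```python
WILDCARDS = set("*?[]{}")


def validate_replacements(rules: list) -> dict:
    """Return {case-folded from: to}, rejecting wildcards and conflicting rules."""
    seen = {}
    for rule in rules:
        source, target = rule["from"], rule["to"]
        if WILDCARDS & set(source):
            raise ValueError(f"wildcards are not allowed in replacement {source!r}")
        key = source.casefold()
        if key in seen and seen[key] != target:
            # Same source (ignoring case) mapped to two different targets.
            raise ValueError(f"conflicting replacements for {source!r}")
        seen[key] = target  # exact duplicates collapse into one entry
    return seen
```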

STT hinting:

Vocabulary is passed to Whisper as compact hotwords only when that argument is supported by the installed faster-whisper runtime.
Aman enables word_timestamps when supported and runs a conservative alignment heuristic pass before the editor stage.
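One way to pass optional arguments only when the installed runtime accepts them is to inspect the callable's signature. This is a sketch of that general technique, and how Aman actually detects support is not stated here. `supported_kwargs` is a hypothetical helper.

```python
import inspect


def supported_kwargs(func, **candidates) -> dict:
    """Keep only the keyword arguments that func declares by name."""
    params = inspect.signature(func).parameters
    return {name: value for name, value in candidates.items() if name in params}


# Stand-ins for an older and a newer transcribe() signature.
def old_transcribe(audio, language=None):
    return "old"


def new_transcribe(audio, language=None, hotwords=None, word_timestamps=False):
    return "new"
```

A caller could then write `transcribe(audio, **supported_kwargs(transcribe, hotwords="Systemd Kubernetes", word_timestamps=True))` and get the same behavior on both signatures, minus the hints on the older one.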

Fact guard:

Aman runs a deterministic fact-preservation verifier after editor output.
If facts are changed or invented and safety.strict=false, Aman falls back to the safer aligned draft.
If facts are changed or invented and safety.strict=true, processing fails and output is not injected.
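The strict and non-strict outcomes can be illustrated with a deliberately small verifier. This sketch only compares numbers and URLs via a hypothetical regex. The real verifier also covers names and IDs, and its matching rules are not described in this document.

```python
import re

# Assumption: facts are approximated as URLs and digit-led tokens.
FACT_PATTERN = re.compile(r"https?://\S+|\b\d[\w.-]*\b")


def extract_facts(text: str) -> set:
    return set(FACT_PATTERN.findall(text))


def guard(aligned_draft: str, edited: str, strict: bool) -> str:
    """Return the text to inject, or raise when strict mode rejects the output."""
    if extract_facts(aligned_draft) == extract_facts(edited):
        return edited
    if strict:
        raise ValueError("fact check failed; output is not injected")
    # Non-strict mode falls back to the safer aligned draft.
    return aligned_draft
```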
