aman
Local amanuensis
Python X11 STT daemon that records audio, runs Whisper, applies local AI cleanup, and injects text.
Target User
The canonical Aman user is a desktop professional who wants dictation and rewriting features without learning Python tooling.
- End-user path: portable X11 release bundle for mainstream distros.
- Alternate package channels: Debian/Ubuntu .deb and Arch packaging inputs.
- Developer path: Python/uv workflows.
Persona details and distribution policy are documented in
docs/persona-and-distribution.md.
Release Channels
Aman is not GA yet for X11 users across distros. The maintained release channels are:
- Portable X11 bundle: current canonical end-user channel.
- Debian/Ubuntu .deb: secondary packaged channel.
- Arch PKGBUILD plus source tarball: secondary maintainer and power-user channel.
- Python wheel and sdist: current developer and integrator channel.
GA Support Matrix
| Surface | Contract |
|---|---|
| Desktop session | X11 only |
| Runtime dependencies | Installed from the distro package manager |
| Supported daily-use mode | systemd --user service |
| Manual foreground mode | aman run for setup, support, and debugging |
| Canonical recovery sequence | aman doctor -> aman self-check -> journalctl --user -u aman -> aman run --verbose |
| Representative GA validation families | Debian/Ubuntu, Arch, Fedora, openSUSE |
| Portable installer prerequisite | System CPython 3.10, 3.11, or 3.12 |
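The two hard prerequisites in the matrix (X11 session, CPython 3.10 to 3.12) can be checked before running the installer. The following is a minimal sketch and not part of Aman: the name check_prereqs is hypothetical, and it assumes the session type is exposed through the XDG_SESSION_TYPE environment variable, which is common on systemd-based desktops but not guaranteed.

```python
import os
import sys

SUPPORTED_MINORS = {10, 11, 12}  # CPython 3.10, 3.11, 3.12 per the support matrix


def check_prereqs(version_info=sys.version_info, env=os.environ):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    major, minor = version_info[0], version_info[1]
    if major != 3 or minor not in SUPPORTED_MINORS:
        problems.append(f"unsupported Python {major}.{minor}; need 3.10, 3.11, or 3.12")
    # Assumption: XDG_SESSION_TYPE is set by the login manager.
    session = env.get("XDG_SESSION_TYPE", "")
    if session != "x11":
        problems.append(f"session type is {session or 'unknown'!r}; X11 is required")
    return problems
```

Both arguments are injectable so the check can be exercised without changing the real environment.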
Install (Portable Bundle)
Download aman-x11-linux-<version>.tar.gz and
aman-x11-linux-<version>.tar.gz.sha256, install the runtime dependencies for
your distro, then install the bundle:
sha256sum -c aman-x11-linux-<version>.tar.gz.sha256
tar -xzf aman-x11-linux-<version>.tar.gz
cd aman-x11-linux-<version>
./install.sh
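Where sha256sum is not available, the checksum step can be reproduced in Python. This is a minimal sketch, assuming the .sha256 file uses the usual sha256sum layout (hex digest, whitespace, file name); verify_sha256 is an illustrative helper and is not shipped with the bundle.

```python
import hashlib
from pathlib import Path


def verify_sha256(archive: Path, checksum_file: Path) -> bool:
    """Compare the archive's SHA-256 digest with the one recorded in checksum_file."""
    # Assumed format: "<hex digest>  <file name>" on the first line.
    expected = checksum_file.read_text().split()[0].lower()
    digest = hashlib.sha256()
    with archive.open("rb") as fh:
        # Read in 1 MiB chunks so large bundles are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected
```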
The installer writes the user service, updates ~/.local/bin/aman, and runs
systemctl --user enable --now aman automatically.
On first service start, Aman opens the graphical settings window if
~/.config/aman/config.json does not exist yet.
Upgrade by extracting the newer bundle and running its install.sh again.
Config and cache are preserved by default.
Uninstall with:
~/.local/share/aman/current/uninstall.sh
Add --purge if you also want to remove ~/.config/aman/ and
~/.cache/aman/.
Detailed install, upgrade, uninstall, and conflict guidance lives in
docs/portable-install.md.
Secondary Channels
Debian/Ubuntu (.deb)
Download a release artifact and install it:
sudo apt install ./aman_<version>_<arch>.deb
systemctl --user daemon-reload
systemctl --user enable --now aman
Arch Linux
Use the generated packaging inputs (PKGBUILD + source tarball) in dist/arch/
or your own packaging pipeline.
Daily-Use And Support Modes
- Supported daily-use path: install Aman, then run it as a systemd --user service.
- Supported manual path: use aman run in the foreground while setting up, debugging, or collecting support logs.
Recovery Sequence
When Aman does not behave as expected, use this order:
- Run aman doctor --config ~/.config/aman/config.json.
- Run aman self-check --config ~/.config/aman/config.json.
- Inspect journalctl --user -u aman -f.
- Re-run Aman in the foreground with aman run --config ~/.config/aman/config.json --verbose.
See docs/runtime-recovery.md for the failure IDs,
example output, and the common recovery branches behind this sequence.
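The first two steps of the sequence are read-only and can be scripted for support triage. This is a minimal sketch, assuming aman is on PATH and that both commands exit non-zero on failure; first_failing_step and its injectable run parameter are illustrative names, not part of Aman.

```python
import subprocess
from pathlib import Path

CONFIG = str(Path.home() / ".config" / "aman" / "config.json")

# Only the two read-only checks are automated; following the journal and the
# foreground run are interactive and are left to the operator.
STEPS = [
    ["aman", "doctor", "--config", CONFIG],
    ["aman", "self-check", "--config", CONFIG],
]


def first_failing_step(run=subprocess.run):
    """Run the checks in order; return the first failing command, or None."""
    for cmd in STEPS:
        result = run(cmd)
        if result.returncode != 0:
            return cmd
    return None
```

If the function returns a command, continue with the journal and the foreground run for that failure.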
Diagnostics
- aman doctor is the fast, read-only preflight for config, X11 session, audio runtime, input resolution, hotkey availability, injection backend selection, and service prerequisites.
- aman self-check is the deeper, still read-only installed-system readiness check. It includes every doctor check plus managed model cache, cache writability, service unit/state, and startup readiness.
- The tray Run Diagnostics action runs the same deeper self-check path and logs any non-ok results.
- Exit code 0 means every check finished as ok or warn. Exit code 2 means at least one check finished as fail.
Example output:
[OK] config.load: loaded config from /home/user/.config/aman/config.json
[WARN] model.cache: managed editor model is not cached at /home/user/.cache/aman/models/Qwen2.5-1.5B-Instruct-Q4_K_M.gguf | next_step: start Aman once on a networked connection so it can download the managed editor model, then rerun `aman self-check --config /home/user/.config/aman/config.json`
[FAIL] service.state: user service is installed but failed to start | next_step: inspect `journalctl --user -u aman -f` to see why aman.service is failing
overall: fail
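For tooling that consumes the text report, the line format can be parsed as sketched below. The pattern is inferred only from the example output above, so treat it as an assumption; the --json flag is likely a sturdier interface, but its schema is not documented here. Check, parse_report, and exit_code are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed line shape: "[STATUS] check.id: message | next_step: hint"
LINE_RE = re.compile(r"^\[(OK|WARN|FAIL)\] ([\w.]+): (.*)$")


@dataclass
class Check:
    status: str                      # "ok", "warn", or "fail"
    check_id: str                    # stable ID such as "config.load"
    message: str
    next_step: Optional[str] = None  # present only when the line carries one


def parse_report(text: str) -> list[Check]:
    checks = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skips the "overall: ..." summary and blank lines
        status, check_id, rest = m.groups()
        message, sep, next_step = rest.partition(" | next_step: ")
        checks.append(Check(status.lower(), check_id, message, next_step if sep else None))
    return checks


def exit_code(checks: list[Check]) -> int:
    """0 when every check is ok or warn, 2 when any check failed."""
    return 2 if any(c.status == "fail" for c in checks) else 0
```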
Runtime Dependencies
- X11
- PortAudio runtime (libportaudio2 or distro equivalent)
- GTK3 and AppIndicator runtime (gtk3, libayatana-appindicator3)
- Python GTK and X11 bindings (python3-gi/python-gobject, python-xlib)
Ubuntu/Debian
sudo apt install -y libportaudio2 python3-gi python3-xlib gir1.2-gtk-3.0 libayatana-appindicator3-1
Arch Linux
sudo pacman -S --needed portaudio gtk3 libayatana-appindicator python-gobject python-xlib
Fedora
sudo dnf install -y portaudio gtk3 libayatana-appindicator-gtk3 python3-gobject python3-xlib
openSUSE
sudo zypper install -y portaudio gtk3 libayatana-appindicator3-1 python3-gobject python3-python-xlib
Quickstart (Portable Bundle)
For supported daily use on the portable bundle:
- Install the runtime dependencies for your distro.
- Download and extract the portable release bundle.
- Run ./install.sh from the extracted bundle.
- Save the first-run settings window.
- Validate the install:
aman self-check --config ~/.config/aman/config.json
If you need the manual foreground path for setup or support:
aman run --config ~/.config/aman/config.json
On first launch, Aman opens a graphical settings window automatically. It includes sections for:
- microphone input
- hotkey
- output backend
- writing profile
- output safety policy
- runtime strategy (managed vs custom Whisper path)
- help/about actions
Config
Create ~/.config/aman/config.json (or let Aman create it automatically on first start if it is missing):
{
  "config_version": 1,
  "daemon": { "hotkey": "Cmd+m" },
  "recording": { "input": "0" },
  "stt": {
    "provider": "local_whisper",
    "model": "base",
    "device": "cpu",
    "language": "auto"
  },
  "models": {
    "allow_custom_models": false,
    "whisper_model_path": ""
  },
  "injection": {
    "backend": "clipboard",
    "remove_transcription_from_clipboard": false
  },
  "safety": {
    "enabled": true,
    "strict": false
  },
  "ux": {
    "profile": "default",
    "show_notifications": true
  },
  "advanced": {
    "strict_startup": true
  },
  "vocabulary": {
    "replacements": [
      { "from": "Martha", "to": "Marta" },
      { "from": "docker", "to": "Docker" }
    ],
    "terms": ["Systemd", "Kubernetes"]
  }
}
config_version is required and currently must be 1. Legacy unversioned
configs are migrated automatically on load.
Recording input can be a device index (preferred) or a substring of the device
name.
If recording.input is explicitly set and cannot be resolved, startup fails
instead of falling back to a default device.
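The resolution rule for recording.input can be illustrated with a small sketch. It is not Aman's implementation: resolve_input is a hypothetical helper, and case-insensitive matching with first-match-wins is an assumption, since the text only says "substring of the device name".

```python
def resolve_input(spec: str, device_names: list[str]) -> int:
    """Resolve spec to an index into device_names, or raise ValueError."""
    spec = spec.strip()
    if not spec:
        raise ValueError("recording.input: empty value")
    if spec.isdigit():
        # Preferred form: a device index.
        index = int(spec)
        if index < len(device_names):
            return index
        raise ValueError(f"recording.input: no device at index {index}")
    # Fallback form: substring of the device name (assumed case-insensitive).
    needle = spec.lower()
    for index, name in enumerate(device_names):
        if needle in name.lower():
            return index
    # No silent fallback to a default device: an unresolved explicit input is an error.
    raise ValueError(f"recording.input: no device name contains {spec!r}")
```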
Config validation is strict: unknown fields are rejected with a startup error. Validation errors include the exact field and an example fix snippet.
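A strict validator in this spirit could look like the sketch below. The allowed keys are copied from the example config above and may be incomplete relative to the real schema; SCHEMA and validate are hypothetical names, and the version check is meant to apply after any legacy migration has run.

```python
# Allowed keys per section, taken from the example config.
# None marks a top-level scalar that this sketch does not descend into.
SCHEMA = {
    "config_version": None,
    "daemon": {"hotkey"},
    "recording": {"input"},
    "stt": {"provider", "model", "device", "language"},
    "models": {"allow_custom_models", "whisper_model_path"},
    "injection": {"backend", "remove_transcription_from_clipboard"},
    "safety": {"enabled", "strict"},
    "ux": {"profile", "show_notifications"},
    "advanced": {"strict_startup"},
    "vocabulary": {"replacements", "terms"},
}


def validate(config: dict) -> list[str]:
    """Return one error string per problem, naming the exact field."""
    errors = []
    if config.get("config_version") != 1:
        errors.append("config_version: must be 1")
    for section, value in config.items():
        if section not in SCHEMA:
            errors.append(f"{section}: unknown field")
            continue
        allowed = SCHEMA[section]
        if allowed is None:
            continue
        if not isinstance(value, dict):
            errors.append(f"{section}: expected an object")
            continue
        for key in value:
            if key not in allowed:
                errors.append(f"{section}.{key}: unknown field")
    return errors
```

For example, a misspelled key such as daemon.hotkeys is reported as "daemon.hotkeys: unknown field" rather than being ignored.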
Profile options:
- ux.profile=default: baseline cleanup behavior.
- ux.profile=fast: lower-latency AI generation settings.
- ux.profile=polished: same cleanup depth as default.
- safety.enabled=true: enables fact-preservation checks (names/numbers/IDs/URLs).
- safety.strict=false: falls back to the safer draft when fact checks fail.
- safety.strict=true: rejects output when fact checks fail.
- advanced.strict_startup=true: keeps fail-fast startup validation behavior.
Transcription language:
- stt.language=auto (default) enables Whisper auto-detection.
- You can pin language with Whisper codes (for example en, es, pt, ja, zh) or common names like English/Spanish.
- If a pinned language hint is rejected by the runtime, Aman logs a warning and retries with auto-detect.
Hotkey notes:
- Use one key plus optional modifiers (for example Cmd+m, Super+m, Ctrl+space).
- Super and Cmd are equivalent aliases for the same modifier.
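The hotkey rules can be sketched as a small parser. This is illustrative only: parse_hotkey is a hypothetical name, the Alt and Shift entries are assumptions (the notes above only mention Cmd, Super, and Ctrl), and lowercasing the key name is a choice made for this sketch.

```python
# Cmd and Super map to the same canonical modifier.
# Assumption: Alt and Shift are accepted too; only Cmd/Super/Ctrl are documented.
MODIFIER_ALIASES = {
    "cmd": "super",
    "super": "super",
    "ctrl": "ctrl",
    "alt": "alt",
    "shift": "shift",
}


def parse_hotkey(spec: str) -> tuple[frozenset[str], str]:
    """Split 'Mod+Mod+key' into (canonical modifiers, key)."""
    parts = [p.strip().lower() for p in spec.split("+")]
    if any(not p for p in parts):
        raise ValueError(f"invalid hotkey {spec!r}")
    *mods, key = parts
    unknown = [m for m in mods if m not in MODIFIER_ALIASES]
    if unknown:
        raise ValueError(f"unknown modifier(s) in {spec!r}: {unknown}")
    if key in MODIFIER_ALIASES:
        raise ValueError(f"hotkey {spec!r} needs one non-modifier key")
    return frozenset(MODIFIER_ALIASES[m] for m in mods), key
```

With this normalization, Cmd+m and Super+m compare equal.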
AI cleanup is always enabled and uses the locked local Qwen2.5-1.5B GGUF model
downloaded to ~/.cache/aman/models/ during daemon initialization.
Prompts are structured with semantic XML tags for both system and user messages
to improve instruction adherence and output consistency.
Cleanup runs in two local passes:
- pass 1 drafts cleaned text and labels ambiguity decisions (correction/literal/spelling/filler)
- pass 2 audits those decisions conservatively and emits final cleaned_text
This keeps Aman in dictation mode: it does not execute editing instructions embedded in transcript text.
Before Aman reports ready, local llama runs a tiny warmup completion so the first real transcription is faster.
If warmup fails and advanced.strict_startup=true, startup fails fast.
With advanced.strict_startup=false, Aman logs a warning and continues.
Model downloads use a network timeout and SHA256 verification before activation.
Cached models are checksum-verified on startup; mismatches trigger a forced redownload.
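The effect of advanced.strict_startup on a failed warmup can be sketched as follows. The helper run_warmup and the logger name are hypothetical; the warmup callable stands in for whatever Aman actually runs against the local model.

```python
import logging

log = logging.getLogger("aman.sketch")  # hypothetical logger name


def run_warmup(warmup, strict_startup: bool) -> bool:
    """Call warmup(); return True on success, False if a failure was tolerated."""
    try:
        warmup()
    except Exception as exc:
        if strict_startup:
            # Fail fast: startup must not report ready after a failed warmup.
            raise RuntimeError(f"startup failed: warmup error: {exc}") from exc
        log.warning("warmup failed, continuing: %s", exc)
        return False
    return True
```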
Provider policy:
- Aman-managed mode (recommended) is the canonical supported UX: Aman handles model lifecycle and safe defaults for you.
- Expert mode is opt-in and exposes a custom Whisper model path for advanced users.
- Editor model/provider configuration is intentionally not exposed in config.
- Custom Whisper paths are only active with models.allow_custom_models=true.
Use -v/--verbose to enable DEBUG logs, including recognized/processed
transcript text and llama.cpp logs (llama:: prefix). Without -v, logs are
INFO level.
Vocabulary correction:
- vocabulary.replacements is deterministic correction (from -> to).
- vocabulary.terms is a preferred spelling list used as hinting context.
- Wildcards are intentionally rejected (*, ?, [, ], {, }) to avoid ambiguous rules.
- Rules are deduplicated case-insensitively; conflicting replacements are rejected.
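The replacement rules above can be sketched as a normalization pass. This is an illustration under assumptions: normalize_replacements is a hypothetical name, wildcards are checked in both from and to, and a conflict is taken to mean the same from value (ignoring case) mapped to different to values.

```python
WILDCARDS = set("*?[]{}")


def normalize_replacements(rules: list[dict]) -> list[dict]:
    """Validate and deduplicate {"from": ..., "to": ...} rules."""
    seen: dict[str, str] = {}  # lowercased "from" -> "to"
    result = []
    for rule in rules:
        source, target = rule["from"], rule["to"]
        if WILDCARDS & set(source + target):
            raise ValueError(f"wildcards are not allowed in rule {source!r} -> {target!r}")
        key = source.lower()
        if key in seen:
            if seen[key] != target:
                raise ValueError(f"conflicting replacements for {source!r}")
            continue  # same rule again (ignoring case in "from"): drop it
        seen[key] = target
        result.append({"from": source, "to": target})
    return result
```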
STT hinting:
- Vocabulary is passed to Whisper as compact hotwords only when that argument is supported by the installed faster-whisper runtime.
- Aman enables word_timestamps when supported and runs a conservative alignment heuristic pass (self-correction/restart detection) before the editor stage.
Fact guard:
- Aman runs a deterministic fact-preservation verifier after editor output.
- If facts are changed/invented and safety.strict=false, Aman falls back to the safer aligned draft.
- If facts are changed/invented and safety.strict=true, processing fails and output is not injected.
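The strict and non-strict branches can be illustrated with a deliberately simplified verifier. This sketch is not Aman's fact guard: it approximates "facts" as URLs and digit runs only (the real checks also cover names and IDs), and extract_facts and guard are hypothetical names.

```python
import re

# Assumption: facts are approximated as URLs and numbers for this sketch.
FACT_RE = re.compile(r"https?://\S+|\d+(?:[.,]\d+)*")


def extract_facts(text: str) -> set[str]:
    return set(FACT_RE.findall(text))


def guard(source: str, draft: str, edited: str, strict: bool) -> str:
    """Return the text that may be injected, or raise when strict mode rejects it."""
    if extract_facts(edited) == extract_facts(source):
        return edited
    if strict:
        raise ValueError("fact guard: editor output changed or invented facts")
    return draft  # non-strict: fall back to the safer aligned draft
```

Because the comparison is set equality, both dropped and invented facts count as a mismatch.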
systemd user service
make install-service
Service notes:
- The supported daily-use path is the user service.
- The portable installer writes and enables the user unit automatically.
- The local developer unit launched by make install-service still resolves aman from PATH.
- Package installs should provide the aman command automatically.
- Use aman run --config ~/.config/aman/config.json in the foreground for setup, support, or debugging.
- Start recovery with aman doctor, then aman self-check, before inspecting systemctl --user status aman and journalctl --user -u aman -f.
- See docs/runtime-recovery.md for the expected diagnostic IDs and next steps.
Usage
- Press the hotkey once to start recording.
- Press it again to stop and run STT.
- Press Esc while recording to cancel without processing.
- Esc is only captured during active recording.
- Recording start is aborted if the cancel listener cannot be armed.
- Transcript contents are logged only when -v/--verbose is used.
- Tray menu includes: Settings..., Help, About, Pause/Resume Aman, Reload Config, Run Diagnostics, Open Config Path, and Quit.
- If required settings are not saved, Aman enters a Settings Required tray mode and does not capture audio.
Wayland note:
- Running under Wayland currently exits with a message explaining that it is not supported yet.
Injection backends:
- clipboard: copy to clipboard and inject via Ctrl+Shift+V (GTK clipboard + XTest)
- injection: type the text with simulated keypresses (XTest)
- injection.remove_transcription_from_clipboard: when true and backend is clipboard, restores/clears the clipboard after paste so the transcript is not kept there
Editor stage:
- Canonical local llama.cpp editor model (managed by Aman).
- Runtime flow is explicit: ASR -> Alignment Heuristics -> Editor -> Fact Guard -> Vocabulary -> Injection.
Build and packaging (maintainers):
make build
make package
make package-portable
make package-deb
make package-arch
make runtime-check
make release-check
make package-portable builds dist/aman-x11-linux-<version>.tar.gz plus its
.sha256 file.
make package-deb installs Python dependencies while creating the package.
For offline packaging, set AMAN_WHEELHOUSE_DIR to a directory containing the
required wheels.
Benchmarking (STT bypass, always dry):
aman bench --text "draft a short email to Marta confirming lunch" --repeat 10 --warmup 2
aman bench --text-file ./bench-input.txt --repeat 20 --json
bench does not capture audio and never injects text to desktop apps. It runs
the processing path from input transcript text through alignment/editor/fact-guard/vocabulary cleanup and
prints timing summaries.
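The --repeat and --warmup semantics correspond to a conventional timing loop, sketched below. This is not the bench implementation: the function name, the returned keys, and the choice of min/median/max are assumptions made for illustration.

```python
import statistics
import time


def bench(fn, text: str, repeat: int = 10, warmup: int = 2) -> dict:
    """Time fn(text) `repeat` times after `warmup` untimed calls; return ms stats."""
    if repeat < 1:
        raise ValueError("repeat must be at least 1")
    for _ in range(warmup):
        fn(text)  # warmup runs are executed but not measured
    samples = []
    for _ in range(repeat):
        start = time.perf_counter()
        fn(text)
        samples.append((time.perf_counter() - start) * 1000.0)
    return {
        "runs": repeat,
        "min_ms": min(samples),
        "median_ms": statistics.median(samples),
        "max_ms": max(samples),
    }
```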
Model evaluation lab (dataset + matrix sweep):
aman build-heuristic-dataset --input benchmarks/heuristics_dataset.raw.jsonl --output benchmarks/heuristics_dataset.jsonl
aman eval-models --dataset benchmarks/cleanup_dataset.jsonl --matrix benchmarks/model_matrix.small_first.json --heuristic-dataset benchmarks/heuristics_dataset.jsonl --heuristic-weight 0.25 --output benchmarks/results/latest.json
aman sync-default-model --report benchmarks/results/latest.json --artifacts benchmarks/model_artifacts.json --constants src/constants.py
eval-models runs a structured model/parameter sweep over a JSONL dataset and
outputs latency + quality metrics (including hybrid score, pass-1/pass-2 latency breakdown,
and correction safety metrics for "I mean" and spelling-disambiguation cases).
When --heuristic-dataset is provided, the report also includes alignment-heuristic
quality metrics (exact match, token-F1, rule precision/recall, per-tag breakdown).
sync-default-model promotes the report winner to the managed default model constants
using the artifact registry and can be run in --check mode for CI/release gates.
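For readers unfamiliar with the exact-match and token-F1 metrics named above, here is a minimal sketch of the standard definitions. Whitespace tokenization and whitespace-insensitive exact match are assumptions; the report may tokenize or normalize differently.

```python
from collections import Counter


def exact_match(expected: str, actual: str) -> bool:
    return expected.strip() == actual.strip()


def token_f1(expected: str, actual: str) -> float:
    """Bag-of-tokens F1 over whitespace-separated tokens."""
    exp, act = expected.split(), actual.split()
    if not exp and not act:
        return 1.0
    # Multiset intersection counts each shared token at most as often as it
    # appears on both sides.
    overlap = sum((Counter(exp) & Counter(act)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(act)
    recall = overlap / len(exp)
    return 2 * precision * recall / (precision + recall)
```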
Control:
make run
make run config.example.json
make doctor
make self-check
make runtime-check
make eval-models
make sync-default-model
make check-default-model
make check
Developer setup (optional, uv workflow):
uv sync --extra x11
uv run aman run --config ~/.config/aman/config.json
Developer setup (optional, pip workflow):
make install-local
aman run --config ~/.config/aman/config.json
CLI (support and developer workflows):
aman doctor --config ~/.config/aman/config.json --json
aman self-check --config ~/.config/aman/config.json --json
aman run --config ~/.config/aman/config.json
aman bench --text "example transcript" --repeat 5 --warmup 1
aman build-heuristic-dataset --input benchmarks/heuristics_dataset.raw.jsonl --output benchmarks/heuristics_dataset.jsonl --json
aman eval-models --dataset benchmarks/cleanup_dataset.jsonl --matrix benchmarks/model_matrix.small_first.json --heuristic-dataset benchmarks/heuristics_dataset.jsonl --heuristic-weight 0.25 --json
aman sync-default-model --check --report benchmarks/results/latest.json --artifacts benchmarks/model_artifacts.json --constants src/constants.py
aman version
aman init --config ~/.config/aman/config.json --force