
aman

Local amanuensis

Python X11 STT daemon that records audio, runs Whisper, applies local AI cleanup, and injects text.

Target User

The canonical Aman user is a desktop professional who wants dictation and rewriting features without learning Python tooling.

  • End-user path today: distro-specific release artifacts.
  • GA target: portable X11 release bundle for mainstream distros.
  • Developer path: Python/uv workflows.

Persona details and distribution policy are documented in docs/persona-and-distribution.md.

Current Release Channels

Aman is not GA yet for X11 users across distros. Today the maintained release channels are:

  • Debian/Ubuntu .deb: current end-user channel.
  • Arch PKGBUILD plus source tarball: current maintainer and power-user channel.
  • Python wheel and sdist: current developer and integrator channel.

The portable X11 installer described in the GA roadmap is the target distribution model, but it is not shipped yet.

GA Support Matrix

  • Desktop session: X11 only
  • Runtime dependencies: installed from the distro package manager
  • Supported daily-use mode: systemd --user service
  • Manual foreground mode: aman run for setup, support, and debugging
  • Canonical recovery sequence: aman doctor -> aman self-check -> journalctl --user -u aman -> aman run --verbose
  • Representative GA validation families: Debian/Ubuntu, Arch, Fedora, openSUSE
  • Portable installer prerequisite: system python3 3.10+ for the future GA installer

Current Install Instructions

Debian/Ubuntu (.deb)

Download a release artifact and install it:

sudo apt install ./aman_<version>_<arch>.deb

Then enable the user service:

systemctl --user daemon-reload
systemctl --user enable --now aman

Arch Linux

Use the generated packaging inputs (PKGBUILD + source tarball) in dist/arch/ or your own packaging pipeline.

Daily-Use And Support Modes

  • Supported daily-use path: install Aman, then run it as a systemd --user service.
  • Supported manual path: use aman run in the foreground while setting up, debugging, or collecting support logs.
  • Current release channels still differ by distro. The portable installer is the milestone 2 target, not part of the current release.

Recovery Sequence

When Aman does not behave as expected, use this order:

  1. Run aman doctor --config ~/.config/aman/config.json.
  2. Run aman self-check --config ~/.config/aman/config.json.
  3. Inspect journalctl --user -u aman -f.
  4. Re-run Aman in the foreground with aman run --config ~/.config/aman/config.json --verbose.

Runtime Dependencies

  • X11
  • PortAudio runtime (libportaudio2 or distro equivalent)
  • GTK3 and AppIndicator runtime (gtk3, libayatana-appindicator3)
  • Python GTK and X11 bindings (python3-gi/python-gobject, python-xlib)

Ubuntu/Debian:

sudo apt install -y libportaudio2 python3-gi python3-xlib gir1.2-gtk-3.0 libayatana-appindicator3-1

Arch Linux:

sudo pacman -S --needed portaudio gtk3 libayatana-appindicator python-gobject python-xlib

Fedora:

sudo dnf install -y portaudio gtk3 libayatana-appindicator-gtk3 python3-gobject python3-xlib

openSUSE:

sudo zypper install -y portaudio gtk3 libayatana-appindicator3-1 python3-gobject python3-python-xlib

Quickstart (Current Release)

For supported daily use on current release channels:

  1. Install the runtime dependencies for your distro.
  2. Install the current release artifact for your distro.
  3. Enable and start the user service:
systemctl --user daemon-reload
systemctl --user enable --now aman

If you need the manual foreground path for setup or support:

aman run --config ~/.config/aman/config.json

On first launch, Aman opens a graphical settings window automatically. It includes sections for:

  • microphone input
  • hotkey
  • output backend
  • writing profile
  • output safety policy
  • runtime strategy (managed vs custom Whisper path)
  • help/about actions

Config

Create ~/.config/aman/config.json (or let aman create it automatically on first start if missing):

{
  "config_version": 1,
  "daemon": { "hotkey": "Cmd+m" },
  "recording": { "input": "0" },
  "stt": {
    "provider": "local_whisper",
    "model": "base",
    "device": "cpu",
    "language": "auto"
  },
  "models": {
    "allow_custom_models": false,
    "whisper_model_path": ""
  },
  "injection": {
    "backend": "clipboard",
    "remove_transcription_from_clipboard": false
  },
  "safety": {
    "enabled": true,
    "strict": false
  },
  "ux": {
    "profile": "default",
    "show_notifications": true
  },
  "advanced": {
    "strict_startup": true
  },
  "vocabulary": {
    "replacements": [
      { "from": "Martha", "to": "Marta" },
      { "from": "docker", "to": "Docker" }
    ],
    "terms": ["Systemd", "Kubernetes"]
  }
}

config_version is required and currently must be 1. Legacy unversioned configs are migrated automatically on load.
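The migration step can be sketched as follows; migrate_config is a hypothetical name for illustration, not Aman's actual function:

```python
def migrate_config(raw: dict) -> dict:
    """Stamp legacy unversioned configs with config_version 1.

    Hypothetical sketch of the documented behavior; Aman's real
    migration logic may do more.
    """
    migrated = dict(raw)
    if "config_version" not in migrated:
        # Legacy unversioned config: migrate in place on load.
        migrated["config_version"] = 1
    if migrated["config_version"] != 1:
        raise ValueError(
            f"unsupported config_version: {migrated['config_version']}"
        )
    return migrated
```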

Recording input can be a device index (preferred) or a substring of the device name. If recording.input is explicitly set and cannot be resolved, startup fails instead of falling back to a default device.
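The resolution rule above can be sketched like this; resolve_input_device is an illustrative helper, not Aman's API:

```python
def resolve_input_device(setting: str, devices: list[str]) -> int:
    """Resolve recording.input to a device index.

    Sketch of the documented behavior: numeric strings are device
    indexes (preferred); anything else matches a substring of the
    device name. An explicit setting that cannot be resolved raises
    instead of falling back to a default device.
    """
    if setting.isdigit():
        index = int(setting)
        if index < len(devices):
            return index
        raise LookupError(f"no input device at index {index}")
    matches = [
        i for i, name in enumerate(devices)
        if setting.lower() in name.lower()
    ]
    if not matches:
        raise LookupError(f"no input device matching {setting!r}")
    return matches[0]
```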

Config validation is strict: unknown fields are rejected with a startup error. Validation errors include the exact field and an example fix snippet.
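A minimal sketch of the strict unknown-field check, assuming a per-section allowlist (check_unknown_fields is a hypothetical name):

```python
def check_unknown_fields(section: str, data: dict, allowed: set[str]) -> None:
    """Reject unknown fields with the exact field path.

    Hypothetical sketch of the documented strict-validation behavior.
    """
    unknown = set(data) - allowed
    if unknown:
        field = sorted(unknown)[0]
        raise ValueError(
            f"unknown field '{section}.{field}'; "
            f"allowed fields: {', '.join(sorted(allowed))}"
        )
```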

Profile options:

  • ux.profile=default: baseline cleanup behavior.
  • ux.profile=fast: lower-latency AI generation settings.
  • ux.profile=polished: same cleanup depth as default.
  • safety.enabled=true: enables fact-preservation checks (names/numbers/IDs/URLs).
  • safety.strict=false: fallback to safer draft when fact checks fail.
  • safety.strict=true: reject output when fact checks fail.
  • advanced.strict_startup=true: keep fail-fast startup validation behavior.

Transcription language:

  • stt.language=auto (default) enables Whisper auto-detection.
  • You can pin language with Whisper codes (for example en, es, pt, ja, zh) or common names like English/Spanish.
  • If a pinned language hint is rejected by the runtime, Aman logs a warning and retries with auto-detect.
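The language normalization described above can be sketched as follows; the mapping table here is a small illustrative subset, not Aman's real table:

```python
# Hypothetical subset of common-name aliases; Aman may cover more.
_LANGUAGE_NAMES = {"english": "en", "spanish": "es", "portuguese": "pt",
                   "japanese": "ja", "chinese": "zh"}
_WHISPER_CODES = {"en", "es", "pt", "ja", "zh", "de", "fr", "it", "ko", "ru"}

def normalize_language(value: str) -> str:
    """Map stt.language to a Whisper code, or 'auto'.

    Unrecognized hints fall back to auto-detection, mirroring the
    documented warn-and-retry behavior.
    """
    v = value.strip().lower()
    if v == "auto":
        return "auto"
    if v in _WHISPER_CODES:
        return v
    if v in _LANGUAGE_NAMES:
        return _LANGUAGE_NAMES[v]
    return "auto"
```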

Hotkey notes:

  • Use one key plus optional modifiers (for example Cmd+m, Super+m, Ctrl+space).
  • Super and Cmd are equivalent aliases for the same modifier.
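A sketch of how such a hotkey string can be parsed under those rules (parse_hotkey is an illustrative name, not Aman's parser):

```python
_MODIFIER_ALIASES = {"cmd": "super", "super": "super", "ctrl": "ctrl",
                     "alt": "alt", "shift": "shift"}

def parse_hotkey(spec: str) -> tuple[frozenset[str], str]:
    """Parse 'Cmd+m'-style hotkeys into (modifiers, key).

    One non-modifier key plus optional modifiers; Cmd and Super
    normalize to the same modifier.
    """
    parts = [p.strip().lower() for p in spec.split("+")]
    *mods, key = parts
    if not key or key in _MODIFIER_ALIASES:
        raise ValueError(f"hotkey needs exactly one non-modifier key: {spec!r}")
    resolved = set()
    for m in mods:
        if m not in _MODIFIER_ALIASES:
            raise ValueError(f"unknown modifier {m!r} in {spec!r}")
        resolved.add(_MODIFIER_ALIASES[m])
    return frozenset(resolved), key
```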

AI cleanup is always enabled and uses the locked local Qwen2.5-1.5B GGUF model downloaded to ~/.cache/aman/models/ during daemon initialization. Prompts are structured with semantic XML tags for both system and user messages to improve instruction adherence and output consistency. Cleanup runs in two local passes:

  • pass 1 drafts cleaned text and labels ambiguity decisions (correction/literal/spelling/filler)
  • pass 2 audits those decisions conservatively and emits the final cleaned_text

This keeps Aman in dictation mode: it does not execute editing instructions embedded in transcript text.

Before Aman reports ready, local llama runs a tiny warmup completion so the first real transcription is faster. If warmup fails and advanced.strict_startup=true, startup fails fast. With advanced.strict_startup=false, Aman logs a warning and continues.

Model downloads use a network timeout and SHA256 verification before activation. Cached models are checksum-verified on startup; mismatches trigger a forced redownload.
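The checksum verification step can be sketched like this; verify_model is a hypothetical helper, not Aman's actual function:

```python
import hashlib
from pathlib import Path

def verify_model(path: Path, expected_sha256: str) -> bool:
    """Return True when a cached model matches its pinned checksum.

    Sketch of the documented startup check: a mismatch means the
    caller should force a redownload before activating the model.
    """
    h = hashlib.sha256()
    with path.open("rb") as f:
        # Stream in 1 MiB chunks so large GGUF files are not
        # loaded into memory at once.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest() == expected_sha256
```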

Provider policy:

  • Aman-managed mode (recommended) is the canonical supported UX: Aman handles model lifecycle and safe defaults for you.
  • Expert mode is opt-in and exposes a custom Whisper model path for advanced users.
  • Editor model/provider configuration is intentionally not exposed in config.
  • Custom Whisper paths are only active with models.allow_custom_models=true.

Use -v/--verbose to enable DEBUG logs, including recognized/processed transcript text and llama.cpp logs (llama:: prefix). Without -v, logs are INFO level.

Vocabulary correction:

  • vocabulary.replacements is deterministic correction (from -> to).
  • vocabulary.terms is a preferred spelling list used as hinting context.
  • Wildcards are intentionally rejected (*, ?, [, ], {, }) to avoid ambiguous rules.
  • Rules are deduplicated case-insensitively; conflicting replacements are rejected.
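A hypothetical sketch of these validation rules (compile_replacements is an illustrative name, not Aman's API):

```python
_WILDCARDS = set("*?[]{}")

def compile_replacements(rules: list[dict]) -> dict[str, str]:
    """Validate vocabulary.replacements per the documented rules.

    Wildcards are rejected, rules are deduplicated case-insensitively,
    and conflicting targets for the same source fail.
    """
    compiled: dict[str, str] = {}
    for rule in rules:
        src, dst = rule["from"], rule["to"]
        if _WILDCARDS & set(src) or _WILDCARDS & set(dst):
            raise ValueError(f"wildcards are not allowed: {src!r} -> {dst!r}")
        key = src.lower()
        if key in compiled and compiled[key] != dst:
            raise ValueError(f"conflicting replacements for {src!r}")
        compiled[key] = dst
    return compiled
```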

STT hinting:

  • Vocabulary is passed to Whisper as compact hotwords only when that argument is supported by the installed faster-whisper runtime.
  • Aman enables word_timestamps when supported and runs a conservative alignment heuristic pass (self-correction/restart detection) before the editor stage.
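One way to probe for that support is to inspect the installed runtime's signature; this sketch is an assumption about how such a feature check can be written, not Aman's actual code:

```python
import inspect

def supports_kwarg(func, name: str) -> bool:
    """Check whether a callable accepts a given keyword argument.

    Sketch of a feature probe: only pass hotwords (or enable
    word_timestamps) when the installed transcribe signature
    actually accepts it.
    """
    params = inspect.signature(func).parameters
    # A **kwargs parameter accepts any keyword argument.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return True
    return name in params
```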

Fact guard:

  • Aman runs a deterministic fact-preservation verifier after editor output.
  • If facts are changed/invented and safety.strict=false, Aman falls back to the safer aligned draft.
  • If facts are changed/invented and safety.strict=true, processing fails and output is not injected.
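A much simplified sketch of such a deterministic check over numbers and URLs (the real verifier also covers names and IDs, per the safety options above):

```python
import re

# Matches URLs and digit-led tokens (numbers, dates, phone-style IDs).
_FACT_PATTERN = re.compile(r"https?://\S+|\b\d[\d.,:/-]*\b")

def facts_preserved(source: str, edited: str) -> bool:
    """Return True when edited text keeps exactly the source's facts.

    Hypothetical simplification: the multiset of extracted facts must
    be identical, so nothing is invented and nothing is dropped.
    """
    return sorted(_FACT_PATTERN.findall(source)) == \
        sorted(_FACT_PATTERN.findall(edited))
```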

systemd user service

make install-service

Service notes:

  • The supported daily-use path is the user service.
  • The user unit launches aman from PATH.
  • Package installs should provide the aman command automatically.
  • Use aman run --config ~/.config/aman/config.json in the foreground for setup, support, or debugging.
  • Start recovery with aman doctor, then aman self-check, before inspecting systemctl --user status aman and journalctl --user -u aman -f.
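For orientation, a user unit along these lines is what the supported path relies on. This is a hypothetical sketch; the unit file shipped by the packages may differ:

```ini
# Hypothetical sketch of a systemd user unit for Aman.
[Unit]
Description=Aman dictation daemon
After=graphical-session.target

[Service]
# Launches aman from PATH, as the service notes describe.
ExecStart=aman run
Restart=on-failure

[Install]
WantedBy=default.target
```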

Usage

  • Press the hotkey once to start recording.
  • Press it again to stop and run STT.
  • Press Esc while recording to cancel without processing.
  • Esc is only captured during active recording.
  • Recording start is aborted if the cancel listener cannot be armed.
  • Transcript contents are logged only when -v/--verbose is used.
  • Tray menu includes: Settings..., Help, About, Pause/Resume Aman, Reload Config, Run Diagnostics, Open Config Path, and Quit.
  • If required settings are not saved, Aman enters a Settings Required tray mode and does not capture audio.

Wayland note:

  • Running under Wayland currently exits with a message explaining that it is not supported yet.

Injection backends:

  • clipboard: copy to clipboard and inject via Ctrl+Shift+V (GTK clipboard + XTest)
  • injection: type the text with simulated keypresses (XTest)
  • injection.remove_transcription_from_clipboard: when true and backend is clipboard, restores/clears the clipboard after paste so the transcript is not kept there

Editor stage:

  • Canonical local llama.cpp editor model (managed by Aman).
  • Runtime flow is explicit: ASR -> Alignment Heuristics -> Editor -> Fact Guard -> Vocabulary -> Injection.

Build and packaging (maintainers):

make build
make package
make package-deb
make package-arch
make release-check

make package-deb installs Python dependencies while creating the package. For offline packaging, set AMAN_WHEELHOUSE_DIR to a directory containing the required wheels.

Benchmarking (STT bypass, always dry):

aman bench --text "draft a short email to Marta confirming lunch" --repeat 10 --warmup 2
aman bench --text-file ./bench-input.txt --repeat 20 --json

bench does not capture audio and never injects text into desktop apps. It runs the processing path from input transcript text through alignment/editor/fact-guard/vocabulary cleanup and prints timing summaries.

Model evaluation lab (dataset + matrix sweep):

aman build-heuristic-dataset --input benchmarks/heuristics_dataset.raw.jsonl --output benchmarks/heuristics_dataset.jsonl
aman eval-models --dataset benchmarks/cleanup_dataset.jsonl --matrix benchmarks/model_matrix.small_first.json --heuristic-dataset benchmarks/heuristics_dataset.jsonl --heuristic-weight 0.25 --output benchmarks/results/latest.json
aman sync-default-model --report benchmarks/results/latest.json --artifacts benchmarks/model_artifacts.json --constants src/constants.py

eval-models runs a structured model/parameter sweep over a JSONL dataset and outputs latency and quality metrics (including the hybrid score, the pass-1/pass-2 latency breakdown, and correction safety metrics for "I mean" and spelling-disambiguation cases). When --heuristic-dataset is provided, the report also includes alignment-heuristic quality metrics (exact match, token-F1, rule precision/recall, per-tag breakdown). sync-default-model promotes the report winner to the managed default model constants using the artifact registry and can be run in --check mode for CI/release gates.

Control:

make run
make run config.example.json
make doctor
make self-check
make eval-models
make sync-default-model
make check-default-model
make check

Developer setup (optional, uv workflow):

uv sync --extra x11
uv run aman run --config ~/.config/aman/config.json

Developer setup (optional, pip workflow):

make install-local
aman run --config ~/.config/aman/config.json

CLI (support and developer workflows):

aman doctor --config ~/.config/aman/config.json --json
aman self-check --config ~/.config/aman/config.json --json
aman run --config ~/.config/aman/config.json
aman bench --text "example transcript" --repeat 5 --warmup 1
aman build-heuristic-dataset --input benchmarks/heuristics_dataset.raw.jsonl --output benchmarks/heuristics_dataset.jsonl --json
aman eval-models --dataset benchmarks/cleanup_dataset.jsonl --matrix benchmarks/model_matrix.small_first.json --heuristic-dataset benchmarks/heuristics_dataset.jsonl --heuristic-weight 0.25 --json
aman sync-default-model --check --report benchmarks/results/latest.json --artifacts benchmarks/model_artifacts.json --constants src/constants.py
aman version
aman init --config ~/.config/aman/config.json --force