thaloco/aman

Thales Maciel 993f51712b Add package-first build and distribution workflow

2026-02-27 15:06:57 -03:00

aman

Local amanuensis

Python speech-to-text (STT) daemon for X11 that records audio, transcribes it with Whisper, applies local AI cleanup, and injects the resulting text.

Target User

The canonical Aman user is a desktop professional who wants dictation and rewriting features without learning Python tooling.

End-user path: native OS package install.
Developer path: Python/uv workflows.

Persona details and distribution policy are documented in docs/persona-and-distribution.md.

Install (Recommended)

End users do not need uv.

Debian/Ubuntu (`.deb`)

Download a release artifact and install it:

sudo apt install ./aman_<version>_<arch>.deb

Then enable the user service:

systemctl --user daemon-reload
systemctl --user enable --now aman

Arch Linux

Use the generated packaging inputs (PKGBUILD + source tarball) in dist/arch/ or your own packaging pipeline.

Distribution Matrix

| Channel | Audience | Status |
| --- | --- | --- |
| Debian package (`.deb`) | End users on Ubuntu/Debian | Canonical |
| Arch `PKGBUILD` + source tarball | Arch maintainers/power users | Supported |
| Python wheel/sdist | Developers/integrators | Supported |

Runtime Dependencies

X11
PortAudio runtime (libportaudio2 or distro equivalent)
GTK3 and AppIndicator runtime (gtk3, libayatana-appindicator3)
Python GTK and X11 bindings (python3-gi/python-gobject, python-xlib)

Ubuntu/Debian

sudo apt install -y libportaudio2 python3-gi python3-xlib gir1.2-gtk-3.0 libayatana-appindicator3-1

Arch Linux

sudo pacman -S --needed portaudio gtk3 libayatana-appindicator python-gobject python-xlib

Fedora

sudo dnf install -y portaudio gtk3 libayatana-appindicator-gtk3 python3-gobject python3-xlib

openSUSE

sudo zypper install -y portaudio gtk3 libayatana-appindicator3-1 python3-gobject python3-python-xlib

Quickstart

aman run

On first launch, Aman opens a graphical settings window automatically. It includes sections for:

microphone input
hotkey
output backend
writing profile
runtime and model strategy
help/about actions

Config

Create ~/.config/aman/config.json, or let Aman create it automatically on first start if it is missing:

{
  "config_version": 1,
  "daemon": { "hotkey": "Cmd+m" },
  "recording": { "input": "0" },
  "stt": {
    "provider": "local_whisper",
    "model": "base",
    "device": "cpu",
    "language": "auto"
  },
  "llm": { "provider": "local_llama" },
  "models": {
    "allow_custom_models": false,
    "whisper_model_path": "",
    "llm_model_path": ""
  },
  "external_api": {
    "enabled": false,
    "provider": "openai",
    "base_url": "https://api.openai.com/v1",
    "model": "gpt-4o-mini",
    "timeout_ms": 15000,
    "max_retries": 2,
    "api_key_env_var": "AMAN_EXTERNAL_API_KEY"
  },
  "injection": {
    "backend": "clipboard",
    "remove_transcription_from_clipboard": false
  },
  "ux": {
    "profile": "default",
    "show_notifications": true
  },
  "advanced": {
    "strict_startup": true
  },
  "vocabulary": {
    "replacements": [
      { "from": "Martha", "to": "Marta" },
      { "from": "docker", "to": "Docker" }
    ],
    "terms": ["Systemd", "Kubernetes"]
  }
}

config_version is required and currently must be 1. Legacy unversioned configs are migrated automatically on load.

Recording input can be a device index (preferred) or a substring of the device name. If recording.input is explicitly set and cannot be resolved, startup fails instead of falling back to a default device.
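
The resolution rule for recording.input can be sketched as follows. This is a minimal illustration, not Aman's actual code: the function name `resolve_input_device`, the plain list of device names, the case-insensitive substring match, and the "unset means default device" behaviour are all assumptions made for the example.

```python
from typing import Optional, Sequence


def resolve_input_device(spec: Optional[str], device_names: Sequence[str]) -> Optional[int]:
    """Return the index of the device selected by spec.

    Returns None when spec is unset (assumption: the caller then uses the
    system default). Raises ValueError when spec is set but matches nothing,
    so there is no silent fallback to a default device.
    """
    if spec is None or spec.strip() == "":
        return None
    spec = spec.strip()
    if spec.isdecimal():
        # Numeric spec: treat it as a device index.
        index = int(spec)
        if index < len(device_names):
            return index
        raise ValueError(f"recording.input: no device with index {index}")
    # Otherwise treat it as a substring of the device name (case-insensitive here).
    needle = spec.lower()
    for index, name in enumerate(device_names):
        if needle in name.lower():
            return index
    raise ValueError(f"recording.input: no device name contains {spec!r}")
```

With a hypothetical device list `["HDA Intel", "USB Mic"]`, both `"1"` and `"usb"` select the second device, while `"7"` raises an error instead of picking a default.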

Config validation is strict: unknown fields are rejected with a startup error. Validation errors include the exact field and an example fix snippet.
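
To illustrate what strict validation means in practice, here is a small sketch that rejects unknown fields and reports the exact field path. The `ALLOWED_FIELDS` table covers only a hypothetical subset of the schema, and the function name and error wording are assumptions, not Aman's real messages.

```python
from collections.abc import Mapping
from typing import Any

# Hypothetical subset of the schema: section name -> allowed keys.
ALLOWED_FIELDS = {
    "daemon": {"hotkey"},
    "recording": {"input"},
    "injection": {"backend", "remove_transcription_from_clipboard"},
}


def validate_config(config: Mapping[str, Any]) -> list[str]:
    """Return a list of human-readable errors; an empty list means valid."""
    errors = []
    if config.get("config_version") != 1:
        errors.append('config_version: must be 1, for example {"config_version": 1}')
    for section, value in config.items():
        if section == "config_version":
            continue
        if section not in ALLOWED_FIELDS:
            errors.append(f"{section}: unknown section")
            continue
        if not isinstance(value, Mapping):
            errors.append(f"{section}: expected an object")
            continue
        for key in value:
            if key not in ALLOWED_FIELDS[section]:
                errors.append(f"{section}.{key}: unknown field")
    return errors
```

A typo such as `"hotkye"` inside `daemon` would then be reported as `daemon.hotkye: unknown field` rather than being ignored.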

Profile options:

ux.profile=default: baseline cleanup behavior.
ux.profile=fast: lower-latency AI generation settings.
ux.profile=polished: same cleanup depth as default.
advanced.strict_startup=true: keep fail-fast startup validation behavior.

Transcription language:

stt.language=auto (default) enables Whisper auto-detection.
You can pin language with Whisper codes (for example en, es, pt, ja, zh) or common names like English/Spanish.
If a pinned language hint is rejected by the runtime, Aman logs a warning and retries with auto-detect.
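
The language handling described above might look roughly like this. The alias table, the `transcribe` callable, and the use of `ValueError` to signal a rejected hint are all assumptions for the sake of the example; the real runtime may signal rejection differently.

```python
import logging
from typing import Callable, Optional

# Assumed alias table; the real set of accepted names may differ.
LANGUAGE_ALIASES = {"english": "en", "spanish": "es", "portuguese": "pt"}


def normalize_language(value: str) -> Optional[str]:
    """Map a config value to a Whisper code, or None for auto-detection."""
    cleaned = value.strip().lower()
    if cleaned in ("", "auto"):
        return None
    return LANGUAGE_ALIASES.get(cleaned, cleaned)


def transcribe_with_fallback(
    transcribe: Callable[[bytes, Optional[str]], str],
    audio: bytes,
    language: str,
) -> str:
    """Try the pinned language first; on rejection, warn and retry with auto-detect."""
    hint = normalize_language(language)
    if hint is None:
        return transcribe(audio, None)
    try:
        return transcribe(audio, hint)
    except ValueError:
        logging.warning("language hint %r rejected, retrying with auto-detect", hint)
        return transcribe(audio, None)
```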

Hotkey notes:

Use one key plus optional modifiers (for example Cmd+m, Super+m, Ctrl+space).
Super and Cmd are equivalent aliases for the same modifier.
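
As a rough sketch of these rules, a hotkey string can be split into a set of modifiers and one key, with Cmd mapped onto Super. The function name `parse_hotkey` and the exact modifier list (Ctrl, Alt, Shift in addition to Super/Cmd) are assumptions; only the Super/Cmd equivalence comes from the notes above.

```python
# Assumed modifier names; "cmd" and "super" resolve to the same modifier.
MODIFIER_ALIASES = {
    "cmd": "super",
    "super": "super",
    "ctrl": "ctrl",
    "alt": "alt",
    "shift": "shift",
}


def parse_hotkey(spec: str) -> tuple[frozenset[str], str]:
    """Parse 'Mod+Mod+key' into (modifiers, key); raise ValueError if malformed."""
    parts = [part.strip().lower() for part in spec.split("+")]
    if any(part == "" for part in parts):
        raise ValueError(f"hotkey {spec!r} has an empty component")
    *modifier_names, key = parts
    modifiers = set()
    for name in modifier_names:
        if name not in MODIFIER_ALIASES:
            raise ValueError(f"hotkey {spec!r}: unknown modifier {name!r}")
        modifiers.add(MODIFIER_ALIASES[name])
    if key in MODIFIER_ALIASES:
        raise ValueError(f"hotkey {spec!r} must end with a non-modifier key")
    return frozenset(modifiers), key
```

Under this sketch `parse_hotkey("Cmd+m")` and `parse_hotkey("Super+m")` return the same value, which is what "equivalent aliases" amounts to.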

AI cleanup is always enabled and uses the locked local Llama-3.2-3B GGUF model downloaded to ~/.cache/aman/models/ during daemon initialization. Model downloads use a network timeout and SHA256 verification before activation. Cached models are checksum-verified on startup; mismatches trigger a forced redownload.
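
The checksum step can be illustrated with the standard library alone. This is a sketch of the general technique (streamed SHA256 comparison), not Aman's implementation; the function names and the idea of passing the expected digest as a parameter are assumptions.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large model files are not loaded into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cached_model_is_valid(path: Path, expected_sha256: str) -> bool:
    """True only if the file exists and its digest matches the expected value."""
    return path.is_file() and sha256_of_file(path) == expected_sha256.lower()
```

A caller would redownload the model whenever `cached_model_is_valid` returns False, which covers both a missing file and a corrupted one.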

Provider policy:

Aman-managed mode (recommended) is the canonical supported UX: Aman handles model lifecycle and safe defaults for you.
Expert mode is opt-in and exposes custom providers/models for advanced users.
External API auth is environment-variable based (external_api.api_key_env_var); no API key is stored in config.
Custom local model paths are only active with models.allow_custom_models=true.
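
A minimal sketch of environment-variable based auth, assuming the `external_api` section has already been loaded as a mapping. The function name `load_api_key` and the error wording are hypothetical; only the `api_key_env_var` field and its default value come from the config example above.

```python
import os
from collections.abc import Mapping


def load_api_key(
    external_api: Mapping[str, object],
    environ: Mapping[str, str] = os.environ,
) -> str:
    """Read the API key from the environment variable named in the config."""
    var_name = str(external_api.get("api_key_env_var", "AMAN_EXTERNAL_API_KEY"))
    key = environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"external API is enabled but {var_name} is not set")
    return key
```

Because only the variable name lives in config.json, the file can be shared or committed without exposing the secret itself.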

Use -v/--verbose to enable DEBUG logs, including recognized/processed transcript text and llama.cpp logs (llama:: prefix). Without -v, logs are INFO level.

Vocabulary correction:

vocabulary.replacements is deterministic correction (from -> to).
vocabulary.terms is a preferred spelling list used as hinting context.
Wildcards are intentionally rejected (*, ?, [, ], {, }) to avoid ambiguous rules.
Rules are deduplicated case-insensitively; conflicting replacements are rejected.
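
The replacement rules above can be sketched as a small normalization pass. Details not stated in this README are assumptions here: the function name, checking wildcards in both `from` and `to`, and treating two rules with the same `from` (ignoring case) but different `to` values as a conflict.

```python
WILDCARD_CHARS = set("*?[]{}")


def normalize_replacements(rules: list[dict[str, str]]) -> list[dict[str, str]]:
    """Reject wildcards and conflicts; drop case-insensitive duplicates."""
    seen: dict[str, str] = {}
    result = []
    for rule in rules:
        source, target = rule["from"].strip(), rule["to"].strip()
        if WILDCARD_CHARS & set(source) or WILDCARD_CHARS & set(target):
            raise ValueError(f"wildcards are not allowed in rule {source!r} -> {target!r}")
        key = source.casefold()
        if key in seen:
            if seen[key] != target:
                raise ValueError(f"conflicting replacements for {source!r}")
            continue  # same rule again, differing only in case of 'from'
        seen[key] = target
        result.append({"from": source, "to": target})
    return result
```

For example, `docker -> Docker` followed by `Docker -> Docker` collapses to one rule, while `docker -> Docker` followed by `docker -> Podman` is rejected.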

STT hinting:

Vocabulary is passed to Whisper as hotwords/initial_prompt only when those arguments are supported by the installed faster-whisper runtime.
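
One common way to do this kind of capability check is to inspect the signature of the transcription callable. The sketch below assumes that approach; the helper name is hypothetical, and whether Aman passes both arguments or prefers one when both exist is not stated here.

```python
import inspect
from typing import Any, Callable


def supported_hint_kwargs(transcribe: Callable[..., Any], vocabulary_hint: str) -> dict[str, str]:
    """Build keyword arguments only for hint parameters the callable accepts."""
    if not vocabulary_hint:
        return {}
    parameters = inspect.signature(transcribe).parameters
    return {
        name: vocabulary_hint
        for name in ("hotwords", "initial_prompt")
        if name in parameters
    }
```

The result can be splatted into the call, for example `model.transcribe(audio, **supported_hint_kwargs(model.transcribe, "Systemd, Kubernetes"))`, so older runtimes without `hotwords` keep working.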

systemd user service

make install-service

Service notes:

The user unit launches aman from PATH.
Package installs should provide the aman command automatically.
Inspect failures with systemctl --user status aman and journalctl --user -u aman -f.

Usage

Press the hotkey once to start recording.
Press it again to stop and run STT.
Press Esc while recording to cancel without processing.
Esc is only captured during active recording.
Recording start is aborted if the cancel listener cannot be armed.
Transcript contents are logged only when -v/--verbose is used.
Tray menu includes: Settings..., Help, About, Pause/Resume Aman, Reload Config, Run Diagnostics, Open Config Path, and Quit.
If required settings are not saved, Aman enters a Settings Required tray mode and does not capture audio.
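
The hotkey and Esc behaviour above amounts to a small state machine. The sketch below is one way to model it; the state names, the event strings, and the `"done"` event that ends processing are assumptions made for illustration.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"
    RECORDING = "recording"
    PROCESSING = "processing"


def next_state(state: State, event: str, cancel_armed: bool = True) -> State:
    """Return the state that follows `state` when `event` occurs."""
    if state is State.IDLE and event == "hotkey":
        # Recording does not start if the Esc cancel listener could not be armed.
        return State.RECORDING if cancel_armed else State.IDLE
    if state is State.RECORDING and event == "hotkey":
        return State.PROCESSING  # stop recording and run STT
    if state is State.RECORDING and event == "esc":
        return State.IDLE  # cancel without processing
    if state is State.PROCESSING and event == "done":
        return State.IDLE
    return state  # every other event is ignored, including Esc outside recording
```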

Wayland note:

Wayland is not supported yet. When started in a Wayland session, Aman exits with a message saying so.

Injection backends:

clipboard: copy to clipboard and inject via Ctrl+Shift+V (GTK clipboard + XTest)
injection: type the text with simulated keypresses (XTest)
injection.remove_transcription_from_clipboard: when true and backend is clipboard, restores/clears the clipboard after paste so the transcript is not kept there

AI processing:

Default local llama.cpp model.
Optional external API provider through llm.provider=external_api.

Build and packaging (maintainers):

make build
make package
make package-deb
make package-arch
make release-check

make package-deb installs Python dependencies while creating the package. For offline packaging, set AMAN_WHEELHOUSE_DIR to a directory containing the required wheels.

Control:

make run
make run config.example.json
make doctor
make self-check
make check

Developer setup (optional, uv workflow):

uv sync --extra x11
uv run aman run --config ~/.config/aman/config.json

Developer setup (optional, pip workflow):

make install-local
aman run --config ~/.config/aman/config.json

CLI (internal/support fallback):

aman run --config ~/.config/aman/config.json
aman doctor --config ~/.config/aman/config.json --json
aman self-check --config ~/.config/aman/config.json --json
aman version
aman init --config ~/.config/aman/config.json --force
