Thales Maciel e221d49020 Add pipeline engine and remove legacy compatibility paths		2026-02-25 22:40:03 -03:00
src	Add pipeline engine and remove legacy compatibility paths	2026-02-25 22:40:03 -03:00
systemd	Rename project from lel to aman	2026-02-25 11:11:10 -03:00
tests	Add pipeline engine and remove legacy compatibility paths	2026-02-25 22:40:03 -03:00
.gitignore	Update project files	2026-02-10 11:01:36 -03:00
AGENTS.md	Rename project from lel to aman	2026-02-25 11:11:10 -03:00
config.example.json	Remove unused vocabulary and domain mode options	2026-02-25 11:26:23 -03:00
Makefile	Rename project from lel to aman	2026-02-25 11:11:10 -03:00
pipelines.example.py	Add pipeline engine and remove legacy compatibility paths	2026-02-25 22:40:03 -03:00
pyproject.toml	Rename project from lel to aman	2026-02-25 11:11:10 -03:00
README.md	Add pipeline engine and remove legacy compatibility paths	2026-02-25 22:40:03 -03:00
uv.lock	Rename project from lel to aman	2026-02-25 11:11:10 -03:00

README.md

aman

Local amanuensis

Python X11 STT daemon that records audio, runs Whisper, applies local AI cleanup, and injects text.

Requirements

X11
sounddevice (PortAudio)
faster-whisper
llama-cpp-python
Tray icon deps: gtk3, libayatana-appindicator3
Python deps (core): numpy, pillow, faster-whisper, llama-cpp-python, sounddevice
X11 extras: PyGObject, python-xlib

System packages (example names): portaudio/libportaudio2.

Ubuntu/Debian

sudo apt install -y portaudio19-dev libportaudio2 python3-gi gir1.2-gtk-3.0 libayatana-appindicator3-1

Arch Linux

sudo pacman -S --needed portaudio gtk3 libayatana-appindicator

Fedora

sudo dnf install -y portaudio portaudio-devel gtk3 libayatana-appindicator-gtk3

openSUSE

sudo zypper install -y portaudio portaudio-devel gtk3 libayatana-appindicator3-1

Python Daemon

Install Python deps:

X11 (supported):

uv sync --extra x11

Config

Create ~/.config/aman/config.json (or let aman create it automatically on first start if missing):

{
  "daemon": { "hotkey": "Cmd+m" },
  "recording": { "input": "0" },
  "stt": { "model": "base", "device": "cpu" },
  "injection": {
    "backend": "clipboard",
    "remove_transcription_from_clipboard": false
  },
  "vocabulary": {
    "replacements": [
      { "from": "Martha", "to": "Marta" },
      { "from": "docker", "to": "Docker" }
    ],
    "terms": ["Systemd", "Kubernetes"]
  }
}

Recording input can be a device index (preferred) or a substring of the device name.
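
The lookup described above can be sketched as a pure function over a list of device names. This is an illustration only: `resolve_input_device` is a hypothetical helper, not aman's actual code, and the case-insensitive substring match is an assumption.

```python
def resolve_input_device(spec, device_names):
    """Return the index of the device selected by `spec`.

    `spec` may be an int, a numeric string such as "0", or a substring
    of a device name. Hypothetical helper; matching rules are assumed.
    """
    text = str(spec).strip()
    if text.isdigit():
        index = int(text)
        if index >= len(device_names):
            raise ValueError(f"no input device with index {index}")
        return index
    needle = text.lower()
    for index, name in enumerate(device_names):
        # Assumption: substring matching ignores case.
        if needle and needle in name.lower():
            return index
    raise ValueError(f"no input device matching {spec!r}")
```

An index is unambiguous, which is presumably why it is the preferred form; a substring picks the first device whose name contains it.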

Hotkey notes:

Use one key plus optional modifiers (for example Cmd+m, Super+m, Ctrl+space).
Super and Cmd are equivalent aliases for the same modifier.
Invalid hotkey syntax in config prevents startup/reload.
When ~/.config/aman/pipelines.py exists, hotkeys come from HOTKEY_PIPELINES.
daemon.hotkey is used as the fallback/default hotkey only when no pipelines file is present.
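
A minimal sketch of the hotkey syntax rules above, assuming a hypothetical `parse_hotkey` helper. The modifier table is an assumption (the README only names Cmd, Super, Ctrl, and Shift); aman's real parser may accept more or fewer names.

```python
# Assumed modifier table; "cmd" and "super" map to the same modifier.
MODIFIER_ALIASES = {
    "cmd": "super",
    "super": "super",
    "ctrl": "ctrl",
    "shift": "shift",
    "alt": "alt",  # assumption: not mentioned in the README
}


def parse_hotkey(spec):
    """Split "Mod+Mod+key" into (frozenset of modifiers, key)."""
    parts = [part.strip().lower() for part in spec.split("+")]
    if any(not part for part in parts):
        raise ValueError(f"invalid hotkey syntax: {spec!r}")
    *modifiers, key = parts
    normalized = set()
    for modifier in modifiers:
        if modifier not in MODIFIER_ALIASES:
            raise ValueError(f"unknown modifier: {modifier!r}")
        normalized.add(MODIFIER_ALIASES[modifier])
    if key in MODIFIER_ALIASES:
        raise ValueError("hotkey must end with a non-modifier key")
    return frozenset(normalized), key
```

With this normalization, `Cmd+m` and `Super+m` parse to the same value, so they would bind the same combination.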

AI cleanup is always enabled and uses the locked local Llama-3.2-3B GGUF model downloaded to ~/.cache/aman/models/ during daemon initialization.

Use -v/--verbose to enable DEBUG logs, including recognized/processed transcript text and llama.cpp logs (llama:: prefix). Without -v, logs are INFO level.

Vocabulary correction:

vocabulary.replacements is deterministic correction (from -> to).
vocabulary.terms is a preferred spelling list used as hinting context.
Wildcards are intentionally rejected (*, ?, [, ], {, }) to avoid ambiguous rules.
Rules are deduplicated case-insensitively; conflicting replacements are rejected.
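
The replacement rules above could be validated roughly as follows. `normalize_replacements` is a hypothetical name, and treating two rules with the same source but different targets as a conflict is one reading of "conflicting replacements", not a confirmed detail.

```python
WILDCARD_CHARS = set("*?[]{}")


def normalize_replacements(rules):
    """Validate and deduplicate a list of {"from": ..., "to": ...} rules."""
    seen = {}  # casefolded source -> target
    result = []
    for rule in rules:
        source = rule["from"].strip()
        target = rule["to"].strip()
        if not source or not target:
            raise ValueError("replacement rules need non-empty from/to")
        if WILDCARD_CHARS & set(source + target):
            raise ValueError(f"wildcards are not allowed: {rule!r}")
        key = source.casefold()
        if key in seen:
            # Assumption: same source with a different target is a conflict.
            if seen[key] != target:
                raise ValueError(f"conflicting replacement for {source!r}")
            continue  # case-insensitive duplicate, drop it
        seen[key] = target
        result.append({"from": source, "to": target})
    return result
```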

STT hinting:

Vocabulary is passed to Whisper as hotwords/initial_prompt only when those arguments are supported by the installed faster-whisper runtime.
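
One way to do this kind of capability check is to inspect the signature of the transcribe callable before building keyword arguments. The sketch below is an assumption about the approach, not aman's code: `supported_hint_kwargs` is a hypothetical helper, and the way terms are joined into a string is illustrative.

```python
import inspect


def supported_hint_kwargs(transcribe_fn, terms):
    """Build hint kwargs using only parameters that transcribe_fn accepts."""
    if not terms:
        return {}
    params = inspect.signature(transcribe_fn).parameters
    kwargs = {}
    if "hotwords" in params:
        # Assumed formatting: space-separated terms.
        kwargs["hotwords"] = " ".join(terms)
    if "initial_prompt" in params:
        # Assumed formatting: a short sentence listing the terms.
        kwargs["initial_prompt"] = "Preferred spellings: " + ", ".join(terms)
    return kwargs
```

The result could then be passed as `model.transcribe(audio, **kwargs)`, so a runtime without `hotwords` simply does not receive that argument.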

systemd user service

mkdir -p ~/.local/share/aman/src/assets ~/.config/systemd/user
cp src/*.py ~/.local/share/aman/src/
cp src/assets/*.png ~/.local/share/aman/src/assets/
cp systemd/aman.service ~/.config/systemd/user/aman.service
systemctl --user daemon-reload
systemctl --user enable --now aman

Usage

Press the hotkey once to start recording.
Press it again to stop and run STT.
Press Esc while recording to cancel without processing.
Transcript contents are logged only when -v/--verbose is used.
Config changes are hot-reloaded automatically (polled every 1 second).
~/.config/aman/pipelines.py changes are hot-reloaded automatically (polled every 1 second).
Send SIGHUP to force an immediate reload of config and pipelines: systemctl --user kill -s HUP aman (or send HUP to the process directly).
Reloads are applied when the daemon is idle; invalid updates are rejected and the last valid config stays active.
Reload success/failure is logged, and desktop notifications are shown when available.
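
The reload behavior described above (poll for changes, reject invalid updates, keep the last valid config) can be sketched with a small watcher. `ConfigWatcher` and its `validate` callback are hypothetical; change detection by modification time is an assumption, and the idle check and notifications are left out.

```python
import json
import os


class ConfigWatcher:
    """Reload a JSON config when its mtime changes; keep the last valid one."""

    def __init__(self, path, validate):
        self.path = path
        self.validate = validate  # should raise ValueError on bad config
        self.active = None
        self._last_mtime = None

    def poll(self, force=False):
        """Return True if a new valid config became active."""
        try:
            mtime = os.stat(self.path).st_mtime_ns
        except FileNotFoundError:
            return False
        if not force and mtime == self._last_mtime:
            return False  # unchanged since the last poll
        self._last_mtime = mtime
        try:
            with open(self.path, encoding="utf-8") as handle:
                candidate = json.load(handle)
            self.validate(candidate)
        except (OSError, ValueError):
            return False  # invalid update rejected; self.active is untouched
        self.active = candidate
        return True
```

A daemon loop would call `poll()` once per second, and a SIGHUP handler could request `poll(force=True)` on the next idle moment.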

Wayland note:

Under Wayland, aman currently exits at startup with a message stating that Wayland is not supported yet.

Injection backends:

clipboard: copy to clipboard and inject via Ctrl+Shift+V (GTK clipboard + XTest)
injection: type the text with simulated keypresses (XTest)
injection.remove_transcription_from_clipboard: when true and backend is clipboard, restores/clears the clipboard after paste so the transcript is not kept there

AI processing:

Local llama.cpp model only (no remote provider configuration).

Pipelines API

aman is split into:

shell daemon: hotkeys, recording/cancel, and desktop injection
pipeline engine: lib.transcribe(...) and lib.llm(...)
pipeline implementation: Python callables mapped per hotkey

Pipeline file path:

~/.config/aman/pipelines.py
You can start from pipelines.example.py.
If pipelines.py is missing, aman uses a built-in reference pipeline bound to daemon.hotkey.
If pipelines.py exists but is invalid, startup fails fast.
Pipelines are hot-reloaded automatically when the module file changes.
Send SIGHUP to force an immediate reload of both config and pipelines.

Expected module exports:

HOTKEY_PIPELINES = {
  "Super+m": my_pipeline,
  "Super+Shift+m": caps_pipeline,
}

PIPELINE_OPTIONS = {
  "Super+Shift+m": {"failure_policy": "strict"},  # optional
}

Pipeline callable signature:

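def my_pipeline(audio, lib) -> str:
  # Speech-to-text on the recorded audio.
  text = lib.transcribe(audio)
  # First LLM pass: infer context from the raw transcript.
  context = lib.llm(
    system_prompt="context system prompt",
    user_prompt=f"Transcript: {text}",
  )
  # Second LLM pass: rewrite the transcript using that context.
  out = lib.llm(
    system_prompt="amanuensis prompt",
    user_prompt=f"context={context}\ntext={text}",
  )
  # The returned string is what gets injected.
  return out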

lib API:

lib.transcribe(audio, hints=None, whisper_opts=None) -> str
lib.llm(system_prompt=..., user_prompt=..., llm_opts=None) -> str
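
Because a pipeline only talks to `lib`, it can be exercised without audio hardware or models by passing a stand-in object. The `FakeLib` class below is a hypothetical test double that mirrors the two signatures above; treating the `lib.llm` arguments as keyword-only is an assumption. `caps_pipeline` is an illustrative body for the name used in the `HOTKEY_PIPELINES` example.

```python
class FakeLib:
    """Test double with the same method names as the documented lib API."""

    def __init__(self, transcript, llm_replies=()):
        self.transcript = transcript
        self.llm_replies = list(llm_replies)
        self.llm_calls = []

    def transcribe(self, audio, hints=None, whisper_opts=None):
        return self.transcript

    def llm(self, *, system_prompt, user_prompt, llm_opts=None):
        # Record the call and return the next canned reply.
        self.llm_calls.append((system_prompt, user_prompt))
        return self.llm_replies.pop(0)


def caps_pipeline(audio, lib) -> str:
    """Illustrative pipeline: transcribe and upper-case, no LLM pass."""
    return lib.transcribe(audio).upper()


def cleanup_pipeline(audio, lib) -> str:
    """Illustrative pipeline: a single LLM cleanup pass."""
    text = lib.transcribe(audio)
    return lib.llm(system_prompt="Fix punctuation.", user_prompt=text)
```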

Failure policy options:

best_effort (default): pipeline errors return empty output
strict: pipeline errors abort the current run
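
The two policies can be sketched as a thin wrapper around the pipeline call. `run_pipeline` is a hypothetical function; here "abort the current run" is modeled by re-raising the exception, which is an assumption about how the engine signals an abort.

```python
def run_pipeline(pipeline, audio, lib, failure_policy="best_effort"):
    """Run one pipeline call and apply the configured failure policy."""
    if failure_policy not in ("best_effort", "strict"):
        raise ValueError(f"unknown failure_policy: {failure_policy!r}")
    try:
        return pipeline(audio, lib)
    except Exception:
        if failure_policy == "strict":
            raise  # abort the current run
        return ""  # best_effort: empty output
```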

Validation:

HOTKEY_PIPELINES must be a non-empty dictionary.
Every hotkey key must be a non-empty string.
Every pipeline value must be callable.
PIPELINE_OPTIONS must be a dictionary when provided.
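
The four validation rules above translate into a short check over the loaded module. `validate_pipeline_module` is a hypothetical name, and raising `ValueError` is an assumption about how failures are reported.

```python
def validate_pipeline_module(module):
    """Check a loaded pipelines module against the documented rules."""
    pipelines = getattr(module, "HOTKEY_PIPELINES", None)
    if not isinstance(pipelines, dict) or not pipelines:
        raise ValueError("HOTKEY_PIPELINES must be a non-empty dictionary")
    for hotkey, pipeline in pipelines.items():
        if not isinstance(hotkey, str) or not hotkey.strip():
            raise ValueError("every hotkey must be a non-empty string")
        if not callable(pipeline):
            raise ValueError(f"pipeline for {hotkey!r} must be callable")
    options = getattr(module, "PIPELINE_OPTIONS", None)
    if options is not None and not isinstance(options, dict):
        raise ValueError("PIPELINE_OPTIONS must be a dictionary")
    return pipelines, options or {}
```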

Reference behavior:

The built-in fallback pipeline (used when pipelines.py is missing) uses lib.llm(...) twice:
- first to infer context
- second to run the amanuensis rewrite
The second pass requests JSON output and expects {"cleaned_text": "..."}.
Deterministic dictionary replacements are then applied as part of that reference implementation.
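
The last two steps of the reference behavior (parsing the JSON reply, then applying dictionary replacements) might look like the sketch below. Both function names are hypothetical, and whole-word, case-insensitive matching is an assumption; the built-in implementation may match differently.

```python
import json
import re


def parse_cleaned_text(raw):
    """Extract "cleaned_text" from the second-pass JSON reply."""
    payload = json.loads(raw)
    if not isinstance(payload, dict) or not isinstance(payload.get("cleaned_text"), str):
        raise ValueError('expected a JSON object like {"cleaned_text": "..."}')
    return payload["cleaned_text"]


def apply_replacements(text, replacements):
    """Apply each from -> to rule in order (assumed whole-word, any case)."""
    for rule in replacements:
        pattern = re.compile(r"\b" + re.escape(rule["from"]) + r"\b", re.IGNORECASE)
        # A function replacement keeps backslashes in "to" literal.
        text = pattern.sub(lambda _match, to=rule["to"]: to, text)
    return text
```

For example, with the replacements from the config section, a reply of `{"cleaned_text": "ask martha about docker"}` would come out as `ask Marta about Docker`.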

Control:

make run
make check