Thales Maciel 09090102a2 Rename project from lel to aman		2026-02-25 11:11:10 -03:00
src	Rename project from lel to aman	2026-02-25 11:11:10 -03:00
systemd	Rename project from lel to aman	2026-02-25 11:11:10 -03:00
tests	Rename project from lel to aman	2026-02-25 11:11:10 -03:00
.gitignore	Update project files	2026-02-10 11:01:36 -03:00
AGENTS.md	Rename project from lel to aman	2026-02-25 11:11:10 -03:00
config.example.json	Remove ai.enabled configuration support	2026-02-25 11:01:18 -03:00
Makefile	Rename project from lel to aman	2026-02-25 11:11:10 -03:00
pyproject.toml	Rename project from lel to aman	2026-02-25 11:11:10 -03:00
README.md	Rename project from lel to aman	2026-02-25 11:11:10 -03:00
uv.lock	Rename project from lel to aman	2026-02-25 11:11:10 -03:00

README.md

aman

Python X11 STT daemon that records audio, runs Whisper, applies local AI cleanup, and injects text.

Requirements

X11 (Wayland support scaffolded but not available yet)
sounddevice (PortAudio)
faster-whisper
llama-cpp-python
Tray icon deps: gtk3, libayatana-appindicator3
Python deps (core): numpy, pillow, faster-whisper, llama-cpp-python, sounddevice
X11 extras: PyGObject, python-xlib

System packages (example names): portaudio/libportaudio2.

Ubuntu (X11)

sudo apt install -y portaudio19-dev libportaudio2 python3-gi gir1.2-gtk-3.0 libayatana-appindicator3-1

Debian (X11)

sudo apt install -y portaudio19-dev libportaudio2 python3-gi gir1.2-gtk-3.0 libayatana-appindicator3-1

Arch Linux (X11)

sudo pacman -S --needed portaudio gtk3 libayatana-appindicator

Fedora (X11)

sudo dnf install -y portaudio portaudio-devel gtk3 libayatana-appindicator-gtk3

openSUSE (X11)

sudo zypper install -y portaudio portaudio-devel gtk3 libayatana-appindicator3-1

Python Daemon

Install Python deps:

X11 (supported):

uv sync --extra x11

Wayland (scaffold only):

uv sync --extra wayland

Run:

uv run python3 src/aman.py --config ~/.config/aman/config.json

Config

Create ~/.config/aman/config.json:

{
  "daemon": { "hotkey": "Cmd+m" },
  "recording": { "input": "0" },
  "stt": { "model": "base", "device": "cpu" },
  "injection": {
    "backend": "clipboard",
    "remove_transcription_from_clipboard": false
  },
  "vocabulary": {
    "replacements": [
      { "from": "Martha", "to": "Marta" },
      { "from": "docker", "to": "Docker" }
    ],
    "terms": ["Systemd", "Kubernetes"],
    "max_rules": 500,
    "max_terms": 500
  },
  "domain_inference": { "enabled": true, "mode": "auto" }
}

Recording input can be a device index (preferred) or a substring of the device name.
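
The index-or-substring rule can be sketched as follows. This is a minimal illustration, not aman's actual resolver: the function name `resolve_input_device` is hypothetical, and case-insensitive matching is an assumption.

```python
from typing import Sequence


def resolve_input_device(spec: str, device_names: Sequence[str]) -> int:
    """Return the index of the device selected by `spec`.

    `spec` is tried as an integer index first (the preferred form),
    then as a substring of a device name.
    """
    text = spec.strip()
    if not text:
        raise ValueError("recording input must not be empty")
    try:
        index = int(text)
    except ValueError:
        index = None
    if index is not None:
        if not 0 <= index < len(device_names):
            raise ValueError(f"no input device with index {index}")
        return index
    # Assumption: substring matching ignores case; first match wins.
    needle = text.lower()
    for position, name in enumerate(device_names):
        if needle in name.lower():
            return position
    raise ValueError(f"no input device matching {spec!r}")
```

With sounddevice installed, the names could be collected from `sounddevice.query_devices()`, whose entries expose a `name` field. An index is unambiguous, while a substring can match more than one device, which is one reason to prefer the index.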

AI cleanup is always enabled and uses the locked local Llama-3.2-3B GGUF model downloaded to ~/.cache/aman/models/ on first use.

Use -v/--verbose to enable DEBUG logs, including recognized/processed transcript text and llama.cpp logs (llama:: prefix). Without -v, logs are INFO level.
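
A minimal sketch of how a `-v/--verbose` flag can select the log level, using only the standard library. The `--config` default shown here is an assumption, and the helper names are hypothetical; the real argument handling in src/aman.py may differ.

```python
import argparse
import logging


def parse_args(argv=None):
    """Parse the two options described above."""
    parser = argparse.ArgumentParser(prog="aman")
    # Assumption: the default path mirrors the one used in this README.
    parser.add_argument("--config", default="~/.config/aman/config.json")
    parser.add_argument("-v", "--verbose", action="store_true")
    return parser.parse_args(argv)


def log_level(verbose: bool) -> int:
    """DEBUG when verbose is set, INFO otherwise."""
    return logging.DEBUG if verbose else logging.INFO


if __name__ == "__main__":
    args = parse_args()
    logging.basicConfig(level=log_level(args.verbose))
    logging.getLogger("aman").debug("only visible with -v")
```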

Vocabulary correction:

vocabulary.replacements defines deterministic corrections (from -> to).
vocabulary.terms is a preferred spelling list used as hinting context.
Wildcards are intentionally rejected (*, ?, [, ], {, }) to avoid ambiguous rules.
Rules are deduplicated case-insensitively; conflicting replacements are rejected.
The number of replacement rules and terms is capped by max_rules and max_terms, respectively.
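
The replacement rules above can be illustrated with a small validator. This is a sketch under stated assumptions, not aman's implementation: the name `validate_replacements` is hypothetical, wildcards are checked on both sides of a rule, and the limit is applied after deduplication.

```python
WILDCARD_CHARS = set("*?[]{}")


def validate_replacements(rules, max_rules=500):
    """Return deduplicated (source, target) pairs or raise ValueError."""
    seen = {}  # case-folded source -> target
    result = []
    for rule in rules:
        source = rule["from"].strip()
        target = rule["to"].strip()
        if not source:
            raise ValueError("replacement 'from' must not be empty")
        # Assumption: wildcard characters are rejected in both fields.
        if WILDCARD_CHARS & set(source + target):
            raise ValueError(f"wildcards are not allowed: {source!r} -> {target!r}")
        key = source.casefold()
        if key in seen:
            if seen[key] != target:
                raise ValueError(f"conflicting replacements for {source!r}")
            continue  # same rule differing only by case: keep the first
        seen[key] = target
        result.append((source, target))
    # Assumption: the limit counts rules that survive deduplication.
    if len(result) > max_rules:
        raise ValueError(f"too many replacement rules (max {max_rules})")
    return result
```

Rejecting conflicts instead of letting the last rule win keeps the correction deterministic regardless of the order of entries in the config file.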

Domain inference:

domain_inference.mode currently supports auto.
Domain context is advisory only and is used to improve cleanup prompts.
When inference confidence is low, cleanup falls back to a general context.

STT hinting:

Vocabulary is passed to Whisper as hotwords/initial_prompt only when those arguments are supported by the installed faster-whisper runtime.
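
One way to pass hints only when the runtime supports them is to inspect the signature of the transcribe callable. The sketch below assumes that approach; `build_hint_kwargs` is a hypothetical helper, and joining terms with commas is an assumption about the hint format.

```python
import inspect


def build_hint_kwargs(transcribe, terms):
    """Return only the hint keyword arguments that `transcribe` accepts."""
    if not terms:
        return {}
    params = inspect.signature(transcribe).parameters
    hint = ", ".join(terms)  # assumption: a comma-separated hint string
    kwargs = {}
    if "hotwords" in params:
        kwargs["hotwords"] = hint
    if "initial_prompt" in params:
        kwargs["initial_prompt"] = hint
    return kwargs
```

Usage would look like `model.transcribe(audio, **build_hint_kwargs(model.transcribe, terms))`, so an older faster-whisper release without `hotwords` still works and simply receives fewer hints.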

systemd user service

mkdir -p ~/.local/share/aman/src/assets ~/.config/systemd/user
cp src/*.py ~/.local/share/aman/src/
cp src/assets/*.png ~/.local/share/aman/src/assets/
cp systemd/aman.service ~/.config/systemd/user/aman.service
systemctl --user daemon-reload
systemctl --user enable --now aman

Usage

Press the hotkey once to start recording.
Press it again to stop and run STT.
Press Esc while recording to cancel without processing.
Transcript contents are logged only when -v/--verbose is used.
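
The hotkey behavior above amounts to a two-state toggle. A minimal sketch, with hypothetical state, event, and action names that are not taken from the aman source:

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"
    RECORDING = "recording"


class Event(Enum):
    HOTKEY = "hotkey"
    ESCAPE = "escape"


def step(state, event):
    """Return (next_state, action) for one key event."""
    if state is State.IDLE and event is Event.HOTKEY:
        return State.RECORDING, "start_recording"
    if state is State.RECORDING and event is Event.HOTKEY:
        return State.IDLE, "transcribe"
    if state is State.RECORDING and event is Event.ESCAPE:
        return State.IDLE, "discard"
    # Esc while idle is ignored.
    return state, None
```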

Wayland note:

Running under Wayland currently exits with a message explaining that it is not supported yet.
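
A common way to detect a Wayland session is to check the `XDG_SESSION_TYPE` and `WAYLAND_DISPLAY` environment variables. The sketch below assumes that approach; whether aman uses exactly these checks is not stated here, and `is_wayland_session` is a hypothetical name.

```python
import os


def is_wayland_session(env=None):
    """Best-effort check for a Wayland session based on the environment."""
    env = os.environ if env is None else env
    if env.get("XDG_SESSION_TYPE", "").lower() == "wayland":
        return True
    # WAYLAND_DISPLAY is set by the compositor for Wayland clients.
    return bool(env.get("WAYLAND_DISPLAY"))


if __name__ == "__main__":
    if is_wayland_session():
        raise SystemExit("Wayland is not supported yet; run under X11.")
```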

Injection backends:

clipboard: copy to clipboard and inject via Ctrl+Shift+V (GTK clipboard + XTest)
injection: type the text with simulated keypresses (XTest)
injection.remove_transcription_from_clipboard: when true and backend is clipboard, restores/clears the clipboard after paste so the transcript is not kept there
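
The clipboard backend with remove_transcription_from_clipboard can be sketched as save, paste, restore. The `clipboard` object (with `get`/`set` methods) and the `send_paste` callable are hypothetical stand-ins for the GTK clipboard and the XTest key event; this is not the project's actual code.

```python
def paste_via_clipboard(text, clipboard, send_paste, remove_after=False):
    """Put `text` on the clipboard, trigger a paste, optionally restore."""
    previous = clipboard.get()  # may be None when the clipboard is empty
    clipboard.set(text)
    try:
        send_paste()  # e.g. simulate the paste shortcut
    finally:
        if remove_after:
            # Restore the earlier content, or clear if there was none.
            clipboard.set(previous if previous is not None else "")
```

Restoring in a `finally` block means the transcript does not linger on the clipboard even if sending the paste keystroke fails.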

AI processing:

Local llama.cpp model only (no remote provider configuration).

Control:

make run
make check