Switch to sounddevice recording

2026-02-24 10:25:21 -03:00
commit b6c0fc0793
parent afdf088d17
9 changed files with 250 additions and 468 deletions
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -2,15 +2,15 @@

 ## Project Structure & Module Organization

- `lel.sh` is the primary entrypoint; it records audio, runs `whisper`, and prints the transcript.
- `env/` is a local Python virtual environment (optional) used to install runtime dependencies.
- There are no separate source, test, or asset directories at this time.
+- `src/leld.py` is the primary entrypoint (X11 transcription daemon).
+- `src/recorder.py` handles audio capture using PortAudio via `sounddevice`.
+- `src/stt.py` wraps faster-whisper for transcription.

 ## Build, Test, and Development Commands

- `./lel.sh` streams transcription from the microphone until you press Enter.
- Example with overrides: `WHISPER_MODEL=small WHISPER_LANG=pt WHISPER_DEVICE=cuda ./lel.sh`.
- Dependencies expected on PATH: `ffmpeg` and `whisper` (the OpenAI Whisper CLI).
+- Install deps: `uv sync`.
+- Run daemon: `uv run python3 src/leld.py --config ~/.config/lel/config.json`.
+- Open settings: `uv run python3 src/leld.py --settings --config ~/.config/lel/config.json`.

 ## Coding Style & Naming Conventions

@@ -30,7 +30,5 @@

 ## Configuration Tips

- Audio input is controlled via `WHISPER_FFMPEG_IN` (default `pulse:default`), e.g., `alsa:default`.
- Streaming is on by default; set `WHISPER_STREAM=0` to transcribe after recording.
- Segment duration for streaming is `WHISPER_SEGMENT_SEC` (default `5`).
+- Audio input is controlled via `WHISPER_FFMPEG_IN` (device index or name).
 - Model, language, device, and extra args can be set with `WHISPER_MODEL`, `WHISPER_LANG`, `WHISPER_DEVICE`, and `WHISPER_EXTRA_ARGS`.
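Note on the new `WHISPER_FFMPEG_IN` semantics: the updated tip says the value is a "device index or name", but the diff above does not show how `src/recorder.py` interprets it. The following is only a minimal sketch of one way such a value could be normalized before it reaches `sounddevice`. The helper names `parse_device_spec` and `device_from_env` are hypothetical and are not taken from this commit.

```python
import os
from typing import Mapping, Optional, Union

# A device is an integer index, a name (or name fragment), or None for the default.
DeviceSpec = Optional[Union[int, str]]


def parse_device_spec(raw: Optional[str]) -> DeviceSpec:
    """Normalize a raw setting into an index (int), a name (str), or None.

    Hypothetical helper: the real parsing in src/recorder.py may differ.
    """
    if raw is None:
        return None
    text = raw.strip()
    if not text:
        # Empty or whitespace-only values fall back to the default device.
        return None
    if text.isdecimal():
        # Purely numeric values are treated as a device index.
        return int(text)
    # Anything else is kept as a device name.
    return text


def device_from_env(env: Mapping[str, str] = os.environ) -> DeviceSpec:
    """Read WHISPER_FFMPEG_IN from the given environment mapping."""
    return parse_device_spec(env.get("WHISPER_FFMPEG_IN"))
```

Under these assumptions, the result could be passed as the `device` argument of `sounddevice.InputStream`, which accepts a numeric device ID, a device name substring, or `None` for the default input device.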