Switch to sounddevice recording

Thales Maciel 2026-02-24 10:25:21 -03:00
parent afdf088d17
commit b6c0fc0793
9 changed files with 250 additions and 468 deletions

@@ -2,15 +2,15 @@
 ## Project Structure & Module Organization
-- `lel.sh` is the primary entrypoint; it records audio, runs `whisper`, and prints the transcript.
-- `env/` is a local Python virtual environment (optional) used to install runtime dependencies.
-- There are no separate source, test, or asset directories at this time.
+- `src/leld.py` is the primary entrypoint (X11 transcription daemon).
+- `src/recorder.py` handles audio capture using PortAudio via `sounddevice`.
+- `src/stt.py` wraps faster-whisper for transcription.
 ## Build, Test, and Development Commands
-- `./lel.sh` streams transcription from the microphone until you press Enter.
-- Example with overrides: `WHISPER_MODEL=small WHISPER_LANG=pt WHISPER_DEVICE=cuda ./lel.sh`.
-- Dependencies expected on PATH: `ffmpeg` and `whisper` (the OpenAI Whisper CLI).
+- Install deps: `uv sync`.
+- Run daemon: `uv run python3 src/leld.py --config ~/.config/lel/config.json`.
+- Open settings: `uv run python3 src/leld.py --settings --config ~/.config/lel/config.json`.
 ## Coding Style & Naming Conventions
@@ -30,7 +30,5 @@
 ## Configuration Tips
-- Audio input is controlled via `WHISPER_FFMPEG_IN` (default `pulse:default`), e.g., `alsa:default`.
-- Streaming is on by default; set `WHISPER_STREAM=0` to transcribe after recording.
-- Segment duration for streaming is `WHISPER_SEGMENT_SEC` (default `5`).
+- Audio input is controlled via `WHISPER_FFMPEG_IN` (device index or name).
 - Model, language, device, and extra args can be set with `WHISPER_MODEL`, `WHISPER_LANG`, `WHISPER_DEVICE`, and `WHISPER_EXTRA_ARGS`.
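
Reviewer note: the updated tip says `WHISPER_FFMPEG_IN` now accepts a device index or name. The sketch below shows one way such a value could be resolved and passed to `sounddevice`. It is not taken from `src/recorder.py`: the helper names `parse_input_device` and `record_seconds`, the 16 kHz sample rate, and the mono `float32` format are all assumptions made for illustration.

```python
from typing import Optional, Union

Device = Union[int, str]


def parse_input_device(raw: Optional[str]) -> Optional[Device]:
    """Turn a raw setting into a device selector (hypothetical helper).

    Returns None for a missing or blank value (use the default input device),
    an int when the value is a plain decimal index, and otherwise the
    trimmed string, to be matched against device names.
    """
    if raw is None:
        return None
    value = raw.strip()
    if not value:
        return None
    if value.isdecimal():
        return int(value)
    return value


def record_seconds(seconds: float, device: Optional[Device] = None,
                   samplerate: int = 16000):
    """Record a fixed-length mono clip (hypothetical helper).

    sounddevice is imported lazily so the parsing helper above stays usable
    on machines without PortAudio installed.
    """
    import sounddevice as sd

    frames = int(seconds * samplerate)
    # sd.rec forwards `device` to the underlying input stream; it accepts
    # either a numeric index or a (sub)string of the device name.
    audio = sd.rec(frames, samplerate=samplerate, channels=1,
                   dtype="float32", device=device)
    sd.wait()  # block until the recording has finished
    return audio[:, 0]  # flatten (frames, 1) to a 1-D array of samples


# Example use (assumes a working input device):
#   import os
#   device = parse_input_device(os.environ.get("WHISPER_FFMPEG_IN"))
#   samples = record_seconds(3.0, device)
```

Keeping the parsing separate from the recording call means the index-versus-name decision can be checked without audio hardware.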