Add Vosk keystroke eval tooling and findings

Thales Maciel 2026-02-28 17:20:09 -03:00
parent 8c1f7c1e13
commit 510d280b74
15 changed files with 2219 additions and 0 deletions

@ -294,6 +294,51 @@ aman bench --text-file ./bench-input.txt --repeat 20 --json
the processing path from input transcript text through alignment/editor/fact-guard/vocabulary cleanup and
prints timing summaries.
Internal Vosk exploration (fixed-phrase dataset collection):
```bash
aman collect-fixed-phrases \
--phrases-file exploration/vosk/fixed_phrases/phrases.txt \
--out-dir exploration/vosk/fixed_phrases \
--samples-per-phrase 10
```
This internal command prompts for each allowed phrase and records labeled WAV
samples with manual start/stop (Enter to start, Enter to stop). It does not run
Vosk decoding and does not execute desktop commands. Output includes:
- `exploration/vosk/fixed_phrases/samples/`
- `exploration/vosk/fixed_phrases/manifest.jsonl`
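
After a collection session it can help to confirm that every phrase reached the requested sample count. The following is a minimal sketch, assuming each `manifest.jsonl` line is a JSON object with a `phrase` field; that field name and both helper functions are hypothetical and not taken from the aman codebase:

```python
import json
from collections import Counter


def count_samples_per_phrase(manifest_lines, phrase_key="phrase"):
    """Count recorded samples per phrase label in JSONL manifest lines.

    `phrase_key` is an assumed field name; adjust it to the real schema.
    """
    counts = Counter()
    for line in manifest_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the manifest
        record = json.loads(line)
        counts[record[phrase_key]] += 1
    return counts


def phrases_below_target(counts, expected_phrases, target):
    """Return the phrases with fewer than `target` samples, sorted."""
    return sorted(p for p in expected_phrases if counts.get(p, 0) < target)
```

To use it, pass an open file handle for the manifest as `manifest_lines`, the phrases from `phrases.txt` as `expected_phrases`, and the `--samples-per-phrase` value as `target`.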
Internal Vosk exploration (keystroke dictation: literal vs NATO):
```bash
# collect literal-key dataset
aman collect-fixed-phrases \
--phrases-file exploration/vosk/keystrokes/literal/phrases.txt \
--out-dir exploration/vosk/keystrokes/literal \
--samples-per-phrase 10
# collect NATO-key dataset
aman collect-fixed-phrases \
--phrases-file exploration/vosk/keystrokes/nato/phrases.txt \
--out-dir exploration/vosk/keystrokes/nato \
--samples-per-phrase 10
# evaluate both grammars across available Vosk models
aman eval-vosk-keystrokes \
--literal-manifest exploration/vosk/keystrokes/literal/manifest.jsonl \
--nato-manifest exploration/vosk/keystrokes/nato/manifest.jsonl \
--intents exploration/vosk/keystrokes/intents.json \
--output-dir exploration/vosk/keystrokes/eval_runs \
--models-file exploration/vosk/keystrokes/models.example.json
```
`eval-vosk-keystrokes` writes a structured report (`summary.json`) with:
- intent accuracy and unknown-rate by grammar
- per-intent/per-letter confusion tables
- latency (avg/p50/p95), real-time factor (RTF), and model-load time
- strict grammar compliance checks (out-of-grammar hypotheses hard-fail the model run)
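
For orientation, the accuracy and latency metrics listed above could be derived from per-sample results roughly as follows. This is an illustrative sketch, not the command's actual implementation: the result keys (`expected`, `predicted`, `latency_s`, `audio_s`), the nearest-rank percentile method, and the aggregate RTF definition (total decode time divided by total audio duration) are all assumptions.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile (pct in 0..100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(results):
    """Summarize per-sample results for one grammar/model pair.

    Each result is a dict with hypothetical keys:
      expected  - the labeled intent
      predicted - the decoded intent, or None when nothing matched
      latency_s - decode time in seconds
      audio_s   - audio duration in seconds
    """
    total = len(results)
    correct = sum(1 for r in results if r["predicted"] == r["expected"])
    unknown = sum(1 for r in results if r["predicted"] is None)
    latencies = [r["latency_s"] for r in results]
    audio_total = sum(r["audio_s"] for r in results)
    return {
        "intent_accuracy": correct / total,
        "unknown_rate": unknown / total,
        "latency_avg_s": sum(latencies) / total,
        "latency_p50_s": percentile(latencies, 50),
        "latency_p95_s": percentile(latencies, 95),
        # RTF below 1.0 means decoding ran faster than real time.
        "rtf": sum(latencies) / audio_total,
    }
```

In this sketch an unknown prediction counts against accuracy as well as toward the unknown rate, so the two figures are not complementary.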
Model evaluation lab (dataset + matrix sweep):
```bash
@ -344,6 +389,8 @@ aman run --config ~/.config/aman/config.json
aman doctor --config ~/.config/aman/config.json --json
aman self-check --config ~/.config/aman/config.json --json
aman bench --text "example transcript" --repeat 5 --warmup 1
aman collect-fixed-phrases --phrases-file exploration/vosk/fixed_phrases/phrases.txt --out-dir exploration/vosk/fixed_phrases --samples-per-phrase 10
aman eval-vosk-keystrokes --literal-manifest exploration/vosk/keystrokes/literal/manifest.jsonl --nato-manifest exploration/vosk/keystrokes/nato/manifest.jsonl --intents exploration/vosk/keystrokes/intents.json --output-dir exploration/vosk/keystrokes/eval_runs --json
aman build-heuristic-dataset --input benchmarks/heuristics_dataset.raw.jsonl --output benchmarks/heuristics_dataset.jsonl --json
aman eval-models --dataset benchmarks/cleanup_dataset.jsonl --matrix benchmarks/model_matrix.small_first.json --heuristic-dataset benchmarks/heuristics_dataset.jsonl --heuristic-weight 0.25 --json
aman sync-default-model --check --report benchmarks/results/latest.json --artifacts benchmarks/model_artifacts.json --constants src/constants.py