Add Vosk keystroke eval tooling and findings

2026-02-28 17:20:09 -03:00
commit 510d280b74
parent 8c1f7c1e13
15 changed files with 2219 additions and 0 deletions
--- a/exploration/vosk/keystrokes/findings.md
+++ b/exploration/vosk/keystrokes/findings.md
@@ -0,0 +1,31 @@
+# Vosk Keystroke Grammar Findings
+
+- Date (UTC): 2026-02-28
+- Run ID: `run-20260228T200047Z`
+- Dataset size:
+  - Literal grammar: 90 samples
+  - NATO grammar: 90 samples
+- Intents: 9 (`ctrl|shift|ctrl+shift` x `d|b|p`)
+
+## Results
+
+| Model | Literal intent accuracy | NATO intent accuracy | Literal p50 latency | NATO p50 latency |
+|---|---:|---:|---:|---:|
+| `vosk-small-en-us-0.15` | 71.11% | 100.00% | 26.07 ms | 26.38 ms |
+| `vosk-en-us-0.22-lgraph` | 74.44% | 100.00% | 210.34 ms | 214.97 ms |
+
+## Main Error Pattern (Literal Grammar)
+
+- Letter confusion is concentrated on `p -> b`:
+  - `control p -> control b`
+  - `shift p -> shift b`
+  - `control shift p -> control shift b`
+
+## Takeaways
+
+- NATO grammar reached 100% intent accuracy on both tested models on the 90-sample NATO set, versus 71-74% for the literal grammar.
+- `vosk-small-en-us-0.15` is the practical default for command-keystroke experiments: it matches the larger model's 100% NATO accuracy at roughly 8x lower p50 latency (about 26 ms vs. 210-215 ms).
+
+## Raw Report
+
+- `exploration/vosk/keystrokes/eval_runs/run-20260228T200047Z/summary.json`
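Reviewer note: the findings describe 9 intents (3 modifiers x 3 letters), each spoken under a literal and a NATO grammar. The sketch below shows one way that setup could be modelled and scored. It is not taken from the commit's tooling: the function names, the intent label format (`ctrl+p`), and the spoken forms of the NATO phrases (for example `control papa`) are assumptions for illustration. Only the literal phrases such as `control p` appear in the findings. The JSON phrase list is the form Vosk's `KaldiRecognizer` accepts as its optional grammar argument. Whether the eval tooling includes `[unk]` is unknown.

```python
import json
from itertools import product

# Modifier and letter sets from the findings: 3 x 3 = 9 intents.
MODIFIERS = ["ctrl", "shift", "ctrl+shift"]
LETTERS = ["d", "b", "p"]

# Assumed spoken forms; the commit's real grammar files may differ.
SPOKEN_MODIFIER = {
    "ctrl": "control",
    "shift": "shift",
    "ctrl+shift": "control shift",
}
NATO_WORD = {"d": "delta", "b": "bravo", "p": "papa"}


def build_phrase_map(use_nato):
    """Map each spoken phrase to an intent label, e.g. 'control papa' -> 'ctrl+p'."""
    phrases = {}
    for mod, letter in product(MODIFIERS, LETTERS):
        spoken_letter = NATO_WORD[letter] if use_nato else letter
        phrases[f"{SPOKEN_MODIFIER[mod]} {spoken_letter}"] = f"{mod}+{letter}"
    return phrases


def grammar_json(phrase_map):
    """Serialize the phrases as a JSON list; '[unk]' is an assumed catch-all entry."""
    return json.dumps(list(phrase_map) + ["[unk]"])


def intent_accuracy(pairs, phrase_map):
    """pairs holds (expected_intent, recognized_text); returns the fraction correct."""
    if not pairs:
        return 0.0
    hits = sum(
        1 for expected, text in pairs if phrase_map.get(text.strip()) == expected
    )
    return hits / len(pairs)


if __name__ == "__main__":
    literal = build_phrase_map(use_nato=False)
    # One 'p -> b' confusion out of two samples, mirroring the reported error pattern.
    sample = [("ctrl+p", "control b"), ("ctrl+d", "control d")]
    print(grammar_json(literal))
    print(intent_accuracy(sample, literal))
```

Scoring at the intent level rather than on raw text means a `p -> b` confusion counts as one miss regardless of how the modifier was transcribed, which matches how the table reports accuracy.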