On a functional level, it's the fastest and most convenient input method when you're not in front of a proper keyboard. Which, even for me, constitutes a big chunk of my life 🙃.
Taking a voice note allows me to quickly close the open loop of a stray thought or a new idea and go on with my life, now mentally unburdened.
Another important aspect is that taking voice notes evokes a certain experience of "fluency" for me.
That is because when you're taking a voice note, you can record a stream of consciousness without having to re-formulate or filter things.
This lets you avoid switching away from the context of the original thought and breaking the flow to enter an "editor mode".
This is the reason I'll occasionally take voice notes even when I'm in front of a proper keyboard.
See Iterating fast: voice dictation as a form of babble for more on this.
It's worth noting that in this context I'm specifically talking about taking a voice note, as opposed to doing voice typing/dictation.
Voice notes allow me to better harness the fruits of the diffuse thinking mode (shower thoughts).
These days I take most of my voice notes using a smartwatch, which I've optimized so that starting a new recording takes just one click.
This further contributes to the "fluency" aspect of voice notes by minimizing the friction of starting a new note.
Notes are automatically uploaded to Google Drive.
From there, a Pipedream workflow transcribes the notes using the OpenAI API and sends the transcriptions to Roam.
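A rough, stdlib-only sketch of what that transcription step could look like. The endpoint and `whisper-1` model are OpenAI's public transcription API, but the helper names and the Roam block format are just illustrative assumptions, not the actual workflow code:

```python
import json
import urllib.request
import uuid

# OpenAI's audio transcription endpoint (check current docs for changes).
OPENAI_URL = "https://api.openai.com/v1/audio/transcriptions"

def transcribe(audio_path: str, api_key: str) -> str:
    """Upload an audio file as multipart form data and return the transcript."""
    boundary = uuid.uuid4().hex
    with open(audio_path, "rb") as f:
        audio = f.read()
    # Build the multipart body by hand to stay dependency-free.
    body = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        "whisper-1\r\n"
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="file"; filename="note.m4a"\r\n'
        "Content-Type: audio/mp4\r\n\r\n"
    ).encode() + audio + f"\r\n--{boundary}--\r\n".encode()
    req = urllib.request.Request(
        OPENAI_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["text"]

def format_for_roam(transcript: str) -> str:
    """Turn a raw transcript into a Roam-style block (illustrative format)."""
    return f"- {transcript} #[[voice note]]"
```

In a Pipedream workflow each of these would be a step: the Drive trigger hands over the new file, `transcribe` calls the API, and `format_for_roam` prepares the block for ingestion.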
You should probably be able to replicate the above workflow (or future iterations of it). I've developed mine independently, and the key differences at the moment are:
Use multi-modal input with GPT-4o to give it the audio directly (vs using Whisper to transcribe and an LLM to post-process)
I speculate that this approach can give better results, but I haven't really compared them, and I'm not sure whether it's worth the increased per-token cost.
Use the Roam Matrix integration for ingestion rather than the Roam API, mainly for historical reasons: I built this before the backend API became available.
I might switch over to using the API at some point, but for now this lets me customize things more to my liking.
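For the multi-modal variant, the request might look roughly like this. It's a sketch under assumptions: the `gpt-4o-audio-preview` model name and the `input_audio` content type come from OpenAI's audio support in chat completions, so verify against the current docs before relying on them:

```python
import base64

def build_audio_request(audio_bytes: bytes, prompt: str) -> dict:
    """Build a chat-completions payload that hands audio directly to a
    multi-modal model, instead of transcribing first and post-processing.
    Model name and audio format here are assumptions, not confirmed values."""
    return {
        "model": "gpt-4o-audio-preview",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {
                    "type": "input_audio",
                    "input_audio": {
                        # Audio is sent inline as base64.
                        "data": base64.b64encode(audio_bytes).decode(),
                        "format": "wav",
                    },
                },
            ],
        }],
    }
```

The payload would then be POSTed to the chat completions endpoint, so a single call both transcribes and cleans up the note, which is exactly where the per-token cost trade-off mentioned above comes in.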
After the transcribed entries are ingested into Roam, I process them by doing one of the following:
immediate execution
add SRS metadata
just fix up the note and add references to existing concept pages
Previous iterations of this setup used otter.ai and voice messages in Matrix chat.