Comparing Google's Eloquent to Open-source Alternatives
Benchmarking Google’s new Eloquent app against popular open-weight models on 50 transcripts from daily engineering work.
Raw model output (no cleanup) · WER = word error rate · good ≤15% · acceptable ≤35% · poor >35% · Eloquent ran via the app (capture-affected rows are tagged "truncated" or "partly truncated"); the other three ran directly on the audio files
44 of 44 samples
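The WER figures and rating bands in the legend can be sketched in a few lines of Python. This is a minimal illustration, not the benchmark's actual scoring code: the function names (`normalize`, `wer`, `rate`) are hypothetical, and the normalization (lowercasing, treating hyphens as spaces, dropping other punctuation) is an assumption that happens to reproduce several rows of the table, such as samples 3 and 37.

```python
import re


def normalize(text: str) -> list[str]:
    """Assumed normalization: lowercase, hyphens become spaces, other punctuation is dropped."""
    text = text.lower().replace("-", " ")
    text = re.sub(r"[^\w\s]", "", text)
    return text.split()


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by the reference word count."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Rolling-row Levenshtein distance over words
    # (substitutions, insertions, and deletions each cost 1).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def rate(score: float) -> str:
    """Rating bands from the legend: good <= 15%, acceptable <= 35%, poor > 35%."""
    if score <= 0.15:
        return "good"
    if score <= 0.35:
        return "acceptable"
    return "poor"


# Sample 3: "Onit" heard as "on it" is one substitution plus one insertion.
score = wer("Testing out Onit", "testing out on it.")
print(f"{score:.1%} {rate(score)}")  # 66.7% poor
```

Because the denominator is the reference length, insertions can push WER above 100%, which is why a long refusal message scored against a short target can exceed that figure.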
Mean WER per model is shown in each column header · lower is better

| # | Category | Target | Eloquent (Google) · mean WER 60.0% · via app, BlackHole capture | Qwen3-ASR · mean WER 18.7% · direct on audio files | Parakeet · mean WER 24.4% · direct on audio files | Gemma 3n (E2B) · mean WER 62.4% · direct, open sibling of Eloquent's engine |
|---|---|---|---|---|---|---|
| 1 | General | Or if there's some way to do this dynamically so that it's normalized to a standard audio playback range, that would be great too! | Acceptable (WER 33.3%, partly truncated): if there's some way to do this dynamically. standard audio playback. that would be great, too. | Good (WER 0%): Or if there's some way to do this dynamically so that it's normalized to a standard audio playback range, that would be great too. | Good (WER 8.3%): Or if there's some way to do this dynamically, so that it's no lise to a standard audio playback range that would be great too. | Acceptable (WER 20.8%): There's some way to do this dynamically so that it's normalized to a standard audio playback range that we'd be good to. |
| 2 | General | Also, in this build, I getting the error message that Onit is trying to record my screen. I've already given it screen recording permissions, so why does this keep showing up? | Poor (WER 61.3%, partly truncated): I. that on and it's trying to work my screen. screen recording permissions. Why does this keep | Acceptable (WER 32.3%): Also, this building, you get a gear message that on it is trying to record my screen time. I've already given it screen recording permissions. Why does this keep showing up? | Acceptable (WER 25.8%): Also, with this build, I keep getting here message that on if it's trying to record my screen, I've already given it screen recording permissions. Why does this keep showing up? | Poor (WER 93.5%): I'm sorry, but I can't transcribe the speech in this audio. It appears to be unintelligible. |
| 3 | General | Testing out Onit | Poor (WER 66.7%): testing out on it. | Poor (WER 66.7%): Testing out on it. | Poor (WER 100%): Passing out on it. | Poor (WER 66.7%): Testing out on it. |
| 4 | General | I actually think the worst thing was that the dictionary silently failed the first time because the model was still downloading. | Poor (WER 85.7%, truncated): thing was that | Good (WER 14.3%): I actually think the worst thing was that dictionaries I only failed the first time because the model was still downloading. | Good (WER 9.5%): I actually think the worst thing was that the dictionary sign only failed the first time because the model was still downloading. | Good (WER 14.3%): I think the worst thing was that the dictionary sign only failed the first time because the model was still downloading. |
| 5 | General | How come we're hiding the microphoneBarArea and suggestionBar when we're in trackpad mode? | Poor (WER 84.6%): hiding the microphone. | Poor (WER 38.5%): How come we're hiding the microphone bar area and suggestion bar? We're in trackpad mode. | Poor (WER 84.6%): How come we're hiding the microphone bar area and suggestion bar? Where can you track that node? | Poor (WER 84.6%): How many are hiding in the next bar area and suggestion bar? |
| 6 | General | I agree with his assessment that not a lot of people are going to want to use the ZoomAudioDevice. | Poor (WER 42.1%): Parker this assessment that not a lot of people are going to want to use. use the Zoom audio device. | Acceptable (WER 21.1%): I agree with this assessment that not a lot of people are going to want to use the Zoom audio device. | Acceptable (WER 31.6%): I heard this assessment that not a lot of people are going to want to use the zoom audio device. | Acceptable (WER 21.1%): I agree with this assessment that not a lot of people are going to want to use the Zoom audio device. |
| 7 | General | Which backend are you using in the offline test? Are you using coreML or FluidAudio? | Acceptable (WER 26.7%, partly truncated): Which backend are you using in the offline test? Are you using Core ML or fluid audio? | Acceptable (WER 33.3%): Which backend are you using in the offline test? Are using Core ML or Fluid Audio? | Acceptable (WER 26.7%): Which backend are you using in the offline test are you using Core ML or Fluid Audio? | Poor (WER 113.3%): I'm sorry, but I am unable to transcribe the speech in this audio as it is currently unavailable. |
| 8 | Numbers | Also in dictations that don't contain the word Onit and don't really contain anything that is close to the word Onit, I am seeing in our debug view that it's detected with scores under -10. | Poor (WER 37.1%, partly truncated): Also in dictations that don't contain the word on it, don't really contain anything that is close to the word on it. I am seeing in our | Acceptable (WER 17.1%): Also, in dictations that don't contain the word on it, and don't really contain anything that is close to the word on it, I am seeing in our debug view that it's detected with scores under negative ten. | Good (WER 14.3%): Also in dictations that don't contain the word onit and don't really contain anything that are is close to the word on it, I am seeing in our debug view that it's detected with scores under negative ten. | Acceptable (WER 34.3%): also indications of down contain the word on it and don't really contain anything better. He's close to the word on it. I am seeing in our debug view that it's detected with scores under negative ten. |
| 9 | General | Anyway, I want to investigate the onboarding analytics events. | Poor (WER 88.9%): asking the on-boarding analytics about | Good (WER 0%): Anyway, I want to investigate the onboarding analytics events. | Good (WER 0%): Anyway, I want to investigate the onboarding analytics events. | Acceptable (WER 33.3%): I want to investigate the on-boarding analytics events. |
| 10 | General | Does this code base use a Levenshtein distance for the custom dictionary? | Poor (WER 41.7%): This could be a use a little distance for the custom dictionary. | Acceptable (WER 25%): Does this codebase use the Levenshtein distance for the custom dictionary? | Good (WER 8.3%): Does this code base use a Levenstein distance for the custom dictionary? | Poor (WER 66.7%): This could be used as a live event thing distance for the custom dictionary. |
| 11 | General | Does this branch contain logic for positioning the correct and teach Onit UI near the pasted text? | Acceptable (WER 23.5%, partly truncated): Does this branch contain logic for positioning the correct and teach on it EY near the patient text. | Good (WER 11.8%): Does this branch contain logic for positioning the correct and teach on it UI near the pasted text? | Good (WER 11.8%): Does this branch contain logic for positioning the correct and teach on a UI near the pasted text? | Acceptable (WER 29.4%): Does this branch contain logic for my positioning the correct teach on a YW near the pasted text? |
| 12 | General | Quick edit is enabled, but I think that's going to be unrelated to the fix and teach Onit dialogue. | Good (WER 10.5%, partly truncated): quick edit is enabled, but I think that's going to be unrelated to the fix and teach on it dialogue. | Acceptable (WER 15.8%): Quick edit is enabled, but I think that's going to be unrelated to the fix and teach on it dialog. | Acceptable (WER 26.3%): A quick edit is enabled, but I think that's gonna be unrelated to the fix and teach on it dialogue. | Poor (WER 42.1%): I think that's gonna be unrelated to the fix and teach content dialogue. |
| 13 | General | the delete keys would misalign all of the following keys. | Acceptable (WER 20%, partly truncated): But leak keys would misalign all of the following keys. | Acceptable (WER 20%): Which keys would misalign all of the following keys? | Acceptable (WER 20%): Keys would misalign all of the following keys. | Poor (WER 100%): I |
| 14 | General | There used to be some page in a wrong word that was like the transcription welcome page that asked you turn on the feature or not. | Poor (WER 61.5%, partly truncated): to be some patient or I'm warning that was like page that asks you to turn. not. | Acceptable (WER 15.4%): There used to be some page in our homeboarding that was like the transcription welcome page that asked you to turn on the feature or not. | Acceptable (WER 19.2%): There used to be some page in our own boarding that was like the transcription welcome page that asked you to determine the feature or not. | Poor (WER 50%): Please be some patient or I'm worried that it's like a transcription. Welcome page that asks you to term on the feature or not. |
| 15 | General | I'm able to run the KeyboardEval bundle. | Poor (WER 100%, truncated): (no output) | Acceptable (WER 28.6%): I'm able to run the keyboard eval bundle. | Poor (WER 57.1%): Mill to run the keyboard eval bundle. | Poor (WER 157.1%): I'm unable to transcribe the speech as it is not present in the audio. |
| 16 | General | Yeah, so we'd found some examples where the Levenstein distance was not giving good results. | Acceptable (WER 20%, partly truncated): So we found some examples where the distance was not giving good results. | Acceptable (WER 26.7%): So we found some examples where that same distance was not giving good results. | Acceptable (WER 33.3%): Soviet found some examples where the momentum distance was not given good results. | Poor (WER 113.3%): I'm sorry, but I am unable to transcribe the speech in this audio as it is currently unavailable. |
| 17 | General | And I suspect this is going to be a case with a lot of dictionary terms that people add. | Good (WER 10.5%, partly truncated): And I suspect this is going to be a case with a lot of terms that people have. | Good (WER 10.5%): And I suspect this is going to be a case with a lot of t-shirting terms that people add. | Poor (WER 36.8%): And I suspect those are gonna be a case with a lot of t sharing terms that people have. | Acceptable (WER 31.6%): I suspect this could be a case with a lot of teaching terms that people have |
| 18 | General | So in our Onit example it gets difficult because you often use that in a sentence. | Poor (WER 81.2%): So, in our auditing in the case of go because | Good (WER 12.5%): So, in our auditing, it gets difficult because you often use that in a sentence. | Acceptable (WER 18.8%): So in our auditive input gets difficult because you often use that in a sentence. | Poor (WER 162.5%): I'm sorry, I am unable to transcribe the speech in this audio. The audio is too faint and unclear for me to accurately identify the words spoken. |
| 19 | General | We were working on replacing the Levenshtein distance algorithm with a perplexity check. | Poor (WER 38.5%, partly truncated): because we were working on algorithm with a perplexity check. | Good (WER 0%): We were working on replacing the Levenshtein distance algorithm with a perplexity check. | Good (WER 7.7%): We were working on replacing the Levenstein distance algorithm with a perplexity check. | Good (WER 7.7%): We are working on replacing the Levenshtein distance algorithm with a perplexity check. |
| 20 | General | We were testing this eval. | Poor (WER 100%, truncated): (no output) | Poor (WER 40%): We're testing this eval. | Poor (WER 40%): We're testing this eval. | Poor (WER 60%): I'm testing this eQual. |
| 21 | General | Can you open up all of the examples and what the model decided in a viewer so I can look through all of them? | Poor (WER 45.8%): you open all the examples and I'm not all decided, you know, if you were, so I can look through all of them. | Acceptable (WER 20.8%): Can you open all the examples and all decided in a viewer, so I can look through all of them? | Acceptable (WER 25%): I can you open all of the examples come home decided in a viewer so I can look through all of them. | Poor (WER 54.2%): I can't help you open all these examples. I'm not decided, and a few were so I can look through all of them. |
| 22 | Numbers | It looks like in your implementation we’re using 18 as the average. | Poor (WER 61.5%, truncated): Looks like in your implementation. | Acceptable (WER 30.8%): It's like in your implementation, using 18 as the average. | Acceptable (WER 30.8%): Looks like in your implementation we're using eighteen as the average. | Poor (WER 53.8%): It looks like in your computation we're in 18th century average. |
| 23 | General | I was surprised that it got the letter I in with wrong. | Acceptable (WER 33.3%, partly truncated): . I was surprised that I got the with wrong. | Acceptable (WER 16.7%): I was surprised that I got the letter I in width wrong. | Acceptable (WER 16.7%): I was surprised that I got the letter I in width wrong. | Good (WER 8.3%): I was surprised that I got the letter I in with wrong. |
| 24 | General | Okay, I just tried to run all the sessions. Can you look at the folder and see how many we got through? | Poor (WER 40.9%): Okay, I just tried to. continue the boulder and see how many we got through. | Good (WER 9.1%): Okay, I just tried to run all the sessions. Can you the folder and see how many we got through? | Good (WER 13.6%): Okay, I just tried to run all the sessions. Can you put a folder and see how many we got through? | Poor (WER 50%): Okay, just tried to run all the sessions. Can you tell me if you even got through it? |
| 25 | Numbers | Okay, yeah, can do number 1? | Poor (WER 100%, truncated): (no output) | Acceptable (WER 33.3%): Okay. Yeah. Can you do number one? | Poor (WER 66.7%): Okay, back in unit number one. | Poor (WER 50%): Okay, can we do number one? |
| 26 | Numbers | Choose 10 more terms. Choose a proxy for each term. Run 40 more examples for each one. And let me know if that threshold works for all of them. | Poor (WER 89.7%, truncated): choose to more terms. | Good (WER 13.8%): Choose ten more terms. Choose a proxy for each term. Run forty more examples for each of them, and let me know if that threshold works for all of them. | Good (WER 13.8%): We'll choose ten more terms. Choose a proxy for each term. Run forty more examples for each other. And let me know if that threshold works for all of them. | Poor (WER 44.8%): Just enter more terms. Just probably for each term. 40 marks each will be given. Let me know if that approach works for all of them. |
| 27 | General | Okay, in another branch we are adding a delta for the NLL comparisons. | Poor (WER 92.3%): Okay. You know. | Acceptable (WER 15.4%): In another branch, we are adding a delta for NLL comparisons. | Acceptable (WER 15.4%): Okay, in another branch we are adding a delta F4 NLL comparisons. | Poor (WER 46.2%): in another branch we are adding a down the four in a comparisons. |
| 28 | General | The data is usually linearly inseparable, so long as you choose the right proxy. | Poor (WER 100%, truncated): (no output) | Poor (WER 35.7%): Placement is usually linear and separable, so long as you choose the right proxy. | Acceptable (WER 21.4%): Replacement is usually inseparable, so long as you choose the right proxy. | Poor (WER 121.4%): I'm sorry, I'm not able to transcribe the speech in this audio. It appears to be unintelligible. |
| 29 | General | The delta changes depending on the word in the proxy. | Poor (WER 100%): . That's that's. | Good (WER 0%): The delta changes depending on the word in the proxy. | Acceptable (WER 20%): The delta changes depending on the current and the proxy. | Acceptable (WER 30%): The delta changes depending on the proxy. |
| 30 | General | For example, if we use a bidirectional BART model, we can mask the target word and get an embedding vector. | Poor (WER 100%, truncated): (no output) | Good (WER 5%): For example, if we use a bidirectional part model, we can mask the target word and get an embedding vector. | Acceptable (WER 35%): For example, if we use a bi-directional mark model, we can mask key target word and ignite vector. | Acceptable (WER 25%): We can use a bidirectional part model. We can mask the target word and get an embedding vector. |
| 31 | General | Oh sorry, can you do that? And then for all of the Onit examples, show me the replacement words that get generated by the model. | Poor (WER 76%, truncated): replacement words that get generated by | Good (WER 8%): Oh, sorry. Can you do that? And then, for all of the on it examples, show me the replacement words that get generated by the model. | Good (WER 8%): Oh sorry, can you do that? And then for all up the common examples, show me the replacement words that get generated by the model. | Acceptable (WER 28%): I'm sorry, can you do that? And the problem of the on-and-example show me the replacement words that get generated by the model. |
| 32 | General | Can you help me brainstorm some things that might be added to the dictionary that are not proper nouns? | Acceptable (WER 26.3%, partly truncated): Can you help me answer on things that might add into the dictionary that are not proper nouns? | Acceptable (WER 31.6%): Can you help me transfer things in my data to the dictionary that are not proper nouns? | Good (WER 10.5%): Can you help me grant some things that might be added to the dictionary that are not proper now? | Poor (WER 57.9%): I know my friends are things in the dictionary that are not proper nouns. |
| 33 | General | Can you add latency logging around the CTC model inference? | Poor (WER 70%): Can you have latency live? | Poor (WER 70%): You had latency loss during the C D Z model inference. | Poor (WER 90%): You had latent life and my message department principles. | Poor (WER 100%): I can't hear anything. |
| 34 | General | I'm using the app and I'm getting a ton of substitutions for Onit that should not be made. | Poor (WER 50%): using the app and I'm getting a ton of substitute. | Good (WER 11.1%): I'm using the app, and I'm getting a ton of substitutions for on it that should not be made. | Good (WER 11.1%): When using the app and I'm getting a ton of substitutions for admin that should not be made. | Good (WER 5.6%): I'm using the app and I'm getting a ton of substitutions for it that should not be made. |
| 35 | General | Can you add a skip button in addition to the yes it’s Onit and no keep as-is? | Poor (WER 100%, truncated): (no output) | Acceptable (WER 15.8%): Can you add a skip button, in addition to the yes, it's on it and no, keep as is. | Poor (WER 36.8%): Can you add a skip button in addition to the assets on it and no key passes? | Poor (WER 84.2%): Yes, it's on and I'll keep talking. |
| 36 | General | We were working on implementing something where when you add a new dictionary term, it scans through your history to find places where the term was used. | Good (WER 3.7%, partly truncated): Uh we were working on implementing something where when you add a new dictionary term, it scans through your history to find places where the term was used. | Good (WER 0%): We were working on implementing something where, when you add a new dictionary term, it scans through your history to find places where the term was used. | Good (WER 3.7%): Uh we were working on implementing something where when you add a new dictionary term, it scans through your history to find places where the term was used. | Good (WER 0%): we were working on implementing something where when you add a new dictionary term, it scans through your history to find places where the term was used. |
| 37 | Numbers | We got to 97 percent accuracy roughly. | Good (WER 14.3%, partly truncated): We got to 97% accuracy roughly. | Acceptable (WER 28.6%): We got to ninety-seven percent accuracy, roughly. | Acceptable (WER 28.6%): We got to ninety-seven percent accuracy, roughly. | Good (WER 14.3%): We got to 97% accuracy roughly. |
| 38 | General | However, for me to understand this, can you explain how did we choose which words to exclude as being autocorrected? Show me the exact function? | Good (WER 12%, partly truncated): However, for me to understand this, can you explain how do we choose which words to exclude as being auto corrected? Show me the exact function. | Good (WER 4%): However, for me to understand this, can you explain how how did we choose which words to exclude, as being autocorrected? Show me the exact function. | Good (WER 12%): However, for me to understand this, can you explain how we choose which words to exclude as being auto-corrected? Show me the exact function. | Acceptable (WER 28%): I have no understanding of this. Can you explain how to choose which words to exclude as being autocorrected? Show me the exact function. |
| 39 | General | you run that now and open a viewer so I can see everything that it's flagging? | Good (WER 0%): You run that now and open a viewer so I can see everything that it's flagging. | Good (WER 0%): You run that now and open a viewer, so I can see everything that it's flagging. | Good (WER 6.2%): You run that now and uh open a viewer so I can see everything that it's flagging. | Acceptable (WER 31.2%): You run that now in open review so I can see everything that's flagging. |
| 40 | General | I want to evaluate if we can simulate typing data that looks like the actual typing data that we collected in our typing game. | Poor (WER 100%): weekend simulator. | Good (WER 0%): I want to evaluate if we can simulate typing data that looks like the actual typing data that we collected in our typing game. | Good (WER 0%): I want to evaluate if we can simulate typing data that looks like the actual typing data that we collected in our typing game. | Poor (WER 91.7%): I can't see any text in the image. |
| 41 | General | Let's look in the CTC model, but then also the main parakeet model too. | Poor (WER 100%, truncated): (no output) | Acceptable (WER 21.4%): Let's look in the C D Z model, but then also the main Parakeet model too. | Acceptable (WER 28.6%): Let's look in the syntaze model, but then I'll also mean parakeet model too. | Poor (WER 92.9%): I'm looking at this email with the model with the all the mean perky model two. |
| 42 | General | for option B we want to not always fall back. | Poor (WER 100%, truncated): (no output) | Good (WER 0%): For option B, we want to not always fall back. | Good (WER 0%): For option B we want to not always fall back. | Poor (WER 110%): I'm sorry, I can't transcribe the speech because it is unintelligible. |
| 43 | General | Okay, the Onit keyboard is activated. | Poor (WER 100%, truncated): (no output) | Acceptable (WER 16.7%): Okay, the onec keyboard is activated. | Acceptable (WER 16.7%): Okay, the iconic keyboard is activated. | Poor (WER 200%): I'm sorry, I am unable to transcribe the speech as it is not audible. |
| 44 | General | pipeline that output only acoustic tokens without any semantic awareness? | Poor (WER 80%, truncated): line that output. | Acceptable (WER 20%): Bedline that output only acoustic documents without any semantic awareness. | Good (WER 10%): Deadline that output only acoustic tokens without any semantic awareness. | Poor (WER 160%): I'm sorry, I am unable to transcribe the speech in this audio as it is inaudible. |

