Text-to-Voice software - for CapTel users who cannot speak clearly

Mark Rejhon

Hi,

There are many different software packages that can do text-to-voice. This is already common for the blind, in the form of screen reader software.

I am wondering if this can be retrofitted for CapTel - so that people who cannot speak clearly can use CapTel.

Outgoing conversation would be done by text-to-voice software: the person would type, and the software would speak what he types.

Incoming conversation would be handled by CapTel, for captioning of incoming voice.

I could probably jerry-rig something using a CapTel handset, and a Linux computer system (or even Windows system using TAPI programming).... I would probably need to connect the CapTel handset to the computer system, or do 3-way calling with a voicemodem installed in the Linux computer.

Then anybody who can't speak clearly can use CapTel this way!
 
Wouldn't this method make it no different than using a TTY?

By the way, are you the same Mark Rejhon from the AVS HTPC forum? Thanks.
 
Duh - The difference is a 200-word-per-minute relay service. :)

I want the relay operator to be able to type at between 100 and 200 words per minute, and that's only achievable with CapTel's voice recognition technology.

Incoming conversation = CapTel voice recognition operator
Outgoing conversation = Computer-synthesized voice from, say, a script running on a Linux box connected to a voice modem, automatically dictating my typed sentences in realtime.
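
To make the "dictating my typed sentences in realtime" part concrete, here is a minimal Python sketch of one way such a script could decide when to speak. It buffers whatever has been typed so far and releases each sentence as soon as it is complete. The class name `SentenceBuffer` and the rule "a sentence ends at . ! or ? followed by a space" are my own assumptions for illustration, not part of CapTel or any existing package, and the actual speech output is left out.

```python
import re


class SentenceBuffer:
    """Collects typed text and releases complete sentences for speaking.

    Hypothetical helper for illustration only.
    """

    # Assumed rule: a sentence ends at '.', '!' or '?' followed by whitespace.
    _END = re.compile(r"(?<=[.!?])\s+")

    def __init__(self):
        self._pending = ""

    def feed(self, typed: str) -> list[str]:
        """Add newly typed text; return any sentences that are now complete."""
        self._pending += typed
        parts = self._END.split(self._pending)
        # The last part may still be mid-sentence, so keep it in the buffer.
        self._pending = parts.pop()
        return [p.strip() for p in parts if p.strip()]

    def flush(self) -> list[str]:
        """Return whatever is left, e.g. when the typist presses Enter."""
        rest, self._pending = self._pending.strip(), ""
        return [rest] if rest else []
```

Each string returned by `feed()` or `flush()` would then be handed to whatever speech synthesizer is patched into the phone line, so the other party hears a sentence while the next one is still being typed.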

It would be vastly superior to an ordinary TDD
And about three times faster!
And no need to explain relay service! (no need for speaker to talk slow)
And no need for either end to say "GA" or "go ahead" (natural conversation)

CapTel uses the same voice recognition technology now being used by some closed captioning systems and court operators. (It is called face-mask voice recognition technology: a trained operator repeats everything they hear into a face mask that is piped into voice recognition software trained to the operator's voice.) The speed is between 100 and 200 words per minute. It is almost as fast as stenotype machines (for example, CNN uses those for 250-word-per-minute live captioning of live newscasts).

Ordinary relay services are typically much slower...since they use regular keyboards rather than special live captioning equipment (which is what CapTel is now testing).
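
To put rough numbers on the speed difference, here is a small Python sketch of the arithmetic. The 100, 200 and 250 words-per-minute figures are the ones mentioned above; the 60 words-per-minute figure for an ordinary keyboard relay operator is purely an assumed value for illustration (it is chosen so that 200 wpm comes out at roughly three times faster), and the function names are made up.

```python
def minutes_to_relay(words: int, words_per_minute: float) -> float:
    """Minutes needed to transcribe `words` at a steady typing/captioning rate."""
    return words / words_per_minute


# Rates in words per minute. Only the first entry is an assumption.
RATES_WPM = {
    "keyboard relay (assumed)": 60,
    "voice-recognition operator, low": 100,
    "voice-recognition operator, high": 200,
    "stenotype": 250,
}


def relay_times(words: int) -> dict[str, float]:
    """Transcription time in minutes for each rate in RATES_WPM."""
    return {name: minutes_to_relay(words, wpm) for name, wpm in RATES_WPM.items()}
```

For example, `relay_times(600)` gives 10 minutes at the assumed keyboard rate, 6 and 3 minutes for the voice-recognition operator, and 2.4 minutes for stenotype.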

I want the other party on the phone to think they're speaking directly to me, and not even to know that there is an operator in the mix. CapTel is starting to make that possible in the field testing trials; however, your voice must be clear enough for the outgoing chain of communications, since the other party hears you directly. Unfortunately, my voice is not good enough to be understood by a stranger, so I need to patch in a voice-synthesis solution on top of CapTel once CapTel becomes available.

I assume you now understand why CapTel is superior to TDD/Relay?

Yep, I am the same Mark Rejhon as from AVSFORUM.
 
I don't know if the system you envision could be used for non-clear speaking people. It sounds more like something that would be used for people who have trouble typing or a disability preventing them from typing, but can still speak for themselves. This just assumes you speak in a way that a computer can recognize your words in a meaningful way, and people who don't speak clearly are a group that may benefit overall from speech recognition.

Once we get to the point that context-sensitive voice recognition software becomes 99% effective, then we can use this in a relay service. Words like "bare" and "bear" sound the same yet are context sensitive, and need someone who can understand the context behind the word to properly display it.
 
Dennis, Mark was talking about a text-to-voice machine from his end of the CapTel phone conversation. The user would type what needs to be said and the device would convert that into voice and speak into the CapTel phone. So to the hearing person at the other end, Mark would sound like The Terminator :rl: rather than a warm human being. Thus for this whole thing to work, the user would have to be a good/fast typer.
 
rushabh is right. The system is meant for fast typists NOT slow typists!!!!

I can type up to 140 words per minute, and I can type almost as fast as I can speak. I realize that it would be somewhat robotic, but it would at least allow me to conduct conversations at much higher speed than through the relay services. There is some new voice synthesis software that sounds like real humans, and that could be taken advantage of. There would be the issue of having no emotions in the voice. But then again, I want to conduct business, as I am self-employed. I would have to be careful about humorous solutions, unless the voice synthesis software has an emotion feature which I could program(!).

An alternative is 3-way VCO -- simply tie a 711 relay operator into a CapTel conversation, and use the traditional relay operator to voice out what I am typing. This will be a little weird in some ways, because I'd have to explain that it is outgoing text only, like an HCO call, and say that the conversation is simply to be relayed verbatim without any "go-aheads". That latter option may be what I would use for now, if it can work in a pretty much full-duplex manner.
 
In the late 1980s, IBM developed and marketed text-to-voice conversion software for deaf people, which only required a modem and a PC to work. It did not last long because it was too expensive for deaf people to buy; the product cost 400 dollars or more. IBM consequently stopped marketing this software, which really worked well.
 
IBM should have open-sourced the software and taken a business tax deduction accordingly.
 
Any open-source computer programmer can probably already replicate what IBM did, but using Linux and a voicemodem instead, plus a modified version of open source text-to-voice software.

Anybody willing to volunteer to research what existing Linux open source would be needed, for slapping a solution for the deaf together?
 
I have now made some updates, and added some more documentation to the source code. Plus the speech is a little better now.

Download Version 2.1 of Real Time Text-To-Speech here:
http://www.marky.com/files/tts/SpeechToTextTest_v21.zip

Source code included -- derived from Microsoft SimpleTTS SDK sample. To run, just unzip and then run SimpleTTS.exe ....and begin typing. Hit Enter whenever it is time for the other person to go ahead.
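
For anyone who wants to try the same idea on Linux, here is a rough Python sketch of a type-and-speak loop. This is not the SimpleTTS source: it is a simplified, line-based approximation that speaks each line once Enter is pressed, and it assumes the `espeak` command-line synthesizer is installed. The function names `speak_with_espeak` and `run_turns` are made up for this example.

```python
import subprocess


def speak_with_espeak(text: str) -> None:
    """Speak one line through the espeak command (assumed to be installed)."""
    # --stdin makes espeak read the text from standard input.
    subprocess.run(["espeak", "--stdin"], input=text, text=True, check=False)


def run_turns(lines, speak=speak_with_espeak) -> int:
    """Speak each typed line; an empty line ends the session.

    `lines` is any iterable of strings (for example sys.stdin).
    Returns the number of lines that were spoken.
    """
    spoken = 0
    for line in lines:
        text = line.strip()
        if not text:
            break  # blank line: the typist is done
        speak(text)
        spoken += 1
    return spoken
```

To use it from a terminal, call `run_turns(sys.stdin)` after `import sys`, then type a sentence and hit Enter to have it spoken. Passing a different `speak` function lets you swap in another synthesizer without touching the loop.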
 