Tuesday, June 13, 2023

The strange mystery behind Window Eyes and DECtalk, evidence and analysis

For those who don't know, Window Eyes was a Windows screen reader that was around from 1995 until 2017. Starting with version 4.5, Window Eyes shipped with a version of the DECtalk Access32 text-to-speech engine. Now, I can already hear many of you asking, what is so special about a text-to-speech system that was included with a screen reader that is no longer being developed? I am glad you asked, because there is quite a mystery surrounding this particular topic. Thanks to a long-lost interview and some recently surfaced DECtalk source code, we can now take a deep dive into this interesting case.

For many years, it was rumored that Window Eyes shipped with version 4.62 of DECtalk. However, the version strings in the synthesizer DLL and the overall voice quality suggest DECtalk 4.60. As far as I can tell, the 4.62 rumor got started because of an interview on ACB Radio's Main Menu, which is a weekly show about technology from a blindness perspective. The interview revolves around what was new in Window Eyes 4.5, which was released in late 2003. Let's listen to the segment on DECtalk and hear what Doug Geoffray has to say on the matter.

If you listened to the above clip, you may have noticed GW Micro wasn't certain 4.62 was the DECtalk version they were going to ship with Window Eyes. When I cut the interview clip for the video, I added a short sample of DECtalk 4.62 at the end. This sample reveals that 4.62 wasn't the version that shipped with Window Eyes, and it also demonstrates an interesting bug with this particular version of DECtalk that we will look at later. With the 4.62 rumor out of the way, this leads us to the conclusion that Window Eyes shipped with DECtalk 4.60, and fortunately we can do some scientific testing to back this up.

Once upon a time, Kurzweil 1000 came bundled with a copy of DECtalk 4.60, and this version sounds almost identical to the Window Eyes copy. In fact, if we generate two audio files of these different builds synthesizing the same text and mix them together with the polarity inverted on one of the tracks, everything cancels except the voiced portion of the waveform. Reducing the voicing gain a bit on the K1000 build and performing the same test results in a perfect null, so we can conclude Window Eyes shipped with a patched version of DECtalk 4.60. Interestingly, Window Eyes shipped with the DECtalk 4.61 dictionary, possibly because it had improvements that GW Micro liked. Another point of interest is the fact that this version of DECtalk was compiled on June 13, 2003, over 2 years after DECtalk 4.61's release. Also, the location of the registry key that DECtalk used to get the path of the main dictionary was changed from "DECtalk Software\DECtalk" to "GW Micro, Inc.\DECtalk-OEM". It seems incredibly strange that a company would update an older version of their product for a client when a newer version of said product is available. This next section is going to be more technical, as we are going to dive into some history and source code in order to understand how we got to this point.
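For anyone who wants to try the null test described above, here is a minimal sketch of the idea in Python. It does not use real DECtalk output. The render function, the two sine waves standing in for the unvoiced and voiced parts of the signal, and the gain values are all made up for illustration. It only shows why a difference in voicing gain leaves the voiced portion behind, and why matching the gain gives a perfect null.

```python
import math

# Hypothetical stand-ins for the two parts of a synthesized waveform.
# In a real test these would be samples read from two rendered audio files.
UNVOICED = [0.1 * math.sin(0.9 * n) for n in range(200)]
VOICED = [0.5 * math.sin(0.05 * n) for n in range(200)]


def render(voicing_gain):
    """Pretend synthesizer: identical output except for the voicing gain."""
    return [u + voicing_gain * v for u, v in zip(UNVOICED, VOICED)]


def null_test(track_a, track_b):
    """Mix track_a with a polarity-inverted copy of track_b."""
    return [a - b for a, b in zip(track_a, track_b)]


def peak(samples):
    """Largest absolute sample value in the residual."""
    return max(abs(s) for s in samples)


# Two builds that differ only in voicing gain (values are assumptions).
quieter_build = render(0.8)
louder_build = render(1.0)

# Only a scaled copy of the voiced part survives the cancellation.
residual = null_test(quieter_build, louder_build)

# After lowering the voicing gain on the louder build, nothing is left.
matched_residual = null_test(quieter_build, render(0.8))

print("residual peak with different gain:", peak(residual))
print("residual peak with matched gain:", peak(matched_residual))
```

With real recordings, the two files would also need to be sample aligned and rendered at the same sample rate before subtracting, otherwise nothing will cancel cleanly.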

In April 2001, Force Computers released DECtalk 4.61 to the world. It wasn't received well by most longtime users, as it sounded very different when compared to the versions released by Digital Equipment Corporation and Smart Modular Technologies. Fortunately, some of the developers really cared about the end users of the product, and some attempts were made to improve the voice quality to meet customer demands. Sadly, most of these projects didn't really take off, and it isn't hard to understand why. Speech synthesis technology had improved significantly since DECtalk's heyday in the 80s and 90s, when it was one of the highest quality options compared to most other text-to-speech systems of the era. By the late 90s, most application developers had moved on to more modern speech engines, which is understandable, as DECtalk is quite dated even by late 90s or early 2000s standards.

One of these DECtalk improvement projects is known as DECtalk 4.62, which is essentially version 4.61 with some updated components. Most of the files in the 4.62 source archive have a modification date of May 3, 2001, shortly after DECtalk 4.61's release. Source code modification started in January 2003 and ended in May of that same year. These modifications included merging in updated versions of system level components such as the text-to-speech API and the audio driver, and making changes to the phonemic and letter-to-sound modules.

One of the changes to the phonemic module was adding in a mechanism that allows specific strings of phonemes to have a fixed duration for each phoneme in the string. The person who added this code must have made a typo, as one of these strings has a phoneme that is much longer when compared to all other phonemes in the string. This results in words such as "never" having an extremely elongated E sound, which is hilarious in some cases, such as what I did at the end of the video I posted above. The file with this code is ph_sort.c, and it was modified on March 11. The archive also contains a SAPI3 build that was compiled on February 28, and this build doesn't have the "never" problem. When examining the ph_sort.obj file that is associated with this particular build, it doesn't appear to have the array of phoneme strings and durations that the duration lookup feature relies on. With this information, we can safely assume this functionality wasn't implemented at this point in the development process. Once I understood the cause of the problem, it was trivial to remove the offending code and have a slightly improved listening experience.

In May, it appears some GW Micro specific defines were added to the letter-to-sound module. When GWMICRO is defined, some abbreviation processing code is skipped during compilation. If you remember from the interview, Geoffray stated that GW Micro was working with Fonix to reduce the excessive abbreviation handling. These changes are proof that work was done to address this, and it is worth noting that the interview was conducted in April, before these changes were implemented.

As far as I know, DECtalk 4.62 was never officially released. This isn't surprising, as it doesn't sound too different from 4.61, except for some intonation and timing differences. However, this isn't the end of the story of DECtalk for GW Micro.

On June 13, 2003, DECtalk 4.60 Revision 8 was patched and built for GW Micro. Most of the source code for this version was last modified on December 13, 1999, which was a day before the K1000 version of DECtalk 4.60 was compiled. The components that were updated in 2003 were the audio driver and the vocal tract model, and the updates themselves were quite small. Some audio device reset fixes were merged into the audio driver, and the voicing gain was reduced by 2 dB in the vocal tract model to help reduce clipping. These are welcome changes, as they don't introduce any new problems. Also, if this code is compiled with the GWMICRO define, the synthesizer looks for the dictionary location under a different registry key. These findings in the source code coincide with the tests and observations earlier in this post, so we can confidently say we have the source code for the version of DECtalk that ultimately shipped with Window Eyes. The screen reader itself may be abandoned, but the text-to-speech engine that many of its users have gotten used to can potentially be used in other applications. As an interesting aside, this version of the code can be built with the NWSNOAA define to recreate the last version of DECtalk that was heard on NOAA Weather Radio.
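To put the 2 dB voicing gain reduction mentioned above into perspective, here is a rough sketch of the arithmetic in Python. It assumes the 2 dB figure refers to amplitude, so the usual 20 * log10 relationship applies. The 16-bit sample format and the example peak value of 40000 are hypothetical and only serve to show how a small gain cut can bring a peak back under the clipping limit. This is not the code from the vocal tract model.

```python
def db_to_amplitude(db):
    """Convert a gain in decibels to a linear amplitude factor (20*log10 convention)."""
    return 10 ** (db / 20)


def clip_16bit(sample):
    """Clamp a sample to the signed 16-bit range, as an output stage would."""
    return max(-32768, min(32767, int(sample)))


# A 2 dB cut works out to roughly 0.794 times the original amplitude.
factor = db_to_amplitude(-2.0)

# Hypothetical voiced peak that is too loud for 16-bit output.
hot_peak = 40000

clipped = clip_16bit(hot_peak)             # distorted: flattened at the limit
reduced = clip_16bit(hot_peak * factor)    # fits after the 2 dB reduction

print("linear factor:", round(factor, 4))
print("without reduction:", clipped)
print("with reduction:", reduced)
```

In other words, a 2 dB cut lowers the voiced amplitude by about a fifth, which is small enough to be hard to hear as a volume change but large enough to keep moderately hot peaks from being flattened.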

So, there you have it. This mystery has been in the back of my mind for years, and I am glad to finally unravel it and share my findings. I hope you enjoyed reading this as much as I enjoyed figuring all of this out and writing this blog post.
