Wednesday, July 9, 2025

My MIDI journey

This blog post was originally a text file included with one of my programming projects. I decided to publish it here, as it contains some rather significant historical information, and I wanted to make it available to a wider audience. As this was originally an ASCII text file, I would like to apologize for the lack of attention to formatting. In preparation for posting to this blog, I replaced new line characters with the br HTML tag. With that out of the way, on to the post itself.
Welcome to the Timidity DirectShow filter, a truly cursed piece of software with an interesting backstory. Timidity is a basic MIDI player that is rather light on resources, and DirectShow is a legacy DirectX based multimedia architecture in Microsoft Windows. An example of an application that uses DirectShow is Windows Media Player. I can already hear many of you saying something along the lines of "Windows Media Player can already play MIDI files. Why reinvent the wheel?" There is a story of how this came to be, so sit back and enjoy. This document started as a readme file, however it became much more than that.
Let's start with a somewhat long personal narrative. It is fair to say that I discovered MIDI (short for Musical Instrument Digital Interface) by accident. For a bit of background, I am a totally blind person who has experience with specialized devices called note takers. These things were essentially personal digital assistants or PDAs for the blind, and they were extremely popular before smartphones and laptops became commonplace. Throughout my childhood, I used a note taker called the BrailleNote, which was manufactured by an assistive technology company known as HumanWare. Several BrailleNote models were produced over the years, however I am going to focus on the mPower and Apex (released in 2005 and 2009 respectively). An important thing to note is that these devices ran Windows CE, which was a stripped down version of Windows for embedded devices.
In December of 2011, I downloaded an archive of old Windows sound schemes from a website called blind-geek-zone.net. My BrailleNote Apex was more or less my primary computing device at the time. I was browsing the archive (after a slow download and extraction process) and was randomly pressing enter on files to see what I could play. For files that didn't have any association, nothing happened which wasn't unexpected. Suddenly, after pressing enter on canyon.mid in the Windows 2000 folder, I got an error message stating the system was unable to open the file. I was rather puzzled, and didn't think about it for some time, as my 12 year old self most likely assumed the files were corrupted. Disappointed, I moved on.
Funnily enough in January of 2012, someone posted about the exact issue I was experiencing to the BrailleNote users email list (which was run by HumanWare at the time), so my curiosity was refreshed. To summarize what I learned from that discussion, the BrailleNote mPower could play MIDI files, while the Apex obviously couldn't. It is worth noting that for media playback, the BrailleNotes used Windows Media Player with a specialized user interface.
As a bit of an aside, when I learned that MIDI is instructions for a digital synthesizer rather than actual sound data, I was extremely amazed. The fact that I can fit thousands of songs in only a few MB without sacrificing audio quality still amazes me to this day. I also like the idea that I am witnessing a performance when listening to sequenced music such as MIDI. To clarify, I have had experience with MIDI based musical keyboards since I was much younger, however 2012 is when I first discovered MIDI files and learned what MIDI actually was. In a nutshell, MIDI is a protocol that allows computers and electronic instruments to communicate with each other. In other words, it is essentially digital sheet music for a hardware or software synthesizer, as it consists of commands for actions such as starting and stopping notes, selecting instruments, and setting various types of parameters such as channel volume and stereo pan position.
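To make the "digital sheet music" idea a bit more concrete, here is a minimal Python sketch that builds the raw bytes for the kinds of commands mentioned above. The function names are my own and purely illustrative. The status bytes and controller numbers are the standard MIDI 1.0 values, and channels are zero-based on the wire, so the drum channel (channel 10) is 9 here.

```python
# Illustrative helpers only; the names are made up for this post.
CC_VOLUME = 7   # controller number for channel volume
CC_PAN = 10     # controller number for stereo pan position


def note_on(channel, note, velocity):
    """Start a note: status byte 0x90 plus the channel, then note and velocity."""
    return bytes([0x90 | channel, note, velocity])


def note_off(channel, note, velocity=0):
    """Stop a note: status byte 0x80 plus the channel."""
    return bytes([0x80 | channel, note, velocity])


def program_change(channel, program):
    """Select an instrument: status byte 0xC0 plus the channel (2 bytes total)."""
    return bytes([0xC0 | channel, program])


def control_change(channel, controller, value):
    """Set a parameter such as volume or pan: status byte 0xB0 plus the channel."""
    return bytes([0xB0 | channel, controller, value])


if __name__ == "__main__":
    # Pick piano, set full volume, center the pan, then play middle C.
    stream = (
        program_change(0, 0)
        + control_change(0, CC_VOLUME, 127)
        + control_change(0, CC_PAN, 64)
        + note_on(0, 60, 100)
        + note_off(0, 60)
    )
    print(stream.hex(" "))
```

Starting and stopping a single note takes only six bytes in total, which goes a long way toward explaining why MIDI files are so small compared to recorded audio.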
Getting back on track (no pun intended), when I discovered the mPower could play MIDI files, I was really curious about how this sounded. Unfortunately, I no longer had my mPower at the time, as I traded it in for my Apex. Fortunately, I went to a state school for the blind, so I convinced a teacher to bring over their mPower so I could conduct some quick tests. When I played canyon.mid on the mPower, I was exposed to the rather buggy MIDI support in Windows CE 4.2. Unfortunately I only got to mess with it for a couple minutes, but it was enough time to really get me intrigued.
When I got home that day, I discovered a computer program that could convert MIDI files to MP3 using soundfonts, so I had a bit of a cheap workaround to play MIDI files on my Apex. The soundfont I used was Chorium, and I forgot what the native BrailleNote mPower MIDI playback sounded like for several months.
In the summer of 2012, I was informed of a website that could convert MIDI files into MP3 using several different soundfonts, and this was crucial as I could access the website directly on my BrailleNote. I even made a recording demonstrating the process using the internal microphone of my Apex. In the recording, I stated that Chorium was the soundfont used on the mPower, however I was totally wrong on that one.
When the next school year came around, I found an opportunity to experiment with an mPower again. After closely listening to the MIDI playback on the device, I realized what I was hearing was really unique. Over the summer I had gotten rather familiar with the infamous Microsoft GS Wavetable synth on my Windows desktop computer I had at the time, and what I was hearing on CE was for lack of a better term, uniquely different. It was also around this time I figured out how to change the default MIDI sound set by replacing wince_gm.dls with the gm.dls file that is shipped with regular desktop Windows.
I also discovered the Windows CE Media player supported RMI files, which are MIDI files in a RIFF container. The RMI file association wasn't implemented on the BrailleNote, so I had to get a bit creative in order to play these files. I could either rename the extension to .mid or point the web browser directly to the file without renaming the extension. I also discovered that the media player technically supported Karaoke MIDI files, which have a .kar extension. It turns out the reason for this is quite simple, as KAR files are basically standard MIDI files with extra metadata. In order to play these files back, I had to rename the extension to .mid. On a somewhat related note, there was also an association for files with a .midi extension, however files with a .mid extension were much more common.
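Since an RMI file is just a standard MIDI file wrapped in a RIFF container, unwrapping one is fairly simple. The following Python sketch is my own illustration and has nothing to do with how the Windows CE media player handled these files. It assumes the common layout where the RIFF form type is RMID and the MIDI data sits in a chunk named data. The function names are hypothetical.

```python
import struct


def extract_smf_from_rmi(blob):
    """Return the standard MIDI file stored inside a RIFF RMID container."""
    if blob[:4] != b"RIFF" or blob[8:12] != b"RMID":
        raise ValueError("not a RIFF RMID file")
    pos = 12
    while pos + 8 <= len(blob):
        chunk_id = blob[pos:pos + 4]
        (size,) = struct.unpack_from("<I", blob, pos + 4)
        body = blob[pos + 8:pos + 8 + size]
        if chunk_id == b"data":
            if body[:4] != b"MThd":
                raise ValueError("data chunk does not hold a MIDI file")
            return body
        # RIFF chunks are padded to an even number of bytes.
        pos += 8 + size + (size & 1)
    raise ValueError("no data chunk found")


def wrap_smf_in_rmi(smf):
    """Wrap a standard MIDI file in a minimal RMID container (the reverse operation)."""
    pad = b"\x00" * (len(smf) & 1)
    payload = b"RMID" + b"data" + struct.pack("<I", len(smf)) + smf + pad
    return b"RIFF" + struct.pack("<I", len(payload)) + payload


if __name__ == "__main__":
    # A tiny but valid MIDI file: header chunk plus one track holding only End of Track.
    smf = (
        b"MThd" + struct.pack(">IHHH", 6, 0, 1, 96)
        + b"MTrk" + struct.pack(">I", 4) + b"\x00\xff\x2f\x00"
    )
    assert extract_smf_from_rmi(wrap_smf_in_rmi(smf)) == smf
```

Note that RIFF sizes are little-endian while everything inside the MIDI file itself is big-endian, which is why the two struct format strings differ.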
A month or two later, the school I attended had a power outage. As a result, a power surge shot the backup battery of my BrailleNote Apex, so it wouldn't keep the current time when the unit was reset (analogous to a system restart). A hilarious side effect of this problem was that when I set the system to the correct time, I apparently had over 18000 missed appointments, as the date was set to January of 2006 if I recall correctly. Shortly after this unfortunate incident occurred, I got a BrailleNote from my school's tech lab to use in the meantime, and of course it was an mPower thanks to my request. This gave me an opportunity to experiment with the MIDI support in Windows CE 4.2 for a couple months.
I spent that time downloading various MIDI files from all over the internet using the newly acquired mPower with a CompactFlash Ethernet card for connectivity. I switched between the default Windows CE sound set and the sound set used by the Microsoft GS Wavetable synth, and determined the MIDI player itself had rather limited GS capabilities. Although I could access GS drum kits, I couldn't access GS melodic patches.
I will admit, the day I got my Apex back was a bit of a sad one. It was also at this time the primary service I used for converting MIDI files into MP3s more or less stopped working properly on my Apex. It is worth mentioning that the web browser on these BrailleNotes was Internet Explorer 6 with a specialized user interface. Looking back on it, I am amazed how I got around on the internet in my childhood with such an ancient browser.
Backtracking a bit, in early 2012 I got to see a BrailleSense note taker at a convention. The BrailleSense was developed by a company called HIMS, and was a close competitor to the BrailleNote. Of course, I asked if the device could play MIDI files, and I was told it could. I immediately plugged in an SD card with some MIDI files that I had previously downloaded, and played some of them back. Right away, I noticed the playback sounded generally better and more stable than the BrailleNote mPower. I was curious what the BrailleSense was using for MIDI playback, and this is something I figured out several years later.
I will admit, if I discover a mundane device such as a note taker for the blind can do something obscure such as play back MIDI files, I am immediately interested. I guess it is the performance aspect of the format that piques my interest, as the combination of hardware and/or software can have a huge impact on the audio output quality. I still listen to MIDI files to this day, however most of them are much more sophisticated than I ever could have imagined when I first discovered MIDI back in 2012.
In 2013, most of my MIDI related activity was on my desktop computer, so the MIDI playing note taker phenomenon fell a bit by the wayside. In August of 2013, I discovered line-in cables, so I wanted to make some recordings of the Windows CE MIDI playback. It turned out that one of my tech instructors had a VoiceNote mPower on loan from a friend of his, so he let me take it home for two weeks or so. I made many recordings of the synthesizer performing various files, however my 2013 self unfortunately recorded them at a low volume and exported the files as mono.
In late 2014 or early 2015, I got a chance to use an mPower again. This was one of the more difficult interactions I had, as the tech instructor that I was communicating with wasn't entirely sure what I was aiming to do. When I finally convinced her I wasn't going to break anything, she let me use the device for a few minutes. I quickly made a copy of wince_gm.dls on a thumb drive, and put a zipped version of it in the soundfonts section of the MIDI collection that was on my website at the time. My hope was that either myself or someone else would try to convert it into a soundfont, and it turns out that person was future me.
Later that day, I replaced the copy of gm.dls with wince_gm.dls in a Windows 2000 virtual machine I had set up on my computer, as I wanted to see what would happen. When I played canyon.mid, I only heard the kick drum, which I later learned was stored as an uncompressed PCM sample. But I am getting too ahead of myself.
Let's jump forward a few years. In early 2017, a friend of mine informed me that the BrailleSense used GUS (Gravis Ultrasound) patches for playing back MIDI files, and I quickly discovered Timidity and by extension GSPlayer. GSPlayer was an open source media player for Windows CE, and what the HIMS note takers used as their media playback engine. I also had a Freedom Scientific PAC Mate Omni, which was a note taker that ran Windows Mobile 6. Unlike the other note takers discussed so far, this one gave the user more or less full-fledged access to the Windows Mobile interface. This allowed me to give GSPlayer a spin in its native environment.
It was also around this time I learned a bit more about the MIDI support included with Windows CE 4.2. Apparently, Microsoft ported DirectMusic to Windows CE, which explained why the functionality existed in the first place. It was also discovered that there was a registry key which stored the path to the DLS file used by the synthesizer module. This meant it was possible to customize the sound set by editing the Windows CE registry. Interestingly, the desktop version of the DirectMusic synthesizer implemented reverb, and it appears this was either disabled or removed in the CE version. This isn't too surprising, as processing effects such as reverb can eat up significant amounts of CPU time, especially on embedded devices with limited processing power.
It is worth comparing and contrasting the two approaches the BrailleNote and BrailleSense took as far as media playback was concerned. HumanWare used the media player stack that was part of Windows CE, and HIMS used a 3rd party open source player. Perhaps HIMS felt the stock media player wasn't adequate for their needs, so they turned to a 3rd party and arguably better alternative. HumanWare on the other hand just used whatever they had easy access to and called it a day.
Later in 2017, another friend of mine sold me his BrailleNote mPower that he no longer had a use for, so I knew I had found yet another opportunity to experiment with the mysterious Windows CE MIDI support, and this time indefinitely. It was also around this time I seriously wanted to get the Windows CE sound set converted into a soundfont so I could use it in more modern things. Back in early 2016, another person doing similar investigations determined the soundbank included with Windows CE 4.2 was in the mobile DLS format. This was a version of the DLS file format that allowed samples to be compressed with various codecs. Unfortunately, most tools designed to work with DLS files couldn't load wince_gm.dls, most likely because the proper codecs weren't installed on the system. I will have more to say on this a bit later.
In early 2018, I was experimenting with the effects of various audio compression formats using Audacity. When I exported a WMA (Windows Media Audio) file and listened to it, I noticed similar compression artifacts compared to what I was hearing with the default MIDI soundbank included with Windows CE 4.2. In the back of my mind, I wondered if wince_gm.dls used samples encoded in WMA. At the time, I didn't think much of this, however this experimentation is important context for a future discovery.
Fast forward to late 2019 when I discovered the Windows CE 4.2 platform builder. I did a bunch of hackery in order to attempt to get the old MIDI codec included with Windows CE 4.2 running on my Apex (which I still had at the time) and got as far as the media player loading the file and displaying the total time of the sequence. It didn't error out, however it was more or less silent.
In July of 2020 I was researching random things as I usually do when I am bored, and I found a tool that could dump the samples from a DLS soundbank such as the one included with Windows CE. As previously mentioned, it was discovered several years earlier that the soundbank included with CE used samples in a compressed format, and no DLS supporting tools tested thus far were able to decode the samples. The only tool that got remotely close was DirectMusic Producer, as it was only able to decode 28 out of 353 samples. I later discovered that these 28 samples were stored as raw PCM, so no decoding was necessary in that case. Crucially, the tool I just discovered would dump the samples without attempting to decode them. This allowed me a glimpse into the raw sample data.
I loaded the extracted samples into SoX, and I got an error saying the file couldn't be loaded, and this included a format ID (0x161 if I remember correctly). After doing a bit of research, I discovered I was in fact working with Windows Media Audio in a RIFF WAV container. This reminded me of my file compression experiments in early 2018, and I am amazed that it took me so long to put two and two together.
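For anyone curious where that format ID lives, it is the first field of the fmt chunk in a RIFF WAV file. Here is a small Python sketch of my own (not the tool I used at the time) that reads it. The function name is hypothetical. The constants are the standard wave format tags, where 0x0001 is plain PCM and 0x0161 is Windows Media Audio version 2.

```python
import struct

WAVE_FORMAT_PCM = 0x0001
WAVE_FORMAT_WMAUDIO2 = 0x0161  # the ID reported for the compressed samples


def wave_format_tag(blob):
    """Return the 16-bit format tag from the fmt chunk of a RIFF WAV file."""
    if blob[:4] != b"RIFF" or blob[8:12] != b"WAVE":
        raise ValueError("not a RIFF WAVE file")
    pos = 12
    while pos + 8 <= len(blob):
        chunk_id = blob[pos:pos + 4]
        (size,) = struct.unpack_from("<I", blob, pos + 4)
        if chunk_id == b"fmt ":
            # The format tag is the first little-endian 16-bit field of the chunk.
            (tag,) = struct.unpack_from("<H", blob, pos + 8)
            return tag
        # RIFF chunks are padded to an even number of bytes.
        pos += 8 + size + (size & 1)
    raise ValueError("no fmt chunk found")


def describe(tag):
    """Give a short human readable name for the tags relevant to this story."""
    names = {
        WAVE_FORMAT_PCM: "uncompressed PCM",
        WAVE_FORMAT_WMAUDIO2: "Windows Media Audio",
    }
    return names.get(tag, "unknown format 0x%04x" % tag)
```

Running something like this over every dumped sample would have sorted them into the small group of raw PCM files and the much larger group of WMA compressed ones.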
After finding a way to decode the sample data to raw PCM, I ended up extracting various types of instrument data from wince_gm.dls using a modified Python script that was initially written by another individual. This data included various things such as loop points, envelopes, and sample boundaries. Using this information along with the converted PCM sample data, I painstakingly created a Windows CE GM soundfont.
In August of 2020, I found an appropriate ACM (Audio Compression Manager) codec (DivX 3.11 alpha), which finally allowed me to load the original wince_gm.dls file into DirectMusic Producer without any errors. This newly discovered codec was essentially a hacked version of the Microsoft MPEG-4v3 codec, which ironically isn't actually MPEG-4 compliant. As I downloaded and installed this codec, I was rather skeptical, as I expected it not to work. To my absolute surprise, installing this codec resulted in an uncompressed DLS being created in seconds as soon as DirectMusic Producer loaded the compressed soundbank.
It turned out that all the manual work I had done previously was kind of pointless. It was then trivial to convert the uncompressed DLS file that was generated by DirectMusic Producer into an SF2 soundfont using Awave Studio. At long last, I had properly preserved the Windows CE GM sound set, which interestingly enough contains a special drum kit with compressed Windows 9x and CE sounds. This drum kit can be accessed on program 1 of channel 10.
It amazes me that all of the tools that I used to do the conversion were around for decades. DirectMusic Producer was from 2001, DivX 3.11 alpha was from 1999, and the version of Awave Studio I used was from 2007. I suspect the reason no one else outside the BrailleNote user community seemed to know about the Windows CE MIDI support was because most device manufacturers couldn't be bothered to include it in their platform images. Considering how flawed it was, this isn't too surprising. The dedication of a few nerds can go a long way.
It is worth noting the file size difference between the original DLS and the newly uncompressed version. The original wince_gm.dls file was around 500 KB in size, while the uncompressed soundfont is around 5 MB. That is roughly a tenfold increase in file size.
According to metadata left in some of the samples, this sound set was worked on throughout 1999. This metadata also revealed that Nathan Grigg and Ken Kato were involved in this project, and that SoundForge 4.5 was used to save some of the samples. Ken Kato is the name found in the samples for the Windows CE UI drum kit discussed above. Considering the timeframe when this was worked on, and the fact that it was intended for mobile devices of the time, it could have been a lot worse. Microsoft could have easily used an extremely small bank of raw PCM samples, but instead decided to leverage their sound design team as well as audio compression technology that was quite new at the time.
It is a shame that an uncompressed version of this unique soundbank never shipped with desktop versions of Windows. I view this as a missed opportunity, as this is essentially Microsoft scrapping their own sound set and making it exclusively available on Windows CE. For context, the Microsoft GS Wavetable synth on desktop Windows uses a collection of rather low quality Roland samples from various editions of their Sound Canvas line of synthesizers.
The amazing thing about this preservation effort is that the SF2 conversion of the Windows CE GM soundbank sounds much better than the original ever did. The WMA decoder on Windows CE produced pre-echo, which caused the output to sound off in terms of timing. The decoder would also inject bits of other samples already loaded in memory into current samples, which caused the audio to sound rather broken. It is also worth noting that seeking wasn't properly supported in the MIDI file playback component itself. A notable bug relating to this is that when a tempo change is encountered, the time calculations get messed up, for lack of a better term. A side effect of this particular bug is that reported information such as elapsed time and track length becomes inaccurate.
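I don't know how the Windows CE component actually did its time calculations, so the following is not a reconstruction of the bug. It is just a minimal Python sketch of how a player can convert a position in MIDI ticks to seconds while respecting tempo changes, which is what accurate elapsed time, track length, and seeking all depend on. The function name and the shape of the tempo list are my own assumptions.

```python
DEFAULT_TEMPO = 500000  # microseconds per quarter note (120 BPM), the MIDI file default


def ticks_to_seconds(tick, division, tempo_changes):
    """Convert a tick position to seconds.

    division is the number of ticks per quarter note from the file header.
    tempo_changes is a list of (tick, microseconds_per_quarter) pairs sorted by tick.
    """
    seconds = 0.0
    last_tick = 0
    tempo = DEFAULT_TEMPO
    for change_tick, new_tempo in tempo_changes:
        if change_tick >= tick:
            break
        # Add up the time spent at the old tempo before switching to the new one.
        seconds += (change_tick - last_tick) * tempo / (division * 1000000)
        last_tick = change_tick
        tempo = new_tempo
    # Add the remaining ticks at whatever tempo is in effect at the target position.
    seconds += (tick - last_tick) * tempo / (division * 1000000)
    return seconds


if __name__ == "__main__":
    # 480 ticks per quarter note, with the tempo halving (60 BPM) at tick 960.
    changes = [(960, 1000000)]
    print(ticks_to_seconds(1920, 480, changes))  # time with the tempo change applied
    print(ticks_to_seconds(1920, 480, []))       # time if the tempo change is ignored
```

In this example the two results differ by a full second, which is the sort of drift that makes a displayed elapsed time or track length wrong when tempo changes aren't accounted for.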
The decoding process in DirectMusic Producer didn't result in any pre-echo in the generated PCM sample data, so everything is perfectly timed during playback using the newly created soundfont. The MIDI synthesizer on CE ran at 22050 Hz, and there are samples in this sound set that are sampled at 44100 Hz. For the first time in forever, these samples can be heard in their full fidelity, not taking into account baked-in WMA compression artifacts in the samples themselves. The soundfont sounds even better with sinc interpolation, as well as a decent reverb and chorus added into the mix.
That was a long story. Now back to the Timidity DirectShow filter, or rather the motivation as to why this hacked together project exists. Native MIDI support was removed in Windows CE 5.0 (which is what older versions of the BrailleSense ran), and as the BrailleNote Apex ran CE 6.0, it is no surprise MIDI files don't play out of the box. It is worth noting that the MIDI support on the mPower was never officially documented by HumanWare, most likely due to its buggy nature. During the CE upgrade process when developing the Apex, they apparently forgot to remove the MIDI file association. This isn't much of a surprise, as MIDI support is a rather niche feature for a device such as a note taker for the blind.
In late 2024, I created a VSTi (Virtual Studio Technology Instrument) version of Timidity. I used the source code for the GSPlayer MIDI plug-in as my starting point, and made it into a nice multi instance C library that can be used in both a live context such as the backbone of a VST instrument, or as a simple MIDI file player that can be integrated into a media player application. To clarify, this has absolutely nothing to do with the Windows CE GM soundbank preservation effort discussed previously. I wrote about that for context, and to document the process.
The source code for older versions of ASAP (Another Slight Atari Player) includes a DirectShow filter, so I adapted it to work with my library version of Timidity. The resulting filter might work on a BrailleNote Apex, however I don't currently have access to one in order to test this. It works on my mPower though, and has proper seeking functionality unlike the native MIDI support in CE 4.2. As far as recognized file extensions go, the Timidity filter can open mid, midi, and kar files.
There are some other notable differences in this new DirectShow filter when compared to the native Windows CE MIDI playback that I feel are worth mentioning. Firstly, the sample rate of the PCM stream generated by Timidity is shown when media file information is queried. I suspect the reason this information wasn't shown in the native MIDI filter is because it most likely had a more complex architecture. If I were to guess how it was implemented, the filter simply passed the MIDI file to DirectMusic for processing, so it makes sense that the filter wouldn't know about the audio output characteristics of the synthesizer. Relating to this, the default sample rate of Timidity is 44100 Hz, which is twice the sample rate of the DirectMusic synthesizer.
Another interesting quirk of the native MIDI support in CE 4.2 is that when the system was under load, the tempo would become unstable. My hypothesis is that reading MIDI events and generating the audio stream most likely took place on separate threads, and one thread had a higher priority than the other. Timidity runs on a single thread and generates a sample accurate audio stream, so there are no tempo changes when the system is under significant load. At worst, audio dropouts will occur if Timidity can't generate audio in real time, which is highly likely if other demanding tasks are simultaneously being performed while playing MIDI files.
When compared to DirectMusic, the Timidity filter loads samples faster on startup, at least with the default settings. By default, dynamic instrument loading is enabled, which only loads the instruments needed in order to play a particular MIDI file. This is crucial on embedded devices with limited RAM such as the BrailleNote. The anti-aliasing filter is also disabled by default, as it heavily uses floating-point math. This can drastically slow down the loading process on less powerful CPUs, especially if a lot of samples are to be processed by the filter. Pre-resampling of fixed pitch instruments (such as drums) is enabled by default, so this can result in a slightly longer load time if such instruments are used in the MIDI file that is going to be played. Considering the pre-resampling algorithm uses fixed-point arithmetic for the most part, I am willing to let this default setting slide as I feel the wait is worth the improved audio quality and playback performance.
Despite all its flaws, something I will give the native Windows CE MIDI support credit for is the fact that it stops rendering audio once all notes have decayed to silence. Most software based MIDI players I have experience with stop rendering audio as soon as the last MIDI event is reached, which results in an abrupt ending in many cases. The function in my Timidity library that renders audio from events in a MIDI file returns 0 once the end of the track is reached and all notes have stopped, so this abrupt ending problem doesn't exist in my Timidity filter.
I find it ironic that I found a potential solution to a problem that no longer impacts me. If this filter actually works on a BrailleNote Apex (no reason why it shouldn't as it uses self-contained MIDI routines), my past self would have really enjoyed the prospect of native MIDI file playback on their primary computing device. One of these days I hope to get my hands on a BrailleNote Apex and test this first hand. I have an installation file all ready to go for the occasion.
Speaking of installation, the process is quite straightforward. You register the DLL (Dynamic Link Library) using regsvr32 on desktop Windows or a 3rd party equivalent for Windows CE such as GBRegsrv. After registration, you will have to extract gm16.zip to the root of your C drive if you are setting up on a Windows computer, or the root of the Flash Disk if setting up on a BrailleNote. If everything is set up correctly, you can enjoy BrailleSense-like MIDI playback in applications that use DirectShow such as Windows Media Player.
GM16 is pretty much a higher fidelity version of the patch set that was distributed with GSPlayer. Unlike the restoration of the Windows CE GM soundbank that was found on devices such as the BrailleNote mPower, reconstructing the GSPlayer bank was quite simple in comparison. It involved digging through old archives of Gravis Ultrasound patch files (found in the ULTRASOUND BBS Archive), and finding the right samples. Furthermore, metadata included in the patch files such as title and author information wasn't lost during the 8-bit down-conversion, which made the process go by a lot faster than it would have otherwise. The only sample I was unable to find an original version of was the xylophone, so I used a better sounding patch that was apparently created by the same author as the original.
For a bit more background, the patches distributed with the GSPlayer MIDI plug-in were down-converted from 16 to 8 bits to save space on early mobile devices. Curiously, it appears the xylophone was already an 8-bit patch. It had a 1996 timestamp while every other patch had a March 2001 timestamp, which is when the 8-bit conversion presumably took place. I later found an archive named shominst-0409.zip, which contained the original versions of the GSPlayer patches. The 16-bit files in the archive match with my previous reconstruction project, and the xylophone patch was exactly the same as the version distributed with GSPlayer. Bringing all of this together, the GSPlayer developer most likely took the patches from shominst-0409.zip, and compressed them for use with GSPlayer as necessary.
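I don't know which tool was used for the original down-conversion, so this is not a description of that process. As a minimal Python sketch of the general idea, here is one simple way to reduce signed 16-bit PCM to signed 8-bit PCM by keeping only the most significant byte of each sample. The function name is hypothetical, and real converters may add dithering or produce unsigned output instead.

```python
import struct


def downconvert_16_to_8(pcm16):
    """Reduce little-endian signed 16-bit PCM to signed 8-bit PCM.

    Each sample keeps only its top 8 bits, so the data shrinks to half the size
    at the cost of a much higher noise floor.
    """
    count = len(pcm16) // 2
    samples = struct.unpack("<%dh" % count, pcm16[:count * 2])
    return struct.pack("%db" % count, *(s >> 8 for s in samples))


if __name__ == "__main__":
    original = struct.pack("<4h", 0, 256, -256, 32767)
    reduced = downconvert_16_to_8(original)
    print(len(original), "bytes down to", len(reduced), "bytes")
```

Halving every sample like this is also why going in the other direction required hunting down the original 16-bit patches, as the discarded low byte can't be recovered from the 8-bit data.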
Going back to my DirectShow filter, settings are read from and written to the registry when the plug-in starts up and shuts down respectively. This means you can change the path of the sample library if desired. On desktop Windows you can use regedit, however on Windows CE you will need a 3rd party tool to edit the registry of the device from a desktop computer. This requires ActiveSync if you are running on Windows XP or older, or Windows Mobile Device Center if you are running on Windows Vista or newer. After getting this set up, you will need a Windows CE registry editor. The tool I recommend for this is CeRegEditor. At long last, I found a potential solution for playing MIDI files natively on a BrailleNote Apex. My past self would be extremely proud. I can almost imagine my past self showing off this project to the BrailleNote user community, and possibly notifying HumanWare of its existence.
Now for some more technical info on the build requirements for this project. The minimum build requirements for the Win32 version are Visual C++ 6 and the DirectX 8 SDK. Before building, you will have to set an environment variable that contains the base path of the DirectX SDK. The name of this environment variable is MSSDK, and it can be set in the system properties of the Windows Control Panel.
For the Windows CE build, you will need at minimum Embedded Visual C++ 4.0 and the Windows CE 4.2 SDK. I was surprised to find out that the standard SDK that ships with Embedded Visual C++ 4.0 includes the DirectX headers, however I had to source the libraries from the Windows CE 4.2 Platform Builder. The only supported processor architecture for CE is ARMv4, as that is what the BrailleNote uses.
Update: I got in touch with a friend who still has a working BrailleNote Apex, and they were willing to test my Timidity DirectShow filter. The installation process was smooth for the most part, although the DLL had to be registered manually after the installation. Apparently my shortcut for automatically registering the DLL on startup doesn't work on the Apex. After that was sorted out, it was time for the ultimate moment of truth. The installation includes some sample MIDI files for testing, so my friend pressed enter on one of the files to see what happened. There was a short delay which was expected, as the filter was presumably loading the necessary samples into RAM. Instead of music, we heard an application error happily announcing a crash, and the system froze which required a reset in order to get it back up and running.
Considering all the hurdles I faced up until this point, I almost gave up and determined native MIDI playback on a BrailleNote Apex was a lost cause. I had one more thing to try, and if this didn't work I would call it quits as I was running out of ideas. I got Visual Studio 2005 and the Windows Mobile 5.0 SDK installed, and after configuring the compiler to not treat wchar_t as a native type, I got my code compiled against the newer SDK.
It turns out that later versions of ASAP include a CE port of the DirectShow filter component, and when attempting to register it on my BrailleNote mPower, I got an error code of 193, which basically means the DLL failed to load. This was the same error I got when attempting to register the Timidity DirectShow filter that was compiled against the Windows Mobile 5.0 SDK on my mPower. I know another friend of mine got the ASAP filter running on an Apex in the past, so I am fairly confident that this new build of my Timidity filter will work properly on an Apex. The fact that I consider an error message to be a good sign proves I spend way too much time in front of my computer messing with obscure software.
Another update: the new build was tested on an Apex. As with the previous attempt, there was a crash during or after the sample loading process. I have officially given up at this point, as I am out of ideas for getting this thing to run properly on the device that started this whole journey. Strangely, I am neither particularly sad nor disappointed, as I have always felt the Apex was one of the glitchiest BrailleNote models ever produced. For a humorous take on the situation, it's as if the BrailleNote Apex never wanted to natively play MIDI files in the first place, and was trying to tell me it wasn't interested by displaying cryptic error messages, not playing anything at all, or in the worst case crashing and freezing the entire system.
The most important things I gained from this whole experience are that I tried everything I could think of to get MIDI files natively playing on the Apex, and that the software I ultimately put together to attempt to solve this problem runs on the mPower, which I will admit is my absolute favorite BrailleNote model. I definitely had fun working on this project, and am satisfied with the result despite it not working properly on the intended target device.
I hope you enjoyed reading this memoir, as it documents a lot of things that I didn't have the motivation to write down until now. I believe this is the longest technical document I have written so far, and hopefully you learned something interesting by reading it.

Tuesday, June 13, 2023

The strange mystery behind Window Eyes and DECtalk, evidence and analysis

For those who don't know, Window Eyes was a Windows screen reader that was around from 1995 until 2017. Starting with version 4.5, Window Eyes shipped with a version of the DECtalk Access32 text-to-speech engine. Now, I can already hear many of you asking, what is so special about a text-to-speech system that was included with a screen reader that is no longer being developed? I am glad you asked, because there is quite a mystery surrounding this particular topic. Thanks to a long-lost interview and some recently surfaced DECtalk source code, we can now take a deep dive into this interesting case.

For many years, it was rumored that Window Eyes shipped with version 4.62 of DECtalk, however version strings in the synthesizer DLL and overall voice quality suggest DECtalk 4.60. As far as I can tell, the 4.62 rumor got started because of an interview on ACB Radio's Main Menu, which is a weekly show about technology from a blindness perspective. The interview revolves around what was new in Window Eyes 4.5, which was released in late 2003. Let's listen to the segment on DECtalk and hear what Doug Geoffray has to say on the matter.

If you listened to the above clip, you may have noticed GW Micro wasn't certain 4.62 was the DECtalk version they were going to ship with Window Eyes. When I cut the interview clip for the video, I added a short sample of DECtalk 4.62 at the end. This sample reveals that 4.62 wasn't the version that shipped with Window Eyes, and it also demonstrates an interesting bug with this particular version of DECtalk that we will look at later. With the 4.62 rumor out of the way, this leads us to the conclusion that Window Eyes shipped with DECtalk 4.60, and fortunately we can do some scientific testing to back this up.

Once upon a time, Kurzweil 1000 came bundled with a copy of DECtalk 4.60, and this version sounds almost identical to the Window Eyes copy. In fact, if we generate two audio files of these different builds synthesizing the same text and mix them together with the polarity inverted on one of the tracks, only the voiced portion of the waveform remains. Reducing the voicing gain a bit on the K1000 build and performing the same test results in a perfect null, so we can conclude Window Eyes shipped with a patched version of DECtalk 4.60. Interestingly, Window Eyes shipped with the DECtalk 4.61 dictionary, possibly because it had improvements that GW Micro liked. Another point of interest is the fact that this version of DECtalk was compiled on June 13, 2003, over 2 years after DECtalk 4.61's release. Also, the location of the registry key that DECtalk used to get the path of the main dictionary was changed to "GW Micro, Inc.\DECtalk-OEM" from "DECtalk Software\DECtalk". It seems incredibly strange that a company would update an older version of their product for a client when a newer version of said product is available. This next section is going to be more technical, as we are going to dive into some history and source code in order to understand how we got to this point.
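
For readers who have never run a null test, here is a minimal sketch of the idea in Python. It does not use real DECtalk output. The render function is a made-up stand-in for a synthesizer that mixes a fixed "unvoiced" noise component with a "voiced" tone, and the voicing gain of 0.8 is an arbitrary value I picked for illustration, not a number taken from either build.

```python
import math
import random


def render(voicing_gain, n=1000, seed=0):
    """Hypothetical stand-in for a synthesizer build.

    Produces identical 'unvoiced' noise on every call (fixed seed) plus a
    'voiced' tone scaled by voicing_gain.
    """
    rng = random.Random(seed)
    out = []
    for i in range(n):
        unvoiced = 0.1 * rng.uniform(-1.0, 1.0)
        voiced = 0.5 * math.sin(2 * math.pi * 100 * i / 8000)
        out.append(unvoiced + voicing_gain * voiced)
    return out


def null_test(a, b):
    """Mix a with a polarity-inverted copy of b and return the peak residue."""
    return max(abs(x - y) for x, y in zip(a, b))


ASSUMED_GAIN = 0.8  # arbitrary illustration value

quiet_build = render(ASSUMED_GAIN)  # plays the role of the patched build
loud_build = render(1.0)            # plays the role of the unpatched build

# The shared unvoiced part cancels, so only scaled voiced energy is left.
partial_residue = null_test(quiet_build, loud_build)

# After matching the voicing gain, nothing is left at all.
matched_residue = null_test(quiet_build, render(ASSUMED_GAIN))
```

In this toy model the first comparison leaves a residue that is purely the voiced tone, and the second leaves silence, which is the same pattern described in the test above.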

In April 2001, Force Computers released DECtalk 4.61 to the world. It wasn't received well by most longtime users, as it sounded very different when compared to the versions released by Digital Equipment Corporation and Smart Modular Technologies. Fortunately, some of the developers really cared about the end users of the product, and some attempts were made to improve the voice quality to meet customer demands. Sadly, most of these projects didn't really take off, and it isn't hard to understand why. Speech synthesis technology had significantly improved since DECtalk's heyday in the 80s and 90s, when it was one of the highest quality options compared to most other text-to-speech systems of the era. By the late 90s, most application developers had moved on to more modern speech engines, and this is quite understandable as DECtalk is rather dated even by late 90s or early 2000s standards.

One of these DECtalk improvement projects is known as DECtalk 4.62, which is essentially version 4.61 with some updated components. Most of the files in the 4.62 source archive have a modification date of May 3, 2001, shortly after DECtalk 4.61's release. Source code modification started in January 2003, and ended in May of that same year. These modifications included merging in updated versions of system level components such as the text-to-speech API and the audio driver, and making changes to the phonemic and letter to sound modules.

One of the changes to the phonemic module was adding in a mechanism that allows specific strings of phonemes to have a fixed duration for each phoneme in the string. The person who added this code must have made a typo, as one of these strings has a phoneme that is much longer when compared to all other phonemes in the string. This results in words such as "never" having an extremely elongated E sound, which is hilarious in some cases such as what I did at the end of the video I posted above. The file with this code is ph_sort.c, and it was modified on March 11. The archive also contains a SAPI3 build that was compiled on February 28, and this build doesn't have the "never" problem. When examining the ph_sort.obj file that is associated with this particular build, it doesn't appear to have the array of phoneme strings and durations that the duration lookup feature relies on. With this information, we can safely assume this functionality wasn't implemented at this point in the development process. Once I understood the cause of the problem, it was trivial to remove the offending code and have a slightly improved listening experience.
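
To make the failure mode easier to picture, here is a rough Python sketch of how a fixed duration lookup of this kind could behave. This is not the DECtalk code. The table layout, the phoneme symbols, the default duration and every number in it are hypothetical, and the 900 is simply my stand-in for a value with one digit too many.

```python
# Hypothetical table: phoneme string -> fixed duration in ms for each phoneme.
FIXED_DURATIONS = {
    ("n", "eh", "v", "er"): (60, 900, 55, 110),  # 900 plays the mistyped entry
}


def apply_fixed_durations(phonemes, default_ms=80):
    """Pair each phoneme with a duration, overriding defaults on a table match."""
    durations = [default_ms] * len(phonemes)
    i = 0
    while i < len(phonemes):
        for pattern, fixed in FIXED_DURATIONS.items():
            if tuple(phonemes[i:i + len(pattern)]) == pattern:
                # Overwrite the durations for the whole matched string.
                durations[i:i + len(pattern)] = fixed
                i += len(pattern) - 1
                break
        i += 1
    return list(zip(phonemes, durations))


never = apply_fixed_durations(["n", "eh", "v", "er"])
hello = apply_fixed_durations(["hh", "ax", "l", "ow"])
```

With a table like this, one bad entry stretches a single vowel in every word that matches the string, while words that don't match are untouched, and deleting the entry or the lookup makes the symptom disappear.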

In May, it appears some GW Micro specific defines were added to the letter to sound module. When GWMICRO is defined, some abbreviation processing code is skipped during compilation. If you remember from the interview, Geoffray stated that GW Micro was working with Fonix to reduce the excessive abbreviation handling. These changes are proof that work was done to address this, and it is worth noting that the interview was conducted in April before these changes were implemented.

As far as I know, DECtalk 4.62 was never officially released. This isn't surprising, as it doesn't sound too different from 4.61, except for some intonation and timing differences. However, this isn't the end of the story of DECtalk for GW Micro.

On June 13, 2003, DECtalk 4.60 Revision 8 was patched and built for GW Micro. Most of the source code for this version was last modified on December 13, 1999, which was a day before the K1000 version of DECtalk 4.60 was compiled. The components that were updated in 2003 were the audio driver and the vocal tract model, and the updates themselves were quite small. Some audio device reset fixes were merged into the audio driver, and the voicing gain was reduced by 2 dB in the vocal tract model to help reduce clipping. These are welcome changes, as they don't introduce any new problems. Also, if this code is compiled with the GWMICRO define, the synthesizer looks for the dictionary location under a different registry key. These findings in the source code coincide with the tests and observations earlier in this post, so we can confidently say we have the source code for the version of DECtalk that ultimately shipped with Window Eyes. The screen reader itself may be abandoned, however the text-to-speech engine that many of its users have gotten used to can potentially be used in other applications. As an interesting aside, this version of the code can be built with the NWSNOAA define to recreate the last version of DECtalk that was heard on NOAA Weather Radio.
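
If you are wondering how much a 2 dB cut actually is, the usual amplitude conversion is 10 raised to the power of dB divided by 20. The short Python sketch below works that out. The sample peak of 1.2 is a made-up value, chosen only to show a signal that would clip at full scale before the cut and fit after it.

```python
def db_to_amplitude(db):
    """Convert a gain in decibels to a linear amplitude factor."""
    return 10 ** (db / 20)


def clip(sample, limit=1.0):
    """Hard-limit a sample to the range [-limit, limit]."""
    return max(-limit, min(limit, sample))


factor = db_to_amplitude(-2.0)    # roughly 0.794

hypothetical_peak = 1.2           # made-up voiced peak above full scale
before = clip(hypothetical_peak)  # flattened to 1.0, which is clipping
after = clip(hypothetical_peak * factor)  # scaled peak fits, so no clipping
```

In other words, a 2 dB reduction scales the voiced part of the signal to a little under 80 percent of its original amplitude.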

So, there you have it. This mystery has been in the back of my mind for years, and I am glad to finally unravel it and share my findings. I hope you enjoyed reading this as much as I enjoyed figuring all of this out and writing this blog post.

Sunday, January 22, 2023

I am back from the dead!

It has been almost 11 years since I posted here, and I totally forgot I had this blog set up. I recently got an archive of the defunct braullenoteusers.info site, and I was browsing old mailing list posts. I was reading messages that I had sent to the list back then, and I apparently posted a link to this blog. Perhaps I will post here more often.