Then I used "test mode" to dump the PARCOR data for the samples. I analyzed the data and came up with the following:
A sample is made up of frames sent every 12800 clocks (every 20ms at 640kHz).
Sample #0 has 20 frames, #1 has 22, #2 has 29, #3 has 23, #4 has 16, #5 has 18, #6 has 25, #7 has 99.
7378 bits are used out of 960*8 = 7680 in the chip's ROM, 96%.
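The arithmetic above can be checked with a short Python sketch. The constants come straight from the text; the per-sample durations assume that every frame, including the end frame, occupies one full 20 ms slot, which the dump does not confirm.

```python
# Figures taken from the text above.
CLOCK_HZ = 640_000           # chip clock
CLOCKS_PER_FRAME = 12_800    # one frame is sent every 12800 clocks
FRAMES = [20, 22, 29, 23, 16, 18, 25, 99]  # frames in samples #0..#7
ROM_BYTES = 960
BITS_USED = 7378

# Frame period in milliseconds (integer math first to avoid rounding noise).
frame_ms = CLOCKS_PER_FRAME * 1000 / CLOCK_HZ

# Assumption: each frame, even a 5-bit null frame, takes a full frame slot.
durations_ms = [n * frame_ms for n in FRAMES]

rom_bits = ROM_BYTES * 8
usage = BITS_USED / rom_bits

for i, d in enumerate(durations_ms):
    print(f"sample #{i}: {FRAMES[i]} frames, {d:.0f} ms")
print(f"total frames: {sum(FRAMES)}")
print(f"ROM usage: {BITS_USED}/{rom_bits} bits = {usage:.1%}")
```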
A frame is one of 3 lengths: full frame=47 bits, short frame=12 bits, null frame=5 bits. The last frame of every sample is an end frame, made up of 5 one bits. There is also a frame of 5 zero bits that is likely a repeat frame.
A frame is made up of up to 10 subframes. A full frame has all 10 subframes, a short frame has only the first 2 subframes, and a null frame has only subframe 0:
subframe 0 is 120 clocks and has 5 DREQs
subframe 1 is 132 clocks and has 7 DREQs
subframe 2 is 128 clocks and has 6 DREQs
subframe 3 is 136 clocks and has 6 DREQs
subframe 4 is 128 clocks and has 4 DREQs
subframe 5 is 128 clocks and has 4 DREQs
subframe 6 is 128 clocks and has 4 DREQs
subframe 7 is 128 clocks and has 4 DREQs
subframe 8 is 132 clocks and has 4 DREQs
subframe 9 is ??? clocks and has 3 DREQs
A null frame is indicated by subframe 0 being 11111 (an end frame) or 00000 (a repeat frame).
A short frame is indicated by the first bit of subframe 1 being 1.
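The framing rules above can be sketched as a small Python splitter. This is only an illustration under stated assumptions: one bit is transferred per DREQ (which matches 5+7+6+6+4+4+4+4+4+3 = 47 bits for a full frame), the bits are given in the order the chip requests them, and the input is a string of '0' and '1' characters. The names `SUBFRAME_BITS` and `split_frames` are made up for this example.

```python
# Bits per subframe, taken from the DREQ counts listed above.
SUBFRAME_BITS = [5, 7, 6, 6, 4, 4, 4, 4, 4, 3]


def split_frames(bits):
    """Split a '0'/'1' string into (kind, subframes) tuples.

    kind is 'full', 'short', 'repeat' or 'end'. Parsing stops after the
    first end frame. Assumes one bit per DREQ, in stream order.
    """
    frames = []
    pos = 0

    def take(width):
        nonlocal pos
        chunk = bits[pos:pos + width]
        if len(chunk) < width:
            raise ValueError(f"bitstream truncated at bit {pos}")
        pos += width
        return chunk

    while pos < len(bits):
        sub0 = take(SUBFRAME_BITS[0])
        if sub0 == "11111":              # end frame: last frame of a sample
            frames.append(("end", [sub0]))
            break
        if sub0 == "00000":              # likely a repeat frame (a guess)
            frames.append(("repeat", [sub0]))
            continue
        sub1 = take(SUBFRAME_BITS[1])
        if sub1[0] == "1":               # first bit of subframe 1 marks a short frame
            frames.append(("short", [sub0, sub1]))
            continue
        rest = [take(w) for w in SUBFRAME_BITS[2:]]
        frames.append(("full", [sub0, sub1] + rest))
    return frames


# Made-up example stream: one full, one short, one repeat and one end frame.
demo = "01010" + "0101010" + "0" * 35 + "00110" + "1000000" + "00000" + "11111"
for kind, subs in split_frames(demo):
    print(kind, subs)
```

If the real bit order within a subframe turns out to be reversed, only the `sub1[0]` test would need to change.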
I'm guessing each subframe is a speech parameter (amplitude, pitch, K-parameters), like the TI LPC data format.
Here is a CSV file with the ROM contents by frame.
I created a serial ROM in a CPLD to provide data to the chip's "external memory mode". When I load it with the data I dumped for samples 0 and 3, the M50805 plays back the same PWM as the internal ROM produces. The other samples output slightly different PWM, although it sounds the same when I convert it to WAV format, so something must not be quite right with the other samples.
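One way to compare two PWM captures and listen to them is sketched below in Python. Everything about the capture format is an assumption: the PWM output is taken to be a list of 0/1 logic levels sampled at a fixed rate, with a known whole number of samples per PWM period, and the WAV sample rate is left as a parameter because the text does not state it. The function names are hypothetical.

```python
import wave


def pwm_to_pcm(levels, samples_per_period):
    """Average 0/1 logic levels over each PWM period into unsigned 8-bit PCM."""
    pcm = bytearray()
    last_start = len(levels) - samples_per_period
    for start in range(0, last_start + 1, samples_per_period):
        period = levels[start:start + samples_per_period]
        duty = sum(period) / samples_per_period   # duty cycle, 0.0 to 1.0
        pcm.append(round(duty * 255))
    return bytes(pcm)


def max_abs_diff(a, b):
    """Largest per-sample difference between two PCM byte strings."""
    return max((abs(x - y) for x, y in zip(a, b)), default=0)


def write_wav(target, pcm, sample_rate):
    """Write mono 8-bit PCM to a path or a seekable binary file object."""
    with wave.open(target, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(1)            # 8-bit WAV data is unsigned
        w.setframerate(sample_rate)
        w.writeframes(pcm)


# Made-up captures: 4 logic samples per PWM period, 3 periods each.
internal = pwm_to_pcm([1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0], 4)
external = pwm_to_pcm([1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0], 4)
print(list(internal), list(external), max_abs_diff(internal, external))
```

A small `max_abs_diff` between the internal-ROM and external-ROM captures would be consistent with output that differs slightly but sounds the same.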
I decapped the chip and made this composite. I see the 312-byte decoding ROM left of center. The 960-byte parameter ROM is at the top, but I think it must be implant ROM because I can't see any bits. The top-left pad is pin 4, numbered counter-clockwise. The 2 pads in the lower right aren't connected to anything. There's an extra pad, probably because the 22-pin and 24-pin versions use the same die, and the clock output isn't pinned out in the 22-pin version.
Discussion on NesDev