MaixCAM MaixPy Continuous Chinese digit recognition
Update history
Date | Version | Author | Update content |
---|---|---|---|
2024-10-08 | 1.0.0 | 916BGAI | Initial document |
Introduction
MaixCAM has ported the Maix-Speech offline speech library, enabling continuous Chinese digit recognition, keyword recognition, and large-vocabulary speech recognition. It can recognize audio in PCM and WAV formats, as well as live input from the onboard microphone.
Maix-Speech
Maix-Speech is an offline speech library designed specifically for embedded environments. Its speech recognition algorithms are deeply optimized, which gives it a significant advantage in memory usage while maintaining an excellent word error rate (WER). For details on how it works, please refer to the open-source project.
Continuous Chinese digit recognition
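from maix import app, nn

# Load the acoustic model and use the onboard microphone as input
speech = nn.Speech("/root/models/am_3332_192_int8.mud")
speech.init(nn.SpeechDevice.DEVICE_MIC, "hw:0,0")

# Called by the digit decoder with the recognized string
def callback(data: str, len: int):
    print(data)

# Register the digit decoder with a blank threshold of 640 ms
speech.digit(640, callback)

while not app.need_exit():
    # Process one frame per call; fewer than 1 frame means the input has run out
    frames = speech.run(1)
    if frames < 1:
        print("run out\n")
        speech.deinit()
        break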
Usage
- Import the app and nn modules
from maix import app, nn
- Load the acoustic model
speech = nn.Speech("/root/models/am_3332_192_int8.mud")
- You can also load the am_7332 acoustic model; larger models provide higher accuracy but consume more resources.
- Choose the corresponding audio device
speech.init(nn.SpeechDevice.DEVICE_MIC, "hw:0,0")
- This example uses the onboard microphone. WAV and PCM audio files are also supported as input devices:
speech.init(nn.SpeechDevice.DEVICE_WAV, "path/audio.wav") # Using WAV audio input
speech.init(nn.SpeechDevice.DEVICE_PCM, "path/audio.pcm") # Using PCM audio input
- Note that WAV audio must use a 16 kHz sample rate and the S16_LE storage format. You can use the arecord tool to record audio in this format:
arecord -d 5 -r 16000 -c 1 -f S16_LE audio.wav
- When recognizing PCM/WAV audio, if you want to reset the data source, for example to recognize the next WAV file, you can use the speech.devive method, which automatically clears the cache:
speech.devive(nn.SpeechDevice.DEVICE_WAV, "path/next.wav")
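Before passing a WAV file to speech.init or speech.devive, it can help to confirm that the file matches the required format. The following is a minimal sketch using Python's standard wave module. The helper name is_supported_wav is hypothetical, and the mono check is an assumption taken from the -c 1 option in the arecord command above, not a documented requirement.

```python
import wave

def is_supported_wav(path: str) -> bool:
    """Return True if the WAV file is 16 kHz, 16-bit PCM, mono."""
    with wave.open(path, "rb") as wav:
        return (
            wav.getframerate() == 16000   # 16 kHz sample rate
            and wav.getsampwidth() == 2   # 2 bytes per sample (S16_LE)
            and wav.getnchannels() == 1   # mono: assumption, mirrors "-c 1"
        )
```

A file that fails this check would likely need to be re-recorded or resampled before recognition.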
- Set up the decoder
def callback(data: str, len: int):
    print(data)
speech.digit(640, callback)
Users can register several decoders (or none). Each decoder decodes the output of the acoustic model and invokes the corresponding user callback. Here, a digit decoder is registered to output the Chinese digit recognition results from the last 4 seconds. The results are returned as a string made up of the characters 0123456789 .(dot) S(ten) B(hundred) Q(thousand) W(ten thousand). For other decoders, please refer to the sections on real-time voice recognition and keyword recognition.
When setting the digit decoder, you need to specify a blank value (in ms); when silence lasts longer than this value, a _ is inserted into the output to indicate idle silence.
When you have finished using the decoder, call the speech.deinit() method to clear the initialization.
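Because the decoder marks idle silence with _, a callback may want to separate the result string into the groups spoken between pauses. A minimal sketch follows; the helper name split_on_silence is hypothetical, and the input string in the usage note is illustrative, not captured device output.

```python
from typing import List

def split_on_silence(result: str) -> List[str]:
    """Split a digit decoder result at the "_" silence markers.

    Empty pieces (from leading, trailing, or repeated markers) are dropped.
    """
    return [group for group in result.split("_") if group]
```

For example, split_on_silence("_123_45") returns ["123", "45"]. The function could be called on the data argument inside the callback registered with speech.digit.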
- Recognition
while not app.need_exit():
    frames = speech.run(1)
    if frames < 1:
        print("run out\n")
        speech.deinit()
        break
- Use the speech.run method to run speech recognition. Its parameter specifies the number of frames to process per call, and it returns the number of frames actually processed. You can run 1 frame at a time and do other processing in between, or run continuously in a single thread and stop it from an external thread.
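The second option, running continuously in one thread and stopping it from another, could be sketched as follows. This is an illustration under assumptions: run_until_stopped is a hypothetical helper, it accepts any callable with the same shape as speech.run (frames requested in, frames processed out), and how speech.run behaves when driven from a worker thread has not been verified here.

```python
import threading
from typing import Callable

def run_until_stopped(run_frames: Callable[[int], int],
                      stop: threading.Event,
                      frames_per_call: int = 1) -> int:
    """Call run_frames repeatedly until stop is set or the input runs out.

    Returns the total number of frames processed.
    """
    total = 0
    while not stop.is_set():
        processed = run_frames(frames_per_call)
        if processed < 1:
            # Fewer than 1 frame processed: the data source is exhausted
            break
        total += processed
    return total

# On the device this might be used as (assumption, not verified):
#   stop = threading.Event()
#   worker = threading.Thread(target=run_until_stopped, args=(speech.run, stop))
#   worker.start()
#   ...            # other work in the main thread
#   stop.set()     # ask the recognition loop to finish
#   worker.join()
```

Checking the event once per call keeps the stop latency to the time one speech.run call takes, which is why a small frames_per_call is used by default.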
Recognition Results
If the above program runs successfully, speaking into the onboard microphone will yield continuous Chinese digit recognition results, such as:
_0123456789