Human-Robot Interface via Voice
RAI provides two ROS-enabled agents for speech-to-speech (S2S) communication.
Automatic Speech Recognition Agent
See examples/s2s/asr.py for example usage.
The agent requires configuration of the sounddevice and ROS 2 connectors, a voice activity
detection model (e.g. SileroVAD), and a transcription model (e.g. LocalWhisper). Optionally,
additional models (e.g. OpenWakeWord) can be used to decide whether transcription should start.
The Agent publishes information on two topics:
/from_human: rai_interfaces/msg/HRIMessages - containing transcriptions of the recorded speech
/voice_commands: std_msgs/msg/String - containing control commands that tell the consumer
whether speech is currently detected ({"data": "pause"}), speech was detected and has now
stopped ({"data": "play"}), or the speech has been transcribed ({"data": "stop"}).
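The mapping between speech events and control commands described above can be sketched as a small lookup table. This is only an illustration: the `SpeechEvent` names and the `voice_command_for` helper are hypothetical and not part of RAI, and the returned dictionary simply mirrors the fields of a std_msgs/msg/String message. Publishing it over ROS 2 is left out.

```python
from enum import Enum
from typing import Dict


class SpeechEvent(Enum):
    """Hypothetical event names used for illustration only (not RAI API)."""

    SPEECH_STARTED = "speech_started"  # voice activity is currently detected
    SPEECH_ENDED = "speech_ended"      # voice activity was detected and has stopped
    TRANSCRIBED = "transcribed"        # the recorded speech has been transcribed


# Commands as documented for the /voice_commands topic.
COMMAND_FOR_EVENT = {
    SpeechEvent.SPEECH_STARTED: "pause",
    SpeechEvent.SPEECH_ENDED: "play",
    SpeechEvent.TRANSCRIBED: "stop",
}


def voice_command_for(event: SpeechEvent) -> Dict[str, str]:
    """Return the message fields to publish on /voice_commands for an event."""
    return {"data": COMMAND_FOR_EVENT[event]}


print(voice_command_for(SpeechEvent.SPEECH_STARTED))
```

Keeping the mapping in one table makes it easy to check the publisher and the consumer against the same set of command strings.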
The Agent uses the sounddevice module to access the user's microphone. By default, the "default"
sound device is used. To get information about the available sound devices, use:
python -c "import sounddevice; print(sounddevice.query_devices())"
The device can be identified by name and passed to the configuration.
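As a rough sketch of how a device name might be matched against that listing, the helper below searches a list of device entries for a name fragment. The function `find_device_index` is hypothetical and not part of RAI or sounddevice. It assumes entries shaped like those returned by `sounddevice.query_devices()` (dictionaries with `name`, `max_input_channels` and `max_output_channels` keys), and the sample data is made up so the example runs without audio hardware.

```python
from typing import Mapping, Optional, Sequence


def find_device_index(
    devices: Sequence[Mapping[str, object]],
    name_fragment: str,
    kind: str = "input",
) -> Optional[int]:
    """Return the index of the first device whose name contains name_fragment.

    Only devices with at least one channel of the requested kind are considered.
    """
    if kind not in ("input", "output"):
        raise ValueError("kind must be 'input' or 'output'")
    channel_key = f"max_{kind}_channels"
    needle = name_fragment.lower()
    for index, device in enumerate(devices):
        if needle in str(device["name"]).lower() and int(device[channel_key]) > 0:
            return index
    return None


# Made-up sample entries; with real hardware, pass sounddevice.query_devices() instead.
sample_devices = [
    {"name": "Built-in Microphone", "max_input_channels": 2, "max_output_channels": 0},
    {"name": "USB Speaker", "max_input_channels": 0, "max_output_channels": 2},
]

print(find_device_index(sample_devices, "usb", kind="output"))  # prints 1
print(find_device_index(sample_devices, "usb", kind="input"))   # prints None
```

The same helper works for the speaker used by the text-to-speech agent by passing `kind="output"`.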
TextToSpeechAgent
See examples/s2s/tts.py for example usage.
The agent requires configuration of the sounddevice and ROS 2 connectors, as well as a
TextToSpeech model (e.g. OpenTTS). The Agent listens for information on two topics:
/to_human: rai_interfaces/msg/HRIMessages - containing responses to be played to the human.
These responses are synthesized into speech and put into the playback queue.
/voice_commands: std_msgs/msg/String - containing control commands to pause the current playback
({"data": "pause"}), start or continue playback ({"data": "play"}), or stop the playback and drop
the current playback queue ({"data": "stop"}).
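The effect of the three control commands on a playback queue can be illustrated with a minimal sketch. The `PlaybackQueue` class below is hypothetical and is not the RAI implementation. It treats queued audio as opaque items and assumes that items enqueued after a stop command are played without a further play command.

```python
from collections import deque
from typing import Deque, Optional


class PlaybackQueue:
    """Hypothetical playback queue reacting to pause/play/stop commands."""

    def __init__(self) -> None:
        self._chunks: Deque[str] = deque()
        self.playing = True

    def enqueue(self, chunk: str) -> None:
        """Add a synthesized audio chunk (represented here by a string)."""
        self._chunks.append(chunk)

    def handle_command(self, command: str) -> None:
        """Apply a command received on /voice_commands."""
        if command == "pause":
            self.playing = False
        elif command == "play":
            self.playing = True
        elif command == "stop":
            # Drop everything that is still waiting to be played.
            self._chunks.clear()
            # Assumption: later responses should play without another "play".
            self.playing = True
        else:
            raise ValueError(f"unknown command: {command!r}")

    def next_chunk(self) -> Optional[str]:
        """Return the next chunk to play, or None if paused or empty."""
        if self.playing and self._chunks:
            return self._chunks.popleft()
        return None


queue = PlaybackQueue()
queue.enqueue("first")
queue.enqueue("second")
queue.handle_command("pause")
print(queue.next_chunk())  # prints None, playback is paused
queue.handle_command("play")
print(queue.next_chunk())  # prints first
queue.handle_command("stop")
print(queue.next_chunk())  # prints None, the queue was dropped
```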
The Agent uses the sounddevice module to access the user's speaker. By default, the "default"
sound device is used. To get a list of names of the available sound devices, use:
python -c 'import sounddevice as sd; print([x["name"] for x in list(sd.query_devices())])'
The device can be identified by name and passed to the configuration.
OpenTTS
To run OpenTTS (and the example), a Docker container serving the model must be running.
To start it, run:
docker run -it -p 5500:5500 synesthesiam/opentts:en --no-espeak
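Once the container is up, a quick way to check it is to request a short utterance over HTTP. The sketch below only builds the request URL. It assumes that the server exposes OpenTTS's `/api/tts` endpoint with `voice` and `text` query parameters on the port published above. The `build_tts_url` helper is hypothetical, and the voice identifier in the usage lines is a placeholder that must be replaced with a voice actually available on your server.

```python
from urllib.parse import urlencode


def build_tts_url(text: str, voice: str, host: str = "localhost", port: int = 5500) -> str:
    """Build a URL for a text-to-speech request (assumed OpenTTS /api/tts endpoint)."""
    query = urlencode({"voice": voice, "text": text})
    return f"http://{host}:{port}/api/tts?{query}"


# "some_engine:some_voice" is a placeholder, not a real voice identifier.
url = build_tts_url("Hello robot", voice="some_engine:some_voice")
print(url)

# To actually fetch audio from a running server (not executed here):
#   from urllib.request import urlopen
#   with urlopen(url) as response:
#       wav_bytes = response.read()
```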
Running the example
To run the provided S2S example with a minimal LLM-based agent, run each of the following commands in its own terminal (four terminals in total):
$ docker run -it -p 5500:5500 synesthesiam/opentts:en --no-espeak
$ python ./examples/s2s/asr.py
$ python ./examples/s2s/tts.py
$ python ./examples/s2s/conversational.py