# Manipulation tasks with natural language
This demo showcases the capabilities of RAI in performing manipulation tasks using natural language commands. The demo utilizes a robot arm (Franka Emika Panda) in a simulated environment, demonstrating how RAI can interpret complex instructions and execute them using advanced vision and manipulation techniques.
## Setup
**LLM model**

The demo uses the `complex_model` LLM configured in `config.toml`. The model should be a multimodal, tool-calling model. See Vendors.
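For orientation, a vendor section in `config.toml` might look like the sketch below. This is illustrative only: the section and key names are assumptions that may differ between RAI versions and vendors, so check the `config.toml` shipped with your checkout.

```toml
# Illustrative sketch only -- section and key names are assumptions,
# not a verbatim RAI config. Point complex_model at a multimodal,
# tool-calling model from your configured vendor.
[openai]
complex_model = "gpt-4o"  # hypothetical example value
```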
**ROS 2 Sourced**

Make sure ROS 2 is sourced, e.g.:

```bash
source /opt/ros/humble/setup.bash
```
- Follow the RAI setup instructions in the quick setup guide.
- Download additional dependencies:

    ```bash
    poetry install --with openset
    vcs import < demos.repos
    rosdep install --from-paths src/examples/rai-manipulation-demo/ros2_ws/src --ignore-src -r -y
    ```

- Download the latest binary release:

    ```bash
    ./scripts/download_demo.sh manipulation
    ```

- Build the ROS 2 workspace:

    ```bash
    colcon build --symlink-install
    ```
## Running the Demo
**Remain in sourced shell**

Ensure that every command is run in a sourced shell (`source setup_shell.sh`) and that ROS 2 is sourced.
- Start the demo:

    ```bash
    ros2 launch examples/manipulation-demo.launch.py game_launcher:=demo_assets/manipulation/RAIManipulationDemo/RAIManipulationDemo.GameLauncher
    ```
- In a second terminal, run the Streamlit interface:

    ```bash
    streamlit run examples/manipulation-demo-streamlit.py
    ```

    Alternatively, you can run the simpler command-line version, which also serves as an example of how to use the RAI API in your own applications (see the sketch after this list):

    ```bash
    python examples/manipulation-demo.py
    ```
- Interact with the robot arm using natural language commands. For example:

    ```
    Enter a prompt: Pick up the red cube and drop it on another cube
    ```
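The command-line version follows a build-once, prompt-in-a-loop pattern. The sketch below is illustrative only: `create_agent` is the demo's real entry point (see How it works), but the invocation details are assumptions that may differ between RAI versions; consult `examples/manipulation-demo.py` for the actual code.

```python
# Illustrative sketch of the pattern in examples/manipulation-demo.py.
# create_agent is the demo's real entry point; everything else here is
# an assumption and may not match the actual implementation.
from langchain_core.messages import HumanMessage

agent = create_agent()  # builds the agent with vision + manipulation tools

while True:
    prompt = input("Enter a prompt: ")
    # Assumed here: the agent is a LangChain/LangGraph-style runnable whose
    # tool calls drive object detection and the robot arm, so streaming it
    # executes the task while printing intermediate events.
    for event in agent.stream({"messages": [HumanMessage(content=prompt)]}):
        print(event)
```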
**Changing camera view**

To change the camera view in the simulation, press the 1-7 keys on your keyboard once the simulation window is focused.
## How it works
The manipulation demo utilizes several components:
- Vision processing using Grounded SAM 2 and Grounding DINO for object detection and segmentation.
- RAI agent to process the request and plan the manipulation sequence.
- Robot arm control for executing the planned movements.
The main logic of the demo is implemented in the `create_agent` function, which can be found in `examples/manipulation-demo.py`.
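At a high level, a request flows through detection, planning, and execution. The following is a conceptual sketch of that flow, not RAI's actual code; all names in it (`handle_request`, `detect_objects`, `plan_steps`, `arm`) are hypothetical stand-ins for the real tools wired up inside `create_agent`.

```python
# Conceptual sketch of the demo's pipeline; every name here is hypothetical
# and stands in for the real tools wired up inside create_agent.

def handle_request(instruction: str) -> None:
    # 1. Vision: Grounding DINO proposes bounding boxes for the objects
    #    named in the instruction; Grounded SAM 2 refines them into
    #    segmentation masks from which object positions are estimated.
    detections = detect_objects(camera_image, query=instruction)

    # 2. Planning: the RAI agent combines the instruction with the
    #    detections and plans an ordered manipulation sequence.
    steps = plan_steps(instruction, detections)

    # 3. Execution: each planned step (pick, move, place) is sent to the
    #    robot arm controller.
    for step in steps:
        arm.execute(step)
```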
## Known Limitations
- Grounding DINO can't distinguish colors.
- VLMs tend to struggle with spatial understanding (for example, left/right concepts).
**Building from source**

If you have trouble running the binary, you can build it from source here.