
Manipulation tasks with natural language

This demo showcases the capabilities of RAI in performing manipulation tasks using natural language commands. The demo utilizes a robot arm (Franka Emika Panda) in a simulated environment, demonstrating how RAI can interpret complex instructions and execute them using advanced vision and manipulation techniques.

Manipulation Demo

Setup

LLM model

The demo uses the complex_model LLM configured in config.toml. The configured model should be multimodal and capable of tool calling. See Vendors.
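
To double-check which model is configured before launching, you can inspect config.toml directly. The snippet below is a minimal sketch that assumes the file sits in the repository root; the exact keys under the complex_model entry depend on your configuration:

    # Minimal sketch: inspect config.toml to confirm the complex_model entry.
    # Assumes config.toml is in the repository root; key names depend on your setup.
    import tomllib  # Python 3.11+; on older versions use the "tomli" package

    with open("config.toml", "rb") as f:
        config = tomllib.load(f)

    # Print the parsed configuration so you can verify that complex_model points
    # at a multimodal, tool-calling model from your chosen vendor.
    print(config)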

ROS 2 Sourced

Make sure ROS 2 is sourced. (e.g. source /opt/ros/humble/setup.bash)
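
If you are unsure whether the current shell is sourced, checking the ROS_DISTRO environment variable is a quick sanity test:

    # Quick sanity check that a ROS 2 environment is sourced in this shell.
    import os

    distro = os.environ.get("ROS_DISTRO")
    if distro:
        print(f"ROS 2 is sourced (distro: {distro})")
    else:
        print("ROS 2 is not sourced; run e.g. 'source /opt/ros/humble/setup.bash'")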

Local Setup

Setting up the demo

  1. Follow the RAI setup instructions in the quick setup guide.
  2. Download additional dependencies:

    poetry install --with perception
    vcs import < demos.repos
    rosdep install --from-paths src/examples/rai-manipulation-demo/ros2_ws/src --ignore-src -r -y
    
  3. Download the latest binary release:

    ./scripts/download_demo.sh manipulation
    
  4. Build the ROS 2 workspace:

    colcon build --symlink-install
    
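If you prefer to script these setup steps, a small wrapper can replay them in order. The sketch below simply mirrors the commands above and assumes it is run from the repository root in a ROS 2-sourced shell:

    # Convenience sketch: replay the documented setup commands in order.
    # Assumes the repository root as working directory and a ROS 2-sourced shell.
    import subprocess

    commands = [
        "poetry install --with perception",
        "vcs import < demos.repos",
        "rosdep install --from-paths src/examples/rai-manipulation-demo/ros2_ws/src --ignore-src -r -y",
        "./scripts/download_demo.sh manipulation",
        "colcon build --symlink-install",
    ]

    for cmd in commands:
        print(f"Running: {cmd}")
        subprocess.run(cmd, shell=True, check=True)  # stop on the first failure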

Running the demo

Remain in sourced shell

Ensure that every command is run in a shell sourced with source setup_shell.sh, and that ROS 2 is sourced.

  1. Start the demo

    ros2 launch examples/manipulation-demo.launch.py game_launcher:=demo_assets/manipulation/RAIManipulationDemo/RAIManipulationDemo.GameLauncher
    
  2. In a second terminal, run the Streamlit interface:

    streamlit run examples/manipulation-demo-streamlit.py
    

    Alternatively, you can run the simpler command-line version, which also serves as an example of how to use the RAI API for your own applications:

    python examples/manipulation-demo.py
    
  3. Interact with the robot arm using natural language commands. For example:

    Enter a prompt: Pick up the red cube and drop it on another cube
    
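The command-line version can also be driven programmatically, which is handy for scripted tests. The sketch below assumes examples/manipulation-demo.py reads the prompt from standard input (as its "Enter a prompt:" prompt suggests) and that it is launched from a sourced shell:

    # Illustrative sketch: send a prompt to the command-line demo from a script.
    # Assumes the demo reads the prompt from standard input and that the shell
    # running this script has been sourced (setup_shell.sh and ROS 2).
    import subprocess

    subprocess.run(
        ["python", "examples/manipulation-demo.py"],
        input="Pick up the red cube and drop it on another cube\n",
        text=True,
        check=True,
    )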

Changing camera view

To change the camera view in the simulation, use the 1-7 keys on your keyboard once the simulation window is focused.

Docker Setup

Setting up the demo

  1. Set up docker as outlined in the docker setup guide. During the setup, build the docker image with all dependencies (i.e., use the --build-arg DEPENDENCIES=all_groups argument).

  2. Enable X11 access for the docker container:

    xhost +local:root
    
  3. Run the docker container with the following command:

    docker run --net=host --ipc=host --pid=host -e ROS_DOMAIN_ID=$ROS_DOMAIN_ID -e DISPLAY=$DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix --gpus all -it rai:jazzy # or rai:humble
    

    NVIDIA Container Toolkit

    In order to use the --gpus all flag, the NVIDIA Container Toolkit must be installed on the host machine.

  4. (Inside the docker container) By default, RAI uses OpenAI as the vendor, so the $OPENAI_API_KEY environment variable must be set. The command below sets the variable and adds it to the container's .bashrc file:

    export OPENAI_API_KEY=YOUR_OPEN_AI_API_KEY
    echo "export OPENAI_API_KEY=$OPENAI_API_KEY" >> ~/.bashrc
    

    Note

    The default vendor can be changed to a different provider via the RAI configuration tool.

  5. After this, follow the steps in the Local Setup section from step 2 onwards.

    New terminal in docker

    In order to open a new terminal in the same docker container, you can use the following command:

    docker exec -it <container_id> bash
    
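If you restart the container often, the documented docker run invocation can be wrapped in a small helper. The sketch below only mirrors the command shown above; the image tag (rai:jazzy vs rai:humble) and the --gpus all flag depend on your build and host setup:

    # Sketch: start the demo container with the same flags as the documented command.
    # Image tag and GPU flag are assumptions that depend on your build and host.
    import os
    import subprocess

    cmd = [
        "docker", "run",
        "--net=host", "--ipc=host", "--pid=host",
        "-e", f"ROS_DOMAIN_ID={os.environ.get('ROS_DOMAIN_ID', '')}",
        "-e", f"DISPLAY={os.environ.get('DISPLAY', ':0')}",
        "-v", "/tmp/.X11-unix:/tmp/.X11-unix",
        "--gpus", "all",
        "-it", "rai:jazzy",
    ]
    subprocess.run(cmd, check=True)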

How it works

The manipulation demo utilizes several components:

  1. Vision processing using Grounded SAM 2 and Grounding DINO for object detection and segmentation.
  2. RAI agent to process the request and plan the manipulation sequence.
  3. Robot arm control for executing the planned movements.

The main logic of the demo is implemented in the create_agent function, which can be found in:

examples/manipulation-demo.py
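
For orientation, the sketch below shows the general shape of such an agent: a tool-calling model is handed perception and manipulation tools and is invoked in a loop until it stops requesting tool calls. It is an illustration only, with hypothetical names and a stubbed model; it is not RAI's actual API, which lives in examples/manipulation-demo.py:

    # Illustrative sketch only -- not RAI's actual API. The real logic is in the
    # create_agent function in examples/manipulation-demo.py. All names here are
    # hypothetical; the point is the overall loop: the model requests tool calls
    # (object detection, arm motion), the results are fed back, and the loop ends
    # when no further tool calls are made.
    from dataclasses import dataclass
    from typing import Callable

    @dataclass
    class ToolCall:
        name: str
        args: dict

    def fake_llm(messages, tools):
        """Stand-in for a multimodal, tool-calling LLM."""
        if not any(m["role"] == "tool" for m in messages):
            return [
                ToolCall("detect_objects", {"query": "red cube"}),
                ToolCall("pick_and_place", {"source": "red cube", "target": "other cube"}),
            ]
        return []  # no further tool calls -> task is considered done

    def create_agent(llm, tools: dict[str, Callable]):
        def run(task: str) -> list[dict]:
            messages = [{"role": "user", "content": task}]
            while True:
                tool_calls = llm(messages, tools)
                if not tool_calls:
                    return messages
                for call in tool_calls:
                    result = tools[call.name](**call.args)
                    messages.append({"role": "tool", "name": call.name, "content": result})
        return run

    tools = {
        "detect_objects": lambda query: f"found 1 object matching '{query}'",
        "pick_and_place": lambda source, target: f"moved '{source}' onto '{target}'",
    }
    agent = create_agent(fake_llm, tools)
    print(agent("Pick up the red cube and drop it on another cube"))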

Known Limitations

  • Grounding DINO can't distinguish colors.
  • VLMs tend to struggle with spatial understanding (for example left/right concepts).

Building from source

If you are having trouble running the binary, you can build it from source here.