Manipulation tasks with natural language

This demo showcases the capabilities of RAI in performing manipulation tasks using natural language commands. The demo utilizes a robot arm (Franka Emika Panda) in a simulated environment, demonstrating how RAI can interpret complex instructions and execute them using advanced vision and manipulation techniques.

Manipulation Demo

Setup

LLM model

The demo uses the complex_model LLM configured in config.toml. The model should be a multimodal, tool-calling model. See Vendors.
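As an illustration only, the relevant entry in config.toml might look like the sketch below. The section and key names here are assumptions, not the actual RAI schema; consult the Vendors page for the real configuration format, and note that any multimodal, tool-calling model can be substituted:

```toml
# Illustrative sketch - key names are assumptions; see the Vendors page for the actual schema.
# complex_model must be a multimodal model that supports tool calling.
[openai]
complex_model = "gpt-4o"
```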

ROS 2 Sourced

Make sure ROS 2 is sourced. (e.g. source /opt/ros/humble/setup.bash)

Local Setup

Setting up the demo

  1. Follow the RAI setup instructions in the quick setup guide.
  2. Download additional dependencies:

    uv sync --group perception --group simbench
    vcs import < demos.repos
    rosdep install --from-paths src/examples/rai-manipulation-demo/ros2_ws/src --ignore-src -r -y
    
  3. Download the latest binary release:

    ./scripts/download_demo.sh manipulation
    
  4. Build the ROS 2 workspace:

    colcon build --symlink-install
    

Running the demo

Remain in sourced shell

Ensure that every command is run in a sourced shell (source setup_shell.sh) and that ROS 2 is sourced.

  1. Run the Demo:

    streamlit run examples/manipulation-demo-streamlit.py
    

    Alternatively, you can run the simpler command-line version, which also serves as an example of how to use the RAI API for your own applications:

    1. Run the simulation:
    ros2 launch examples/manipulation-demo.launch.py game_launcher:=demo_assets/manipulation/RAIManipulationDemo/RAIManipulationDemo.GameLauncher
    

    By default, perception services register both new model-agnostic service names (/detection, /segmentation) and legacy names (/grounding_dino_classify, /grounded_sam_segment) for backward compatibility. To disable legacy names, add the launch argument:

    ros2 launch examples/manipulation-demo.launch.py game_launcher:=... enable_legacy_service_names:=false
    
    2. Run the command-line app:
    python examples/manipulation-demo.py
    

    Note: manipulation-demo.py uses the new service names (/detection, /segmentation). Legacy examples like manipulation-demo-v1.py use legacy service names and require enable_legacy_service_names:=true (default).

  2. Interact with the robot arm using natural language commands. For example:

    "Place each apple on top of a cube"
    "Build a tower from cubes"
    "Arrange objects in a line"
    "Put two boxes closer to each other. Move only one box."
    "Move cubes to the left side of the table"
    

Changing camera view

To change the camera view in the simulation, use the 1-7 keys on your keyboard once the simulation window is focused.

Docker Setup

1. Setting up the demo

  1. Set up docker as outlined in the docker setup guide. Build the base image with the DEPENDENCIES=core_only build argument.

  2. Build docker manipulation demo image:

    docker build -t rai-manipulation-demo:jazzy --build-arg ROS_DISTRO=jazzy -f docker/Dockerfile.manipulation-demo .
    
  3. Enable X11 access for the docker container:

    xhost +local:root
    
  4. Set the $OPENAI_API_KEY environment variable.

    export OPENAI_API_KEY=YOUR_OPEN_AI_API_KEY
    

    Note

    The default vendor can be changed to a different provider via the RAI configuration tool.

  5. Run the docker container with the following command:

    docker run -p 8501:8501 -e DISPLAY=$DISPLAY -e OPENAI_API_KEY=$OPENAI_API_KEY -v /tmp/.X11-unix:/tmp/.X11-unix --gpus all -it rai-manipulation-demo:jazzy # or rai-manipulation-demo:humble
    

    NVIDIA Container Toolkit

    In order to use the --gpus all flag, the NVIDIA Container Toolkit must be installed on the host machine.

  6. To access the demo web interface, open your web browser and navigate to:

    http://localhost:8501
    
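For repeatable runs, the docker run invocation from step 5 can also be expressed as a Compose file. This is an equivalent sketch, not part of the demo repository; the service name is arbitrary, and GPU access is requested via the Compose device-reservation syntax instead of --gpus all:

```yaml
# Illustrative docker-compose.yaml equivalent to the docker run command above.
services:
  manipulation-demo:
    image: rai-manipulation-demo:jazzy  # or rai-manipulation-demo:humble
    ports:
      - "8501:8501"
    environment:
      - DISPLAY=${DISPLAY}
      - OPENAI_API_KEY=${OPENAI_API_KEY}
    volumes:
      - /tmp/.X11-unix:/tmp/.X11-unix
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```

Run with docker compose up after exporting OPENAI_API_KEY, as in step 4.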

How it works

The manipulation demo utilizes several components:

  1. Vision processing using Grounded SAM 2 and Grounding DINO for object detection and segmentation via ROS 2 services (/detection, /segmentation).
  2. RAI agent to process the request and plan the manipulation sequence.
  3. Robot arm control for executing the planned movements.

The main logic of the demo is implemented in the create_agent function, which can be found in:

examples/manipulation-demo.py

Service Names and Backward Compatibility

Perception services support both new model-agnostic service names and legacy names for backward compatibility:

  • New service names (default): /detection, /segmentation
  • Legacy service names: /grounding_dino_classify, /grounded_sam_segment

By default, both sets of names are registered. To disable legacy names (for new applications only), launch with:

ros2 launch examples/manipulation-demo.launch.py game_launcher:=... enable_legacy_service_names:=false

Existing applications using legacy service names will continue to work with the default configuration.
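The dual-name registration described above can be thought of as a thin aliasing layer: both names resolve to the same underlying handler, and the legacy names are added only when the flag is enabled. The following is a minimal, framework-free Python sketch of that pattern; the function and table names are illustrative, not RAI's actual implementation:

```python
from typing import Callable, Dict

# Mapping from new model-agnostic names to their legacy equivalents,
# mirroring the service names listed above.
LEGACY_ALIASES = {
    "/detection": "/grounding_dino_classify",
    "/segmentation": "/grounded_sam_segment",
}


def register_services(
    handlers: Dict[str, Callable],
    enable_legacy_service_names: bool = True,
) -> Dict[str, Callable]:
    """Return a name->handler table, optionally including legacy aliases.

    Both the new name and its legacy alias point at the same handler, so
    existing callers of the legacy names keep working unchanged.
    """
    registry = dict(handlers)
    if enable_legacy_service_names:
        for new_name, legacy_name in LEGACY_ALIASES.items():
            if new_name in registry:
                registry[legacy_name] = registry[new_name]
    return registry
```

The design choice this sketches is that the alias shares the handler rather than duplicating it, so behavior cannot drift between the two names; disabling the flag simply stops registering the second set.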

Known Limitations

  • Grounding DINO can't distinguish colors.
  • VLMs tend to struggle with spatial understanding (for example left/right concepts).

Building from source

If you are having trouble running the binary, you can build it from source here.