# RAI Open Set Vision
This package provides a ROS 2 node that serves as an interface to the IDEA-Research GroundingDINO model, enabling open-set detection.
## Installation
Your workspace must contain an `src` folder with this package (`rai_open_set_vision`) as well as the `rai_interfaces` package.
### Preparing the GroundingDINO
Install the required ROS dependencies:

```bash
rosdep install --from-paths src --ignore-src -r
```
### Build and run
In the base directory of the RAI repository, install the Python dependencies:

```bash
poetry install --with openset
```
Source the ROS 2 installation:

```bash
source /opt/ros/${ROS_DISTRO}/setup.bash
```
Run the build process:

```bash
colcon build --symlink-install
```
Source the environment:

```bash
source setup_shell.sh
```
Run the `GroundedSamAgent` and `GroundingDinoAgent` agents:

```bash
python run_vision_agents.py
```
The agents create two ROS 2 nodes, `grounding_dino` and `grounded_sam`, using `ROS2Connector`.
These agents can be triggered by ROS 2 services:

- `grounding_dino_classify`: `rai_interfaces/srv/RAIGroundingDino`
- `grounded_sam_segment`: `rai_interfaces/srv/RAIGroundedSam`
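These services can also be called from your own nodes. Below is a minimal rclpy client sketch for `grounding_dino_classify`; the request field names (`classes`, `box_threshold`, `text_threshold`, `source_img`) are assumptions about the service definition, so verify them against `rai_interfaces/srv/RAIGroundingDino.srv` in your workspace.

```python
# Minimal sketch of calling the grounding_dino_classify service with rclpy.
# NOTE: the request field names below are assumptions; check the actual
# definition in rai_interfaces/srv/RAIGroundingDino.srv before relying on them.
import rclpy
from rclpy.node import Node
from rai_interfaces.srv import RAIGroundingDino


def main():
    rclpy.init()
    node = Node("dino_client")
    client = node.create_client(RAIGroundingDino, "grounding_dino_classify")
    client.wait_for_service()

    request = RAIGroundingDino.Request()
    request.classes = "chair, human, plushie"  # assumed: comma-separated prompt
    request.box_threshold = 0.4                # assumed: detection confidence threshold
    request.text_threshold = 0.4
    # request.source_img should carry a sensor_msgs/Image, e.g. one received
    # from a camera topic; left at its default here for brevity.

    future = client.call_async(request)
    rclpy.spin_until_future_complete(node, future)
    print(future.result())
    rclpy.shutdown()


if __name__ == "__main__":
    main()
```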
> **Tip:** If you wish to integrate open-set vision into your ROS 2 launch file, a premade launch file can be found in `rai/src/rai_bringup/launch/openset.launch.py`.
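For reference, here is a minimal sketch of pulling that launch file into your own, assuming `openset.launch.py` is installed into the `rai_bringup` package's share directory:

```python
# your_robot.launch.py - includes the premade openset launch file.
import os

from ament_index_python.packages import get_package_share_directory
from launch import LaunchDescription
from launch.actions import IncludeLaunchDescription
from launch.launch_description_sources import PythonLaunchDescriptionSource


def generate_launch_description():
    # Assumes openset.launch.py is installed to rai_bringup's share directory.
    openset_launch = os.path.join(
        get_package_share_directory("rai_bringup"), "launch", "openset.launch.py"
    )
    return LaunchDescription(
        [
            IncludeLaunchDescription(PythonLaunchDescriptionSource(openset_launch)),
            # ... the rest of your robot's nodes go here
        ]
    )
```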
> **Note:** The weights will be downloaded to the `~/.cache/rai` directory.
## RAI Tools
The `rai_open_set_vision` package contains tools that RAI LLM agents can use to enhance their perception capabilities. For more information on RAI Tools, see the Tool use and development tutorial.
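For illustration, the tools are constructed with a connector and can then be handed to an agent like any other RAI tool. The sketch below assumes a LangChain-style chat model with tool calling; the `llm` object is hypothetical:

```python
from rai.communication.ros2 import ROS2Connector, ROS2Context
from rai_open_set_vision.tools import GetDetectionTool, GetDistanceToObjectsTool

with ROS2Context():
    connector = ROS2Connector(node_name="perception_agent")
    tools = [
        GetDetectionTool(connector=connector),
        GetDistanceToObjectsTool(connector=connector),
    ]
    # The tools follow the LangChain BaseTool interface, so they can be bound
    # to any chat model that supports tool calling (hypothetical `llm` below):
    # llm_with_tools = llm.bind_tools(tools)
```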
### GetDetectionTool
This tool calls the `grounding_dino` service, asking the model whether the image on the provided camera topic contains objects from a comma-separated prompt.
> **Tip:** You can try the example below with the `rosbotxl` demo binary. The binary exposes the `/camera/camera/color/image_raw` and `/camera/camera/depth/image_raw` topics.
**Example call:**

```python
from rai_open_set_vision.tools import GetDetectionTool
from rai.communication.ros2 import ROS2Connector, ROS2Context

with ROS2Context():
    connector = ROS2Connector(node_name="test_node")
    x = GetDetectionTool(connector=connector)._run(
        camera_topic="/camera/camera/color/image_raw",
        object_names=["chair", "human", "plushie", "box", "ball"],
    )
```
**Example output:**

```
I have detected the following items in the picture - chair, human
```
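Calling `_run` directly, as above, is just a convenient way to exercise the tool by hand; in normal operation an LLM agent invokes the tool through the tool-calling interface described in the tutorial linked above.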
### GetDistanceToObjectsTool
This tool calls the `grounding_dino` service, asking the model whether the image on the provided camera topic contains objects from a comma-separated prompt. It then uses messages from the depth camera to estimate the distance to each detected object.
**Example call:**

```python
from rai_open_set_vision.tools import GetDistanceToObjectsTool
from rai.communication.ros2 import ROS2Connector, ROS2Context

with ROS2Context():
    connector = ROS2Connector(node_name="test_node")
    connector.node.declare_parameter("conversion_ratio", 1.0)  # scale parameter for the depth map
    x = GetDistanceToObjectsTool(connector=connector)._run(
        camera_topic="/camera/camera/color/image_raw",
        depth_topic="/camera/camera/depth/image_rect_raw",
        object_names=["chair", "human", "plushie", "box", "ball"],
    )
```
**Example output:**

```
I have detected the following items in the picture - human: 3.77m away
```
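The `conversion_ratio` parameter appears to scale raw depth readings before the distance estimate is computed. As an illustration (an assumption about your camera, not a package default), a sensor publishing 16-bit depth in millimeters would need a ratio of `0.001` to yield meters:

```python
# Hypothetical: depth topic publishes 16UC1 depth in millimeters -> convert to meters
connector.node.declare_parameter("conversion_ratio", 0.001)
```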
## Simple ROS 2 Client Node Example
An example client is provided with the package as `rai_open_set_vision/talker.py`. You can see it working by running:

```bash
python run_vision_agents.py
```

Then, in a second terminal:

```bash
cd rai  # rai repo BASE directory
ros2 run rai_open_set_vision talker --ros-args -p image_path:=src/rai_extensions/rai_open_set_vision/images/sample.jpg
```
If everything was set up properly, you should see a couple of detections with the classes `dinosaur`, `dragon`, and `lizard`.