RAI Open Set Vision
This package provides a ROS 2 node that serves as an interface to the IDEA-Research GroundingDINO model. It allows for open-set detection.
Installation
In your workspace, you need a src folder containing this package (rai_open_set_vision) and the rai_interfaces package.
Preparing the GroundingDINO
Add required ROS dependencies:
rosdep install --from-paths src --ignore-src -r
Build and run
In the base directory of the RAI package, install dependencies:
poetry install --with openset
Source the ROS installation:
source /opt/ros/${ROS_DISTRO}/setup.bash
Run the build process:
colcon build --symlink-install
Source the environment:
source setup_shell.sh
Run the GroundedSamAgent and GroundingDinoAgent agents:
python run_vision_agents.py
The agents create two ROS 2 nodes, grounding_dino and grounded_sam, using ROS2Connector.
These agents can be triggered through ROS 2 services:
grounding_dino_classify: rai_interfaces/srv/RAIGroundingDino
grounded_sam_segment: rai_interfaces/srv/RAIGroundedSam
Tip
A premade launch file, rai/src/rai_bringup/launch/openset.launch.py, is available if you wish to integrate open-set vision into your ROS 2 launch file.
Note
The weights will be downloaded to the ~/.cache/rai directory.
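To check whether the weights have already been downloaded, you can list the contents of the cache directory. The following is a minimal sketch using only the Python standard library. The helper name `list_cached_files` is hypothetical and not part of the package, and nothing is assumed about the file names or layout under ~/.cache/rai.

```python
from pathlib import Path


def list_cached_files(cache_dir: Path = Path.home() / ".cache" / "rai") -> list[Path]:
    """Return all regular files under cache_dir, sorted by path.

    Hypothetical helper for illustration; returns an empty list when the
    directory does not exist (for example, before the first run).
    """
    if not cache_dir.is_dir():
        return []
    return sorted(p for p in cache_dir.rglob("*") if p.is_file())


if __name__ == "__main__":
    for path in list_cached_files():
        size_mb = path.stat().st_size / 1e6
        print(f"{path}  ({size_mb:.1f} MB)")
```

An empty result most likely means the agents have not been run yet, so nothing has been downloaded.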
RAI Tools
The rai_open_set_vision package contains tools that RAI LLM agents can use to enhance their perception capabilities. For more information on RAI Tools, see the Tool use and development tutorial.
GetDetectionTool
This tool calls the GroundingDINO service to check whether the image message from the provided camera topic contains objects from a comma-separated prompt.
Tip
You can try the example below with the rosbotxl demo binary.
The binary exposes the /camera/camera/color/image_raw and /camera/camera/depth/image_raw topics.
Example call
from rai_open_set_vision.tools import GetDetectionTool
from rai.communication.ros2 import ROS2Connector, ROS2Context

with ROS2Context():
    connector = ROS2Connector(node_name="test_node")
    x = GetDetectionTool(connector=connector)._run(
        camera_topic="/camera/camera/color/image_raw",
        object_names=["chair", "human", "plushie", "box", "ball"],
    )
Example output
I have detected the following items in the picture - chair, human
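If you want to use the result programmatically rather than pass it to an LLM, the returned string can be split back into a list of names. The sketch below assumes the output always follows the format of the example above (a fixed prefix, a dash, then comma-separated names). The function `parse_detected_names` is a hypothetical helper, not part of rai_open_set_vision.

```python
# Prefix copied from the example output above; assumed to be stable.
PREFIX = "I have detected the following items in the picture"


def parse_detected_names(tool_output: str) -> list[str]:
    """Extract detected object names from the tool's text output.

    Hypothetical helper; returns an empty list if the text does not start
    with the expected prefix or lists no objects.
    """
    if not tool_output.startswith(PREFIX):
        return []
    # Drop the prefix and the " - " separator that follows it.
    remainder = tool_output[len(PREFIX):].lstrip(" -")
    return [name.strip() for name in remainder.split(",") if name.strip()]


if __name__ == "__main__":
    text = "I have detected the following items in the picture - chair, human"
    print(parse_detected_names(text))  # ['chair', 'human']
```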
GetDistanceToObjectsTool
This tool calls the GroundingDINO service to check whether the image message from the provided camera topic contains objects from a comma-separated prompt. It then uses messages from the depth camera to estimate the distance to each detected object.
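To illustrate the idea of combining a detection with a depth image, here is a minimal sketch of one possible estimator: take the median of the valid depth readings inside the detected bounding box and scale it by a conversion ratio. This is an assumption made for illustration, not the tool's actual implementation; the function `estimate_distance`, the bounding-box layout, and the treatment of zero as "no reading" are all hypothetical.

```python
from statistics import median


def estimate_distance(depth_image, bbox, conversion_ratio=1.0):
    """Estimate the distance to an object from a depth image.

    depth_image: list of rows, each a list of raw depth values.
    bbox: (x_min, y_min, x_max, y_max) in pixels, max values exclusive.
    conversion_ratio: factor that scales raw depth values to metres
        (for example, 0.001 if the raw values are millimetres).
    Returns None when the box contains no valid depth reading.
    """
    x_min, y_min, x_max, y_max = bbox
    samples = [
        value
        for row in depth_image[y_min:y_max]
        for value in row[x_min:x_max]
        if value > 0  # assumption: 0 marks a pixel with no valid reading
    ]
    if not samples:
        return None
    # The median is less sensitive to background pixels than the mean.
    return median(samples) * conversion_ratio


if __name__ == "__main__":
    depth = [
        [0, 0, 0, 0],
        [0, 3000, 3200, 0],
        [0, 3100, 0, 0],
    ]
    print(estimate_distance(depth, (1, 1, 3, 3), conversion_ratio=0.001))
```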
Example call
from rai_open_set_vision.tools import GetDistanceToObjectsTool
from rai.communication.ros2 import ROS2Connector, ROS2Context

with ROS2Context():
    connector = ROS2Connector(node_name="test_node")
    # Scale parameter for the depth map
    connector.node.declare_parameter("conversion_ratio", 1.0)
    x = GetDistanceToObjectsTool(connector=connector)._run(
        camera_topic="/camera/camera/color/image_raw",
        depth_topic="/camera/camera/depth/image_rect_raw",
        object_names=["chair", "human", "plushie", "box", "ball"],
    )
Example output
I have detected the following items in the picture human: 3.77m away
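The distances in this text output can be recovered with a regular expression. The sketch below assumes every detection is reported as `<name>: <number>m away`, as in the example above, and that names are single words; how the tool formats several detections at once is not shown here, so treat the pattern as an assumption. `parse_distances` is a hypothetical helper, not part of the package.

```python
import re

# Matches e.g. "human: 3.77m away"; only single-word names are supported.
DISTANCE_PATTERN = re.compile(r"(\w+):\s*(\d+(?:\.\d+)?)m away")


def parse_distances(tool_output: str) -> dict[str, float]:
    """Map each reported object name to its distance in metres.

    Hypothetical helper; if a name appears more than once, the last
    reported distance wins.
    """
    return {
        name: float(distance)
        for name, distance in DISTANCE_PATTERN.findall(tool_output)
    }


if __name__ == "__main__":
    text = "I have detected the following items in the picture human: 3.77m away"
    print(parse_distances(text))  # {'human': 3.77}
```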
Simple ROS2 Client Node Example
The package provides an example client, rai_open_set_vision/talker.py.
You can see it working by running:
python run_vision_agents.py
cd rai # rai repo BASE directory
ros2 run rai_open_set_vision talker --ros-args -p image_path:=src/rai_extensions/rai_open_set_vision/images/sample.jpg
If everything was set up properly, you should see a couple of detections with the classes dinosaur, dragon, and lizard.