2. Dynamic Plate Detection and Localisation
Overview
This example demonstrates how to dynamically detect, localise, and retrieve an SBS plate in a changing laboratory environment using ArUco markers and a calibrated eye-in-hand camera setup.
The key idea is to:
Perform an initial scan of the workspace from multiple robot poses.
Detect ArUco markers attached to lab objects (e.g. a Mantis device and an SBS plate).
Estimate each marker pose in the robot base frame.
Refine these poses through active visual alignment.
Compute spatial relationships between objects (e.g. plate relative to Mantis).
Re-scan after the environment changes.
Reconstruct the missing plate pose from known spatial relationships.
Execute a pick operation using the reconstructed pose.
This workflow enables robust re-localisation of objects even when only a subset of markers is visible during a later scan.
The example uses:
A calibrated RGB camera (Intel RealSense).
A UR robot controlled via ur_commander.
The mapit package for camera calibration, ArUco detection, pose estimation, and frame transformations.
A video demonstration is available on YouTube.
Physical Setup
Before running this example, ensure that the following hardware and configuration steps are completed.
1. Robot and Camera
The system uses a UR robot equipped with a TCP-mounted RGB camera in an eye-in-hand configuration.
The following prerequisites must be satisfied:
A valid hand–eye calibration stored in calibrated_camera.yml.
A known rigid transformation between the camera and the TCP, defined by R_cam2tcp and T_cam2tcp.
These parameters allow detected marker poses to be expressed consistently in the robot base frame.
2. Marked Objects
All relevant objects in the workspace must carry an ArUco marker.
In this example, the following marker IDs are used:
mantis_id = 5
plate_id = 8
Markers must be generated from the same dictionary used by the detector.
Here, the cv.aruco.DICT_4X4_50 dictionary is used.
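For reference, the marker images can be generated directly with OpenCV. A minimal sketch, assuming OpenCV 4.7 or newer (older versions provide cv.aruco.drawMarker instead of generateImageMarker); the filenames and the 400 px side length are arbitrary choices:
# generate marker images from the same dictionary the detector uses
import cv2 as cv
dictionary = cv.aruco.getPredefinedDictionary(cv.aruco.DICT_4X4_50)
for marker_id in (5, 8):  # mantis_id and plate_id
    img = cv.aruco.generateImageMarker(dictionary, marker_id, 400)
    cv.imwrite(f"marker_{marker_id}.png", img)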
3. Scanning Poses
A set of predefined joint configurations (scanning_poses) must be provided.
These poses should:
Cover the relevant workspace area.
Ensure marker visibility from multiple viewpoints.
Minimise occlusions.
Maintain safe robot motion.
Careful selection of scanning poses improves robustness and pose estimation accuracy.
Initialisation
The system initialises the perception and robot-control pipeline in several steps.
# imports
import logging
import cv2 as cv
import numpy as np
import pyrealsense2 as rs
from ur_commander import CustomURRobot, JointPositions, TaskPose
from mapit import CalibratedCamera, Detector, Marker
from mapit.utils import (
    apply_transformation,
    build_pose,
    extract_pose_and_orientation,
    make_homogeneous,
    rot_tran_from_tool_pose,
)
1. RealSense Camera
A color stream from the RealSense camera is started and warmed up with the following configuration:
Resolution: 1920 × 1080
Format: BGR8
Frame rate: 30 FPS
A frame grabber function is then registered with CalibratedCamera, enabling the detector to retrieve live images directly from the camera stream.
# init realsense camera
pipeline = rs.pipeline()
config = rs.config()
config.enable_stream(rs.stream.color, 1920, 1080, rs.format.bgr8, 30)
profile = pipeline.start(config)
# Warm up
for _ in range(30):
    pipeline.wait_for_frames()
# define function for grabbing a frame.
# if window is specified, show the grabbed frame
def get_realsense_frame(pipeline, window=None):
    rawframes = pipeline.wait_for_frames()
    raw_color_frame = rawframes.get_color_frame()
    img = np.asanyarray(raw_color_frame.get_data())
    if window:
        cv.imshow(window, img)
        cv.waitKey(1)  # non-zero wait to allow image render
    return img
CV_NAMED_WINDOW = "feed"
cv.namedWindow(CV_NAMED_WINDOW)
A named OpenCV window is created so that the live camera feed can be followed throughout the example.
2. Loading a CalibratedCamera config
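The camera intrinsics and hand–eye calibration are loaded from calibrated_camera.yml into a CalibratedCamera instance, and the frame grabber defined above is registered with it so the detector can pull live images. A minimal sketch; the loader and registration method names below are assumptions, not the confirmed mapit API:
# load intrinsics and hand-eye calibration (loader name is an assumption)
cam = CalibratedCamera.from_file("calibrated_camera.yml")
# register the frame grabber so the detector can fetch live frames
# (method name is an assumption)
cam.set_frame_grabber(lambda: get_realsense_frame(pipeline, window=CV_NAMED_WINDOW))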
3. Robot Interface
A CustomURRobot instance is created to communicate with the UR robot controller.
During initialisation:
URScript programs for opening and closing the gripper are loaded.
The gripper is opened to ensure a safe starting configuration before scanning begins.
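A minimal sketch of this initialisation; the controller address, script paths, and method names are assumptions rather than the confirmed ur_commander API:
# connect to the UR controller (address is an example value)
robot = CustomURRobot("192.168.1.10")
# load URScript programs for the gripper (names/paths are assumptions)
robot.load_script("open_gripper", "open_gripper.script")
robot.load_script("close_gripper", "close_gripper.script")
# start from a safe, open-gripper configuration before scanning
robot.open_gripper()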
4. ArUco Detector
A Detector instance is created using:
The DICT_4X4_50 ArUco dictionary.
The CalibratedCamera instance loaded above.
Built-in pose estimation utilities.
This detector is responsible for identifying markers in each frame and estimating their 6D pose relative to the camera.
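A one-line sketch of the construction, using the variable name d that appears in the snippets below; the keyword arguments are assumptions:
# create the detector from the marker dictionary and the calibrated camera
d = Detector(dictionary=cv.aruco.DICT_4X4_50, camera=cam)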
Performing a Scan
The scan() routine performs the initial localisation of all visible markers in the workspace.
This step combines robot motion, ArUco detection, pose estimation, and frame transformations to express each marker in the robot base frame.
Rather than showing the entire implementation, the key mapit interactions are highlighted below.
1. Moving to Predefined Scanning Poses
The robot iterates through a set of predefined joint configurations to observe the workspace from multiple viewpoints:
robot.movej(JointPositions(joint_pose))
Well-chosen scanning poses improve visibility, reduce occlusions, and increase pose accuracy.
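For context, the call above runs once per pose inside a loop over the predefined configurations (a sketch):
# visit each scanning pose in turn
for joint_pose in scanning_poses:
    robot.movej(JointPositions(joint_pose))
    # steps 2-4 below run once the robot has settled at this pose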
2. Detecting ArUco Markers
At each pose, the detector processes the current camera frame:
detected_markers = d.detect_markers()
The Detector instance internally:
Retrieves the latest frame from CalibratedCamera
Detects ArUco markers
Returns image-space corner observations grouped by marker ID
If no markers are detected, the scan continues to the next robot pose.
3. Estimating Marker Pose in the Camera Frame
For each detected marker, a 6D pose is estimated relative to the camera:
H_marker_in_cam, ok = d.estimate_marker_pose(img_points)
If successful, this returns a homogeneous transformation matrix. This is one of the core capabilities provided by mapit: converting raw image detections into metric 3D poses using camera calibration.
4. Transforming to the Robot Base Frame
The marker pose must be expressed in the robot base coordinate system.
This is done in two steps using mapit.utils:
marker_in_tcp = apply_transformation(H_marker_in_cam, H_cam2tcp)
marker_in_base = apply_transformation(marker_in_tcp, H_tcp2base)
Where:
H_cam2tcp — fixed hand–eye calibration
H_tcp2base — current robot forward kinematics
The helper function apply_transformation ensures consistent and readable frame transformations.
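Assuming apply_transformation(H, H_X2Y) left-multiplies, i.e. returns H_X2Y @ H, the two calls collapse into a single chained matrix product:
# equivalent single-expression form of the two-step chain above
marker_in_base = H_tcp2base @ H_cam2tcp @ H_marker_in_cam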
5. Storing the Detected Objects
Each detected marker is stored together with the robot pose at detection time:
detected_ids[id] = {
    "M": Marker(id=id, H=marker_in_base),
    "robot": tool_pose,
}
The Marker object encapsulates:
Marker ID
Full homogeneous transform in the base frame
Convenient access to position and orientation
Scan Result
The scan() function returns a dictionary:
detected_ids = scan()
Each entry contains a marker expressed in the robot base frame.
This representation provides a consistent geometric foundation for:
Pose refinement
Spatial relationship computation
Object re-localisation
Manipulation planning
At this stage, all visible objects have an initial 6D pose estimate in the shared base coordinate system.
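For example, the Mantis pose can be looked up by its marker ID (attribute access on Marker is assumed to mirror its constructor arguments):
# retrieve the Mantis pose in the base frame from the scan result
H_mantis_in_base = detected_ids[mantis_id]["M"].H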
Computing Spatial Relationships
After the initial scan, we refine each detected marker pose using the refine() routine.
Active Visual Refinement
For each marker:
Rotate the camera to align its optical axis with the marker center.
Move closer to reduce reprojection noise.
Re-estimate the marker pose multiple times (200 samples).
Compute:
Median translation (robust against outliers)
Mean rotation vector (approximation)
This produces a significantly more stable estimate of H_marker_in_base. The refined transform replaces the initial estimate in detected_ids; the aggregation step is sketched below.
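A sketch of the aggregation, assuming samples is a list of 4 × 4 H_marker_in_base estimates collected from the repeated detections:
import numpy as np
import cv2 as cv

def aggregate_poses(samples):
    # fuse repeated estimates: median translation, mean rotation vector
    translations = np.array([H[:3, 3] for H in samples])
    rvecs = np.array([cv.Rodrigues(H[:3, :3])[0].ravel() for H in samples])
    H = np.eye(4)
    H[:3, :3] = cv.Rodrigues(rvecs.mean(axis=0).reshape(3, 1))[0]  # small-angle approximation
    H[:3, 3] = np.median(translations, axis=0)  # robust against outliers
    return H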
Computing Plate–Mantis Relationship
Once both Mantis and plate are refined:
H_plate_in_mantis = np.linalg.inv(H_mantis_in_base) @ H_plate_in_base
This encodes the spatial relationship:
“Where is the plate relative to the Mantis?”
Pre-multiplying by the inverse of the Mantis pose expresses the plate pose in the Mantis frame. This relationship is assumed rigid and constant.
Re-discovery and Reconstruction of the Spatial Layout
Now we simulate a dynamic workspace:
The user moves the Mantis (and the plate moves with it).
The system no longer sees the plate during the next scan.
Step 1: Perform a New Scan
detected_ids = scan()
refine(detected_ids)
Assume only the Mantis marker is detected.
Step 2: Reconstruct Plate Pose
We compute the new plate pose using the stored relationship:
H_plate_in_base_new = H_mantis_in_base_new @ H_plate_in_mantis
Even without seeing the plate, its location is reconstructed from:
The newly detected Mantis pose
The previously learned rigid transform
This enables implicit plate localisation.
Plate Alignment and Pick
Once the reconstructed plate pose is available, the pick proceeds as follows (a motion sketch follows the list):
Move above the plate.
Align TCP orientation with the long side of the SBS plate.
Open gripper.
Descend.
Close gripper.
Lift plate.
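A heavily hedged sketch of the motion sequence; the movel method, the TaskPose/build_pose usage, the gripper calls, and the 10 cm approach offset are all assumptions for illustration:
# approach from above, descend, grasp, and lift (method names are assumptions;
# the Z-axis orientation alignment described below is applied before descending)
H_above = H_plate_in_base_new.copy()
H_above[2, 3] += 0.10  # hover 10 cm above the plate
robot.movel(TaskPose(build_pose(H_above)))
robot.open_gripper()
robot.movel(TaskPose(build_pose(H_plate_in_base_new)))
robot.close_gripper()
robot.movel(TaskPose(build_pose(H_above)))  # lift the plate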
The Z-axis alignment is computed by comparing:
TCP X-axis in base frame
Plate X-axis in base frame
A corrective rotation around Z is applied to ensure parallel alignment before grasping.
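A sketch of how the corrective angle can be computed, assuming both transforms are 4 × 4 homogeneous matrices expressed in the base frame:
import numpy as np

# X-axes of TCP and plate in the base frame (first columns of the rotation blocks)
x_tcp = H_tcp2base[:3, 0]
x_plate = H_plate_in_base_new[:3, 0]
# signed yaw of each axis in the base XY plane
yaw_tcp = np.arctan2(x_tcp[1], x_tcp[0])
yaw_plate = np.arctan2(x_plate[1], x_plate[0])
# corrective rotation about base Z, wrapped to [-pi, pi]
dyaw = (yaw_plate - yaw_tcp + np.pi) % (2 * np.pi) - np.pi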