Spatially anchored Tactile Awareness for Robust Dexterous Manipulation

Giving Robots "Spatial Awareness" for Sub-Millimeter Dexterous Manipulation

The Problem: Rich Data, Poor Context

Modern robots use vision-based tactile sensors that provide high-resolution images of contact. However, current end-to-end learning frameworks treat these images as abstract features, without tying them to where the contact occurs in space.

The "Isolated Measurement" Issue: Without spatial grounding, a robot might feel a "corner," but it doesn't know exactly where that corner is in 3D space relative to its wrist.
Visual Occlusion: During precise tasks like USB insertion, the robot's own fingers often block the cameras, leaving it "blind" at the most critical moment.

The Solution: Spatially-Anchored Tactile Awareness (SaTA)

The core insight of SaTA is that tactile measurements should be grounded in a stable reference frame: the hand’s kinematic frame, anchored at the wrist. This allows the robot to infer object geometry precisely, even without a pre-existing 3D model of the object.

How it Works:

Forward Kinematics: The system computes the 6D pose (3D position and 3D orientation) of each fingertip sensor in real time from the hand's joint angles.
Fourier Encoding: The 6D pose is expanded into Fourier features at multiple frequencies. Low frequencies capture coarse alignment, while high frequencies resolve fine, sub-millimeter adjustments.
FiLM Conditioning: Instead of simply "adding" the spatial data to the tactile image features, SaTA uses Feature-wise Linear Modulation (FiLM): the encoded pose produces a per-channel scale and shift that is applied to the tactile features, so the spatial context guides how those features are interpreted.
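
The encoding and modulation steps can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the paper's implementation: the pose layout (position in metres plus a rotation vector in radians), the frequency schedule 2^k * pi, the number of bands, the feature sizes, and the random weights standing in for learned layers are all hypothetical, as are the function names.

```python
import math
import random


def fourier_encode(pose, num_bands=4):
    """Expand each pose component x into sin/cos pairs at frequencies 2^k * pi.

    Higher bands change faster with x, so they respond to smaller displacements.
    The frequency schedule is an assumption made for this sketch.
    """
    features = []
    for x in pose:
        for k in range(num_bands):
            freq = (2.0 ** k) * math.pi
            features.append(math.sin(freq * x))
            features.append(math.cos(freq * x))
    return features


def make_linear(in_dim, out_dim, rng):
    """Small random weights stand in for parameters that would be learned."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
               for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weights, bias


def linear(layer, x):
    """Plain matrix-vector product plus bias."""
    weights, bias = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def film(features, gamma, beta):
    """Feature-wise Linear Modulation: per-channel scale (gamma) and shift (beta)."""
    return [g * f + b for g, f, b in zip(gamma, features, beta)]


rng = random.Random(0)

# Hypothetical fingertip pose in the wrist frame: x, y, z (m), rotation vector (rad).
pose = [0.02, -0.01, 0.15, 0.0, 0.1, -0.05]
encoded = fourier_encode(pose, num_bands=4)  # 6 components * 4 bands * 2 = 48 values

# Hypothetical output of a tactile image encoder (4 channels for brevity).
tactile_features = [0.5, -1.2, 0.3, 0.8]

gamma_layer = make_linear(len(encoded), len(tactile_features), rng)
beta_layer = make_linear(len(encoded), len(tactile_features), rng)

# Predict scale and shift from the encoded pose; gamma is centred on 1 (identity).
gamma = [1.0 + g for g in linear(gamma_layer, encoded)]
beta = linear(beta_layer, encoded)

modulated = film(tactile_features, gamma, beta)
```

The point of the design is that the pose does not become just another input channel: it rescales and shifts every tactile channel, so the same tactile image can be read differently depending on where the fingertip is.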

This means the robot doesn't just know "contact occurred"; it knows "I am touching an edge 2 mm to the left of the center at a 5-degree tilt."
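
To make "where the contact is relative to the wrist" concrete, the snippet below shows the underlying geometry: a contact point observed in a fingertip sensor's own frame is mapped into the wrist frame using the sensor pose from forward kinematics. This is an illustrative sketch, not a step the source describes SaTA performing explicitly (a learned policy would exploit this relation implicitly). The rotation, translation, and contact location are made-up values, and the helper names are hypothetical.

```python
import math


def rot_z(theta):
    """3x3 rotation matrix for a rotation of theta radians about the z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0],
            [s, c, 0.0],
            [0.0, 0.0, 1.0]]


def to_wrist_frame(rotation, translation, point_in_sensor):
    """Apply p_wrist = R * p_sensor + t, with (R, t) the sensor pose in the wrist frame."""
    return [
        sum(rotation[i][j] * point_in_sensor[j] for j in range(3)) + translation[i]
        for i in range(3)
    ]


# Hypothetical sensor pose from forward kinematics: rotated 90 degrees about z,
# offset 5 cm along x and 10 cm along z from the wrist.
R = rot_z(math.radians(90))
t = [0.05, 0.0, 0.10]

# Hypothetical contact detected 2 mm along the sensor's own x axis.
contact_in_sensor = [0.002, 0.0, 0.0]

contact_in_wrist = to_wrist_frame(R, t, contact_in_sensor)
# Approximately [0.05, 0.002, 0.10]: the 2 mm offset now lies along the wrist's y axis.
```

The same tactile reading maps to a different wrist-frame location whenever the finger moves, which is why an ungrounded tactile image alone cannot say where a feature sits.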

Putting it to the Test: USBs, Light Bulbs, and Cards

The researchers validated SaTA on three "boundary-pushing" tasks that demand extreme precision:

Bimanual USB-C Mating: Plugging a cable into a phone in free space. This requires sub-millimeter alignment and coordinated "rubbing" motions between fingers to find the port, all while the port is completely occluded by the hand.
Light Bulb Installation: Screwing a bulb into a socket. A tiny angular error causes the threads to jam.
Card Sliding: Fanning out playing cards at specific angles. This requires delicate force modulation to slide the cards without bending them.

The Results:

SaTA significantly outperformed standard visuo-tactile methods:

Success Rate: Improved by up to 30%.
Efficiency: Reduced task completion time by 27-28% because the robot made fewer "trial-and-error" mistakes.
Precision: In the USB-C task, where most baselines failed completely (0% success), SaTA achieved a 35% success rate.
