exUMI: Extensible Robot Teaching System with Action-aware Task-agnostic Tactile Representation

CoRL 2025

Yue Xu1, Litao Wei1, Pengyu An1, Qingyu Zhang1, Yong-Lu Li1,2*
1Shanghai Jiao Tong University, 2Shanghai Innovation Institute

exUMI System Overview

We present a co-designed hardware and algorithm system for tactile-aware robot learning. The exUMI hardware enhances data collection robustness and extensibility, while the TPP algorithm captures tactile dynamics through predictive learning.

Abstract

Tactile-aware robot learning faces critical challenges in data collection and representation due to data scarcity and sparsity, and the absence of force feedback in existing systems. To address these limitations, we introduce a tactile robot learning system with both hardware and algorithm innovations. We present exUMI, an extensible data collection device that enhances the vanilla UMI with robust proprioception (via AR MoCap and rotary encoder), modular visuo-tactile sensing, and automated calibration, achieving 100% data usability. Building on an efficient collection of over 1 M tactile frames, we propose Tactile Prediction Pretraining (TPP), a representation learning framework through action-aware temporal tactile prediction, capturing contact dynamics and mitigating tactile sparsity. Real-world experiments show that TPP outperforms traditional tactile imitation learning. Our work bridges the gap between human tactile intuition and robot learning through co-designed hardware and algorithms, offering open-source resources to advance contact-rich manipulation research.

Hardware System: exUMI

exUMI Hardware System

exUMI extends the UMI framework with robust, dedicated proprioception: an AR motion-capture system (Meta Quest 3) tracks the end-effector pose, while a magnetic rotary encoder precisely measures gripper width. A central controller with automatic latency calibration enables seamless integration of additional sensors, such as visuo-tactile sensors, while maintaining maximum mobility.
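
The page only names the automatic latency calibration; as an illustration of the general idea, a fixed time offset between two sensor streams can be estimated by cross-correlating a shared motion signal (e.g., a gripper open-close sweep seen by both the encoder and another sensor). The snippet below is a minimal sketch of that generic technique, not the released exUMI firmware; the function and variable names (estimate_latency, reference, delayed) are hypothetical, and both streams are assumed to be resampled to a common rate.

import numpy as np

def estimate_latency(ref_signal, sensor_signal, rate_hz, max_lag_s=0.5):
    """Estimate the time offset of `sensor_signal` relative to `ref_signal`.

    Both signals are assumed to be uniformly sampled at `rate_hz` and to
    observe the same physical event. Returns the lag (in seconds) that
    maximizes their cross-correlation within +/- `max_lag_s`.
    """
    ref = (ref_signal - ref_signal.mean()) / (ref_signal.std() + 1e-8)
    sig = (sensor_signal - sensor_signal.mean()) / (sensor_signal.std() + 1e-8)

    max_lag = int(max_lag_s * rate_hz)
    lags = np.arange(-max_lag, max_lag + 1)
    # Cross-correlation restricted to the lag window of interest.
    corr = [np.dot(np.roll(sig, -lag), ref) for lag in lags]
    best_lag = lags[int(np.argmax(corr))]
    return best_lag / rate_hz

# Example: a sensor copy of the reference motion delayed by ~30 ms at 200 Hz.
rate = 200
t = np.arange(0, 5, 1 / rate)
reference = np.sin(2 * np.pi * 1.0 * t)          # e.g., commanded gripper motion
delayed = np.roll(reference, int(0.03 * rate))   # same motion seen with a delay
print(f"estimated latency: {estimate_latency(reference, delayed, rate):.3f} s")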

Algorithm: Tactile Prediction Pretraining (TPP)

TPP Algorithm Pipeline

Our proposed TPP framework formulates tactile representation learning as a conditional future prediction task. By learning to predict future tactile frames conditioned on the action sequence and the current camera image, the model captures rich physical contact dynamics. The pretrained tactile encoder can be seamlessly integrated into downstream imitation learning policies.
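
To make the formulation concrete, the sketch below shows one plausible shape of such an objective in PyTorch: a tactile encoder embeds the current tactile frame, and a predictor conditioned on that embedding, a camera-image feature, and the intervening action sequence regresses the future tactile frame; the encoder trained this way is what gets reused by the downstream policy. This is a hedged illustration under assumed tensor sizes and module names (TactileEncoder, FuturePredictor), not the released TPP implementation.

import torch
import torch.nn as nn

class TactileEncoder(nn.Module):
    """Embeds a visuo-tactile frame (3 x 64 x 64 here for brevity) into a latent vector."""
    def __init__(self, latent_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, latent_dim),
        )

    def forward(self, tactile):
        return self.net(tactile)

class FuturePredictor(nn.Module):
    """Regresses the future tactile frame from the current tactile latent,
    a camera-image feature, and the flattened action sequence."""
    def __init__(self, latent_dim=256, img_dim=512, action_dim=7, horizon=8):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(latent_dim + img_dim + action_dim * horizon, 1024), nn.ReLU(),
            nn.Linear(1024, 64 * 16 * 16),
        )
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),  # 16 -> 32
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1),              # 32 -> 64
        )

    def forward(self, z_t, img_feat, actions):
        cond = torch.cat([z_t, img_feat, actions.flatten(1)], dim=-1)
        h = self.mlp(cond).view(-1, 64, 16, 16)
        return self.deconv(h)

# One hypothetical pretraining step (batch of 16, action horizon of 8).
enc, pred = TactileEncoder(), FuturePredictor()
opt = torch.optim.Adam(list(enc.parameters()) + list(pred.parameters()), lr=1e-4)

tactile_t  = torch.randn(16, 3, 64, 64)   # current tactile frame
tactile_tk = torch.randn(16, 3, 64, 64)   # observed future tactile frame (t + k)
img_feat   = torch.randn(16, 512)         # feature from a (frozen) camera encoder
actions    = torch.randn(16, 8, 7)        # actions executed between t and t + k

pred_tk = pred(enc(tactile_t), img_feat, actions)
loss = nn.functional.mse_loss(pred_tk, tactile_tk)
opt.zero_grad(); loss.backward(); opt.step()
# After pretraining, `enc` supplies the tactile representation for the policy.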

BibTeX

@inproceedings{xu2025exumi,
  title     = {exUMI: Extensible Robot Teaching System with Action-aware Task-agnostic Tactile Representation},
  author    = {Xu, Yue and Wei, Litao and An, Pengyu and Zhang, Qingyu and Li, Yong-Lu},
  booktitle = {Conference on Robot Learning (CoRL)},
  year      = {2025}
}