DexSkills: Skill Segmentation Using Haptic Data for Learning Autonomous Long-Horizon Robotic Manipulation Tasks

1 University of Edinburgh, 2 Queen Mary University of London, 3 Amazon ATS, 4 University College London
Presented at IROS 2024

*Indicates Equal Contribution

Abstract

Effective execution of long-horizon tasks with dexterous robotic hands remains a significant challenge in real-world settings. While learning from human demonstrations has shown encouraging results, it requires extensive data collection for training. Decomposing long-horizon tasks into reusable primitive skills is therefore a more efficient approach. To this end, we developed DexSkills, a novel supervised learning framework that addresses long-horizon dexterous manipulation tasks using primitive skills. DexSkills is trained to recognize and replicate a select set of skills from human demonstration data; it can then segment a demonstrated long-horizon dexterous manipulation task into a sequence of primitive skills, enabling one-shot execution by the robot. Significantly, DexSkills operates solely on proprioceptive and tactile data, i.e., haptic data. Our real-world robotic experiments show that DexSkills accurately segments skills, thereby enabling autonomous robot execution of a diverse range of tasks.

Video Presentation

DexSkills: Framework Overview


Overview of the proposed long-horizon task segmentation approach. Individual skills are segmented and classified within each temporal window of the demonstration. Demonstrations are collected via the teleoperation system presented in Related Works: "Feeling Good: Validation of Bilateral Tactile Telemanipulation for a Dexterous Robot".
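As a minimal sketch of this per-window pipeline, segmentation could be organized as below. The window length, stride, and the trained classifier `classify_window` are hypothetical placeholders, not the released code.

def segment_demonstration(demo, classify_window, window=50, stride=10):
    """Slide a fixed-length window over the haptic time series and label each window.

    `demo` is a (T, D) sequence of proprioceptive + tactile features.
    """
    labels = []
    for start in range(0, len(demo) - window + 1, stride):
        feats = demo[start:start + window]        # (window, D) slice of haptic features
        labels.append(classify_window(feats))     # predicted primitive-skill id for this window
    return labels                                 # one skill label per temporal window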

Examples of primitive skills, a long-horizon demonstration, and autonomous execution.

Demonstration and Autonomous Control Architecture

The leader agent generates motor control commands for the end-effector pose and the finger joints of the hand. The follower robot executes the corresponding actions based on these commands and, during teleoperation, provides haptic feedback to the leader. When operating autonomously, the robot is controlled by a distinct MLP trained on the proprioceptive and tactile data (i.e., haptic data) of each separate skill.
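A minimal sketch of one such per-skill policy is given below, written in PyTorch. The input and output dimensions and the layer sizes are illustrative assumptions, not the exact architecture used in the paper.

import torch
import torch.nn as nn

class SkillPolicy(nn.Module):
    """Small MLP mapping the current haptic state to motor commands."""
    def __init__(self, haptic_dim=128, action_dim=23, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(haptic_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),   # end-effector pose + finger joint commands
        )

    def forward(self, haptic_state):
        return self.net(haptic_state)

# One policy is trained per primitive skill on that skill's demonstrations;
# at execution time, the segmented skill sequence selects which policy to run.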


Video of each primitive skill


List of primitive skills


Long-horizon tasks as recombinations of primitive skills. Tasks are denoted by letters A to T. Objects used in the demonstrations are indicated in parentheses: (s) sponge, (t) tomato passata package, and (b) bottle containing liquid.

DexSkills: Learning Framework


The architecture of our neural network for supervised representation learning incorporates an auto-regressive autoencoder and a label decoder. The network takes time-series feature data as input, with the encoder mapping these features into a latent space. The temporal decoder reconstructs the features and predicts their continuation, while the label decoder extracts skill labels from the latent vectors. The label decoder is trained jointly with the autoencoder, producing latent features that improve segmentation performance.
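The sketch below illustrates this structure in PyTorch under stated assumptions: GRU encoder/decoder, illustrative layer widths, and a simple weighted sum of reconstruction and classification losses. The paper's exact architecture and loss weighting may differ.

import torch
import torch.nn as nn

class DexSkillsNet(nn.Module):
    """Encoder + temporal decoder + label decoder over a window of haptic features."""
    def __init__(self, feat_dim=128, latent_dim=64, n_skills=10, window=50):
        super().__init__()
        self.window = window
        self.encoder = nn.GRU(feat_dim, latent_dim, batch_first=True)
        self.temporal_decoder = nn.GRU(latent_dim, feat_dim, batch_first=True)
        self.label_decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, n_skills))

    def forward(self, x):                        # x: (B, window, feat_dim)
        _, h = self.encoder(x)                   # h: (1, B, latent_dim)
        z = h[-1]                                # one latent vector per window
        z_seq = z.unsqueeze(1).repeat(1, self.window, 1)
        recon, _ = self.temporal_decoder(z_seq)  # reconstructed feature window
        logits = self.label_decoder(z)           # skill classification from the latent
        return recon, logits

def loss_fn(recon, logits, x, y, alpha=1.0):
    # Joint objective: reconstruction + classification, so the latent space is
    # shaped by both tasks (alpha is an assumed weighting).
    return nn.functional.mse_loss(recon, x) + alpha * nn.functional.cross_entropy(logits, y)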

Classification Results


Confusion matrix (%) of the segmentation system on the long-horizon demonstrations.
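For reference, a row-normalized confusion matrix in percent could be computed from ground-truth and predicted window labels as in this sketch using scikit-learn; this is not the authors' evaluation code.

import numpy as np
from sklearn.metrics import confusion_matrix

def confusion_percent(y_true, y_pred):
    """Return a confusion matrix where each true-label row sums to 100%."""
    cm = confusion_matrix(y_true, y_pred).astype(float)
    return 100.0 * cm / cm.sum(axis=1, keepdims=True)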


t-SNE visualization of the classifier latent features. Each point corresponds to a primitive skill instance, colored by primitive skill.
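A plot of this kind could be produced as in the sketch below, assuming arrays of per-window latent vectors and skill labels (hypothetical variable names).

import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_latents(latents, skill_ids):
    """Project latent vectors to 2D with t-SNE and color each point by skill."""
    xy = TSNE(n_components=2, perplexity=30).fit_transform(latents)
    plt.scatter(xy[:, 0], xy[:, 1], c=skill_ids, cmap="tab20", s=5)
    plt.title("t-SNE of classifier latent features")
    plt.show()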

Autonomous Robot Execution


Long-horizon Task A using a soft sponge


Long-horizon Task B using a cardboard package

Robot Autonomous Execution: Long-Horizon Task A

Robot Autonomous Execution: Long-Horizon Task B


BibTeX

@article{mao2024dexskills,
  title={DexSkills: Skill Segmentation Using Haptic Data for Learning Autonomous Long-Horizon Robotic Manipulation Tasks},
  author={Mao, Xiaofeng and Giudici, Gabriele and Coppola, Claudio and Althoefer, Kaspar and Farkhatdinov, Ildar and Li, Zhibin and Jamone, Lorenzo},
  journal={arXiv preprint arXiv:2405.03476},
  year={2024}
}