R2HandoverSim: A Simulation Framework and Benchmark
for Robot-to-Human Object Handovers

¹DANiLab, University of Leicester
²School of Computing and Mathematical Sciences, University of Leicester
³School of Metallurgy and Materials, University of Birmingham
Accepted to the IEEE/RSJ International Conference on
Intelligent Robots and Systems (IROS 2026)

Introduction

What is R2HandoverSim?

Fig. 1: Overview of the R2HandoverSim benchmark environment. A UR5e manipulator delivers an object to a static MANO hand. The green sphere denotes the valid handover space used for reachability evaluation.

Benchmark

R2HandoverSim enables standardized, reproducible evaluation of robot-to-human handover methods.

Trial Protocol

Standardized evaluation protocol

Each method outputs a grasp pose and a handover pose; R2HandoverSim then runs grasping, planning, approach motion, and evaluation.

Fig. 2: Trial protocol of R2HandoverSim.

Objects and Data

16 daily objects

The benchmark uses 16 daily objects drawn from ShapeNet, YCB, and ContactDB, ranging from compact cups to bulky dispensers and functionally constrained tools.

Fig. 3: The 16 daily benchmark objects.

Evaluation Protocol

Each trial is evaluated with five binary metrics.

Plan

A kinematically feasible, collision-free trajectory to the handover pose must exist.

Reach

The object must be delivered near the receiver's hand.

Stability

The gripper must physically close on the object without dropping it.

Affordance

The robot's grasp must not occupy the receiver's intended grasp region.

Safe

The delivery motion must complete without contacting the human hand.
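
The Reach criterion can be read as a geometric test. The following is a minimal sketch, assuming the check is a sphere-membership test on the valid handover space shown in Fig. 1. The function name, the 0.15 m radius, and all coordinates are illustrative assumptions, not values taken from the benchmark.

```python
import math


def within_handover_space(object_pos, sphere_center, radius):
    """Return True if the delivered object lies inside the spherical handover space."""
    return math.dist(object_pos, sphere_center) <= radius


# Hypothetical positions in metres (not benchmark values).
center = (0.60, 0.00, 0.40)
near = within_handover_space((0.65, 0.05, 0.42), center, radius=0.15)
far = within_handover_space((0.90, 0.30, 0.40), center, radius=0.15)
print(near, far)
```

With these made-up numbers the first position is about 0.07 m from the center and passes, while the second is about 0.42 m away and fails.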

Failure Modes

Granular metrics reveal more than success rate alone

A trial is attributed to the first evaluated criterion that fails; consequently, the reported failure rates are mutually exclusive and sum to 1 - SR within each split, where SR is the success rate.
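
The first-failure attribution rule can be sketched as follows. This is an illustration only: it assumes the criteria are evaluated in the order they are listed above (Plan, Reach, Stability, Affordance, Safe), and the trial records and function names are hypothetical.

```python
from collections import Counter

# Assumed evaluation order: the order in which the five metrics are listed.
CRITERIA = ("plan", "reach", "stability", "affordance", "safe")


def first_failure(trial):
    """Return the first failed criterion in CRITERIA order, or None on success."""
    for name in CRITERIA:
        if not trial[name]:
            return name
    return None


def summarize(trials):
    """Return (success_rate, {criterion: failure_rate}) for a list of trials."""
    counts = Counter(first_failure(t) for t in trials)
    n = len(trials)
    success_rate = counts[None] / n
    failure_rates = {c: counts[c] / n for c in CRITERIA}
    return success_rate, failure_rates


# Hypothetical trials: two successes and two failures.
ok = dict.fromkeys(CRITERIA, True)
trials = [
    ok,
    ok,
    {**ok, "plan": False, "safe": False},  # counted once, as a plan failure
    {**ok, "affordance": False},
]
sr, fr = summarize(trials)
print(sr, fr)
# Failure rates are mutually exclusive, so they sum to 1 - SR.
assert abs(sum(fr.values()) - (1 - sr)) < 1e-9
```

Because each failed trial is counted under exactly one criterion, a trial that violates several criteria contributes only to the earliest one in the evaluation order.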

Fig. 5: Representative success and failure cases in simulation.

Baseline Comparison

Four baselines are compared on predicted shared grasp poses, with performance reported for S0, S1, and the average over both.

Evaluation Splits

S0 and S1

S0

Objects with relatively unconstrained handover configurations.

S1

Objects with stronger functional constraints on grasp and handover orientation.

Simulation Results

Intent-Handover achieves the highest average success rate

Contact-Handover leads in S0 success rate; Intent-Handover leads in average success rate and S1 success rate.

Table I: Baseline comparison on R2HandoverSim under S0 and S1.

Qualitative Comparison

Predicted handover configurations across four baselines

Each row shows grasp pose, approach trajectory, and final handover pose for the same object-hand pair.

Fig. 4: Qualitative comparison of predicted handover configurations across four baselines.

Sim-to-Real Validation

Four baselines are deployed on the physical platform with 30 participants and 600 total trials.

Real Hardware

UR5e with Robotiq 2F-85 gripper

Intent-Handover achieves the highest real-world success rate (73.3%), followed by FC-Handover (63.3%).

Fig. 6: Real hardware setup for the sim-to-real experiment.

Transfer Results

Real-world result minus simulation average

Contact-Handover is the only method with negative transfer.
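
The transfer measure defined above (real-world result minus simulation average) is a simple difference in percentage points. The sketch below uses placeholder method names and made-up success rates purely to show the computation; none of the numbers come from the paper.

```python
def transfer_gap(real_sr, sim_avg_sr):
    """Real-world success rate minus simulation average, in percentage points."""
    return real_sr - sim_avg_sr


# Hypothetical methods and success rates (percent), for illustration only.
results = {
    "method_a": {"real": 70.0, "sim_avg": 65.0},
    "method_b": {"real": 50.0, "sim_avg": 58.0},
}

gaps = {name: transfer_gap(r["real"], r["sim_avg"]) for name, r in results.items()}
negative = [name for name, gap in gaps.items() if gap < 0]
print(gaps)
print(negative)
```

A negative gap means the method performed worse on hardware than its simulation average predicted; in this toy input only `method_b` falls in that category.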

Table II: Sim-to-real transfer results.

User Study

Subjective ratings reveal human-centric handover quality

Intent-Handover ranks first in simulation success rate and user ratings; Contact-Handover ranks second in simulation success rate but lowest in comfort and safety.

Research Question

To what extent does simulation success rate predict perceived handover quality beyond task completion?

Hypothesis

Higher simulation success rate should correspond to higher comfort, perceived safety, and naturalness.

Finding

Partially refuted: a higher success rate does not necessarily mean better user experience; functional-region affordance is critical to perceived handover quality.

Fig. 7: Subjective ratings for handover comfort, safety, and naturalness.
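
One way to quantify how well simulation success rate predicts a subjective rating is a rank correlation across methods. The sketch below computes Spearman's rho for rankings without ties. It is not necessarily the analysis used in the study. In the example rankings, only the first two positions follow the text above (Intent-Handover first in both, Contact-Handover second in simulation but last in comfort); the other two positions are hypothetical placeholders.

```python
def spearman_rho(rank_x, rank_y):
    """Spearman rank correlation for two rankings of equal length without ties."""
    n = len(rank_x)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


# Rank of each of four methods (1 = best). Positions 3 and 4 are placeholders.
sim_sr_rank = [1, 2, 3, 4]
comfort_rank = [1, 4, 2, 3]

rho = spearman_rho(sim_sr_rank, comfort_rank)
print(round(rho, 2))
```

With only four methods, a single rank reversal such as this one moves rho far from 1, which is consistent with the finding that success rate alone is a weak proxy for perceived quality.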

BibTeX

@inproceedings{zhang2026r2handoversim,
  title={R2HandoverSim: A Simulation Framework and Benchmark for Robot-to-Human Object Handovers},
  author={Zhang, Hanxin and Dhafer, Abdulqader and Dong, Hongbiao and Hao, Zhou Daniel},
  booktitle={IEEE/RSJ International Conference on Intelligent Robots and Systems},
  year={2026}
}