Intent-Handover: Grounding Language in Human-Usage Regions
for Trustworthy Robot-to-Human Handovers

¹DANiLab, University of Leicester
²School of Computing and Mathematical Sciences, University of Leicester
³School of Metallurgy and Materials, University of Birmingham
Accepted by IEEE/RSJ International Conference on
Intelligent Robots and Systems (IROS 2026)

Introduction

What is Intent-Handover?

Fig. 1: Intent-Guided Robot-to-Human Object Handover. (a) Intent-Guided Grasping: A vision-language model and a diffusion model infer the user's intent and the human grasp pose from speech and RGB inputs. The system then optimizes the robot grasp configuration. (b) Execution: The robot aligns the object with the receiving hand for ergonomic delivery.

Key Highlights

Intent Grounding

Inferring human handover intent from visual and verbal cues.

Grasp Optimization

Selecting robotic grasps based on affordance and safety constraints.

Design Validation

Validating the system design principles through user studies.

Methods

Intent-Handover is organized into two phases: intent-guided grasping and ergonomic execution.

Phase 1

Intent-Guided Grasping

Ground spoken instructions and visual cues into explicit grasp constraints for safe handover planning.
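As a rough illustration of how grounded intent could become explicit grasp constraints, the sketch below filters candidate robot grasps and keeps the best remaining one. It is a simplification under stated assumptions, not the system's actual optimizer: the `GraspCandidate` fields, the region labels, the `select_grasp` helper, and the 5 cm clearance threshold are all hypothetical names and values chosen for this example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class GraspCandidate:
    """Hypothetical candidate robot grasp (all fields are illustrative)."""
    name: str
    region: str              # object part the gripper would hold
    hand_clearance_m: float  # distance from gripper to the predicted human hand
    quality: float           # generic stability score, higher is better


def select_grasp(
    candidates: Sequence[GraspCandidate],
    usage_region: str,
    min_clearance_m: float = 0.05,  # assumed threshold, not from the paper
) -> Optional[GraspCandidate]:
    """Pick the highest-quality grasp that satisfies both constraints.

    Constraint 1: the gripper must not occupy the region the human will use.
    Constraint 2: the gripper must keep a minimum distance from the human hand.
    Returns None when no candidate satisfies both.
    """
    feasible = [
        g for g in candidates
        if g.region != usage_region and g.hand_clearance_m >= min_clearance_m
    ]
    return max(feasible, key=lambda g: g.quality, default=None)
```

For a mug whose handle is the human-usage region, a high-quality grasp on the handle would be rejected in favor of a lower-quality grasp on the body that leaves the handle free and keeps the gripper clear of the receiving hand.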

Intent-guided grasping phase

Phase 2

Execution

Execute the selected grasp while aligning the object with the receiving hand for ergonomic delivery.
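The alignment step can be pictured with a planar simplification: rotate the object about the vertical axis so that its human-usage region (for example, a handle) points toward the receiving hand. The sketch below is only an illustration of that geometric idea, assuming 2D positions in a shared horizontal frame; the function names are hypothetical and the real system may align the full 6-DoF pose.

```python
import math
from typing import Tuple

Point2D = Tuple[float, float]


def yaw_toward_hand(object_xy: Point2D, hand_xy: Point2D) -> float:
    """Yaw (radians) of the direction from the object to the receiving hand."""
    dx = hand_xy[0] - object_xy[0]
    dy = hand_xy[1] - object_xy[1]
    return math.atan2(dy, dx)


def shortest_rotation(current_yaw: float, target_yaw: float) -> float:
    """Signed smallest rotation (radians, in [-pi, pi)) from current to target."""
    return (target_yaw - current_yaw + math.pi) % (2.0 * math.pi) - math.pi
```

Wrapping the difference into [-pi, pi) keeps the commanded rotation small: moving a handle from 170 degrees to -170 degrees takes a 20 degree turn rather than a 340 degree one.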

Execution phase for robot-to-human handover

Experiments

Procedure / User Cases

Real-world Experiment

The front shows the staged handover procedure; the back shows representative user-study samples.

Handover experiment procedure stages

Objects

16 Object Handover Cases

Objects cover diverse shapes, affordances, and human-usage regions for evaluating grasp selection.

Sixteen object handover cases

Results

We studied how Intent-Handover affects trust, safety, and comfort during real handovers.

RQ1

To what extent does the robot's grasp region affect user trust during handover?

Correctly inferring the intended object is not sufficient on its own; grasping an inappropriate region may still undermine trust.

H1

Users trust the robot more when it delivers the object in a grasp that is easy for them to use.

Finding

The user-study results support H1. Overall, competence is the most sensitive trust dimension, whereas benevolence did not reach significance in either pairwise contrast after Bonferroni correction.
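For readers unfamiliar with the Bonferroni correction mentioned above, the sketch below shows the standard adjustment: each raw p-value is multiplied by the number of comparisons (capped at 1) before being compared with the significance level. The function name and the default `alpha` of 0.05 are assumptions for illustration; the study's actual p-values and analysis code are not reproduced here.

```python
from typing import List, Sequence, Tuple


def bonferroni(
    p_values: Sequence[float], alpha: float = 0.05
) -> List[Tuple[float, bool]]:
    """Return (adjusted p-value, is_significant) for each raw p-value.

    The adjusted value is min(1, p * m), where m is the number of comparisons.
    """
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]
    return [(p_adj, p_adj < alpha) for p_adj in adjusted]
```

With two contrasts, a raw p-value must fall below 0.025 to remain significant at the 0.05 level, which is why a dimension can look significant before correction and fail to reach significance afterwards.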

RQ2

To what extent does failing to consider the user's hand affect perceived comfort and safety?

Even if users can adjust their receiving pose, neglecting hand-gripper collision avoidance may reduce psychological safety.

H2

Users feel less safe and comfortable when the robot does not optimize for hand-gripper collision avoidance.

Finding

These findings partially support H2: disabling hand-gripper collision avoidance alone significantly reduces perceived safety, but comfort does not decrease until human-usage region awareness is also removed.

User Study

Trust, Safety, and Comfort Ratings

Human-usage region awareness improves perceived competence and reliability, while hand-gripper collision avoidance mainly improves perceived safety.

User-study results for trust, perceived safety, and interaction comfort

Statistics

Pairwise Comparisons

Full Strategy outperforms the ablations on competence, reliability, perceived safety, and combined comfort.

Pairwise comparison results for user-study conditions

Objective Logs

Pipeline Performance

Intent identification succeeds in 88.00% of trials, and the mean total pipeline time is 11.30 s.

Objective performance of the full-strategy Intent-Handover pipeline

BibTeX

@inproceedings{zhang2026intenthandover,
  title={Intent-Handover: Grounding Language in Human-Usage Regions for Trustworthy Robot-to-Human Handovers},
  author={Zhang, Hanxin and Dhafer, Abdulqader and Dong, Hongbiao and Hao, Zhou Daniel},
  booktitle={IEEE/RSJ International Conference on Intelligent Robots and Systems},
  year={2026}
}