JENGA: Object selection and pose estimation for robotic grasping from a stack

1 German Research Center for Artificial Intelligence (DFKI), 2 RPTU
[IROS logo]

Abstract

Vision-based robotic object grasping is typically investigated in the context of isolated objects or unstructured object sets in bin-picking scenarios. However, in several settings, such as construction or warehouse automation, a robot needs to interact with a structured object formation such as a stack. In this context, we define the problem of selecting suitable objects for grasping along with estimating an accurate 6DoF pose of these objects. To address this problem, we propose a camera-IMU-based approach that prioritizes unobstructed objects on the higher layers of stacks, and we introduce a dataset for benchmarking and evaluation, along with a suitable evaluation metric that combines object selection with pose accuracy. Experimental results show that, although our method performs well, the problem remains challenging when a completely error-free solution is required. Finally, we present results from the deployment of our method in a brick-picking application in a construction scenario.
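The layer-based prioritization described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function name `rank_grasp_candidates`, the per-object occlusion scores, and the occlusion threshold are all hypothetical; the only idea taken from the abstract is using the IMU gravity direction to prefer unobstructed objects on higher stack layers.

```python
import numpy as np

def rank_grasp_candidates(positions, gravity_dir, occlusion_scores,
                          occlusion_thresh=0.2):
    """Rank stacked objects for grasping: unoccluded, topmost objects first.

    positions        -- (N, 3) estimated object centers in the camera frame.
    gravity_dir      -- gravity direction in the camera frame (e.g. from an IMU).
    occlusion_scores -- (N,) estimated occluded fraction per object (hypothetical).
    """
    positions = np.asarray(positions, dtype=float)
    g = np.asarray(gravity_dir, dtype=float)
    g = g / np.linalg.norm(g)
    # Height of each object along the "up" direction (opposite to gravity).
    heights = positions @ (-g)
    # Keep only objects that are sufficiently unobstructed.
    candidates = [i for i in range(len(positions))
                  if occlusion_scores[i] <= occlusion_thresh]
    # Highest (topmost) candidates first.
    return sorted(candidates, key=lambda i: heights[i], reverse=True)

# Toy example: three bricks stacked vertically; gravity points along +y
# in the camera frame, so smaller y means higher up.
positions = [[0.0, 0.30, 1.0],   # bottom brick
             [0.0, 0.20, 1.0],   # middle brick
             [0.0, 0.10, 1.0]]   # top brick
order = rank_grasp_candidates(positions, gravity_dir=[0, 1, 0],
                              occlusion_scores=[0.5, 0.1, 0.0])
print(order)  # -> [2, 1]: top brick first; bottom brick excluded as occluded
```

Using the gravity direction from the IMU rather than the camera's optical axis makes the "higher layer" notion independent of how the camera is tilted toward the stack.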

Video Presentation

Code, dataset, and more details coming soon...

BibTeX

@article{jeevanandam2025jenga,
  title={JENGA: Object selection and pose estimation for robotic grasping from a stack},
  author={Jeevanandam, Sai Srinivas and Inuganti, Sandeep and Govil, Shreedhar and Stricker, Didier and Rambach, Jason},
  journal={arXiv preprint arXiv:2506.13425},
  year={2025}
}

Acknowledgements

This work has been partially funded by the EU Horizon Europe Framework Program under grant agreement 101058236 (HumanTech) and by the German Ministry for Economics and Climate Action (BMWK) under grant agreement 13IK010 (TWIN4TRUCKS). We thank our HumanTech partners for the collaboration: BAUBOT (platform), SINTEF Manufacturing (control), and ACCIONA (testing site and material).