Object search is the problem of a robot finding an object of interest. To do so, the robot has to explore the environment it is placed in until the object is found. To explore an environment, current robotic methods use geometric sensing, e.g. stereo cameras, LiDAR sensors, or similar, to create a 3D reconstruction of the environment with a clear distinction between 'known & occupied', 'known & unoccupied', and 'unknown' regions of space.
The problem with the classic geometric sensing approach is that it has no knowledge of doors, drawers, or other functional and dynamic elements. These, however, are easy to detect from images. We therefore want to extend prior object search methods such as https://naoki.io/portfolio/vlfm with an algorithm that can also search through drawers and cabinets. The project will require you to train your own detector network to detect possible locations of an object, and then to implement a robot planning algorithm that explores all the detected locations.
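To illustrate the planning part, here is a minimal sketch in Python. It assumes the detector already returns 3D positions of candidate locations (e.g. drawer and cabinet fronts) and that a hypothetical inspect_fn handles navigation and opening; the planner simply visits the nearest unvisited candidate until the object is found. The actual planner in the project would additionally have to reason about traversability and opening actions.

import numpy as np

def greedy_candidate_search(robot_pose, candidates, inspect_fn):
    """Greedily visit detected candidate locations (e.g. drawers, cabinets)
    in order of Euclidean distance until the target object is found.

    robot_pose: np.ndarray of shape (3,), current robot position
    candidates: list of np.ndarray of shape (3,), detected candidate positions
    inspect_fn: callable(candidate) -> bool, drives to the candidate, opens it
                if necessary, and returns True if the object is found there
    """
    remaining = list(candidates)
    while remaining:
        # pick the closest unvisited candidate
        dists = [np.linalg.norm(c - robot_pose) for c in remaining]
        idx = int(np.argmin(dists))
        target = remaining.pop(idx)
        if inspect_fn(target):
            return target          # object found at this location
        robot_pose = target        # the robot is now at the inspected candidate
    return None                    # object not found in any candidate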
Experience with Python and PyTorch, ideally also with Open3D.
Please send your CV and transcript to blumh@uni-bonn.de
Currently there are three options to evaluate a robot for pick-and-place tasks in a home setting:
We want to use IKEA furniture, which is widely available worldwide, to create comparable, randomized setups that every researcher can easily build in their own lab. The system would, for example, generate randomized challenges for a 4x4 KALLAX where every setup has different door knobs, drawers are moved to different shelves, and for each setup a task is generated such as "move the cup from the upper right shelf into the black drawer". The randomization ensures that researchers cannot overfit to a specific setup, while they can still run the whole experiment in their own lab and quickly get an evaluation of how well their system copes with the tasks compared to existing work.
This project will use the IKEA database to create a random generator for robotic pick-and-place tasks that involve furniture interaction. If successful, and if there is enough time left, the second part of the project involves setting up a challenge website and coming up with strategies to verify uploaded videos for a given task, e.g. by measuring response time and allowing peers to verify the success or failure of the robot in the video.
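To make the idea concrete, a minimal sketch of such a generator is given below. The grid size matches a 4x4 KALLAX, but the insert, knob, and object lists are made-up placeholders that would be replaced by entries from the IKEA database. Seeding the random generator keeps each challenge reproducible, so two labs can build and evaluate exactly the same setup.

import random

CELLS = [(r, c) for r in range(4) for c in range(4)]    # 4x4 KALLAX grid
INSERTS = ["open", "drawer", "door"]                     # placeholder insert types
KNOBS = ["round wood", "black metal", "leather strap"]   # placeholder knob types
OBJECTS = ["cup", "book", "toy car"]                     # placeholder objects

def generate_setup(seed=None):
    """Randomly assign an insert (and knob, if applicable) to every shelf cell."""
    rng = random.Random(seed)
    setup = {}
    for cell in CELLS:
        insert = rng.choice(INSERTS)
        knob = rng.choice(KNOBS) if insert in ("drawer", "door") else None
        setup[cell] = {"insert": insert, "knob": knob}
    return setup

def generate_task(setup, seed=None):
    """Sample a pick-and-place task for a given setup."""
    rng = random.Random(seed)
    obj = rng.choice(OBJECTS)
    src, dst = rng.sample(CELLS, 2)
    return (f"move the {obj} from shelf {src} ({setup[src]['insert']}) "
            f"to shelf {dst} ({setup[dst]['insert']})")

if __name__ == "__main__":
    setup = generate_setup(seed=42)
    print(generate_task(setup, seed=42))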
Experience in Python and ideally in web development.
Please send your CV and transcript to blumh@uni-bonn.de
In the active perception task we investigate how a robot can improve its understanding of the environment it operates in by actively deciding where and how to look at the things it finds in its surroundings. For example, if the robot has high uncertainty (low information) about an object in the environment, we can apply active perception algorithms to guide it to gather more sensory data in this area. Representative methods using drone platforms include [1] for autonomous exploration of cluttered environments and [2] for top-down terrain monitoring using computer vision.
In this project we want to develop an active perception approach to find open-set objects in an environment. “Open-set” refers to any object that is unexpected and does not belong to the things that the robot knows, i.e. the prior data it was trained on. Examples of this are, in increasing order of difficulty, obstacles on a road (see segmentmeifyoucan.com), litter in a public park, or invasive plant species in a forest. Our goal is to develop a robot that can explore a new environment, identify and inspect any objects of interest that could be open-set instances, and report back its findings. We will validate the approach using a legged platform searching for unknown objects in a cluttered scene, e.g. collecting a database of litter.
- Investigate and refine an open-set detector based on literature research and benchmarks such as fishyscapes.com and segmentmeifyoucan.com
- Develop a baseline approach that combines exploration and open-set detection
- Refine different information gain metrics that steer the robot towards open-set detections (see the sketch further below)
- Evaluate the final system on a real-world Spot robot
[1] Receding Horizon "Next-Best-View" Planner for 3D Exploration (IEEE)
[2] Active Learning of Robot Vision Using Adaptive Path Planning (arXiv:2410.10684)
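As a starting point for the information gain metrics mentioned above, a candidate viewpoint could be scored by combining classic exploration gain (how many unknown voxels become visible) with the open-set scores of detections that would be re-observed from that viewpoint. A minimal sketch follows, where the voxel counting and detection lookup are assumed to be provided by the mapping and detection modules; the weights are arbitrary placeholders.

def viewpoint_gain(viewpoint, unknown_voxels_visible, open_set_scores,
                   w_explore=1.0, w_open_set=5.0):
    """Score a candidate viewpoint for next-best-view selection.

    unknown_voxels_visible: number of currently unknown map voxels that would
        be observed from this viewpoint (classic exploration gain)
    open_set_scores: list of anomaly/open-set scores (higher = more unusual)
        of detections that would be re-observed from this viewpoint
    """
    exploration_gain = unknown_voxels_visible
    open_set_gain = sum(open_set_scores)      # reward inspecting unusual objects
    return w_explore * exploration_gain + w_open_set * open_set_gain

def select_next_viewpoint(candidates):
    """candidates: list of (viewpoint, unknown_voxels_visible, open_set_scores)."""
    return max(candidates, key=lambda c: viewpoint_gain(*c))[0]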
Experience with PyTorch, some experience with deep learning for computer vision.
Please send your CV and transcript to blumh@uni-bonn.de and M.Popovic@tudelft.nl
Robots have access to large amounts of data that they can collect from their deployment environments. We want to tap into this resource to adapt foundation models such as DINO to work optimally in these deployment environments, and to leverage the scale of long-term deployment to improve them for downstream applications such as object identification and tracking.
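One possible starting point, sketched below under the assumption that deployment images are available as unlabeled data: fine-tune a pretrained DINOv2 backbone (loaded via torch.hub) with a simple self-supervised consistency loss between two augmentations of the same deployment image. This is deliberately simplified; in practice a DINO-style teacher/student setup with an EMA teacher would be more robust against representation collapse.

import torch
import torch.nn.functional as F

# pretrained DINOv2 backbone; adapt it to the deployment environment with a
# simple self-supervised consistency objective on unlabeled deployment images
model = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

def consistency_step(view_a, view_b):
    """view_a, view_b: two augmented crops of the same deployment image,
    shape (B, 3, H, W) with H and W multiples of the 14-pixel patch size."""
    feat_a = model(view_a)                 # (B, D) image embeddings
    with torch.no_grad():
        feat_b = model(view_b)             # stop-gradient target branch
    loss = 1 - F.cosine_similarity(feat_a, feat_b).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()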
Experience with PyTorch and Python.
Please send your CV and transcript to blumh@uni-bonn.de
Multimodal Large Language Models (MLLMs) have pushed the applicability of scene understanding in robotics to new limits. They make it possible to directly link natural language instructions to robotic scene understanding. Sometimes, however, MLLMs trained on internet data have trouble understanding more domain-specific language queries, such as "bring me the 9er wrench" or "pick all the plants that are not beta vulgaris". This project builds on prior work that developed a mechanism to adapt open-vocabulary methods to new words and visual appearances (OpenDAS). Currently, the method is impractical because it requires a lot of densely annotated images from the target domain. We want to develop mechanisms that allow such adaptation to be done in a self-supervised way, e.g. by letting the robot look at the same object from multiple viewpoints and enforcing consistency of the representation.
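One possible self-supervision signal is sketched below. It assumes we already have embeddings of the same physical object cropped from two different viewpoints (the feature extractor and the cross-view association are not shown) and pulls matching embeddings together with a standard InfoNCE-style loss while pushing apart embeddings of different objects.

import torch
import torch.nn.functional as F

def multiview_consistency_loss(feats_view1, feats_view2, temperature=0.07):
    """InfoNCE-style loss enforcing that the same object observed from two
    viewpoints maps to similar embeddings.

    feats_view1, feats_view2: (N, D) embeddings where row i in both tensors
    corresponds to the same physical object seen from different viewpoints.
    """
    z1 = F.normalize(feats_view1, dim=-1)
    z2 = F.normalize(feats_view2, dim=-1)
    logits = z1 @ z2.T / temperature           # (N, N) cross-view similarities
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, targets)    # diagonal entries are positives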
Experience with Python and PyTorch.
Please send your CV and transcript to blumh@uni-bonn.de
To mine large-scale training data, LabelMaker can be used to automatically annotate phone scans. So far, the tool can only output point clouds. Using the latest advancements in NeRFs and Gaussian Splatting, we develop an efficient pipeline to produce the most accurate possible 3D lifting of segmentation labels into a renderable scene representation.
With this, we build the largest existing indoor semantic segmentation dataset.
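A minimal sketch of the label lifting idea, under the assumption that each 3D primitive (Gaussian or point) carries a learnable vector of class logits that the renderer alpha-composites into the image, where it can be supervised with LabelMaker's automatic 2D labels; the renderer itself is only a placeholder here, and the class count and primitive count are made-up numbers.

import torch
import torch.nn.functional as F

NUM_CLASSES = 40          # number of semantic classes (depends on the chosen label space)
NUM_PRIMITIVES = 100_000  # number of Gaussians/points in the reconstructed scene

# learnable per-primitive semantic logits, optimized alongside (or after)
# the geometric Gaussian Splatting / NeRF parameters
semantic_logits = torch.zeros(NUM_PRIMITIVES, NUM_CLASSES, requires_grad=True)
optimizer = torch.optim.Adam([semantic_logits], lr=1e-2)

def semantic_loss(render_weights, labels_2d):
    """render_weights: (H*W, NUM_PRIMITIVES) alpha-compositing weights of each
    primitive per pixel, produced by the splatting/NeRF renderer (not shown;
    in practice these weights are sparse).
    labels_2d: (H*W,) automatic 2D labels from LabelMaker for that view."""
    pixel_logits = render_weights @ semantic_logits   # splat class logits into the image
    return F.cross_entropy(pixel_logits, labels_2d)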
Experience with PyTorch.
Please send your CV and transcript to blumh@uni-bonn.de