‘UniTeam’ wins robot in international AI challenge

Author: Jörg Heeren

Computer scientist Dr. Andrew Melnik from Bielefeld University and his team have developed an AI agent with which they won an international Open Vocabulary Mobile Manipulation (OVMM) challenge. AI agents are technical systems that act autonomously on the basis of artificial intelligence (AI) in order to achieve goals. The group called ‘UniTeam’ competed against 79 other entries from around the world at the Conference on Neural Information Processing Systems (NeurIPS) in New Orleans. The prize for this success in the field of home robotics is a robot itself, which will have an impact on research and teaching in Bielefeld.

‘We are very happy to receive the robot, so we can experiment with it on home-robotics tasks’, says Melnik, who researches and teaches in the Neuroinformatics group of Professor Dr. Helge Ritter at CITEC. He is a regular at NeurIPS, which is currently the largest and most important conference on machine learning and AI, with thousands of participants. ‘NeurIPS has the largest h5-index in the field of AI, which measures the impact and significance of scientific publications in academia’, explains Melnik. In the previous year, a team led by him came second in a Minecraft challenge for autonomous AI agents.

Robot in a foyer with two people touching it and looking at it
The Hello Robot Stretch 2 has arrived in Bielefeld. Michael Büttner (left) and Lyon Brown (right) are looking forward to working with it.

This time, the focus was on home robotics. The OVMM challenge centres on activities that are easy for humans but still very difficult for robots. These include adapting to unfamiliar environments, comprehending natural-language requests about which objects to pick up, where they are and where they should go, and carrying out these tasks reliably and accurately. Such robots can be used for cleaning, cooking and laundry, and can save energy by mechanically controlling heating systems.

‘An amazing feat’

The agent had to be demonstrated not only in simulation but also on a standardized, up-to-date robot platform, in this case a model of the prize robot, the ‘Hello Robot Stretch 2’, which was provided to the competitors. Professor Helge Ritter considers this a demanding barrier to entry: ‘It sets a high threshold for participation, and it is an even more amazing feat to come out on top in an international field of competitors that can meet these high standards.’ The OVMM competition was designed to enable a home robot to pick up any object and place it in any location, in any home, using only natural language commands. Andrew Melnik says: ‘This task can be quite complicated because the request could be, for example, “find a toy car and move it”. And this is where a problem for an object detection system can arise: Is it a toy? Or a car? Or a toy car?’
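The ambiguity Melnik describes can be illustrated with a few lines of Python. This is only a minimal sketch and not UniTeam's method: the command template (‘move X from Y to Z’) and the names `PickPlaceRequest`, `parse_request` and `candidate_labels` are assumptions made for illustration. It splits a command into the object, its current location and its destination, and then lists the readings of the object name that a detector would have to choose between.

```python
import re
from typing import NamedTuple


class PickPlaceRequest(NamedTuple):
    """Hypothetical container for one pick-and-place command."""
    object_name: str
    start: str
    goal: str


# Assumed command template, for illustration only:
# "Move (the) <object> from (the) <start> to (the) <goal>."
_TEMPLATE = re.compile(
    r"^move (?:the )?(?P<obj>.+?) from (?:the )?(?P<start>.+?)"
    r" to (?:the )?(?P<goal>.+?)\.?$",
    re.IGNORECASE,
)


def parse_request(command: str) -> PickPlaceRequest:
    """Split a command into object, start location and goal location."""
    match = _TEMPLATE.match(command.strip())
    if match is None:
        raise ValueError(f"unsupported command: {command!r}")
    return PickPlaceRequest(
        match["obj"].lower(), match["start"].lower(), match["goal"].lower()
    )


def candidate_labels(object_name: str) -> list[str]:
    """Return the full phrase first, then each word on its own."""
    words = object_name.split()
    labels = [object_name]
    if len(words) > 1:
        labels.extend(words)
    return labels


request = parse_request("Move the toy car from the table to the shelf.")
labels = candidate_labels(request.object_name)
print(request)
print(labels)
```

For the object ‘toy car’, the sketch yields three candidate readings: the full phrase, ‘toy’ and ‘car’. A real system would still have to decide which of them matches what the camera sees.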

Large language models at the base of research

Because it addresses these questions, the NeurIPS challenge fits perfectly with Melnik’s work at Bielefeld University, which was funded until the end of 2023 by the KI-Starter programme of the state government of North Rhine-Westphalia. His KI-Starter project ‘Learning to plan with Deep Neural Networks’ centred on developing a goal-directed artificial intelligence agent that is capable of solving physical reasoning tasks, such as making predictions or decisions based on its understanding of how the physical world behaves.

For his research, Melnik uses large language models (LLMs) such as the well-known ChatGPT. ‘LLMs are good for planning the behaviour of intelligent agents but also for object detection’, he says. LLMs initially learned from large amounts of written data, including human text and computer code. Now, researchers are advancing these models by integrating them with non-textual inputs like sound and images, enabling multimodal communication.

Melnik’s team is taking this a step further by developing architectures that can process not only speech and image sequences but also a robot’s movements. This expands the capabilities of LLMs for robots, allowing robots to draw on the vast knowledge stored in these models. ‘This uncaps the power of LLMs for robots’, Ritter notes. ‘It is giving robots access to the huge amount of everyday knowledge that is captured in LLMs and whose lack so far made it difficult for robots to perform seemingly mundane actions in unprepared environments, such as households.’ Industrial applications and human-AI interaction systems in robotics, smart homes and security systems will benefit from this research.

New opportunities with a Hello Robot Stretch

For the NeurIPS challenge, Melnik and students from Bielefeld University, supported by the German Academic Exchange Service (DAAD), worked together with colleagues from the Indian Institute of Information Technology Allahabad (IIIT-A) as the ‘UniTeam’. Melnik: ‘The students were very agile and focused on results. The exchange helps to proceed in this field faster, with more experiments and more results.’

Group photo with three persons
For the competition, Andrew Melnik (centre) and students from Bielefeld teamed up with researchers from the Indian Institute of Information Technology, Gaurav Kumar Yadav (left) and Arjun PS (right).

The prize in the form of the home robot has now arrived in Bielefeld. The Hello Robot Stretch 2 model weighs 23 kilos, making it a lightweight home robot. Its robotic arm can manipulate objects. ‘This model is very popular in the community’, Melnik says. ‘It opens up avenues for us and our research, and also students benefit from working with a real robot.’