A delicate touch: teaching robots to handle the unknown
William Xie, a first-year PhD student in computer science, is using large language models (LLMs) to teach a robot to reason about how gently it should grasp previously unknown objects.
DeliGrasp, Xie's project, is an intriguing step beyond the custom, piecemeal solutions currently used to avoid pinching or crushing novel objects.
In addition, DeliGrasp helps the robot translate what it can 'touch' into meaningful information for people.
"William has gotten some neat results by leveraging common sense information from large language models. For example, the robot can estimate and explain the ripeness of various fruits after touching them," said his advisor, Professor Nikolaus Correll.
Let's learn more about DeliGrasp, Xie's journey to robotics, and his plans for the conference in Japan and beyond.
[video:https://www.youtube.com/watch?v=OMzTgY1gxLw]
How would you describe this research?
As humans, we’re able to quickly intuit how exactly we need to pick up a variety of objects, including delicate produce or unwieldy, heavy objects. We’re informed by the visual appearance of an object, what prior knowledge we may have about it, and most importantly, how it feels to the touch when we initially grasp it.
Robots don’t have this all-encompassing intuition, though, and they don’t have end-effectors (grippers/hands) as effective as human hands. So solutions are piecemeal: the community has researched “hands” across the spectrum of mechanical construction, sensing capabilities (tactile, force, vibration, velocity), and material (soft, rigid, hybrid, woven, etc.). The corresponding machine learning models and/or control methods that enable “appropriately forceful” gripping are then bespoke for each of these architectures.
Embedded in LLMs, which are trained on an internet’s worth of data, is common-sense physical reasoning that crudely approximates a human’s (as the saying goes: “all models are wrong, some are useful”). We use the LLM-estimated mass and friction to simplify the grasp controller and deploy it on a two-finger gripper, a prevalent and relatively simple architecture. Key to the controller working is the force feedback sensed by the gripper as it grasps an object, and knowing at what force threshold to stop. The LLM-estimated values directly determine this threshold for any arbitrary object, and our initial results are quite promising.
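To make the idea of a force threshold concrete, here is a minimal sketch in Python. It is not the DeliGrasp controller. It assumes a textbook Coulomb friction model, in which each of the gripper's contacts supplies friction equal to the friction coefficient times the squeezing force, and it assumes the mass and friction estimates have already been obtained (for example, from an LLM). The function names, the safety factor, and the `read_force` / `step_gripper` callbacks are all hypothetical stand-ins for real hardware interfaces.

```python
G = 9.81  # gravitational acceleration in m/s^2


def min_grip_force(mass_kg, friction_coeff, n_contacts=2, safety_factor=1.5):
    """Estimate the per-finger squeezing force (N) needed to hold an object.

    Assumption: Coulomb friction, so total friction n_contacts * mu * F
    must at least balance the object's weight m * g.
    """
    if mass_kg <= 0 or friction_coeff <= 0:
        raise ValueError("mass and friction coefficient must be positive")
    return safety_factor * mass_kg * G / (n_contacts * friction_coeff)


def close_until_threshold(read_force, step_gripper, threshold_n, max_steps=200):
    """Close the gripper in small steps until the sensed force reaches the threshold.

    read_force and step_gripper are hypothetical callbacks standing in for a
    real gripper's force sensor and motor command. Returns the number of steps
    taken, or None if the threshold was never reached within max_steps.
    """
    for step in range(max_steps):
        if read_force() >= threshold_n:
            return step
        step_gripper()
    return None
```

Under these assumptions, an object estimated at 0.2 kg with a friction coefficient of 0.5 needs roughly 2 N per finger before the safety factor is applied. A lighter or grippier object yields a lower threshold, which is what keeps the grasp gentle.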
How did you get inspired to pursue this research?
I wouldn’t say that I was inspired to pursue this specific project. I think, like a lot of robotics research, I had been working away at a big problem for a while, and stumbled into a solution for a much smaller problem. My goal since I arrived here has been to research techniques for assistive robots and devices that restore agency for the elderly and/or mobility-impaired in their everyday lives. I’m particularly interested in shopping (but eventually generalist) robots. One problem we found is that it is really hard to identify, let alone pick, ripe fruits and produce with a typical robot gripper and just a camera. In early February, I took a day to try out picking up variably sized objects via hand-tuning the force sensing of our MAGPIE gripper (an affordable, open-source gripper developed by the Correll Lab). It worked well; I let ChatGPT calibrate the gripper, which worked even better, and it evolved very quickly into DeliGrasp.
What would you say is one of your most interesting findings so far?
LLMs do a reasonable job of estimating an arbitrary object’s mass (friction, not as well) from just a text description. This isn’t in the paper, but when paired with a picture, they can extend this reasoning to oddballs: gigantic paper airplanes, or miniature (plastic) fruits and vegetables.
With our grasping method, we can sense the contact forces on the gripper as it closes around an object. This turns out to be a really good measure of ripeness. We can then further employ LLMs to reason about these contact forces to pick out ripe fruit and vegetables!
What does the day-to-day of this research look like?
Leading up to submission, I was running experiments on the robot and picking up different objects with different strategies pretty much every day. A little repetitive, but also exciting. Prior to that, and now that I’m trying to improve the project for the next conference, I spend most of my time reading papers, thinking/coming up with ideas, and setting up small, one-off experiments to try out those ideas.
How did you come to study at CU Boulder?
For a few years, I’ve known that I really wanted to build robots that could directly, immediately help my loved ones and community. I had a very positive first research experience in my last year of undergrad and learned what it felt like to have true personal agency in pursuing work that I cared about. At the same time, I knew I’d be relocating to Boulder after graduation. I was very fortunate that Nikolaus accepted me and let me keep pursuing this goal of mine.
It’d be unfathomable if I could keep doing this research in academia or industry, though of course that would be ideal. But I’m biased toward academia, particularly teaching. I’ve been teaching high school robotics for 5 years now, and am now teaching/mentoring undergrads at CU. Each day is as fulfilling as the first. I have great mentors across the robotics faculty and senior PhD students. We work in ECES 111, a giant, well-equipped space that 3 robotics labs share, and it’s great for collaboration and brainstorming.
What are your hopes for this international conference (and what conference is it?)
The venue is a workshop at the 2024 International Conference on Robotics and Automation (ICRA 2024), happening in Yokohama, Japan from May 13-17. The name of the workshop is a mouthful: Vision-Language Models for Navigation and Manipulation (VLMNM).
A workshop is detached from the main conference, and is kind of its own little bubble (like a big supermarket, the conference, hosting a pop-up food tasting event, the workshop). I'm really excited to meet other researchers and pick their brains. As a first-year, I’ve spent the past year reading papers from practically everyone on the workshop panel, and from their students. I’ll probably also spend half my time exploring (eating) around the Tokyo area.