Robot Instruction by Human Demonstration

Sing Bing Kang
doctoral dissertation, tech. report CMU-RI-TR-94-44, Robotics Institute, Carnegie Mellon University, December, 1994

  • Adobe portable document format (pdf) (2MB)
Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.

Conventional methods for programming a robot are either inflexible or demand significant expertise. While the notion of automatic programming by high-level goal specification addresses these issues, the overwhelming complexity of planning manipulator grasps and paths remains a formidable obstacle to practical implementation. This thesis describes the approach of programming a robot by human demonstration. Our system observes a human performing the task, recognizes the human grasp, and maps it onto the manipulator. Using human actions to guide robot execution greatly reduces the planning complexity.

In analyzing the task sequence, the system first divides the observed sensory data into meaningful temporal segments, namely the pregrasp, grasp, and manipulation phases. This is achieved by analyzing the human hand motion profiles. The features used are the fingertip polygon area (the fingertip polygon being the polygon whose vertices are the fingertips), the hand speed, and the volume sweep rate, which is the product of the first two. Segmentation exploits the fact that the hand motion profile during the pregrasp (or reaching) phase has a characteristic bell shape.
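As a rough illustration of the three features described above, the sketch below computes them from tracked fingertip and wrist positions. The function names, the planar (shoelace) area approximation, and the finite-difference speed estimate are illustrative assumptions, not the thesis's actual implementation.

```python
import numpy as np

def fingertip_polygon_area(tips):
    """Area of the polygon whose vertices are the fingertip positions,
    projected onto the x-y plane (shoelace formula). `tips` is an
    (n_fingers, 3) array of fingertip coordinates."""
    x, y = tips[:, 0], tips[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

def motion_features(tip_frames, wrist_frames, dt):
    """Per-frame features: fingertip polygon area, hand speed (from
    wrist displacement), and their product, the volume sweep rate."""
    areas = np.array([fingertip_polygon_area(t) for t in tip_frames])
    disp = np.diff(np.asarray(wrist_frames, float), axis=0)
    speeds = np.linalg.norm(disp, axis=1) / dt
    speeds = np.append(speeds, speeds[-1])   # pad to match frame count
    return areas, speeds, areas * speeds     # third value: volume sweep rate
```

A segmenter would then look for the bell-shaped rise and fall of these profiles to delimit the pregrasp phase.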

Subsequent to task segmentation, a grasp taxonomy is used to recognize the human hand grasp. The grasp taxonomy is based on the contact web, which is a 3-D graphical structure of contact points between the hand and the grasped object. By considering the higher level concept of the virtual finger, which is the collection of physical fingers acting in a similar manner, we can recognize the type of human grasp used in the task.
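To make the virtual-finger idea concrete, here is a toy sketch that groups physical fingers whose contact normals point in similar directions, on the assumption that such fingers act in the same manner. The greedy grouping rule and the angular threshold are illustrative simplifications; the thesis's contact-web analysis is more involved.

```python
import numpy as np

def virtual_fingers(contact_normals, angle_thresh_deg=45.0):
    """Greedily group physical fingers into virtual fingers: fingers
    whose contact normals lie within the angular threshold of a group's
    representative normal are placed in the same group. Returns a list
    of groups, each a list of finger indices."""
    groups = []  # each entry: (representative unit normal, member indices)
    cos_thresh = np.cos(np.radians(angle_thresh_deg))
    for i, n in enumerate(contact_normals):
        n = np.asarray(n, float)
        n = n / np.linalg.norm(n)
        for rep, members in groups:
            if np.dot(rep, n) >= cos_thresh:
                members.append(i)
                break
        else:
            groups.append((n, [i]))
    return [members for _, members in groups]
```

For a typical opposition grasp (thumb on one side, four fingers on the other), this yields two virtual fingers.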

The recognized grasp is used to guide the grasp planning of the manipulator. Grasp planning is done at two levels: the functional and physical levels. Initially, at the functional level, grasp mapping is achieved at the virtual finger level. Subsequently, at the physical level, the geometric properties of the object and manipulator are considered in fine-tuning the manipulator grasp. The trajectory of the manipulator approximately follows that of the human hand during the execution of the task. Once these steps are accomplished, control signals are produced for the robot system to replicate the task.
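Since the manipulator only approximately follows the human hand trajectory, one simple way to derive robot waypoints from the recorded hand path is arc-length resampling, sketched below. The function name and the linear-interpolation scheme are illustrative assumptions, not the thesis's method.

```python
import numpy as np

def robot_waypoints(hand_path, n_points=20):
    """Resample a recorded human hand path (an (N, 3) array of
    positions) into `n_points` waypoints evenly spaced along the
    path's arc length, for the manipulator to follow approximately."""
    hand_path = np.asarray(hand_path, float)
    seg = np.linalg.norm(np.diff(hand_path, axis=0), axis=1)
    s = np.concatenate([[0.0], np.cumsum(seg)])       # cumulative arc length
    s_new = np.linspace(0.0, s[-1], n_points)
    return np.column_stack([np.interp(s_new, s, hand_path[:, k])
                            for k in range(3)])
```

The resulting waypoints could then be fed to the manipulator's trajectory controller, with the fine-tuned grasp applied at the final waypoint.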

In summary, this thesis describes a novel way of programming robots: by direct human demonstration of the task. This thesis shows that by segmenting the stream of observed data and producing abstract representations of the task, we can enable the robot to replicate human grasping tasks.

Sponsor: ARPA
Grant ID: DAAH04-94-G-0006, F33615-90-C-1465, CDA-9121797
Associated Center(s) / Consortia: Vision and Autonomous Systems Center
Number of pages: 151

Text Reference
Sing Bing Kang, "Robot Instruction by Human Demonstration," doctoral dissertation, tech. report CMU-RI-TR-94-44, Robotics Institute, Carnegie Mellon University, December, 1994

BibTeX Reference
@phdthesis{Kang1994,
   author = "Sing Bing Kang",
   title = "Robot Instruction by Human Demonstration",
   school = "Robotics Institute, Carnegie Mellon University",
   month = "December",
   year = "1994",
   number = "CMU-RI-TR-94-44",
   address = "Pittsburgh, PA",
}