perm filename P.DOC[1,VDS] blob sn#082277 filedate 1974-01-15 generic text, type T, neo UTF8
                              AND RELATED PROBLEMS

                  Aharon Gill, Richard Paul, Victor Scheinman


This paper describes work at the Stanford Artificial Intelligence Project on
manipulation and visual feedback. The work is described extensively in [Gill],
[Paul] and [Scheinman]; this paper is an attempt to provide an overview of the
combined work. We first describe the design of the manipulator shown in
Figure 1 and then describe in detail the trajectory generation for manipulator
motions and the software servo system. The final sections describe the
corner-finder used by the visual feedback system and the visual feedback tasks.


The  arm is  a manipulator  with  six degrees  of freedom  plus a  hand,  and is
characterized by two rotating "shoulder" joints, a linear motion  "elbow" joint,
and three  rotating "wrist"  joints.  A vise  grip hand  serves as  the terminal
device (see Figure 1).  The manipulator can handle a  4 kg. load within  its 1.0
meter diameter hemisphere working volume.  Average point to point servo  time is
less than two seconds.

The results of systems studies indicated that an all electric motor powered arm
was best suited to our tasks and goals. Because the axes of the wrist joints
intersect at one point, this arm is a "solvable configuration" [Piper].
Furthermore, to facilitate computation, all axes are made either normal or
parallel to adjacent axes.

All joints are characterized by a motor, torque increaser (speed reducer),
potentiometer position feedback element, analog tachometer velocity transducer,
and an electromagnetic brake locking device. All joints are back-drivable.
This means that loads applied to the output member reflect back to the motor
shaft. Brakes are required to hold the gravity and external force loaded joints
of an arm of this type when current to the servo motors is turned off. This
allows the motors to run at high peak torque levels because of the intermittent
duty cycle.

The authors wish to thank Professor Jerome Feldman for his invaluable help and
advice in relation to this work.

This research was supported in part by the Advanced Research Projects Agency of
the Office of Defense under Contract No. SD-183.

The views and conclusions in this document are those of the authors and should
not be interpreted as necessarily representing the official policies, either
expressed or implied, of the Advanced Research Projects Agency or the U.S.

                                  Figure    1

The two shoulder joints are powered by low inertia permanent magnet d.c. motors.
An harmonic drive component set with  a reduction ratio of 100/1 gives  a torque
multiplication of  about 90 for  both shoulder joints.   This reduction  unit is
reversible and  has a  torque limit  of over  10 kg-m.   A slip  clutch prevents
damaging overloads from  large accelerations which may  be imposed when  the arm
accidentally bumps into  a solid object.   An electromagnetic brake,  mounted on
the high speed motor shaft,  holds the joint at any given  position, eliminating
the need for continuous application of motor torque.

The  two shoulder  joints  are constructed  of large  diameter  tubular aluminum
shafts which house  the harmonic drive and  slip clutch.  Each tubular  shaft is
mounted on two large diameter,  thin section ball bearings supported in  a solid
external  housing.   At  the shoulder,  heavy  sections  are  liberally employed
because inertia effects of added mass  are relatively small in this area  of the
arm.  Each of these two joints has integral, conductive  plastic potentiometers.
The geometrical  configuration of  joint #1  is such  that the  arm base  can be
bolted to any flat surface, but the design calculations have been  made assuming
a table top mount.

To obtain maximum useful motion, the elbow joint is offset from the intersection
of the axes of joints #1 and #2. This extra link, added to the arm geometry,
allows 360 degrees of rotation of both shoulder joints.

Joint #3, the linear motion elbow  joint, has a total travel of  75 centimeters,
making the second link length variable from 15 to 90 cm.  The main boom is a 6.3
cm square aluminum tube which rolls  on a set of sixteen phenolic  surfaced ball
bearing rollers.   These rollers  resist tube twisting  and bending  moments and
support  normal loads.   They  allow only  pure  translation of  the  tube.  The
sections of the square tube boom and the supporting assembly have  been designed
to optimize performance with respect to structure deflection, natural frequency,
and load carrying ability.

The use of rollers provides a larger bearing surface area than balls or ball
slides, etc., while the square, thin-walled tube provides better roller support
near its edges than a comparable section of round tube. The drive for this
joint is provided by a gear driven rack mounted along the neutral axis of the
boom. The rack is directly driven by a 0.22 kg-m torque motor, making it a
back-drivable assembly with a 9 kg thrust capability. A strip of conductive
plastic material is cemented along the centerline of the boom. This is read by
a wiper element mounted inside the roller housing to give a positive indication
of the boom position. A tachometer and brake are mounted on the motor shaft.

Design of the wrist joints (#4 and #5) is similar to that of the shoulder joints
except that all components are smaller and lighter. Great attention has been
paid to obtaining the required performance with the least mass. Small size
harmonic drive units are used with permanent magnet torque motors. A motor
shaft brake is also employed to hold position. These joints are also
back-drivable and have a 0.95 kg-m maximum torque capability.

Joint #6, the hand rotation, employs a small permanent magnet motor with a spur
gear reduction. It too has an integral potentiometer element, tachometer, and
brake.

The terminal device is a vise grip hand with two opposed fingers. Here, two
interchangeable plate-like fingers slide together symmetrically, guided on four
rails (two for each finger). They are driven by a rack and pinion arrangement
utilizing one center gear and a rack for each finger. The maximum jaw opening
is 10 cm with a holding force of 9 kg. Thick rubber jaw pads provide a high
coefficient of friction for positive handling of most objects. Each finger pad
is provided with a switch type touch sensor mounted in the center of the grip
area. These touch sensors have a 5 gram force threshold.

The potentiometer, tachometer and touch sensor signals are fed directly into the
computer (Digital Equipment Corp. PDP-6 and PDP-10) through a 64 channel 12 bit
A/D interface. The servo loop is closed within the computer. The computer
output is a proportional drive command through an 8 bit DAC. This signal is fed
into a voltage to pulse width converter which drives the switching transistor
power amplifiers. These pulse width modulated switches are directly connected to
the joint motors.


In order to move the arm we first calculate a trajectory. This is in the form of
a sequence of  polynomials, expressing joint angles  as a function of  time, one
for  each joint.   When the  arm starts  to move,  it is  normally  working with
respect to some surface,  for instance, picking up  an object from a  table. The
initial motion of the hand should be directly away from the surface.  We specify
a position on a normal to the surface out from the initial position, and require
that the hand  pass through this position.   By specifying the time  required to
reach this position, we control the speed at which the object is lifted.

For such an initial move, the differential change of joint angles  is calculated
for a move of 7.5 cm in the direction of the outward pointing normal.  A time to
reach this position based on a low arm force is then calculated. The same set of
requirements exists in the case of  the final position. Here we wish  once again
to approach the surface in the  direction of the normal, this time  passing down
through a letdown point.

This gives us four positions: initial, liftoff, letdown, and final. If we were
to servo the arm from one position to the next we would not collide with the
support. We would, however, like the arm to start and end its motion with zero
velocity and acceleration. Further, there is no need to stop the arm at all the
intermediate positions. We require only that the joints of the arm pass through
the trajectory points corresponding to these intermediate positions at the same
time.

The time for the arm to move through each trajectory segment is calculated as
follows. For the initial and final segments the time is based on the rate of
approach of the hand to the surface and is some fixed constant. The time
necessary for each joint to move through its mid-trajectory segment is
estimated, based on a maximum joint velocity and acceleration. The maximum of
these times is then used for all the joints to move through the mid-trajectory
segment.
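
The mid-trajectory timing rule can be illustrated with a minimal sketch. The
text says only that each joint's time is estimated from a maximum velocity and
acceleration; the trapezoidal velocity profile used here (accelerate at the
limit, cruise at the limit, decelerate) is an assumption made for illustration,
and the names joint_time and segment_time are hypothetical.

```python
import math

def joint_time(distance, v_max, a_max):
    """Estimated time for one joint to travel `distance`.

    Assumes a trapezoidal velocity profile; this is an illustrative choice,
    not necessarily the estimate used by the original program.
    """
    d = abs(distance)
    if d == 0.0:
        return 0.0
    # Distance used up by accelerating to v_max and braking back to rest.
    d_ramp = v_max * v_max / a_max
    if d <= d_ramp:
        # The joint never reaches v_max: triangular profile.
        return 2.0 * math.sqrt(d / a_max)
    # Ramp up, cruise at v_max, ramp down.
    return d / v_max + v_max / a_max

def segment_time(distances, v_max, a_max):
    """The slowest joint sets the time used by every joint in the segment."""
    return max(joint_time(d, v, a) for d, v, a in zip(distances, v_max, a_max))
```

Taking the maximum over the joints keeps every joint within its own velocity
and acceleration limits while all joints arrive at the same time.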

We could  determine a  polynomial for each  joint which  passes through  all the
points and has  zero initial and final  velocity and acceleration. As  there are
four  points and  four velocity  and acceleration  constraints we  would  need a
seventh  order  polynomial.    Although  such  polynomials  would   satisfy  our
conditions, they often have extrema  between the initial and final  points which
must be evaluated to check that the joint has not exceeded its working range.

As the extrema are  difficult to evaluate for  high order polynomials, we  use a
different approach.  We specify three  polynomials for each  joint, one  for the
trajectory  from the  initial point  to  the liftoff  point, a  second  from the
liftoff to the setdown point, and  a third from the setdown to the  final point.
We specify  that velocity  and acceleration should  be zero  at the  initial and
final  points  and continuous  at  the intermediate  points.   This  sequence of
polynomials satisfies our conditions for a trajectory and has extrema  which are
easily evaluated.
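
A minimal sketch of such a three-polynomial trajectory for one joint follows.
The text does not state the polynomial orders; the sketch assumes a quartic, a
cubic and a quartic (14 coefficients for the 14 conditions listed above), with
time normalized to run from 0 to 1 within each segment. All function names are
hypothetical, and the linear system is solved by plain Gaussian elimination.

```python
SIZES = (5, 4, 5)      # coefficients per segment: quartic, cubic, quartic (assumed)
OFFSETS = (0, 5, 9)    # where each segment's coefficients start in the unknown vector

def deriv_row(order, t, n_coef):
    """Row giving the `order`-th derivative of sum(c_k t^k) at t."""
    row = []
    for k in range(n_coef):
        if k < order:
            row.append(0.0)
        else:
            f = 1.0
            for j in range(order):
                f *= (k - j)
            row.append(f * t ** (k - order))
    return row

def row(seg, order, t, scale=1.0):
    """Same row, placed in the 14-unknown system and scaled."""
    r = [0.0] * 14
    for k, v in enumerate(deriv_row(order, t, SIZES[seg])):
        r[OFFSETS[seg] + k] = scale * v
    return r

def solve(a, b):
    """Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [r[:] + [rhs] for r, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[piv] = m[piv], m[col]
        for i in range(col + 1, n):
            f = m[i][col] / m[col][col]
            for c in range(col, n + 1):
                m[i][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = m[i][n] - sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = s / m[i][i]
    return x

def plan_joint(q, T):
    """q: joint value at initial, liftoff, set-down, final; T: three durations."""
    a, b = [], []
    def add(r, rhs):
        a.append(r)
        b.append(rhs)
    add(row(0, 0, 0.0), q[0])   # start at the initial point ...
    add(row(0, 1, 0.0), 0.0)    # ... with zero velocity
    add(row(0, 2, 0.0), 0.0)    # ... and zero acceleration
    add(row(0, 0, 1.0), q[1])   # pass through liftoff
    add(row(1, 0, 0.0), q[1])
    add(row(1, 0, 1.0), q[2])   # pass through set-down
    add(row(2, 0, 0.0), q[2])
    add(row(2, 0, 1.0), q[3])   # end at the final point ...
    add(row(2, 1, 1.0), 0.0)    # ... with zero velocity
    add(row(2, 2, 1.0), 0.0)    # ... and zero acceleration
    for seg in (0, 1):          # velocity and acceleration continuity in real time
        for order in (1, 2):
            left = row(seg, order, 1.0, T[seg] ** -order)
            right = row(seg + 1, order, 0.0, T[seg + 1] ** -order)
            add([l - r for l, r in zip(left, right)], 0.0)
    x = solve(a, b)
    return x[0:5], x[5:9], x[9:14]

def evaluate(coefs, t):
    """Value of a segment polynomial at normalized time t in [0, 1]."""
    return sum(c * t ** k for k, c in enumerate(coefs))
```

For example, plan_joint([0.0, 0.1, 0.9, 1.0], [1.0, 2.0, 1.0]) returns the
three coefficient lists for a joint that lifts off, travels and sets down.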

If a joint exceeds its working range at an extremum, then the trajectory segment
in which it occurs is split in two, a new intermediate point equal to  the joint
range limit is specified at the break, and the trajectory recalculated.
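
The range check can be sketched for a single segment. The example below
assumes the segment is a cubic in normalized time (the orders of the segment
polynomials are not given in the text), so its stationary points are the roots
of a quadratic. The function names are hypothetical.

```python
import math

def cubic_extrema(b, t0=0.0, t1=1.0):
    """(t, value) at interior stationary points of b0 + b1 t + b2 t^2 + b3 t^3."""
    b0, b1, b2, b3 = b
    # Derivative is qa t^2 + qb t + qc.
    qa, qb, qc = 3.0 * b3, 2.0 * b2, b1
    if qa == 0.0:
        roots = [] if qb == 0.0 else [-qc / qb]
    else:
        disc = qb * qb - 4.0 * qa * qc
        if disc < 0.0:
            roots = []
        else:
            r = math.sqrt(disc)
            roots = [(-qb - r) / (2.0 * qa), (-qb + r) / (2.0 * qa)]
    return [(t, b0 + t * (b1 + t * (b2 + t * b3))) for t in roots if t0 < t < t1]

def range_violations(b, lo, hi):
    """Extrema that fall outside the joint's working range [lo, hi]."""
    return [(t, v) for t, v in cubic_extrema(b) if v < lo or v > hi]
```

Each violation returned marks a time at which the segment would be split and a
new intermediate point, equal to the range limit, inserted before replanning.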

A  collision  avoider will  modify  the arm  trajectory  in the  same  manner by
specifying  additional  intermediate  points.  If  a  potential  collision  were
detected, some additional  points would be specified  for one or more  joints to
pass through in order to avoid the collision.


The  servo program  which moves  the arm  is a  conventional sampled  data servo
executed  by the  computer with  the following  modifications.   Certain control
constants,  the  loop gain,  predicted  gravity and  external  torques  are pre-
calculated and varied with arm configuration.

We treat the system as continuous, and ignore the effects of  sampling, assuming
that the sampling period is much  less than the response time of the  arm.  Time
is normalized to the sampling period,  which has the effect of scaling  the link
inertia up by  the square of the  sampling frequency.  The Laplace  transform is
used throughout.

The set point for each joint of the arm is obtained by evaluating the
appropriate trajectory segment polynomial for the required time. The velocity
and acceleration are evaluated as the first and second derivatives of the
polynomial.

The position error is the observed position $\theta$ less the required value
$\theta_s$.
Likewise the velocity error is the observed velocity less the required velocity.
Position feedback is applied to decrease position error and velocity feedback is
used to provide damping.

A simple feedback loop is shown in Figure 2. The arm is represented by
$1/(s^2 J)$, where J is the effective link inertia, a function of arm
configuration. T(s) is an external disturbing torque. The set point R(s) is
subtracted from the current position to obtain the position error E(s), which
is multiplied by s, representing differentiation, to obtain the error velocity.
There are two feedback gains $k_e$ and $k_v$, position and velocity
respectively.

                                  Figure    2
                               Simple Servo Loop

By writing the loop equation we can obtain the system response:

    $$E(s) = \frac{-s^2 J}{s^2 J + s k_v + k_e}\,R(s) + \frac{1}{s^2 J + s k_v + k_e}\,T(s)$$
                                                                         [Eq. 1]

and the condition for critical damping is:

                         $$k_v = 2\sqrt{J k_e}$$                         [Eq. 2]

It can be seen that the system response is dependent on J as would  be expected.
Because the effective link inertia J  can vary by 10:1 as the  arm configuration
changes, we are unable to maintain a given response (see Equation 2) independent
of arm configuration. If, however, we add a gain of -J as shown in Figure 3 then
we obtain:

                                  Figure    3
                             Compensated Servo Loop

    $$E(s) = \frac{-s^2}{s^2 + s k_v + k_e}\,R(s) + \frac{1}{s^2 + s k_v + k_e}\,\frac{T(s)}{J}$$
                                                                         [Eq. 3]

and the condition for critical damping is:

                          $$k_v = 2\sqrt{k_e}$$                          [Eq. 4]

It can be seen that the servo response is now independent of arm configuration.

The principal disturbing torque is that due to gravity, causing a large position
error, especially in the case of joint 2. If we were able to add a term equal to
the negative of the gravity loading  Tg (see Figure 3) then we would  obtain the
same  system response  as  in Equation  3 except  that  T would  become  Te, the
external  disturbing torque,  less the  gravity dependent  torque,  reducing the
position error.

We can compensate for the effect of acceleration of the set point R(s), the
first term in Equation 3, if we add a term $s^2 R(s)$ (see Figure 3) to obtain
finally a system response:

          $$E(s) = \frac{1}{s^2 + s k_v + k_e}\,\frac{T(s)}{J}$$          [Eq. 5]

The gain of -J and the torque Tg are obtained by evaluating the  coefficients of
the equations of motion [Paul] at intervals along the trajectory.

The servo has  uniform system response under  varying arm configurations  and is
compensated for gravity loading and for the acceleration of the set point r.
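
The compensated servo law can be sketched as follows. This is an illustration
under stated assumptions, not the original servo program: gravity_torque is
taken to be the torque that gravity exerts on the joint (so the servo subtracts
it), the set-point acceleration is fed forward, and the velocity gain is chosen
for critical damping as in Equation 4. The small Euler simulation and all names
are hypothetical.

```python
import math

def critical_kv(ke):
    """Velocity gain for critical damping once the loop is scaled by J (Eq. 4)."""
    return 2.0 * math.sqrt(ke)

def servo_torque(theta, omega, theta_s, omega_s, alpha_s, J, gravity_torque, ke):
    """Joint torque: inertia-scaled feedback, acceleration feedforward, gravity term."""
    kv = critical_kv(ke)
    e = theta - theta_s        # position error: observed less required
    e_dot = omega - omega_s    # velocity error: observed less required
    return J * (alpha_s - ke * e - kv * e_dot) - gravity_torque

def simulate(J, gravity_torque, ke=4.0, dt=0.001, steps=5000):
    """Joint starting one unit away from a fixed set point at 0; returns final error."""
    theta, omega = 1.0, 0.0
    for _ in range(steps):
        tau = servo_torque(theta, omega, 0.0, 0.0, 0.0, J, gravity_torque, ke)
        alpha = (tau + gravity_torque) / J   # the arm modelled as 1/(s^2 J)
        omega += alpha * dt
        theta += omega * dt
    return theta
```

With exact values of J and the gravity torque, simulate(1.0, 0.5) and
simulate(10.0, 3.0) follow the same error history, which is the
configuration-independent response that Equation 5 describes.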

Although these gains give an acceptable response from the point of view of
stiffness, the gain is too low to maintain the high positional tolerance of
±1.2 mm, which we are just able to measure using the 12 bit A/D converter.
In order
to achieve this error tolerance,  the position error is integrated when  the arm
has reached the end  of its trajectory.  When the  position error of a  joint is
within tolerance the brake for that joint is applied and the joint is  no longer
servoed. When all the joints  are within the error tolerance the  trajectory has
been executed.

The output of the servo equation is  a torque to be applied at the  joint.  Each
joint motor is driven by  a pulse-width modulated voltage signal. The  output of
the computer  is this  pulse-width and  the polarity.  The drive  module relates
torque to drive voltage pulse-width.

The motors are driven by a 360 Hertz pulse-width modulated voltage  source.  The
program output  "h" is  the relative  "on" time of  this signal.  If we  plot an
experimental curve of  "h" vs. joint torque  we obtain two  discontinuous curves
depending on the joint velocity (see Figure 4).

This curve can be explained in terms of two friction effects: load dependent,
causing the two curves to diverge, and load independent, causing separation of
the two curves at the origin. The electrical motor time constant also affects
the shape of the curve near the origin. Experimentally determined curves are
supplied to the servo program in piecewise linear form.

One other factor considered is the back  emf of the motor.  The value of  "h" is
the ratio of required voltage  to supply voltage.  The supply voltage  is simply
augmented by the computed back emf before "h" is calculated.
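
The piecewise linear lookup from torque to pulse width can be sketched as
below. The two curves are invented placeholder data, not measurements from the
arm, and the function names are hypothetical. The sketch selects a curve by the
sign of the joint velocity and returns the pulse width together with the
polarity. The back emf correction is left out.

```python
from bisect import bisect_right

def interpolate(points, x):
    """Piecewise linear interpolation through sorted (x, y) points, clamped at the ends."""
    xs = [p[0] for p in points]
    if x <= xs[0]:
        return points[0][1]
    if x >= xs[-1]:
        return points[-1][1]
    i = bisect_right(xs, x)
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

# Hypothetical curves: (torque in kg-m, signed relative "on" time h).
# The offset at zero torque stands for load independent friction; the
# different slopes stand for load dependent friction.
CURVE_MOVING_POSITIVE = [(-1.0, -0.8), (0.0, 0.1), (1.0, 1.0)]
CURVE_MOVING_NEGATIVE = [(-1.0, -1.0), (0.0, -0.1), (1.0, 0.8)]

def pulse_width(torque, velocity):
    """Return (relative on-time, polarity) for a required joint torque."""
    curve = CURVE_MOVING_POSITIVE if velocity >= 0.0 else CURVE_MOVING_NEGATIVE
    h = interpolate(curve, torque)
    return abs(h), (1 if h >= 0 else -1)
```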


                                  Figure    4
                         Pulse Width vs. Output Torque


Two programs exist, one for planning "arm programs" and the other  for executing
the resulting trajectory  files.  This section  lists the arm  primitives, which
have meaning at two times: once  at planning, when the trajectory file  is being
created and feasibility must be checked, trajectories calculated etc.,  and once
at  execution  time  when the  primitives  are  executed in  the  same  way that
instructions are executed in a computer.

        OPEN (DIST) Plan to open or close the hand such that the gap between the
                finger tips is DIST.

        CLOSE (MINIMUM) Plan to close  the hand until it stops closing  and then
                check  that the  gap  between the  finger tips  is  greater than
                MINIMUM. If it is less, then give error 2.

        CENTER  (MINIMUM) This  is the  same as  CLOSE except  that the  hand is
                closed with  the touch  sensors enabled.  When the  first finger
                touches, the hand is  moved along with the fingers,  keeping the
                touching finger in contact.  When the other finger touches, both
                fingers are driven together as in CLOSE.

        CHANGE (DX_DY_DZ, VELOCITY) Plan to move the arm differentially to
                achieve a change of hand position of vector DX_DY_DZ at a
                maximum speed of VELOCITY.

        PLACE Plan to move the hand vertically down until the hand meets some
                resistance, that is, the minimum resistance that the arm can
                reliably detect.

        MOVE (T) At planning time check that the position specified by the
                hand transformation T is clear. Plan to move the hand along a
                trajectory from its present position to T. The hand is moved
                up through a point LIFTOFF given by LIFTOFF = INITIAL_POSITION +
                DEPART, where DEPART is a global vector initialized to z = 7.5
                centimeters. Similarly, on arrival the hand is moved down
                through a point SET_DOWN given by SET_DOWN = FINAL_POSITION +
                ARRIVE. ARRIVE is also set to z = 7.5 centimeters.

        PARK Plan a move as in MOVE but to the "park" position.

        SEARCH (NORMAL, STEP) Set up for a rectangular box search normal to
                NORMAL of step size STEP. The search is activated by the AOJ
                primitive.

There are also control primitives which specify how the other primitives  are to
be carried out.

        STOP (FORCE, MOMENT)  During the next arm  motion stop the arm  when the
                feedback force is greater  than the equivalent joint  force.  If
                the arm  fails to  stop for this  reason before  the end  of the
                motion, generate error 23.

        SKIPE (ERROR) If error ERROR occurred during the previous primitive then
                skip the next primitive.

        SKIPN (ERROR) If error ERROR occurred during the previous primitive,
                execute the next primitive; otherwise skip the next primitive.

        JUMP (LAB) Jump to the primitive whose label is LAB.

        AOJ (LAB) Restore the cumulative search increment and jump to LAB.

        WAIT Stop execution, update the state variables and wait for a proceed
                command.

        TOUCH (MASK)  Enable the touch  sensors specified by  mask for  the next

        SAVE  Save the  differential deviation  from the  trajectory  set point.
                This can be caused by CHANGE type primitives.

        RESTORE Cause the  arm to deviate from  the trajectory set point  at the
                end of the next motion by the deviation last saved.
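
The way SKIPE, SKIPN and JUMP steer execution can be sketched with a small
interpreter. The program representation (a list of label, name, argument
triples), the execute callback standing in for the arm, and the rule that only
action primitives update the remembered error are all assumptions made for
this illustration.

```python
def run(program, execute):
    """Run a list of (label, name, arg) primitives.

    execute(name, arg) stands in for the arm and returns an error code
    (0 means no error). Returns the names of the action primitives executed.
    """
    labels = {lab: i for i, (lab, _, _) in enumerate(program) if lab}
    pc, last_error, trace = 0, 0, []
    while pc < len(program):
        _, name, arg = program[pc]
        if name == "SKIPE":
            # Skip the next primitive if the given error occurred.
            pc += 2 if last_error == arg else 1
        elif name == "SKIPN":
            # Execute the next primitive only if the given error occurred.
            pc += 1 if last_error == arg else 2
        elif name == "JUMP":
            pc = labels[arg]
        else:
            trace.append(name)
            last_error = execute(name, arg)
            pc += 1
    return trace

# Close the hand; if error 2 occurred (gap less than MINIMUM), reopen the
# hand before parking, otherwise jump straight to the park move.
PROGRAM = [
    (None, "CLOSE", 2.0),
    (None, "SKIPE", 2),
    (None, "JUMP", "OK"),
    (None, "OPEN", 5.0),
    ("OK", "PARK", None),
]
```

With an execute callback that reports error 2 for CLOSE, the trace is CLOSE,
OPEN, PARK; with no errors it is CLOSE, PARK.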

With the exception of MOVE, which requires a trajectory file, most functions can
be executed directly by prefixing the primitive name by "DO." The planning
program plans the action and sends it to the arm servo program to be executed.
This does not change the state of the arm servo program if it is in a "wait"
state, and execution can continue after any number of executed primitives. This
method is used by the interactive programs, which will plan a move to bring the
hand close to the required place and then plan a "wait." When executed, the hand
position will be modified during the wait phase by the interacting program
executing a series of "DO" commands. Execution of the preplanned trajectory can
then continue by calling "DO_PROCEED."

The  arm  system  has  been  programmed  to  provide  a  set  of  general  block
manipulation routines. With these routines it is necessary only to give the name
of  the  block  and  its desired  position  and  orientation;  the  program then
generates the required moves and hand actions to perform the transformation.
These routines were used in conjunction with the vision and strategy  systems to
solve  the "Instant  Insanity"  puzzle [Feldman].  In the  case  of manipulation
tasks, this system has been  employed to screw a nut  onto a bolt and to  turn a
crank.  With the development of a corner operator visual feedback tasks could be


We will now describe the corner-finder and the visual feedback tasks in which it
is used.  The purpose of the  corner-finder is to find lines and  corners (which
are the main features of planar bounded objects) in a small area of the frame of
intensity values  read into the  computer memory from  the vidicon  camera.  The
corner-finder utilizes information about the features to the extent given to it;
it is not a general scene analyzer (even in the context of planar bounded
objects), and although it can be used as part of one, it would be uneconomical
to do so. The corner-finder operates by analyzing part of the area (a window) at a
time and moving the analyzed window in a controlled search pattern when needed.

Two main types of scene  analyzers using simple intensity information  have been
developed over the years:

        (a)  The "gradient  follower" type  looks for  boundaries of  regions by
        analyzing intensity gradients at image points.

        (b) The "region grower" type aggregates points based on  some similarity
        criterion to form regions.

The corner-finder uses  ideas from both these  types.  It makes rough  checks on
the existence  of regions  in the analyzed  area.  For  this purpose  each point
within the area is processed simply to form the intensity histogram of the area.
It then  follows boundaries of  regions by using  a dissimilarity  criterion. No
gradient type processing  is used so  that continuity is  not lost at  points of
weak gradient, sharp corner, etc.   The corner-finder is described in  detail in

General scene analyzers do not use any prior information because there is no
reason for them to assume the existence of such information. On the other hand,
the  corner-finder described  here  uses prior  information down  to  its lowest
levels.  The design philosophy is to use and check against prior  information at
the earliest possible  moment.  The corner-finder  can find only  simple corners
directly.  Complex corners,  with more than two  edges, can then  be constructed
from simpler corners.  Generally, the vertices and edges of simple corners found
in the image will not completely  coincide even if the simple corners  are parts
of the  same complex corner.   Therefore we  will merge them  to form  a complex
corner if they are "close"  (within some tolerance), and especially if  there is
some  external information  which indicates  the existence  of a  complex corner
rather than that of several separate simple corners.

The following assumptions guided the development of the corner-finder. They are
not all necessary conditions for its operation or success. The most important
assumption is that some of the properties of the corner (e.g. location, form and
orientation, relative inside to outside intensity) are known at least
approximately. This is either because the properties of the object to which
this corner belongs are known (e.g. the hand or a specific cube), or because
this corner was found before by the same or similar programs.

Not all the properties need be given to the program. The user or a  higher level
program  can  give as  many  of the  properties  as he/she/it  decides  to give.
Actually the properties are  not only "given" to  the program, but the  user can
"demand" a match, within a  given tolerance, of these properties and  the actual
measured properties of the corner found.

Some comments about window size: the  window size which is regularly used  has a
dimension of 18*18 raster  units.  When the 50 mm  focal length lens is  used it
corresponds to a field of  view of approximately 1 degree which  incidentally is
the field of view of the sensitive part of the human eye, the fovea.   The fovea
however has about 5 times more  sensing elements in the same field of  view.  We
should also note the  human ability to resolve  between pairs of lines  that are
closer  than  the distance  between  the sensing  elements.  Carrying  the above
analogy a  little farther  we can  say that moving  the window  in the  frame is
similar to  the movement  of the  eye in the  head, while  moving the  camera is
similar to rotating the head.

This size of the window was chosen in order to fulfill the assumptions  that the
window is  smaller than  the image  of the object  so that  each line  or corner
intersects the  perimeter of the  window, but  big enough so  that we  will have
enough boundary  points to get  a good line  fit. Also we  want the size  of the
window to  be small enough  so that the  assumption of almost  uniform intensity
inside the window is justified.


The purpose of  the visual feedback  tasks is to  increase the precision  of the
manipulations done in the hand-eye system.  The feedback currently does not take
into account the dynamic aspects of the manipulation.

The tasks are carried out in the context of the general tasks that  the hand-eye
system can currently  perform, i.e. the  recognition and manipulation  of simple
planar bounded objects.  The manipulations  that we sought to make  more precise
with the  addition of  visual feedback are  grasping, placing  on the  table and
stacking.  The precision obtained is  better than 2.5 mm.  This value  should be
judged by comparing it with the limitations of the system. The resolution of the
imaging system with the 50 mm lens is 1 mrad. which, at an operating range of 75
centimeters, corresponds to 0.8 mm.  The resolution of the arm  position reading
(lower bit of  the A/D converter reading  the first arm joint  potentiometer) is
also 0.8 mm, but  the noise in the arm  position reading corresponds to  1.2 mm.
When we tried to achieve precision  of 1.2 mm, the feedback loop was  executed a
number of times until the errors randomly happened to be below the threshold.
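
The imaging resolution quoted above follows from the small-angle relation
(arc length equals angle in radians times range). A minimal sketch of the
arithmetic, with a hypothetical function name:

```python
def image_resolution_mm(angular_resolution_mrad, range_cm):
    """Linear size subtended at the given range by one resolution element."""
    angle_rad = angular_resolution_mrad * 1e-3
    range_mm = range_cm * 10.0
    return angle_rad * range_mm

# 1 mrad at 75 cm gives 0.75 mm, which the text rounds to 0.8 mm.
resolution = image_resolution_mm(1.0, 75.0)
```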

The question of whether the visual feedback, or response in general, is
dynamic or not is sometimes more semantic than real. What the person asking the
question means in this case is: does it seem to be continuous? The question
then is really that of cycle time  or sampling period.  A cycle time of  20 msec
will  suffice  to  fool the  human  eye  so that  the  response  will  be called
"dynamic." Since  the computations needed  and the computing  power that  we now
have cause the length  of the cycle time to  be several seconds, no  attempt was
made to speed it up by  programming tricks, use of machine language,  etc.  With
this cycle time  the movement of  the arm actually  stops before we  analyze the
situation again, so that we do not have to take into account the  dynamic aspect
of the error.

In  addition  to  computing  power,  the  vidicon  camera  also   presents  some
limitations to faster response. The short time memory of the vidicon which helps
us (persons) to view the TV monitor, will "smear" a fast moving object.   If the
scene  is bright  enough a  shutter  can be  used.  If  the vision  can  be made
sufficiently fast and  accurate, the control program  currently used to  run the
arm dynamically could be expanded to incorporate visual information.

One of the characteristics that distinguishes our visual feedback scheme is that
the analysis of the  scene, to detect the errors  to be corrected, is  done with
the hand (which is still holding the object) in the scene. In the  grasping task
the presence of the hand  is inevitable.  In other tasks, for  example stacking,
being able to analyze the scene  with the object still grasped helps  to correct
the positional errors before they become catastrophic (e.g. the stack falls
down). Also some time is saved since there is no need to release the object,
move the arm away,  bring it back and grasp  the object again.  We pay  for this
flexibility with increased complexity of the scene analysis.

The difficulty is lessened by the fact that the hand-mark (which is used to
identify the hand; see Figure 5) has known form and relative intensity, which
helps to locate it in the image. In the grasping task the ability to recognize
the hand is necessary. The task is essentially to position the hand at a fixed
location and orientation relative to the object to be grasped.

We have found it to our benefit to locate the hand first in the other tasks
also. After the hand-mark is found, we use its location to predict more
accurately the locations in the image of the edges of the object held by the
hand.

                                  Figure    5
                                   Hand Mark

Moreover, after the hand-mark has been found, it will not be confused with other
edges in the scene. Since we are using only one camera (one view), we cannot
measure directly even differences of locations of two neighboring points. Hence
the three-dimensional (3-D) information has to be inferred from the
two-dimensional (2-D) information available in the image and some external
information.

The external information which is used in the visual feedback tasks is supplied
either by  the touch sensors  on the fingers  or by the  fact that an  object is
resting on the  table-top or on another  object of known dimensions.   The touch
sensors help us to determine the  plane containing the hand mark from  the known
position  of the  touched object.   The support  hypothesis gives  us  the plane
containing the bottom edges of the top object.

Before an object is to be  grasped or stacked upon, the arm is  positioned above
the object  and the  hand-mark is sought.   The hand  is positioned  high enough
above the object so that  the corner-finder does not confuse the  hand-mark with
the  object.   After  the  hand-mark  is  found,  the  difference   between  the
coordinates of the predicted location of the hand-mark and the location where it
was actually found is stored. The same is done for the place on the  table where
an object is going to be placed.

The table is divided into 10-centimeter squares (there are 100 squares), and
the corrections are stored with the square over which they were found.  When we
subsequently look for the hand-mark over this part of the table, the stored
differences are used to correct  the prediction.  Since we now have  a corrected
prediction, the dimension of the window,  or the search space used, can  be made

Each time that the hand-mark  is found again, the differences  between predicted
(before correction) and  actual locations in the  image are also used  to update
the stored corrections.
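
The bookkeeping described above can be sketched as a small table indexed by
square.  This is an illustration only.  The 10 x 10 arrangement of the 100
squares, the table coordinates measured in centimeters from one corner, the
class and method names, and the rule that the newest difference simply replaces
the stored one are all assumptions; the text does not say how the stored
corrections are combined with new ones.

```python
SQUARE_CM = 10.0   # side of one table square (from the text)
GRID = 10          # assumed 10 x 10 layout of the 100 squares


class CorrectionTable:
    """Image-plane corrections (dx, dy), one entry per table square."""

    def __init__(self):
        self.cells = [[(0.0, 0.0)] * GRID for _ in range(GRID)]

    def _index(self, x_cm, y_cm):
        # Clamp so that positions on the table boundary still map to a square.
        col = min(max(int(x_cm // SQUARE_CM), 0), GRID - 1)
        row = min(max(int(y_cm // SQUARE_CM), 0), GRID - 1)
        return row, col

    def corrected_prediction(self, x_cm, y_cm, predicted):
        """Apply the stored correction to a predicted image location."""
        row, col = self._index(x_cm, y_cm)
        dx, dy = self.cells[row][col]
        return (predicted[0] + dx, predicted[1] + dy)

    def update(self, x_cm, y_cm, predicted, found):
        """Store found - predicted, where predicted is the uncorrected value."""
        # Assumption: the newest difference replaces the stored one.
        row, col = self._index(x_cm, y_cm)
        self.cells[row][col] = (found[0] - predicted[0],
                                found[1] - predicted[1])
```

For example, after update(35.0, 72.0, (120.0, 80.0), (123.0, 78.0)), a later
prediction of (100.0, 90.0) made over the same square is corrected to
(103.0, 88.0), while predictions over other squares are left unchanged.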

To find the hand we look for both corners, since the scene is complicated by the
presence of other objects.  The  camera is centered on the hand-mark.  Using the
camera and arm models, the locations  of the images of the two lower  corners of
the hand-mark are predicted.  The form of the image of the corners is also
computed.  The predicted width of the hand-mark in the image is stored.

Using the  information computed above,  the right side  corner is  sought first,
using the corner-finder. If the corner is found, the error between its predicted
and actual locations  is used to  update the prediction  of the location  of the
left side corner, which is now sought.  If the right corner is not found, we
look for the left one first.

This algorithm is an example of the use of information about a  relation between
two features to be found, in addition to information pertaining to  each feature

We check that  we found the  corners belonging to  the hand-mark, and  not those
belonging  to  a cube  which  might have  very  similar form,  by  comparing the
distance between the corners in the image with the stored predicted width.
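
The search order and the width check described above can be sketched as
follows.  The sketch is illustrative only.  The callable find_corner is a
hypothetical stand-in for the corner-finder, the tolerance tol is an assumed
parameter, and retrying the right corner after the left one has been found is
an assumption about the fallback branch, which the text does not spell out.

```python
def find_hand_mark(find_corner, pred_right, pred_left, pred_width, tol):
    """Look for the two lower corners of the hand-mark in the image.

    find_corner(side, predicted) is a hypothetical corner-finder: it returns
    the (x, y) image location found near `predicted`, or None.
    Returns (right, left), or None if the hand-mark is not confirmed.
    """
    def shift(p, e):
        return (p[0] + e[0], p[1] + e[1])

    def error(found, predicted):
        return (found[0] - predicted[0], found[1] - predicted[1])

    right = find_corner("right", pred_right)
    if right is not None:
        # Use the error at the right corner to refine the left prediction.
        left = find_corner("left", shift(pred_left, error(right, pred_right)))
    else:
        # Right corner missed: look for the left one first, then (assumed)
        # retry the right corner with the refined prediction.
        left = find_corner("left", pred_left)
        if left is None:
            return None
        right = find_corner("right", shift(pred_right, error(left, pred_left)))
    if right is None or left is None:
        return None
    # Reject corner pairs (for instance from a cube) whose spacing does not
    # match the stored predicted width of the hand-mark.
    width = ((right[0] - left[0]) ** 2 + (right[1] - left[1]) ** 2) ** 0.5
    return (right, left) if abs(width - pred_width) <= tol else None
```

The relation between the two corners enters twice: once to move the second
search window, and once to accept or reject the pair.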


The "Grasping" task is to grasp precisely a cube of approximately known position
and orientation in order to move it and place it somewhere else, or stack  it on
another cube.  The  precision is needed  in order not to  drop the cube  in mid-
trajectory (which can happen if the  cube is grasped too close to an  edge), and
in order that its position relative to the hand will be known.  This information
is used in the other two visual feedback tasks.  We try to grasp the cube on the
mid-line between  the faces  perpendicular to  the fingers,  half way  above the
center of  gravity.  Note  that in  one direction  (namely perpendicular  to the
fingers) the hand  does its own error  correcting.  When the hand  is positioned
over the cube with maximum opening between the fingers (7 cm between the tips of
the touch sensors) and then closed, the cube will be moved and always end in the
same position relative to the  hand, independent of the initial  position.  This
motion is sometimes disturbing (e.g., when grasping the top cube of a stack, the
movement can cause it to become unstable before it is fully gripped and  it will
fall  off the  stack), and  hence no  use is  made of  this feature.  Instead we
correct errors in this direction as well, such that when a cube is grasped it is
moved by less than the tolerance of the feedback loop.

The grasping is done in the following steps:

        (a) The fingers are fully opened and the hand is moved over the
        center of the cube so that the fingers are parallel to the cube's sides.

        (b) The  touch sensors  are enabled and  the hand  is closed  slowly (at
        about 1/4  of the  usual speed  or about  2.5 cm/sec)  until one  of the
        fingers touches the face of the cube. The touch is light enough  so that
        the cube is not moved.  The touch sensors are then disabled.

        (c) Using the distance between the tips of the sensors after the closing
        motion of the fingers is stopped, the equation for the  plane containing
        the hand-mark facing the camera is computed.

        (d) The hand-mark  is then sought.  After  the two corners of  the hand-
        mark have been found, the  camera transformation is used to  compute the
        corresponding rays.  These rays are intersected with the plane  found in
        step (c)  to give  the coordinates of  the corners.  To verify  that the
        corners  found do  belong  to the  hand-mark,  we check  that  they have
        approximately  the  same  height, and  that  the  distance  between them
        corresponds to the width of the hand-mark.

        (e) Using the information already used in step (c), the  position errors
        of the hand are computed.  If the magnitudes of the errors in  all three
        directions are less  than a threshold (currently  0.25 cm), the  task is
        finished: we go to step (f) and then exit.  If the errors are larger,
        the hand is opened and the errors are corrected by changing the position
        of the arm appropriately.  We then  go back to step (b) to  check errors


The placing  task is  to place  the cube precisely  at a  given location  on the
table.  With very minor modifications, it could be used to place the cube on any
relatively large  horizontal surface  of known  height which  does not  have any
reference marks near the location where the cube is to be placed.  In  this case
the  support hypothesis  is the  only external  information used.   The  task is
carried out in the following steps:

        (a) The cube is  grasped and moved to  a position above the  table where
        the cube is to be placed.

        (b) The cube is placed.

        (c) The hand-mark is located in the image.

        (d) The camera is centered on the visible bottom edges of the cube.

        (e) The locations of mid-points and the orientation of the images of the
        two visible bottom edges of the cube are computed using the hand
        transformation and the size of the cube. The predicted location  is then
        corrected by the amounts computed in step (c).

        (f) The corner-finder is then used to locate the two lines in the image.
        The two lines are intersected to find the corner location in  the image.
        Using the support hypothesis, the location of the corner is computed and
        compared with the  required location.  If  the magnitudes of  the errors
        are less than a threshold (0.25 cm) in both directions then the  task is
        completed.  Otherwise the  cube  is lifted  and the  error  corrected by
        changing the position of the arm appropriately.  We then go back to step
        (b) to check the errors again.
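
The loop formed by steps (b) to (f) can be sketched as below.  This is a
minimal illustration, not the program described here.  The callables place,
measure_corner, lift and move_by are hypothetical stand-ins for the arm and
vision routines, and max_tries is an assumed safety limit that the text does
not mention.

```python
THRESHOLD_CM = 0.25   # tolerance quoted in the text


def place_with_feedback(place, measure_corner, lift, move_by, required,
                        max_tries=10):
    """Repeat place / measure / correct until the corner is within tolerance.

    required is the required (x, y) location of the corner on the table.
    Returns True once both errors are below the threshold, else False.
    """
    for _ in range(max_tries):
        place()                              # step (b)
        x, y = measure_corner()              # steps (c) to (f)
        ex, ey = required[0] - x, required[1] - y
        if abs(ex) < THRESHOLD_CM and abs(ey) < THRESHOLD_CM:
            return True
        lift()                               # cube is lifted ...
        move_by(ex, ey)                      # ... and the arm position corrected
    return False
```

Because the error is measured again after every correction, the arm does not
have to execute the correction exactly; it only has to reduce the error.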


The stacking task is to stack one cube on top of another cube so that  the edges
of the bottom face of the top cube will be parallel to the edges of the top face
of the  bottom cube,  at offsets  specified to the  program by  the user  or the
calling module.

The task is carried out in the following steps:

        (a) The top cube is grasped.

        (b) The camera is centered on the top face of the bottom cube.  The mid-
        points and orientations of the images  of the two edges of the  top face
        of the bottom cube, belonging also to the most visible vertical face and
        the other visible vertical face, are computed.  The corner-finder is used to
        locate these  two lines  in the image.   The locations  and orientations
        found are  then stored.   The two  lines are  intersected and  using the
        known height of the cube, the location of the corner is found. Using the
        given offsets, the coordinates  of the required positions of  the corner
        of the bottom face of the top cube and the center of the top cube are
        computed.

        (c) The  top cube is  moved to  a location just  above the  bottom cube,
        oriented so that the hand-mark is parallel to the most visible vertical
        face of the bottom cube.

        (d) The top cube is placed on the bottom cube.

        (e) The hand-mark is located in the image and then the two edges  of the
        bottom face of the  top cube are located as  in steps (c) to (f)  of the
        placing task.  In this case, however,  the edges of the top face  of the
        bottom cube will  also appear in view.  A simple algorithm is  used with
        the information computed  in step (b) to  decide which of the  lines are
        the edges of the bottom face of the top cube.  This simple algorithm can
        be deceived sometimes by the presence of shadows and "doubling" of edges
        in the image.  We could make the algorithm more robust by also locating
        the vertical edges of the cubes, when they can be found.

        (f) The two edges of the bottom face of the top cube found in the last
        step are intersected  to find the corner  location in the  image.  Using
        the support hypothesis,  the coordinates of  the location of  the corner
        are computed and  compared with the  required location computed  in step
        (b).  If the  magnitudes of the errors  are less than a  threshold (0.25
        cm) in both directions then the task is completed. Otherwise the cube is
        lifted  and the  error corrected  by changing  the position  of  the arm
        appropriately. We then go back to step (d) to check the errors again.
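
Intersecting two located edges to obtain the corner in the image can be
sketched as follows.  The sketch assumes that each line is reported as a
mid-point and an orientation, as in step (b), with the orientation given in
radians from the image x axis; the function name and that angle convention are
assumptions made for illustration.

```python
import math


def intersect_lines(mid1, angle1, mid2, angle2, eps=1e-9):
    """Intersect two image lines, each given by a mid-point and an orientation.

    Angles are in radians from the image x axis (assumed convention).
    Returns (x, y), or None if the lines are nearly parallel.
    """
    d1 = (math.cos(angle1), math.sin(angle1))
    d2 = (math.cos(angle2), math.sin(angle2))
    cross = d1[0] * d2[1] - d1[1] * d2[0]
    if abs(cross) < eps:
        return None
    # Solve mid1 + t * d1 == mid2 + s * d2 for t.
    rx, ry = mid2[0] - mid1[0], mid2[1] - mid1[1]
    t = (rx * d2[1] - ry * d2[0]) / cross
    return (mid1[0] + t * d1[0], mid1[1] + t * d1[1])
```

The image corner obtained this way still has to be converted to table or cube
coordinates, using the support hypothesis or the known height of the cube.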

Instead of  a bottom  cube, we can  specify to  the program a  square hole  in a
bottom object in which the top cube  is to be inserted.  In this case,  when the
top  cube is  placed on  the bottom  object we  have to  check how  much  it was
lowered.  If it was lowered past some threshold this means that it is already in
the hole and can be released.  We  make sure that the grip of the hand  is tight
enough so that  the cube grasped  will not rotate  when placed partly  above the

The programs  described here are  presently being expanded  to provide  a system
capable of discrete component assembly tasks.


[Feldman] J. Feldman with others, "The Use of Vision and Manipulation to Solve
        the `Instant Insanity' Puzzle," Second International Joint Conference on
        Artificial Intelligence, London, September 1-3, 1971.

[Gill] A.  Gill, "Visual  Feedback and Related  Problems in  Computer Controlled
        Hand-Eye  Coordination,"  Stanford  Artificial   Intelligence  Memo 178,
        October 1972.

[Paul] R. P. C. Paul, "Modelling, Trajectory Calculation and Servoing of a
        Computer Controlled Arm," Stanford Artificial Intelligence Memo 177,
        March 1973.

[Pieper] D. L. Pieper, "The Kinematics of Manipulators Under Computer Control,"
        Stanford Artificial Intelligence Memo 72, October 1968.

[Scheinman] V. D. Scheinman, "Design of a Computer Manipulator," Stanford
        Artificial Intelligence Memo 92, June 1969.