perm filename P.DOC[1,VDS] blob sn#082277
filedate 1974-01-15 generic text, type T, neo UTF8
COMPUTER MANIPULATOR CONTROL, VISUAL FEEDBACK
AND RELATED PROBLEMS
Aharon Gill, Richard Paul, Victor Scheinman
This paper describes work at the Stanford Artificial Intelligence Project on
Manipulation and Visual feedback. The work is described extensively in
[Gill],[Paul] and [Scheinman] and this paper is an attempt to provide an
overview of the combined work. We first describe the design of the manipulator
shown in Figure 1 and then go on to describe in detail the trajectory generation
for manipulator motions and the software servo system. Final sections describe
the corner-finder used by the visual feedback system and the visual feedback
tasks.
2 ARM DESIGN
The arm is a manipulator with six degrees of freedom plus a hand, and is
characterized by two rotating "shoulder" joints, a linear motion "elbow" joint,
and three rotating "wrist" joints. A vise grip hand serves as the terminal
device (see Figure 1). The manipulator can handle a 4 kg. load within its 1.0
meter diameter hemisphere working volume. Average point to point servo time is
less than two seconds.
The results of systems studies indicated that an all electric motor powered arm
was best suited to our tasks and goals. By making the axes of the wrist joints
intersect at one point, this arm is a "solvable configuration" [Piper].
Furthermore, to facilitate computation, all axes are made either normal or
parallel to adjacent axes.
The authors wish to thank Professor Jerome Feldman for his invaluable help and
advice in relation to this work.
This research was supported in part by the Advanced Research Projects Agency of
the Office of Defense under Contract No. SD-183.
The views and conclusions in this document are those of the authors and should
not be interpreted as necessarily representing the official policies, either
expressed or implied, of the Advanced Research Projects Agency or the U.S.
Government.
All joints are characterized by a motor, torque increaser (speed reducer),
potentiometer position feedback element, analog tachometer velocity transducer,
and an electromagnetic brake locking device. All joints are back-drivable.
This means that loads applied to the output member reflect back to the motor
shaft. Brakes are required to hold the gravity and external force loaded joints
of an arm of this type when current to the servo motors is turned off. This
allows the motors to run at high peak torque levels because of the intermittent
The two shoulder joints are powered by low inertia permanent magnet d.c. motors.
An harmonic drive component set with a reduction ratio of 100/1 gives a torque
multiplication of about 90 for both shoulder joints. This reduction unit is
reversible and has a torque limit of over 10 kg-m. A slip clutch prevents
damaging overloads from large accelerations which may be imposed when the arm
accidentally bumps into a solid object. An electromagnetic brake, mounted on
the high speed motor shaft, holds the joint at any given position, eliminating
the need for continuous application of motor torque.
The two shoulder joints are constructed of large diameter tubular aluminum
shafts which house the harmonic drive and slip clutch. Each tubular shaft is
mounted on two large diameter, thin section ball bearings supported in a solid
external housing. At the shoulder, heavy sections are liberally employed
because inertia effects of added mass are relatively small in this area of the
arm. Each of these two joints has integral, conductive plastic potentiometers.
The geometrical configuration of joint #1 is such that the arm base can be
bolted to any flat surface, but the design calculations have been made assuming
a table top mount.
To obtain maximum useful motion, the elbow joint is offset from the intersection
of the axes of joints #1 and #2. This extra link, added to the arm geometry,
allows 360 degrees of rotation of both shoulder joints.
Joint #3, the linear motion elbow joint, has a total travel of 75 centimeters,
making the second link length variable from 15 to 90 cm. The main boom is a 6.3
cm square aluminum tube which rolls on a set of sixteen phenolic surfaced ball
bearing rollers. These rollers resist tube twisting and bending moments and
support normal loads. They allow only pure translation of the tube. The
sections of the square tube boom and the supporting assembly have been designed
to optimize performance with respect to structure deflection, natural frequency,
and load carrying ability.
The use of rollers provides a larger bearing surface area than balls or ball
slides, etc., while the square, thin-walled tube provides better roller support
near its edges than a comparable section of round tube. The drive for this
joint is provided by a gear driven rack mounted along the neutral axis of the
boom. The rack is directly driven by a 0.22 kg-m torque motor, making it a
back-drivable assembly with a 9 kg. thrust capability. A strip of conductive plastic
material is cemented along the centerline of the boom. This is read by a wiper
element mounted inside the roller housing to give a positive indication of the
boom position. A tachometer and brake are mounted on the motor shaft.
Design of the wrist joints (#4 and #5) is similar to that of the shoulder joints
except that all components are smaller and lighter. Great attention has been
paid to obtaining the required performance with the least mass. Small size
harmonic drive units are used with permanent magnet torque motors. A motor
shaft brake is also employed to hold position. These joints are also back-
drivable and have a 0.95 kg-m maximum torque capability.
Joint #6, the hand rotation, employs a small permanent magnet motor with a spur
gear reduction. It too has an integral potentiometer element, tachometer, and
The terminal device is a vise grip hand with two opposed fingers. Here, two
interchangeable plate like fingers slide together symmetrically, guided on four
rails (two for each finger). They are driven by a rack and pinion arrangement
utilizing one center gear and a rack for each finger. The maximum jaw opening
is 10 cm with a holding force of 9 kg. Thick, rubber jaw pads provide a high
coefficient of friction for positive handling of most objects. Each finger pad
is provided with a switch type touch sensor mounted in the center of the grip
area. These touch sensors have a 5 gram force threshold.
The potentiometer, tachometer and touch sensor signals are fed directly into the
computer (Digital Equipment Corp. PDP-6 and PDP-10), through a 64 channel 12 bit
A/D interface. The servo loop is closed within the computer. The computer
output is a proportional drive command through an 8 bit DAC. This signal is fed
into a voltage to pulse width converter which drives the switching transistor
power amplifiers. These pulse width modulated switches are directly connected to
the joint motors.
In order to move the arm we first calculate a trajectory. This is in the form of
a sequence of polynomials, expressing joint angles as a function of time, one
for each joint. When the arm starts to move, it is normally working with
respect to some surface, for instance, picking up an object from a table. The
initial motion of the hand should be directly away from the surface. We specify
a position on a normal to the surface out from the initial position, and require
that the hand pass through this position. By specifying the time required to
reach this position, we control the speed at which the object is lifted.
For such an initial move, the differential change of joint angles is calculated
for a move of 7.5 cm in the direction of the outward pointing normal. A time to
reach this position based on a low arm force is then calculated. The same set of
requirements exists in the case of the final position. Here we wish once again
to approach the surface in the direction of the normal, this time passing down
through a letdown point.
This gives us four positions: initial, liftoff, letdown, and final. If we were to
servo the arm from one position to the next we would not collide with the
support. We would, however, like the arm to start and end its motion with zero
velocity and acceleration. Further, there is no need to stop the arm at all the
intermediate positions. We require only that the joints of the arm pass through
the trajectory points corresponding to these intermediate positions at the same
time.
The time for the arm to move through each trajectory segment is calculated as
follows: For the initial and final segments the time is based on the rate of
approach of the hand to the surface and is some fixed constant. The time
necessary for each joint to move through its mid-trajectory segment is
estimated, based on a maximum joint velocity and acceleration. The maximum of
these times is then used for all the joints to move through the mid-trajectory
segment.
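The timing rule for the mid-trajectory segment can be sketched in Python. The
text says only that each joint's time is estimated from a maximum velocity and
acceleration; the trapezoidal velocity profile used here, and the names
joint_time and segment_time, are assumptions made for illustration.

```python
import math

def joint_time(distance, v_max, a_max):
    """Estimate the time for one joint to cover `distance`, starting and ending
    at rest, under velocity and acceleration limits (assumed trapezoidal profile)."""
    d = abs(distance)
    if d >= v_max * v_max / a_max:
        # Long move: accelerate to v_max, cruise, then decelerate.
        return d / v_max + v_max / a_max
    # Short move: v_max is never reached (triangular profile).
    return 2.0 * math.sqrt(d / a_max)

def segment_time(distances, v_limits, a_limits):
    """All joints share the time required by the slowest joint."""
    return max(joint_time(d, v, a)
               for d, v, a in zip(distances, v_limits, a_limits))
```

With limits of 1 unit/s and 1 unit/s^2, a joint that must move 2 units needs
3 s, so a second joint that moves only 0.25 units is given the same 3 s.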
We could determine a polynomial for each joint which passes through all the
points and has zero initial and final velocity and acceleration. As there are
four points and four velocity and acceleration constraints we would need a
seventh order polynomial. Although such polynomials would satisfy our
conditions, they often have extrema between the initial and final points which
must be evaluated to check that the joint has not exceeded its working range.
As the extrema are difficult to evaluate for high order polynomials, we use a
different approach. We specify three polynomials for each joint, one for the
trajectory from the initial point to the liftoff point, a second from the
liftoff to the setdown point, and a third from the setdown to the final point.
We specify that velocity and acceleration should be zero at the initial and
final points and continuous at the intermediate points. This sequence of
polynomials satisfies our conditions for a trajectory and has extrema which are
easily evaluated.
If a joint exceeds its working range at an extremum, then the trajectory segment
in which it occurs is split in two, a new intermediate point equal to the joint
range limit is specified at the break, and the trajectory recalculated.
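The range check on a low-order segment can be sketched as follows. The order of
the segment polynomials is not stated here, so a cubic is assumed purely for
illustration; its stationary points are the roots of a quadratic, which is what
makes the extrema cheap to evaluate. The function names are hypothetical.

```python
import math

def cubic_extrema(c, t_end):
    """Return (t, q(t)) at the end points and at interior stationary points of
    q(t) = c[0] + c[1]*t + c[2]*t**2 + c[3]*t**3 on the interval [0, t_end]."""
    def q(t):
        return c[0] + t * (c[1] + t * (c[2] + t * c[3]))
    # dq/dt = 3*c[3]*t**2 + 2*c[2]*t + c[1]
    a, b, d = 3.0 * c[3], 2.0 * c[2], c[1]
    roots = []
    if abs(a) > 1e-12:
        disc = b * b - 4.0 * a * d
        if disc >= 0.0:
            r = math.sqrt(disc)
            roots = [(-b - r) / (2.0 * a), (-b + r) / (2.0 * a)]
    elif abs(b) > 1e-12:
        roots = [-d / b]
    times = [0.0, t_end] + [t for t in roots if 0.0 < t < t_end]
    return [(t, q(t)) for t in times]

def exceeds_range(c, t_end, lower, upper):
    """True if the joint leaves [lower, upper] anywhere on the segment."""
    return any(v < lower or v > upper for _, v in cubic_extrema(c, t_end))
```

A segment flagged by exceeds_range would be the one to split, with the range
limit inserted as the new intermediate point.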
A collision avoider will modify the arm trajectory in the same manner by
specifying additional intermediate points. If a potential collision were
detected, some additional points would be specified for one or more joints to
pass through in order to avoid the collision.
The servo program which moves the arm is a conventional sampled data servo
executed by the computer with the following modifications. Certain control
constants, the loop gain, predicted gravity and external torques are pre-
calculated and varied with arm configuration.
We treat the system as continuous, and ignore the effects of sampling, assuming
that the sampling period is much less than the response time of the arm. Time
is normalized to the sampling period, which has the effect of scaling the link
inertia up by the square of the sampling frequency. The Laplace transform is
The set point for each joint of the arm is obtained by evaluating the
appropriate trajectory segment polynomial for the required time. The velocity
and acceleration are evaluated as the first and second derivatives of the
The position error is the observed position less the required value (the set point).
Likewise the velocity error is the observed velocity less the required velocity.
Position feedback is applied to decrease position error and velocity feedback is
used to provide damping.
A simple feedback loop is shown in Figure 2. The arm is represented by $1/(s^2 J)$,
where J is the effective link inertia, a function of arm configuration. T(s) is
an external disturbing torque. The set point R(s) is subtracted from the
current position to obtain the position error E(s) and is multiplied by s,
representing differentiation, to obtain the error velocity. There are two
feedback gains ke and kv, position and velocity respectively.
Figure 2. Simple Servo Loop
By writing the loop equation we can obtain the system response:
$$E(s) = \frac{-s^2 J}{s^2 J + s k_v + k_e} R(s) + \frac{1}{s^2 J + s k_v + k_e} T(s) \qquad \text{[Eq. 1]}$$
and the condition for critical damping is:
$$k_v = 2\sqrt{J k_e} \qquad \text{[Eq. 2]}$$
It can be seen that the system response is dependent on J as would be expected.
Because the effective link inertia J can vary by 10:1 as the arm configuration
changes, we are unable to maintain a given response (see Equation 2) independent
of arm configuration. If, however, we add a gain of -J as shown in Figure 3 then
we obtain:
$$E(s) = \frac{-s^2}{s^2 + s k_v + k_e} R(s) + \frac{1}{s^2 + s k_v + k_e} \frac{T(s)}{J} \qquad \text{[Eq. 3]}$$
and the condition for critical damping is:
$$k_v = 2\sqrt{k_e} \qquad \text{[Eq. 4]}$$
Figure 3. Compensated Servo Loop
It can be seen that the servo response is now independent of arm configuration.
The principal disturbing torque is that due to gravity, causing a large position
error, especially in the case of joint 2. If we were able to add a term equal to
the negative of the gravity loading Tg (see Figure 3) then we would obtain the
same system response as in Equation 3 except that T would become Te, the
external disturbing torque, less the gravity dependent torque, reducing the
We can compensate for the effect of acceleration of the set point R(s), the
first term in Equation 3, if we add a term $s^2 R(s)$ (see Figure 3) to obtain
finally a system response:
$$E(s) = \frac{1}{s^2 + s k_v + k_e} \frac{T(s)}{J} \qquad \text{[Eq. 5]}$$
The gain of -J and the torque Tg are obtained by evaluating the coefficients of
the equations of motion [Paul] at intervals along the trajectory.
The servo has uniform system response under varying arm configurations and is
compensated for gravity loading and for the acceleration of the set point r.
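The compensated servo law can be illustrated with a small simulation. This is a
sketch under stated assumptions, not the servo program itself: the joint is
modelled as a pure inertia J, the gravity torque is taken as known exactly, and
the gains, time step and function names are hypothetical.

```python
import math

def critical_kv(ke):
    """Velocity gain giving critical damping of s^2 + s*kv + ke (Equation 4)."""
    return 2.0 * math.sqrt(ke)

def servo_torque(J, ke, kv, pos, vel, r, r_dot, r_ddot, gravity_torque):
    """Joint torque: error feedback scaled by J, plus set point acceleration
    feedforward, less the predicted gravity load."""
    e = pos - r          # position error: observed less required
    e_dot = vel - r_dot  # velocity error
    return J * (r_ddot - ke * e - kv * e_dot) - gravity_torque

def simulate(J, ke=25.0, dt=0.001, steps=2000, gravity_torque=3.0):
    """Integrate J * acc = torque + gravity load for a fixed set point r = 0,
    starting 0.1 units away, and return the final position error."""
    kv = critical_kv(ke)
    pos, vel = 0.1, 0.0
    for _ in range(steps):
        torque = servo_torque(J, ke, kv, pos, vel, 0.0, 0.0, 0.0, gravity_torque)
        acc = (torque + gravity_torque) / J
        vel += acc * dt
        pos += vel * dt
    return pos
```

Running simulate with J = 1 and J = 10 gives the same error history up to
rounding, which is the configuration independence claimed for the compensated
loop.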
Although these gains give an acceptable response from the point of view of
stiffness, the gain is too low to maintain the high positional tolerance of ±1.2
mm, which we are just able to measure using the 12 bit A/D converter. In order
to achieve this error tolerance, the position error is integrated when the arm
has reached the end of its trajectory. When the position error of a joint is
within tolerance the brake for that joint is applied and the joint is no longer
servoed. When all the joints are within the error tolerance the trajectory has
The output of the servo equation is a torque to be applied at the joint. Each
joint motor is driven by a pulse-width modulated voltage signal. The output of
the computer is this pulse-width and the polarity. The drive module relates
torque to drive voltage pulse-width.
The motors are driven by a 360 Hertz pulse-width modulated voltage source. The
program output "h" is the relative "on" time of this signal. If we plot an
experimental curve of "h" vs. joint torque we obtain two discontinuous curves
depending on the joint velocity (see Figure 4).
This curve can be explained in terms of two friction effects: load dependent,
causing the two curves to diverge, and load independent, causing separation of
the two curves at the origin. The electrical motor time constant also affects
the shape of the curve near the origin. Experimentally determined curves are
supplied to the servo program in piecewise linear form.
One other factor considered is the back emf of the motor. The value of "h" is
the ratio of required voltage to supply voltage. The supply voltage is simply
augmented by the computed back emf before "h" is calculated.
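A piecewise linear torque to pulse-width lookup of the kind described could look
like the sketch below. The breakpoint values are invented for illustration (the
measured curves of Figure 4 are not reproduced here), the names are hypothetical,
and the back emf correction is left out.

```python
from bisect import bisect_right

# Hypothetical (torque in kg-m, relative "on" time h) breakpoints, one curve
# for each direction of joint velocity.
CURVE_POSITIVE_VELOCITY = [(-1.0, -0.6), (0.0, 0.1), (1.0, 0.9)]
CURVE_NEGATIVE_VELOCITY = [(-1.0, -0.9), (0.0, -0.1), (1.0, 0.6)]

def interpolate(points, x):
    """Piecewise linear interpolation, clamped at the end breakpoints."""
    xs = [p[0] for p in points]
    if x <= xs[0]:
        return points[0][1]
    if x >= xs[-1]:
        return points[-1][1]
    i = bisect_right(xs, x)
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

def pulse_width(torque, velocity):
    """Select the curve by the sign of the joint velocity, then look up h."""
    if velocity >= 0.0:
        curve = CURVE_POSITIVE_VELOCITY
    else:
        curve = CURVE_NEGATIVE_VELOCITY
    return interpolate(curve, torque)
```

The offset between the two curves at zero torque plays the role of the load
independent friction, and their different slopes the load dependent friction.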
Figure 4. Pulse Width vs. Output Torque
Two programs exist, one for planning "arm programs" and the other for executing
the resulting trajectory files. This section lists the arm primitives, which
have meaning at two times: once at planning, when the trajectory file is being
created and feasibility must be checked, trajectories calculated etc., and once
at execution time when the primitives are executed in the same way that
instructions are executed in a computer.
OPEN (DIST) Plan to open or close the hand such that the gap between the
finger tips is DIST.
CLOSE (MINIMUM) Plan to close the hand until it stops closing and then
check that the gap between the finger tips is greater than
MINIMUM. If it is less, then give error 2.
CENTER (MINIMUM) This is the same as CLOSE except that the hand is
closed with the touch sensors enabled. When the first finger
touches, the hand is moved along with the fingers, keeping the
touching finger in contact. When the other finger touches, both
fingers are driven together as in CLOSE.
CHANGE (DX_DY_DZ, VELOCITY) Plan to move the arm differentially to
achieve a change of hand position of vector DX_DY_DZ at a
maximum speed of VELOCITY.
PLACE Plan to move the hand vertically down until the hand meets some
resistance, that is, the minimum resistance that the arm can
MOVE (T) At planning time check that the position specified by the
hand transformation T is clear. Plan to move the hand along a
trajectory from its present position to T. The hand is moved
up through a point LIFTOFF given by LIFTOFF = INITIAL_POSITION +
DEPART, where DEPART is a global vector initialized to z = 7.5
centimeters. Similarly on arrival the hand is moved down
through a point SET_DOWN given by SET_DOWN = FINAL_POSITION +
ARRIVE. ARRIVE is also set to z = 7.5 centimeters.
PARK Plan a move as in MOVE but to the "park" position.
SEARCH(NORMAL, STEP) Set up for a rectangular box search normal to
NORMAL of step size STEP. The search is activated by the AOJ
There are also control primitives which specify how the other primitives are to
be carried out.
STOP (FORCE, MOMENT) During the next arm motion stop the arm when the
feedback force is greater than the equivalent joint force. If
the arm fails to stop for this reason before the end of the
motion, generate error 23.
SKIPE (ERROR) If error ERROR occurred during the previous primitive then
skip the next primitive.
SKIPN (ERROR) If error ERROR occurred during the previous primitive,
execute the next primitive; otherwise skip the next primitive.
JUMP (LAB) Jump to the primitive whose label is LAB.
AOJ (LAB) Restore the cumulative search increment and jump to LAB.
WAIT Stop execution, update the state variables and wait for a proceed
TOUCH (MASK) Enable the touch sensors specified by mask for the next
SAVE Save the differential deviation from the trajectory set point.
This can be caused by CHANGE type primitives.
RESTORE Cause the arm to deviate from the trajectory set point at the
end of the next motion by the deviation last saved.
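The way SKIPE, SKIPN and JUMP steer execution can be sketched with a toy
interpreter. The program representation (label, name, argument) and the
`execute` callback, which stands in for the arm servo program and returns an
error number or 0, are assumptions for illustration only.

```python
def run_primitives(program, execute):
    """Run a list of (label, name, arg) primitives and return the names of the
    arm primitives that were actually executed, in order."""
    labels = {label: i for i, (label, _, _) in enumerate(program) if label}
    pc, last_error, trace = 0, 0, []
    while pc < len(program):
        _, name, arg = program[pc]
        if name == "SKIPE":
            # Skip the next primitive if the given error occurred.
            pc += 2 if last_error == arg else 1
        elif name == "SKIPN":
            # Execute the next primitive only if the given error occurred.
            pc += 1 if last_error == arg else 2
        elif name == "JUMP":
            pc = labels[arg]
        else:
            last_error = execute(name, arg)
            trace.append(name)
            pc += 1
    return trace

# Example: reopen the hand only if CLOSE reported error 2.
program = [
    (None, "CLOSE", 2.0),
    (None, "SKIPN", 2),
    (None, "OPEN", 5.0),
    (None, "PARK", None),
]
```

With an `execute` that returns 2 for CLOSE the trace is CLOSE, OPEN, PARK; with
one that always returns 0 the OPEN is skipped.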
With the exception of MOVE, which requires a trajectory file, most functions can
be executed directly by prefixing the primitive name by "DO." The planning
program plans the action and sends it to the arm servo program to be executed.
This does not change the state of the arm servo program if it is in a "wait"
state and execution can continue after any number of executed primitives. This
method is used by the interactive programs, which will plan a move to bring the
hand close to the required place and then plan a "wait." When executed, the hand
position will be modified during the wait phase by the interacting program
executing a series of "DO" commands. Execution of the preplanned trajectory can
then continue by calling "DO_PROCEED."
The arm system has been programmed to provide a set of general block
manipulation routines. With these routines it is necessary only to give the name
of the block and its desired position and orientation; the program then
generates the required moves and hand actions to perform the transformation.
These routines were used in conjunction with the vision and strategy systems to
solve the "Instant Insanity" puzzle [Feldman]. In the case of manipulation
tasks, this system has been employed to screw a nut onto a bolt and to turn a
crank. With the development of a corner operator visual feedback tasks could be
We will now describe the corner-finder and the visual feedback tasks in which it
is used. The purpose of the corner-finder is to find lines and corners (which
are the main features of planar bounded objects) in a small area of the frame of
intensity values read into the computer memory from the vidicon camera. The
corner-finder utilizes information about the features to the extent given to it;
it is not a general scene analyzer (even in the context of planar bounded
objects), and although it can be used as part of one, it would be uneconomical to
do so. The corner-finder operates by analyzing part of the area (a window) at a
time and moving the analyzed window in a controlled search pattern when needed.
Two main types of scene analyzers using simple intensity information have been
developed over the years:
(a) The "gradient follower" type looks for boundaries of regions by
analyzing intensity gradients at image points.
(b) The "region grower" type aggregates points based on some similarity
criterion to form regions.
The corner-finder uses ideas from both these types. It makes rough checks on
the existence of regions in the analyzed area. For this purpose each point
within the area is processed simply to form the intensity histogram of the area.
It then follows boundaries of regions by using a dissimilarity criterion. No
gradient type processing is used so that continuity is not lost at points of
weak gradient, sharp corner, etc. The corner-finder is described in detail in
General scene analyzers do not use any prior information because there is no
reason for them to assume the existence of such information. On the other hand
the corner-finder described here uses prior information down to its lowest
levels. The design philosophy is to use and check against prior information at
the earliest possible moment. The corner-finder can find only simple corners
directly. Complex corners, with more than two edges, can then be constructed
from simpler corners. Generally, the vertices and edges of simple corners found
in the image will not completely coincide even if the simple corners are parts
of the same complex corner. Therefore we will merge them to form a complex
corner if they are "close" (within some tolerance), and especially if there is
some external information which indicates the existence of a complex corner
rather than that of several separate simple corners.
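Merging nearby simple corners can be sketched as a greedy clustering of their
vertices. The clustering rule, the use of the centroid as the merged vertex, and
the function name are assumptions; the text specifies only that corners within
some tolerance are merged.

```python
import math

def merge_simple_corners(vertices, tolerance):
    """Group (x, y) vertices of simple corners that lie within `tolerance` of a
    running cluster centroid. Returns (x, y, count) for each merged corner."""
    clusters = []
    for x, y in vertices:
        for cluster in clusters:
            cx = sum(p[0] for p in cluster) / len(cluster)
            cy = sum(p[1] for p in cluster) / len(cluster)
            if math.hypot(x - cx, y - cy) <= tolerance:
                cluster.append((x, y))
                break
        else:
            # No existing cluster is close enough: start a new one.
            clusters.append([(x, y)])
    return [
        (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c), len(c))
        for c in clusters
    ]
```

A cluster with a count greater than one is a candidate complex corner; external
information about the expected corner could be used to loosen or tighten the
tolerance.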
The following assumptions guided the development of the corner-finder. They are
not all necessary conditions for its operation or success. The most important
assumption is that some of the properties of the corner (e.g. location, form and
orientation, relative inside to outside intensity) are known at least
approximately, either because the properties of the object to which this corner
belongs are known (e.g. the hand or a specific cube), or because this corner was
found before by the same or similar programs.
Not all the properties need be given to the program. The user or a higher level
program can give as many of the properties as he/she/it decides to give.
Actually the properties are not only "given" to the program, but the user can
"demand" a match, within a given tolerance, of these properties and the actual
measured properties of the corner found.
Some comments about window size: the window size which is regularly used has a
dimension of 18 x 18 raster units. When the 50 mm focal length lens is used it
corresponds to a field of view of approximately 1 degree which incidentally is
the field of view of the sensitive part of the human eye, the fovea. The fovea
however has about 5 times more sensing elements in the same field of view. We
should also note the human ability to resolve between pairs of lines that are
closer than the distance between the sensing elements. Carrying the above
analogy a little farther we can say that moving the window in the frame is
similar to the movement of the eye in the head, while moving the camera is
similar to rotating the head.
This size of the window was chosen in order to fulfill the assumptions that the
window is smaller than the image of the object so that each line or corner
intersects the perimeter of the window, but big enough so that we will have
enough boundary points to get a good line fit. Also we want the size of the
window to be small enough so that the assumption of almost uniform intensity
inside the window is justified.
7 VISUAL FEEDBACK TASKS
The purpose of the visual feedback tasks is to increase the precision of the
manipulations done in the hand-eye system. The feedback currently does not take
into account the dynamic aspects of the manipulation.
The tasks are carried out in the context of the general tasks that the hand-eye
system can currently perform, i.e. the recognition and manipulation of simple
planar bounded objects. The manipulations that we sought to make more precise
with the addition of visual feedback are grasping, placing on the table and
stacking. The precision obtained is better than 2.5 mm. This value should be
judged by comparing it with the limitations of the system. The resolution of the
imaging system with the 50 mm lens is 1 mrad. which, at an operating range of 75
centimeters, corresponds to 0.8 mm. The resolution of the arm position reading
(lower bit of the A/D converter reading the first arm joint potentiometer) is
also 0.8 mm, but the noise in the arm position reading corresponds to 1.2 mm.
When we tried to achieve precision of 1.2 mm, the feedback loop was executed a
number of times until the errors randomly happened to be below the threshold.
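The resolution figures quoted above can be checked with a few lines of
arithmetic. The sketch assumes that the 1 mrad resolution corresponds to one
raster unit, which the text does not state explicitly; the function names are
hypothetical.

```python
import math

def image_resolution_mm(angular_resolution_rad, range_mm):
    """Size on the object of one resolution element (small-angle approximation)."""
    return angular_resolution_rad * range_mm

def window_field_of_view_deg(window_raster_units, angular_resolution_rad):
    """Angular width of a window, assuming one raster unit per resolution element."""
    return math.degrees(window_raster_units * angular_resolution_rad)

resolution = image_resolution_mm(1e-3, 750.0)   # 1 mrad at 75 cm
window_fov = window_field_of_view_deg(18, 1e-3)  # 18 raster unit window
```

This gives about 0.75 mm, which the text rounds to 0.8 mm, and a window of about
1.03 degrees, consistent with the field of view of approximately 1 degree quoted
for the 18 raster unit window.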
The question of whether the visual feedback, or response in general, is
dynamic or not, is sometimes more semantic than real. What the person asking the
question means in this case is, does it seem to be continuous? The question
then is really that of cycle time or sampling period. A cycle time of 20 msec
will suffice to fool the human eye so that the response will be called
"dynamic." Since the computations needed and the computing power that we now
have cause the length of the cycle time to be several seconds, no attempt was
made to speed it up by programming tricks, use of machine language, etc. With
this cycle time the movement of the arm actually stops before we analyze the
situation again, so that we do not have to take into account the dynamic aspect
of the error.
In addition to computing power, the vidicon camera also presents some
limitations to faster response. The short time memory of the vidicon which helps
us (persons) to view the TV monitor, will "smear" a fast moving object. If the
scene is bright enough a shutter can be used. If the vision can be made
sufficiently fast and accurate, the control program currently used to run the
arm dynamically could be expanded to incorporate visual information.
One of the characteristics that distinguishes our visual feedback scheme is that
the analysis of the scene, to detect the errors to be corrected, is done with
the hand (which is still holding the object) in the scene. In the grasping task
the presence of the hand is inevitable. In other tasks, for example stacking,
being able to analyze the scene with the object still grasped helps to correct
the positional errors before they become catastrophic (e.g. the stack falls
down). Also some time is saved since there is no need to release the object,
move the arm away, bring it back and grasp the object again. We pay for this
flexibility with increased complexity of the scene analysis.
The difficulty is lessened by the fact that the hand-mark (which is used to
identify the hand; see Figure 5) has known form and relative intensity which
helps to locate it in the image. In the grasping task the ability to recognize
the hand is necessary. The task is essentially to position the hand at a fixed
location and orientation relative to the object to be grasped.
We have found it to our benefit to locate the hand first in the other tasks
also. After the hand-mark is found, we use its location to predict more
accurately the locations in the image of the edges of the object held by the
hand.
Moreover, after the hand-mark has been found, it will not be confused with other
edges in the scene. Since we are using only one camera (one view), we cannot
measure directly even differences of locations of two neighboring points. Hence
the three-dimensional (3-D) information has to be inferred from the
two-dimensional (2-D) information available in the image and some external
information.
The external information which is used in the visual feedback tasks is supplied
either by the touch sensors on the fingers or by the fact that an object is
resting on the table-top or on another object of known dimensions. The touch
sensors help us to determine the plane containing the hand mark from the known
position of the touched object. The support hypothesis gives us the plane
containing the bottom edges of the top object.
Before an object is to be grasped or stacked upon, the arm is positioned above
the object and the hand-mark is sought. The hand is positioned high enough
above the object so that the corner-finder does not confuse the hand-mark with
the object. After the hand-mark is found, the difference between the
coordinates of the predicted location of the hand-mark and the location where it
was actually found is stored. The same is done for the place on the table where
an object is going to be placed.
The table is divided into 10-centimeter squares (there are 100 squares), and
the corrections are stored with the square over which they were found. When we
subsequently look for the hand-mark over this part of the table, the stored
differences are used to correct the prediction. Since we now have a corrected
prediction, the dimension of the window, or the search space used, can be made
smaller.
Each time that the hand-mark is found again, the differences between predicted
(before correction) and actual locations in the image are also used to update
the stored corrections.
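The per-square correction store can be sketched as a small table keyed by the
10-centimeter square under the hand. How a stored correction is combined with a
new measurement is not specified, so the sketch simply overwrites it; that
choice and all names are assumptions.

```python
class CorrectionTable:
    """Image-plane corrections for the hand-mark prediction, one per table square."""

    def __init__(self, square_cm=10.0):
        self.square_cm = square_cm
        self.corrections = {}

    def _square(self, x_cm, y_cm):
        # Index of the table square containing the point (x_cm, y_cm).
        return (int(x_cm // self.square_cm), int(y_cm // self.square_cm))

    def correct(self, x_cm, y_cm, predicted):
        """Apply the stored correction, if any, to a predicted image location."""
        dx, dy = self.corrections.get(self._square(x_cm, y_cm), (0.0, 0.0))
        return (predicted[0] + dx, predicted[1] + dy)

    def update(self, x_cm, y_cm, predicted, found):
        """Store actual minus predicted, where `predicted` is the prediction
        before any correction was applied."""
        self.corrections[self._square(x_cm, y_cm)] = (
            found[0] - predicted[0],
            found[1] - predicted[1],
        )
```

A running average per square would be an equally plausible update rule if the
measurements are noisy.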
To find the hand we look for both corners, since the scene is complicated by the
presence of other objects. The camera is centered on the hand-mark. Using the
camera and arm models, the locations of the images of the two lower corners of
the hand-mark are predicted. Also the form of the image of the corners is
computed. The predicted width of the hand-mark in the image is stored.
Using the information computed above, the right side corner is sought first,
using the corner-finder. If the corner is found, the error between its predicted
and actual locations is used to update the prediction of the location of the
left side corner which is now sought. If the right corner is not found we look
for the left one first.
This algorithm is an example of the use of information about a relation between
two features to be found, in addition to information pertaining to each feature
We check that we found the corners belonging to the hand-mark, and not those
belonging to a cube which might have very similar form, by comparing the
distance between the corners in the image with the stored predicted width.
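The search order and the width check can be sketched as below. Here
`find_corner` is a hypothetical stand-in for the corner-finder (it takes a
predicted image location and returns the location found, or None). Retrying the
right corner after the left one has been found is an assumption about what
follows when the right corner is missed at first.

```python
import math

def find_hand_mark(find_corner, pred_right, pred_left, pred_width, width_tol):
    """Locate both lower corners of the hand-mark; return (left, right) or None."""
    def shifted(pred, ref_pred, ref_found):
        # Move a prediction by the error observed on the other corner.
        return (pred[0] + ref_found[0] - ref_pred[0],
                pred[1] + ref_found[1] - ref_pred[1])

    right = find_corner(pred_right)
    if right is not None:
        left = find_corner(shifted(pred_left, pred_right, right))
    else:
        left = find_corner(pred_left)
        if left is None:
            return None
        right = find_corner(shifted(pred_right, pred_left, left))
    if left is None or right is None:
        return None
    # Reject corner pairs whose separation does not match the hand-mark width.
    width = math.hypot(right[0] - left[0], right[1] - left[1])
    if abs(width - pred_width) > width_tol:
        return None
    return left, right
```

The width test is what guards against accepting the corners of a cube of
similar form.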
The "Grasping" task is to grasp precisely a cube of approximately known position
and orientation in order to move it and place it somewhere else, or stack it on
another cube. The precision is needed in order not to drop the cube in mid-
trajectory (which can happen if the cube is grasped too close to an edge), and
in order that its position relative to the hand will be known. This information
is used in the other two visual feedback tasks. We try to grasp the cube on the
mid-line between the faces perpendicular to the fingers, half way above the
center of gravity. Note that in one direction (namely perpendicular to the
fingers) the hand does its own error correcting. When the hand is positioned
over the cube with maximum opening between the fingers (7 cm between the tips of
the touch sensors) and then closed, the cube will be moved and always end in the
same position relative to the hand, independent of the initial position. This
motion is sometimes disturbing (e.g. when grasping the top cube of a stack, the
movement can cause it to become unstable before it is fully gripped and it will
fall off the stack), and hence no use is made of this feature. Instead we
correct errors in this direction as well, such that when a cube is grasped it is
moved by less than the tolerance of the feedback loop.
The grasping is done in the following steps:
(a) The fingers are fully opened and the hand is moved over the
center of the cube so that the fingers are parallel to the cube's sides.
(b) The touch sensors are enabled and the hand is closed slowly (at
about 1/4 of the usual speed or about 2.5 cm/sec) until one of the
fingers touches the face of the cube. The touch is light enough so that
the cube is not moved. The touch sensors are then disabled.
(c) Using the distance between the tips of the sensors after the closing
motion of the fingers is stopped, the equation for the plane containing
the hand-mark facing the camera is computed.
(d) The hand-mark is then sought. After the two corners of the hand-
mark have been found, the camera transformation is used to compute the
corresponding rays. These rays are intersected with the plane found in
step (c) to give the coordinates of the corners. To verify that the
corners found do belong to the hand-mark, we check that they have
approximately the same height, and that the distance between them
corresponds to the width of the hand-mark.
(e) Using the information already used in step (c), the position errors
of the hand are computed. If the magnitudes of the errors in all three
directions are less than a threshold (currently 0.25 cm), the task is
finished; we go to step (f) and then exit. If the errors are larger,
the hand is opened and the errors are corrected by changing the position
of the arm appropriately. We then go back to step (b) to check the errors
again.
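The geometric core of step (d), intersecting a camera ray with the plane of the hand-mark and then verifying the two corners, can be sketched as below. This is an illustration under stated assumptions: the plane is written as normal . x = d, points are (x, y, z) tuples in cm with z as height, and the two tolerances are hypothetical values, not figures from the text.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def intersect_ray_plane(origin, direction, normal, d):
    """Point where the ray origin + t*direction meets the plane normal.x = d.

    Returns None if the ray is parallel to the plane or the plane lies
    behind the ray origin (the camera).
    """
    denom = dot(normal, direction)
    if abs(denom) < 1e-12:
        return None
    t = (d - dot(normal, origin)) / denom
    if t < 0:
        return None
    return tuple(o + t * v for o, v in zip(origin, direction))

def is_hand_mark(c1, c2, mark_width, height_tol=0.25, width_tol=0.25):
    """Check of step (d): the corners have approximately the same height
    and their separation corresponds to the hand-mark width.
    Both tolerances are assumed values."""
    same_height = abs(c1[2] - c2[2]) <= height_tol
    spacing = math.dist(c1, c2)
    return same_height and abs(spacing - mark_width) <= width_tol
```

For example, a ray from the origin along (1, 2, 0) meets the plane y = 10 at (5, 10, 0).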
The placing task is to place the cube precisely at a given location on the
table. With very minor modifications, it could be used to place the cube on any
relatively large horizontal surface of known height which does not have any
reference marks near the location where the cube is to be placed. In this case
the support hypothesis is the only external information used. The task is
carried out in the following steps:
(a) The cube is grasped and moved to a position above the table where
the cube is to be placed.
(b) The cube is placed.
(c) The hand-mark is located in the image.
(d) The camera is centered on the visible bottom edges of the cube.
(e) The locations of mid-points and the orientation of the images of the
two visible bottom edges of the cube are computed using the hand
transformation and the size of the cube. The predicted location is then
corrected by the amounts computed in step (c).
(f) The corner-finder is then used to locate the two lines in the image.
The two lines are intersected to find the corner location in the image.
Using the support hypothesis, the location of the corner is computed and
compared with the required location. If the magnitudes of the errors
are less than a threshold (0.25 cm) in both directions then the task is
completed. Otherwise the cube is lifted and the error corrected by
changing the position of the arm appropriately. We then go back to step
(b) to check the errors again.
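Step (f) combines a line intersection in the image with a threshold test on the resulting position error. A minimal sketch follows, assuming each line is reported as a point plus an orientation angle in radians and that the corner has already been mapped to table coordinates in cm; the function names and the returned correction convention are hypothetical.

```python
import math

def intersect_lines(p1, theta1, p2, theta2):
    """Intersection of two image lines, each given by a point on the line
    and its orientation angle. Returns None for (nearly) parallel lines."""
    d1 = (math.cos(theta1), math.sin(theta1))
    d2 = (math.cos(theta2), math.sin(theta2))
    cross = d1[0] * d2[1] - d1[1] * d2[0]
    if abs(cross) < 1e-9:
        return None
    # Solve p1 + s*d1 = p2 + u*d2 for s.
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    s = (dx * d2[1] - dy * d2[0]) / cross
    return (p1[0] + s * d1[0], p1[1] + s * d1[1])

def placement_correction(found_xy, required_xy, threshold=0.25):
    """Return None when both error components are below the threshold
    (task completed); otherwise return the (dx, dy) arm correction."""
    ex = required_xy[0] - found_xy[0]
    ey = required_xy[1] - found_xy[1]
    if abs(ex) < threshold and abs(ey) < threshold:
        return None
    return (ex, ey)
```

Representing each edge by its mid-point and orientation matches what the text says is computed for the visible bottom edges, which is why that form is used here.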
The stacking task is to stack one cube on top of another cube so that the edges
of the bottom face of the top cube will be parallel to the edges of the top face
of the bottom cube, at offsets specified to the program by the user or the
The task is carried out in the following steps:
(a) The top cube is grasped.
(b) The camera is centered on the top face of the bottom cube. The mid-
points and orientations of the images of the two edges of the top face
of the bottom cube, belonging also to the most visible vertical face and
the other visible vertical face, are computed. The corner-finder is used to
locate these two lines in the image. The locations and orientations
found are then stored. The two lines are intersected and using the
known height of the cube, the location of the corner is found. Using the
given offsets, the coordinates of the required positions of the corner
of the bottom face of the top cube and the center of the top cube are
computed.
(c) The top cube is moved to a location just above the bottom cube,
oriented so that the hand-mark is parallel to the most visible vertical face
of the bottom cube.
(d) The top cube is placed on the bottom cube.
(e) The hand-mark is located in the image and then the two edges of the
bottom face of the top cube are located as in steps (c) to (f) of the
placing task. In this case, however, the edges of the top face of the
bottom cube will also appear in view. A simple algorithm is used with
the information computed in step (b) to decide which of the lines are
the edges of the bottom face of the top cube. This simple algorithm can
be deceived sometimes by the presence of shadows and "doubling" of edges
in the image. We could make the algorithm more immune by locating the
vertical edges of cubes also if they could be found.
(f) The two edges of the bottom face of the top cube found in the last
step are intersected to find the corner location in the image. Using
the support hypothesis, the coordinates of the location of the corner
are computed and compared with the required location computed in step
(b). If the magnitudes of the errors are less than a threshold (0.25
cm) in both directions then the task is completed. Otherwise the cube is
lifted and the error corrected by changing the position of the arm
appropriately. We then go back to step (d) to check the errors again.
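The loop formed by steps (d) to (f) can be sketched as a simple iterate-until-within-tolerance procedure. The callbacks `place`, `measure_corner` and `lift_and_move` are hypothetical stand-ins for the arm and vision routines, and the `max_tries` limit is an assumption; the text does not state an iteration limit.

```python
def stack_with_feedback(place, measure_corner, lift_and_move, required,
                        threshold=0.25, max_tries=10):
    """Repeat place / measure / correct until the corner of the top cube
    is within the threshold (cm) of the required location in both
    directions. Returns the number of placements used, or None if the
    assumed max_tries limit is reached."""
    for attempt in range(1, max_tries + 1):
        place()                       # step (d): set the cube down
        cx, cy = measure_corner()     # steps (e)-(f): locate the corner
        ex, ey = required[0] - cx, required[1] - cy
        if abs(ex) < threshold and abs(ey) < threshold:
            return attempt            # task completed
        lift_and_move(ex, ey)         # lift the cube and correct the arm
    return None
```

Because the error is measured again after every correction, the arm does not need to execute the correction exactly; it only needs to reduce the error on each pass.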
Instead of a bottom cube, we can specify to the program a square hole in a
bottom object in which the top cube is to be inserted. In this case, when the
top cube is placed on the bottom object we have to check how much it was
lowered. If it was lowered past some threshold this means that it is already in
the hole and can be released. We make sure that the grip of the hand is tight
enough so that the cube grasped will not rotate when placed partly above the
hole.
The programs described here are presently being expanded to provide a system
capable of discrete component assembly tasks.
[Feldman] J. Feldman with others, "The Use of Vision and Manipulation to Solve the
`Instant Insanity' Puzzle," Second International Joint Conference on
Artificial Intelligence, London, September 1-3, 1971.
[Gill] A. Gill, "Visual Feedback and Related Problems in Computer Controlled
Hand-Eye Coordination," Stanford Artificial Intelligence Memo 178,
[Paul] R. P. C. Paul, "Modelling, Trajectory Calculation and Servoing of a Computer
Controlled Arm," Stanford Artificial Intelligence Memo 177, March 1973.
[Pieper] D. L. Pieper, "The Kinematics of Manipulators Under Computer Control,"
Stanford Artificial Intelligence Memo 72, October 1968.
[Scheinman] V. D. Scheinman, "Design of a Computer Manipulator", Stanford
Artificial Intelligence Memo 92, June 1969.