Home Page for the EU ActIPret project at COGS
Project acronym: ActIPret
Project full title: Interpreting and Understanding Activities of Expert
Operators for Teaching and Education
Proposal/Contract no.: IST-2001-32184
People Involved in COGS
Project Summary
Abstract
Rapid technical development generates the need to train many people in expert
operations. To teach many users, a system that interprets the expert's
activities is required. The user can replay activities at any time and from any
viewpoint. Thanks to the cognitive framework of the envisioned vision system, it
is possible to index the activities and the objects involved. The index is based on
natural language terms and allows user-driven retrieval. The system provides
feedback to motivate the trainee and to enhance the training effect. The
cognitive vision framework builds on purposive and reactive vision techniques
which focus processing to obtain real-time performance. Integration and active
selection of techniques realises robust interpretation. The final presentation
will interpret activities involved in an assembly scenario, e.g., changing a
car wheel. Seven industrial companies have expressed interest in exploiting the
results for training and long-term documentation.
Objectives
The objective of ActIPret is to develop a vision methodology that interprets
and records the activities of people handling tools. The tasks considered can
be observed in video streams. The focus is on active observation and
interpretation of activities, on parsing the sequences into constituent
behaviour elements, and on extracting the essential activities and their
functional dependence. By providing this functionality, ActIPret will enable
observation of experts
executing intricate tasks such as repairing machines and maintaining plants.
The expert activities are interpreted and stored using natural language
expressions in an activity plan. The activity plan is an indexed manual in the
form of 3D reconstructed scenes, which can be replayed at any time and
location to many users using Augmented Reality equipment. Due to the
interpretive level of the system, ActIPret can provide the trainee with
feedback when repeating the operation (in simulation or reality), which
results in a superior training effect compared to repetition without feedback.
Current Progress
Large diagram of the ActIPret demonstrator system. The research group at COGS
is responsible for the components highlighted in red.
Current work at COGS is in three main work packages:
- WP 1: Cognitive Vision framework
- WP 3: Detection of deictic, spatial and temporal relationships
- WP 5: Synthesis of task and behaviour representation
WP 1: Cognitive Vision framework
- Generic Cognitive Vision (CV) framework can be applied to a wide range of
vision tasks:
- memory,
- learning,
- control,
- and reasoning
- Specific framework used to structure components to make up the ActIPret
Demonstrator (AD):
- first vision task (scenario): putting CD in player
How do we apply cognitive principles in computer vision?
- Reactive (bottom up) processing limited in scope by task driven (top down)
control and constrained by task driven knowledge
- Attentional control:
- Perception is guided by expectation: covert attention
- Control resources such as views: overt attention
- Pre-attentive and attentive vision continuum:
- Quality of service
- Measure of computational cost
WP 3: Detection of deictic, spatial and temporal relationships
- Gesture Recogniser (GR): task-relevant behaviour trajectory
- Object Relation Generator (ORG): general spatial/spatio-temporal
relationships:
- Mutual proximity between two objects
- Object near trajectory of moving object
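The two spatial relationships listed above can be illustrated with a minimal
sketch. This is not the project's Object Relation Generator: the function
names, the use of 3-D points for object positions, and the distance thresholds
are all assumptions made for illustration.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]  # assumed: objects reduced to 3-D points


def mutually_proximate(a: Point, b: Point, threshold: float) -> bool:
    """True when two objects lie within `threshold` of each other."""
    return math.dist(a, b) <= threshold


def distance_to_segment(p: Point, a: Point, b: Point) -> float:
    """Shortest distance from point p to the line segment a-b."""
    ab = [bj - aj for aj, bj in zip(a, b)]
    ap = [pj - aj for aj, pj in zip(a, p)]
    length_sq = sum(c * c for c in ab)
    if length_sq == 0.0:  # degenerate segment: a and b coincide
        return math.dist(p, a)
    # Project p onto the segment and clamp to its end points.
    t = sum(x * y for x, y in zip(ap, ab)) / length_sq
    t = max(0.0, min(1.0, t))
    closest = tuple(aj + t * c for aj, c in zip(a, ab))
    return math.dist(p, closest)


def near_trajectory(obj: Point, trajectory: Sequence[Point],
                    threshold: float) -> bool:
    """True when a static object lies within `threshold` of a moving
    object's trajectory, given as an ordered sequence of positions."""
    if len(trajectory) == 1:
        return math.dist(obj, trajectory[0]) <= threshold
    return any(distance_to_segment(obj, a, b) <= threshold
               for a, b in zip(trajectory, trajectory[1:]))
```

Treating the trajectory as a polyline rather than a set of samples avoids
missing an object that sits between two widely spaced observations.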
Purposive behaviour trajectory
- Extracted by the Gesture Recogniser (GR)
- Initially defined as basic hand trajectories:
- Hand moving away from torso (predictive cue)
- Hand moving towards torso (supporting evidence)
- Lateral movement (supporting evidence)
- Used by the Reasoning Engine to predict/support the 'pick up' or 'put down'
activities
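As a rough illustration of the basic hand trajectories above, the sketch below
labels a short sequence of hand positions by how the hand-to-torso distance
changes. It is a hand-written rule, not the learnt Gesture Recogniser used in
the project; the label strings, the `min_change` threshold and the 3-D point
representation are assumptions.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]  # assumed 3-D position representation

# Role of each trajectory type as evidence, taken from the list above.
CUE_ROLE = {
    "away_from_torso": "predictive cue",
    "towards_torso": "supporting evidence",
    "lateral": "supporting evidence",
}


def classify_hand_motion(hand_positions: Sequence[Point], torso: Point,
                         min_change: float = 0.05) -> str:
    """Label a hand trajectory relative to a fixed torso position.

    `min_change` (hypothetical, in the same units as the positions) is the
    smallest displacement treated as deliberate movement.
    """
    if len(hand_positions) < 2:
        return "unknown"
    start, end = hand_positions[0], hand_positions[-1]
    # Change in hand-to-torso distance over the whole sequence.
    radial = math.dist(end, torso) - math.dist(start, torso)
    travelled = math.dist(start, end)
    if radial >= min_change:
        return "away_from_torso"
    if radial <= -min_change:
        return "towards_torso"
    if travelled >= min_change:
        return "lateral"  # hand moved, but not towards or away from torso
    return "stationary"
```

For example, `CUE_ROLE[classify_hand_motion(track, torso)]` would give the
evidence role that a reasoning component could attach to a 'pick up' or
'put down' hypothesis, provided the label is one of the three basic
trajectories.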
Picking up an object (2.4Mb MPEG movie) - example of the 'Picking up an
object' activity.
Object near trajectory of another
Mutual proximity between two objects
WP 5: Synthesis of task and behaviour representation
- Set of plan concept functions representing activities:
- comprise a conceptual language
- for building activity plans
- Pick up an object (from recognised location)
- Put down an object (at recognised location)
- Manipulate a recognised object, e.g. press a specific button (alters
internal state)
Definition of an activity plan
- Sequence of plan concept functions, derived from
- behaviour models for activities, actions or events observed in the
training exemplars
- abstract models defined within the reasoning engine
- Domain specific data necessary for the correct interpretation of the
scenario
- CD player buttons: single presses of on/off or open/close buttons will
change internal state of CD player according to order of button pressing
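The kind of domain-specific data described above can be pictured as a small
state model. The sketch below is an assumption throughout: the class, the
button names, and in particular the rule that the tray only responds while the
player is switched on are invented to show how the order of button presses can
matter, and are not taken from the ActIPret system.

```python
from dataclasses import dataclass


@dataclass
class CDPlayerState:
    """Hypothetical internal state of the CD player in the scenario."""
    powered: bool = False
    tray_open: bool = False

    def press(self, button: str) -> None:
        """Apply a single press of the named button to the state."""
        if button == "on_off":
            self.powered = not self.powered
        elif button == "open_close":
            # Assumed rule: the tray ignores presses while the player is off,
            # so the outcome depends on the order of button presses.
            if self.powered:
                self.tray_open = not self.tray_open
        else:
            raise ValueError(f"unknown button: {button!r}")
```

Pressing `on_off` and then `open_close` leaves the tray open, whereas the
reverse order leaves it closed, which is why an interpreter needs the state
model as well as the observed sequence of presses.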
Example activity plan (6.2Mb MPEG movie)
button_press(button1,cdplayer0);
pick_up(cd0,nondef);
put_down(cd0,cdplayer0);
button_press1(button1,cdplayer0);
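The plan above is a sequence of calls of the form `name(arg, ...);`. A minimal
sketch of reading such text into a list of steps follows. The parser, the
`PlanStep` type and the regular expression are assumptions for illustration;
the full conceptual language and notation are not defined on this page.

```python
import re
from typing import List, NamedTuple, Tuple


class PlanStep(NamedTuple):
    """One plan concept function call (hypothetical representation)."""
    function: str
    args: Tuple[str, ...]


# Matches `name(arg1,arg2);` with optional surrounding whitespace.
_STEP = re.compile(r"\s*(\w+)\s*\(([^()]*)\)\s*;")

# The example activity plan, copied verbatim from the text above.
EXAMPLE_PLAN = """
button_press(button1,cdplayer0);
pick_up(cd0,nondef);
put_down(cd0,cdplayer0);
button_press1(button1,cdplayer0);
"""


def parse_activity_plan(text: str) -> List[PlanStep]:
    """Split plan text into an ordered list of PlanStep records."""
    steps = []
    for match in _STEP.finditer(text):
        name, arg_text = match.groups()
        args = tuple(a.strip() for a in arg_text.split(",") if a.strip())
        steps.append(PlanStep(name, args))
    return steps


if __name__ == "__main__":
    for step in parse_activity_plan(EXAMPLE_PLAN):
        print(step.function, step.args)
```

Keeping each step as a function name plus argument tuple preserves the order
of the plan, which matters for replay and for state-dependent objects such as
the CD player.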
Summary
- CV conceptual level framework to support task-dependent reasoning and
control
- Pre-reasoning relationships to support both prediction and validation of
activities
- Initial conceptual language and control policy to support synthesis of
Activity Plan
Future Work
- Full conceptual language and notation for activity plans (D5.1)
- Produce first prototype of conceptual processing engines within CV
framework (D1.2)
- Learning and recognition in M1:
- HMM / TDRBF techniques for 3-D hand trajectory analysis (WP 3)
- Behavioural and task models for CD scenario (WP 5)
Publications arising out of the ActIPret Project
2003
- Howell, A.J., Sage, K.,
and Buxton, H. (2003) `Developing Task-Specific RBF Hand Gesture Recognition',
presented at GW2003 - 5th International Workshop on Gesture and Sign
Language based Human-Computer Interaction, Genova, Italy, April 2003.
- Sage, K., Howell, A.J.
and Buxton, H. (2003) `Developing Context Sensitive HMM Gesture Recognition',
presented at GW2003 - 5th International Workshop on Gesture and Sign
Language based Human-Computer Interaction, Genova, Italy, April 2003.
- Buxton, H. (2003) `Learning and
Understanding Dynamic Scene Activity: A Review', Image & Vision
Computing (21), pp.125-136.
2002
- Howell, A.J. and
Buxton, H. (2002) `Active Vision Techniques for Visually Mediated
Interaction', Image & Vision Computing (20), pp.861-871.
- Buxton, H., Howell, A.J.
and Sage, K. (2002) `The Role of Task Control and
Context in Learning to Recognise Gesture', Cognitive Vision
Workshop, Zürich, Switzerland, September 2002.
- Buxton, H. (2002) `Learning and understanding dynamic scene activity',
ECCV Generative Model Based Vision Workshop, Copenhagen, 2002.
- Howell, A.J. and Buxton, H. (2002) `RBF Network Methods for Face
Detection and Attentional
Frames', Neural Processing Letters (15), pp.197-211.
- Howell, A.J. and Buxton,
H. (2002) `Active Vision Techniques for Visually Mediated Interaction',
Proc. 16th International Conference on Pattern Recognition (ICPR 2002),
Volume II, pp.296-299, Québec City, Canada, August 2002.
- Howell, A.J. and
Buxton, H. (2002) `Visually Mediated Interaction using Learnt Gestures and
Camera Control', Gesture and Sign Language in Human-Computer Interaction,
Proc. International Gesture Workshop, GW 2001, Springer Lecture Notes in
Artificial Intelligence Vol. 2298, London, April 2001, pp.272-284.
- Vassilakis, H., Howell,
A.J. and Buxton, H. (2002) `Comparison of
Feedforward (TDRBF) and Generative (TDRGBN) Network for Gesture Based
Control', Gesture and Sign Language in Human-Computer Interaction,
Proc. International Gesture Workshop, GW 2001, Springer Lecture Notes in
Artificial Intelligence Vol. 2298, London, April 2001, pp.317-322.
2001
- Howell, A.J. (2001) `Face
Recognition using RBF Networks', in Howlett, R. J. and Jain, L.C. (Eds.)
Radial Basis Function Networks 2: New Advances in Design,
Physica-Verlag, 2001, pp.103-142.