Departmental Seminar Feb 9th: Dr. Malcolm Heywood
Antonina Kolokolova

                               Dr. Malcolm Heywood
                       Department of Computer Science
                                Dalhousie University

  Symbiosis, Complexification and Generalization: A case study in temporal
                             sequence learning

                      Department of Computer Science
          Thursday, February 9, 2012, 1:00 p.m., Room EN-2022



Abstract

Hierarchical reinforcement learning traditionally provides a framework in
which a machine learning algorithm builds solutions to temporal sequence
problems under the guidance of sub-tasks identified a priori. Once learning
relative to one set of sub-tasks is complete, those sub-tasks can be reused to
build more complex behaviours. The principal caveat is that appropriate
sub-tasks must first be identified, preferably without requiring a priori
knowledge. This work proposes a generic architecture for evolving hierarchical
policies through symbiosis. Specifically, each symbiont defines an action and
an evolved context, whereas each host identifies a subset of symbionts.
Symbionts effectively coevolve within a host. Natural selection operates on
the hosts, with symbiont existence a function of host performance. Hierarchical
policies can then be supported as a symbiotic process by letting hosts evolved
in an earlier population become the symbiont actions in the next. Two
benchmarking studies illustrate the approach. An initial tutorial uses a truck
reversal domain, in which the benefits of evolving a hierarchical solution over
non-hierarchical solutions are clearly demonstrated. A second benchmarking
study uses the Acrobot handstand task. To date, reinforcement learning
solutions have not been able to approach those established 13 years ago using
A* search and a priori knowledge of the Acrobot energy equations. The proposed
symbiotic approach matches and, for the first time, improves on these results.
Moreover, unlike previous work, solutions are tested under a broad range of
Acrobot initial conditions, with hierarchical solutions providing significantly
better generalization performance.
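The host/symbiont arrangement described above can be illustrated with a minimal Python sketch of how a host might map a state to an action. It assumes, hypothetically, that each symbiont's evolved context is a linear score over the state and that the highest-scoring symbiont within a host wins. The names Symbiont, Host, context and act are illustrative and not taken from the talk, and the evolutionary loop (natural selection over hosts) is not shown. A symbiont's action may itself be a host from an earlier population, which gives the hierarchical case.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import List, Sequence, Union

State = Sequence[float]


@dataclass
class Symbiont:
    # Hypothetical model: the "evolved context" is a linear score over the state.
    weights: List[float]
    # Either an atomic action (int) or a host from an earlier population.
    action: Union[int, Host]

    def context(self, state: State) -> float:
        return sum(w * x for w, x in zip(self.weights, state))


@dataclass
class Host:
    # A host is simply the subset of symbionts it has identified.
    symbionts: List[Symbiont]

    def act(self, state: State) -> int:
        # The symbiont whose context scores this state highest wins.
        winner = max(self.symbionts, key=lambda s: s.context(state))
        if isinstance(winner.action, Host):
            # Hierarchical case: defer to the lower-level host's policy.
            return winner.action.act(state)
        return winner.action


# Two earlier-population hosts reused as actions by a higher-level host.
left = Host([Symbiont([1.0, 0.0], 0), Symbiont([-1.0, 0.0], 1)])
right = Host([Symbiont([0.0, 1.0], 2), Symbiont([0.0, -1.0], 3)])
top = Host([Symbiont([1.0, 1.0], left), Symbiont([-1.0, -1.0], right)])

print(top.act([2.0, 0.5]))    # top picks `left`, which picks action 0
print(top.act([-1.0, -3.0]))  # top picks `right`, which picks action 3
```

Under these assumptions, reusing whole hosts as actions is what lets behaviour learned at one level become a building block at the next, without naming the sub-tasks in advance.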


Feb 1st, 2012
