UNIVERSITY OF HERTFORDSHIRE
COMPUTER SCIENCE RESEARCH COLLOQUIUM

presents

"The reward is not the task: Non-stationary Policy Synthesis for Temporal Logic Planning with Compositional Value Functions"

Thomas Ringstrom
(Department of Computer Science, University of Minnesota, USA)

13 March 2019 (Wednesday)
13:00 - 14:00
Hatfield, College Lane Campus
Seminar Room C152

Everyone is welcome to attend
Refreshments will be available

Abstract:

The notion of 'generalization' in reinforcement learning and control refers to the capacity of a system to abstract useful knowledge from one solution and use it in a new context to solve a different problem. A related idea is 'compositionality', which means that primitive partial solutions can be combined in a combinatorial manner to achieve a task, making infinite use of finite means.

In this talk I will present Constraint Satisfaction Propagation (CSP), a new algorithm for constructing value functions and policies that are inherently compositional and generalizable across classes of logically posed tasks. CSP solves problems with time-constrained sequentially dependent sub-goals which would normally be computationally intractable to solve using reward maximization methods. CSP adopts the perspective that one should maximize the probability of solving the task through combinatorial composition rather than maximizing an expectation of accumulated reward. The reward is not the task, but it is often used as a proxy for it, and hierarchical control architectures that respect this distinction stand to benefit from task-abstraction and computational efficiency.