New Ideas and Algorithms for Real-time, Real-world Machine Learning


Our work on continual and contextual co-adaptation of a prosthesis to its user has been largely rooted in the machine learning of forecasts of future sensorimotor outcomes through temporal-difference learning (see Pilarski et al. 2017). A central line of investigation has therefore been extending and enhancing the idea of General Value Functions (GVFs) and the Horde architecture first proposed in Sutton et al. (2011). One major contribution has been new algorithms for GVF step-size adaptation and feature selection: the extension of supervised-learning step-size adaptation methods to the reinforcement learning (RL) setting in the form of Temporal-Difference Incremental Delta-Bar-Delta (TIDBD) (Kearney et al. 2019); the use of TIDBD in robot domains for tuning-free knowledge architectures and for detecting broken or stuck robotic actuators (Gunther et al. 2020); and, finally, an off-policy variant of TIDBD and its use in evaluating the knowledge acquired by a learning agent (Kearney et al. 2022).
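To make the idea of a GVF with adaptive step sizes more concrete, here is a minimal sketch in Python of a single linear GVF learned by semi-gradient TD(λ), with one step size per feature adapted in the TIDBD style. It is written from our reading of the published update rules, not taken from any lab codebase, and the exact form in Kearney et al. (2019) should be checked before relying on it. The class and parameter names (TIDBDGVF, meta_step, init_alpha) are hypothetical, and for simplicity we assume a constant continuation γ and accumulating eligibility traces.

```python
import math


class TIDBDGVF:
    """Hypothetical sketch: a linear GVF learned by semi-gradient TD(lambda),
    with per-feature step sizes alpha_i = exp(beta_i) adapted TIDBD-style."""

    def __init__(self, n_features, gamma, lam, meta_step=0.01, init_alpha=0.1):
        self.gamma = gamma          # constant continuation (simplifying assumption)
        self.lam = lam              # trace-decay parameter lambda
        self.meta_step = meta_step  # meta step size for the log step sizes
        self.w = [0.0] * n_features     # GVF weights
        self.z = [0.0] * n_features     # accumulating eligibility traces
        self.h = [0.0] * n_features     # trace of recent weight changes (dw_i / dbeta_i)
        self.beta = [math.log(init_alpha)] * n_features  # log step sizes

    def predict(self, phi):
        # Linear prediction of the discounted sum of future cumulants.
        return sum(wi * xi for wi, xi in zip(self.w, phi))

    def update(self, phi, cumulant, phi_next):
        # TD error is computed once, before any weight changes this step.
        delta = cumulant + self.gamma * self.predict(phi_next) - self.predict(phi)
        for i, x in enumerate(phi):
            # Meta-update: grow beta_i when the current error correlates with recent changes.
            self.beta[i] += self.meta_step * delta * x * self.h[i]
            alpha = math.exp(self.beta[i])
            # Accumulating trace and semi-gradient TD(lambda) weight update.
            self.z[i] = self.gamma * self.lam * self.z[i] + x
            self.w[i] += alpha * delta * self.z[i]
            # Decay h_i (clipped at zero, as in IDBD) and add this step's change.
            self.h[i] = (self.h[i] * max(0.0, 1.0 - alpha * x * self.z[i])
                         + alpha * delta * self.z[i])
        return delta


if __name__ == "__main__":
    # Toy question: "what is the discounted sum of a cumulant that is always 1?"
    # With gamma = 0.5 the true answer is 1 / (1 - 0.5) = 2.
    gvf = TIDBDGVF(n_features=1, gamma=0.5, lam=0.9)
    for _ in range(2000):
        gvf.update([1.0], 1.0, [1.0])
    print(gvf.predict([1.0]))  # should approach 2.0
```

Because each feature carries its own step size, features whose updates keep pointing the same way have their step sizes raised, while step sizes for features that only add noise tend to shrink; this is the property that makes approaches of this kind attractive for tuning-free, many-prediction architectures.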

We have further developed approaches that allow GVF architectures to estimate their own introspective measures of surprise and to form predictions of that surprise (Gunther et al. 2018), and introduced the use of meta-gradient descent to select the GVFs used in policy learning based on predictive representations of state (Kearney et al. 2022). We have also pioneered new methods that allow GVFs to be learned faster and more flexibly from real-world data streams, including Gamma-Nets (Sherstan et al. 2019) and the use of the successor representation to accelerate GVF learning in constructive predictive frameworks (Sherstan et al. 2018).
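As an illustration of what an introspective surprise measure for a GVF can look like, the sketch below computes a signal in the spirit of the unexpected demon error (UDE): the magnitude of a running average of the TD error, normalized by a running estimate of its spread. A GVF whose errors merely fluctuate around zero reports little surprise, while one whose errors suddenly become consistently biased (for example, after a change in the robot or its environment) reports a large value. This is a minimal sketch under our own assumptions; the averaging scheme, the class name SurpriseMeter, and the parameters rate and eps are hypothetical, and the exact formulation in Gunther et al. (2018) may differ.

```python
import math


class SurpriseMeter:
    """Hypothetical UDE-style surprise signal computed from a stream of TD errors."""

    def __init__(self, rate=0.1, eps=1e-3):
        self.rate = rate  # exponential averaging rate (assumed value)
        self.eps = eps    # avoids division by zero when the errors are constant
        self.mean = 0.0   # running average of the TD error
        self.var = 0.0    # running estimate of the TD error's variance

    def update(self, delta):
        # Exponentially weighted mean and variance of the TD error.
        self.mean = (1.0 - self.rate) * self.mean + self.rate * delta
        self.var = (1.0 - self.rate) * self.var + self.rate * (delta - self.mean) ** 2
        # Large when errors are consistently biased relative to their usual spread.
        return abs(self.mean) / (math.sqrt(self.var) + self.eps)


if __name__ == "__main__":
    meter = SurpriseMeter()
    for t in range(200):  # zero-mean, noisy errors: little surprise
        calm = meter.update(1.0 if t % 2 == 0 else -1.0)
    for _ in range(50):   # errors suddenly all positive: high surprise
        alarmed = meter.update(1.0)
    print(calm, alarmed)
```

Since the surprise value is itself a time series, it can in turn serve as the cumulant of another GVF, which is one way to read the idea of forming predictions of surprise.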

Patrick M. Pilarski
Ph.D., ICD.D, Canada CIFAR AI Chair & Professor of Medicine

BLINC Lab, University of Alberta.