A Reinforcement-learning Framework for Interpreting Trial-by-trial Motor Adaptation to Novel Haptic Environments
Doctor of Philosophy (PhD)
Motor adaptation is often considered to occur under the influence of sensory signals, which are usually readily available to humans performing most motor tasks. However, humans can also use reward or other qualitative feedback to reinforce previous actions and drive adaptation. In these experiments, we introduce reward feedback into a traditional motor adaptation paradigm: reach adaptation to a velocity-dependent force field. Drawing on the computer-science and machine-learning literature, we use a reinforcement-learning framework to interpret the pattern of force generation and reward-prediction errors, and we observe the effects of concurrent and isolated reward and sensory feedback.
It is important to understand how motor adaptation occurs in the absence of sensory feedback. If neurological damage occurs in the cerebellum, which is responsible for much of motor adaptation via sensed errors, it will become necessary to recruit other areas of the brain to assist in motor relearning. Learning from reward prediction errors appears to happen in the human brain and occurs mostly in the basal ganglia and striatum (Schultz, 1993; Bayer & Glimcher, 2005). If we can understand how the reinforcement-learning system influences motor adaptation, then we can leverage it to help those who cannot recover sufficiently under sensed feedback alone.
In Chapter 2, we develop an in silico model of adaptation to a viscous field when the reward signal is the only available feedback. We make predictions about the behavior of the model from the published mathematics and algorithms. In particular, we develop two predictive models that explain how value (i.e. reward predictions) and force generation change on a trial-by-trial basis.
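The trial-by-trial logic described above can be sketched as a simple reward-driven update loop. This is a minimal illustration, not the dissertation's actual model: the learning rates, exploration noise, reward function, and target force below are all assumed values chosen for demonstration. The key structure it shows is the reward prediction error (reward minus predicted value) driving both the value update and the reinforcement of the force command.

```python
import numpy as np

# All constants are illustrative assumptions, not the dissertation's parameters.
ALPHA_V = 0.2   # learning rate for the value (reward prediction)
ALPHA_F = 0.5   # learning rate for the force policy
SIGMA = 0.2     # exploration noise on the trial's force command

def reward(force, target_force):
    """Toy reward signal: maximal when the generated force matches the
    velocity-dependent target force (here collapsed to a scalar)."""
    return np.exp(-(force - target_force) ** 2)

def simulate(n_trials=1000, target_force=1.0, seed=0):
    rng = np.random.default_rng(seed)
    value, force = 0.0, 0.0      # initial reward prediction and force command
    values, forces = [], []
    for _ in range(n_trials):
        trial_force = force + SIGMA * rng.standard_normal()  # exploratory action
        r = reward(trial_force, target_force)
        rpe = r - value                                  # reward prediction error
        value += ALPHA_V * rpe                           # update the value estimate
        force += ALPHA_F * rpe * (trial_force - force)   # reinforce rewarded deviations
        values.append(value)
        forces.append(trial_force)
    return np.array(values), np.array(forces)

values, forces = simulate()
```

Run trial by trial, the value estimate climbs toward the average obtained reward while the force command drifts toward the rewarded target, mirroring the two quantities the chapter's predictive models track.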
In Chapter 3, we design a psychophysical experiment that mirrors the in silico model conditions. Subjects are restricted to a straight path to a target while receiving reward. The reward signal is maximal when the subject generates velocity-dependent forces into the virtual walls that restrict the movement; these are the forces that would perfectly compensate a viscous curl field. Our subjects never actually experience perturbation from the viscous field, yet they learn to generate the appropriate forces from the reward signal alone.
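The "perfectly compensating" force has a simple form that may help make the reward criterion concrete. The sketch below assumes a standard velocity-dependent curl field, in which the perturbing force is the hand velocity rotated 90° and scaled by a viscosity gain; the gain `B` and the curl direction are illustrative assumptions, since the abstract does not give the actual field parameters.

```python
import numpy as np

B = 15.0  # assumed viscosity gain (N*s/m); not the dissertation's actual value

# 90-degree rotation: the field pushes orthogonally to the movement direction.
CURL = np.array([[0.0, 1.0],
                 [-1.0, 0.0]])

def field_force(velocity):
    """Perturbing force a velocity-dependent curl field would apply (N)."""
    return B * CURL @ velocity

def compensating_force(velocity):
    """Force into the channel walls that exactly cancels the field; in the
    experiment, reward is maximal when the subject produces this force."""
    return -field_force(velocity)

v = np.array([0.0, 0.3])      # reaching straight ahead at 0.3 m/s
f = compensating_force(v)     # lateral force into the virtual wall
```

Because the compensating force is always orthogonal to the instantaneous velocity, a subject moving straight to the target must press laterally into the channel walls, which is exactly what the reward signal selects for.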
In Chapter 4, we use what we know about adaptation to a viscous field with isolated reward feedback and determine how this learning process interacts with sensed error feedback; that is, we allow our subjects to be perturbed by a real viscous field and layer reward feedback on top of this experience. Whether or not a subject has previously been exposed to a viscous field affects the rate at which they adapt to an oppositely signed field when the additional reward feedback signal is present. The reward signal appears to prevent the anterograde interference that normally occurs when switching between viscous environments of opposite sign without reward feedback.
Overall, we find that (1) a verbal report of the expectation of reward serves as a useful measure when calculating the relevant teaching signal, the reward prediction error; (2) subjects learn a value function in a manner predicted by a reinforcement-learning algorithm; (3) the magnitude of the reward prediction error correlates with the magnitude of trajectory and force change; (4) subjects are able to learn to produce forces that would compensate a viscous field without ever experiencing the actual perturbation; and (5) learning from reward and sensory errors concurrently leads to a different memory formation than learning from sensory errors alone.
Dennis L Barbour
Ian G Dobbins, Lawrence H Snyder
Permanent URL: https://doi.org/10.7936/K7WH2N4W