Reinforcement Learning/Value Iteration

Policy iteration vs value iteration

  • Policy iteration computes the optimal value function and policy by alternating policy evaluation and policy improvement
  • Value iteration:
    • Maintains the optimal value of starting in a state s when a finite number of steps k remain in the episode
    • Iterates, increasing k, to consider longer and longer episodes
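The finite-horizon view of value iteration can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the two-state MDP (`P`, `R`, `GAMMA`) and the function name `finite_horizon_values` are hypothetical. Starting from $V_0 = 0$ (no steps left, no reward), each pass computes $V_{k+1}(s) = \max_a [R(s,a) + \gamma \sum_{s'} P(s' \mid s,a) V_k(s')]$, the best value achievable with one more step left.

```python
# Hypothetical two-state, two-action MDP used only for illustration.
# P[s][a]: list of (probability, next_state); R[s][a]: immediate reward.
P = [
    [[(1.0, 0)], [(1.0, 1)]],  # state 0: action 0 stays, action 1 moves to state 1
    [[(1.0, 1)], [(1.0, 0)]],  # state 1: action 0 stays, action 1 moves to state 0
]
R = [
    [0.0, 1.0],  # rewards for actions 0 and 1 in state 0
    [2.0, 0.0],  # rewards for actions 0 and 1 in state 1
]
GAMMA = 0.9


def finite_horizon_values(horizon):
    """Return [V_0, ..., V_horizon], where V_k(s) is the optimal value of
    starting in state s with k steps left in the episode."""
    n_states = len(P)
    values = [[0.0] * n_states]  # V_0 = 0: no steps left, so no reward
    for _ in range(horizon):
        prev = values[-1]
        # One more step left: best immediate reward plus discounted V_k.
        nxt = [
            max(
                R[s][a] + GAMMA * sum(p * prev[s2] for p, s2 in P[s][a])
                for a in range(len(P[s]))
            )
            for s in range(n_states)
        ]
        values.append(nxt)
    return values


for k, v in enumerate(finite_horizon_values(3)):
    print(k, [round(x, 2) for x in v])
```

For this toy MDP the loop prints $V_1 = [1.0, 2.0]$, $V_2 = [2.8, 3.8]$ and $V_3 = [4.42, 5.42]$. As k grows, $V_k$ approaches [19, 20], the optimal values of the infinite-horizon discounted problem.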

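For a finite MDP with a discount factor γ < 1, policy iteration and value iteration both converge to the optimal value function, and therefore to an optimal policy (the same policy whenever the optimal policy is unique).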
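As a concrete illustration of this claim (not a proof), the sketch below runs both algorithms on the same hypothetical two-state MDP and compares the greedy policies they return. The MDP and all function names are assumptions made for this example; policy evaluation is done iteratively here rather than by solving a linear system.

```python
# Hypothetical two-state, two-action MDP used only for illustration.
# P[s][a]: list of (probability, next_state); R[s][a]: immediate reward.
P = [
    [[(1.0, 0)], [(1.0, 1)]],  # state 0: action 0 stays, action 1 moves to state 1
    [[(1.0, 1)], [(1.0, 0)]],  # state 1: action 0 stays, action 1 moves to state 0
]
R = [[0.0, 1.0], [2.0, 0.0]]
GAMMA = 0.9
TOL = 1e-10  # stop when successive value estimates differ by less than this


def q_value(s, a, V):
    """One-step lookahead: R(s,a) + gamma * sum over s' of P(s'|s,a) * V(s')."""
    return R[s][a] + GAMMA * sum(p * V[s2] for p, s2 in P[s][a])


def greedy(V):
    """Greedy policy with respect to V; max() keeps the lowest-index action on ties."""
    return [max(range(len(P[s])), key=lambda a: q_value(s, a, V)) for s in range(len(P))]


def value_iteration():
    V = [0.0] * len(P)
    while True:
        V_new = [max(q_value(s, a, V) for a in range(len(P[s]))) for s in range(len(P))]
        if max(abs(x - y) for x, y in zip(V_new, V)) < TOL:
            return V_new, greedy(V_new)
        V = V_new


def evaluate(policy):
    """Iterative policy evaluation: repeat V(s) <- R(s,pi(s)) + gamma * E[V(s')]."""
    V = [0.0] * len(P)
    while True:
        V_new = [q_value(s, policy[s], V) for s in range(len(P))]
        if max(abs(x - y) for x, y in zip(V_new, V)) < TOL:
            return V_new
        V = V_new


def policy_iteration():
    policy = [0] * len(P)  # arbitrary starting policy: always take action 0
    while True:
        V = evaluate(policy)      # policy evaluation
        new_policy = greedy(V)    # policy improvement
        if new_policy == policy:  # policy is stable, so stop
            return V, policy
        policy = new_policy


V_vi, pi_vi = value_iteration()
V_pi, pi_pi = policy_iteration()
print(pi_vi, pi_pi)
```

Both calls should return the policy [1, 0] (leave state 0, then stay in state 1) with values close to [19, 20]. This MDP has a unique optimal policy, so the two results coincide; when several actions tie for optimal, the two methods could return different but equally good policies.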


Algorithm

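The value function of a (deterministic) policy π is the solution to the Bellman equation

$$V^{\pi}(s) = R(s, \pi(s)) + \gamma \sum_{s' \in S} P(s' \mid s, \pi(s)) \, V^{\pi}(s')$$

where γ is the discount factor and P(s' | s, a) is the probability of reaching state s' after taking action a in state s.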
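Because this equation is linear in $V^{\pi}$, a small finite MDP lets us solve it directly: in matrix form $V^{\pi} = R^{\pi} + \gamma P^{\pi} V^{\pi}$, so $V^{\pi} = (I - \gamma P^{\pi})^{-1} R^{\pi}$. The sketch below assumes a hypothetical two-state MDP and uses plain Gaussian elimination to stay dependency-free; in practice a library solver such as `numpy.linalg.solve` would typically be used instead.

```python
# Hypothetical two-state, two-action MDP used only for illustration.
# P[s][a]: list of (probability, next_state); R[s][a]: immediate reward.
P = [
    [[(1.0, 0)], [(1.0, 1)]],  # state 0: action 0 stays, action 1 moves to state 1
    [[(1.0, 1)], [(1.0, 0)]],  # state 1: action 0 stays, action 1 moves to state 0
]
R = [[0.0, 1.0], [2.0, 0.0]]
GAMMA = 0.9


def evaluate_policy_exact(policy):
    """Solve (I - gamma * P_pi) V = R_pi for V by Gaussian elimination."""
    n = len(P)
    # Build the augmented matrix [I - gamma * P_pi | R_pi].
    A = [[0.0] * (n + 1) for _ in range(n)]
    for s in range(n):
        a = policy[s]
        A[s][s] += 1.0
        for p, s2 in P[s][a]:
            A[s][s2] -= GAMMA * p
        A[s][n] = R[s][a]
    # Forward elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution.
    V = [0.0] * n
    for r in range(n - 1, -1, -1):
        V[r] = (A[r][n] - sum(A[r][c] * V[c] for c in range(r + 1, n))) / A[r][r]
    return V


V = evaluate_policy_exact([1, 0])  # leave state 0, then stay in state 1
print(V)  # approximately [19.0, 20.0]
```

With γ < 1 the matrix $I - \gamma P^{\pi}$ is invertible, so the solution is unique; plugging it back into the right-hand side of the Bellman equation reproduces it.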
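The Bellman backup operator B is applied to a value function and returns a new value function. It improves the value wherever possible by choosing, in each state, the action with the best one-step lookahead value:

$$BV(s) = \max_{a} \Big[ R(s, a) + \gamma \sum_{s' \in S} P(s' \mid s, a) \, V(s') \Big]$$

BV yields a value function over all states s.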
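A minimal sketch of the Bellman backup operator as a Python function, again on a hypothetical two-state MDP (the MDP and the name `bellman_backup` are illustrative assumptions). Applying B once to the value of a suboptimal policy shows the improvement; applying it repeatedly, $V_{k+1} = B V_k$, is value iteration and approaches a fixed point $V = BV$.

```python
# Hypothetical two-state, two-action MDP used only for illustration.
# P[s][a]: list of (probability, next_state); R[s][a]: immediate reward.
P = [
    [[(1.0, 0)], [(1.0, 1)]],  # state 0: action 0 stays, action 1 moves to state 1
    [[(1.0, 1)], [(1.0, 0)]],  # state 1: action 0 stays, action 1 moves to state 0
]
R = [[0.0, 1.0], [2.0, 0.0]]
GAMMA = 0.9


def bellman_backup(V):
    """(BV)(s) = max_a [R(s,a) + gamma * sum over s' of P(s'|s,a) * V(s')]."""
    return [
        max(
            R[s][a] + GAMMA * sum(p * V[s2] for p, s2 in P[s][a])
            for a in range(len(P[s]))
        )
        for s in range(len(P))
    ]


# Value of the suboptimal policy "always take action 0" in this toy MDP.
V_stay = [0.0, 20.0]
improved = bellman_backup(V_stay)
print(improved)  # [19.0, 20.0]: state 0 improves, state 1 was already optimal

# Repeated backups (value iteration) from V = 0 approach the fixed point V = BV.
V = [0.0, 0.0]
for _ in range(500):
    V = bellman_backup(V)
print(V)  # close to [19.0, 20.0]
```

For the value $V^{\pi}$ of any policy π, $BV^{\pi} \ge V^{\pi}$ in every state, since the maximum over actions is at least the value of the action π picks; this is the sense in which the backup improves the value when it can.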