[TensorFlow2] Critic Loss Calculation for actor_critic #41

Open
srihari-humbarwadi opened this issue Jan 25, 2022 · 0 comments
@srihari-humbarwadi

If I understand correctly, the code in tensorflow2/actor_critic.py implements the One-step Actor-Critic (episodic) algorithm given on page 332 of Reinforcement Learning: An Introduction (2nd edition) by Sutton and Barto (picture given below).

[Image: pseudocode for "One-step Actor-Critic (episodic)" from the book]

Here we can see that the critic parameters w are updated using only the gradient of the value function for the current state S, written as grad(V(S, w)) in the pseudocode above. The update skips the gradient of the value function for the next state S': there is no grad(V(S', w)) term in the update rule for the critic parameters w.
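For reference, the relevant updates from that pseudocode, as I read them and using the same notation, are:

delta  = R + gamma * V(S', w) - V(S, w)
w      = w + alpha_w * delta * grad(V(S, w))
theta  = theta + alpha_theta * I * delta * grad(ln pi(A|S, theta))

so delta uses V(S', w) only as a value; the gradient is taken through V(S, w) alone (a semi-gradient update).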

In the code given below, computing state_value_, _ = self.actor_critic(state_) (L43) inside the GradientTape causes grad(V(S', w)) to appear in the update for w, which contradicts the pseudocode shown above.

reward = tf.convert_to_tensor(reward, dtype=tf.float32)  # not fed to NN
with tf.GradientTape(persistent=True) as tape:
    state_value, probs = self.actor_critic(state)
    state_value_, _ = self.actor_critic(state_)
    state_value = tf.squeeze(state_value)
    state_value_ = tf.squeeze(state_value_)

Please let me know if there are some gaps in my understanding!
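A minimal sketch of one possible fix, reusing the variable names from the snippet above, would be to run the next-state forward pass outside the tape (or, equivalently, wrap it in tf.stop_gradient) so that grad(V(S', w)) never contributes to the critic update:

reward = tf.convert_to_tensor(reward, dtype=tf.float32)  # not fed to NN

# Next-state value computed outside the tape: the TD target
# R + gamma * V(S', w) is then treated as a constant w.r.t. w.
state_value_, _ = self.actor_critic(state_)
state_value_ = tf.squeeze(state_value_)

with tf.GradientTape(persistent=True) as tape:
    state_value, probs = self.actor_critic(state)
    state_value = tf.squeeze(state_value)
    # Alternatively, keep the forward pass inside the tape but block
    # its gradient explicitly:
    # state_value_ = tf.stop_gradient(tf.squeeze(self.actor_critic(state_)[0]))

Either variant should leave the actor loss unchanged while making the critic update match the semi-gradient form in the pseudocode.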
