Bug in tf_agents.bandits.policies.linalg.conjugate_gradient? #852

Open
td20002 opened this issue Jul 27, 2023 · 0 comments
td20002 commented Jul 27, 2023

Hello,

I tried using conjugate_gradient from tf_agents.bandits.policies.linalg with different values of batch_size but the same example (b_mat is batch_size columns, each a copy of the same example). For each batch_size, conjugate_gradient returns a different result. This looks incorrect: the columns of b_mat are identical, so the result matrix should be the same regardless of batch_size. This affects the predicted rewards in _predict_mean_reward_and_variance.
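To make the expected behaviour concrete, here is a minimal, dependency-free sketch of the check described above. It is not the tf_agents implementation: cg_solve_column and cg_solve_batch are hypothetical helpers written for illustration, a_mat is an arbitrary small symmetric positive-definite matrix, and the batch is stored as a list of columns. Because each column is solved on its own, the solution for a repeated example cannot depend on batch_size.

```python
def _matvec(a, v):
    """Multiply matrix `a` (list of rows) by vector `v`."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]


def _dot(u, v):
    return sum(u_i * v_i for u_i, v_i in zip(u, v))


def cg_solve_column(a, b, tol=1e-10):
    """Solve a @ x = b for one right-hand side with textbook conjugate gradient.

    Assumes `a` is symmetric positive-definite. Hypothetical helper,
    not the tf_agents code.
    """
    n = len(b)
    x = [0.0] * n
    r = list(b)              # residual b - a @ x for the initial guess x = 0
    p = list(r)              # first search direction
    rs_old = _dot(r, r)
    for _ in range(n):       # at most n steps in exact arithmetic
        if rs_old ** 0.5 <= tol:
            break
        a_p = _matvec(a, p)
        alpha = rs_old / _dot(p, a_p)
        x = [x_i + alpha * p_i for x_i, p_i in zip(x, p)]
        r = [r_i - alpha * ap_i for r_i, ap_i in zip(r, a_p)]
        rs_new = _dot(r, r)
        p = [r_i + (rs_new / rs_old) * p_i for r_i, p_i in zip(r, p)]
        rs_old = rs_new
    return x


def cg_solve_batch(a, b_columns, tol=1e-10):
    """Solve every column independently so columns cannot affect each other."""
    return [cg_solve_column(a, col, tol) for col in b_columns]


a_mat = [[4.0, 1.0], [1.0, 3.0]]   # illustrative SPD matrix, not from the issue
example = [1.0, 2.0]

# The same example repeated batch_size times, for several batch sizes.
solutions = {bs: cg_solve_batch(a_mat, [example] * bs) for bs in (1, 3, 8)}
reference = solutions[1][0]
for bs, cols in solutions.items():
    # Every column, for every batch size, matches the batch_size == 1 answer.
    assert all(col == reference for col in cols)
```

Running the same kind of comparison against the library function (same a_mat, same repeated column, varying batch_size) is the check that fails for me.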

Then I replaced this conjugate_gradient implementation with tf.matmul(tf.linalg.inv(a_mat), b_mat) and got the correct result (the result matrix is the same for every batch_size).
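Identical columns only show that a result is consistent. A more direct way to judge which result is right is to plug it back into the system and look at the residual a_mat @ x - b_mat. The sketch below does that in plain Python; max_residual is a hypothetical helper, and the matrix and candidate solutions are illustrative values, not numbers from my run.

```python
def max_residual(a, x_columns, b_columns):
    """Return the largest absolute entry of a @ x - b over all columns.

    `a` is a list of rows; `x_columns` and `b_columns` are lists of columns.
    Hypothetical helper for illustration only.
    """
    worst = 0.0
    for x, b in zip(x_columns, b_columns):
        for row, b_i in zip(a, b):
            a_x_i = sum(a_ij * x_j for a_ij, x_j in zip(row, x))
            worst = max(worst, abs(a_x_i - b_i))
    return worst


a_mat = [[4.0, 1.0], [1.0, 3.0]]        # illustrative SPD matrix
b_columns = [[1.0, 2.0]] * 3            # the same example repeated 3 times

good = [[1.0 / 11.0, 7.0 / 11.0]] * 3   # exact solution of this small system
bad = [[0.0, 0.0]] * 3                  # deliberately wrong candidate

good_residual = max_residual(a_mat, good, b_columns)
bad_residual = max_residual(a_mat, bad, b_columns)
```

A solver output whose residual stays far above the requested tolerance is wrong, whatever the batch size; one whose residual is near zero is acceptable even if it differs from another solver in the last few digits.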

Can you check whether this is a bug? If it is, why didn't you guys simply use tf.matmul(tf.linalg.inv(a_mat), b_mat) instead of reinventing the wheel here?

Thanks.
