Bug in tf_agents.bandits.policies.linalg.conjugate_gradient? #852

td20002 · 2023-07-27T01:05:07Z

Hello,

I called conjugate_gradient from tf_agents.bandits.policies.linalg with several values of batch_size, using the same example each time (b_mat is batch_size columns of the same example). For each batch_size, conjugate_gradient returns a different result. This looks incorrect: since all columns of b_mat are identical, the columns of the result should be identical too, regardless of batch_size. This affects the predicted rewards in _predict_mean_reward_and_variance.

Then I tried replacing this conjugate_gradient implementation with tf.matmul(tf.linalg.inv(a_mat), b_mat) and got the correct result (the result matrix is the same).
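As a sketch of the expected property (this is not the TF-Agents code), the check can be reproduced with a textbook conjugate gradient in NumPy that solves each column of b_mat independently. The helper names `cg_single` and `cg_columns`, the random 4x4 matrix, and the batch sizes are all assumptions made for illustration. Because columns cannot interact here, the solution is independent of batch_size by construction and should agree with a direct solve.

```python
import numpy as np


def cg_single(a_mat, b_vec, tol=1e-10, max_iter=None):
    """Textbook conjugate gradient for one right-hand side.

    Assumes a_mat is symmetric positive definite. Hypothetical helper,
    not the tf_agents implementation.
    """
    n = b_vec.shape[0]
    max_iter = 10 * n if max_iter is None else max_iter
    x = np.zeros_like(b_vec)
    r = b_vec - a_mat @ x          # initial residual
    p = r.copy()                   # initial search direction
    rs_old = r @ r
    for _ in range(max_iter):
        if np.sqrt(rs_old) <= tol:  # converged: residual norm below tol
            break
        a_p = a_mat @ p
        alpha = rs_old / (p @ a_p)  # step length along p
        x = x + alpha * p
        r = r - alpha * a_p
        rs_new = r @ r
        p = r + (rs_new / rs_old) * p  # next conjugate direction
        rs_old = rs_new
    return x


def cg_columns(a_mat, b_mat, tol=1e-10):
    """Solve a_mat @ X = b_mat one column at a time (columns never interact)."""
    cols = [cg_single(a_mat, b_mat[:, j], tol) for j in range(b_mat.shape[1])]
    return np.stack(cols, axis=1)


# Illustrative data: a random symmetric positive definite matrix and one example.
rng = np.random.default_rng(0)
m = rng.normal(size=(4, 4))
a_mat = m @ m.T + 4.0 * np.eye(4)
b_vec = rng.normal(size=4)

for batch_size in (1, 3, 8):
    # b_mat is batch_size copies of the same example, as in the report above.
    b_mat = np.tile(b_vec[:, None], (1, batch_size))
    x_cg = cg_columns(a_mat, b_mat)
    x_direct = np.linalg.solve(a_mat, b_mat)
    print(batch_size, np.max(np.abs(x_cg - x_direct)))
```

If a batched solver returns different columns for identical right-hand sides, one thing worth checking is whether its step sizes or stopping test are computed over the whole matrix rather than per column. That is only a guess at where to look, not a diagnosis of the library code.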

Can you check whether this is a bug? If it is, why not simply use tf.matmul(tf.linalg.inv(a_mat), b_mat) instead of reimplementing the solver here?

Thanks.
