I tried using `conjugate_gradient` from `tf_agents.bandits.policies.linalg` with different values of `batch_size` but the same example (`b_mat` consists of `batch_size` copies of the same column), and for each `batch_size`, `conjugate_gradient` returns a different result. This is incorrect: since all columns of `b_mat` are identical, the corresponding columns of the solution should be identical too. This affects the predicted rewards in `_predict_mean_reward_and_variance`.
When I replaced this `conjugate_gradient` implementation with `tf.matmul(tf.linalg.inv(a_mat), b_mat)`, I got the correct result (all columns of the result matrix are the same).
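To make the expected behavior concrete, here is a minimal NumPy sketch of a column-wise conjugate gradient solver (a hypothetical reference implementation, not the actual `tf_agents` code): since each column of `b_mat` is an independent linear system, duplicated right-hand-side columns must produce duplicated solution columns.

```python
import numpy as np

def conjugate_gradient(a_mat, b_mat, n_iter=20, tol=1e-10):
    """Solve a_mat @ x = b for each column b of b_mat via conjugate gradient.

    Each column is an independent system, so identical columns in b_mat
    must yield identical columns in x. (Reference sketch only, not the
    tf_agents.bandits.policies.linalg implementation.)
    """
    x = np.zeros_like(b_mat)
    r = b_mat - a_mat @ x          # per-column residuals
    p = r.copy()                   # per-column search directions
    rs_old = np.sum(r * r, axis=0)
    for _ in range(n_iter):
        if np.all(np.sqrt(rs_old) < tol):
            break                  # all columns converged
        ap = a_mat @ p
        alpha = rs_old / np.sum(p * ap, axis=0)  # per-column step sizes
        x = x + alpha * p
        r = r - alpha * ap
        rs_new = np.sum(r * r, axis=0)
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

# Symmetric positive-definite a_mat, and a b_mat whose columns are
# all copies of the same example (batch_size = 3).
rng = np.random.default_rng(0)
m = rng.standard_normal((4, 4))
a_mat = m @ m.T + 4 * np.eye(4)
b_col = rng.standard_normal((4, 1))
b_mat = np.tile(b_col, (1, 3))

x = conjugate_gradient(a_mat, b_mat)
# Expected: every column of x is the same, and x matches a direct solve.
```

With this per-column formulation the result is independent of how many identical columns `b_mat` contains, which is the property the buggy behavior above violates.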
Can you check whether this is a bug? And if so, why not simply use `tf.matmul(tf.linalg.inv(a_mat), b_mat)` instead of reinventing the wheel here?
Thanks.