
Mean rewards are not calculated properly #157

Open
nikolaradulov opened this issue Jun 20, 2024 · 0 comments
Labels
bug Something isn't working

Description

The mean reward is computed by appending the mean of all stored cumulative episode rewards to the self.tracking_data dictionary:
self.tracking_data["Reward / Total reward (mean)"].append(np.mean(track_rewards)). Then, every time the data is meant to be written, the mean of all the values stored in self.tracking_data["Reward / Total reward (mean)"] is written via
self.writer.add_scalar(k, np.mean(v), timestep), and the tracking data is cleared. The issue is that a point is appended on every step on which self._track_rewards (the cumulative-reward storage) is non-empty. As a result, every cumulative reward added to storage since the last write is averaged, appended to the tracking data, and then averaged again on write.

E.g., say each episode is 3 steps, only 1 env instance is running, and writing is done every 9 steps:

step1: self._track_rewards = [] self.tracking_data["Reward / Total reward (mean)"]=[]
step2: self._track_rewards = [] self.tracking_data["Reward / Total reward (mean)"]=[]
step3: Episode finishes with cumulative reward -30: self._track_rewards = [-30] self.tracking_data["Reward / Total reward (mean)"]=[-30]
step4: self._track_rewards = [-30] self.tracking_data["Reward / Total reward (mean)"]=[-30, -30]
step5: self._track_rewards = [-30] self.tracking_data["Reward / Total reward (mean)"]=[-30, -30, -30]
step6: Episode finishes with cumulative reward -4: self._track_rewards = [-30, -4] self.tracking_data["Reward / Total reward (mean)"]=[-30, -30, -30, -17]
step7: self._track_rewards = [-30, -4] self.tracking_data["Reward / Total reward (mean)"]=[-30, -30, -30, -17, -17]
step8: self._track_rewards = [-30, -4] self.tracking_data["Reward / Total reward (mean)"]=[-30, -30, -30, -17, -17, -17]
step9: Episode finishes with cumulative reward -10: self._track_rewards = [-30, -4, -10] self.tracking_data["Reward / Total reward (mean)"]=[-30, -30, -30, -17, -17, -17, -14.67]

At the end of step 9 the mean cumulative reward of the past 3 episodes is -44/3 ≈ -14.67. What actually gets written to TensorBoard is the mean of the tracking data, ≈ -22.24: VERY DIFFERENT.
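The arithmetic of the trace above can be checked with a minimal sketch. This is not skrl's actual code; the variable names and the per-step append condition just mirror the behavior described in this issue:

```python
import numpy as np

track_rewards = []   # cumulative reward of each finished episode (self._track_rewards)
tracking_data = []   # values appended each step (self.tracking_data["Reward / Total reward (mean)"])

# step at which an episode finishes -> its cumulative reward
episode_ends = {3: -30, 6: -4, 9: -10}

for step in range(1, 10):
    if step in episode_ends:
        track_rewards.append(episode_ends[step])
    # a point is appended on EVERY step once any episode has finished,
    # because the episode storage is never cleared
    if track_rewards:
        tracking_data.append(np.mean(track_rewards))

written = np.mean(tracking_data)                   # mean of means, sent to TensorBoard
true_mean = np.mean(list(episode_ends.values()))   # actual mean episode reward

print(round(float(written), 2))    # ≈ -22.24
print(round(float(true_mean), 2))  # ≈ -14.67
```

The mean of means weights early episodes more heavily (here, -30 contributes to every appended point), which is why the written value drifts away from the true episode mean.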

SOLUTION: call self._track_rewards.clear() every time data is added to self.tracking_data["Reward / Total reward (mean)"].
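A hypothetical, condensed sketch of the proposed fix (RewardTracker and its methods are illustrative stand-ins, not skrl's actual class; only the clear() call is the change this issue proposes):

```python
import numpy as np

class RewardTracker:
    """Illustrative stand-in for the agent's reward-tracking logic."""

    def __init__(self):
        self._track_rewards = []
        self.tracking_data = {"Reward / Total reward (mean)": []}

    def record_episode(self, cumulative_reward):
        # called when an episode finishes
        self._track_rewards.append(cumulative_reward)

    def step_tracking(self):
        # called once per environment step
        if self._track_rewards:
            self.tracking_data["Reward / Total reward (mean)"].append(
                np.mean(self._track_rewards))
            # PROPOSED FIX: clear the finished-episode storage so each
            # episode's cumulative reward is averaged exactly once
            self._track_rewards.clear()
```

With the fix, each episode contributes exactly one point, so the write-time mean over the tracking data equals the true mean episode reward (-14.67 in the trace above).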

What skrl version are you using?

1.0.0

What ML framework/library version are you using?

pytorch

Additional system information

No response

@nikolaradulov nikolaradulov added the bug Something isn't working label Jun 20, 2024