
chapter09 PG_baseline_cartpole #44

Open
charles-bu opened this issue Apr 4, 2019 · 0 comments
Hi,

Thank you for the book that brought me into the world of reinforcement learning. Although there is a lot of material available on the internet, it is still quite hard for a beginner to grasp the whole picture of reinforcement learning; your book provides a systematic path for beginners like me.

My question is about your chapter 09 policy gradient with baseline code (CartPole), and I would be grateful for your further advice.

In the PG code, after the agent interacts with the environment, I noticed that you only record S, A, R, S' and discard the output of the PGN network at every time step. Because of that, at the training stage the code below is needed to compute the loss function:

states_v = torch.FloatTensor(batch_states)  # convert the collected states back into a tensor
logits_v = net(states_v)                    # second forward pass over the same states

My question is whether I could store net(state_v) while the agent first interacts with the environment, and then extract those stored logits for the loss computation instead of computing logits_v again. That way we could skip one round of forward computation through the network.
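
To make the idea concrete, this is roughly what I have in mind (a rough sketch only; the names cached_logits and select_action and the small network here are my own illustration, not code from the book):

import torch
import torch.nn as nn

# toy policy network with CartPole-like sizes (4 observations, 2 actions)
net = nn.Sequential(nn.Linear(4, 128), nn.ReLU(), nn.Linear(128, 2))

batch_states = []
cached_logits = []

def select_action(state):
    state_v = torch.FloatTensor([state])
    logits_v = net(state_v)
    batch_states.append(state)
    cached_logits.append(logits_v)  # keep the rollout-time logits instead of discarding them
    probs_v = torch.softmax(logits_v, dim=1)
    return torch.multinomial(probs_v, num_samples=1).item()

# at training time, reuse the stored logits for the loss
# instead of recomputing them with net(torch.FloatTensor(batch_states)):
# logits_v = torch.cat(cached_logits)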

The reason I raise this question is that PyTorch spends a lot of time converting CPU tensors to GPU tensors, so I thought that avoiding this step could accelerate the whole computation.

However, I am just a beginner with only two months of learning from your book, so I am not confident about this.

Your advice is highly appreciated!

Best Regards,

Charles
