inconvenience during training #5

Closed
AlejoCarpentier007 opened this issue Jun 26, 2024 · 3 comments
@AlejoCarpentier007

I was training with dueling networks. After episode 582 an error occurred; the training continued as if nothing had happened, but all progress was lost and the agent behaved as if its weights had been reset.

episode 582 ep score 141.0 average score 181.4 n steps 111244
/home/edo/projects/protorl/protorl/examples/protorl/memory/sum_tree.py:104: RuntimeWarning: divide by zero encountered in double_scalars
weights = np.array([(1 / self.counter * 1 / prob)**self.beta
/home/edo/projects/protorl/protorl/examples/protorl/memory/sum_tree.py:106: RuntimeWarning: invalid value encountered in multiply
weights *= 1 / max(weights)
episode 583 ep score 161.0 average score 181.0 n steps 111405

The training was going well: it reached a +350 average reward around episode 292, then performance started to drop, which is normal, but then the above happened.
There was a division by zero, followed by another error in a multiplication.
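
For reference, here is a minimal standalone sketch (illustrative numbers only, not the library code) of how a single zero sampling probability produces exactly those two warnings in the importance-sampling weight calculation:

import numpy as np

# Hypothetical values for illustration: one transition was assigned a
# sampling probability of exactly 0.0, which should not happen with
# valid priorities.
counter = 111244                      # number of stored transitions
beta = 0.5
probs = np.array([0.01, 0.0, 0.02])   # the 0.0 entry is the culprit

# 1 / 0.0 on a numpy scalar emits "divide by zero encountered" and yields inf
weights = np.array([(1 / counter * 1 / p) ** beta for p in probs])

# max(weights) is inf, so 1 / max(weights) is 0.0 and inf * 0.0 is nan,
# which emits "invalid value encountered in multiply"
weights *= 1 / max(weights)
print(weights)  # [0. nan 0.] -- the nan then poisons the loss and the network weights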

source code

from protorl.agents.dueling import DuelingDQNAgent as Agent
from protorl.actor.dueling import DuelingDQNActor as Actor
from protorl.learner.dueling import DuelingDQNLearner as Learner
from protorl.loops.single import EpisodeLoop
from protorl.policies.epsilon_greedy import EpsilonGreedyPolicy
from protorl.utils.network_utils import make_dqn_networks
from protorl.wrappers.common import make_env
from protorl.memory.generic import initialize_memory

def main():
    env_name = 'CartPole-v1'
    # env_name = 'PongNoFrameskip-v4'
    use_prioritization = True
    use_double = True
    use_dueling = True
    use_atari = False
    layers = [32]
    env = make_env(env_name, use_atari=use_atari)
    n_games = 1500
    bs = 64
    # 0.3, 0.5 works okay for cartpole
    # 0.25, 0.25 doesn't seem to work
    # 0.25, 0.75 doesn't work
    memory = initialize_memory(max_size=100_000,
                               obs_shape=env.observation_space.shape,
                               batch_size=bs,
                               n_actions=env.action_space.n,
                               action_space='discrete',
                               prioritized=use_prioritization,
                               alpha=0.3,
                               beta=0.5)

    policy = EpsilonGreedyPolicy(n_actions=env.action_space.n, eps_dec=1e-4)

    q_eval, q_target = make_dqn_networks(env, hidden_layers=layers,
                                         use_double=use_double,
                                         use_dueling=use_dueling,
                                         use_atari=use_atari)
    dqn_actor = Actor(q_eval, q_target, policy)
    q_eval, q_target = make_dqn_networks(env, hidden_layers=layers,
                                         use_double=use_double,
                                         use_dueling=use_dueling,
                                         use_atari=use_atari)
    dqn_learner = Learner(q_eval, q_target, use_double=use_double,
                          prioritized=use_prioritization, lr=1e-4)

    agent = Agent(dqn_actor, dqn_learner, prioritized=use_prioritization)
    sample_mode = 'prioritized' if use_prioritization else 'uniform'
    ep_loop = EpisodeLoop(agent, env, memory, sample_mode=sample_mode,
                          prioritized=use_prioritization)
    scores, steps_array = ep_loop.run(n_games)


if __name__ == '__main__':
    main()

@AlejoCarpentier007
Author

I have trained two dueling agents, one without double and one with double, both without prioritized replay, and in both cases the training was successful. I am now almost certain that the problem I experienced before was caused by prioritized experience replay, at least when combined with dueling. The problem is most likely in the sum tree. Although training initially works perfectly, and is much more sample-efficient than without prioritization, there comes a point where performance declines and never recovers. I don't know what the cause could be; I tried changing alpha and beta and the epsilon-greedy behavior, all without success. I will try making beta grow gradually, the way epsilon decreases with steps, so that beta approaches 1 as the episodes pass. It may be something else; the truth is I don't understand very well how prioritized replay works.

Dueling without double took 3361 episodes to reach a 500 average reward; dueling with double did it faster, at 2874. All of this was with only one agent running at a time.
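
In case it helps, this is roughly what I mean by a beta schedule (a generic sketch of the common PER approach, not protorl's API; the function and parameter names here are made up):

def annealed_beta(step, beta_start=0.5, beta_end=1.0, anneal_steps=200_000):
    # Linearly increase beta from beta_start toward 1.0 over anneal_steps,
    # mirroring how epsilon is decayed per step in epsilon-greedy.
    frac = min(step / anneal_steps, 1.0)
    return beta_start + frac * (beta_end - beta_start)

# e.g. before each prioritized sample (assuming the memory exposes a beta
# attribute, which is an assumption about the internals):
# memory.beta = annealed_beta(n_steps)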

@AlejoCarpentier007
Author

[Screenshot attached: WhatsApp Image 2024-06-29 at 13 54 59]
Hi, I saw you made some recent updates. I was testing with those new changes and the error keeps appearing. It is in the _calculate_weights function of sum_tree: the value of prob is zero at some point during training. To work around it, I added an if so that when prob is zero the weight is set directly to zero, avoiding the division by zero. With that change performance no longer plummets and the training improves. I wanted to know whether the same thing happens for you, i.e. a division by zero appearing after a number of episodes.

source code

def _calculate_weights(self, probs: List):
    if self.counter == 0:
        # avoid division by zero
        print("Counter is zero, returning ones")
        return np.ones(len(probs))

    weights = []
    for prob in probs:
        if prob > 0:
            weight = (1 / self.counter * 1 / prob) ** self.beta
        else:
            print(f"Prob is zero for prob value: {prob}")
            weight = 0
        weights.append(weight)

    weights = np.array(weights)

    max_weight = max(weights)
    if max_weight > 0:
        weights *= 1 / max_weight
    else:
        print("Max weight is zero or less")

    return weights

@philtabor
Owner

This should be fixed in the latest release, where I have a minimum value for the probabilities.
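
For anyone landing here later, the standard PER remedy looks something like the sketch below (a generic illustration, not necessarily the exact change in the release): a small constant is added to each priority before raising it to alpha, so no stored probability can ever be exactly zero.

import numpy as np

PRIORITY_EPS = 1e-6  # illustrative constant; the value used in the release may differ

def priority_from_td_error(td_error, alpha=0.3):
    # Adding a small epsilon guarantees p_i > 0, so P(i) = p_i / sum_k p_k
    # is never zero and (1/N * 1/P(i)) ** beta can never divide by zero.
    return (np.abs(td_error) + PRIORITY_EPS) ** alpha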
