inconvenience during training #5

Closed
AlejoCarpentier007 opened this issue Jun 26, 2024 · 3 comments
@AlejoCarpentier007

I was training with dueling networks. After episode 582 an error occurred; the training continued as if nothing had happened, but all progress was lost and the agent behaved as if its weights had been reset.

episode 582 ep score 141.0 average score 181.4 n steps 111244
/home/edo/projects/protorl/protorl/examples/protorl/memory/sum_tree.py:104: RuntimeWarning: divide by zero encountered in double_scalars
weights = np.array([(1 / self.counter * 1 / prob)**self.beta
/home/edo/projects/protorl/protorl/examples/protorl/memory/sum_tree.py:106: RuntimeWarning: invalid value encountered in multiply
weights *= 1 / max(weights)
episode 583 ep score 161.0 average score 181.0 n steps 111405

The training was going well: it reached a +350 average reward around episode 292, then performance started to drop, which is normal, but then the above happened.
There was a division by zero, followed by another error in a multiplication.
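
For reference, here is a minimal standalone sketch (illustrative numbers only, not the library code) of how a single zero sampling probability produces exactly those two warnings in the importance-sampling weight calculation:

import numpy as np

# Hypothetical values for illustration: one transition was assigned a
# sampling probability of exactly 0.0, which should not happen with
# valid priorities.
counter = 111244                      # number of stored transitions
beta = 0.5
probs = np.array([0.01, 0.0, 0.02])   # the 0.0 entry is the culprit

# 1 / 0.0 on a numpy scalar emits "divide by zero encountered" and yields inf
weights = np.array([(1 / counter * 1 / p) ** beta for p in probs])

# max(weights) is inf, so 1 / max(weights) is 0.0 and inf * 0.0 is nan,
# which emits "invalid value encountered in multiply"
weights *= 1 / max(weights)
print(weights)  # [0. nan 0.] -- the nan then poisons the loss and the network weights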

source code

from protorl.agents.dueling import DuelingDQNAgent as Agent
from protorl.actor.dueling import DuelingDQNActor as Actor
from protorl.learner.dueling import DuelingDQNLearner as Learner
from protorl.loops.single import EpisodeLoop
from protorl.policies.epsilon_greedy import EpsilonGreedyPolicy
from protorl.utils.network_utils import make_dqn_networks
from protorl.wrappers.common import make_env
from protorl.memory.generic import initialize_memory

def main():
    env_name = 'CartPole-v1'
    # env_name = 'PongNoFrameskip-v4'
    use_prioritization = True
    use_double = True
    use_dueling = True
    use_atari = False
    layers = [32]
    env = make_env(env_name, use_atari=use_atari)
    n_games = 1500
    bs = 64
    # 0.3, 0.5 works okay for cartpole
    # 0.25, 0.25 doesn't seem to work
    # 0.25, 0.75 doesn't work
    memory = initialize_memory(max_size=100_000,
                               obs_shape=env.observation_space.shape,
                               batch_size=bs,
                               n_actions=env.action_space.n,
                               action_space='discrete',
                               prioritized=use_prioritization,
                               alpha=0.3,
                               beta=0.5)

    policy = EpsilonGreedyPolicy(n_actions=env.action_space.n, eps_dec=1e-4)

    q_eval, q_target = make_dqn_networks(env, hidden_layers=layers,
                                         use_double=use_double,
                                         use_dueling=use_dueling,
                                         use_atari=use_atari)
    dqn_actor = Actor(q_eval, q_target, policy)
    q_eval, q_target = make_dqn_networks(env, hidden_layers=layers,
                                         use_double=use_double,
                                         use_dueling=use_dueling,
                                         use_atari=use_atari)
    dqn_learner = Learner(q_eval, q_target, use_double=use_double,
                          prioritized=use_prioritization, lr=1e-4)

    agent = Agent(dqn_actor, dqn_learner, prioritized=use_prioritization)
    sample_mode = 'prioritized' if use_prioritization else 'uniform'
    ep_loop = EpisodeLoop(agent, env, memory, sample_mode=sample_mode,
                          prioritized=use_prioritization)
    scores, steps_array = ep_loop.run(n_games)


if __name__ == '__main__':
    main()

@AlejoCarpentier007
Author

I have trained two dueling agents, one without double and one with double, both without prioritized replay, and in both cases the training was successful. I am now almost certain that the problem I experienced before was caused by prioritized experience replay, at least when combined with dueling. The problem is most likely in the sum tree. Although training initially works perfectly, and is much more sample-efficient than without prioritization, there comes a point where performance declines and never recovers. I don't know what the cause could be; I tried changing alpha and beta and the epsilon-greedy behavior, all without success. I will try making beta grow gradually, the way epsilon decreases with steps, so that beta approaches 1 as the episodes pass. It may be something else; the truth is I don't understand very well how prioritized replay works.

Dueling without double took 3361 episodes to reach a 500 average reward; dueling with double did it faster, at 2874. All of this was with only one agent running at a time.
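
In case it helps, this is roughly what I mean by a beta schedule (a generic sketch of the common PER approach, not protorl's API; the function and parameter names here are made up):

def annealed_beta(step, beta_start=0.5, beta_end=1.0, anneal_steps=200_000):
    # Linearly increase beta from beta_start toward 1.0 over anneal_steps,
    # mirroring how epsilon is decayed per step in epsilon-greedy.
    frac = min(step / anneal_steps, 1.0)
    return beta_start + frac * (beta_end - beta_start)

# e.g. before each prioritized sample (assuming the memory exposes a beta
# attribute, which is an assumption about the internals):
# memory.beta = annealed_beta(n_steps)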

@AlejoCarpentier007
Author

[Screenshot attached: WhatsApp Image 2024-06-29 at 13 54 59]
Hi, I saw you made some recent updates. I was testing with those new changes and the error keeps appearing. It is in the _calculate_weights function of sum_tree: the value of prob is zero at some point during training. To work around it, I added an if so that when prob is zero the weight is set directly to zero, avoiding the division by zero. With that change performance no longer plummets and the training improves. I wanted to know whether the same thing happens for you, i.e. a division by zero appearing after a number of episodes.

source code

def _calculate_weights(self, probs: List):
    if self.counter == 0:
        # avoid division by zero
        print("Counter is zero, returning ones")
        return np.ones(len(probs))

    weights = []
    for prob in probs:
        if prob > 0:
            weight = (1 / self.counter * 1 / prob) ** self.beta
        else:
            print(f"Prob is zero for prob value: {prob}")
            weight = 0
        weights.append(weight)

    weights = np.array(weights)

    max_weight = max(weights)
    if max_weight > 0:
        weights *= 1 / max_weight
    else:
        print("Max weight is zero or less")

    return weights

@philtabor
Owner

This should be fixed in the latest release, where I have a minimum value for the probabilities.
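
For anyone landing here later, the standard PER remedy looks something like the sketch below (a generic illustration, not necessarily the exact change in the release): a small constant is added to each priority before raising it to alpha, so no stored probability can ever be exactly zero.

import numpy as np

PRIORITY_EPS = 1e-6  # illustrative constant; the value used in the release may differ

def priority_from_td_error(td_error, alpha=0.3):
    # Adding a small epsilon guarantees p_i > 0, so P(i) = p_i / sum_k p_k
    # is never zero and (1/N * 1/P(i)) ** beta can never divide by zero.
    return (np.abs(td_error) + PRIORITY_EPS) ** alpha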
