My custom env training runs forever #1032

Closed

Abdelrahman-Alkhodary opened this issue Aug 25, 2022 · 6 comments
Labels

custom gym env: Issue related to Custom Gym Env
more information needed: Please fill the issue template completely
question: Further information is requested

Comments

@Abdelrahman-Alkhodary

Hello everyone,

I have created my own custom environment following the example in the docs and ran the env checker; it passed, except for a warning about box bound precision. Now I am trying to train a model with the TD3 algorithm, but the training runs forever even though I set total_timesteps=1 just to check that there is nothing wrong with my env or the training. You can find my gym env here:
https://github.com/Abdelrahman-Alkhodary/Custom-gym-env
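
For reference, this is roughly how the checker was run (a minimal sketch; the exact script is not in this issue and the import path is assumed from the repo layout):

```python
from stable_baselines3.common.env_checker import check_env

# Assumed import path; the class lives in the linked repository
from sofa_arm_env import SofaArmEnv

env = SofaArmEnv()
# Raises an error if the env does not follow the Gym API;
# here it only emitted a box bound precision warning.
check_env(env)
```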

and this is the system info:
```
sb3.get_system_info()

OS: Linux-5.15.0-46-generic-x86_64-with-debian-bullseye-sid #49~20.04.1-Ubuntu SMP Thu Aug 4 19:15:44 UTC 2022
Python: 3.7.9
Stable-Baselines3: 1.6.0
PyTorch: 1.11.0+cu102
GPU Enabled: True
Numpy: 1.19.2
Gym: 0.21.0
```

Abdelrahman-Alkhodary added the custom gym env and question labels on Aug 25, 2022
araffin added the more information needed label on Aug 26, 2022
@Abdelrahman-Alkhodary
Author

Hello @araffin
The custom environment that I made is about reaching: I have a goal and my agent tries to reach it. The step function is driven by a neural network; the network reads the observation and gives the action. I have trained TD3 manually without using Stable-Baselines and it worked. When I try to use Stable-Baselines3, model.learn() runs forever even though I set total_timesteps=1 just to check. I also put a dummy counter inside the step function and print the count every 1000 steps to see whether the model is stuck somewhere else. If anything is not clear, please tell me exactly what it is. I have also included my code.

@araffin
Member

araffin commented Aug 26, 2022

Hello,
Please provide a minimal code example to reproduce the error, as requested by the custom env issue template.

See #982 (comment) for a definition of what is "minimal" ;)

@Abdelrahman-Alkhodary
Author

Abdelrahman-Alkhodary commented Aug 26, 2022

```python
import random

import gym
import numpy as np
import pandas as pd
from gym import spaces


class SofaArmEnv(gym.Env):
    """Custom Environment that follows gym interface"""
    metadata = {'render.modes': ['human']}

    def __init__(self):
        super(SofaArmEnv, self).__init__()
        # Define action and observation space
        # The action will be the displacement in the cables
        self.action_space = spaces.Box(low=-1.0, high=1.0, shape=(8,), dtype=np.float32)
        # observation or the state of the env will be the tip position, goal position, cables' displacement
        # Eff_X, Eff_Y, Eff_Z, g_X, g_Y, g_Z, c_L0, c_L1, c_L2, c_L3, c_S0, c_S1, c_S2, c_S3
        self.observation_space = spaces.Box(
            low=np.array([-150.0, -150.0, -30.0, -150.0, -150.0, -30.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]),
            high=np.array([150.0, 150.0, 195.0, 150.0, 150.0, 195.0, 60.0, 60.0, 60.0, 60.0, 40.0, 40.0, 40.0, 40.0]),
            shape=(14,),
            dtype=np.float32
        )

        # dataframe that contains all the possible goals
        self.goals_df = pd.read_csv('./goals_xyz.csv')
        # set done to False
        self.done = False
        self.dummy_count = 1

    def step(self, delta_action):
        self.dummy_count += 1
        if self.dummy_count % 1000 == 0:
            print('1000 steps')
        # here should be the neural network model that takes the action and produces the tip position
        self.new_tip_pos = np.random.random((3,))
        # The observation (state) is the stacking of the goal position, tip position and the cable actuations
        observation = np.hstack([self.goal_pos, self.new_tip_pos, delta_action]).astype(np.float32)
        # the reward is the negative distance between the tip position and the goal
        reward = self.get_reward(self.new_tip_pos)
        # update the tip position for the next step
        self.tip_pos = self.new_tip_pos
        if self.distance(self.new_tip_pos) < 5:
            self.done = True
        else:
            self.done = False
        info = {}
        return observation, reward, self.done, info

    def reset(self):
        x, y, z = self.goals_df.iloc[random.randrange(0, len(self.goals_df)), :]
        self.goal_pos = [x, y, z]
        self.tip_pos = [0, 0, 195]
        self.action = [0, 0, 0, 0, 0, 0, 0, 0]
        observation = np.hstack([self.goal_pos, self.tip_pos, self.action]).astype(np.float32)
        self.done = False
        return observation  # reward, done, info can't be included

    def get_reward(self, tip_pos):
        current_distance = self.distance(tip_pos)
        return -current_distance

    def distance(self, tip_pos):
        eff_goal_dist = np.sqrt((self.goal_pos[0] - tip_pos[0]) ** 2 +
                                (self.goal_pos[1] - tip_pos[1]) ** 2 +
                                (self.goal_pos[2] - tip_pos[2]) ** 2)
        return eff_goal_dist

    def close(self):
        print('close method called')
```
and this is the training code:


```python
import numpy as np
from sofa_arm_env import SofaArmEnv

from stable_baselines3 import TD3
from stable_baselines3.common.noise import NormalActionNoise

env = SofaArmEnv()
env.reset()

n_actions = env.action_space.shape[-1]
action_noise = NormalActionNoise(mean=np.zeros(n_actions), sigma=0.1 * np.ones(n_actions))
model = TD3('MlpPolicy', env, verbose=1)
model.learn(total_timesteps=1, reset_num_timesteps=False)
```

@araffin
Member

araffin commented Aug 26, 2022

Please read the link I provided carefully (the provided code is not functional), and please use a markdown code block to format your code (also shown in the link and in the issue template).

@araffin
Member

araffin commented Aug 26, 2022

My guess is that train_freq is set to (1, "episode") by default and your episode never finishes. A solution is to set train_freq to 1. You are also not using the action noise.
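
In code, that would look roughly like this (a sketch reusing the names from the training snippet above; not a verified fix for this exact env, just the two changes mentioned):

```python
from stable_baselines3 import TD3

model = TD3(
    'MlpPolicy',
    env,
    action_noise=action_noise,  # actually pass the noise created earlier
    train_freq=1,               # train every environment step instead of once per (never-ending) episode
    verbose=1,
)
model.learn(total_timesteps=1)
```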

@Abdelrahman-Alkhodary
Author

@araffin you are right: with train_freq set to 1, it runs normally and doesn't get stuck. Thanks
