
action lists in get_action are different #10

Open
Junem360 opened this issue Oct 31, 2018 · 0 comments

@titu1994
Hello,
When I run your code, the action encodings produced by the random-sampling branch and the controller-prediction branch of the get_action function are different.

def get_action(self, state):
    '''
    Gets a one hot encoded action list, either from random sampling or from
    the Controller RNN

    Args:
        state: a list of one hot encoded states, whose first value is used as initial
            state for the controller RNN

    Returns:
        A one hot encoded action list
    '''
    if np.random.random() < self.exploration:
        print("Generating random action to explore")
        actions = []

        # Explore: sample one value per state dimension across all layers
        # and encode it via the state space's embedding
        for i in range(self.state_size * self.num_layers):
            state_ = self.state_space[i]
            size = state_['size']

            sample = np.random.choice(size, size=1)
            sample = state_['index_map_'][sample[0]]
            action = self.state_space.embedding_encode(i, sample)
            actions.append(action)
        return actions

    else:
        print("Predicting action from Controller")
        initial_state = self.state_space[0]
        size = initial_state['size']

        if state[0].shape != (1, size):
            state = state[0].reshape((1, size)).astype('int32')
        else:
            state = state[0]

        print("State input to Controller for Action : ", state.flatten())

        with self.policy_session.as_default():
            K.set_session(self.policy_session)

            with tf.name_scope('action_prediction'):
                pred_actions = self.policy_session.run(self.policy_actions, feed_dict={self.state_input: state})

            return pred_actions

The random branch produces vectors of the form [0, 1 + index, 0, ...] (via embedding_encode), while the prediction branch produces vectors of the form [0, 1, 0, ...], i.e. true one-hot encodings.
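A minimal sketch of the mismatch described above (the names here are illustrative, not the project's actual API): assuming a state dimension with 4 possible values and a sampled index of 2, the two branches would yield incompatible encodings.

```python
import numpy as np

size = 4   # number of possible values for this state dimension (assumed)
index = 2  # index of the sampled value (assumed)

# Random-exploration branch: as reported, embedding_encode stores
# (1 + index) at the sampled position rather than 1.
embedding_style = np.zeros((1, size))
embedding_style[0, index] = 1 + index

# Controller-prediction branch: a plain one-hot vector.
one_hot = np.zeros((1, size))
one_hot[0, index] = 1

print(embedding_style.flatten())  # [0. 0. 3. 0.]
print(one_hot.flatten())          # [0. 0. 1. 0.]
```

If both branches are meant to feed the same downstream consumer, the two encodings would need to agree; the sketch just makes the reported difference concrete.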

Is this your intention, or just a mistake?

Looking forward to your answer.
Thanks.
