Is it reasonable that pred is better than gt on text-motion-matching metrics on HumanML3D #60

Open
CDLCHOI opened this issue Jul 5, 2024 · 2 comments

Comments


CDLCHOI commented Jul 5, 2024

I notice that the GT R-precision (top-1 to top-3) is about 51% / 70% / 79% on the HumanML3D dataset, but MoMask gets 52% / 71% / 81%. The same holds for MMDist. Could you explain this? Thanks.
@EricGuo5513 @Murrol
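
For context, both the GT and generated motions behind these numbers are scored through the same pretrained text/motion evaluator, using the standard 32-way retrieval protocol. Below is a minimal sketch of how R-precision and MMDist fall out of the evaluator's embeddings; the function name and the assumption that features arrive pre-batched in matched pools are illustrative, not the repository's exact code:

```python
import torch

def matching_metrics(text_emb: torch.Tensor, motion_emb: torch.Tensor):
    """R-precision (top-1/2/3) and MMDist from co-embedded features.

    text_emb, motion_emb: (N, D) features from the pretrained text/motion
    evaluator; row i of each tensor describes the same sample. N is assumed
    to be a multiple of 32: each motion is ranked against its own caption
    plus 31 mismatched captions from the same pool.
    """
    n, _ = text_emb.shape
    assert n % 32 == 0, "protocol evaluates in pools of 32"

    top_k_hits = torch.zeros(3)
    mm_dist = 0.0
    for start in range(0, n, 32):
        t = text_emb[start:start + 32]           # (32, D) caption features
        m = motion_emb[start:start + 32]         # (32, D) motion features
        dist = torch.cdist(m, t)                 # (32, 32) Euclidean distances
        mm_dist += dist.diagonal().sum().item()  # matched pairs -> MMDist
        ranks = dist.argsort(dim=1)              # closest caption first
        match = torch.arange(32).unsqueeze(1)
        for k in range(3):
            # hit if the matched caption is among the k+1 closest ones
            top_k_hits[k] += (ranks[:, :k + 1] == match).any(dim=1).sum()

    return top_k_hits / n, mm_dist / n  # R-precision top-1/2/3, MMDist
```

Since both sets of numbers come from the same learned feature space, nothing in the protocol forces the GT test motions to score highest.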

Collaborator

Murrol commented Jul 5, 2024

Good question.

It probably results from the distribution difference between the text-motion training set and the test set. The evaluator is itself a network trained on the training set to fit its distribution, so it can carry variance/bias error on the test set. If the generated text-motion data aligns with the training-set distribution better than the GT test data does, its evaluation metrics can come out even better than those of the GT test set.

The quantitative evaluation of motion generation performance would be an interesting topic to discuss and explore.

Author

CDLCHOI commented Jul 5, 2024


Thanks for your reply.
Another question: the evaluator is meant to score the test set, so why was it trained on the training set rather than the test set? If the evaluator were trained on the test set, maybe we would get a more credible text-motion-matching metric.
