Question about git_id of fitlog. #62

Open
Dingseewhole opened this issue Feb 23, 2023 · 2 comments

Comments


Dingseewhole commented Feb 23, 2023

I'm a big fan of fitlog and often recommend it to my colleagues. Thank you for creating such a great repository.
When viewing logs on the fitlog web page, I noticed a column called git_id; for example, the log with ID log_20230221_075043 has git_id 395daa36. I also manage the same code with GitHub while using fitlog, so I expected git_id to match the commit ID in my GitHub repository. However, the two do not match: the fitlog git_id is 395daa36, while the commit ID in the GitHub repository is e9ca0d4.
I'm curious: how can I use fitlog's git_id to identify the corresponding commit in the GitHub repository?

@ScarlettChan

This comment was marked as off-topic.

Member

WillQvQ commented Feb 28, 2023

I'm sorry, but we were unable to reproduce your issue, so we cannot give a precise suggestion. However, we found that in multi-GPU training, fitlog on several processes may perform commits at the same time. You may want to check whether your problem is caused by this.

If it is, we recommend putting fitlog into debug mode on the non-main processes by calling fitlog.debug(). If your training framework is fastNLP, you can use int(os.environ.get(FASTNLP_GLOBAL_RANK, 0)) == 0 to determine whether the program is running on the main process (the int() conversion is needed because environment variable values are strings).
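A minimal sketch of the suggestion above, for illustration only. It assumes the launcher exports the process rank in an environment variable literally named FASTNLP_GLOBAL_RANK, and that a missing variable means a single-process run; the helper names `is_main_process` and `configure_fitlog` are hypothetical and not part of fitlog or fastNLP.

```python
import os

# Assumption: the rank is exported under this literal name.
RANK_ENV_VAR = "FASTNLP_GLOBAL_RANK"


def is_main_process(environ=os.environ) -> bool:
    """Return True when the rank is 0 or the variable is unset."""
    # Environment values are strings, so convert before comparing.
    return int(environ.get(RANK_ENV_VAR, 0)) == 0


def configure_fitlog(environ=os.environ) -> bool:
    """Put fitlog into debug mode on non-main processes.

    Returns True if this process is the main one (fitlog stays active).
    """
    main = is_main_process(environ)
    if not main:
        # Imported lazily so the rank check itself does not need fitlog.
        import fitlog

        fitlog.debug()  # in debug mode, fitlog calls do nothing
    return main
```

Calling `configure_fitlog()` once at the start of the training script, before any other fitlog call, should leave only the main process recording and committing.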


