[XPU] fix gm allocation on XPUContext::Impl::Init #60260
Conversation
Your PR was submitted successfully. Thank you for your contribution to the open source project!
LGTM && I suggest pinging the original author / original approver to take a look as well.
One question: the PR referenced here is from half a year ago; why was this issue only discovered recently?
@AlbertVan @zhupengyang Could you two please take a look?
lgtm
PyTorch's XpuContext implementation was based on Paddle's. The issue was discovered on the PyTorch side first, around October 2023.
The root cause is that each xpu_context separately allocates its own XPUAPI_DEFAULT_SIZE worth of memory.
The training side should keep an eye on this later.
PR types
Bug fixes
PR changes
APIs
Description
PR #54674 forces the option XPUAPI_DEFAULT_SIZE of xdnn::Context to 1 by default, regardless of whether the environment variable XPUAPI_DEFAULT_SIZE is set to a different value. This triggers a lot of xpu_wait calls. This comment describes why XPUAPI_DEFAULT_SIZE was originally set to 1: #54674 (comment)