Irreproducible Segfault with LightGBM #1743

Closed
vruvora opened this issue Oct 11, 2018 · 24 comments · Fixed by #1886

Comments

@vruvora

vruvora commented Oct 11, 2018

We are experiencing irreproducible segfaults with LightGBM. Has anyone else run into this issue?

Environment info

Operating System:

CPU/GPU model:

C++/Python/R version:

Error message

Reproducible examples

Steps to reproduce

@henry0312
Contributor

Did you install LightGBM from PyPI?
The distributions on PyPI may be broken.
(I guess compile optimization may be a cause)
So, can you try to compile from source?

@vruvora
Author

vruvora commented Oct 11, 2018

We are installing with pip install. It just hangs in the middle of an optimization. We can try compiling from source, but is there a reason why the PyPI distribution may be broken?

@henry0312
Contributor

I guess there is something wrong with the combination of manylinux and -O3.
Maybe, we should compile without -O3 on manylinux.

@vruvora
Author

vruvora commented Oct 11, 2018

Interesting. Have you or anyone else experienced this? We have been hitting it a fair amount, but I have not seen any major reports of it. This seems like it would keep the package from being used in production.

@guolinke
Collaborator

@vruvora
Did the Segfault occur often?

@vruvora
Author

vruvora commented Oct 11, 2018

Yes. It has been happening quite a bit lately. We have automated statistical unit tests with sample datasets, which may be degenerate optimization problems. However, we use them to make sure our pipeline is in order when we make big changes. Roughly 1 run in 5 segfaults, and the segfault goes away when we run the tests again.
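
An intermittent crash like this is easier to discuss once it has a measured rate. The following is a minimal sketch, not something posted in the thread: it reruns a command several times and counts the runs killed by SIGSEGV. The function name `count_segfaults` and the script name `run_pipeline_tests.py` are hypothetical, and the return-code check assumes a POSIX system.

```python
import signal
import subprocess
import sys


def count_segfaults(cmd, runs=20):
    """Run `cmd` `runs` times and return how many runs died from SIGSEGV."""
    segfaults = 0
    for _ in range(runs):
        # stderr is left attached so any crash traceback stays visible.
        result = subprocess.run(cmd, stdout=subprocess.DEVNULL)
        # On POSIX, a return code of -N means the child was killed by signal N.
        if result.returncode == -signal.SIGSEGV:
            segfaults += 1
    return segfaults


# Hypothetical usage: "-X faulthandler" asks Python to dump a traceback
# when the interpreter receives a fatal signal such as SIGSEGV.
# cmd = [sys.executable, "-X", "faulthandler", "run_pipeline_tests.py"]
# print(count_segfaults(cmd), "of 20 runs segfaulted")
```

Running the same harness before and after changing how LightGBM is installed gives a rough comparison of crash rates, although a small number of runs cannot prove the crash is gone.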

@henry0312
Contributor

I have seen intermittent crashes many times in my main job.

@vruvora
Author

vruvora commented Oct 11, 2018

@henry0312 Interesting. Have you been able to fix it?

@henry0312
Contributor

@vruvora you should be sure to uninstall the package from PyPI and install manually.

@henry0312
Contributor

@vruvora Yeah, installing from source has solved the problem.

@vruvora
Author

vruvora commented Oct 11, 2018

@henry0312 I will try this out and update accordingly. Thanks. @guolinke Do you know why source isn't up to date with PyPI?

@henry0312
Contributor

No, please see https://github.com/Microsoft/LightGBM/blob/master/docs/Installation-Guide.rst.

@guolinke
Collaborator

@vruvora you can use

pip install --no-binary :all: lightgbm

as well.
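
For pipelines that provision their environment from Python, the uninstall-then-rebuild advice in this thread could be scripted roughly as follows. This is a sketch under assumptions: the helper names are hypothetical, and the build is assumed to have a C++ compiler and CMake available. Note that `--no-binary :all:` disables prebuilt wheels for dependencies as well, whereas `--no-binary lightgbm` would restrict the rule to this one package.

```python
import subprocess
import sys


def source_reinstall_commands(package="lightgbm"):
    """Return the pip commands that drop the wheel and rebuild from source."""
    # Calling pip through the current interpreter targets the active environment.
    pip = [sys.executable, "-m", "pip"]
    return [
        # Remove the existing (possibly prebuilt) installation first.
        pip + ["uninstall", "-y", package],
        # Forbid prebuilt wheels so the library is compiled locally.
        pip + ["install", "--no-binary", ":all:", package],
    ]


def reinstall_from_source(package="lightgbm"):
    """Run the commands in order, stopping at the first failure."""
    for cmd in source_reinstall_commands(package):
        subprocess.run(cmd, check=True)
```

Keeping command construction separate from execution makes it possible to log or review the exact commands before anything is changed.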

@guolinke
Collaborator

@henry0312
Building without -O3 may slow down the running speed.

I am not sure why the segfault happens, or why building from source fixes it.

@henry0312
Contributor

@guolinke Yes, -O3 improves computing performance, but there may be something wrong with the tuning for manylinux.

@vruvora
Author

vruvora commented Oct 11, 2018

@guolinke Will pip install --no-binary :all: lightgbm be slower than a regular pip install? Can you elaborate on the differences?

@henry0312
Contributor

@vruvora have you solved your issue?

@vruvora
Author

vruvora commented Oct 18, 2018

@henry0312 Yes, as far as we know. No more segfaults.

@guolinke
Collaborator

Should we add this problem and its solution to the documentation?

@StrikerRUS
Collaborator

ping @henry0312

@vruvora
Author

vruvora commented Nov 27, 2018

Yes. We should.

@henry0312
Contributor

yeah, I also think so.

@guolinke
Collaborator

@vruvora

@guolinke Will pip install --no-binary :all: lightgbm be slower than a regular pip install? Can you elaborate on the differences?

The install speed will be slower. But the running speed isn't affected.

@StrikerRUS
Collaborator

@vruvora Put simply, the difference is that a regular pip install just downloads a .whl file containing a prebuilt library file and copies its contents into the appropriate directory.
In contrast, pip install --no-binary :all: compiles the library file on your machine and then copies it.
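
To tell which of the two paths produced an existing installation, one option is to read the tags pip recorded in the package's dist-info WHEEL file. The sketch below is an illustration only, and it rests on an assumption: that a prebuilt PyPI wheel carries a manylinux platform tag while a locally compiled build does not. The function names are hypothetical, and `importlib.metadata` requires Python 3.8 or later.

```python
from importlib import metadata


def wheel_tags(wheel_text):
    """Extract the Tag: values from the text of a dist-info WHEEL file."""
    return [
        line.split(":", 1)[1].strip()
        for line in wheel_text.splitlines()
        if line.startswith("Tag:")
    ]


def looks_prebuilt_manylinux(package="lightgbm"):
    """Return True or False based on recorded wheel tags, or None if unknown."""
    try:
        text = metadata.distribution(package).read_text("WHEEL")
    except metadata.PackageNotFoundError:
        return None  # package is not installed in this environment
    if text is None:
        return None  # installed without a WHEEL metadata file
    # Assumption: only prebuilt PyPI wheels carry a manylinux platform tag.
    return any("manylinux" in tag for tag in wheel_tags(text))
```

A result of True suggests the prebuilt wheel is still in use, so the uninstall step may have been skipped or pip may have reused a cached wheel.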

@henry0312 Would you mind creating a PR with new FAQ entry about this?
