[FEA] Support an installation path using RAPIDS nightly packages & source installs of Curator #133

Open
randerzander opened this issue Jun 29, 2024 · 0 comments
Labels
enhancement New feature or request

Comments

@randerzander

I'm trying to use development features of cuDF (long string support, JSON reader bug fixes) with the latest Curator code.

@ayushdg suggested first installing the RAPIDS nightly packages by creating a requirements-rapids-nightly.txt:

--extra-index-url=https://pypi.anaconda.org/rapidsai-wheels-nightly/simple
cudf-cu12>=24.8.0a0,<=24.8
dask-cudf-cu12>=24.8.0a0,<=24.8
cuml-cu12>=24.8.0a0,<=24.8
cugraph-cu12>=24.8.0a0,<=24.8
dask-cuda>=24.8.0a0,<=24.8
cython

And installing RAPIDS & Curator like so:

pip install -r requirements-rapids-nightly.txt
pip install -e .

RAPIDS installed fine. The install of most of Curator's dependencies then proceeds, but building the fasttext and pycld2 wheels fails:

# Very lengthy list of c++ errors like:
95.04       src/args.cc:17:1: note: 'uint64_t' is defined in header '<cstdint>'; this is probably fixable by adding '#include <cstdint>'
95.04          16 | #include <unordered_map>
95.04         +++ |+#include <cstdint>
95.04          17 |
95.04       src/args.cc:471:5: error: 'multiplier' was not declared in this scope
95.04         471 |     multiplier = units[lastCharacter];
95.04             |     ^~~~~~~~~~
95.04       src/args.cc:474:11: error: expected ';' before 'size'
95.04         474 |   uint64_t size = 0;
95.04             |           ^~~~~
95.04             |           ;
95.04       src/args.cc:478:5: error: 'size' was not declared in this scope
95.04         478 |     size = std::stol(modelSize, &nonNumericCharacter);
95.04             |     ^~~~
95.04       src/args.cc:490:10: error: 'size' was not declared in this scope
95.04         490 |   return size * multiplier;
95.04             |          ^~~~
95.04       src/args.cc:490:17: error: 'multiplier' was not declared in this scope
95.04         490 |   return size * multiplier;
95.04             |                 ^~~~~~~~~~
95.04       error: command '/opt/conda/envs/rapids/bin/gcc' failed with exit code 1
95.04       [end of output]
95.04
95.04   note: This error originates from a subprocess, and is likely not a problem with pip.
95.05   ERROR: Failed building wheel for fasttext

## End
128.0 Successfully built nemo_curator jieba unidic-lite comment-parser crossfit antlr4-python3-runtime youtokentome rouge-score wget asciitree
128.0 Failed to build fasttext pycld2
128.0 ERROR: Could not build wheels for fasttext, pycld2, which is required to install pyproject.toml-based projects

For reference, here's what I hope is the relevant build info:

(rapids) root@ipp1-3302:/repos/NeMo-Curator# gcc --version
gcc (conda-forge gcc 14.1.0-0) 14.1.0
Copyright (C) 2024 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

(rapids) root@ipp1-3302:/repos/NeMo-Curator# g++ --version
g++ (conda-forge gcc 14.1.0-0) 14.1.0
Copyright (C) 2024 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

(rapids) root@ipp1-3302:/repos/NeMo-Curator# cmake --version
cmake version 3.29.6

CMake suite maintained and supported by Kitware (kitware.com/cmake).

Do pycld2 and fasttext depend on some other build packages, or on different versions of gcc/g++?
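One possible reading of the fasttext errors, offered as a sketch rather than a confirmed diagnosis: the compiler note says `uint64_t` is undeclared because `<cstdint>` is never included directly. Starting with GCC 13, several libstdc++ headers stopped including `<cstdint>` transitively, so code that relied on that can fail under gcc 14.1. The later `'size' was not declared` and `'multiplier' was not declared` errors look like fallout from the same missing type. The snippet below is not fasttext's source; `parse_size` and its unit table are hypothetical, loosely modeled on the lines quoted in the log, and only show the explicit-include pattern the compiler suggests. The pycld2 errors are not shown above, so this says nothing about them.

```cpp
// Hypothetical illustration only: NOT the actual contents of fasttext's src/args.cc.
#include <cstdint>        // declares std::uint64_t explicitly instead of relying on other headers
#include <string>
#include <unordered_map>

// Parse strings such as "512", "3k" or "2m" into a number.
// The suffix table is an assumption made up for this sketch.
std::uint64_t parse_size(const std::string& text) {
  static const std::unordered_map<char, std::uint64_t> units = {
      {'k', 1000ULL}, {'m', 1000000ULL}, {'g', 1000000000ULL}};

  std::uint64_t multiplier = 1;
  if (!text.empty()) {
    auto it = units.find(text.back());
    if (it != units.end()) {
      multiplier = it->second;
    }
  }

  // std::stoull stops at the first non-digit, so a trailing suffix is ignored here.
  // It throws std::invalid_argument if no digits are present.
  std::uint64_t size = std::stoull(text);
  return size * multiplier;
}
```

If this reading is right, removing the `#include <cstdint>` line from a file like this should produce errors of the same shape as in the log on gcc 13 or newer, while older gcc releases would likely compile it unchanged.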

I was able to comment out the pycld2 and fasttext deps in setup.py, after which the install succeeds, but I believe I'll need those packages to use our filtering module.
