Only import PII constants during Curator import #61

ayushdg · 2024-05-10T19:11:36Z

One approach to fixing #59.
The root cause seems to come from the fact that importing space (which imports and calls a cupy cuda function) somehow impacts the state of the system and prevent a cluster starting up using all available GPUs on the machine.

Currently the only imports in Curator that lead to this situation is importing DEFAULT_LANGUAGE from the pii modules which transitively ends up importing presidio->spacy.

This pr moves these constants to a separate file so that we don't end up importing all other dependencies during Curator import.

…pacy and other GPU dependencies Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com>

VibhuJawa

Mostly looks good to me, nitpicks around comments and code structure.

Thanks a ton for fixing this for us.

nemo_curator/modifiers/pii_modifier.py

nemo_curator/pii/algorithm.py

Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com>

VibhuJawa · 2024-05-13T21:39:53Z

Tested locally, works, please go ahead and merge.

* Move PII constants to a seperate file that does not import presidio/spacy and other GPU dependencies Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com> * Add comment around import, move constant import to global scope Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com> --------- Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com>

* update extract_partitioning_index with compat code Signed-off-by: rjzamora <rzamora217@gmail.com> * [Tutorials] Add a tutorial for PEFT data curation (#45) This PR adds a new tutorial to demonstrate data curation for PEFT use-cases. Signed-off-by: Mehran Maghoumi <Maghoumi@users.noreply.github.com> Signed-off-by: rjzamora <rzamora217@gmail.com> * move compat code to _compat file Signed-off-by: rjzamora <rzamora217@gmail.com> * Only import PII constants during Curator import (#61) * Move PII constants to a seperate file that does not import presidio/spacy and other GPU dependencies Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com> * Add comment around import, move constant import to global scope Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com> --------- Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com> * add unit test Signed-off-by: rjzamora <rzamora217@gmail.com> * add pytest.mark.gpu Signed-off-by: rjzamora <rzamora217@gmail.com> * move extract_partitioning_index import for now Signed-off-by: rjzamora <rzamora217@gmail.com> * test both cudf and pandas Signed-off-by: rjzamora <rzamora217@gmail.com> * spelling Signed-off-by: rjzamora <rzamora217@gmail.com> --------- Signed-off-by: rjzamora <rzamora217@gmail.com> Signed-off-by: Mehran Maghoumi <Maghoumi@users.noreply.github.com> Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com> Co-authored-by: Mehran Maghoumi <Maghoumi@users.noreply.github.com> Co-authored-by: Ayush Dattagupta <ayushdg95@gmail.com>

* Move PII constants to a seperate file that does not import presidio/spacy and other GPU dependencies Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com> * Add comment around import, move constant import to global scope Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com> --------- Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com> Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>

…DIA#60) * update extract_partitioning_index with compat code Signed-off-by: rjzamora <rzamora217@gmail.com> * [Tutorials] Add a tutorial for PEFT data curation (NVIDIA#45) This PR adds a new tutorial to demonstrate data curation for PEFT use-cases. Signed-off-by: Mehran Maghoumi <Maghoumi@users.noreply.github.com> Signed-off-by: rjzamora <rzamora217@gmail.com> * move compat code to _compat file Signed-off-by: rjzamora <rzamora217@gmail.com> * Only import PII constants during Curator import (NVIDIA#61) * Move PII constants to a seperate file that does not import presidio/spacy and other GPU dependencies Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com> * Add comment around import, move constant import to global scope Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com> --------- Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com> * add unit test Signed-off-by: rjzamora <rzamora217@gmail.com> * add pytest.mark.gpu Signed-off-by: rjzamora <rzamora217@gmail.com> * move extract_partitioning_index import for now Signed-off-by: rjzamora <rzamora217@gmail.com> * test both cudf and pandas Signed-off-by: rjzamora <rzamora217@gmail.com> * spelling Signed-off-by: rjzamora <rzamora217@gmail.com> --------- Signed-off-by: rjzamora <rzamora217@gmail.com> Signed-off-by: Mehran Maghoumi <Maghoumi@users.noreply.github.com> Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com> Co-authored-by: Mehran Maghoumi <Maghoumi@users.noreply.github.com> Co-authored-by: Ayush Dattagupta <ayushdg95@gmail.com> Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>

* Move PII constants to a seperate file that does not import presidio/spacy and other GPU dependencies Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com> * Add comment around import, move constant import to global scope Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com> --------- Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com> Signed-off-by: Nicole Luo <nluo@nvidia.com>

* Init commit for tutorial notebook Signed-off-by: Nicole Luo <nluo@nvidia.com> * Fix metadata inference with pandas and dask (#35) * Fix metadata inference with pandas and dask Signed-off-by: Ryan Wolf <rywolf@nvidia.com> * Fix datatypes for task decontamination Signed-off-by: Ryan Wolf <rywolf@nvidia.com> * Use targetted import Signed-off-by: Ryan Wolf <rywolf@nvidia.com> --------- Signed-off-by: Ryan Wolf <rywolf@nvidia.com> Signed-off-by: Nicole Luo <nluo@nvidia.com> * Disable PyTorch Compile Multiprocessing (#34) * Move tokenizer import Signed-off-by: Ryan Wolf <rywolf@nvidia.com> * Reduce inductor threads Signed-off-by: Ryan Wolf <rywolf@nvidia.com> * Change env int to string Signed-off-by: Ryan Wolf <rywolf@nvidia.com> * Change location of env var Signed-off-by: Ryan Wolf <rywolf@nvidia.com> * Add comment linking issue Signed-off-by: Ryan Wolf <rywolf@nvidia.com> --------- Signed-off-by: Ryan Wolf <rywolf@nvidia.com> Signed-off-by: Nicole Luo <nluo@nvidia.com> * Improve speed of AddId module (#36) * Add fast id method Signed-off-by: Ryan Wolf <rywolf@nvidia.com> * Add type conversion Signed-off-by: Ryan Wolf <rywolf@nvidia.com> * Fix off by one errors in tests Signed-off-by: Ryan Wolf <rywolf@nvidia.com> --------- Signed-off-by: Ryan Wolf <rywolf@nvidia.com> Signed-off-by: Nicole Luo <nluo@nvidia.com> * Make GPU dependencies optional (#27) * Move GPU imports and make them optional Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com> * Move gpu dependencies to a seperate install Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com> * Remove unused import Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com> * Switch to placeholder import that raises on usage Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com> * Remove deprecated utils usage Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com> * Add cuML attribution Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com> * Safe import tests, improve install instruction, update gha workflow Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com> * Fix pytests due to loc bug Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com> * update install instructions Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com> * Raise on non module-not-found errors, update logging Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com> * Update logging to not change root logger Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com> --------- Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com> Signed-off-by: Nicole Luo <nluo@nvidia.com> * Fix failing GPU tests with latest pandas bump (#41) Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com> Signed-off-by: Nicole Luo <nluo@nvidia.com> * Adds Nemo Curator K8s example (#40) * [K8s]: Adds a helper script to create a dask cluster on k8s and includes instructions for how to a Curator workload on k8s Signed-off-by: Terry Kong <terryk@nvidia.com> * black formatting Signed-off-by: Terry Kong <terryk@nvidia.com> * big_english -> my_dataset Signed-off-by: Terry Kong <terryk@nvidia.com> * 24.01 -> 24.03 default container Signed-off-by: Terry Kong <terryk@nvidia.com> * Add help kwarg to all flags Signed-off-by: Terry Kong <terryk@nvidia.com> * Clarify why venv is needed Signed-off-by: Terry Kong <terryk@nvidia.com> * fix precommit failures Signed-off-by: Terry Kong <terryk@nvidia.com> --------- Signed-off-by: Terry Kong <terryk@nvidia.com> Signed-off-by: Nicole Luo <nluo@nvidia.com> * Move common dedup utils and remove unused code (#42) * Refactor common utils and remove unused code Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com> * More cleanup Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com> * More updates/shuffling Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com> * Move gpu_dedup scripts into subfolder Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com> * Remove gpu_deduplication subfolder Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com> * Add readme to fuzzy dedup scripts section Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com> * Fix typo and relative links Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com> * Remove legacy script entrypoints Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com> * Remove legacy scripts and add init file Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com> * Update GpuDeduplication.rst Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com> --------- Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com> Signed-off-by: Nicole Luo <nluo@nvidia.com> * Fix lang id example (#37) * Fix lang id example Signed-off-by: Ryan Wolf <rywolf@nvidia.com> * Add classifier unit tests Signed-off-by: Ryan Wolf <rywolf@nvidia.com> * Add test for failure Signed-off-by: Ryan Wolf <rywolf@nvidia.com> * Remove failure test Signed-off-by: Ryan Wolf <rywolf@nvidia.com> --------- Signed-off-by: Ryan Wolf <rywolf@nvidia.com> Signed-off-by: Nicole Luo <nluo@nvidia.com> * Add dataset blending tool (#32) * Add initial dataset blending function Signed-off-by: Ryan Wolf <rywolf@nvidia.com> * Add blend unit tests Signed-off-by: Ryan Wolf <rywolf@nvidia.com> * Add self parameter Signed-off-by: Ryan Wolf <rywolf@nvidia.com> * Fix return type of blend dataset Signed-off-by: Ryan Wolf <rywolf@nvidia.com> * Fix blending tests Signed-off-by: Ryan Wolf <rywolf@nvidia.com> * Change assert statement for very uneven blend Signed-off-by: Ryan Wolf <rywolf@nvidia.com> * Fix key error Signed-off-by: Ryan Wolf <rywolf@nvidia.com> * Add proper proportion blending test Signed-off-by: Ryan Wolf <rywolf@nvidia.com> * Add four dataset blend and clarify docs Signed-off-by: Ryan Wolf <rywolf@nvidia.com> * Add shuffle module Signed-off-by: Ryan Wolf <rywolf@nvidia.com> * Add blend example and tests Signed-off-by: Ryan Wolf <rywolf@nvidia.com> * Fix random method name Signed-off-by: Ryan Wolf <rywolf@nvidia.com> * Wrap return type in DocumentDataset Signed-off-by: Ryan Wolf <rywolf@nvidia.com> * Save result of column drop Signed-off-by: Ryan Wolf <rywolf@nvidia.com> * Change equality check for shuffle tests Signed-off-by: Ryan Wolf <rywolf@nvidia.com> * Fix expected order after shuffle Signed-off-by: Ryan Wolf <rywolf@nvidia.com> * Add more documents to shuffle test Signed-off-by: Ryan Wolf <rywolf@nvidia.com> * Add assert statement Signed-off-by: Ryan Wolf <rywolf@nvidia.com> * Add within partition shuffle Signed-off-by: Ryan Wolf <rywolf@nvidia.com> * Refactor add rand column for shuffle Signed-off-by: Ryan Wolf <rywolf@nvidia.com> * Fix filename tests Signed-off-by: Ryan Wolf <rywolf@nvidia.com> * Add determinism handling for shuffle Signed-off-by: Ryan Wolf <rywolf@nvidia.com> * Change numpy random function Signed-off-by: Ryan Wolf <rywolf@nvidia.com> * Fix tests with new random method Signed-off-by: Ryan Wolf <rywolf@nvidia.com> * Remove length call from blending Signed-off-by: Ryan Wolf <rywolf@nvidia.com> * Improve scaling of blending function Signed-off-by: Ryan Wolf <rywolf@nvidia.com> * Fix blend tests Signed-off-by: Ryan Wolf <rywolf@nvidia.com> * Add blending script Signed-off-by: Ryan Wolf <rywolf@nvidia.com> * Add additional file paths call Signed-off-by: Ryan Wolf <rywolf@nvidia.com> * Add documentation Signed-off-by: Ryan Wolf <rywolf@nvidia.com> * Reformat docs Signed-off-by: Ryan Wolf <rywolf@nvidia.com> * Remove backticks Signed-off-by: Ryan Wolf <rywolf@nvidia.com> * Add context manager for shuffle tests Signed-off-by: Ryan Wolf <rywolf@nvidia.com> * Add better deterministic shuffle path Signed-off-by: Ryan Wolf <rywolf@nvidia.com> * Update documentation and reset index Signed-off-by: Ryan Wolf <rywolf@nvidia.com> --------- Signed-off-by: Ryan Wolf <rywolf@nvidia.com> Signed-off-by: Nicole Luo <nluo@nvidia.com> * High level fuzzy duplicates module (#46) * Initial pass at fuzzy dedup api Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com> * Update deprecated shuffle arg Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com> * dask_cuda gpu only import Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com> * Move fuzzy_dedup imports to optional Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com> * more tests Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com> * Move FuzzyDeDupConfig to it's own class Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com> * Add example script and config file, fix typo Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com> * Remove slurm examples for gpu dedup Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com> * Add config module Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com> * Rename FuzzyDeDupConfig and minhash_length to FuzzyDuplicatesConfig, num_hashes Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com> * Add comments and update example Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com> * Write to same format as input in fuzzy dedup example Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com> --------- Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com> Signed-off-by: Nicole Luo <nluo@nvidia.com> * Fix indexing in PII Modifier (#55) * Fix pii index issue Signed-off-by: Ryan Wolf <rywolf@nvidia.com> * Add sequential wrapper Signed-off-by: Ryan Wolf <rywolf@nvidia.com> * Fix pii tests Signed-off-by: Ryan Wolf <rywolf@nvidia.com> --------- Signed-off-by: Ryan Wolf <rywolf@nvidia.com> Signed-off-by: Nicole Luo <nluo@nvidia.com> * Disable string conversion globally (#56) Signed-off-by: Ryan Wolf <rywolf@nvidia.com> Signed-off-by: Nicole Luo <nluo@nvidia.com> * Fix issue #43 (empty files creation) and improve reading/writing speed (#57) This commit fixes issue #43 (empty files created when invoking reshard_jsonl method at nemo_curator.utils.file_utils.py) by double-checking the files size after being generated, and deleting them with size zero. In addition to that, I have noticed there is no need to parse to JSON object the content of the different lines, which should be already in json format. By removing that extra-parsing, there is a significant speed up in the execution of this method. Signed-off-by: Miguel Martínez <26169771+miguelusque@users.noreply.github.com> Signed-off-by: Nicole Luo <nluo@nvidia.com> * [Tutorials] Add a tutorial for PEFT data curation (#45) This PR adds a new tutorial to demonstrate data curation for PEFT use-cases. Signed-off-by: Mehran Maghoumi <Maghoumi@users.noreply.github.com> Signed-off-by: Nicole Luo <nluo@nvidia.com> * Only import PII constants during Curator import (#61) * Move PII constants to a seperate file that does not import presidio/spacy and other GPU dependencies Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com> * Add comment around import, move constant import to global scope Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com> --------- Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com> Signed-off-by: Nicole Luo <nluo@nvidia.com> * Deleting links Signed-off-by: Nicoel Luo <nluo@nvidia.com> Signed-off-by: Nicole Luo <nluo@nvidia.com> * Update tutorials/single_node_tutorial/single_gpu_tutorial.ipynb Co-authored-by: Ryan Wolf <ryantwolf1@gmail.com> Signed-off-by: nicoleeeluo <157772168+nicoleeeluo@users.noreply.github.com> Signed-off-by: Nicole Luo <nluo@nvidia.com> * Update tutorials/single_node_tutorial/single_gpu_tutorial.ipynb Co-authored-by: Ryan Wolf <ryantwolf1@gmail.com> Signed-off-by: nicoleeeluo <157772168+nicoleeeluo@users.noreply.github.com> Signed-off-by: Nicole Luo <nluo@nvidia.com> * Update tutorials/single_node_tutorial/single_gpu_tutorial.ipynb Co-authored-by: Ryan Wolf <ryantwolf1@gmail.com> Signed-off-by: nicoleeeluo <157772168+nicoleeeluo@users.noreply.github.com> Signed-off-by: Nicole Luo <nluo@nvidia.com> * Update tutorials/single_node_tutorial/single_gpu_tutorial.ipynb Co-authored-by: Ryan Wolf <ryantwolf1@gmail.com> Signed-off-by: nicoleeeluo <157772168+nicoleeeluo@users.noreply.github.com> Signed-off-by: Nicole Luo <nluo@nvidia.com> * Update tutorials/single_node_tutorial/single_gpu_tutorial.ipynb Co-authored-by: Ryan Wolf <ryantwolf1@gmail.com> Signed-off-by: nicoleeeluo <157772168+nicoleeeluo@users.noreply.github.com> Signed-off-by: Nicole Luo <nluo@nvidia.com> * Update tutorials/single_node_tutorial/single_gpu_tutorial.ipynb Co-authored-by: Ryan Wolf <ryantwolf1@gmail.com> Signed-off-by: nicoleeeluo <157772168+nicoleeeluo@users.noreply.github.com> Signed-off-by: Nicole Luo <nluo@nvidia.com> * Update tutorials/single_node_tutorial/single_gpu_tutorial.ipynb Co-authored-by: Ryan Wolf <ryantwolf1@gmail.com> Signed-off-by: nicoleeeluo <157772168+nicoleeeluo@users.noreply.github.com> Signed-off-by: Nicole Luo <nluo@nvidia.com> * Update tutorials/single_node_tutorial/single_gpu_tutorial.ipynb Co-authored-by: Ryan Wolf <ryantwolf1@gmail.com> Signed-off-by: nicoleeeluo <157772168+nicoleeeluo@users.noreply.github.com> Signed-off-by: Nicole Luo <nluo@nvidia.com> * Update tutorials/single_node_tutorial/single_gpu_tutorial.ipynb Co-authored-by: Ryan Wolf <ryantwolf1@gmail.com> Signed-off-by: nicoleeeluo <157772168+nicoleeeluo@users.noreply.github.com> Signed-off-by: Nicole Luo <nluo@nvidia.com> * Update tutorials/single_node_tutorial/single_gpu_tutorial.ipynb Co-authored-by: Ryan Wolf <ryantwolf1@gmail.com> Signed-off-by: nicoleeeluo <157772168+nicoleeeluo@users.noreply.github.com> Signed-off-by: Nicole Luo <nluo@nvidia.com> * Update tutorials/single_node_tutorial/single_gpu_tutorial.ipynb Co-authored-by: Ryan Wolf <ryantwolf1@gmail.com> Signed-off-by: nicoleeeluo <157772168+nicoleeeluo@users.noreply.github.com> Signed-off-by: Nicole Luo <nluo@nvidia.com> * Update tutorials/single_node_tutorial/single_gpu_tutorial.ipynb Co-authored-by: Ryan Wolf <ryantwolf1@gmail.com> Signed-off-by: nicoleeeluo <157772168+nicoleeeluo@users.noreply.github.com> Signed-off-by: Nicole Luo <nluo@nvidia.com> * Fixed typo. Update content to lastest NeMo Curator version. Added fuzzy deduplication wrapper example Signed-off-by: Nicole Luo <nluo@nvidia.com> * Fixing Style Signed-off-by: Nicole Luo <nluo@nvidia.com> * Updating container version Signed-off-by: Nicole Luo <nluo@nvidia.com> * Fixing style Signed-off-by: Nicole Luo <nluo@nvidia.com> * Update get_client() according to latest version; Update log path for map_bucket section Signed-off-by: Nicole Luo <nluo@nvidia.com> --------- Signed-off-by: Nicole Luo <nluo@nvidia.com> Signed-off-by: Ryan Wolf <rywolf@nvidia.com> Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com> Signed-off-by: Terry Kong <terryk@nvidia.com> Signed-off-by: Miguel Martínez <26169771+miguelusque@users.noreply.github.com> Signed-off-by: Mehran Maghoumi <Maghoumi@users.noreply.github.com> Signed-off-by: Nicoel Luo <nluo@nvidia.com> Signed-off-by: nicoleeeluo <157772168+nicoleeeluo@users.noreply.github.com> Co-authored-by: Ryan Wolf <rywolf@nvidia.com> Co-authored-by: Ayush Dattagupta <ayushdg95@gmail.com> Co-authored-by: Terry Kong <terrycurtiskong@gmail.com> Co-authored-by: Miguel Martínez <26169771+miguelusque@users.noreply.github.com> Co-authored-by: Mehran Maghoumi <Maghoumi@users.noreply.github.com> Co-authored-by: Ryan Wolf <ryantwolf1@gmail.com>

Move PII constants to a seperate file that does not import presidio/s…

d80f40e

…pacy and other GPU dependencies Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com>

ayushdg requested a review from VibhuJawa May 10, 2024 20:16

VibhuJawa approved these changes May 10, 2024

View reviewed changes

nemo_curator/modifiers/pii_modifier.py Outdated Show resolved Hide resolved

nemo_curator/pii/algorithm.py Show resolved Hide resolved

Add comment around import, move constant import to global scope

37c2b17

Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com>

ayushdg marked this pull request as ready for review May 13, 2024 18:10

ayushdg merged commit 38d8ce7 into NVIDIA:main May 13, 2024
3 checks passed

ayushdg mentioned this pull request May 15, 2024

[BUG] OOM errors while running Deduplication #59

Closed

ayushdg deleted the gpu-spacy-import-oom branch June 3, 2024 17:18

This pull request was closed.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Only import PII constants during Curator import #61

Only import PII constants during Curator import #61

ayushdg commented May 10, 2024 •

edited

Loading

VibhuJawa left a comment

VibhuJawa commented May 13, 2024

Only import PII constants during Curator import #61

Only import PII constants during Curator import #61

Conversation

ayushdg commented May 10, 2024 • edited Loading

VibhuJawa left a comment

Choose a reason for hiding this comment

VibhuJawa commented May 13, 2024

ayushdg commented May 10, 2024 •

edited

Loading