package torchtext #17129

Merged
merged 14 commits into from
Jan 20, 2023
21 changes: 21 additions & 0 deletions recipes/revtok/LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2017 Victor Zhong

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
40 changes: 40 additions & 0 deletions recipes/revtok/meta.yaml
@@ -0,0 +1,40 @@
# upstream did not publish any tags, and a performance-critical
# fix landed after the last version bump (to 0.0.3). This commit is what,
# e.g., torchtext uses upstream (installing through git), so we add a ".1"
{% set version = "0.0.3.1" %}
Comment on lines +1 to +4
Member Author

The reason for this is https://github.com/pytorch/text/blob/v0.11.0-rc3/requirements.txt#L11, together with not wanting to leave out a potentially very substantial performance improvement from jekbradbury/revtok#4 (the PR reports a 100-fold improvement in some cases by interning strings), which is in fact the only change after 0.0.3.
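As a generic illustration of the technique that jekbradbury/revtok#4 is described as using (this is not the revtok code, and the helper name `intern_tokens` is hypothetical): Python's `sys.intern` maps equal strings to one shared object, which saves memory when the same tokens recur many times and lets dictionary lookups on those keys short-circuit on identity.

```python
import sys


def intern_tokens(tokens: list[str]) -> list[str]:
    """Replace every token with the canonical, interned copy of that string."""
    return [sys.intern(token) for token in tokens]


# Build equal strings at runtime so they start out as separate objects.
raw = ["".join(["to", "ken"]) for _ in range(1000)]
interned = intern_tokens(raw)

print(interned == raw)                 # True: values are unchanged
print(len({id(t) for t in interned}))  # 1: all entries share one object
```

Whether this yields the reported speedup depends on how revtok uses the strings; the sketch only shows the deduplication itself.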

Contributor

It might be better to patch, just to reference the PyPI version, but OK.

Member Author

You mean use 0.0.3 and carry a patch for jekbradbury/revtok#4? I don't mind either way. That repo looks pretty dead otherwise (last commit 4.5 years ago), and I don't expect anyone to start using micro versions even if it does get revived, which is why I chose it.
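The `.1` suffix works because PEP 440 compares release segments component by component, padding the shorter one with zeros (conda's version ordering behaves the same way for plain numeric versions like these). The sketch below models only that numeric case, with hypothetical helper names, and ignores pre-releases, epochs, and local versions.

```python
from itertools import zip_longest


def release_key(version: str) -> tuple[int, ...]:
    """Split a purely numeric dotted version such as "0.0.3.1" into ints."""
    return tuple(int(part) for part in version.split("."))


def compare_release(a: str, b: str) -> int:
    """Return -1, 0 or 1, padding the shorter release with zeros (as PEP 440 does)."""
    for x, y in zip_longest(release_key(a), release_key(b), fillvalue=0):
        if x != y:
            return -1 if x < y else 1
    return 0


print(compare_release("0.0.3", "0.0.3.1"))  # -1: the ".1" build sorts after 0.0.3
print(compare_release("0.0.3.1", "0.0.4"))  # -1: ...but before any future 0.0.4
print(compare_release("0.0.3", "0.0.3.0"))  # 0: trailing zeros do not matter
```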

{% set commit = "f1998b72a941d1e5f9578a66dc1c20b01913caab" %}

package:
  name: revtok
  version: {{ version }}

source:
  url: https://github.com/jekbradbury/revtok/archive/{{ commit }}.tar.gz
  sha256: a7447fefb44fbe46140bfc337c6ec756b869c37f737fd18eaec1293d15865b8f

build:
  number: 0
  noarch: python
  script: {{ PYTHON }} -m pip install . --no-deps -vv

requirements:
  host:
    - python >=3.6
    - pip
  run:
    - python >=3.6
    - tqdm

test:
  imports:
    - revtok

about:
  home: https://github.com/jekbradbury/revtok
  license: MIT
  license_file: LICENSE
  summary: Reversible tokenization in Python.
  dev_url: https://github.com/jekbradbury/revtok

extra:
  recipe-maintainers:
    - h-vetinari
29 changes: 29 additions & 0 deletions recipes/torchtext/LICENSE
@@ -0,0 +1,29 @@
BSD 3-Clause License

Copyright (c) James Bradbury and Soumith Chintala 2016,
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.

* Neither the name of the copyright holder nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
89 changes: 89 additions & 0 deletions recipes/torchtext/meta.yaml
@@ -0,0 +1,89 @@
{% set version = "0.13.1" %}
{% set spacy_model = "en_core_web_sm" %}

package:
  name: torchtext
  version: {{ version }}

source:
  url: https://github.com/pytorch/text/archive/refs/tags/v{{ version }}.tar.gz
  sha256: 1f7de1fd8c6303ea13ef2aed48a6df67df5f92d1c4a6918253be7decd93753be
  patches:
    # don't build pytorch/text/tree/main/third_party; pull in from conda-forge
    - patches/0001-do-not-build-third_party-libs.patch
    # make sure we install from $PREFIX and into $SP_DIR/torchtext
    - patches/0002-fix-some-CMake-arguments-for-our-infrastructure.patch
    # remove spurious run requirements from installation requirements
    - patches/0003-remove-unnecessary-installation-requirements.patch
    - patches/0004-load-library-from-correct-place.patch
    - patches/0005-must-use-C-17-to-match-abseil.patch

build:
  number: 0
  # no pytorch on windows in conda-forge, see
  # https://github.com/conda-forge/pytorch-cpu-feedstock/issues/32
  skip: true  # [win]
  script: {{ PYTHON }} -m pip install . --no-deps -vv
  rpaths:
    - lib/
    - {{ SP_DIR }}/torch/lib
    - {{ SP_DIR }}/torchtext/lib

requirements:
  build:
    - {{ compiler('cxx') }}
    - sysroot_linux-64 2.17  # [linux64]
    - cmake
    - ninja
    - pkg-config
  host:
    - python
    - pip
    - numpy
    - pytorch
    # from pytorch/text/tree/main/third_party
    - double-conversion
    - libsentencepiece
    - libutf8proc
    - re2
  run:
    - python
    - nltk
    - requests
    - revtok ==0.0.3.1
    - sacremoses
    - spacy
    - tqdm

test:
  requires:
    - pip
    - pytest
    - expecttest
    - parameterized
    - spacy-model-{{ spacy_model }}
    - torchdata
  source_files:
    - test/
  imports:
    - torchtext
    - torchtext.datasets
    - torchtext.data
    - torchtext.nn
    - torchtext.vocab
  commands:
    - pip check
    # then run test suite
    - pytest test/ -v

about:
  home: https://pytorch.org/text
  license: BSD-3-Clause
  license_file: LICENSE
  summary: Data loaders and abstractions for text and NLP
  dev_url: https://github.com/pytorch/text

extra:
  recipe-maintainers:
    - h-vetinari
    - giswqs
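The `test: imports:` section of the torchtext recipe asks conda-build to import each listed module in the test environment. When debugging a failing build by hand, roughly the same check can be reproduced with `importlib.import_module`; the helper name `failed_imports` below is hypothetical, and catching all exceptions is a deliberate choice so that a broken shared library shows up alongside plain import errors.

```python
import importlib

# Module list copied from the `test: imports:` section of the torchtext recipe.
RECIPE_IMPORTS = [
    "torchtext",
    "torchtext.datasets",
    "torchtext.data",
    "torchtext.nn",
    "torchtext.vocab",
]


def failed_imports(modules: list[str]) -> dict[str, str]:
    """Import each module by name and map every failure to its error message."""
    failures: dict[str, str] = {}
    for name in modules:
        try:
            importlib.import_module(name)
        except Exception as exc:  # e.g. ImportError, or OSError from a shared library
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures
```

In the test environment, `failed_imports(RECIPE_IMPORTS)` returning `{}` corresponds to the imports test passing.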
53 changes: 53 additions & 0 deletions recipes/torchtext/patches/0001-do-not-build-third_party-libs.patch
@@ -0,0 +1,53 @@
From 187c7dc6a0a1a7ee2d87f4be7aae5a7ed1ff5a30 Mon Sep 17 00:00:00 2001
From: "H. Vetinari" <h.vetinari@gmx.com>
Date: Wed, 1 Dec 2021 21:07:42 +1100
Subject: [PATCH 1/5] do not build third_party libs

---
CMakeLists.txt | 5 ++++-
torchtext/csrc/CMakeLists.txt | 8 --------
2 files changed, 4 insertions(+), 9 deletions(-)

diff --git a/CMakeLists.txt b/CMakeLists.txt
index 1ead15d4..fe9f9636 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -63,5 +63,8 @@ endif()
# TORCH_CXX_FLAGS contains the same -D_GLIBCXX_USE_CXX11_ABI value as PyTorch
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wall ${TORCH_CXX_FLAGS}")

-add_subdirectory(third_party)
+find_package(re2 REQUIRED)
+find_package(double-conversion REQUIRED)
+find_package(sentencepiece REQUIRED)
+
add_subdirectory(torchtext/csrc)
diff --git a/torchtext/csrc/CMakeLists.txt b/torchtext/csrc/CMakeLists.txt
index 037f814d..658b9034 100644
--- a/torchtext/csrc/CMakeLists.txt
+++ b/torchtext/csrc/CMakeLists.txt
@@ -24,10 +24,6 @@ set(
set(
LIBTORCHTEXT_INCLUDE_DIRS
${PROJECT_SOURCE_DIR}
- ${PROJECT_SOURCE_DIR}/third_party/sentencepiece/src
- $<TARGET_PROPERTY:re2,INCLUDE_DIRECTORIES>
- $<TARGET_PROPERTY:double-conversion,INCLUDE_DIRECTORIES>
- $<TARGET_PROPERTY:utf8proc,INCLUDE_DIRECTORIES>
${TORCH_INSTALL_PREFIX}/include
${TORCH_INSTALL_PREFIX}/include/torch/csrc/api/include
)
@@ -123,10 +119,6 @@ if (BUILD_TORCHTEXT_PYTHON_EXTENSION)
set(
EXTENSION_INCLUDE_DIRS
${PROJECT_SOURCE_DIR}
- ${PROJECT_SOURCE_DIR}/third_party/sentencepiece/src
- $<TARGET_PROPERTY:re2,INCLUDE_DIRECTORIES>
- $<TARGET_PROPERTY:double-conversion,INCLUDE_DIRECTORIES>
- $<TARGET_PROPERTY:utf8proc,INCLUDE_DIRECTORIES>
${TORCH_INSTALL_PREFIX}/include
${TORCH_INSTALL_PREFIX}/include/torch/csrc/api/include
)
--
2.38.1.windows.1

@@ -0,0 +1,43 @@
From 630e2ae4ee1e921cc4c8c264a12790ea7e21abf0 Mon Sep 17 00:00:00 2001
From: "H. Vetinari" <h.vetinari@gmx.com>
Date: Wed, 18 Jan 2023 16:56:57 +1100
Subject: [PATCH 2/5] fix some CMake arguments for our infrastructure

---
build_tools/setup_helpers/extension.py | 10 ++++------
1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/build_tools/setup_helpers/extension.py b/build_tools/setup_helpers/extension.py
index 1f7236e4..71261577 100644
--- a/build_tools/setup_helpers/extension.py
+++ b/build_tools/setup_helpers/extension.py
@@ -60,12 +60,13 @@ class CMakeBuild(build_ext):
cfg = "Debug" if self.debug else "Release"

cmake_args = [
+ "-GNinja",
f"-DCMAKE_BUILD_TYPE={cfg}",
- f"-DCMAKE_PREFIX_PATH={torch.utils.cmake_prefix_path}",
- f"-DCMAKE_INSTALL_PREFIX={extdir}",
+ f"-DCMAKE_PREFIX_PATH={os.environ['PREFIX']}",
+ f"-DCMAKE_INSTALL_PREFIX={os.environ['SP_DIR'] + '/torchtext'}",
"-DCMAKE_VERBOSE_MAKEFILE=ON",
f"-DPython_INCLUDE_DIR={distutils.sysconfig.get_python_inc()}",
- f"-DTORCH_INSTALL_PREFIX:STRING={os.path.dirname(torch.__file__)}",
+ f"-DTORCH_INSTALL_PREFIX:STRING={os.environ['SP_DIR'] + '/torch'}",
"-DBUILD_TORCHTEXT_PYTHON_EXTENSION:BOOL=ON",
"-DRE2_BUILD_TESTING:BOOL=OFF",
"-DBUILD_TESTING:BOOL=OFF",
@@ -75,9 +76,6 @@ class CMakeBuild(build_ext):
]
build_args = ["--target", "install"]

- # Default to Ninja
- if "CMAKE_GENERATOR" not in os.environ or platform.system() == "Windows":
- cmake_args += ["-GNinja"]
if platform.system() == "Windows":
import sys

--
2.38.1.windows.1
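Patch 0002 replaces paths derived from the imported `torch` module with the conda-build environment variables `PREFIX` and `SP_DIR`. The sketch below isolates just those flags in a hypothetical helper, `conda_cmake_args`, that takes the environment as a mapping so the resulting arguments can be inspected without running a build; the example paths are placeholders, and the string concatenation with `/` mirrors the patch (the recipe skips Windows).

```python
from collections.abc import Mapping


def conda_cmake_args(env: Mapping[str, str], debug: bool = False) -> list[str]:
    """Hypothetical helper: the path-related CMake flags as patched in 0002."""
    cfg = "Debug" if debug else "Release"
    prefix = env["PREFIX"]  # host prefix that conda-build installs dependencies into
    sp_dir = env["SP_DIR"]  # site-packages directory inside that prefix
    return [
        "-GNinja",
        f"-DCMAKE_BUILD_TYPE={cfg}",
        f"-DCMAKE_PREFIX_PATH={prefix}",
        f"-DCMAKE_INSTALL_PREFIX={sp_dir}/torchtext",
        f"-DTORCH_INSTALL_PREFIX:STRING={sp_dir}/torch",
    ]


example_env = {
    "PREFIX": "/opt/host",
    "SP_DIR": "/opt/host/lib/python3.10/site-packages",
}
for arg in conda_cmake_args(example_env):
    print(arg)
```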

@@ -0,0 +1,25 @@
From 55627e85bccfccc9ad9344556e52f6b9a178034c Mon Sep 17 00:00:00 2001
From: "H. Vetinari" <h.vetinari@gmx.com>
Date: Wed, 18 Jan 2023 22:06:33 +1100
Subject: [PATCH 3/5] remove unnecessary installation requirements

---
setup.py | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/setup.py b/setup.py
index 080415f7..d3ca73d2 100644
--- a/setup.py
+++ b/setup.py
@@ -86,7 +86,7 @@ setup_info = dict(
description="Text utilities and datasets for PyTorch",
long_description=read("README.rst"),
license="BSD",
- install_requires=["tqdm", "requests", pytorch_package_dep, "numpy"],
+ install_requires=[pytorch_package_dep, "numpy"],
python_requires=">=3.7",
classifiers=[
"Programming Language :: Python :: 3.7",
--
2.38.1.windows.1

@@ -0,0 +1,25 @@
From 973b89da3291944637807f9908bc50bd0f02f772 Mon Sep 17 00:00:00 2001
From: "H. Vetinari" <h.vetinari@gmx.com>
Date: Wed, 18 Jan 2023 22:25:03 +1100
Subject: [PATCH 4/5] load library from correct place

---
torchtext/_extension.py | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/torchtext/_extension.py b/torchtext/_extension.py
index b6dbb07b..e6205be1 100644
--- a/torchtext/_extension.py
+++ b/torchtext/_extension.py
@@ -4,7 +4,7 @@ from pathlib import Path
import torch
from torchtext._internal import module_utils as _mod_utils

-_LIB_DIR = Path(__file__).parent / "lib"
+_LIB_DIR = Path(os.environ["SP_DIR"]) / "torch" / "lib"


def _get_lib_path(lib: str):
--
2.38.1.windows.1
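Patch 0004 points `_LIB_DIR` at `$SP_DIR/torch/lib`. As far as we know, `SP_DIR` is a variable that conda-build defines while building and testing a package, and an ordinary activated environment is not guaranteed to set it. A more defensive lookup could fall back to the running interpreter's site-packages directory. The sketch below (hypothetical function `torch_lib_dir`, not the merged patch) shows that variant using the standard `sysconfig` module.

```python
import os
import sysconfig
from collections.abc import Mapping
from pathlib import Path


def torch_lib_dir(env: Mapping[str, str] = os.environ) -> Path:
    """Hypothetical lookup: prefer $SP_DIR, else the interpreter's site-packages."""
    site_packages = env.get("SP_DIR") or sysconfig.get_paths()["platlib"]
    return Path(site_packages) / "torch" / "lib"


# Inside a conda-build run, SP_DIR points at the host site-packages directory.
print(torch_lib_dir({"SP_DIR": "/opt/host/lib/python3.10/site-packages"}))
# In an ordinary session without SP_DIR, the running interpreter's platlib is used.
print(torch_lib_dir({}))
```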

25 changes: 25 additions & 0 deletions recipes/torchtext/patches/0005-must-use-C-17-to-match-abseil.patch
@@ -0,0 +1,25 @@
From e4712395289276ec81a17a3367e42ab77eb7b8cd Mon Sep 17 00:00:00 2001
From: "H. Vetinari" <h.vetinari@gmx.com>
Date: Fri, 20 Jan 2023 17:16:20 +1100
Subject: [PATCH 5/5] must use C++17 to match abseil

---
CMakeLists.txt | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/CMakeLists.txt b/CMakeLists.txt
index fe9f9636..ab29a6da 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -27,7 +27,7 @@ if(env_cxx_standard GREATER -1)
"PyTorch requires -std=c++14. Please remove -std=c++ settings in your environment.")
endif()

-set(CMAKE_CXX_STANDARD 14)
+set(CMAKE_CXX_STANDARD 17)
set(CMAKE_C_STANDARD 11)

set(CMAKE_EXPORT_COMPILE_COMMANDS ON)
--
2.38.1.windows.1