Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make torchtext available on conda-forge #2023

Closed
giswqs opened this issue Jan 13, 2023 · 30 comments
Closed

Make torchtext available on conda-forge #2023

giswqs opened this issue Jan 13, 2023 · 30 comments

Comments

@giswqs
Copy link

giswqs commented Jan 13, 2023

🚀 Feature

Motivation

Although torchtext is available on the Anaconda pytorch channel, it is not yet available on the conda-forge channel. Downstream packages of pytorch can't be made available on conda-forge until torchtext has been published on conda-forge.

Pitch

Making torchtext available on conda-forge will benefit downstream packages and numerous conda-forge users.

Alternatives

I am not aware of any alternatives.

Additional context

Relevant issue: autogluon/autogluon#612

@arturdaraujo
Copy link

arturdaraujo commented Jan 13, 2023

I fully agree! If any help is needed I'm here. Just a reminder that you only need to publish once, because the updates are automatically done. It's worth entering the conda-forge ecosystem and it has more than 20k packages

@arturdaraujo
Copy link

I'm trying to get a recipe using https://www.marcelotrevisani.com/grayskull which I use to get every recipe and I don't see why it doesn't work for this package.

@arturdaraujo
Copy link

@Nayef211 @parmeet @mthrok @osalpekar

The Conda-forge community is trying to publish this package, could any of you help us figure this out?

@arturdaraujo
Copy link

@ezyang @malfet @zou3519 @jerryzh168 @peterbell10

Sorry for mentioning you guys but you seem to be the main maintainers of PyTorch package, maybe any of you could give us some light on how to publish this on conda-forge. We are trying to publish torchtext as mentioned in autogluon/autogluon#612 (comment) for a while and any help here would be great.

We are trying to publish a package called autogluon and we were able to publish 11 dependencies packages, being torchtext the last one that we got stuck with. It seems awfully hard without help from one of the maintainers. The conda-forge community would really appreciate some support.

@arturdaraujo
Copy link

@giswqs comment from autogluon/autogluon#612 (comment)

torchtext is probably the most challenging one to resolve. I have opened an issue at conda-forge/staged-recipes#21740

torchtext is available on the pytorch channel, but not on the conda-forge channel. Unfortunately, the conda-forge recipe is not allowed to use the pytorch channel.
https://anaconda.org/pytorch/torchtext

Some of the conda recipes used to build pytorch are available at https://github.com/pytorch/builder/tree/main/conda.
It is quite complicated.

I am stuck on the torchtext recipe. Any help would be appreciated.

@malfet
Copy link
Contributor

malfet commented Jan 16, 2023

[edit] I see @giswqs already opened a PR that adds it to the forge

@arturdaraujo
Copy link

arturdaraujo commented Jan 16, 2023

That would be great! This is the main documentation about adding packages, but this is not very specific. https://conda-forge.org/docs/maintainer/adding_pkgs.html

I bet @giswqs could enlighten us here, but if he can't, I will talk to one of the maintainers from conda-forge who are more familiar with the details of this kind of transition to make torchtext easier to publish it.

@arturdaraujo
Copy link

[edit] I see @giswqs already opened a PR that adds it to the forge

That pr is about trying to merge to conda forge, but still a long way to go. In the documentation I sent there is a section about feedstock structure and another about publishing, which should give you an idea. We need to build a recipe using requirements but at the moment we can't do that due to the current structure, which makes it harder.

We are still trying figuring it out, but thanks for reaching out

@mthrok
Copy link
Contributor

mthrok commented Jan 17, 2023

Don't we have to first change the build process so that it supports building on top of third party libraries from separate package?

cc #1615

@giswqs
Copy link
Author

giswqs commented Jan 17, 2023

Yes, that's where we need help. Any guidance for changing the build process to support third party libraries?

Don't we have to first change the build process so that it supports building on top of third party libraries from separate package?

cc #1615

@giswqs
Copy link
Author

giswqs commented Jan 17, 2023

Can someone share the recipe that was used to build torchtext on the pytorch channel on Anaconda? I guess it would be a similar recipe for building it on conda-forge.

https://anaconda.org/pytorch/torchtext

@mthrok
Copy link
Contributor

mthrok commented Jan 17, 2023

Can someone share the recipe that was used to build torchtext on the pytorch channel on Anaconda? I guess it would be a similar recipe for building it on conda-forge.

https://anaconda.org/pytorch/torchtext

It's in packaging directory.

https://github.com/pytorch/text/tree/main/packaging/torchtext

@mthrok
Copy link
Contributor

mthrok commented Jan 17, 2023

Yes, that's where we need help. Any guidance for changing the build process to support third party libraries?

Don't we have to first change the build process so that it supports building on top of third party libraries from separate package?
cc #1615

Third party libraries are all configured here.
https://github.com/pytorch/text/blob/main/third_party/CMakeLists.txt

I think introducing the switch to alternate the behavior between build-from-source and search-existing-installation, then actually implementing the search mechanism is one way to achieve that.

@giswqs
Copy link
Author

giswqs commented Jan 17, 2023

@mthrok @malfet Would you be willing to be listed as a maintainer of the torchtext recipe on conda-forge? I need your help to make the recipe acceptable by conda-forge.

@giswqs
Copy link
Author

giswqs commented Jan 17, 2023

It appears that the recipe has some build scripts outside the torchtext recipe folder. For the conda-forge recipe, we can only add files within the recipe folder. We can't modify files outside the recipe folder.

image

@arturdaraujo
Copy link

arturdaraujo commented Jan 17, 2023

That's the main issue for not being able to publish torchtext package in conda-forge @mthrok

@mthrok
Copy link
Contributor

mthrok commented Jan 17, 2023

@arturdaraujo @giswqs

@mthrok @malfet Would you be willing to be listed as a maintainer of the torchtext recipe on conda-forge? I need your help to make the recipe acceptable by conda-forge.

I am supportive of this direction but I don't really know about how conda recipe works, so I don't think I am good fit for recipe maintainer.

It appears that the recipe has some build scripts outside the torchtext recipe folder. For the conda-forge recipe, we can only add files within the recipe folder. We can't modify files outside the recipe folder.

Those are files to setup environment in PyTorch CI, including downloading and installing Conda. And these scripts call the final conda build process. I imagine conda feed-stack CI infra already has equivalent setup mechanism. So that should not be a big issue. (but I can be wrong here)

That's the main issue for not being able to publish torchtext package in conda-forge

Looking at PyTorch CMakeLists.txt, it can be as simple as defining the option USE_SYSTEM_XXX, then call find_library when the option is enabled.

https://github.com/pytorch/pytorch/blob/master/cmake/Dependencies.cmake#L136-L141

Couple of questions

  • How to know if this approach works when USE_SYSYEM_XXX=true? (How do you test conda build with 3rd parties pulled from other conda packages?)
  • Where does conda feedstock pull the source code? (If we land the change on main branch, would you use it now?)

@mthrok
Copy link
Contributor

mthrok commented Jan 17, 2023

cc @Nayef211 @joecummings

@arturdaraujo
Copy link

arturdaraujo commented Jan 17, 2023

Once the package is published, most of the updates are done automatically. Except when you add new packages or remove them, however, this can easily be done by creating a PR that changes the recipe. It literally takes 2 to 3 minutes and once published, others can (and will) help to fix it if it has any issues...

The conda-forge community is very supportive and that's why it has more than 20.000 packages. It's easy to manage the updates because most of them are automatically

@arturdaraujo
Copy link

arturdaraujo commented Jan 17, 2023

Many times the structure of the package already published in pypi already allows conda-forge to be published in conda-forge channel by just using greyskull (package to create conda forge recipes).

It seems that we have to manually do that with torchtext, but I'm not sure we even can... that's why we are asking for help here

@giswqs
Copy link
Author

giswqs commented Jan 17, 2023

@mthrok

How to know if this approach works when USE_SYSYEM_XXX=true? (How do you test conda build with 3rd parties pulled from other conda packages?)

I don't have experience in this as I have not created a recipe with third party libraries before. Here is the conda-forge documentation regarding third party libraries.

Where does conda feedstock pull the source code? (If we land the change on main branch, would you use it now?)

Conda-forge can either pull the source from PyPI or GitHub. See the recipe here. For torchtext, I pulled the source code from the latest release https://github.com/pytorch/text/archive/refs/tags/v{{ version }}.tar.gz

@mthrok
Copy link
Contributor

mthrok commented Jan 18, 2023

I briefly looked into this, the general direction I think that can work is

  1. list requirements libsentencepiece, libutf8proc, re2, and double-conversion.
  2. Introduce the USE_SYSTEM_XXX switch to CMakeLists.txt.
  3. For each dependency, when USE_SYSTEM_XXX evaluates to True, locate the library .
    • for this CMake functions like find_package, find_library and find_path can be used.

For testing, one can simply install the dependencies from 1 via conda install -c conda-forge and
edit CMakeLists.txt until it works.

I do not have bandwidth to own this now. I think the ideal situation is that the logic for USE_SYSTEM_XXX
is developed by OSS community.

@arturdaraujo
Copy link

arturdaraujo commented Jan 18, 2023

@h-vetinari do you think that would be helpful for conda-forge/staged-recipes#17129? mthrok is one of the maintainers of pytorch that is helping us to deal with this

@h-vetinari
Copy link

@h-vetinari do you think that would be helpful for conda-forge/staged-recipes#17129? mthrok is one of the maintainers of pytorch that is helping us to deal with this

I already mentioned this in conda-forge/staged-recipes#21740:

[We] can also do this without having to patch things in upstream torchtext. It could start out as local patches in conda-forge that eventually get upstreamed (carrying patches is more of a last resort, but often a reality).

So yes, the @mthrok's proposal would be very helpful, but currently I'm focusing on debugging the whole build end-to-end, and then we can think about upstreaming patches in a way that's both acceptable for the project here and helpful for our packaging.

I'm pretty getting pretty far in conda-forge/staged-recipes#17129 now, currently running into a problem I commented about in #1886 (interestingly, GH does not create cross-references for links in a review, it seems...)

@giswqs
Copy link
Author

giswqs commented Jan 20, 2023

torchtext is now available on conda-forge. A big thank you to @h-vetinari for his tireless efforts in getting this published!!!
https://anaconda.org/conda-forge/torchtext

@giswqs giswqs closed this as completed Jan 20, 2023
@h-vetinari
Copy link

If anyone is interested, I pushed a branch with the changes that were necessary to get this to work for conda-forge.

There's several things around (not) building submodules, which are obviously not meant as patches that should be upstreamed directly (this is the part that @mthrok's proposal would cover much better by having some CMake switches like USE_SYSTEM_xxx).

There's other adaptations to our infrastructure that are not necessarily suited to this repository (yet), but perhaps someone is interested in taking those and banging them into shape. Not sure I have much time to do it myself, but anyone's free to take my patches (and I've signed the contributor license agreement already for other meta projects, so that should be fine).

@h-vetinari
Copy link

Also worth noting, we don't have support on windows yet, because windows support for pytorch in conda-forge is still missing, primarily due to a combination of CI constraints & consequent maintenance constraints.

@Nayef211
Copy link
Contributor

If anyone is interested, I pushed a branch with the changes that were necessary to get this to work for conda-forge.

There's several things around (not) building submodules, which are obviously not meant as patches that should be upstreamed directly (this is the part that @mthrok's proposal would cover much better by having some CMake switches like USE_SYSTEM_xxx).

There's other adaptations to our infrastructure that are not necessarily suited to this repository (yet), but perhaps someone is interested in taking those and banging them into shape. Not sure I have much time to do it myself, but anyone's free to take my patches (and I've signed the contributor license agreement already for other meta projects, so that should be fine).

@h-vetinari thanks for calling this out. Would you mind creating an issue to help us keep track of this? Also giving a very brief summary of the the things that were done differently in your branch (and their benefits) which could be adopted in the torchtext repo would be immensely helpful when we look into this in the near future! :)

@mthrok
Copy link
Contributor

mthrok commented Jan 24, 2023

Note that we can upstream the C++ 17 patch immediately and we should be doing that before 2.0 release branch cut as PyTorch raised the requirement to C++17.

See also: pytorch/audio#2973

@Nayef211
Copy link
Contributor

Note that we can upstream the C++ 17 patch immediately and we should be doing that before 2.0 release branch cut as PyTorch raised the requirement to C++17.

See also: pytorch/audio#2973

Done #2031

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants