Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

package torchtext #17129

Merged
merged 14 commits into from
Jan 20, 2023
Merged

package torchtext #17129

merged 14 commits into from
Jan 20, 2023

Conversation

h-vetinari
Copy link
Member

came up in conda-forge/pytorch-lightning-feedstock#76

Hopefully there is an end to all the pytorch flavours at some point 😅

@conda-forge-linter
Copy link

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipes/revtok, recipes/torchtext) and found it was in an excellent condition.

Comment on lines +1 to +4
# upstream did not publish any tags, and a performance-critical
# fix appears after the last bump (for 0.0.3). This is what e.g.
# torchtext uses upstream (installing through git), so we add a ".1"
{% set version = "0.0.3.1" %}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reason for this is https://github.com/pytorch/text/blob/v0.11.0-rc3/requirements.txt#L11, and wanting not to leave out a potentially very substantial performance improvement from jekbradbury/revtok#4 (the PR notes 100-fold improvement by interning strings in some cases), which is fact the only change after 0.0.3

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it might be better to patch, just to reference the pypi version but ok.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean use 0.0.3 and carry a patch for jekbradbury/revtok#4? I don't mind either way. That repo looks pretty dead otherwise (last commit 4.5 years ago), and I don't expect anyone to start using micro versions even if it does get revived, which is why I chose it.

@ngam
Copy link
Contributor

ngam commented Dec 1, 2021

@h-vetinari I have a question (I tried building torchtext and torchaudio, and failed; that's why I've been keeping an eye on your progress):

Can we perhaps have a clearer strategy about ecosystems such as pytorch and tensorflow? I think the current setup is really unproductive and it cannot be sustainable with this highly decentralized approach. I can give you a lot of practical examples and problems I ran into over the past few months dealing with pytorch and tensorflow from conda/conda-forge (and, with all the cuda stuff added in, it is a toxic mess).

I have been thinking, it might be time to rethink this and have a strategy for "ecosystem building", rather than "package building", so tensorflow and all its sprawling pieces can be built together more or less and the same for pytorch (we are still massively stuck on torchvision which has a bug that has consequences all over the place).

This will especially make sense for collections of packages which tend to be released together and pinned together (which seems to be the case with tensorflow and pytorch). So maybe we need to rethink this instead of dealing with these packages as isolated items to be built individually?

Copying @hmaarrfk who's been extremely helpful as I started getting involved with this recently.

@h-vetinari
Copy link
Member Author

h-vetinari commented Dec 1, 2021

Hey, thanks or taking an interest!

I have been thinking, it might be time to rethink this and have a strategy for "ecosystem building", rather than "package building", so tensorflow and all its sprawling pieces can be built together more or less and the same for pytorch

I don't think this is necessary. The hardest part is to get the packages built at all, but with the abstractions that conda offers, we have full control over things like rebuilding for ABI-relevant version updates, and this is something that pip simply cannot do well, so we see all the terrifying workarounds upstream.

In other words, IMO the infrastructure is there to do just that kind of "ecosystem building", only distributed over several feedstocks.

It would of course help help hugely if facebook/google would support conda-forge on the whole packaging effort - in both cases the main packages are a real pain to build and therefore a big bottleneck. facebook obviously has done a lot of its own work for the pytorch channel, so might be less inclined to change strategy. google hasn't done much beyond packaging manylinux wheels. In both cases their packages are a pretty unholy mess of vendored third_party code (that we have to get rid of as well as possible), and for google, there's lots of quirks from bazel on top of that.

Long story short: I think we don't need a big change in strategy, but more resources would help a lot, and especially a more aligned upstream maintainership.

we are still massively stuck on torchvision which has a bug that has consequences all over the place

Sorry I haven't had time to look at that. Will see how things go in the next few weeks.

I tried building torchtext and torchaudio, and failed; that's why I've been keeping an eye on your progress

torchtext is done on linux, osx is blocked on conda-forge/sentencepiece-feedstock#25 (which would also help improve linux); torchaudio is blocked on conda-forge/kaldi-feedstock#15.

@hmaarrfk
Copy link
Contributor

hmaarrfk commented Dec 2, 2021

I would suggest making two separate PRs. Seems like you ran out of space....

conda.CondaMultiError: [Errno 28] No space left on device

@h-vetinari
Copy link
Member Author

I would suggest making two separate PRs. Seems like you ran out of space....

Nah, there's no point, revtok is tiny. And it's not gonna be a problem once the builds are per-python-verion on the feedstock.

@ngam
Copy link
Contributor

ngam commented Dec 2, 2021

Long story short: I think we don't need a big change in strategy, but more resources would help a lot, and especially a more aligned upstream maintainership.

Fully agree on the upstream mess and resources. I hope you're right about strategy 😃

@stale
Copy link

stale bot commented May 1, 2022

Hi friend!

We really, really, really appreciate that you have taken the time to make a PR on conda-forge/staged-recipes! conda-forge only exists because people like you donate their time to build and maintain conda recipes for use by the community.

In an effort to maintain this repository and increase the signal-to-noise for open PRs, the maintainers of staged-recipes close excessively old PRs after six months. This PR will remain open for another month, and then will be closed.

If you'd like to keep it open, please comment/push and we will be happy to oblige! Note that very old PRs will likely need to be rebased on main so that they can be rebuilt with the most recent CI scripts. If you have any trouble, or we missed reviewing this PR in the first place (sorry!), feel free to ping the team using a special command in a comment on the PR to get the attention of the staged-recipes team.

Cheers and thank you for contributing to this community effort!

@stale stale bot added the stale will be closed in 30 days label May 1, 2022
@h-vetinari
Copy link
Member Author

Not stale.

@stale stale bot removed the stale will be closed in 30 days label May 1, 2022
@stale
Copy link

stale bot commented Oct 1, 2022

Hi friend!

We really, really, really appreciate that you have taken the time to make a PR on conda-forge/staged-recipes! conda-forge only exists because people like you donate their time to build and maintain conda recipes for use by the community.

In an effort to maintain this repository and increase the signal-to-noise for open PRs, the maintainers of staged-recipes close excessively old PRs after six months. This PR will remain open for another month, and then will be closed.

If you'd like to keep it open, please comment/push and we will be happy to oblige! Note that very old PRs will likely need to be rebased on main so that they can be rebuilt with the most recent CI scripts. If you have any trouble, or we missed reviewing this PR in the first place (sorry!), feel free to ping the team using a special command in a comment on the PR to get the attention of the staged-recipes team.

Cheers and thank you for contributing to this community effort!

@stale stale bot added the stale will be closed in 30 days label Oct 1, 2022
@h-vetinari
Copy link
Member Author

Not stale

@stale
Copy link

stale bot commented Nov 12, 2022

Hi again! About a month ago, we commented on this PR saying it would be closed in another month if it was still inactive. It has been a month and so now it is being closed. Thank you so much for making it in the first place and contributing to the community project that is conda-forge. If you'd like to reopen this PR, please feel free to do so at any time!

Cheers and have a great day!

@stale stale bot closed this Nov 12, 2022
@h-vetinari
Copy link
Member Author

@conda-forge/staged-recipes
This was closed by the stale-bot despite me commenting as requested. I've picked up this PR again after conda-forge/sentencepiece-feedstock#32 got merged - I can open a new one, but would be nice if it were reopened. :)

(I also just found out that torchtext now depends on torchaudio, weeee)

@h-vetinari h-vetinari mentioned this pull request Dec 18, 2022
6 tasks
@BastianZim BastianZim reopened this Dec 18, 2022
@stale stale bot removed the stale will be closed in 30 days label Dec 18, 2022
@giswqs giswqs mentioned this pull request Jan 18, 2023
10 tasks
@giswqs
Copy link
Member

giswqs commented Jan 18, 2023

I am trying to publish autogluon on conda-forge. Have been stuck in the torchtext for a few days now. Would love to collaborate with you to get torchtext on conda-forge.

Relevant issues:

@conda-forge-webservices
Copy link

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipes/revtok, recipes/torchtext) and found it was in an excellent condition.

@h-vetinari
Copy link
Member Author

Have been stuck in the torchtext for a few days now

*looks at when this PR was opened* haha, welcome to the club 🙃

@h-vetinari
Copy link
Member Author

Would love to collaborate with you to get torchtext on conda-forge.

I've invited you as a collaborator on the fork so you'll be able to push commits into this PR. For now I've fixed some obvious issues myself, now that I realized that this doesn't depend on torchaudio after all (I think I got caught out by a comment when I previously tried to rebase the patch that disables the compilation of the third_party libs).

@giswqs
Copy link
Member

giswqs commented Jan 18, 2023

@h-vetinari Thank you for adding me. I have limited knowledge of CMake and packaging third-party libraries into conda-forge recipes. Still learning. Will try my best to contribute.

@giswqs
Copy link
Member

giswqs commented Jan 18, 2023

The error says cmake not available. It needs to be added to build requirements.

https://github.com/conda-forge/staged-recipes/pull/21740/files#diff-a755bd30ff51c30c36d2dbeb961e7e43a4a5e6826d8eea5328e3e1e15d793812R35

@h-vetinari h-vetinari force-pushed the torchtext branch 2 times, most recently from a3484da to 1a5ce24 Compare January 20, 2023 13:34
@h-vetinari
Copy link
Member Author

h-vetinari commented Jan 20, 2023

Since the pytorch 1.13 migration isn't through yet (and torchtext 0.14 seems to need the newest pytorch API), I dropped back to 0.13.1, and this seems to be building OK now (🤞).

Together with conda-forge/torchdata-feedstock#6, I'm planning to get to pytorch 1.13 / torchtext 0.14 ASAP after the feedstock is created.

PTAL @conda-forge/help-python-c

CC @conda-forge/pytorch-cpu @conda-forge/sentencepiece @xhochy @hmaarrfk @ngam

@hmaarrfk
Copy link
Contributor

Awesome work patching their build process to match what we need.

@giswqs
Copy link
Member

giswqs commented Jan 20, 2023

@h-vetinari Thank you very much for your hard work. Excited to see this come through.

So this will use torchtext v0.13.1, which is compatible with pytorch v1.12.1, right?

@h-vetinari
Copy link
Member Author

So this will use torchtext v0.13.1, which is compatible with pytorch v1.12.1, right?

For now yes. As soon as we have the feedstock (and conda-forge/torchdata-feedstock#6 merged), we can let the pytorch 1.13 migrator come through, and couple that with a bump to torchtext 0.14.0 (I now know already which patches are necessary).

@giswqs
Copy link
Member

giswqs commented Jan 20, 2023

So this will use torchtext v0.13.1, which is compatible with pytorch v1.12.1, right?

For now yes. As soon as we have the feedstock (and conda-forge/torchdata-feedstock#6 merged), we can let the pytorch 1.13 migrator come through, and couple that with a bump to torchtext 0.14.0 (I now know already which patches are necessary).

Awesome! Thank you. It would be great to make it compatible with pytorch v1.12.1, as some downstream packages (e.g., autogluon) are still working on upgrading to use pytorch v1.13.1.
autogluon/autogluon#2739

@h-vetinari
Copy link
Member Author

It would be great to make it compatible with pytorch v1.12.1

As I said, this is what we're starting with here, so those consumers should be fine.

@h-vetinari
Copy link
Member Author

Awesome work patching their build process to match what we need.

Thanks! TBH, ~98% of the work for this PR was getting a usable (shared) lib out of sentencepiece. Glad that seems to have been achieved now. 🙃

@hmaarrfk hmaarrfk merged commit 908075e into conda-forge:main Jan 20, 2023
@h-vetinari h-vetinari deleted the torchtext branch January 20, 2023 14:45
@giswqs
Copy link
Member

giswqs commented Jan 20, 2023

Hooray!! @h-vetinari THANK YOU SO MUCH!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

7 participants