[GlobalPlayer,Whyp,DLF, Clipchamp] Add new extractors back-ported from yt-dlp #32138

Merged
merged 11 commits into ytdl-org:master from the df-extractor-miscbp-202305 branch
Jul 19, 2023

Conversation

dirkf
Contributor

@dirkf dirkf commented May 3, 2023

Boilerplate: yt-dlp code, new extractor+improvement

## Please follow the guide below
  • You will be asked some questions, please read them carefully and answer honestly
  • Put an x into all the boxes [ ] relevant to your pull request (like that [x])
  • Use the Preview tab to see what your pull request will actually look like

Before submitting a pull request make sure you have:

In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under Unlicense. Check one of the following options:

  • I am the original author of this code, except as below and I am willing to release it under Unlicense
  • Where I am not the original author of this code it was released under the same terms at https://github.com/yt-dlp.

What is the purpose of your pull request?

  • Bug fix
  • Improvement
  • New extractor
  • New feature

Description of your pull request and other information

This PR adds four new extractors back-ported from yt-dlp, together with some core improvements to support them.

Supporting improvements:

  • update traverse_obj() to support traversal of Iterables, sets and re.Matches (aka compat_re_Match): thanks @Grub4K; add convenience function T() as a (slight) abbreviation of set((x,)) since literal set {x} isn't in Py2.6
  • improve js_to_json(): thanks @ChillingPepper, @Grub4K, @pukkandan; add limited support for ! evaluation (eg, !!0 -> false)
  • support multiple groups in the group argument of InfoExtractor._search_regex(), etc
  • add keyword arguments to merge_dicts(): unblank (default True) disallows empty strings, and rev (default False) reverses the order of the merge_dicts() args list when set; rev=True matches {**dict1, **dict2, ...} (not in Py 2)
  • add search methods for Next/Nuxt.js from yt-dlp to InfoExtractor with tests
  • fix HTML5 type recognition and tests, from yt-dlp/yt-dlp@222a230, thanks @Lesmiscore
  • add _match_valid_url() class method to InfoExtractor and refactor
  • support Sequence of patterns in _VALID_URL.
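
The merge_dicts() keyword arguments in the list above can be illustrated with a small stand-in. This is only a sketch of the described behaviour, assuming "first valid value wins" merging; `merge_dicts_sketch` is a hypothetical name, not the function in youtube_dl/utils.py, and it uses Python 3 `str` where the real code would need a compat string type.

```python
def merge_dicts_sketch(*dicts, **kwargs):
    """Merge dicts, keeping the first non-None value seen for each key."""
    unblank = kwargs.get('unblank', True)   # assumed default, per the description
    rev = kwargs.get('rev', False)          # assumed default, per the description
    merged = {}
    for d in (reversed(dicts) if rev else dicts):
        for key, value in d.items():
            if value is None:
                continue
            if key not in merged:
                merged[key] = value
            elif unblank and merged[key] == '' and isinstance(value, str) and value:
                # an empty string already merged gives way to a non-empty one
                merged[key] = value
    return merged


a = {'title': '', 'id': '1'}
b = {'title': 'Song', 'id': '2'}

default = merge_dicts_sketch(a, b)                  # {'title': 'Song', 'id': '1'}
keep_blank = merge_dicts_sketch(a, b, unblank=False)  # {'title': '', 'id': '1'}
reversed_merge = merge_dicts_sketch(a, b, rev=True)   # {'title': 'Song', 'id': '2'}
```

With rev=True the last dict takes priority, which is why the result lines up with `{**a, **b}` for these inputs.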

An immediate compat fix to traverse_obj() is also included: fixes #32456.
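
As a rough illustration of the `T()` helper and of set keys in a traversal path, here is a self-contained sketch. `mini_traverse` and `to_int` are hypothetical names invented for this example; the real traverse_obj() supports far more (branching, Iterables, re.Match objects), so treat this as a simplified model that assumes a one-element set holds either a type (used as a filter) or a callable (used as a transformation).

```python
def T(x):
    """One-element set, written without the {x} literal that Py2.6 lacks."""
    return set((x,))


def to_int(s):
    # hypothetical converter used only in this example
    return int(s)


def mini_traverse(obj, path):
    """Follow `path` through nested dicts/lists; None if any step fails."""
    for key in path:
        if obj is None:
            return None
        if isinstance(key, set):
            (item,) = key  # exactly one element expected
            if isinstance(item, type):
                # a type acts as a filter
                obj = obj if isinstance(obj, item) else None
            else:
                # anything else is called as a transformation
                try:
                    obj = item(obj)
                except Exception:
                    obj = None
        elif isinstance(obj, dict):
            obj = obj.get(key)
        elif isinstance(obj, (list, tuple)) and isinstance(key, int):
            obj = obj[key] if -len(obj) <= key < len(obj) else None
        else:
            return None
    return obj


data = {'tracks': [{'duration': '187'}, {'duration': 'n/a'}]}

good = mini_traverse(data, ('tracks', 0, 'duration', T(to_int)))   # 187
bad = mini_traverse(data, ('tracks', 1, 'duration', T(to_int)))    # None
filtered = mini_traverse(data, ('tracks', T(dict)))                # None, a list is not a dict
```

Note that `T(int)` would act as a type filter in this model, not as a conversion, which is why a separate converter function is used.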

@dirkf dirkf changed the title from "Df extractor miscbp 202305" to "[GlobalPlayer,Whyp,DLF] Add new extractors back-ported from yt-dlp" May 3, 2023
@dirkf dirkf changed the title from "[GlobalPlayer,Whyp,DLF] Add new extractors back-ported from yt-dlp" to "[GlobalPlayer,Whyp,DLF, Clipchamp] Add new extractors back-ported from yt-dlp" May 3, 2023
Thanks Grub4k for these:
* traverse `Iterable`s, from yt-dlp/yt-dlp#6902, etc
* traverse `set` key for transformations/filters, `re.Match` group names, from
  yt-dlp/yt-dlp@776995b, etc
* traverse `re.Match`es, from yt-dlp/yt-dlp#5174
* always return list when branching, from yt-dlp/yt-dlp#5170
* support variable substitution, from https://github.com/yt-dlp/yt-dlp/pull/#521 etc,
  thanks ChillingPepper, Grub4k, pukkandan
* improve escape handling, from https://github.com/yt-dlp/yt-dlp/pull/#521
  thanks Grub4k
* support template strings from yt-dlp/yt-dlp#6623
  thanks Grub4k
* add limited `!` evaluation (eg, !!0 -> false, see tests)
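
The `!` evaluation mentioned in the last item can be pictured with a small stand-alone function. This is a sketch under stated assumptions, not the code in js_to_json(): `eval_js_not` is a hypothetical helper, and it only recognises the handful of literal operands listed in `_FALSY_LITERALS` as falsy.

```python
import re

# Literals treated as falsy in this sketch (other spellings of zero are not handled)
_FALSY_LITERALS = frozenset(('0', 'false', 'null', 'undefined', 'NaN', '""', "''"))


def eval_js_not(expr):
    """Reduce a run of leading `!` applied to a literal to 'true' or 'false'."""
    m = re.match(r'\s*((?:!\s*)+)([^!\s].*?)\s*$', expr)
    if m is None:
        return expr  # no leading `!`: leave the expression alone
    bangs, operand = m.groups()
    truthy = operand not in _FALSY_LITERALS
    if bangs.count('!') % 2:
        truthy = not truthy  # an odd number of `!` negates
    return 'true' if truthy else 'false'


double_not = eval_js_not('!!0')   # 'false'
single_not = eval_js_not('!0')    # 'true'
untouched = eval_js_not('0')      # '0'
```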
@dirkf dirkf force-pushed the df-extractor-miscbp-202305 branch 4 times, most recently from e3635f1 to c08f740 July 19, 2023 17:25
A couple of mods to ease yt-dlp back-ports:
* add kwargs to merge_dicts:
  `unblank` (default `True`: disallow empty string), `rev` (default `False`; `True` reverses the merge list)
* add `T(x)` shortcut for `{x}`, unsupported in Py2.6
* add _search_nextjs_data(), from yt-dlp/yt-dlp#1386
  thanks selfisekai
* add _search_nuxt_data(), from yt-dlp/yt-dlp#1921,
  thanks Lesmiscore, pukkandan
* add tests for the above
* also fix HTML5 type recognition and tests, from
  yt-dlp/yt-dlp@222a230,
  thanks Lesmiscore
* update extractors in PR using above, fix tests.
* inspect.getargspec is missing despite doc claiming backward compat
* replace with emulation of `Signature.bind()`
* API compatible with yt-dlp
* also support Sequence of patterns in _VALID_URL
* one place to compile _VALID_URL
* TODO: remove existing extractor shims
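
The last few items (a `_match_valid_url()` class method, a Sequence of patterns in `_VALID_URL`, and one place to compile it) might look roughly like the sketch below. `SketchExtractor` and the example.com patterns are hypothetical, and the caching attribute name `_VALID_URL_RE` is an assumption for illustration; the merged InfoExtractor code may differ in detail.

```python
import re


class SketchExtractor(object):
    # _VALID_URL may be a single pattern or a sequence of patterns
    _VALID_URL = (
        r'https?://(?:www\.)?example\.com/watch/(?P<id>\d+)',
        r'https?://(?:www\.)?example\.com/embed/(?P<id>\d+)',
    )

    @classmethod
    def _match_valid_url(cls, url):
        # Look in cls.__dict__ rather than using getattr(), so that a
        # subclass does not reuse the compiled patterns of its parent
        if '_VALID_URL_RE' not in cls.__dict__:
            patterns = cls._VALID_URL
            if isinstance(patterns, str):
                patterns = (patterns,)
            cls._VALID_URL_RE = tuple(re.compile(p) for p in patterns)
        for regex in cls._VALID_URL_RE:
            m = regex.match(url)
            if m:
                return m
        return None


match = SketchExtractor._match_valid_url('https://example.com/embed/42')
video_id = match.group('id')                                        # '42'
no_match = SketchExtractor._match_valid_url('https://example.org/') # None
```

Compiling once per class keeps the regex handling in a single place, so callers only need the match object.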
@dirkf dirkf force-pushed the df-extractor-miscbp-202305 branch from c08f740 to 4dae5c2 July 19, 2023 17:42
@dirkf dirkf merged commit b2ba24b into ytdl-org:master Jul 19, 2023
20 checks passed

Successfully merging this pull request may close these issues.

  • inspect.getargspec() is removed in Python 3.11
  • GlobalPlayer
2 participants