Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[eroprofile] Update regexes #23626

Closed
wants to merge 4 commits into from

Conversation

YetAnotherGhost
Copy link

Before submitting a pull request make sure you have:

In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under Unlicense. Check one of the following options:

  • I am the original author of this code and I am willing to release it under Unlicense
  • I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

  • Bug fix
  • Improvement
  • New extractor
  • New feature

Description of your pull request and other information

It updates the regular expressions following the eroprofile.com site update pointed out in #23200. The title is hopefully stable as long as they stick to <h1> elements; the thumbnail is a bit more fragile, but at least it's optional. (I could only find it in one place on the page, in the playlist bar above the video, together with other thumbnails.)

Note that the first video in _TESTS appears to no longer exist, but I assume that's unrelated.

@dstftw
Copy link
Collaborator

dstftw commented Jan 4, 2020

Keep old regexes as fallbacks.

(Should be squashed into the previous commit)
@YetAnotherGhost
Copy link
Author

Done (as separate commit, since I'm using the web UI). I hope the lines aren't too long.

thumbnail = self._search_regex(
r'onclick="showVideoPlayer\(\)"><img src="([^"]+)',
[r'<div class="playlistItem current">.*<img src="([^"]+)"', r'onclick="showVideoPlayer\(\)"><img src="([^"]+)'],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Part in the middle should not be greedy. Also use _parse_html5_media_entries as main path.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Part in the middle should not be greedy.

Done, thanks.

Also use _parse_html5_media_entries as main path.

I'm not sure what you mean… but this change doesn't touch the video_url so far?

(Should be squashed into the previous commits)
(Should be squashed into the previous commits)
@YetAnotherGhost
Copy link
Author

What’s the status on this? I’m done at the moment, as far as I’m concerned – if you’re waiting for anything else from me, let me know what’s still missing before merging this.

@dstftw
Copy link
Collaborator

dstftw commented Jan 10, 2020

I've already pointed out to use _parse_html5_media_entries as main path for formats and thumbnail extraction.

@YetAnotherGhost
Copy link
Author

Yes, and I already said (or tried to say, I guess it wasn’t clear) that a) I wasn’t sure what you meant by that (the “for formats and thumbnail extraction” makes it a bit clearer, thanks), and b) if it means what I think it means, I don’t think it belongs in this pull request, since it’s unrelated beyond affecting the same extractor.

I tried to implement this in YetAnotherGhost/youtube-dl#1 (was supposed to be a pull request against this repository but I didn’t pay attention when creating the pull request; can be moved later). Is it more or less what you were thinking of?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants