import collections #1012
Comments
Hi @snickers2k - thanks for the request: let's see what we can figure out. Currently, as you may have realized/figured out already, we do have a scraper implemented for individual
@hhursev @jknndy @strangetom @brett do you have any thoughts on whether we could/should support retrieval of collections of recipes? I lean towards 'no', and could explain more about that, but my perspective could be different from others.
Linking to TandoorRecipes/recipes#3017 to keep a relationship between the two issue reports.
I think this could be a cool function to add but it definitely feels outside the scope of the core usage. A possible implementation would be to specify some specific format or header / something in the html that switches over to returning
I agree that it feels a bit out of scope, but I think it's worth investigating. There might also be an opportunity to combine this with handling of web pages with more than one recipe, which has come up a few times before. I can imagine something along the lines of:
    def scrape_collection(url: str, **options: Any) -> List[Scraper]:
        # Check the url is for a site that we support scraping collections for and raise Exception if not.
        # Do some magic to scrape collection
        ...
To determine if a scraper supports scraping collections, we could add a class method to
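As a rough illustration of the idea above, here is a minimal sketch of how a per-scraper opt-in could work. Everything in it is an assumption made for illustration and not an existing recipe-scrapers API: the `BaseScraper` stand-in class, the `supports_collections` and `collection_links` class methods, the `SCRAPERS` registry, the `CollectionsNotSupported` exception and the `example.com` site are all hypothetical, and the link discovery is stubbed out so no network access is needed.

```python
from typing import Any, Dict, List, Type
from urllib.parse import urlparse


class CollectionsNotSupported(Exception):
    """Raised when no registered scraper handles collections for a URL."""


class BaseScraper:
    """Hypothetical stand-in for a scraper base class (not the library's own)."""

    def __init__(self, url: str) -> None:
        self.url = url

    @classmethod
    def host(cls) -> str:
        raise NotImplementedError

    @classmethod
    def supports_collections(cls) -> bool:
        # Opt-in: a scraper does not support collections unless it says so.
        return False

    @classmethod
    def collection_links(cls, url: str, **options: Any) -> List[str]:
        raise NotImplementedError


class ExampleSite(BaseScraper):
    """Hypothetical scraper that opts in to collection scraping."""

    @classmethod
    def host(cls) -> str:
        return "example.com"

    @classmethod
    def supports_collections(cls) -> bool:
        return True

    @classmethod
    def collection_links(cls, url: str, **options: Any) -> List[str]:
        # Stub for the "magic": a real version would fetch and parse the page.
        return ["https://example.com/recipe/1", "https://example.com/recipe/2"]


# Hypothetical registry mapping host names to scraper classes.
SCRAPERS: Dict[str, Type[BaseScraper]] = {ExampleSite.host(): ExampleSite}


def scrape_collection(url: str, **options: Any) -> List[BaseScraper]:
    hostname = urlparse(url).hostname or ""
    if hostname.startswith("www."):
        hostname = hostname[len("www."):]
    scraper_cls = SCRAPERS.get(hostname)
    # Refuse unknown sites and sites whose scraper has not opted in.
    if scraper_cls is None or not scraper_cls.supports_collections():
        raise CollectionsNotSupported(f"Collections are not supported for {url}")
    links = scraper_cls.collection_links(url, **options)
    return [scraper_cls(link) for link in links]
```

With this shape, calling `scrape_collection` on a URL for an unregistered site, or for a site whose scraper keeps the default `supports_collections()`, raises `CollectionsNotSupported` rather than silently returning an empty list.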
I think we should be cautious about this. In fact, @jknndy - I'd suggest pausing until we check a few more details. In particular I have a memory about copyright law as it relates to recipes; basically that individual recipes are generally not copyrightable, but that the process of selecting and making available a number of recipes (like in cookbooks) could be. With that in mind, I want to be careful so that we don't provide functionality that could get either us or our users into murky legal territory. This page from 2002 is what I'm reading currently, although there may be more recent rulings / precedents: https://www.everything2.com/title/US+Copyright+for+Recipes
I don't think I'm enough of an expert in this to provide any definitive answers. I'll seek some expert/professional advice that I can share here about what the acceptable boundaries are, and from there we can figure out whether it's sensible to apply any restrictions (annoying though that could be). That could get complicated if any of the guidance is jurisdiction-specific, but I'll also try to find common universally-acceptable baselines.
I haven't yet found that advice, but do still intend to. While preparing for that, I'll write up some guidelines about how I think about some of the valid, intended and potentially more dubious/controversial use cases that could be possible in future, and how those affect the way I think about feature suggestions and how I review code. Those will be my opinions, but perhaps having a single place to view and adjust them will help build gradual consensus and give us something to refer back to when questions come up.
Here's what I've put together so far. One side-effect of this is: no, I don't think we should support collections of recipes. However, this is all just my opinion and perspective on the library. It needs some references/citations to be added if it is to become anything like a documented policy.
Proposed Scraper development guidelines
About me / introduction
Goals of these guidelines
Primarily: give recipe-scrapers the best chance of continuing to develop and thrive as a project. To do that, provide reasoning and guidelines for:
...in order to give us the best chance of maintaining a healthy relationship with content creators, part of which I believe involves being transparent and justifying how the library works and is developed. The library has been successful so far and I expect that our contributors each have their own unstated and slightly-varying ideas about these areas already. Writing the guidelines down should allow us to debate them, refer to them during code review / development / discussion, and adjust them as times change or when problems are encountered.
Concerns / risks
I believe that the main risks for the project are that we may infringe copyright unfairly ourselves, or that we may create circumstances that make it unreasonably straightforward for other entities to infringe copyright unfairly. Those circumstances would negatively affect recipe authors, and so I believe that our guidelines should reassure recipe authors about the way that we handle their recipe pages, while also providing a framework that contributors and maintainers can refer to (for example, when trying to decide whether a recipe website should be supported, or during decisions about how/whether to support specific fields on a scraper).
To a certain extent, food recipes -- lists of ingredients and a description of how to prepare them -- especially those that do not include any unpublished trade secrets -- are generally not protectable using copyright law in the legal systems I'm aware of. However, some web recipe authors do earn income from their websites, and so they have a reason to want to protect their recipes, and could reasonably be concerned about code that can provide access to them.
Advice can be found on the web about ways to create recipes that make them more likely to be copyrightable - some of this explains, for example, that photography and imagery are copyrightable, and some of it explains that it is possible to add distinct written elements in or around the core of the recipe to improve the chance that legal proceedings would consider it copyrightable. In addition, there is historical precedent that although a typical individual recipe may not be copyrightable, cookbooks -- that is, multiple recipes that have been collected, curated and published together -- can be.
Geographic considerations
Implementation of copyright law and exceptions to it vary by location (jurisdiction), and we have contributors and downstream library consumers who we can reasonably expect could be almost anywhere in the world (with perhaps a small number of exceptions, based on our source repository and packaging distribution host providers).
Proposed Guidelines
Content
Functionality
Acceptance Criteria
Took this request from tandoor, as they're using recipe-scrapers for this.
Is your feature request related to a problem? Please describe.
It would be great to have the ability to import whole collections into tandoor / recipe-scrapers for supported sites, for example:
https://www.chefkoch.de/rezeptsammlung/896538/Auflaeufe.html
Describe the solution you'd like
"one-click" import for collections
Describe alternatives you've considered
importing hundreds of favorite recipes by "hand" (tandoor bookmarklet)
thanks