Exception with 6.0.7: "This should be implemented". #148
Comments
Thanks for the report @synergiator; the AllRecipes scraper should be picking up the necessary data from schema.org metadata in the page and returning it to us, but it looks like that isn't happening here. Do you have any time to dig into why the code falls into this path for the example recipe?
Hi there @jayaddison, I can now share what I have found so far.
The above test was run in a fresh Python virtual environment with just the package and its requirements installed. I then tried the same test from the path where all the other tests are, in the corresponding recipe-scrapers development virtual environment: no error. This gives me the impression that there is some wrongly coded dynamic loader, dependency injection, or similar (just a guess), but for the time being I can't tell more, so I'd better pass the word to the authors of this code. :-)
Thanks for the analysis @synergiator - this leads me to believe we have a regression somewhere between the 5.x.x branch (latest release 5.18.0) and 6.0.7. The largest change between the two is the introduction of schema.org handlers taking priority. Essentially the code will look for a schema.org (or JSON-LD) metadata property on the target page, and only if it does not find it will it revert to using the site-specific extraction methods. It seems that the example Apple Cake recipe doesn't contain such metadata, and so the code tries to invoke the AllRecipes extraction methods -- but then finds that they don't exist either, so it finally raises the "This should be implemented" exception. I have no sense of how many AllRecipes pages do or do not contain schema.org properties, unfortunately, so I don't know how many scrapes this would affect. There are a few options here:
It could be a while before we get to item 3. Let me know if this helps you out short-term :)
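For readers following along, the lookup order described in the comment above can be sketched roughly as below. This is only an illustration under stated assumptions, not the library's actual code: the names `SchemaOrgData`, `SketchScraper`, and `_site_total_time` are hypothetical, and the metadata is modelled as a plain dict.

```python
class SchemaOrgData:
    """Hypothetical stand-in for schema.org / JSON-LD metadata parsed from a page."""

    def __init__(self, data=None):
        self.data = data or {}

    def total_time(self):
        # Returns None when the page carries no such property.
        return self.data.get("totalTime")


class SketchScraper:
    """Hypothetical scraper: metadata first, site-specific method second."""

    def __init__(self, schema_data=None):
        self.schema = SchemaOrgData(schema_data)

    def _site_total_time(self):
        # Assumed situation from the thread: the site-specific extraction
        # method does not exist, so the fallback ends in an exception.
        raise NotImplementedError("This should be implemented.")

    def total_time(self):
        value = self.schema.total_time()
        if value is not None:
            return value  # schema.org metadata takes priority
        return self._site_total_time()  # fallback; raises in this sketch


with_metadata = SketchScraper({"totalTime": 45})
without_metadata = SketchScraper()
```

Here `with_metadata.total_time()` returns the metadata value, while `without_metadata.total_time()` raises `NotImplementedError`, which mirrors the failure reported for the page without metadata.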
Hi there, @jayaddison, thanks for the explanation! In the meantime I've found out that there is a quite large dataset from MIT with about 1M recipes, so I currently don't need to scrape anything myself. Regarding the issue, it's strange that the same site uses different templates to render recipes.
@synergiator version 6.1.0 has been released and will handle the exception "gracefully" (defaulting to 0 if no preparation time information is found). Thanks for pointing out the issue! The dataset you've linked is awesome! You can take a look at this issue for another set of recipes w/o needing to scrape a thing ;)
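As a rough picture of what "gracefully" defaulting to 0 could look like, here is a minimal sketch. It is an assumption for illustration only and does not reproduce the 6.1.0 change; `time_or_zero` and `missing_extractor` are hypothetical names.

```python
def time_or_zero(extract):
    """Call an extraction function, returning 0 when no time is available.

    `extract` is any zero-argument callable that returns a number of
    minutes, returns None, or raises NotImplementedError.
    """
    try:
        value = extract()
    except NotImplementedError:
        return 0  # no extraction method available: default to 0
    return value if value is not None else 0


def missing_extractor():
    # Hypothetical extractor that mimics the failure from this issue.
    raise NotImplementedError("This should be implemented.")


default_minutes = time_or_zero(missing_extractor)
known_minutes = time_or_zero(lambda: 30)
```

One trade-off of this design is that a caller can no longer tell "0 minutes" apart from "unknown", which may or may not matter for a given use.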
Hi there @hhursev! Thank you for the fix, and the link to the other data set!
Installed the package (6.0.7) and tried it with Python 3.6, following the README file:
Error: