1722303056697 | ERROR | Parsed article missing critical content #53

Open
TrumanXia opened this issue Jul 31, 2024 · 1 comment
Labels
unsupported-site Sites that reject requests from Slurp or use scripts that prevent Slurp from working

Comments

@TrumanXia

First of all, thank you for your help.

Site: https://www.toutiao.com/article/6745730819765043720/

Log:

1722303056697 | ERROR | Parsed article missing critical content
  • Caller: SlurpPlugin.slurp (plugin:slurp:12537:12)
null
1722303071813 | DEBUG | onValidate called, no changes detected
  • Caller: HTMLDivElement.<anonymous> (app://obsidian.md/app.js:1:2995116)
{
  "hash": 1309301853
}
1722303119651 | ERROR | Parsed article missing critical content
  • Caller: SlurpPlugin.slurp (plugin:slurp:12537:12)
null
1722303124069 | DEBUG | onValidate called, no changes detected
  • Caller: t.onOpen (app://obsidian.md/app.js:1:2997362)
{
  "hash": 1309301853
}
1722303283706 | DEBUG | attempting to parse prop metadata
  • Caller: SlurpPlugin.slurp (plugin:slurp:12540:30)
{
  "enabled": true,
  "custom": false,
  "_key": "link",
  "_idx": 0,
  "id": "link",
  "metaFields": [
    "url",
    "og:url",
    "parsely-link",
    "twitter:url"
  ],
  "defaultIdx": 0,
  "defaultKey": "link",
  "description": "Page URL provided or a permalink discovered in metadata."
}
1722303283706 | DEBUG | found prop elements
  • Caller: SlurpPlugin.slurp (plugin:slurp:12540:30)
"url"
"meta[name=\"url\"], meta[property=\"{s}\"], meta[itemprop=\"{s}\"], meta[http-equiv=\"{s}\"]"
{}
1722303283706 | DEBUG | found prop elements
  • Caller: SlurpPlugin.slurp (plugin:slurp:12540:30)
"og:url"
"meta[name=\"og:url\"], meta[property=\"{s}\"], meta[itemprop=\"{s}\"], meta[http-equiv=\"{s}\"]"
{}
1722303283706 | DEBUG | found prop elements
  • Caller: SlurpPlugin.slurp (plugin:slurp:12540:30)
"parsely-link"
"meta[name=\"parsely-link\"], meta[property=\"{s}\"], meta[itemprop=\"{s}\"], meta[http-equiv=\"{s}\"]"
{}
1722303283706 | DEBUG | found prop elements
  • Caller: SlurpPlugin.slurp (plugin:slurp:12540:30)
"twitter:url"
"meta[name=\"twitter:url\"], meta[property=\"{s}\"], meta[itemprop=\"{s}\"], meta[http-equiv=\"{s}\"]"
{}
1722303283706 | DEBUG | attempting to parse prop metadata
  • Caller: SlurpPlugin.slurp (plugin:slurp:12540:30)
{
  "enabled": true,
  "custom": false,
  "_key": "byline",
  "_idx": 1,
  "id": "byline",
  "metaFields": [
    "author",
    "article:author",
    "parsely-author",
    "cXenseParse:author"
  ],
  "defaultIdx": 1,
  "defaultKey": "byline",
  "description": "Name of the primary author or the first author detected."
}
1722303283706 | DEBUG | found prop elements
  • Caller: SlurpPlugin.slurp (plugin:slurp:12540:30)
"author"
"meta[name=\"author\"], meta[property=\"{s}\"], meta[itemprop=\"{s}\"], meta[http-equiv=\"{s}\"]"
{}
1722303283706 | DEBUG | found prop elements
  • Caller: SlurpPlugin.slurp (plugin:slurp:12540:30)
"article:author"
"meta[name=\"article:author\"], meta[property=\"{s}\"], meta[itemprop=\"{s}\"], meta[http-equiv=\"{s}\"]"
{}
1722303283706 | DEBUG | found prop elements
  • Caller: SlurpPlugin.slurp (plugin:slurp:12540:30)
"parsely-author"
"meta[name=\"parsely-author\"], meta[property=\"{s}\"], meta[itemprop=\"{s}\"], meta[http-equiv=\"{s}\"]"
{}
1722303283706 | DEBUG | found prop elements
  • Caller: SlurpPlugin.slurp (plugin:slurp:12540:30)
"cXenseParse:author"
"meta[name=\"cXenseParse:author\"], meta[property=\"{s}\"], meta[itemprop=\"{s}\"], meta[http-equiv=\"{s}\"]"
{}
1722303283706 | DEBUG | attempting to parse prop metadata
  • Caller: SlurpPlugin.slurp (plugin:slurp:12540:30)
{
  "enabled": true,
  "custom": false,
  "_key": "site",
  "_idx": 2,
  "id": "siteName",
  "metaFields": [
    "og:site_name",
    "page.content.source",
    "application-name",
    "apple-mobile-web-app-title",
    "twitter:site"
  ],
  "defaultIdx": 2,
  "defaultKey": "site",
  "description": "Website or publication name."
}
1722303283706 | DEBUG | found prop elements
  • Caller: SlurpPlugin.slurp (plugin:slurp:12540:30)
"og:site_name"
"meta[name=\"og:site_name\"], meta[property=\"{s}\"], meta[itemprop=\"{s}\"], meta[http-equiv=\"{s}\"]"
{}
1722303283707 | DEBUG | found prop elements
  • Caller: SlurpPlugin.slurp (plugin:slurp:12540:30)
"page.content.source"
"meta[name=\"page.content.source\"], meta[property=\"{s}\"], meta[itemprop=\"{s}\"], meta[http-equiv=\"{s}\"]"
{}
1722303283707 | DEBUG | found prop elements
  • Caller: SlurpPlugin.slurp (plugin:slurp:12540:30)
"application-name"
"meta[name=\"application-name\"], meta[property=\"{s}\"], meta[itemprop=\"{s}\"], meta[http-equiv=\"{s}\"]"
{}
1722303283707 | DEBUG | found prop elements
  • Caller: SlurpPlugin.slurp (plugin:slurp:12540:30)
"apple-mobile-web-app-title"
"meta[name=\"apple-mobile-web-app-title\"], meta[property=\"{s}\"], meta[itemprop=\"{s}\"], meta[http-equiv=\"{s}\"]"
{}
1722303283707 | DEBUG | found prop elements
  • Caller: SlurpPlugin.slurp (plugin:slurp:12540:30)
"twitter:site"
"meta[name=\"twitter:site\"], meta[property=\"{s}\"], meta[itemprop=\"{s}\"], meta[http-equiv=\"{s}\"]"
{}
1722303283707 | DEBUG | attempting to parse prop metadata
  • Caller: SlurpPlugin.slurp (plugin:slurp:12540:30)
{
  "enabled": true,
  "custom": false,
  "_key": "date",
  "_idx": 3,
  "_format": "d|YYYY-MM-DDTHH:mm",
  "id": "publishedTime",
  "metaFields": [
    "article:published_time",
    "parsely-pub-date",
    "datePublished",
    "article.published"
  ],
  "defaultIdx": 3,
  "defaultKey": "date",
  "description": "Date/time that the page was initially published.",
  "defaultFormat": "d|YYYY-MM-DDTHH:mm"
}
1722303283707 | DEBUG | found prop elements
  • Caller: SlurpPlugin.slurp (plugin:slurp:12540:30)
"article:published_time"
"meta[name=\"article:published_time\"], meta[property=\"{s}\"], meta[itemprop=\"{s}\"], meta[http-equiv=\"{s}\"]"
{}
1722303283707 | DEBUG | found prop elements
  • Caller: SlurpPlugin.slurp (plugin:slurp:12540:30)
"parsely-pub-date"
"meta[name=\"parsely-pub-date\"], meta[property=\"{s}\"], meta[itemprop=\"{s}\"], meta[http-equiv=\"{s}\"]"
{}
1722303283707 | DEBUG | found prop elements
  • Caller: SlurpPlugin.slurp (plugin:slurp:12540:30)
"datePublished"
"meta[name=\"datePublished\"], meta[property=\"{s}\"], meta[itemprop=\"{s}\"], meta[http-equiv=\"{s}\"]"
{}
1722303283707 | DEBUG | found prop elements
  • Caller: SlurpPlugin.slurp (plugin:slurp:12540:30)
"article.published"
"meta[name=\"article.published\"], meta[property=\"{s}\"], meta[itemprop=\"{s}\"], meta[http-equiv=\"{s}\"]"
{}
@inhumantsar
Owner

It looks like this site uses JavaScript-based forwarding. When you go to the URL you shared, a script waits for the page to fully load and then changes the window location to the "proper" URL. Only then does the content load.

The only way to support sites like this would be to allow every site to run arbitrary scripts within Obsidian. Unfortunately, this isn't something I can (or would) do, so there's no real fix for this. I will keep this issue open for now, though, as it's a good reminder to add some kind of pre-slurp check for sites that do this and to display a more appropriate error message.
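The pre-slurp check mentioned above could look something like the minimal sketch below. It is hypothetical: `preSlurpCheck`, `PreSlurpResult`, and the 200-character `minTextLength` threshold are illustrative names and values, not part of Slurp. It assumes the raw HTML has already been fetched without running any page scripts, and it uses simple regex heuristics, so obfuscated redirects will slip through and some unusual pages may be misreported. A page is flagged only when it carries almost no readable text outside `<script>`/`<style>` and also contains a redirect pattern.

```typescript
// Hypothetical pre-slurp check (not part of Slurp): detect pages whose raw HTML
// is only a redirect stub, so the user gets a clearer error than
// "Parsed article missing critical content".

// Script patterns that commonly indicate a client-side redirect.
const SCRIPT_REDIRECT_PATTERNS: RegExp[] = [
  // window.location = ..., location.href = ... (but not == / === comparisons)
  /\b(?:window\.|document\.|top\.|self\.)?location\s*(?:\.href)?\s*=(?!=)/i,
  // location.replace(...) / location.assign(...)
  /\blocation\.(?:replace|assign)\s*\(/i,
];

// <meta http-equiv="refresh" ...> redirects without script, but also leaves
// nothing for a parser to extract.
const META_REFRESH = /<meta[^>]+http-equiv\s*=\s*["']?refresh["']?/i;

// Rough count of readable characters once scripts, styles, and tags are removed.
function visibleTextLength(html: string): number {
  const withoutCode = html
    .replace(/<script\b[\s\S]*?<\/script>/gi, "")
    .replace(/<style\b[\s\S]*?<\/style>/gi, "");
  return withoutCode.replace(/<[^>]+>/g, " ").replace(/\s+/g, " ").trim().length;
}

// Concatenate the contents of all inline <script> elements.
function scriptBodies(html: string): string {
  const bodies: string[] = [];
  const re = /<script\b[^>]*>([\s\S]*?)<\/script>/gi;
  let m: RegExpExecArray | null;
  while ((m = re.exec(html)) !== null) bodies.push(m[1] ?? "");
  return bodies.join("\n");
}

type PreSlurpResult =
  | { ok: true }
  | { ok: false; reason: "script-redirect" | "meta-refresh"; message: string };

function preSlurpCheck(html: string, minTextLength = 200): PreSlurpResult {
  // Pages with real text are left to the normal parser.
  if (visibleTextLength(html) >= minTextLength) return { ok: true };

  if (META_REFRESH.test(html)) {
    return {
      ok: false,
      reason: "meta-refresh",
      message: "This page immediately redirects elsewhere and has no article content to extract.",
    };
  }

  const scripts = scriptBodies(html);
  if (SCRIPT_REDIRECT_PATTERNS.some((p) => p.test(scripts))) {
    return {
      ok: false,
      reason: "script-redirect",
      message:
        "This page redirects using JavaScript after it loads. Page scripts are not run, so there is no article content to extract.",
    };
  }

  // Nearly empty but no recognizable redirect: let the normal parser report it.
  return { ok: true };
}
```

Checking for low visible text first keeps ordinary articles, whose analytics scripts often read or touch `location`, from being misreported as redirect stubs.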

inhumantsar added the unsupported-site label (Sites that reject requests from Slurp or use scripts that prevent Slurp from working) on Aug 2, 2024