Error when crawling answers; articles and ideas crawl fine #5

Open
66my opened this issue Apr 30, 2024 · 3 comments

Comments

66my commented Apr 30, 2024

Error output:

DevTools listening on ws://127.0.0.1:9922/devtools/browser/8b5cd6db-98dc-4859-a19b-586646e5eccd
[25540:10460:0430/152431.589:ERROR:fallback_task_provider.cc(127)] Every renderer should have at least one task provided by a primary task provider. If a "Renderer" fallback task is shown, it is a bug. If you have repro steps, please file a new bug and tag it as a dependency of crbug.com/739782.
[25540:10460:0430/152438.872:ERROR:fallback_task_provider.cc(127)] Every renderer should have at least one task provided by a primary task provider. If a "Renderer" fallback task is shown, it is a bug. If you have repro steps, please file a new bug and tag it as a dependency of crbug.com/739782.
Traceback (most recent call last):
  File "D:\24Python\zhihu_spider_selenium-master\crawler.py", line 1117, in <module>
    zhihu()
  File "D:\24Python\zhihu_spider_selenium-master\crawler.py", line 1053, in zhihu
    crawl_answers_links(driver, username)
  File "D:\24Python\zhihu_spider_selenium-master\crawler.py", line 177, in crawl_answers_links
    WebDriverWait(driver, timeout=10).until(lambda d: d.find_element(By.CLASS_NAME, "Pagination"))
  File "S:\condaenv\getdata310new\lib\site-packages\selenium\webdriver\support\wait.py", line 95, in until
    raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message:
Stacktrace:
        GetHandleVerifier [0x00007FF6D98FD8E2+35890]
        Microsoft::Applications::Events::FromJSON [0x00007FF6D98BACC2+1330002]
        Microsoft::Applications::Events::ILogManager::operator= [0x00007FF6D96AE137+5095]
        Microsoft::Applications::Events::GUID_t::GUID_t [0x00007FF6D96F4E7E+159950]
        Microsoft::Applications::Events::GUID_t::GUID_t [0x00007FF6D96F4F66+160182]
        Microsoft::Applications::Events::GUID_t::GUID_t [0x00007FF6D972FEF7+401735]
        Microsoft::Applications::Events::GUID_t::GUID_t [0x00007FF6D971474F+289183]
        Microsoft::Applications::Events::GUID_t::GUID_t [0x00007FF6D96EA6C7+117015]
        Microsoft::Applications::Events::GUID_t::GUID_t [0x00007FF6D972DAF1+392513]
        Microsoft::Applications::Events::GUID_t::GUID_t [0x00007FF6D9714373+288195]
        Microsoft::Applications::Events::GUID_t::GUID_t [0x00007FF6D96E9BEE+114238]
        Microsoft::Applications::Events::GUID_t::GUID_t [0x00007FF6D96E8DAC+110588]
        Microsoft::Applications::Events::GUID_t::GUID_t [0x00007FF6D96E97A1+113137]
        GetHandleVerifier [0x00007FF6D99939F4+650564]
        Microsoft::Applications::Events::FromJSON [0x00007FF6D97899BC+79948]
        Microsoft::Applications::Events::FromJSON [0x00007FF6D9862D4C+969692]
        Microsoft::Applications::Events::FromJSON [0x00007FF6D985B485+938773]
        GetHandleVerifier [0x00007FF6D99929B5+646405]
        Microsoft::Applications::Events::FromJSON [0x00007FF6D98C2E81+1363217]
        Microsoft::Applications::Events::FromJSON [0x00007FF6D98BE4F4+1344388]
        Microsoft::Applications::Events::FromJSON [0x00007FF6D98BE62B+1344699]
        Microsoft::Applications::Events::FromJSON [0x00007FF6D98B5B21+1309105]
        BaseThreadInitThunk [0x00007FF970E5257D+29]
        RtlUserThreadStart [0x00007FF971E6AA58+40]
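
The traceback above ends in a `TimeoutException` raised by the 10-second `WebDriverWait` for the `Pagination` element. As a minimal sketch, assuming the page is merely slow rather than missing that element, the wait could be given a longer timeout and retried a few times. `retry` and `wait_for_pagination` are hypothetical helpers written for illustration; they are not part of `crawler.py`.

```python
import time


def retry(action, attempts=3, delay=2.0, retry_on=(Exception,)):
    """Call action() until it succeeds or the attempts run out."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the original error
            time.sleep(delay)  # pause before trying again


def wait_for_pagination(driver, timeout=30):
    """Hypothetical helper: wait for the element named in the traceback."""
    # Selenium is imported lazily so retry() stays usable without it.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    return WebDriverWait(driver, timeout=timeout).until(
        lambda d: d.find_element(By.CLASS_NAME, "Pagination")
    )
```

With Selenium installed, a call might look like `retry(lambda: wait_for_pagination(driver), retry_on=(TimeoutException,))`, where `TimeoutException` comes from `selenium.common.exceptions`. The 30-second timeout and three attempts are arbitrary starting values, not tuned settings.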
Owner

ZouJiu1 commented Apr 30, 2024

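This has already been fixed. On my side, both the website and answers crawl normally. If the problem persists, please provide the URL you are crawling and the corresponding error output.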

Author

66my commented Apr 30, 2024

I tested again and it still fails. It is not a network problem.

python crawler.py --answer --links_scratch

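answers.txt was generated and the link addresses were scraped correctly, but the program crashed while generating the first answer.
In other words, the error was raised before any .md.pdf file was written for the first answer.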

Author

66my commented May 2, 2024

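The problem is solved. I believe the cause was that on Windows 11, Edge automatically enabled efficiency mode, so while the browser window sat in the background its traffic fell short of what the crawler needs, even though the network itself was fine. With the latest code and the browser window kept in the foreground the whole time, downloads work normally.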
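
For anyone who cannot keep the window in the foreground, here is a minimal sketch of starting Edge with Chromium command-line switches that relax background throttling. Whether these switches override Edge's efficiency mode on Windows 11 is an assumption that was not tested in this thread; the confirmed workaround remains keeping the window in the foreground. `apply_background_switches` and `build_edge_options` are hypothetical helper names, not functions from `crawler.py`.

```python
# Chromium switches that relax throttling of background windows and tabs.
BACKGROUND_SWITCHES = (
    "--disable-background-timer-throttling",
    "--disable-backgrounding-occluded-windows",
    "--disable-renderer-backgrounding",
)


def apply_background_switches(options):
    """Add each switch to any options object that has add_argument()."""
    for switch in BACKGROUND_SWITCHES:
        options.add_argument(switch)
    return options


def build_edge_options():
    """Hypothetical helper: Edge options with the switches applied."""
    # Imported lazily so the module loads even without Selenium installed.
    from selenium.webdriver.edge.options import Options

    return apply_background_switches(Options())
```

The result would be passed to the driver as `webdriver.Edge(options=build_edge_options())`.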
