Error in SEC Scraper #15

Open
imakemoneymoves opened this issue Aug 4, 2021 · 0 comments

Describe the bug
I encounter an error when grabbing the Filing XML Summary (referred to as "Second Block").

To Reproduce
Steps to reproduce the behavior:

  1. First Block

# import our libraries

import requests
import pandas as pd
from bs4 import BeautifulSoup

  2. Second Block

# define the base url needed to create the file url.

base_url = r"https://www.sec.gov"

# convert a normal url to a document url

normal_url = r"https://www.sec.gov/Archives/edgar/data/106040/000010604020000024/0000106040-20-000024.txt"
normal_url = normal_url.replace('-','').replace('.txt','/index.json')

# define a url that leads to a 10k document landing page

documents_url = r"https://www.sec.gov/Archives/edgar/data/106040/000010604020000024/index.json"

# request the url and decode it.

content = requests.get(documents_url).json()

for file in content['directory']['item']:

    # Grab the filing summary and create a new url leading to the file so we can download it.
    if file['name'] == 'FilingSummary.xml':

        xml_summary = base_url + content['directory']['name'] + "/" + file['name']

        print('-' * 100)
        print('File Name: ' + file['name'])
        print('File Path: ' + xml_summary)
  3. See error

JSONDecodeError Traceback (most recent call last)
in
10
11 # request the url and decode it.
---> 12 content = requests.get(documents_url).json()
13
14 for file in content['directory']['item']:

C:\ProgramData\Miniconda2\envs\tensorflow\lib\site-packages\requests\models.py in json(self, **kwargs)
898 # used.
899 pass
--> 900 return complexjson.loads(self.text, **kwargs)
901
902 @property

C:\ProgramData\Miniconda2\envs\tensorflow\lib\json\__init__.py in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
352 parse_int is None and parse_float is None and
353 parse_constant is None and object_pairs_hook is None and not kw):
--> 354 return _default_decoder.decode(s)
355 if cls is None:
356 cls = JSONDecoder

C:\ProgramData\Miniconda2\envs\tensorflow\lib\json\decoder.py in decode(self, s, _w)
337
338 """
--> 339 obj, end = self.raw_decode(s, idx=_w(s, 0).end())
340 end = _w(s, end).end()
341 if end != len(s):

C:\ProgramData\Miniconda2\envs\tensorflow\lib\json\decoder.py in raw_decode(self, s, idx)
355 obj, end = self.scan_once(s, idx)
356 except StopIteration as err:
--> 357 raise JSONDecodeError("Expecting value", s, err.value) from None
358 return obj, end

JSONDecodeError: Expecting value: line 1 column 1 (char 0)
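
"Expecting value: line 1 column 1 (char 0)" means the response body did not start with JSON at all, so the server most likely returned something else (for example an HTML page) on the failing runs. A minimal sketch for finding out what came back is shown here. The helper name `decode_json_or_explain` is hypothetical and not part of the original notebook; it only assumes a response object with `status_code`, `headers`, and `text`, which `requests` responses provide.

```python
import json


def decode_json_or_explain(response):
    """Parse the body as JSON, or raise a ValueError that says what came back instead.

    `response` is assumed to expose `status_code`, `headers`, and `text`,
    as a `requests.Response` does.
    """
    content_type = response.headers.get("Content-Type", "")
    try:
        return json.loads(response.text)
    except ValueError as err:  # json.JSONDecodeError is a subclass of ValueError
        preview = response.text[:200].strip()
        raise ValueError(
            f"Expected JSON but got status {response.status_code}, "
            f"Content-Type {content_type!r}, body starts with: {preview!r}"
        ) from err
```

In the Second Block this would replace the direct `.json()` call, e.g. `content = decode_json_or_explain(requests.get(documents_url))`, so a failing run reports the status code and the first characters of the body instead of only the decode error.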

Expected behavior

File Name: FilingSummary.xml
File Path: https://www.sec.gov/Archives/edgar/data/106040/000010604020000024/FilingSummary.xml

Screenshots
Not Applicable.

Additional context
For context, it's 50/50 whether it works. Sometimes when I run it, it successfully returns the file name and file path; other times I get the JSONDecodeError and have to restart the kernel and run it all again. By the way, I am a big fan. Are you working on any projects recently?
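
Since the failure is intermittent, one possible workaround is to retry the request with a short backoff instead of restarting the kernel. The sketch below is an assumption about the cause, not a confirmed fix: SEC's EDGAR access guidance asks automated clients to declare a User-Agent and to limit their request rate, and a rejected request would come back as a non-JSON body. The names `HEADERS` and `get_json_with_retry` are hypothetical, and the User-Agent value is a placeholder to be replaced with real contact details.

```python
import json
import time

# Assumption: placeholder identification string; replace with your own name and email.
HEADERS = {"User-Agent": "Sample Name sample@example.com"}


def get_json_with_retry(fetch, attempts=3, delay=1.0, sleep=time.sleep):
    """Call `fetch()` until its response body parses as JSON.

    `fetch` is a zero-argument callable returning an object with a `text`
    attribute (for example a `requests.Response`). Waits `delay`, then
    `2 * delay`, and so on between failed attempts.
    """
    last_error = None
    for attempt in range(attempts):
        response = fetch()
        try:
            return json.loads(response.text)
        except ValueError as err:  # body was not JSON, e.g. an HTML error page
            last_error = err
            if attempt < attempts - 1:
                sleep(delay * (2 ** attempt))
    raise RuntimeError(f"No JSON response after {attempts} attempts") from last_error
```

With `requests` this could be called as `content = get_json_with_retry(lambda: requests.get(documents_url, headers=HEADERS, timeout=30))`. Passing the request in as a callable keeps the retry logic independent of the HTTP library.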
