Automatically fetch and parse complete feed list from wiki #59

Closed
humphd opened this issue Nov 6, 2019 · 10 comments

@humphd
Contributor

humphd commented Nov 6, 2019

At the moment we're using a hack to read feed URLs from a text file. This is fine for our initial efforts, but we should really be pulling those feed URLs from https://wiki.cdot.senecacollege.ca/wiki/Planet_CDOT_Feed_List.

Looking at the markup in that page, we need to grab the contents of the single pre element on the page:

<pre>
################# Failing Feeds Commented Out [Start] #################

#Feed excluded due to getaddrinfo ENOTFOUND s-aleinikov.blog.ca s-aleinikov.blog.ca:80
#[http://s-aleinikov.blog.ca/feed/atom/posts/]
#name=Sergey Aleinikov
...
</pre>

We then need to lightly process this:

  • ignore lines beginning with # (comments)
  • read URLs out of the [url] format (surrounded in square brackets)
  • read the person's name from name=User Name
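
The three rules above could be sketched roughly as follows. This is only an illustration, not the project's actual parser: the function name `parseFeedList` and the `{ name, url }` record shape are assumptions, and the sample input uses a made-up feed URL.

```javascript
// Hypothetical helper (not from the project): turn the feed-list text
// into an array of { name, url } records.
function parseFeedList(text) {
  const feeds = [];
  let currentUrl = null;

  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();

    // Rule 1: ignore blank lines and comments (lines beginning with #).
    if (line === '' || line.startsWith('#')) continue;

    // Rule 2: a URL is written as [url], surrounded by square brackets.
    const urlMatch = line.match(/^\[(.+)\]$/);
    if (urlMatch) {
      currentUrl = urlMatch[1].trim();
      continue;
    }

    // Rule 3: the person's name follows as name=User Name.
    // Allowing spaces around "=" is an assumption, to be a little lenient.
    const nameMatch = line.match(/^name\s*=\s*(.+)$/);
    if (nameMatch && currentUrl) {
      feeds.push({ name: nameMatch[1].trim(), url: currentUrl });
      currentUrl = null;
    }
  }

  return feeds;
}

// Usage with a made-up entry after a commented-out one.
const sample = [
  '#[http://s-aleinikov.blog.ca/feed/atom/posts/]',
  '#name=Sergey Aleinikov',
  '[http://example.com/feed]',
  'name=Jane Doe',
].join('\n');

console.log(parseFeedList(sample));
```

Pairing each name with the most recent URL keeps the parser to a single pass, at the cost of silently skipping a name that has no URL before it.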

Later we might decide to store this in a db or something, but let's begin by pulling this down from the cdot wiki.

@l1qu1dated
Contributor

I would like to work on this.

@humphd
Contributor Author

humphd commented Nov 6, 2019

Go for it, Igor. You might be able to get away with a regex for this, or you might decide to use a Node-based DOM parser to extract the <pre>'s innerText.
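
As a rough sketch of the regex option, assuming the page contains a single pre element with no nested markup, something like this might be enough. The name `extractPreText` is hypothetical, and a real DOM parser would be more robust against unusual markup.

```javascript
// Hypothetical helper: return the text inside the first <pre> element,
// or null if the HTML has none.
function extractPreText(html) {
  // [\s\S]*? matches across newlines, lazily, up to the first </pre>.
  const match = html.match(/<pre\b[^>]*>([\s\S]*?)<\/pre>/i);
  if (!match) return null;

  // Decode the few HTML entities likely to show up in URLs and names.
  // &amp; goes last so that "&amp;lt;" is not decoded twice.
  return match[1]
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&quot;/g, '"')
    .replace(/&amp;/g, '&');
}

// Usage with a made-up page body.
const page = '<html><body><pre>\n[http://a.example/feed?x=1&amp;y=2]\nname=A\n</pre></body></html>';
console.log(extractPreText(page));
```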

@l1qu1dated
Contributor

I was thinking of using the request and cheerio libraries to fetch the page and traverse the DOM. It might also be a good idea to start using express.js at this point.

@humphd
Contributor Author

humphd commented Nov 6, 2019

Don't use request, as it's a dead project. Use bent (see https://github.com/mikeal/bent), and look at my code in the worker for downloading feeds.

For parsing the DOM, I'd use https://www.npmjs.com/package/jsdom vs. cheerio, since it's also better maintained.

And you won't need express for this. We're just going to feed this data into the queue, like I do now in the src/index.js file.

@l1qu1dated
Contributor

I created a PR for the parser. Currently it prints the data to the console, but later we can save it to JSON or a db. I commented out the code that saves the data to the txt file; it can be removed later or kept for testing.

@l1qu1dated
Contributor

Also, some of the feeds were written a bit differently from the rest, so I added a check to remove unnecessary tags. This one, for example:

[href="https://amddeeb.wordpress.com/feed/]
name = Ahmed Deeb

It has "href=" inside the square brackets. I'm not sure how many more mistakes like that are in the feed list, because it's huge and I didn't go through the whole thing. If anyone notices another one, you're welcome to add a check, or let me know and I'll add one.
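
A check for this kind of mistake might look like the sketch below. The helper names `cleanFeedUrl` and `isHttpUrl` are hypothetical and this is not the code from the PR; it only covers the stray `href=` and unbalanced quotes seen in the example above, and uses Node's built-in `URL` class to flag anything else that still does not parse.

```javascript
// Hypothetical helper: tidy up the text found between the square brackets.
function cleanFeedUrl(bracketContents) {
  return bracketContents
    .trim()
    .replace(/^href\s*=\s*/i, '') // drop a stray leading href=
    .replace(/^["']+|["']+$/g, ''); // drop stray quotes at either end
}

// Hypothetical helper: true only for strings that parse as http(s) URLs,
// so other malformed entries can be logged instead of queued.
function isHttpUrl(value) {
  try {
    const { protocol } = new URL(value);
    return protocol === 'http:' || protocol === 'https:';
  } catch (err) {
    return false;
  }
}

// Usage with the malformed entry quoted above.
const cleaned = cleanFeedUrl('href="https://amddeeb.wordpress.com/feed/');
console.log(cleaned, isHttpUrl(cleaned));
```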

@mskuybeda
Contributor

@humphd I will write the functionality that receives the data parsed by @cryolis and sends it to the Redis db.

@l1qu1dated
Contributor

I managed to push my commits to the upstream/issue#59 branch, but they are not shown in the PR that was created. Should I create another PR?

@c3ho
Contributor

c3ho commented Nov 14, 2019

@cryolis try pushing your branch to origin instead

@l1qu1dated
Contributor

I think I did it, but it pushed about 20 commits along with it.

@humphd humphd closed this as completed in 674ec97 Nov 15, 2019
menghif pushed a commit to menghif/telescope that referenced this issue Mar 8, 2022
Co-authored-by: Renovate Bot <bot@renovateapp.com>
tpmai22 added a commit that referenced this issue Mar 8, 2022
* Initial work

* Docs and bodyParser

* fix: correct typo and add missing quotation mark

* Move router onto Satellite instance

* Use built-in body parsing with Express

* Add tests for body parsers

* Add GitHub CI + README badge

* Fix package-lock.json sync with package.json

* Fix README build badge URL

* Add release workflow

* 1.0.1

* Use standard tag format for npm with v prefix

* 1.0.2

* Release yaml fix

* 1.0.3

* Explicitly set package public for npm publish

* 1.0.4

* Remove private field completely in order to publish to npm

* 1.0.5

* Switch org name to @senecacdot

* 1.0.6

* Update lock file

* 1.0.7

* Expose Router from package

* 1.1.0

* Switch from new Router() to Router()

* 1.1.1

* Use env variables to start apm monitoring sooner

* 1.2.0

* Switch to ELASTIC_APM_SERVER_URL, better 404 reporting, refactor Router()

* 1.3.0

* Add beforeParsers and beforeRouter options with tests

* 1.4.0

* Add pino-colada for debug logging

* Improve logging, use ELASTIC_APM_SERVICE_NAME env var, add router option to ctor

* 1.5.0

* Remove pino-tiny dep

* Fix logger picking logic on startup

* 1.5.1

* Document healthCheck and add more tests

* 1.5.2

* Add default favicon support

* 1.6.0

* Update README install instructions, deps

* 1.6.1

* Add JWT validation, tests, and update docs

* 1.7.0

* Ported Hash to Satellite

* Removed Redundant Code, Added Comment Block, Fixed Import

* Re-add crypto

* Add test for req.user

* Refactor into src/, breakup middleware authenticate vs. authorize, remove favicon

* 1.8.0

* Init Prettier Commit

* Adds the createError module for use in Telescope (#5)

* Fixed CreateError module, use http-errors

* removed merge conflict errors

* Added Docs in README

Co-authored-by: David Humphrey <david.andrew.humphrey@gmail.com>

* Finish prettier integration

* Fix workflows

* Prettier for jest.config.js

* Update deps, fix prettier-check on windows

* 1.8.5

* Specify main entry point in package.json

* 1.8.6

* Add .husky directory and pre-commit hook

* Fix #1930

* Support credentials for HTTPS vs. HTTP server

* 1.8.7

* Don't install Husky on postinstall

* 1.8.8

* Add support for generating a service token

* 1.9.0

* Updated redis and added ping test

* Updated redis export

* Fixed Redis test case

* 1.10.0

* isAuthorized() always takes a function with req, user params

* 1.11.0

* 1.12.0

* Initial Elastic client code

* Updated elastic constructor, add initial tests

* Fixed elastic search client, mock elastic connection

* Updated README.md with Elastic() info

* 1.13.0

* Add shutDown() to allow killing connections

* 1.14.0

* Add automatic, graceful shutdown for Redis and Elastic clients

* Update deps for 1.14.0

* 1.15.0

* Initial exported Fetch() function to Satellite

* Updated exports to require node-fetch instead of it being in a separate file

* Updated spelling to 'fetch' and updated tests to use nock

* Removed done() from tests and moved node-fetch to be a dependency vs dev-dependency

* Update lock file

* 1.16.0

* chore: include nodejs 16 in the CI build matrix

* feat: add auto-opt-out of FLoC

* 1.17.0

* Add eslint to satellite

* adding and configuring eslint

* Fixing linting errors

* configuring anti-trojan-source plugin

* adding lint to pre-commit hook

* Removing ts from eslint run and removing comment

* removing unused function validateAuthorizationOptions

* Adding no-unused-vars override to test.js and removing the override from config

* Adding ESLint to CI runs

* Integrating eslint with the release workflow

* Fix Dependencies:
    * Remove elastic-apm-node
    * Remove @elastic/ecs-pino-format
    * Update Jest (To fix deprecated dependencies)

* * Update pino
 * Switch from express-pino-logger to pino-http
 * Standardize Dependencies

* Configure Renovate (#23)

* Configure renovate bot


Co-authored-by: Renovate Bot <bot@renovateapp.com>
Co-authored-by: Duke Manh <manhducdkcb@gmail.com>

* chore(deps): update dependency pretty-quick to v3.1.3

* fix(deps): update dependency express to v4.17.2

* fix(deps): update dependency node-fetch to v2.6.7

* fix(deps): update dependency @elastic/elasticsearch-mock to v0.3.1

* fix(deps): update dependency http-errors to v1.8.1

* chore(deps): update dependency eslint to v8.7.0

* chore(deps): update dependency eslint-plugin-anti-trojan-source to v1.1.0

* chore(deps): update dependency eslint-plugin-jest to v25.7.0

* chore(deps): update dependency husky to v5.2.0

* Switch npm to pnpm

* Release v1.18.0

* 1.20.0

* 1.21.0

* Use --no-git-checks with pnpm publish to avoid failure on CI

* 1.22.0

* remove pre-commit

* Release v1.23.0

* fix(deps): update dependency pino-pretty to v7.5.1

* fix(deps): update dependency pino to v7.6.5

* chore(deps): update dependency eslint to v8.8.0

* chore(deps): update dependency nock to v13.2.2

* bump prettier to v2.5.1 and run prettier on entire tree

* fix(deps): update dependency @elastic/elasticsearch to v7.16.0

* fix(deps): update dependency @godaddy/terminus to v4.10.2 (#48)

Co-authored-by: Renovate Bot <bot@renovateapp.com>

* fix(deps): update dependency express-jwt to v6.1.0

* fix(deps): update dependency ioredis to v4.28.3 (#50)

Co-authored-by: Renovate Bot <bot@renovateapp.com>

* fix(deps): update dependency ioredis-mock to v5.9.1

* fix-renovate-bot

* Refactoring elastic.js so mock is exported for tests
- Adding tests for mock Elastic()
- Added mock Elastic() description in README.md

* Release v1.24.0

* chore(deps): update dependency nock to v13.2.4

* fix(deps): update dependency ioredis to v4.28.4 (#55)

Co-authored-by: Renovate Bot <bot@renovateapp.com>

* chore(deps): update dependency jest to v27.5.0 (#56)

Co-authored-by: Renovate Bot <bot@renovateapp.com>

* fix(deps): update dependency ioredis to v4.28.5

* fix(deps): update dependency @elastic/elasticsearch to v7.17.0

* chore(deps): update dependency jest to v27.5.1 (#59)

Co-authored-by: Renovate Bot <bot@renovateapp.com>

* chore(deps): update dependency eslint to v8.9.0 (#60)

Co-authored-by: Renovate Bot <bot@renovateapp.com>

* changed all uses of SECRET -> JWT_SECRET

* Release v1.25.0

* Adding more tests for createError

* fix(deps): update dependency express to v4.17.3

* fix(deps): update dependency pino to v7.8.0 (#66)

Co-authored-by: Renovate Bot <bot@renovateapp.com>

* adding ES error cases to createError

* Release v.1.26.0

* fix(deps): update dependency express-jwt to v6.1.1

* chore(deps): update dependency eslint to v8.10.0

* fix(deps): update dependency ioredis-mock to v7

* fix(deps): update dependency ioredis-mock to v7.1.0

* fix(deps): update dependency pino-pretty to v7.5.3

* chore(deps): update dependency husky to v7

* chore(deps): update dependency eslint-plugin-jest to v26

* fix(deps): update dependency helmet to v5

* set default values for status and argToSend so they're not undefined

* Delete unneeded config files from Satellite repo

Co-authored-by: David Humphrey <david.andrew.humphrey@gmail.com>
Co-authored-by: Josue <josue.quilon-barrios@senecacollege.ca>
Co-authored-by: Metropass <moho472@gmail.com>
Co-authored-by: Mo <58116522+Metropass@users.noreply.github.com>
Co-authored-by: Abdulbasid Guled <guled.basid@gmail.com>
Co-authored-by: Josue <manekenpix@fastmail.com>
Co-authored-by: dhillonks <kunwarvir@hotmail.com>
Co-authored-by: Kevan-Y <58233223+Kevan-Y@users.noreply.github.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: Renovate Bot <bot@renovateapp.com>
Co-authored-by: Duke Manh <manhducdkcb@gmail.com>
Co-authored-by: Cindy Le <cindyledev@gmail.com>
Co-authored-by: AmasiaNalbandian <amasia.nalbandian@mitel.com>
Co-authored-by: Amasia <77639637+AmasiaNalbandian@users.noreply.github.com>
Co-authored-by: rclee91 <32626950+rclee91@users.noreply.github.com>
Co-authored-by: Jia Hua Zou <jiahua.zou1@gmail.com>
Co-authored-by: Joel Azwar <joel_azwar@yahoo.com>
Co-authored-by: Anatoliy Serputoff <65831678+aserputov@users.noreply.github.com>
Co-authored-by: tpmai <thienphuoc.0108@gmail.com>

5 participants