Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes #20: Added basic summarize functionality #66

Merged
merged 13 commits into from
Nov 14, 2019

Conversation

ThomasLuu00
Copy link
Contributor

@ThomasLuu00 ThomasLuu00 commented Nov 7, 2019

Fixes #20

Added a summarizer that uses gensim's TextRank Summarizer in python.

I included code that allows it to be called from the node.js framework just to test it and to make it easier for whoever wants to call the function.

It isn't good, but it's a starting point. I will continue working on a document summary model.

Copy link
Contributor

@humphd humphd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not take on a python dependency if we don't need to. There are a few dozen text summarization modules on npm, let's try and use those.

Next, we can get rid of express here. We don't need a web server to do this, just a function interface. You can model your code on what @jatinAroraGit is doing in #46. Basically, we need to export an async run function that accepts some text, and gives back the result via a Promise.

Also, we should be able to include a test now that the Jest code is in place. You can either do that in this PR, or file another bug for someone to work on that adds one. The test should, at the least, call your function with some snippet of text, and expect to get back results of the form we'll expect at runtime.

Copy link
Contributor

@Immutablevoid Immutablevoid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to run npm eslint-fix and rerun npm test to see if all errors have been corrected.

Copy link
Contributor

@humphd humphd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is taking shape! I left some more comments.

const { SummarizerManager } = require('node-summarizer');

module.exports = async function (text, numSentences = 3) {
const Summarizer = new SummarizerManager(text, numSentences);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const summarizer = ...

Use lower case to start a variable name, only use a capital for a Constructor function name.

const fs = require('fs');
const summarize = require('./summarize');

describe('summarize tests...', () => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to convert this to Jest format (i.e., test() vs. describe())

Copy link
Contributor Author

@ThomasLuu00 ThomasLuu00 Nov 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am unable to nest tests in Jest, I use describe to define a scope just in case I wish to add other types of test cases into the file. It isn't really needed at the moment so I could just remove describe,.

const summarize = require('./summarize');

describe('summarize tests...', () => {
const text = fs.readFileSync('./summarizer/text.txt', 'utf-8');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should go in a before test setup call. You can also drop the readFileSync and just use readFile, since we can use async calls in a test setup function.


it('summarize returns string', async () => {
data = await summarize(text, 5);
expect(typeof data).toBe('string');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also test for the exact summary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's a good idea, but I'm am not certain I have the best approach to it. I think I could compare how similar the machine summary is to a human summary and have it pass if it meets a threshold, but this requires a good human summary for each blog input.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I mean is: run your code once, get the summary it produces, see if you think it makes sense. If it does, hard code that into your test, so we know in the future if it breaks for this given blog post text. This is really a regression test, to guard against future problems.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, I added the test.

Copy link
Contributor

@humphd humphd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really close.

.stylelintrc.js Outdated
@@ -6,7 +6,8 @@ module.exports = {
"LICENSE",
"*.*",
"images/*.*",
"!*.css"
"!*.css",
"summarizer/*.txt"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change makes me think we need to file a new issue to adjust how our stylelint works, such that it only looks at files in a particular directory, maybe src/frontend/styles/* or something, and then we don't collide with random files like this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Posted as issue #162.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I closed the issue as duplicate. Did not see the issue you posted.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you know, this is being fixed in #151, so why don't you reset your changes to this file, so you don't have merge conflicts with that:

git checkout master .stylelintrc.js

And you can go back to what it was. It will fail Travis, but we can ignore. By the way, you might want to add another review to #151 so we can land it.

@@ -0,0 +1,23 @@
const fs = require('fs').promises;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's move this into test/ where the other tests are.

.stylelintrc.js Outdated
@@ -6,7 +6,8 @@ module.exports = {
"LICENSE",
"*.*",
"images/*.*",
"!*.css"
"!*.css",
"summarizer/*.txt"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you know, this is being fixed in #151, so why don't you reset your changes to this file, so you don't have merge conflicts with that:

git checkout master .stylelintrc.js

And you can go back to what it was. It will fail Travis, but we can ignore. By the way, you might want to add another review to #151 so we can land it.

let expected;

beforeAll(async () => {
inputData = await fs.readFile('./summarizer/input.txt', 'utf-8');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move these files into test/, and/or just inline the contents of the files into this test file so we don't even have to read from disk.

return expect(predicted).toBe(expected);
});

test('summarize returns string', async () => expect(typeof predicted).toBe('string'));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests can be run in different order, so don't count on predicted to exist here. I'd re-calculate this in each test so you know it's good. Try not to have test() functions depend on shared state. It can lead to random failures.


test('summarize returns string', async () => expect(typeof predicted).toBe('string'));

test('summarize returns correct number of lines', () => expect(predicted.split('.').length).toBe(6));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does predicted look like that you have to call split? I don't follow the logic here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'Predicted' should be returning 5 sentences since I pass 5 sentences as a parameter for summarize. So I split 'predicted' in order to get the number of sentences. The module I use requires that a number of sentences be specified to produce a summary.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see. Defining a sentence as "ends with a ." could be problematic, if a sentence includes something like "I'm working on a file named foo.js in my course." Probably doing /\. {1,2}/ for "period followed by 1 or 2 spaces" would be more accurate. That still leaves out sentences that end with a ;. Anyway, this gets into the weed. If you want to tighten up your split() to deal with the periods too, that's fine. Otherwise, we'll go with what you have here.

Copy link
Contributor

@humphd humphd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, a few little things, and I think this is probably good.


test('summarize returns string', async () => expect(typeof predicted).toBe('string'));

test('summarize returns correct number of lines', () => expect(predicted.split('.').length).toBe(6));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see. Defining a sentence as "ends with a ." could be problematic, if a sentence includes something like "I'm working on a file named foo.js in my course." Probably doing /\. {1,2}/ for "period followed by 1 or 2 spaces" would be more accurate. That still leaves out sentences that end with a ;. Anyway, this gets into the weed. If you want to tighten up your split() to deal with the periods too, that's fine. Otherwise, we'll go with what you have here.

src/summarize.js Outdated
module.exports = async function (text, numSentences = 3) {
const summarizer = new SummarizerManager(text, numSentences);
return summarizer.getSummaryByRank().then(
(summaryObject) => Promise.resolve(summaryObject.summary),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(summaryObject) => summaryObject.summary is all you need here

@humphd
Copy link
Contributor

humphd commented Nov 13, 2019

Looks like your regex isn't passing this test anymore. I think it's fine if you want to revert to what you had, so we can land this and improve it later.

@ThomasLuu00
Copy link
Contributor Author

I'm not sure why, but the TextRank algorithm in node-summarizer throws out inconsistent results. Sometimes it fails the test but once its rerun it might pass, or just fail again.

@ThomasLuu00
Copy link
Contributor Author

So from what I read on the node-summarizer repo, looks like TextRank might not always throw out the same result, but it should still be generating the correct number of sentences each time.

@humphd
Copy link
Contributor

humphd commented Nov 13, 2019

Fascinating. OK, let's adjust the tests to match the expectations of the library, and include some comments in your tests/code that indicate what your research has uncovered about how it varies with each run.

@ThomasLuu00 ThomasLuu00 merged commit fdda764 into Seneca-CDOT:master Nov 14, 2019
@ThomasLuu00 ThomasLuu00 deleted the issue-20 branch November 14, 2019 15:12
DukeManh pushed a commit to DukeManh/telescope that referenced this pull request Mar 7, 2022
Co-authored-by: Renovate Bot <bot@renovateapp.com>
tpmai22 added a commit that referenced this pull request Mar 8, 2022
* Initial work

* Docs and bodyParser

* fix: correct typo and add missing quotation mark

* Move router onto Satellite instance

* Use built-in body parsing with Express

* Add tests for body parsers

* Add GitHub CI + README badge

* Fix package-lock.json sync with package.json

* Fix README build badge URL

* Add release workflow

* 1.0.1

* Use standard tag format for npm with v prefix

* 1.0.2

* Release yaml fix

* 1.0.3

* Explicitly set package public for npm publish

* 1.0.4

* Remove private field completely in order to publish to npm

* 1.0.5

* Switch org name to @senecacdot

* 1.0.6

* Update lock file

* 1.0.7

* Expose Router from package

* 1.1.0

* Switch from new Router() to Router()

* 1.1.1

* Use env variables to start apm monitoring sooner

* 1.2.0

* Switch to ELASTIC_APM_SERVER_URL, better 404 reporting, refactor Router()

* 1.3.0

* Add beforeParsers and beforeRouter options with tests

* 1.4.0

* Add pino-colada for debug logging

* Improve logging, use ELASTIC_APM_SERVICE_NAME env var, add router option to ctor

* 1.5.0

* Remove pino-tiny dep

* Fix logger picking logic on startup

* 1.5.1

* Document healthCheck and add more tests

* 1.5.2

* Add default favicon support

* 1.6.0

* Update README install instructions, deps

* 1.6.1

* Add JWT validation, tests, and update docs

* 1.7.0

* Ported Hash to Satellite

* Removed Redundant Code, Added Comment Block, Fixed Import

* Re-add crypto

* Add test for req.user

* Refactor into src/, breakup middleware authenticate vs. authorize, remove favicon

* 1.8.0

* Init Prettier Commit

* Adds the createError module for use in Telescope (#5)

* Fixed CreateError module, use http-errors

* removed merge conflict errors

* Added Docs in README

Co-authored-by: David Humphrey <david.andrew.humphrey@gmail.com>

* Finish prettier integration

* Fix workflows

* Prettier for jest.config.js

* Update deps, fix prettier-check on windows

* 1.8.5

* Specify main entry point in package.json

* 1.8.6

* Add .husky directory and pre-commit hook

* Fix #1930

* Support credentials for HTTPS vs. HTTP server

* 1.8.7

* Don't install Husky on postinstall

* 1.8.8

* Add support for generating a service token

* 1.9.0

* Updated redis and added ping test

* Updated redis export

* Fixed Redis test case

* 1.10.0

* isAuthorized() always takes a function with req, user params

* 1.11.0

* 1.12.0

* Initial Elastic client code

* Updated elastic contructor, add initial tests

* Fixed elastic search client, mock elastic connection

* Updated README.md with Elastic() info

* 1.13.0

* Add shutDown() to allow killing connections

* 1.14.0

* Add automatic, graceful shutdown for Redis and Elastic clients

* Update deps for 1.14.0

* 1.15.0

* Initial exported Fetch() function to Satellite

* Updated exports to require node-fetch instead of it being in a separate file

* Updated spelling to 'fetch' and updated tests to use nock

* Removed done() from tests and moved node-fetch to be a dependency vs dev-dependency

* Update lock file

* 1.16.0

* chore: include nodejs 16 in the CI build matrix

* feat: add auto-opt-out of FLoC

* 1.17.0

* Add eslint to satellite

* adding and configuring eslint

* Fixing linting errors

* configuring anti-trojan-source plugin

* adding lint to pre-commit hook

* Removing ts from eslint run and removing comment

* removing unused function validateAuthorizationOptions

* Adding no-unused-vars override to test.js and removing the override from config

* Adding ESLint to CI runs

* Integrating eslint with the release workflow

* Fix Dependencies:
    * Remove elastic-apm-node
    * Remove @elastic/ecs-pino-format
    * Update Jest (To fix deprecated dependencies)

* * Update pino
 * Switch from express-pino-logger to pino-http
 * Standardize Dependencies

* Configure Renovate (#23)

* Configure renovate bot


Co-authored-by: Renovate Bot <bot@renovateapp.com>
Co-authored-by: Duke Manh <manhducdkcb@gmail.com>

* chore(deps): update dependency pretty-quick to v3.1.3

* fix(deps): update dependency express to v4.17.2

* fix(deps): update dependency node-fetch to v2.6.7

* fix(deps): update dependency @elastic/elasticsearch-mock to v0.3.1

* fix(deps): update dependency http-errors to v1.8.1

* chore(deps): update dependency eslint to v8.7.0

* chore(deps): update dependency eslint-plugin-anti-trojan-source to v1.1.0

* chore(deps): update dependency eslint-plugin-jest to v25.7.0

* chore(deps): update dependency husky to v5.2.0

* Switch npm to pnpm

* Release v1.18.0

* 1.20.0

* 1.21.0

* Use --no-git-checks with pnpm publish to avoid failure on CI

* 1.22.0

* remove pre-commit

* Release v1.23.0

* fix(deps): update dependency pino-pretty to v7.5.1

* fix(deps): update dependency pino to v7.6.5

* chore(deps): update dependency eslint to v8.8.0

* chore(deps): update dependency nock to v13.2.2

* bump prettier to v2.5.1 and run prettier on entire tree

* fix(deps): update dependency @elastic/elasticsearch to v7.16.0

* fix(deps): update dependency @godaddy/terminus to v4.10.2 (#48)

Co-authored-by: Renovate Bot <bot@renovateapp.com>

* fix(deps): update dependency express-jwt to v6.1.0

* fix(deps): update dependency ioredis to v4.28.3 (#50)

Co-authored-by: Renovate Bot <bot@renovateapp.com>

* fix(deps): update dependency ioredis-mock to v5.9.1

* fix-renovate-bot

* Refactoring elastic.js so mock is exported for tests
- Adding tests for mock Elastic()
- Added mock Elastic() description in README.md

* Release v1.24.0

* chore(deps): update dependency nock to v13.2.4

* fix(deps): update dependency ioredis to v4.28.4 (#55)

Co-authored-by: Renovate Bot <bot@renovateapp.com>

* chore(deps): update dependency jest to v27.5.0 (#56)

Co-authored-by: Renovate Bot <bot@renovateapp.com>

* fix(deps): update dependency ioredis to v4.28.5

* fix(deps): update dependency @elastic/elasticsearch to v7.17.0

* chore(deps): update dependency jest to v27.5.1 (#59)

Co-authored-by: Renovate Bot <bot@renovateapp.com>

* chore(deps): update dependency eslint to v8.9.0 (#60)

Co-authored-by: Renovate Bot <bot@renovateapp.com>

* changed all uses of SECRET -> JWT_SECRET

* Release v1.25.0

* Adding more tests for createError

* fix(deps): update dependency express to v4.17.3

* fix(deps): update dependency pino to v7.8.0 (#66)

Co-authored-by: Renovate Bot <bot@renovateapp.com>

* adding ES error cases to createError

* Release v.1.26.0

* fix(deps): update dependency express-jwt to v6.1.1

* chore(deps): update dependency eslint to v8.10.0

* fix(deps): update dependency ioredis-mock to v7

* fix(deps): update dependency ioredis-mock to v7.1.0

* fix(deps): update dependency pino-pretty to v7.5.3

* chore(deps): update dependency husky to v7

* chore(deps): update dependency eslint-plugin-jest to v26

* fix(deps): update dependency helmet to v5

* set default values for status and argToSend so they're not undefined

* Delete unneded config files from Satellite repo

Co-authored-by: David Humphrey <david.andrew.humphrey@gmail.com>
Co-authored-by: Josue <josue.quilon-barrios@senecacollege.ca>
Co-authored-by: Metropass <moho472@gmail.com>
Co-authored-by: Mo <58116522+Metropass@users.noreply.github.com>
Co-authored-by: Abdulbasid Guled <guled.basid@gmail.com>
Co-authored-by: Josue <manekenpix@fastmail.com>
Co-authored-by: dhillonks <kunwarvir@hotmail.com>
Co-authored-by: Kevan-Y <58233223+Kevan-Y@users.noreply.github.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: Renovate Bot <bot@renovateapp.com>
Co-authored-by: Duke Manh <manhducdkcb@gmail.com>
Co-authored-by: Cindy Le <cindyledev@gmail.com>
Co-authored-by: AmasiaNalbandian <amasia.nalbandian@mitel.com>
Co-authored-by: Amasia <77639637+AmasiaNalbandian@users.noreply.github.com>
Co-authored-by: rclee91 <32626950+rclee91@users.noreply.github.com>
Co-authored-by: Jia Hua Zou <jiahua.zou1@gmail.com>
Co-authored-by: Joel Azwar <joel_azwar@yahoo.com>
Co-authored-by: Anatoliy Serputoff <65831678+aserputov@users.noreply.github.com>
Co-authored-by: tpmai <thienphuoc.0108@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement feature to generate summary of the blog post
5 participants