Corpora_and_Vector_Spaces tutorial text clarification (#1116)
lgmoneda authored and tmylk committed Jan 29, 2017
1 parent be3c9f9 commit ba37ff3
Showing 1 changed file with 3 additions and 3 deletions.
6 changes: 3 additions & 3 deletions docs/notebooks/Corpora_and_Vector_Spaces.ipynb
@@ -213,7 +213,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
-"The function `doc2bow()` simply counts the number of occurrences of each distinct word, converts the word to its integer word id and returns the result as a sparse vector. The sparse vector `[(0, 1), (1, 1)]` therefore reads: in the document *“Human computer interaction”*, the words computer (id 0) and human (id 1) appear once; the other ten dictionary words appear (implicitly) zero times."
+"The function `doc2bow()` simply counts the number of occurrences of each distinct word, converts the word to its integer word id and returns the result as a sparse vector. The sparse vector `[(word_id, 1), (word_id, 1)]` therefore reads: in the document *“Human computer interaction”*, the words *\"computer\"* and *\"human\"*, identified by an integer id given by the built dictionary, appear once; the other ten dictionary words appear (implicitly) zero times. Check their id at the dictionary displayed in the previous cell and see that they match."
]
},
{
@@ -250,7 +250,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
-"By now it should be clear that the vector feature with `id=10 stands` for the question “How many times does the word graph appear in the document?” and that the answer is “zero” for the first six documents and “one” for the remaining three. As a matter of fact, we have arrived at exactly the same corpus of vectors as in the [Quick Example](https://radimrehurek.com/gensim/tutorial.html#first-example).\n",
+"By now it should be clear that the vector feature with `id=10 stands` for the question “How many times does the word graph appear in the document?” and that the answer is “zero” for the first six documents and “one” for the remaining three. As a matter of fact, we have arrived at exactly the same corpus of vectors as in the [Quick Example](https://radimrehurek.com/gensim/tutorial.html#first-example). If you're running this notebook by your own, the words id may differ, but you should be able to check the consistency between documents comparing their vectors. \n",
"\n",
"## Corpus Streaming – One Document at a Time\n",
"\n",
@@ -616,7 +616,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
-"version": "3.5.1"
+"version": "3.6.0"
}
},
"nbformat": 4,
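
Note on the `doc2bow()` wording changed above: the behaviour the text describes (count each known word, replace it with its integer id, return sparse `(word_id, count)` pairs) can be sketched in plain Python. This is only an illustration under stated assumptions, not gensim's implementation; `doc2bow_sketch`, `to_words`, and both `token2id` mappings are hypothetical names and values invented for this note.

```python
from collections import Counter


def doc2bow_sketch(tokens, token2id):
    """Count tokens found in token2id; return sorted (word_id, count) pairs."""
    counts = Counter(token2id[t] for t in tokens if t in token2id)
    return sorted(counts.items())


def to_words(vec, token2id):
    """Translate a sparse vector back to {word: count} for comparison."""
    id2token = {i: t for t, i in token2id.items()}
    return {id2token[i]: c for i, c in vec}


tokens = "Human computer interaction".lower().split()

# Two hypothetical dictionaries that assign different ids to the same words.
# "interaction" is absent from both, so it is silently skipped.
token2id = {"computer": 0, "human": 1, "interface": 2}
other_token2id = {"human": 7, "computer": 3, "interface": 5}

vec = doc2bow_sketch(tokens, token2id)
other_vec = doc2bow_sketch(tokens, other_token2id)

print(vec)        # [(0, 1), (1, 1)]
print(other_vec)  # [(3, 1), (7, 1)]

# The ids differ, but the word counts behind them agree.
print(to_words(vec, token2id) == to_words(other_vec, other_token2id))  # True
```

The two vectors look different only because the id assignment differs, which is the situation the added sentence about differing word ids describes; mapping ids back to words shows the counts are the same.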

0 comments on commit ba37ff3
