Commit

DEP bump pylangacq to 0.13.1 for chat parsing fix, add more talkbank datasets in docs
jacksonllee committed Mar 23, 2021
1 parent 7c4b78c commit 5542fde
Showing 42 changed files with 121 additions and 332 deletions.
2 changes: 1 addition & 1 deletion docs/.buildinfo
@@ -1,4 +1,4 @@
# Sphinx build info version 1
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
config: 1b2c69900a20a5e6d404547532dec06c
config: 76d7be4743f134d1ecb28043ff0666d5
tags: 645f666f9bcd5a90fca523b33c5a78b7
2 changes: 1 addition & 1 deletion docs/_modules/index.html
@@ -317,7 +317,7 @@ <h1>All modules for which code is available</h1>
<div role="contentinfo">
<p>

&copy; Copyright 2014-2021, Jackson L. Lee | Documentation last updated on March 21, 2021
&copy; Copyright 2014-2021, Jackson L. Lee | Documentation last updated on March 23, 2021

</p>
</div>
2 changes: 1 addition & 1 deletion docs/_modules/pycantonese/corpus.html
@@ -703,7 +703,7 @@ <h1>Source code for pycantonese.corpus</h1><div class="highlight"><pre>
<div role="contentinfo">
<p>

&copy; Copyright 2014-2021, Jackson L. Lee | Documentation last updated on March 21, 2021
&copy; Copyright 2014-2021, Jackson L. Lee | Documentation last updated on March 23, 2021

</p>
</div>
2 changes: 1 addition & 1 deletion docs/_modules/pycantonese/jyutping/characters.html
@@ -438,7 +438,7 @@ <h1>Source code for pycantonese.jyutping.characters</h1><div class="highlight"><
<div role="contentinfo">
<p>

&copy; Copyright 2014-2021, Jackson L. Lee | Documentation last updated on March 21, 2021
&copy; Copyright 2014-2021, Jackson L. Lee | Documentation last updated on March 23, 2021

</p>
</div>
2 changes: 1 addition & 1 deletion docs/_modules/pycantonese/jyutping/parse_jyutping.html
@@ -498,7 +498,7 @@ <h1>Source code for pycantonese.jyutping.parse_jyutping</h1><div class="highligh
<div role="contentinfo">
<p>

&copy; Copyright 2014-2021, Jackson L. Lee | Documentation last updated on March 21, 2021
&copy; Copyright 2014-2021, Jackson L. Lee | Documentation last updated on March 23, 2021

</p>
</div>
2 changes: 1 addition & 1 deletion docs/_modules/pycantonese/jyutping/tipa.html
@@ -447,7 +447,7 @@ <h1>Source code for pycantonese.jyutping.tipa</h1><div class="highlight"><pre>
<div role="contentinfo">
<p>

&copy; Copyright 2014-2021, Jackson L. Lee | Documentation last updated on March 21, 2021
&copy; Copyright 2014-2021, Jackson L. Lee | Documentation last updated on March 23, 2021

</p>
</div>
2 changes: 1 addition & 1 deletion docs/_modules/pycantonese/jyutping/yale.html
@@ -625,7 +625,7 @@ <h1>Source code for pycantonese.jyutping.yale</h1><div class="highlight"><pre>
<div role="contentinfo">
<p>

&copy; Copyright 2014-2021, Jackson L. Lee | Documentation last updated on March 21, 2021
&copy; Copyright 2014-2021, Jackson L. Lee | Documentation last updated on March 23, 2021

</p>
</div>
2 changes: 1 addition & 1 deletion docs/_modules/pycantonese/pos_tagging/hkcancor_to_ud.html
@@ -492,7 +492,7 @@ <h1>Source code for pycantonese.pos_tagging.hkcancor_to_ud</h1><div class="highl
<div role="contentinfo">
<p>

&copy; Copyright 2014-2021, Jackson L. Lee | Documentation last updated on March 21, 2021
&copy; Copyright 2014-2021, Jackson L. Lee | Documentation last updated on March 23, 2021

</p>
</div>
2 changes: 1 addition & 1 deletion docs/_modules/pycantonese/pos_tagging/tagger.html
@@ -660,7 +660,7 @@ <h1>Source code for pycantonese.pos_tagging.tagger</h1><div class="highlight"><p
<div role="contentinfo">
<p>

&copy; Copyright 2014-2021, Jackson L. Lee | Documentation last updated on March 21, 2021
&copy; Copyright 2014-2021, Jackson L. Lee | Documentation last updated on March 23, 2021

</p>
</div>
2 changes: 1 addition & 1 deletion docs/_modules/pycantonese/stop_words.html
@@ -465,7 +465,7 @@ <h1>Source code for pycantonese.stop_words</h1><div class="highlight"><pre>
<div role="contentinfo">
<p>

&copy; Copyright 2014-2021, Jackson L. Lee | Documentation last updated on March 21, 2021
&copy; Copyright 2014-2021, Jackson L. Lee | Documentation last updated on March 23, 2021

</p>
</div>
2 changes: 1 addition & 1 deletion docs/_modules/pycantonese/word_segmentation.html
@@ -426,7 +426,7 @@ <h1>Source code for pycantonese.word_segmentation</h1><div class="highlight"><pr
<div role="contentinfo">
<p>

&copy; Copyright 2014-2021, Jackson L. Lee | Documentation last updated on March 21, 2021
&copy; Copyright 2014-2021, Jackson L. Lee | Documentation last updated on March 23, 2021

</p>
</div>
281 changes: 2 additions & 279 deletions docs/_modules/pylangacq/chat.html

Large diffs are not rendered by default.

34 changes: 28 additions & 6 deletions docs/_sources/data.rst.txt
@@ -65,6 +65,34 @@ the CC BY-NC-SA 3.0 license.
As of March 2021, the following Cantonese-related datasets are
available from CHILDES and TalkBank (in alphabetical order):

.. invisible-code-block: python
>>> import os
.. skip: start if(os.getenv("CI") == "true", reason="certain CHILDES data pulls fail in some but not all python versions for unknown reasons")
* `Child Heritage Chinese Corpus <https://childes.talkbank.org/access/Biling/CHCC.html>`_

.. code-block:: python
>>> url = "https://childes.talkbank.org/data/Biling/CHCC.zip"
>>> corpus = pycantonese.read_chat(url)
>>> corpus.n_files()
190
>>> len(corpus.words())
533877
* `Guthrie Bilingual Corpus <https://childes.talkbank.org/access/Biling/Guthrie.html>`_

.. code-block:: python
>>> url = "https://childes.talkbank.org/data/Biling/Guthrie.zip"
>>> corpus = pycantonese.read_chat(url)
>>> corpus.n_files()
36
>>> len(corpus.words())
70438
* `HKU-70 Corpus <https://childes.talkbank.org/access/Chinese/Cantonese/HKU.html>`_

.. code-block:: python
@@ -76,12 +104,6 @@ available from CHILDES and TalkBank (in alphabetical order):
>>> len(corpus.words())
178270
.. invisible-code-block: python
>>> import os
.. skip: start if(os.getenv("CI") == "true", reason="certain CHILDES data pulls fail in some but not all python versions for unknown reasons")
* `Lee-Wong-Leung Corpus <https://childes.talkbank.org/access/Chinese/Cantonese/LeeWongLeung.html>`_

.. code-block:: python
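The `skip` directive moved in this diff guards the remote CHILDES pulls with the expression `os.getenv("CI") == "true"`. A minimal sketch of that check in plain Python follows; the helper name `running_in_ci` is hypothetical and is not part of pycantonese or of the docs tooling, and the optional `environ` argument exists only so the check can be exercised without touching the real environment.

```python
import os


def running_in_ci(environ=None):
    """Return True only when the CI variable is exactly the string "true".

    Hypothetical helper mirroring the condition used by the skip directive;
    `environ` defaults to the real process environment.
    """
    env = os.environ if environ is None else environ
    return env.get("CI") == "true"


if __name__ == "__main__":
    if running_in_ci():
        print("CI detected: remote CHILDES downloads would be skipped")
    else:
        print("Not in CI: remote CHILDES downloads would run")
```

Note that the comparison is case-sensitive, so a value such as `True` or `1` would not trigger the skip under this condition.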
2 changes: 1 addition & 1 deletion docs/api.html
@@ -1601,7 +1601,7 @@ <h2><a class="reference internal" href="#pycantonese.jyutping.Jyutping" title="p
<div role="contentinfo">
<p>

&copy; Copyright 2014-2021, Jackson L. Lee | Documentation last updated on March 21, 2021
&copy; Copyright 2014-2021, Jackson L. Lee | Documentation last updated on March 23, 2021

</p>
</div>
2 changes: 1 addition & 1 deletion docs/changelog.html
@@ -641,7 +641,7 @@ <h2>[0.1] - 2014-12-17<a class="headerlink" href="#id40" title="Permalink to thi
<div role="contentinfo">
<p>

&copy; Copyright 2014-2021, Jackson L. Lee | Documentation last updated on March 21, 2021
&copy; Copyright 2014-2021, Jackson L. Lee | Documentation last updated on March 23, 2021

</p>
</div>
28 changes: 25 additions & 3 deletions docs/data.html
@@ -350,6 +350,30 @@ <h2>CHILDES and TalkBank Data<a class="headerlink" href="#childes-and-talkbank-d
<p>As of March 2021, the following Cantonese-related datasets are
available from CHILDES and TalkBank (in alphabetical order):</p>
<ul>
<li><p><a class="reference external" href="https://childes.talkbank.org/access/Biling/CHCC.html">Child Heritage Chinese Corpus</a></p>
<blockquote>
<div><div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="n">url</span> <span class="o">=</span> <span class="s2">&quot;https://childes.talkbank.org/data/Biling/CHCC.zip&quot;</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">corpus</span> <span class="o">=</span> <span class="n">pycantonese</span><span class="o">.</span><span class="n">read_chat</span><span class="p">(</span><span class="n">url</span><span class="p">)</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">corpus</span><span class="o">.</span><span class="n">n_files</span><span class="p">()</span>
<span class="go">190</span>
<span class="gp">&gt;&gt;&gt; </span><span class="nb">len</span><span class="p">(</span><span class="n">corpus</span><span class="o">.</span><span class="n">words</span><span class="p">())</span>
<span class="go">533877</span>
</pre></div>
</div>
</div></blockquote>
</li>
<li><p><a class="reference external" href="https://childes.talkbank.org/access/Biling/Guthrie.html">Guthrie Bilingual Corpus</a></p>
<blockquote>
<div><div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="n">url</span> <span class="o">=</span> <span class="s2">&quot;https://childes.talkbank.org/data/Biling/Guthrie.zip&quot;</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">corpus</span> <span class="o">=</span> <span class="n">pycantonese</span><span class="o">.</span><span class="n">read_chat</span><span class="p">(</span><span class="n">url</span><span class="p">)</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">corpus</span><span class="o">.</span><span class="n">n_files</span><span class="p">()</span>
<span class="go">36</span>
<span class="gp">&gt;&gt;&gt; </span><span class="nb">len</span><span class="p">(</span><span class="n">corpus</span><span class="o">.</span><span class="n">words</span><span class="p">())</span>
<span class="go">70438</span>
</pre></div>
</div>
</div></blockquote>
</li>
<li><p><a class="reference external" href="https://childes.talkbank.org/access/Chinese/Cantonese/HKU.html">HKU-70 Corpus</a></p>
<blockquote>
<div><div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="n">url</span> <span class="o">=</span> <span class="s2">&quot;https://childes.talkbank.org/data/Chinese/Cantonese/HKU.zip&quot;</span>
@@ -362,8 +386,6 @@ <h2>CHILDES and TalkBank Data<a class="headerlink" href="#childes-and-talkbank-d
</div>
</div></blockquote>
</li>
</ul>
<ul>
<li><p><a class="reference external" href="https://childes.talkbank.org/access/Chinese/Cantonese/LeeWongLeung.html">Lee-Wong-Leung Corpus</a></p>
<blockquote>
<div><div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="n">url</span> <span class="o">=</span> <span class="s2">&quot;https://childes.talkbank.org/data/Chinese/Cantonese/LeeWongLeung.zip&quot;</span>
@@ -456,7 +478,7 @@ <h2>Custom Data<a class="headerlink" href="#custom-data" title="Permalink to thi
<div role="contentinfo">
<p>

&copy; Copyright 2014-2021, Jackson L. Lee | Documentation last updated on March 21, 2021
&copy; Copyright 2014-2021, Jackson L. Lee | Documentation last updated on March 23, 2021

</p>
</div>
2 changes: 1 addition & 1 deletion docs/generated/pycantonese.CHATReader.html
@@ -593,7 +593,7 @@ <h1>pycantonese.CHATReader<a class="headerlink" href="#pycantonese-chatreader" t
<div role="contentinfo">
<p>

&copy; Copyright 2014-2021, Jackson L. Lee | Documentation last updated on March 21, 2021
&copy; Copyright 2014-2021, Jackson L. Lee | Documentation last updated on March 23, 2021

</p>
</div>
2 changes: 1 addition & 1 deletion docs/generated/pycantonese.CHATReader.search.html
@@ -400,7 +400,7 @@ <h1>pycantonese.CHATReader.search<a class="headerlink" href="#pycantonese-chatre
<div role="contentinfo">
<p>

&copy; Copyright 2014-2021, Jackson L. Lee | Documentation last updated on March 21, 2021
&copy; Copyright 2014-2021, Jackson L. Lee | Documentation last updated on March 23, 2021

</p>
</div>
2 changes: 1 addition & 1 deletion docs/generated/pycantonese.characters_to_jyutping.html
@@ -363,7 +363,7 @@ <h1>pycantonese.characters_to_jyutping<a class="headerlink" href="#pycantonese-c
<div role="contentinfo">
<p>

&copy; Copyright 2014-2021, Jackson L. Lee | Documentation last updated on March 21, 2021
&copy; Copyright 2014-2021, Jackson L. Lee | Documentation last updated on March 23, 2021

</p>
</div>
2 changes: 1 addition & 1 deletion docs/generated/pycantonese.hkcancor.html
@@ -337,7 +337,7 @@ <h1>pycantonese.hkcancor<a class="headerlink" href="#pycantonese-hkcancor" title
<div role="contentinfo">
<p>

&copy; Copyright 2014-2021, Jackson L. Lee | Documentation last updated on March 21, 2021
&copy; Copyright 2014-2021, Jackson L. Lee | Documentation last updated on March 23, 2021

</p>
</div>
2 changes: 1 addition & 1 deletion docs/generated/pycantonese.jyutping_to_tipa.html
@@ -358,7 +358,7 @@ <h1>pycantonese.jyutping_to_tipa<a class="headerlink" href="#pycantonese-jyutpin
<div role="contentinfo">
<p>

&copy; Copyright 2014-2021, Jackson L. Lee | Documentation last updated on March 21, 2021
&copy; Copyright 2014-2021, Jackson L. Lee | Documentation last updated on March 23, 2021

</p>
</div>
2 changes: 1 addition & 1 deletion docs/generated/pycantonese.jyutping_to_yale.html
@@ -374,7 +374,7 @@ <h1>pycantonese.jyutping_to_yale<a class="headerlink" href="#pycantonese-jyutpin
<div role="contentinfo">
<p>

&copy; Copyright 2014-2021, Jackson L. Lee | Documentation last updated on March 21, 2021
&copy; Copyright 2014-2021, Jackson L. Lee | Documentation last updated on March 23, 2021

</p>
</div>
2 changes: 1 addition & 1 deletion docs/generated/pycantonese.parse_jyutping.html
@@ -357,7 +357,7 @@ <h1>pycantonese.parse_jyutping<a class="headerlink" href="#pycantonese-parse-jyu
<div role="contentinfo">
<p>

&copy; Copyright 2014-2021, Jackson L. Lee | Documentation last updated on March 21, 2021
&copy; Copyright 2014-2021, Jackson L. Lee | Documentation last updated on March 23, 2021

</p>
</div>
2 changes: 1 addition & 1 deletion docs/generated/pycantonese.pos_tag.html
@@ -382,7 +382,7 @@ <h1>pycantonese.pos_tag<a class="headerlink" href="#pycantonese-pos-tag" title="
<div role="contentinfo">
<p>

&copy; Copyright 2014-2021, Jackson L. Lee | Documentation last updated on March 21, 2021
&copy; Copyright 2014-2021, Jackson L. Lee | Documentation last updated on March 23, 2021

</p>
</div>
2 changes: 1 addition & 1 deletion docs/generated/pycantonese.pos_tagging.hkcancor_to_ud.html
@@ -363,7 +363,7 @@ <h1>pycantonese.pos_tagging.hkcancor_to_ud<a class="headerlink" href="#pycantone
<div role="contentinfo">
<p>

&copy; Copyright 2014-2021, Jackson L. Lee | Documentation last updated on March 21, 2021
&copy; Copyright 2014-2021, Jackson L. Lee | Documentation last updated on March 23, 2021

</p>
</div>
2 changes: 1 addition & 1 deletion docs/generated/pycantonese.read_chat.html
@@ -366,7 +366,7 @@ <h1>pycantonese.read_chat<a class="headerlink" href="#pycantonese-read-chat" tit
<div role="contentinfo">
<p>

&copy; Copyright 2014-2021, Jackson L. Lee | Documentation last updated on March 21, 2021
&copy; Copyright 2014-2021, Jackson L. Lee | Documentation last updated on March 23, 2021

</p>
</div>
2 changes: 1 addition & 1 deletion docs/generated/pycantonese.segment.html
@@ -370,7 +370,7 @@ <h1>pycantonese.segment<a class="headerlink" href="#pycantonese-segment" title="
<div role="contentinfo">
<p>

&copy; Copyright 2014-2021, Jackson L. Lee | Documentation last updated on March 21, 2021
&copy; Copyright 2014-2021, Jackson L. Lee | Documentation last updated on March 23, 2021

</p>
</div>
2 changes: 1 addition & 1 deletion docs/generated/pycantonese.stop_words.html
@@ -364,7 +364,7 @@ <h1>pycantonese.stop_words<a class="headerlink" href="#pycantonese-stop-words" t
<div role="contentinfo">
<p>

&copy; Copyright 2014-2021, Jackson L. Lee | Documentation last updated on March 21, 2021
&copy; Copyright 2014-2021, Jackson L. Lee | Documentation last updated on March 23, 2021

</p>
</div>
@@ -384,7 +384,7 @@ <h1>pycantonese.word_segmentation.Segmenter<a class="headerlink" href="#pycanton
<div role="contentinfo">
<p>

&copy; Copyright 2014-2021, Jackson L. Lee | Documentation last updated on March 21, 2021
&copy; Copyright 2014-2021, Jackson L. Lee | Documentation last updated on March 23, 2021

</p>
</div>
2 changes: 1 addition & 1 deletion docs/genindex.html
@@ -570,7 +570,7 @@ <h2 id="W">W</h2>
<div role="contentinfo">
<p>

&copy; Copyright 2014-2021, Jackson L. Lee | Documentation last updated on March 21, 2021
&copy; Copyright 2014-2021, Jackson L. Lee | Documentation last updated on March 23, 2021

</p>
</div>
2 changes: 1 addition & 1 deletion docs/index.html
@@ -476,7 +476,7 @@ <h2>Table of Contents<a class="headerlink" href="#table-of-contents" title="Perm
<div role="contentinfo">
<p>

&copy; Copyright 2014-2021, Jackson L. Lee | Documentation last updated on March 21, 2021
&copy; Copyright 2014-2021, Jackson L. Lee | Documentation last updated on March 23, 2021

</p>
</div>
2 changes: 1 addition & 1 deletion docs/jyutping.html
@@ -471,7 +471,7 @@ <h2>Jyutping-to-TIPA Conversion<a class="headerlink" href="#jyutping-to-tipa-con
<div role="contentinfo">
<p>

&copy; Copyright 2014-2021, Jackson L. Lee | Documentation last updated on March 21, 2021
&copy; Copyright 2014-2021, Jackson L. Lee | Documentation last updated on March 23, 2021

</p>
</div>
2 changes: 1 addition & 1 deletion docs/papers.html
@@ -329,7 +329,7 @@
<div role="contentinfo">
<p>

&copy; Copyright 2014-2021, Jackson L. Lee | Documentation last updated on March 21, 2021
&copy; Copyright 2014-2021, Jackson L. Lee | Documentation last updated on March 23, 2021

</p>
</div>
2 changes: 1 addition & 1 deletion docs/pos_tagging.html
@@ -366,7 +366,7 @@
<div role="contentinfo">
<p>

&copy; Copyright 2014-2021, Jackson L. Lee | Documentation last updated on March 21, 2021
&copy; Copyright 2014-2021, Jackson L. Lee | Documentation last updated on March 23, 2021

</p>
</div>
2 changes: 1 addition & 1 deletion docs/reader.html
@@ -591,7 +591,7 @@ <h2>Word Frequencies and Ngrams<a class="headerlink" href="#word-frequencies-and
<div role="contentinfo">
<p>

&copy; Copyright 2014-2021, Jackson L. Lee | Documentation last updated on March 21, 2021
&copy; Copyright 2014-2021, Jackson L. Lee | Documentation last updated on March 23, 2021

</p>
</div>
2 changes: 1 addition & 1 deletion docs/search.html
@@ -321,7 +321,7 @@
<div role="contentinfo">
<p>

&copy; Copyright 2014-2021, Jackson L. Lee | Documentation last updated on March 21, 2021
&copy; Copyright 2014-2021, Jackson L. Lee | Documentation last updated on March 23, 2021

</p>
</div>
2 changes: 1 addition & 1 deletion docs/searches.html
@@ -584,7 +584,7 @@
<div role="contentinfo">
<p>

&copy; Copyright 2014-2021, Jackson L. Lee | Documentation last updated on March 21, 2021
&copy; Copyright 2014-2021, Jackson L. Lee | Documentation last updated on March 23, 2021

</p>
</div>