ENH use Jyutping class
jacksonllee committed Mar 20, 2021
1 parent 0954575 commit b620b75
Showing 62 changed files with 411 additions and 172 deletions.
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -13,8 +13,12 @@ as well. The details are in
The changelog entries below only document updates specific to PyCantonese.

### Added
* Defined the `Jyutping` class to better represent parsed Jyutping romanization.

### Changed
* Bumped the PyLangAcq dependency to v0.13.0.
* The function `parse_jyutping` now returns a list of `Jyutping` objects,
rather than tuples of strings.
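
A minimal sketch of what this change may mean for calling code. The `Jyutping` class below is a simplified stand-in that mirrors the fields, `__str__`, and `final` property shown in this commit; it is not imported from PyCantonese. The `parsed` list is built by hand in place of a real `parse_jyutping` call, and `as_legacy_tuples` is a hypothetical helper, not part of the library.

```python
import dataclasses
from typing import List, Tuple


@dataclasses.dataclass
class Jyutping:
    """Stand-in mirroring the fields of the class added in this commit."""

    onset: str
    nucleus: str
    coda: str
    tone: str

    def __str__(self) -> str:
        # Combine onset + nucleus + coda + tone.
        return f"{self.onset}{self.nucleus}{self.coda}{self.tone}"

    @property
    def final(self) -> str:
        # The final is nucleus + coda.
        return f"{self.nucleus}{self.coda}"


# Hand-built value standing in for parse_jyutping("gwong2dung1waa2").
parsed: List[Jyutping] = [
    Jyutping("gw", "o", "ng", "2"),
    Jyutping("d", "u", "ng", "1"),
    Jyutping("w", "aa", "", "2"),
]

# Code that used to index or unpack tuples can read named attributes instead.
onsets = [jp.onset for jp in parsed]
finals = [jp.final for jp in parsed]

# str() rebuilds each syllable, where "".join(...) was used on the old tuples.
syllables = [str(jp) for jp in parsed]


def as_legacy_tuples(items: List[Jyutping]) -> List[Tuple[str, ...]]:
    """Hypothetical helper: recover the old tuple shape for unmigrated callers."""
    return [dataclasses.astuple(jp) for jp in items]
```

Named attributes make call sites self-describing, and a conversion helper like the one above lets older tuple-based code keep working while it is migrated.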

### Deprecated

30 changes: 16 additions & 14 deletions README.rst
@@ -72,25 +72,27 @@ With PyCantonese imported:
>>> corpus = pycantonese.hkcancor() # get HKCanCor
>>> all_verbs = corpus.search(pos='^V')
>>> len(all_verbs) # number of all verbs
29012
29726
>>> all_verbs[:10] # print 10 results
[('', 'V', 'heoi3', ''),
('', 'V', 'heoi3', ''),
('旅行', 'VN', 'leoi5hang4', ''),
('有冇', 'V1', 'jau5mou5', ''),
('', 'VU', 'jiu3', ''),
('有得', 'VU', 'jau5dak1', ''),
('冇得', 'VU', 'mou5dak1', ''),
('', 'V', 'heoi3', ''),
('', 'V', 'hai6', ''),
('', 'V', 'hai6', '')]
4. Parsing Jyutping for (onset, nucleus, coda, tone)
[Token(word='', pos='V', jyutping='heoi3', mor=None, gra=None),
Token(word='', pos='V', jyutping='heoi3', mor=None, gra=None),
Token(word='旅行', pos='VN', jyutping='leoi5hang4', mor=None, gra=None),
Token(word='有冇', pos='V1', jyutping='jau5mou5', mor=None, gra=None),
Token(word='', pos='VU', jyutping='jiu3', mor=None, gra=None),
Token(word='有得', pos='VU', jyutping='jau5dak1', mor=None, gra=None),
Token(word='冇得', pos='VU', jyutping='mou5dak1', mor=None, gra=None),
Token(word='', pos='V', jyutping='heoi3', mor=None, gra=None),
Token(word='', pos='V', jyutping='hai6', mor=None, gra=None),
Token(word='', pos='V', jyutping='hai6', mor=None, gra=None)]
4. Parsing Jyutping for the onset, nucleus, coda, and tone

.. code-block:: python
>>> pycantonese.parse_jyutping('gwong2dung1waa2') # 廣東話
[('gw', 'o', 'ng', '2'), ('d', 'u', 'ng', '1'), ('w', 'aa', '', '2')]
[Jyutping(onset='gw', nucleus='o', coda='ng', tone='2'),
Jyutping(onset='d', nucleus='u', coda='ng', tone='1'),
Jyutping(onset='w', nucleus='aa', coda='', tone='2')]
Download and Install
--------------------
2 changes: 1 addition & 1 deletion docs/.buildinfo
@@ -1,4 +1,4 @@
# Sphinx build info version 1
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
config: 61780d602a8de6c6a6297ccf492a8a73
config: e2ce3b1b15a78089ebb65d4eae880b78
tags: 645f666f9bcd5a90fca523b33c5a78b7
3 changes: 2 additions & 1 deletion docs/_modules/index.html
@@ -157,6 +157,7 @@
</li>
<li class="toctree-l2"><a class="reference internal" href="../api.html#chatreader"><code class="xref py py-class docutils literal notranslate"><span class="pre">CHATReader</span></code></a></li>
<li class="toctree-l2"><a class="reference internal" href="../api.html#token"><code class="xref py py-class docutils literal notranslate"><span class="pre">Token</span></code></a></li>
<li class="toctree-l2"><a class="reference internal" href="../api.html#jyutping"><code class="xref py py-class docutils literal notranslate"><span class="pre">Jyutping</span></code></a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="../changelog.html">Changelog</a><ul>
@@ -305,7 +306,7 @@ <h1>All modules for which code is available</h1>
<div role="contentinfo">
<p>

&copy; Copyright 2014-2021, Jackson L. Lee | Documentation last updated on March 19, 2021
&copy; Copyright 2014-2021, Jackson L. Lee | Documentation last updated on March 20, 2021

</p>
</div>
3 changes: 2 additions & 1 deletion docs/_modules/pycantonese/corpus.html
@@ -157,6 +157,7 @@
</li>
<li class="toctree-l2"><a class="reference internal" href="../../api.html#chatreader"><code class="xref py py-class docutils literal notranslate"><span class="pre">CHATReader</span></code></a></li>
<li class="toctree-l2"><a class="reference internal" href="../../api.html#token"><code class="xref py py-class docutils literal notranslate"><span class="pre">Token</span></code></a></li>
<li class="toctree-l2"><a class="reference internal" href="../../api.html#jyutping"><code class="xref py py-class docutils literal notranslate"><span class="pre">Jyutping</span></code></a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="../../changelog.html">Changelog</a><ul>
@@ -691,7 +692,7 @@ <h1>Source code for pycantonese.corpus</h1><div class="highlight"><pre>
<div role="contentinfo">
<p>

&copy; Copyright 2014-2021, Jackson L. Lee | Documentation last updated on March 19, 2021
&copy; Copyright 2014-2021, Jackson L. Lee | Documentation last updated on March 20, 2021

</p>
</div>
5 changes: 3 additions & 2 deletions docs/_modules/pycantonese/jyutping/characters.html
@@ -157,6 +157,7 @@
</li>
<li class="toctree-l2"><a class="reference internal" href="../../../api.html#chatreader"><code class="xref py py-class docutils literal notranslate"><span class="pre">CHATReader</span></code></a></li>
<li class="toctree-l2"><a class="reference internal" href="../../../api.html#token"><code class="xref py py-class docutils literal notranslate"><span class="pre">Token</span></code></a></li>
<li class="toctree-l2"><a class="reference internal" href="../../../api.html#jyutping"><code class="xref py py-class docutils literal notranslate"><span class="pre">Jyutping</span></code></a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="../../../changelog.html">Changelog</a><ul>
@@ -315,7 +316,7 @@ <h1>Source code for pycantonese.jyutping.characters</h1><div class="highlight"><
<span class="k">continue</span>
<span class="n">words_to_jyutping_counters</span><span class="p">[</span><span class="n">word</span><span class="p">][</span><span class="n">jyutping</span><span class="p">]</span> <span class="o">+=</span> <span class="mi">1</span>
<span class="k">for</span> <span class="n">char</span><span class="p">,</span> <span class="n">jp</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="n">word</span><span class="p">,</span> <span class="n">parsed_jp</span><span class="p">):</span>
<span class="n">characters_to_jyutping_counters</span><span class="p">[</span><span class="n">char</span><span class="p">][</span><span class="s2">&quot;&quot;</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">jp</span><span class="p">)]</span> <span class="o">+=</span> <span class="mi">1</span>
<span class="n">characters_to_jyutping_counters</span><span class="p">[</span><span class="n">char</span><span class="p">][</span><span class="nb">str</span><span class="p">(</span><span class="n">jp</span><span class="p">)]</span> <span class="o">+=</span> <span class="mi">1</span>

<span class="n">words_to_jyutping</span> <span class="o">=</span> <span class="p">{}</span>
<span class="k">for</span> <span class="n">word</span><span class="p">,</span> <span class="n">jyutping_counter</span> <span class="ow">in</span> <span class="n">words_to_jyutping_counters</span><span class="o">.</span><span class="n">items</span><span class="p">():</span>
@@ -426,7 +427,7 @@ <h1>Source code for pycantonese.jyutping.characters</h1><div class="highlight"><
<div role="contentinfo">
<p>

&copy; Copyright 2014-2021, Jackson L. Lee | Documentation last updated on March 19, 2021
&copy; Copyright 2014-2021, Jackson L. Lee | Documentation last updated on March 20, 2021

</p>
</div>
54 changes: 46 additions & 8 deletions docs/_modules/pycantonese/jyutping/parse_jyutping.html
@@ -157,6 +157,7 @@
</li>
<li class="toctree-l2"><a class="reference internal" href="../../../api.html#chatreader"><code class="xref py py-class docutils literal notranslate"><span class="pre">CHATReader</span></code></a></li>
<li class="toctree-l2"><a class="reference internal" href="../../../api.html#token"><code class="xref py py-class docutils literal notranslate"><span class="pre">Token</span></code></a></li>
<li class="toctree-l2"><a class="reference internal" href="../../../api.html#jyutping"><code class="xref py py-class docutils literal notranslate"><span class="pre">Jyutping</span></code></a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="../../../changelog.html">Changelog</a><ul>
@@ -284,7 +285,11 @@
<div itemprop="articleBody">

<h1>Source code for pycantonese.jyutping.parse_jyutping</h1><div class="highlight"><pre>
<span></span><span class="n">ONSETS</span> <span class="o">=</span> <span class="p">{</span>
<span></span><span class="kn">import</span> <span class="nn">dataclasses</span>
<span class="kn">from</span> <span class="nn">typing</span> <span class="kn">import</span> <span class="n">List</span>


<span class="n">ONSETS</span> <span class="o">=</span> <span class="p">{</span>
<span class="s2">&quot;b&quot;</span><span class="p">,</span>
<span class="s2">&quot;d&quot;</span><span class="p">,</span>
<span class="s2">&quot;g&quot;</span><span class="p">,</span>
@@ -314,7 +319,38 @@ <h1>Source code for pycantonese.jyutping.parse_jyutping</h1><div class="highligh
<span class="n">TONES</span> <span class="o">=</span> <span class="p">{</span><span class="s2">&quot;1&quot;</span><span class="p">,</span> <span class="s2">&quot;2&quot;</span><span class="p">,</span> <span class="s2">&quot;3&quot;</span><span class="p">,</span> <span class="s2">&quot;4&quot;</span><span class="p">,</span> <span class="s2">&quot;5&quot;</span><span class="p">,</span> <span class="s2">&quot;6&quot;</span><span class="p">}</span>


<div class="viewcode-block" id="parse_jyutping"><a class="viewcode-back" href="../../../generated/pycantonese.parse_jyutping.html#pycantonese.parse_jyutping">[docs]</a><span class="k">def</span> <span class="nf">parse_jyutping</span><span class="p">(</span><span class="n">jp_str</span><span class="p">):</span>
<div class="viewcode-block" id="Jyutping"><a class="viewcode-back" href="../../../api.html#pycantonese.jyutping.Jyutping">[docs]</a><span class="nd">@dataclasses</span><span class="o">.</span><span class="n">dataclass</span>
<span class="k">class</span> <span class="nc">Jyutping</span><span class="p">:</span>
<span class="sd">&quot;&quot;&quot;Jyutping representation of a Chinese/Cantonese character.</span>

<span class="sd"> Attributes</span>
<span class="sd"> ----------</span>
<span class="sd"> onset : str</span>
<span class="sd"> Onset</span>
<span class="sd"> nucleus : str</span>
<span class="sd"> Nucleus</span>
<span class="sd"> coda : str</span>
<span class="sd"> Coda</span>
<span class="sd"> tone : str</span>
<span class="sd"> Tone</span>
<span class="sd"> &quot;&quot;&quot;</span>
<span class="vm">__slots__</span> <span class="o">=</span> <span class="p">(</span><span class="s2">&quot;onset&quot;</span><span class="p">,</span> <span class="s2">&quot;nucleus&quot;</span><span class="p">,</span> <span class="s2">&quot;coda&quot;</span><span class="p">,</span> <span class="s2">&quot;tone&quot;</span><span class="p">)</span>
<span class="n">onset</span><span class="p">:</span> <span class="nb">str</span>
<span class="n">nucleus</span><span class="p">:</span> <span class="nb">str</span>
<span class="n">coda</span><span class="p">:</span> <span class="nb">str</span>
<span class="n">tone</span><span class="p">:</span> <span class="nb">str</span>

<div class="viewcode-block" id="Jyutping.__str__"><a class="viewcode-back" href="../../../api.html#pycantonese.jyutping.Jyutping.__str__">[docs]</a> <span class="k">def</span> <span class="fm">__str__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="sd">&quot;&quot;&quot;Combine onset + nucleus + coda + tone.&quot;&quot;&quot;</span>
<span class="k">return</span> <span class="sa">f</span><span class="s2">&quot;</span><span class="si">{</span><span class="bp">self</span><span class="o">.</span><span class="n">onset</span><span class="si">}{</span><span class="bp">self</span><span class="o">.</span><span class="n">nucleus</span><span class="si">}{</span><span class="bp">self</span><span class="o">.</span><span class="n">coda</span><span class="si">}{</span><span class="bp">self</span><span class="o">.</span><span class="n">tone</span><span class="si">}</span><span class="s2">&quot;</span></div>

<span class="nd">@property</span>
<span class="k">def</span> <span class="nf">final</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="sd">&quot;&quot;&quot;Return the final (= nucleus + coda).&quot;&quot;&quot;</span>
<span class="k">return</span> <span class="sa">f</span><span class="s2">&quot;</span><span class="si">{</span><span class="bp">self</span><span class="o">.</span><span class="n">nucleus</span><span class="si">}{</span><span class="bp">self</span><span class="o">.</span><span class="n">coda</span><span class="si">}</span><span class="s2">&quot;</span></div>


<div class="viewcode-block" id="parse_jyutping"><a class="viewcode-back" href="../../../generated/pycantonese.parse_jyutping.html#pycantonese.jyutping.parse_jyutping">[docs]</a><span class="k">def</span> <span class="nf">parse_jyutping</span><span class="p">(</span><span class="n">jp_str</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="n">List</span><span class="p">[</span><span class="n">Jyutping</span><span class="p">]:</span>
<span class="sd">&quot;&quot;&quot;Parse Jyutping romanization into onset, nucleus, coda, and tone.</span>

<span class="sd"> Parameters</span>
@@ -324,7 +360,7 @@ <h1>Source code for pycantonese.jyutping.parse_jyutping</h1><div class="highligh

<span class="sd"> Returns</span>
<span class="sd"> -------</span>
<span class="sd"> list[tuple[str]]</span>
<span class="sd"> List[Jyutping]</span>

<span class="sd"> Raises</span>
<span class="sd"> ------</span>
@@ -335,7 +371,9 @@ <h1>Source code for pycantonese.jyutping.parse_jyutping</h1><div class="highligh
<span class="sd"> Examples</span>
<span class="sd"> --------</span>
<span class="sd"> &gt;&gt;&gt; parse_jyutping(&quot;gwong2dung1waa2&quot;) # 廣東話, Cantonese</span>
<span class="sd"> [(&#39;gw&#39;, &#39;o&#39;, &#39;ng&#39;, &#39;2&#39;), (&#39;d&#39;, &#39;u&#39;, &#39;ng&#39;, &#39;1&#39;), (&#39;w&#39;, &#39;aa&#39;, &#39;&#39;, &#39;2&#39;)]</span>
<span class="sd"> [Jyutping(onset=&#39;gw&#39;, nucleus=&#39;o&#39;, coda=&#39;ng&#39;, tone=&#39;2&#39;),</span>
<span class="sd"> Jyutping(onset=&#39;d&#39;, nucleus=&#39;u&#39;, coda=&#39;ng&#39;, tone=&#39;1&#39;),</span>
<span class="sd"> Jyutping(onset=&#39;w&#39;, nucleus=&#39;aa&#39;, coda=&#39;&#39;, tone=&#39;2&#39;)]</span>
<span class="sd"> &quot;&quot;&quot;</span>
<span class="k">if</span> <span class="ow">not</span> <span class="n">jp_str</span><span class="p">:</span>
<span class="k">return</span> <span class="p">[]</span>
@@ -379,7 +417,7 @@ <h1>Source code for pycantonese.jyutping.parse_jyutping</h1><div class="highligh
<span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="s2">&quot;coda error -- &quot;</span> <span class="o">+</span> <span class="nb">repr</span><span class="p">(</span><span class="n">jp</span><span class="p">))</span>

<span class="k">if</span> <span class="n">cvc</span> <span class="ow">in</span> <span class="p">[</span><span class="s2">&quot;m&quot;</span><span class="p">,</span> <span class="s2">&quot;n&quot;</span><span class="p">,</span> <span class="s2">&quot;ng&quot;</span><span class="p">,</span> <span class="s2">&quot;i&quot;</span><span class="p">,</span> <span class="s2">&quot;e&quot;</span><span class="p">,</span> <span class="s2">&quot;aa&quot;</span><span class="p">,</span> <span class="s2">&quot;o&quot;</span><span class="p">,</span> <span class="s2">&quot;u&quot;</span><span class="p">]:</span>
<span class="n">jp_parsed_list</span><span class="o">.</span><span class="n">append</span><span class="p">((</span><span class="s2">&quot;&quot;</span><span class="p">,</span> <span class="n">cvc</span><span class="p">,</span> <span class="s2">&quot;&quot;</span><span class="p">,</span> <span class="n">tone</span><span class="p">))</span>
<span class="n">jp_parsed_list</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">Jyutping</span><span class="p">(</span><span class="s2">&quot;&quot;</span><span class="p">,</span> <span class="n">cvc</span><span class="p">,</span> <span class="s2">&quot;&quot;</span><span class="p">,</span> <span class="n">tone</span><span class="p">))</span>
<span class="k">continue</span>
<span class="k">elif</span> <span class="n">cvc</span><span class="p">[</span><span class="o">-</span><span class="mi">2</span><span class="p">:]</span> <span class="o">==</span> <span class="s2">&quot;ng&quot;</span><span class="p">:</span>
<span class="n">coda</span> <span class="o">=</span> <span class="s2">&quot;ng&quot;</span>
@@ -412,12 +450,12 @@ <h1>Source code for pycantonese.jyutping.parse_jyutping</h1><div class="highligh
<span class="k">if</span> <span class="n">onset</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">ONSETS</span><span class="p">:</span>
<span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="s2">&quot;onset error -- &quot;</span> <span class="o">+</span> <span class="nb">repr</span><span class="p">(</span><span class="n">jp</span><span class="p">))</span>

<span class="n">jp_parsed_list</span><span class="o">.</span><span class="n">append</span><span class="p">((</span><span class="n">onset</span><span class="p">,</span> <span class="n">nucleus</span><span class="p">,</span> <span class="n">coda</span><span class="p">,</span> <span class="n">tone</span><span class="p">))</span>
<span class="n">jp_parsed_list</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">Jyutping</span><span class="p">(</span><span class="n">onset</span><span class="p">,</span> <span class="n">nucleus</span><span class="p">,</span> <span class="n">coda</span><span class="p">,</span> <span class="n">tone</span><span class="p">))</span>

<span class="k">return</span> <span class="n">jp_parsed_list</span></div>


<span class="k">def</span> <span class="nf">parse_final</span><span class="p">(</span><span class="n">final</span><span class="p">):</span>
<span class="k">def</span> <span class="nf">_parse_final</span><span class="p">(</span><span class="n">final</span><span class="p">):</span>
<span class="sd">&quot;&quot;&quot;Parse a final into its nucleus and coda.</span>

<span class="sd"> Parameters</span>
@@ -448,7 +486,7 @@ <h1>Source code for pycantonese.jyutping.parse_jyutping</h1><div class="highligh
<div role="contentinfo">
<p>

&copy; Copyright 2014-2021, Jackson L. Lee | Documentation last updated on March 19, 2021
&copy; Copyright 2014-2021, Jackson L. Lee | Documentation last updated on March 20, 2021

</p>
</div>