MAINT update changelog
jacksonllee committed Jul 1, 2018
1 parent bf83eda commit 07b38fb
Showing 2 changed files with 131 additions and 3 deletions.
7 changes: 4 additions & 3 deletions CHANGELOG.md
@@ -8,15 +8,16 @@ and this project adheres to [Semantic Versioning](http://semver.org/).
## [Unreleased]

### Added

* 104 stop words.

### Changed
### Deprecated
### Removed
### Fixed
### Security

## [2.2.0] - 2018-06-30

### Added
* 104 stop words.

## [2.1.0] - 2018-06-11

127 changes: 127 additions & 0 deletions docs/stop_words.html
@@ -0,0 +1,127 @@

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="X-UA-Compatible" content="IE=Edge" />
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Stop Words &#8212; PyCantonese 2.2.0 documentation</title>
<link rel="stylesheet" href="_static/sphinxdoc.css" type="text/css" />
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<script type="text/javascript" id="documentation_options" data-url_root="./" src="_static/documentation_options.js"></script>
<script type="text/javascript" src="_static/jquery.js"></script>
<script type="text/javascript" src="_static/underscore.js"></script>
<script type="text/javascript" src="_static/doctools.js"></script>
<link rel="index" title="Index" href="genindex.html" />
<link rel="search" title="Search" href="search.html" />
<link rel="next" title="Corpus Reader Methods" href="reader.html" />
<link rel="prev" title="Corpus Data" href="data.html" />
</head><body>
<div class="related" role="navigation" aria-label="related navigation">
<h3>Navigation</h3>
<ul>
<li class="right" style="margin-right: 10px">
<a href="genindex.html" title="General Index"
accesskey="I">index</a></li>
<li class="right" >
<a href="reader.html" title="Corpus Reader Methods"
accesskey="N">next</a> |</li>
<li class="right" >
<a href="data.html" title="Corpus Data"
accesskey="P">previous</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">PyCantonese 2.2.0 documentation</a> &#187;</li>
</ul>
</div>
<div class="sphinxsidebar" role="navigation" aria-label="main navigation">
<div class="sphinxsidebarwrapper">
<h4>Previous topic</h4>
<p class="topless"><a href="data.html"
title="previous chapter">Corpus Data</a></p>
<h4>Next topic</h4>
<p class="topless"><a href="reader.html"
title="next chapter">Corpus Reader Methods</a></p>
<div id="searchbox" style="display: none" role="search">
<h3>Quick search</h3>
<div class="searchformwrapper">
<form class="search" action="search.html" method="get">
<input type="text" name="q" />
<input type="submit" value="Go" />
<input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" />
</form>
</div>
</div>
<script type="text/javascript">$('#searchbox').show(0);</script>
</div>
</div>

<div class="document">
<div class="documentwrapper">
<div class="bodywrapper">
<div class="body" role="main">

<div class="section" id="stop-words">
<span id="id1"></span><h1>Stop Words<a class="headerlink" href="#stop-words" title="Permalink to this headline"></a></h1>
<p>Many natural language processing tasks require filtering out
stop words. English examples include function words such as
pronouns and determiners. PyCantonese provides the function <code class="docutils literal notranslate"><span class="pre">stop_words()</span></code>
that returns a set of Cantonese stop words (104 in this release):</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="kn">import</span> <span class="nn">pycantonese</span> <span class="kn">as</span> <span class="nn">pc</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">stop_words</span> <span class="o">=</span> <span class="n">pc</span><span class="o">.</span><span class="n">stop_words</span><span class="p">()</span>
<span class="gp">&gt;&gt;&gt; </span><span class="nb">len</span><span class="p">(</span><span class="n">stop_words</span><span class="p">)</span>
<span class="go">104</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">stop_words</span>
<span class="go">{&#39;一啲&#39;, &#39;一定&#39;, &#39;不如&#39;, &#39;不過&#39;, ...}</span>
</pre></div>
</div>
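<p>A minimal sketch of the filtering step itself. It does not call PyCantonese:
the small set below is only a stand-in containing the four words visible in the output above,
and the token list is a hypothetical, already-segmented utterance. In practice you would
use the set returned by <code class="docutils literal notranslate"><span class="pre">pc.stop_words()</span></code> instead.</p>

```python
# Stand-in for the set returned by pc.stop_words(); only the four words
# shown in the truncated output above are included (an assumption for
# illustration, not the full default set).
stop_words = {'一啲', '一定', '不如', '不過'}

# Hypothetical, already-segmented utterance.
tokens = ['不過', '香港', '一定', '落雨']

# Keep only the tokens that are not stop words. Set membership is O(1),
# so this scales well to large corpora.
content_words = [token for token in tokens if token not in stop_words]
print(content_words)  # ['香港', '落雨']
```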
<p>Depending on your use case, you may want to add words to, or remove words
from, the default set.
The <code class="docutils literal notranslate"><span class="pre">stop_words()</span></code> function accepts the optional arguments <code class="docutils literal notranslate"><span class="pre">add</span></code> and
<code class="docutils literal notranslate"><span class="pre">remove</span></code>.</p>
<p><code class="docutils literal notranslate"><span class="pre">add</span></code> can be either a string (e.g., to treat <code class="docutils literal notranslate"><span class="pre">'香港'</span></code> as a stop word if your
data is all about Hong Kong) or an iterable of strings:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="kn">import</span> <span class="nn">pycantonese</span> <span class="kn">as</span> <span class="nn">pc</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">stop_words_1</span> <span class="o">=</span> <span class="n">pc</span><span class="o">.</span><span class="n">stop_words</span><span class="p">(</span><span class="n">add</span><span class="o">=</span><span class="s1">&#39;香港&#39;</span><span class="p">)</span>
<span class="gp">&gt;&gt;&gt; </span><span class="nb">len</span><span class="p">(</span><span class="n">stop_words_1</span><span class="p">)</span>
<span class="go">105</span>
<span class="gp">&gt;&gt;&gt; </span><span class="s1">&#39;香港&#39;</span> <span class="ow">in</span> <span class="n">stop_words_1</span>
<span class="go">True</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">stop_words_2</span> <span class="o">=</span> <span class="n">pc</span><span class="o">.</span><span class="n">stop_words</span><span class="p">(</span><span class="n">add</span><span class="o">=</span><span class="p">[</span><span class="s1">&#39;香港島&#39;</span><span class="p">,</span> <span class="s1">&#39;九龍&#39;</span><span class="p">,</span> <span class="s1">&#39;新界&#39;</span><span class="p">])</span>
<span class="gp">&gt;&gt;&gt; </span><span class="nb">len</span><span class="p">(</span><span class="n">stop_words_2</span><span class="p">)</span>
<span class="go">107</span>
<span class="gp">&gt;&gt;&gt; </span><span class="p">{</span><span class="s1">&#39;香港島&#39;</span><span class="p">,</span> <span class="s1">&#39;九龍&#39;</span><span class="p">,</span> <span class="s1">&#39;新界&#39;</span><span class="p">}</span><span class="o">.</span><span class="n">issubset</span><span class="p">(</span><span class="n">stop_words_2</span><span class="p">)</span>
<span class="go">True</span>
</pre></div>
</div>
<p>Similarly, the <code class="docutils literal notranslate"><span class="pre">remove</span></code> argument takes either a string or an iterable
of strings.</p>
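<p>The string-or-iterable behaviour can be sketched as follows. This is not PyCantonese's
implementation: <code class="docutils literal notranslate"><span class="pre">customize()</span></code> and its helper are hypothetical names, the default set is a
four-word stand-in, and applying <code class="docutils literal notranslate"><span class="pre">add</span></code> before <code class="docutils literal notranslate"><span class="pre">remove</span></code> is an assumption.</p>

```python
def customize(defaults, add=None, remove=None):
    """Return a new set: defaults plus `add`, minus `remove` (hypothetical helper)."""

    def as_set(words):
        # A bare string counts as one word, not as an iterable of characters.
        if words is None:
            return set()
        if isinstance(words, str):
            return {words}
        return set(words)

    # Assumption: additions are applied first, then removals.
    return (set(defaults) | as_set(add)) - as_set(remove)


# Stand-in for the default stop words (not the full PyCantonese set).
defaults = {'一啲', '一定', '不如', '不過'}

without_one = customize(defaults, remove='不過')            # a single string
without_two = customize(defaults, remove=['一啲', '一定'])  # an iterable of strings
print(len(without_one), len(without_two))  # 3 2
```

<p>The sketch returns a new set on every call, so the default set itself is never modified.</p>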
</div>


</div>
</div>
</div>
<div class="clearer"></div>
</div>
<div class="related" role="navigation" aria-label="related navigation">
<h3>Navigation</h3>
<ul>
<li class="right" style="margin-right: 10px">
<a href="genindex.html" title="General Index"
>index</a></li>
<li class="right" >
<a href="reader.html" title="Corpus Reader Methods"
>next</a> |</li>
<li class="right" >
<a href="data.html" title="Corpus Data"
>previous</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">PyCantonese 2.2.0 documentation</a> &#187;</li>
</ul>
</div>
<div class="footer" role="contentinfo">
&#169; Copyright 2014-2018, Jackson L. Lee | Documentation last updated on June 30, 2018.
Created using <a href="http://sphinx-doc.org/">Sphinx</a> 1.7.5.
</div>
</body>
</html>
