
Releases: jacksonllee/pycantonese

v3.1.0.dev3

06 Dec 01:04
Pre-release

This is another development release towards v3.1.0. Compared to v3.1.0.dev2, it fixes more word segmentation issues in order to improve the part-of-speech tagger under development.

Installing this version from the GitHub source requires Git LFS; install it first if it is not already on your system.

Corresponding PyPI release: https://pypi.org/project/pycantonese/3.1.0.dev3/

v3.1.0.dev2

10 Nov 12:15
Pre-release

This is a development release to tag some unreleased features, particularly a part-of-speech tagger under development. (Installing this version from the GitHub source likely requires Git LFS on your system.)

Corresponding PyPI release: https://pypi.org/project/pycantonese/3.1.0.dev2/

v3.0.0

26 Oct 02:03

[3.0.0] - 2020-10-25

Added

  • Word segmentation:
    • Segmentation is now customizable for the following:
      • Maximum word length
      • A user-supplied list of words to allow as words
      • A user-supplied list of words to disallow as words
    • The default segmentation model has been improved with the rime-cantonese data (CC BY 4.0 license).
  • Characters-to-Jyutping conversion:
    • The conversion returns results in a word-segmented form.
    • The conversion model has been improved with the rime-cantonese data (CC BY 4.0 license).
  • Added the following functions; they are equivalent to their (now deprecated)
    x2y counterparts:
    • characters_to_jyutping
    • jyutping_to_tipa
    • jyutping_to_yale
  • Added support for Python 3.9.
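
The three segmentation options listed above (maximum word length, words to allow, words to disallow) can be illustrated with a small stand-alone sketch. This is not pycantonese's implementation: the function below, its greedy longest-match strategy, and the toy vocabulary are all assumptions made for illustration. It only shows how the three knobs could plausibly interact.

```python
def segment(text, vocabulary, max_word_length=5, allow=None, disallow=None):
    """Greedy longest-match segmentation (hypothetical, for illustration only)."""
    # Effective vocabulary: base words plus allowed words, minus disallowed ones.
    words = (set(vocabulary) | set(allow or ())) - set(disallow or ())
    result = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for length in range(min(max_word_length, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            # A single character is always accepted as a fallback.
            if length == 1 or candidate in words:
                result.append(candidate)
                i += length
                break
    return result


# Toy vocabulary (an assumption, not the library's data).
vocabulary = {"廣東話", "廣東", "香港", "香港人"}
text = "香港人講廣東話"

print(segment(text, vocabulary))                      # ['香港人', '講', '廣東話']
print(segment(text, vocabulary, max_word_length=2))   # ['香港', '人', '講', '廣東', '話']
print(segment(text, vocabulary, disallow={"香港人"}))  # ['香港', '人', '講', '廣東話']
print(segment(text, vocabulary, allow={"講廣東話"}))   # ['香港人', '講廣東話']
```

Lowering the maximum word length or disallowing a word forces shorter segments, while allowing a word lets a longer span be kept together.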

Changed

API-breaking Changes

  • jyutping_to_yale: The default value of the keyword argument as_list has
    been changed from False to True, so that this function is now more in
    line with the other "jyutping_to_X" functions for returning a list.
  • characters_to_jyutping: The returned value is now a list of segmented words,
    where each is a 2-tuple of (Cantonese characters, Jyutping).
    Previously, it was a list of Jyutping strings for the individual
    Cantonese characters.
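
Code that relied on the old flat list of Jyutping strings may need a small adapter after this change. The sketch below is one possible approach, not part of pycantonese: the helper name flatten_jyutping is hypothetical, the sample data is hard-coded rather than produced by the library, and it assumes that each word's Jyutping is a concatenation of syllables that each end in a tone digit from 1 to 6.

```python
import re

# Hard-coded sample in the new shape: a list of (characters, Jyutping) 2-tuples.
new_style = [
    ("香港人", "hoeng1gong2jan4"),
    ("講", "gong2"),
    ("廣東話", "gwong2dung1waa2"),
]

# One Jyutping syllable: lowercase letters followed by a tone digit (assumption).
SYLLABLE = re.compile(r"[a-z]+[1-6]")


def flatten_jyutping(words):
    """Return one Jyutping string per syllable, ignoring word boundaries."""
    syllables = []
    for _characters, jyutping in words:
        # Assumption: a word with no known romanization carries an empty
        # or missing value; such words are skipped here.
        if not jyutping:
            continue
        syllables.extend(SYLLABLE.findall(jyutping))
    return syllables


print(flatten_jyutping(new_style))
# ['hoeng1', 'gong2', 'jan4', 'gong2', 'gwong2', 'dung1', 'waa2']
```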

Non-API-breaking Changes

  • Switched documentation to the readthedocs theme and numpydoc docstring style.
  • Improved CircleCI builds with orbs.

Deprecated

  • The following x2y functions have been deprecated in favor of their
    counterparts named as x_to_y.
    • characters2jyutping
    • jyutping2tipa
    • jyutping2yale
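
A common way to keep an old x2y name working while steering callers to the x_to_y name is a thin alias that emits a DeprecationWarning. The sketch below shows that general pattern only; it is not taken from the pycantonese source, and deprecated_alias, jyutping_to_example, and jyutping2example are hypothetical names.

```python
import functools
import warnings


def deprecated_alias(new_func, old_name):
    """Build a wrapper that warns, then delegates to new_func."""
    @functools.wraps(new_func)
    def wrapper(*args, **kwargs):
        warnings.warn(
            f"{old_name} is deprecated; use {new_func.__name__} instead.",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not the wrapper
        )
        return new_func(*args, **kwargs)

    wrapper.__name__ = old_name
    return wrapper


def jyutping_to_example(text):
    """Stand-in for a real conversion function (hypothetical)."""
    return text.upper()


# The old-style name stays callable but warns on every use.
jyutping2example = deprecated_alias(jyutping_to_example, "jyutping2example")
```

Callers of the old name get the same result as before, plus a warning that names the replacement.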

Security

  • Turned on HTTPS for the pycantonese.org domain.

v2.4.1

11 Oct 03:18

[2.4.1] - 2020-10-10

Fixed

  • Switched the wordseg dependency to the PyPI source instead of a GitHub direct link.

v2.4.0

11 Oct 02:45

[2.4.0] - 2020-10-10

Added

  • Added the characters2jyutping() function for converting
    Cantonese characters to Jyutping romanization.
  • Added the segment() function for word segmentation.

v2.3.0

24 Jul 20:24

[2.3.0] - 2020-07-24

Added

  • Added support for Python 3.7 and 3.8.

Removed

  • Dropped support for Python 3.4 and 3.5 (supporting 3.6, 3.7, and 3.8 now).

v2.2.0

01 Jul 04:45

[2.2.0] - 2018-06-30

Added

  • 104 stop words.

v2.1.0

11 Jun 05:57

[2.1.0] - 2018-06-11

Added

  • Exposed the exclude parameter in various reader methods
    for excluding specific participants. This parameter was implemented in
    pylangacq v0.10.0.

Fixed

  • Allowed "n" to be a syllabic nasal.
  • Fixed the corpus reader not picking up the characters.

v2.0.0 release

07 Feb 05:28

Major update: Shift to the CHAT transcription format for HKCanCor and custom corpus datasets.

v1.0 release

07 Sep 05:23
  • Overall code restructuring
  • Only Python 3.x is supported from this point onwards
  • Used generators instead of lists for corpus access methods
  • Added the part-of-speech search criterion
  • Added Jyutping-to-Yale conversion
  • Added Jyutping-to-TIPA conversion
  • Disabled the function for reading a custom corpus dataset (it will come back)