Release v0.13.0 · jacksonllee/pylangacq

[0.13.0] - 2021-03-15

API-breaking changes:
The Reader class has been completely rewritten.
A couple of methods have been removed, while others have been renamed.
For methods that remain (renamed or not),
the output data structures and the allowed arguments have changed.
The details follow.

Added

New classmethods of Reader for reader instantiation:
- from_zip
- from_dir
New classes to better structure CHAT data:
- Utterance
- Token
- Gra
New Reader methods:
- append_left, extend, extend_left, pop, pop_left
- tokens (which gives Token objects, essentially the "tagged words" from before)
In the header dictionary, each participant's info has the new key "dob"
for date of birth (if the info is available in the CHAT header).
The corresponding value is a datetime.date object.
(The same info was previously exposed as the Reader method date_of_birth,
now removed.)
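As a rough illustration of working with a "dob" value, the sketch below computes an age in whole months on a recording date. The participant dictionary and both dates are made-up sample data, and age_in_months is a hypothetical helper, not part of pylangacq.

```python
import datetime

# Made-up participant info, shaped like one participant's entry in a header
# dictionary. Only the "dob" key holding a datetime.date is taken from the notes.
participant_info = {"name": "Eve", "dob": datetime.date(1961, 7, 1)}


def age_in_months(dob, recording_date):
    """Return the number of whole months between dob and recording_date."""
    months = (recording_date.year - dob.year) * 12
    months += recording_date.month - dob.month
    # The current month only counts once the day of birth has been reached.
    if recording_date.day < dob.day:
        months -= 1
    return months


recording_date = datetime.date(1963, 1, 15)  # made-up session date
print(age_in_months(participant_info["dob"], recording_date))  # prints 18
```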
The test suite now covers code snippets in both the docstrings and .rst doc files.

Changed

CHAT parsing in Reader instantiation has been completely rewritten.
The previous private class _SingleReader has been removed.
This private class duplicated a lot of the Reader code,
which made it hard to make changes.
The Reader rewrite has also greatly sped up the reading and parsing of CHAT data.
The by_files argument, which many Reader methods have,
now gives you a simpler list of results, one per data file,
instead of the previous dict that mapped each file path to that file's
result.
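The change in output shape can be pictured with plain Python data. The file paths and counts below are invented, no pylangacq call is made, and the assumption that the list follows the order of the reader's file paths is ours.

```python
# Before 0.13.0, by_files=True: a dict mapping each file path to its result.
old_style = {"eve01.cha": 1588, "eve02.cha": 1314}

# From 0.13.0, by_files=True: a list with one result per data file.
# Assumption: the list is ordered like the reader's file paths.
file_paths = ["eve01.cha", "eve02.cha"]
new_style = [1588, 1314]

# Code that still needs the old mapping can rebuild it by zipping the two.
rebuilt = dict(zip(file_paths, new_style))
print(rebuilt == old_style)  # prints True
```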
The participant argument, which many Reader methods have for specifying
which participants' data to include in the output, has been renamed as
participants to avoid confusion. There is no change to its behavior of
handling either a single string (e.g., "CHI") or a collection of strings
(e.g., {"CHI", "MOT"}).
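Accepting either a single string or a collection of strings takes a little care, because a string is itself iterable. The helper below is a hypothetical sketch of one way to normalize such an argument. It is not pylangacq's actual implementation.

```python
def normalize_participants(participants):
    """Return a set of participant codes from a string or a collection of strings.

    Hypothetical helper for illustration only.
    """
    # A bare string must be wrapped, otherwise set("CHI") gives {"C", "H", "I"}.
    if isinstance(participants, str):
        return {participants}
    return set(participants)


print(normalize_participants("CHI"))  # prints {'CHI'}
print(normalize_participants(["CHI", "MOT"]) == {"CHI", "MOT"})  # prints True
```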
The following Reader methods have been renamed as indicated,
some for stylistic or Pythonic reasons, others for reasons as given:
- age -> ages
- number_of_utterances -> n_utterances
- number_of_files -> n_files
- filenames -> file_paths
- MLU -> mlu
- MLUm -> mlum
- MLUw -> mluw
- TTR -> ttr
- IPSyn -> ipsyn
- word_frequency -> word_frequencies
- from_chat_str -> from_strs
- from_chat_files -> from_files
- add -> append.
  Since the data files in a Reader have a natural ordering (by time of
  recording sessions, and therefore commonly by file paths as well),
  a reader is list-like rather than an unordered set of data files,
  which add would suggest.
- participant_codes -> participants.
  Before this version, the methods participant_codes (for CHI, MOT, etc) and
  participants (for, say, Eve, Mother, Investigator, etc) co-existed,
  but in practice we mostly only care about CHI, MOT, etc.
  So the method participants for Eve etc has been removed,
  and participant_codes has been renamed as participants.
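When upgrading existing code, the renames listed above can be collected into a lookup table. The sketch below scans a string of source code for calls to the old names. The function find_outdated_calls and the sample snippet are hypothetical. A plain text scan like this is crude and may flag unrelated methods that share a name (for example, add on a set).

```python
import re

# Old Reader method name -> new name, copied from the list of renames above.
RENAMED_READER_METHODS = {
    "age": "ages",
    "number_of_utterances": "n_utterances",
    "number_of_files": "n_files",
    "filenames": "file_paths",
    "MLU": "mlu",
    "MLUm": "mlum",
    "MLUw": "mluw",
    "TTR": "ttr",
    "IPSyn": "ipsyn",
    "word_frequency": "word_frequencies",
    "from_chat_str": "from_strs",
    "from_chat_files": "from_files",
    "add": "append",
    "participant_codes": "participants",
}


def find_outdated_calls(source_code):
    """Return (old_name, new_name) pairs for old method names called in source_code."""
    hits = []
    for old, new in RENAMED_READER_METHODS.items():
        # Match ".old_name(" so that MLU does not also match MLUm or MLUw.
        if re.search(r"\." + re.escape(old) + r"\(", source_code):
            hits.append((old, new))
    return hits


sample = "reader.number_of_files()\nreader.MLU(participant='CHI')\n"
print(find_outdated_calls(sample))
# prints [('number_of_files', 'n_files'), ('MLU', 'mlu')]
```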
Each participant's info in a header dictionary has these keys renamed:
- participant_name -> name
- participant_role -> role
- SES -> ses (socioeconomic status)
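Code that stored participant info with the old key names can be adapted with a small key-renaming step. The sketch below is a hypothetical helper operating on a made-up dictionary. The "other_key" entry is a placeholder that only shows unrelated keys pass through unchanged.

```python
# Old participant-info key -> new key, copied from the list above.
RENAMED_PARTICIPANT_KEYS = {
    "participant_name": "name",
    "participant_role": "role",
    "SES": "ses",
}


def migrate_participant_info(info):
    """Return a copy of info with the renamed keys replaced by their new names."""
    return {RENAMED_PARTICIPANT_KEYS.get(key, key): value for key, value in info.items()}


# Made-up sample data in the pre-0.13.0 key style.
old_info = {"participant_name": "Eve", "participant_role": "Target_Child", "SES": "", "other_key": "kept"}
print(migrate_participant_info(old_info))
# prints {'name': 'Eve', 'role': 'Target_Child', 'ses': '', 'other_key': 'kept'}
```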
The class DependencyGraph has been made private
(i.e., now _DependencyGraph with a leading underscore).
Its functionality hasn't really changed (it's used in the computation of IPSyn).
It may be made more visible again in the future if more functionality
related to grammatical relations is developed in the package.
Switched to sphinx-rtd-theme as the documentation theme.
Switched to CircleCI orbs; updated dev requirements' versions.

Deprecated

The following Reader methods have been deprecated:
- tagged_sents (use tokens with by_utterances=True instead)
- tagged_words (use tokens with by_utterances=False instead)
- sents (use words with by_utterances=True instead)

Removed

The following methods of the Reader class have been removed:
- abspath. Use file_paths instead.
- index_to_tiers. All the unparsed tiers are now available from utterances.
- participant_codes. It has been renamed as participants, replacing the former method of that name, which has been removed; see "Changed" above.
- part_of_speech_tags
- update and remove. A reader is a list-like collection of CHAT data files,
  not a set (which update and remove would suggest).
- search and concordance. To search, use one of
  the words, tokens, and utterances methods to walk through a reader's CHAT data
  and keep track of elements of interest.
- date_of_birth. The info is now available under headers, in each participant's
  "dob" key.

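As a sketch of the manual search suggested above in place of search and concordance, the function below walks through words grouped by utterance and records where a target word occurs. The nested list is made-up data standing in for a reader's output, and find_word is a hypothetical helper.

```python
def find_word(words_by_utterance, target):
    """Return (utterance_index, word_index) pairs for each occurrence of target."""
    positions = []
    for i, utterance in enumerate(words_by_utterance):
        for j, word in enumerate(utterance):
            if word == target:
                positions.append((i, j))
    return positions


# Made-up data: one inner list of words per utterance.
sample_words = [["more", "cookie"], ["you", "want", "more"]]
print(find_word(sample_words, "more"))  # prints [(0, 0), (1, 2)]
```
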
Fixed

Handled [/-] in cleaning utterances.
[x <number>] is now interpreted as a repetition of the previous word/item,
not of the entire utterance.
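To make the scope of the repetition marker concrete, the sketch below expands only the single word right before the marker. It illustrates the reading described above with a hypothetical helper and a made-up utterance. It does not show how pylangacq itself cleans utterances, and it does not handle markers that follow a bracketed group of words.

```python
import re

# A word, one space, then "[x N]" where N is the repetition count.
REPETITION = re.compile(r"(\S+) \[x (\d+)\]")


def expand_repetitions(utterance):
    """Replace each 'word [x N]' with the word written out N times."""
    def repeat(match):
        word, count = match.group(1), int(match.group(2))
        return " ".join([word] * count)

    return REPETITION.sub(repeat, utterance)


print(expand_repetitions("I want milk [x 3] please"))
# prints I want milk milk milk please
```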
