Releases: apparebit/demicode

v1.4: Ready for Unicode 16.0

23 Aug 21:28
147df91

This release updates Demicode with support for the upcoming release of Unicode 16.0. That includes the ability to run with prerelease data in general and to run code generation without requiring full access to the Unicode character database files (which creates a circular dependency and results in a crash).

Unicode 16.0 again makes substantial changes to the definition of grapheme clusters. Nonetheless, Demicode's implementation of grapheme cluster breaking passed all updated tests without requiring any changes. I see that as validation of Demicode's approach, which uses a clever encoding of Unicode properties as Unicode letters and a straightforward regular expression obtained by applying the encoding to the rules from Unicode Standard Annex #29 on text segmentation.
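To make the encoding idea concrete, here is a minimal Python sketch. It is not Demicode's actual code: the letter assignments, the `letter_for` classifier, and the code point ranges standing in for Extended_Pictographic are assumptions made for illustration, and only a subset of the UAX #29 rules is covered (no Hangul, Prepend, SpacingMark, or Indic conjuncts).

```python
import re
import unicodedata


def letter_for(ch: str) -> str:
    """Map one code point to a one-letter stand-in for its break property."""
    cp = ord(ch)
    if ch == "\r":
        return "C"  # CR
    if ch == "\n":
        return "L"  # LF
    if unicodedata.category(ch) == "Cc":
        return "N"  # Control (simplified: only category Cc)
    if ch == "\u200d":
        return "Z"  # ZWJ
    if unicodedata.category(ch) in ("Mn", "Me") or 0x1F3FB <= cp <= 0x1F3FF:
        return "E"  # Extend (simplified), including emoji modifiers
    if 0x1F1E6 <= cp <= 0x1F1FF:
        return "R"  # Regional_Indicator
    if 0x2600 <= cp <= 0x27BF or 0x1F300 <= cp <= 0x1FAFF:
        return "P"  # rough approximation of Extended_Pictographic
    return "O"  # everything else


# One alternative per simplified rule: CR LF stays together, other controls
# stand alone, and a core (flag pair, ZWJ emoji sequence, or any single
# non-control code point) absorbs trailing Extend and ZWJ.
CLUSTER = re.compile(r"CL|[CLN]|(?:RR|P(?:E*ZP)*|[^CLN])[EZ]*")


def grapheme_clusters(text: str) -> list[str]:
    """Split text by matching the regex against its one-letter encoding."""
    encoded = "".join(letter_for(ch) for ch in text)
    # One letter per code point, so match offsets apply to the text as is.
    return [text[m.start():m.end()] for m in CLUSTER.finditer(encoded)]
```

Because the encoding uses exactly one letter per code point, the regular expression never has to look at the text itself. Adapting to a revised rule set then means changing the classifier and the pattern, not the matching loop.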

Since the preliminary files for version 16.0 of the Unicode Character Database have already been posted on Unicode's website, you too can run Demicode 1.4 with the prerelease data. Just add the --ucd-version 16.0.0 option on the command line. Without that option, Demicode continues to default to Unicode 15.1—until the next weekly update check after the release of Unicode 16.0. By contrast, Demicode 1.3 fails with an error declaring that Unicode 16.0 is "from future." Well, with Demicode 1.4, the future is now! 🎉

v1.3: Easier experiments and a bug fix

07 Jan 22:27
5d153fb

This release greatly simplifies running demicode across several popular terminal emulators, at least on macOS. It also fixes #1.

v1.2 gains benchmarking, improves mirroring and testing

30 Oct 20:59
c36ac0a

With this release, demicode gains the ability to benchmark page rendering. Initial results for nine terminal applications suggest that all of them are reasonably fast at rendering styled text, taking 4–9ms for a 120×40 page on a four-year-old macOS laptop. But when demicode queries the terminal for the current column (once each for 38 of those 40 lines), the spread of average latencies explodes to 10–946ms. Judging by these results, it seems that a few terminals strongly oversell their nimbleness.
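For readers who want to try a similar measurement, the sketch below times one cursor position query on a Unix terminal. It is an illustration and not demicode's benchmark code: the function names are made up here, and the sketch assumes that the terminal answers the standard "device status report" request `ESC [ 6 n` with a reply of the form `ESC [ row ; column R`.

```python
import os
import re
import time

# Reply to ESC[6n: ESC [ <row> ; <column> R
CURSOR_REPORT = re.compile(rb"\x1b\[(\d+);(\d+)R")


def parse_cursor_position(response: bytes) -> tuple[int, int]:
    """Extract (row, column) from a cursor position report."""
    match = CURSOR_REPORT.search(response)
    if match is None:
        raise ValueError(f"not a cursor position report: {response!r}")
    return int(match.group(1)), int(match.group(2))


def timed_cursor_query(fd_in: int = 0, fd_out: int = 1):
    """Ask the terminal for the cursor position; return it with the latency.

    Unix only and needs a real terminal on both file descriptors.
    """
    import termios
    import tty

    saved = termios.tcgetattr(fd_in)
    try:
        tty.setcbreak(fd_in)  # read the reply byte by byte, without echo
        start = time.perf_counter()
        os.write(fd_out, b"\x1b[6n")
        response = b""
        while not response.endswith(b"R"):
            chunk = os.read(fd_in, 1)
            if not chunk:
                raise EOFError("terminal closed before replying")
            response += chunk
        elapsed = time.perf_counter() - start
    finally:
        termios.tcsetattr(fd_in, termios.TCSADRAIN, saved)
    return parse_cursor_position(response), elapsed
```

Calling `timed_cursor_query()` repeatedly in an interactive session and averaging the elapsed times gives a rough per-query latency for the terminal at hand.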

This release also improves the mirroring of UCD and CLDR data, introducing a from-the-ground-up rewrite that uses an explicit manifest to track what data has been mirrored. To see for yourself, --ucd-list-versions lists the UCD versions included in the current mirror. The implementation is also more structured and performs more aggressive error checking. As of today, demicode is using GitHub Actions for CI, which hopefully ensures that demicode releases only become more robust.

v1.1: A critical bug fix, a nice-to-have feature, and better tooling

17 Oct 21:33
de5d7c9

User-Visible Changes

This release makes the following major changes:

  • It fixes a crashing bug for mirrored CLDR files.
  • It improves terminal input/output, notably by displaying character blots incrementally with --incrementally/-I. That does markedly slow down tool output. But it also allows for measuring the size of character blots by querying the terminal.

Internal Changes

This release also makes significant internal changes. Notably, the UCD implementation is becoming more uniform and more decoupled. The long-term goal is to provide a generally useful UCD abstraction that may not be the fastest but has excellent support for exploratory coding against the UCD.

The development setup has also been updated. Instead of mypy, demicode now uses pyright for type-checking. In my experience, pyright is more accurate than mypy for the same annotations. It has also surfaced two very subtle bugs. They both are fixed.

The runtest.py script runs both the type checker and the unit tests. Tests are based on Python's unittest package because I find pytest too invasive and too magical, which always ends up interfering with tests in the long term. Unfortunately, unittest is rather baroque and hard to extend because (1) its interfaces are too wide and (2) it hides critical state. The test.runtime module introduces adapter classes that fix these issues for unittest.TestCase and unittest.TestResult. The test script uses them to provide more readable and helpful output.

v1.0 Demicode Is All Grown Up 🎉

19 Sep 16:13
605f42a

This version adds support for Unicode 15.1. Notably, it incorporates the changes to the grapheme cluster breaking algorithm, which has changed substantially since Unicode 15.0. The changes are automatically activated when UnicodeCharacterDatabase is instantiated with 15.1, and they are effectively no-ops for 15.0 and earlier.

The --stats option now prints the bit-width for Unicode properties, too. It also includes data on code points that have non-default values for both the Indic_Conjunct_Break and Grapheme_Cluster_Break properties. Such overlap matters because both properties help determine grapheme cluster breaks. If feasible, integrating both into the same enumeration with single-letter enumeration constant values significantly simplifies the implementation of the break algorithm.

v1.0.b1 A Better UI, Refactored Unicode Database

12 Sep 21:06
c5b1496

Demicode's user experience is much improved: It now pages back and forth. On Linux and macOS, it only takes a keypress (take your pick: ‹left›/‹right›, b/f, p/n, ‹tab›/‹shift-tab›, ‹space›/‹delete›) to select the next page. For now, Windows still requires you to type a letter (the full words backward/forward and previous/next work, too) and then follow the letter or command with ‹return›. That said, ‹return› by itself continues to page forward as well.

This release has been tested with all known Unicode versions from 4.1 forward and does run with them. It also removes several unused Unicode properties that are likely to remain so and introduces several more, which will be needed for implementing grapheme cluster breaks according to the revised Unicode 15.1 algorithm.

The new --with-ucd-extended-pictographic command line option blots all characters that have the Extended_Pictographic property, including unassigned ones. Since that's quite the mouthful and the set of characters is especially important for fixed-width rendering, the much shorter -x works, too. Similarly, --with-curation has -q as an alias.

Internally, this release incorporates a significant refactor of the code for loading Unicode Character Database files. Much of the clutter and boilerplate has been eliminated, since I finally found a pattern that is both simple and also flexible enough to accommodate the loading of most files: It requires two lines, one for the context manager that mirrors and opens the file and one for the parser, with a callback constructing the desired datatype. The global UCD singleton instance has been eliminated as well. A direct beneficiary is statistics collection with --stats: It now uses its own private instance and can hence print counts for both the unoptimized and optimized internal representation in one run.
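The two-line loading pattern might look roughly like the following sketch. The names `mirrored`, `parse_records`, and `load_sample` are hypothetical, the "mirroring" step is reduced to opening a local file, and the parser handles only the common UCD line format of a code point or range followed by semicolon-separated fields and an optional `#` comment.

```python
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


@contextmanager
def mirrored(path: Path) -> Iterator[Iterable[str]]:
    """Stand-in for mirroring: open a file that is already local."""
    with open(path, encoding="utf8") as lines:
        yield lines


def parse_records(
    lines: Iterable[str], make: Callable[[range, list[str]], T]
) -> Iterator[T]:
    """Parse 'XXXX[..YYYY] ; field ; ... # comment' lines via a callback."""
    for line in lines:
        data = line.split("#", 1)[0].strip()
        if not data:
            continue  # blank line or pure comment
        first, *fields = [field.strip() for field in data.split(";")]
        start, _, stop = first.partition("..")
        low = int(start, 16)
        high = int(stop, 16) if stop else low
        yield make(range(low, high + 1), fields)


# Illustrative excerpt in the style of EastAsianWidth.txt
SAMPLE = """\
# East Asian Width (excerpt)
1100..115F;W  # Hangul choseong
3000;F        # IDEOGRAPHIC SPACE
"""


def load_sample() -> list[tuple[range, str]]:
    """Write the sample to a temporary file, then load it in two lines."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "EastAsianWidth.txt"
        path.write_text(SAMPLE, encoding="utf8")
        with mirrored(path) as lines:
            return list(parse_records(lines, lambda r, f: (r, f[0])))
```

The two lines inside `load_sample` are the whole loader: the context manager owns the file's lifetime and the callback decides what data type each record becomes.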

There are no more features to add nor modules to refactor. At least not in the short term. Once Unicode 15.1 has been released, I'll update the grapheme cluster breaking algorithm to account for Indic syllables as well. So please consider this first beta more or less a release candidate for the big 1.0.0, too.

v0.7 Approaching 1.0

07 Sep 02:30
f05419f

Starting with this release, demicode clearly distinguishes between user errors and unexpected exceptions, even if it internally uses exceptions for both. For the former, it only prints the error message. For the latter, it also prints an exception trace and points to the issue tracker. Demicode's output of statistics with the --stats option has been significantly improved as well.
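One common way to draw that line in Python is a dedicated exception class for user errors plus a catch-all handler at the top level. The sketch below shows the pattern under that assumption. `UserError`, `run`, and the exit codes are hypothetical and not taken from demicode's source.

```python
import sys
import traceback
from typing import Callable, TextIO


class UserError(Exception):
    """An error caused by user input, such as an invalid option value."""


def run(main: Callable[[], None], stderr: TextIO = sys.stderr) -> int:
    """Run main and translate exceptions into messages and an exit code."""
    try:
        main()
        return 0
    except UserError as x:
        # The user can fix this, so the message alone is enough.
        print(f"error: {x}", file=stderr)
        return 1
    except Exception:
        # A bug: show the trace and ask for a report.
        traceback.print_exc(file=stderr)
        print("Please report this to the project's issue tracker.", file=stderr)
        return 2
```

Keeping the distinction in the exception type means that code deep inside the tool can raise either kind without knowing how the top level will present it.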

The test script has been modularized using Python's built-in unittest module. You can run tests with ./runtest.py or with Visual Studio Code, the latter thanks to the configuration in .vscode/settings.json. In preparation for the release of Unicode 15.1, the versions for code generation have been locked down. In particular, testing grapheme cluster breaks is now specific to Unicode 15.0, since 15.1 updates the algorithm.

v0.6 Handle Older Unicode Versions

05 Sep 17:58
c7c39cb

Demicode won't crash when ingesting UCD files from Unicode versions before 13.0.0 any more. The lack of some information and the presence of outdated property values are now gracefully ignored.

This release also changes how unassigned code points and sequences of more than one grapheme cluster are handled. Assuming that they may just be valid for a later version of the Unicode standard than the currently active one, demicode now elides blots for them and adds an explanatory note instead of the (non-existent) name.

v0.5 Faster UCD lookups

04 Sep 14:32
1a6766d

In addition to considerable clean-up of demicode's internal code, the tool now optimizes UCD data for faster lookups. Several of the --with-… selections have been improved. In particular, --with-version-oracle now displays exactly one emoji per detectable Unicode version.
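The release notes do not spell out the optimization, so the following is only one plausible approach and is labeled as an assumption: merge adjacent code point ranges that carry the same property value, then answer queries by binary search over the sorted range starts. `optimize` and `lookup` are hypothetical names.

```python
from bisect import bisect_right


def optimize(ranges):
    """Sort (start, stop, value) ranges and merge adjacent equal-valued ones.

    Assumes inclusive, non-overlapping ranges.
    """
    merged = []
    for start, stop, value in sorted(ranges):
        if merged and merged[-1][1] + 1 == start and merged[-1][2] == value:
            merged[-1] = (merged[-1][0], stop, value)
        else:
            merged.append((start, stop, value))
    starts = [entry[0] for entry in merged]
    return starts, merged


def lookup(table, code_point, default=None):
    """Find the value for a code point with a binary search."""
    starts, merged = table
    # Index of the last range that starts at or before the code point
    index = bisect_right(starts, code_point) - 1
    if index >= 0:
        _, stop, value = merged[index]
        if code_point <= stop:
            return value
    return default
```

Merging shrinks the table, and the binary search keeps each query logarithmic in the number of remaining ranges rather than linear.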

v0.4 Make Mirroring and Width Computation Great Again

02 Sep 00:56
0acbc65
  1. This release fixes a bug in the URL creation logic for mirroring and now mirrors UCD and CLDR files to the operating system's cache directory.
  2. Furthermore, it significantly streamlines the computation of grapheme cluster width, which now takes all emoji into account. That yields significantly better and more consistent results than a wcwidth based solely on Unicode's East Asian Width.
  3. Finally, this release further modularizes the code, with the mirroring logic now in its own module.