SichangHe/natural_syntax

Natural Language Syntax Highlighting

Natural-Syntax-LS is a language server that highlights different parts of speech (POS) in plain text.

[Screenshots: full highlighting; partially disabled (via customization)]

Installation

  1. Download libtorch v2.1 as per Rust-BERT's documentation.

    Tips.

    You can find the libtorch download URL in tch-rs' build script. The LIBTORCH environment variable should point to the torch/ directory.

    Why automatic installation does not work.

    Rust-BERT has an "automatic installation" option that uses tch-rs' build script to download libtorch. However, the binary produced this way does not run because the downloaded libtorch is not on LD_LIBRARY_PATH. Alternatively, you could link libtorch statically, but that would still require you to download libtorch yourself.

  2. Install the natural_syntax_ls package with Cargo or friends to get the natural-syntax-ls binary:

    cargo install natural_syntax_ls --no-default-features

    Setting --no-default-features disables downloading libtorch (automatic installation).

    Why automatic installation is the default.

    Because otherwise it would be a pain to run the continuous integration.

Editor setup

✅ NeoVim setup with LSPConfig

Please paste the natural_syntax_ls_setup function below into your Nvim configuration, call it, and then set up natural_syntax_ls like any other LSPConfig language server. Please see my config for an example.

The natural_syntax_ls_setup function.
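-- Registers natural-syntax-ls as a custom server with LSPConfig.
-- Note: the `capabilities` parameter is not used in this snippet.
local function natural_syntax_ls_setup(capabilities)
    require('lspconfig.configs').natural_syntax_ls = {
        default_config = {
            -- The binary installed by `cargo install`; it must be on your PATH.
            cmd = { 'natural-syntax-ls' },
            -- File types the server attaches to.
            filetypes = { 'text' },
            single_file_support = true,
        },
        docs = {
            description = [[The Natural Syntax Language Server for highlighting parts of speech.]],
        },
    }
end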

You can customize the server by setting init_options when calling LSPConfig's setup function:

require('lspconfig')['natural_syntax_ls'].setup {
    init_options = {
        token_map_update = { -- Customize your POS-token mapping here. E.g.:
            -- Disable coordinating conjunctions highlighting.
            CC = vim.NIL, -- `nil` does not work because it gets ignored.
            -- Highlight wh-determiners as enum members without any modifiers.
            WDT = { type = "enumMember" },
            -- Highlight determiners as read-only classes.
            DT = { type = "class", modifiers = { "readonly" } },
        },
    },
}

Customizations:

  • I only set the filetypes field to text, but you can enable natural-syntax-ls for other file types as well. Note, though, that the language server's semantic tokens supersede Tree-sitter highlighting by default.
  • By specifying the token_map_update field in init_options, you can customize the mapping between parts of speech and semantic tokens.
    • The default mapping is in the pos2token_bits function in semantic_tokens.rs.
    • Part of speech tags are the variants of the PartOfSpeech enum in lib.rs.
    • Token types and modifiers are variants of TokenType and TokenModifier in semantic_tokens.rs, all in camelCase.
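
To make the semantics of token_map_update concrete, here is a minimal sketch of how such an update could be merged into a default mapping. It assumes that vim.NIL reaches the server as JSON null and is read as an absent value. `TokenSpec` and `apply_update` are hypothetical names for illustration only; the actual logic lives in semantic_tokens.rs and may differ.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a semantic token assignment
/// (not the crate's actual type).
#[derive(Debug, Clone, PartialEq)]
struct TokenSpec {
    token_type: String,
    modifiers: Vec<String>,
}

/// Merge an update into a POS-tag-to-token map.
/// `None` (JSON null, i.e. `vim.NIL`) disables highlighting for a tag;
/// `Some(spec)` adds or overrides the tag's token.
fn apply_update(
    map: &mut HashMap<String, TokenSpec>,
    update: HashMap<String, Option<TokenSpec>>,
) {
    for (tag, entry) in update {
        match entry {
            Some(spec) => {
                map.insert(tag, spec);
            }
            None => {
                map.remove(&tag);
            }
        }
    }
}
```

Under this reading, tags that the update does not mention keep their default tokens, which is why a tag has to be set to vim.NIL explicitly to disable it.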

❓ Visual Studio Code and other editor setup

No official support, but community plugins are welcome.

I do not currently use VSCode or these other editors, so I do not wish to maintain plugins for them.

However, it should be straightforward to implement plugins for them since Natural-Syntax-LS implements the Language Server Protocol (LSP). So, please feel free to make a plugin yourself and create an issue for me to link it here.

Selected specification

Prediction Scheduling

For a single document, only one prediction runs at a time. While a prediction is ongoing, new updates are queued, and the latest update replaces any previously queued update.
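
The rule above can be sketched as a small per-document state machine. This is an illustrative sketch only, not the server's actual implementation: `PredictionSlot`, `submit`, and `finish` are hypothetical names, and the real code may use channels or async tasks instead.

```rust
/// Hypothetical per-document scheduling state (illustrative names only).
#[derive(Debug)]
struct PredictionSlot<T> {
    /// Whether a prediction is currently running for this document.
    running: bool,
    /// The most recent update that arrived while a prediction was running.
    pending: Option<T>,
}

impl<T> PredictionSlot<T> {
    fn new() -> Self {
        Self { running: false, pending: None }
    }

    /// Offer a new update. Returns `Some(update)` if the caller should start
    /// predicting on it right away, or `None` if it was queued instead.
    fn submit(&mut self, update: T) -> Option<T> {
        if self.running {
            // Latest wins: overwrite whatever was queued before.
            self.pending = Some(update);
            None
        } else {
            self.running = true;
            Some(update)
        }
    }

    /// Report that the running prediction finished. Returns the queued update
    /// to predict next, if any; otherwise the slot becomes idle.
    fn finish(&mut self) -> Option<T> {
        let next = self.pending.take();
        self.running = next.is_some();
        next
    }
}
```

With this design, a burst of edits during one slow prediction costs at most one follow-up prediction, on the newest text, rather than one prediction per edit.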

Debugging

We use tracing-subscriber with the env-filter feature to emit logs [1]. Please configure the log level by setting the RUST_LOG environment variable.

On macOS, you may need to set DYLD_LIBRARY_PATH to run the tests.

Future work

  • Customizing the mapping between parts of speech and semantic tokens (already available via token_map_update in init_options).
  • Support languages other than English. This simply requires a new model.
  • Incremental updates and semantic token ranges.
  • Do not overwrite Markdown/LaTeX syntax highlighting.

Footnotes

  1. https://docs.rs/tracing-subscriber/latest/tracing_subscriber/#feature-flags