Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lazy loaded shiki languages during syntax highlighting #10618

Merged
merged 10 commits into from
Apr 1, 2024

Conversation

43081j
Copy link
Contributor

@43081j 43081j commented Mar 30, 2024

Changes

There's a few changes in this PR, each contributing to memory usage improvement. Bundled into one PR since all but one are close to one-liners.

  • Lazily load shiki languages if they're not already loaded
    • Rather than falling back to plain text, we attempt to load the language first
    • We default to ['plaintext'] rather than all languages, leading to all others being lazily loaded by default
    • This results in a chunky reduction of memory since we're no longer holding shiki's many language modules in memory for no reason
  • Discarding contexts after build end
    • Some contexts were being held around permanently (i.e. retained in memory). By discarding them on build end, we allow them to be garbage collected
  • Move static image helper into a factory
    • To avoid capturing local scopes, it has been moved into a factory
    • Since we store it on globalThis, we were holding deep references to large objects before
    • Now we no longer hold those refs, they can be garbage collected

Memory benchmark

Before:

Elapsed time (s) Memory used (MB) Final memory (MB)
SSR build 3.11 91.74 128.05
Client build 0.11 -0.04 128.26
Static generate 0.91 4.88 132.53

After:

Elapsed time (s) Memory used (MB) Final memory (MB)
SSR build 3.50 61.52 97.82
Client build 0.10 -41.66 56.20
Static generate 0.90 5.02 51.29

I did a few runs and got roughly these results each time.

Testing

  • Extra test added for shiki highlighting via a lazily loaded language
  • Memory benchmarking run to observe benefit of changes
  • Full test run for regression

Docs

We may need to update docs on the fact that the default shiki languages are now ['plaintext'], although it shouldn't make much difference to a user since the rest will be lazily loaded anyway if they use them.

For reviewers

Each commit describes its own change, so it may be worth reading the commit messages and/or the diff of the individual commit if you want to understand better.

Copy link

changeset-bot bot commented Mar 30, 2024

🦋 Changeset detected

Latest commit: aaddd40

The changes in this PR will be included in the next version bump.

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@github-actions github-actions bot added feat: markdown Related to Markdown (scope) pkg: create-astro Related to the `create-astro` package (scope) pkg: astro Related to the core `astro` package (scope) labels Mar 30, 2024
@43081j 43081j marked this pull request as draft March 30, 2024 16:29
@43081j
Copy link
Contributor Author

43081j commented Mar 30, 2024

making this a draft for now as its possible we can't have these nice things unless rehype lets us do async 👀

edit: false alarm, looks like it is happy

@github-actions github-actions bot added the pkg: integration Related to any renderer integration (scope) label Mar 30, 2024
@43081j 43081j marked this pull request as ready for review March 30, 2024 17:22
@bluwy
Copy link
Member

bluwy commented Apr 1, 2024

Thanks! All these changes make sense to me. About shiki lazy load, we actually used to do the same but I removed it because I thought shikiji/shiki 1.0 already did that by default, but in hindsight it probably didn't because its highlighter functions are all sync anyways (which you pointed out).

About your message on Discord on version bumps, this will require a major for @astrojs/markdown-remark as it's published. Adding a major changeset for it should be enough and changesets will auto bump the rest (I hope). There's already many things within @astrojs/markdown-remark that had been marked deprecated, and this seems like the right time to clean it all up.

Would you be able to add a major changeset for@astrojs/markdown-remark? I'll handle the other major changes in another PR.

@43081j
Copy link
Contributor Author

43081j commented Apr 1, 2024

have fixed the broken test

FYI it was working because of this it seems:

// If a promise, await the result and mark that.
else if (typeof str.then === 'function') {
return Promise.resolve(str).then((value) => {
return unescapeHTML(value);
});

i suppose <Fragment set:html=${promise}> results in it awaiting the promise internally via the above, so it was working fine

edit: yeah of course, we await the child in renderChild so that was resulting in the highlight task being resolved properly

This moves the static image helper outside of the build hooks such that
the scope is no longer captured.

Since the static image helper exists on `window`, none of the function's
scope will be garbage collected at any point. Part of said scope is
`resolvedConfig`, which holds package caches, plugins, etc. These will
also be retained.

By moving the function into a factory, we no longer hold a reference to
`resolvedConfig`, leading to lower memory usage.
This discards the SSR plugin context once a build has finished, so we're
no longer holding it in memory and it can be garbage collected.
This discards the markdown processor once a build ends, allowing it to
be garbage collected.
When highlighting code via shiki, we default to enabling all available
languages. This results in shiki internally loading each individual
language module, evaluating it, and keeping hold of it in memory.

Instead, we should be able to load the bare minimum and (try) lazily load any
missing languages later on. By doing this, we're not unnecessarily
loading up dozens of large modules and will only load what the user
consumes.

Since traversal of the AST via unist is synchronous, we first do the
traversal to find the nodes and later asynchronously _process_ the nodes
(i.e. execute the highlighter and mutate the AST).
This adds a test for ensuring highlighting happens when using a lazily
loaded language (typescript in this case).
Aligning the prism plugin with the shiki plugin - exporting an async
visitor since the code block highlighter now returns a promise.
Markdoc actually supports async transforms already. Now that the shiki
highlighting is async, we also need to make our transform async.

Do note that typescript will be unhappy until markdoc/markdoc#495 is
resolved.
Copy link
Member

@bluwy bluwy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@bluwy bluwy merged commit 374efcd into withastro:main Apr 1, 2024
4 checks passed
@astrobot-houston astrobot-houston mentioned this pull request Apr 1, 2024
@43081j 43081j deleted the shiki-memory branch April 1, 2024 15:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feat: markdown Related to Markdown (scope) pkg: astro Related to the core `astro` package (scope) pkg: create-astro Related to the `create-astro` package (scope) pkg: integration Related to any renderer integration (scope)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants