
Use Parquet to bypass Arrow #601

Merged: 14 commits merged on Aug 23, 2023
2 changes: 1 addition & 1 deletion README.md
@@ -260,7 +260,7 @@ As _stlite_ runs on the web browser environment ([Pyodide](https://pyodide.org/)

- `st.spinner()` does not work with blocking methods like `pyodide.http.open_url()` because stlite runs in a single-threaded environment, so `st.spinner()` can't execute the code that starts showing the spinner while the blocking method occupies the only event loop.
- `time.sleep()` is a no-op. Use `asyncio.sleep()` instead. This is a restriction of the Pyodide runtime. See https://github.com/pyodide/pyodide/issues/2354. The following section about top-level await may also help you use async functions in stlite.
- `st.experimental_data_editor` does not work because it relies on PyArrow, which doesn't work on Pyodide. Track this issue at https://github.com/whitphx/stlite/issues/509.
- There are some small differences in how (less common) data types of DataFrame columns are handled in `st.dataframe`, `st.data_editor`, `st.table`, and Altair-based charts. The reason is that stlite uses the Parquet format instead of the Arrow IPC format to serialize dataframes (Ref: [#601](https://github.com/whitphx/stlite/pull/601)).
- For URL access, `urllib` and `requests` don't work on Pyodide/stlite, so we have to use alternative methods provided by Pyodide, such as [`pyodide.http.pyfetch()`](https://pyodide.org/en/stable/usage/api/python-api/http.html#pyodide.http.pyfetch) or [`pyodide.http.open_url()`](https://pyodide.org/en/stable/usage/api/python-api/http.html#pyodide.http.open_url). See https://pyodide.org/en/stable/usage/faq.html#how-can-i-load-external-files-in-pyodide for details. For `pyodide.http.pyfetch()`, see also the following section about top-level await.
- C extension packages that are not built for Pyodide cannot be installed. See https://pyodide.org/en/stable/usage/faq.html#micropip-can-t-find-a-pure-python-wheel for details.
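
The `time.sleep()` limitation listed above can be illustrated with a small standard-library sketch. The coroutine name `pause_then_report` is hypothetical, and the script runs under plain CPython via `asyncio.run()`. In a stlite app, assuming the top-level await support the README describes, the coroutine would instead be awaited directly at the top level of the script.

```python
import asyncio
import time


async def pause_then_report(seconds: float) -> str:
    # Hypothetical helper: wait without blocking the event loop.
    # On Pyodide/stlite, time.sleep() would be a no-op, so awaiting
    # asyncio.sleep() is what actually makes the script pause.
    await asyncio.sleep(seconds)
    return f"waited {seconds}s"


if __name__ == "__main__":
    # Plain CPython needs an explicit runner for the event loop.
    # Assumption: in a stlite script you could write
    # `await pause_then_report(0.5)` at the top level instead.
    started = time.monotonic()
    print(asyncio.run(pause_then_report(0.5)))
    print(f"elapsed about {time.monotonic() - started:.1f}s")
```

Awaiting the sleep hands control back to the single event loop, which is the same reason blocking calls such as `pyodide.http.open_url()` prevent `st.spinner()` from rendering.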

@lukasmasuch (Contributor), Aug 23, 2023

nit: maybe add something like:

  1. st.bokeh_chart does not work since Pyodide uses Bokeh version 3.x while Streamlit only supports 2.x. Bokeh 3.x support in Streamlit is tracked in streamlit/streamlit#5858 ("Support bokeh 3.0.3").
  2. There are some small differences in how (less common) data types of DataFrame columns are handled in st.dataframe, st.data_editor, st.table, and Altair-based charts. The reason is that stlite uses the Parquet format instead of the Arrow IPC format to serialize dataframes.

@whitphx (Owner, Author), Aug 23, 2023

Thanks! I added the second one in this PR, and will do the first one in another PR.

@whitphx (Owner, Author)

-> #605


12 changes: 12 additions & 0 deletions packages/desktop/craco.config.js
@@ -100,6 +100,18 @@ module.exports = {
type: "javascript/auto",
});

// For parquet-wasm, which includes `.wasm` files and needs async load.
// https://qiita.com/laiso/items/a5f7362c4a9163a878e5
// https://webpack.js.org/configuration/experiments/
webpackConfig.module.rules.push({
test: /\.wasm$/,
type: "webassembly/async",
});
webpackConfig.experiments = {
...webpackConfig.experiments,
asyncWebAssembly: true,
};

/* For file-loader that resolves the wheels */
// Since Webpack5, Asset Modules has been introduced to cover what file-loader had done.
// However, in this project, we use the inline loader setting like `import * from "!!file-loader!/path/to/file"` to use file-loader
1 change: 1 addition & 0 deletions packages/desktop/package.json
@@ -72,6 +72,7 @@
"electron-builder": "^23.6.0",
"electron-reload": "^2.0.0-alpha.1",
"esbuild": "^0.16.6",
"parquet-wasm": "^0.4.0",
"raw-loader": "^4.0.2",
"react": "^17.0.2",
"react-dom": "^17.0.2",
1 change: 0 additions & 1 deletion packages/kernel/src/worker.ts
@@ -12,7 +12,7 @@

let pyodide: PyodideInterface;

let httpServer: any;

Check warning on line 15 in packages/kernel/src/worker.ts (GitHub Actions / test-kernel): Unexpected any. Specify a different type

interface StliteWorkerContext extends Worker {
postMessage(message: OutMessage, transfer: Transferable[]): void;
@@ -20,7 +20,7 @@
}

// Ref: https://v4.webpack.js.org/loaders/worker-loader/#loading-with-worker-loader
const ctx: StliteWorkerContext = self as any;

Check warning on line 23 in packages/kernel/src/worker.ts (GitHub Actions / test-kernel): Unexpected any. Specify a different type

const initDataPromiseDelegate = new PromiseDelegate<WorkerInitialData>();

@@ -255,7 +255,6 @@
from stlite_server.server import Server

load_config_options({
"global.dataFrameSerialization": "legacy", # Not to use PyArrow
"browser.gatherUsageStats": False,
"runner.fastReruns": False, # Fast reruns do not work well with the async script runner of stlite. See https://github.com/whitphx/stlite/pull/550#issuecomment-1505485865.
})
@@ -318,7 +317,7 @@

httpServer.start_websocket(
path,
(messageProxy: any, binary: boolean) => {

Check warning on line 320 in packages/kernel/src/worker.ts (GitHub Actions / test-kernel): Unexpected any. Specify a different type

// XXX: Now there is no session mechanism

if (binary) {
13 changes: 13 additions & 0 deletions packages/mountable/config/webpack.config.js
@@ -573,6 +573,14 @@ module.exports = function (webpackEnv) {
type: "javascript/auto",
},
// Stlite:
// For parquet-wasm, which includes `.wasm` files and needs async load.
// https://qiita.com/laiso/items/a5f7362c4a9163a878e5
// https://webpack.js.org/configuration/experiments/
{
test: /\.wasm$/,
type: "webassembly/async",
},
// Stlite:
// Since Webpack5, Asset Modules has been introduced to cover what file-loader had done.
// However, in this project, we use the inline loader setting like `import * from "!!file-loader!/path/to/file"` to use file-loader
// but it does not turn off Asset Modules and leads to duplicate assets generated.
@@ -591,6 +599,11 @@ module.exports = function (webpackEnv) {
},
].filter(Boolean),
},
// Stlite:
// This is necessary to make the async load of parquet-wasm work. See the comment above about parquet-wasm as well.
experiments: {
asyncWebAssembly: true,
},
plugins: [
// Generates an `index.html` file with the <script> injected.
// Stlite: enable this only for development, as the production build is as a library, not an app.
1 change: 1 addition & 0 deletions packages/mountable/package.json
@@ -62,6 +62,7 @@
"jest-resolve": "^27.4.2",
"jest-watch-typeahead": "^1.0.0",
"mini-css-extract-plugin": "^2.4.5",
"parquet-wasm": "^0.4.0",
"postcss": "^8.4.4",
"postcss-flexbugs-fixes": "^5.0.2",
"postcss-loader": "^6.2.1",
12 changes: 12 additions & 0 deletions packages/sharing/craco.config.js
@@ -38,6 +38,18 @@ module.exports = {
type: "javascript/auto",
});

// For parquet-wasm, which includes `.wasm` files and needs async load.
// https://qiita.com/laiso/items/a5f7362c4a9163a878e5
// https://webpack.js.org/configuration/experiments/
webpackConfig.module.rules.push({
test: /\.wasm$/,
type: "webassembly/async",
});
webpackConfig.experiments = {
...webpackConfig.experiments,
asyncWebAssembly: true,
};

/* For file-loader that resolves the wheels */
// Since Webpack5, Asset Modules has been introduced to cover what file-loader had done.
// However, in this project, we use the inline loader setting like `import * from "!!file-loader!/path/to/file"` to use file-loader
1 change: 1 addition & 0 deletions packages/sharing/package.json
@@ -14,6 +14,7 @@
"@types/node": "^16.18.12",
"@types/react": "^17.0.7",
"@types/react-dom": "^17.0.5",
"parquet-wasm": "^0.4.0",
"react": "^17.0.2",
"react-dom": "^17.0.2",
"react-scripts": "5.0.1",