"no profile" aka actually yes profile
env vars don't work, as expected. but also turning off profiling doesn't change the GC pattern
gjoseph92 committed May 19, 2021
1 parent f8721be commit ac61e5f
Showing 5 changed files with 94 additions and 26 deletions.
4 changes: 2 additions & 2 deletions dask_profiling_coiled/run_profile.py
@@ -63,7 +63,7 @@ def main():
 if __name__ == "__main__":
     n_workers = 100
     cluster = coiled.Cluster(
-        software="gjoseph92/profiling-daskconfig",
+        software="gjoseph92/profiling",
         n_workers=n_workers,
         worker_cpu=1,
         worker_memory="4 GiB",
@@ -104,7 +104,7 @@ def main():
# This is key---otherwise we're uploading ~300MiB of graph to the scheduler
dask.config.set({"optimization.fuse.active": False})

-test_name = "cython-shuffle-gc-noprofiling-daskconfig"
+test_name = "cython-shuffle-gc-noprofiling-env"
with (
distributed.performance_report(f"results/{test_name}.html"),
pyspy_on_scheduler(
5 changes: 2 additions & 3 deletions environment.yml
@@ -28,6 +28,5 @@ dependencies:
# - git+https://github.com/gjoseph92/scheduler-profilers.git # TODO this conflicts with --install-option for distributed, using postBuild instead
# - git+https://github.com/gjoseph92/dask-noop.git
variables:
-DASK_CONFIG: dask.yaml
-# DASK_DISTRIBUTED__WORKER__PROFILE__INTERVAL: 2h
-# DASK_DISTRIBUTED__WORKER__PROFILE__CYCLE: 10h
+DASK_DISTRIBUTED__WORKER__PROFILE__INTERVAL: 2h
+DASK_DISTRIBUTED__WORKER__PROFILE__CYCLE: 10h
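For reference, the two variables enabled here are meant to map onto the nested config keys `distributed.worker.profile.interval` and `distributed.worker.profile.cycle`. The sketch below approximates dask's documented naming convention (strip the `DASK_` prefix, lowercase, treat double underscores as nesting) using only the standard library. `env_to_dask_config` is a hypothetical helper written for illustration, not dask's own code, and it says nothing about whether Coiled actually passes these variables to the workers, which is what this commit was testing.

```python
import ast


def env_to_dask_config(env):
    """Approximate how DASK_* environment variables become nested config keys.

    Hypothetical helper for illustration only; not dask's implementation.
    """
    config = {}
    for name, raw in env.items():
        if not name.startswith("DASK_"):
            continue  # unrelated variables are ignored
        # Strip the prefix, lowercase, and split on "__" to get the key path.
        keys = name[len("DASK_"):].lower().split("__")
        # Try to read the value as a Python literal; otherwise keep the string.
        try:
            value = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            value = raw
        node = config
        for key in keys[:-1]:
            node = node.setdefault(key, {})
        node[keys[-1]] = value
    return config


env = {
    "DASK_DISTRIBUTED__WORKER__PROFILE__INTERVAL": "2h",
    "DASK_DISTRIBUTED__WORKER__PROFILE__CYCLE": "10h",
    "PATH": "/usr/bin",  # not a DASK_ variable, so it is skipped
}
config = env_to_dask_config(env)
print(config)
```

Values such as `2h` are not valid Python literals, so in this sketch they stay as strings.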
22 changes: 1 addition & 21 deletions make-coiled-env.sh
@@ -1,30 +1,10 @@
#!/bin/bash

-# Install py-spy separately so it doesn't conflict with Cythonized distributed.
-# Also add dask config.
-
-# HACK: Coiled offers no easy way to add auxiliary data files---or a dask config---in software environments,
-# so we generate a post-build shell script that has the contents of `dask.yaml` within itself, and writes
-# those contents out when executed.
-OUT_CONFIG_PATH="~/.config/dask/dask.yaml"
-YAML_CONTENTS=$(<dask.yaml)
+# Install py-spy separately so it doesn't conflict with Cythonized distributed
cat > postbuild.sh <<EOF
#!/bin/bash
python3 -m pip install git+https://github.com/gjoseph92/scheduler-profilers.git@8d59e7f8b2ab59e22f0937557fefe388eac6ea61
-OUT_CONFIG_PATH=$OUT_CONFIG_PATH
-# ^ NOTE: no quotes, so ~ expands (https://stackoverflow.com/a/32277036)
-mkdir -p \$(dirname \$OUT_CONFIG_PATH)
-cat > \$OUT_CONFIG_PATH <<INNER_EOF
-$YAML_CONTENTS
-INNER_EOF
-echo "export DASK_CONFIG=\$OUT_CONFIG_PATH" >> ~/.bashrc
-echo "Wrote dask config to \$OUT_CONFIG_PATH:"
-cat \$OUT_CONFIG_PATH
EOF
coiled env create -n profiling --conda environment.yml --post-build postbuild.sh
rm postbuild.sh
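The hack removed from this script embedded the contents of `dask.yaml` in a generated post-build script via nested heredocs. As a rough sketch of the same idea, the Python below renders such a script as a string. `render_postbuild` and the sample YAML are hypothetical (the real `dask.yaml` is not shown in this commit), and the sketch differs from the removed code in two ways: it uses `$HOME` inside double quotes rather than relying on an unquoted `~`, and it quotes the inner heredoc delimiter so the shell copies the YAML verbatim.

```python
def render_postbuild(yaml_text, out_path="$HOME/.config/dask/dask.yaml"):
    """Return the text of a post-build script that writes yaml_text to out_path.

    Hypothetical helper for illustration; not the script used in this repo.
    """
    lines = [
        "#!/bin/bash",
        # $HOME expands inside double quotes, unlike a quoted "~".
        f'OUT_CONFIG_PATH="{out_path}"',
        'mkdir -p "$(dirname "$OUT_CONFIG_PATH")"',
        # A quoted delimiter disables $-expansion inside the heredoc body.
        "cat > \"$OUT_CONFIG_PATH\" <<'INNER_EOF'",
        yaml_text.rstrip("\n"),
        "INNER_EOF",
    ]
    return "\n".join(lines) + "\n"


# Assumed config contents, mirroring the environment variables in environment.yml.
sample_yaml = (
    "distributed:\n"
    "  worker:\n"
    "    profile:\n"
    "      interval: 2h\n"
    "      cycle: 10h\n"
)
script = render_postbuild(sample_yaml)
print(script)
```

Generating the script from Python avoids escaping `$` by hand, which the shell version needed on nearly every line.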
88 changes: 88 additions & 0 deletions results/cython-shuffle-gc-noprofiling-env.html

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions results/cython-shuffle-gc-noprofiling-env.json

Large diffs are not rendered by default.
