Improve documentation and introduce simple synthetic examples (#4)
Improved documentation and added multiple fixes.

- Changed how an "uncertainty" is provided: "distances" and "similarities"
  are provided by the user through a function call, whereas an "uncertainty"
  was originally given as a type. (#13)

- Optimized computations in `distancebased.jl` (#11)

- Implemented a variant of the squared Mahalanobis distance that handles
  missing entries (see https://www.jstor.org/stable/3559861, p. 285);
  fixes #12

- Renamed `MahalanobisDistance` to `SquaredMahalanobisDistance`

Fixes #11, #12, and #13
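To illustrate what a squared Mahalanobis distance "with missing entries" can mean, here is a minimal sketch of one simple variant, assuming the distance is computed only over the coordinates observed in both vectors, using the matching sub-block of the covariance matrix Σ. The function name `sq_mahalanobis_observed` is hypothetical and is not part of the CEEDesigns API; the variant described on p. 285 of the cited paper, and the one implemented in `SquaredMahalanobisDistance`, may differ (for example, by rescaling for the number of observed coordinates).

```julia
using LinearAlgebra

# Hypothetical helper (not part of the CEEDesigns API): squared Mahalanobis
# distance between `x` and `y`, computed only over the coordinates that are
# observed (non-missing) in both vectors. `Σ` is the full covariance matrix,
# assumed symmetric positive definite; only its observed sub-block is used.
function sq_mahalanobis_observed(x::AbstractVector, y::AbstractVector, Σ::AbstractMatrix)
    # indices where neither vector has a missing entry
    obs = findall(i -> !ismissing(x[i]) && !ismissing(y[i]), eachindex(x, y))
    isempty(obs) && throw(ArgumentError("x and y share no observed coordinates"))
    # difference vector restricted to the jointly observed coordinates
    d = [Float64(x[i]) - Float64(y[i]) for i in obs]
    # (x_o - y_o)' * Σ_oo^{-1} * (x_o - y_o), solved without forming the inverse
    return dot(d, Σ[obs, obs] \ d)
end
```

Using the observed sub-block of Σ corresponds to the Gaussian marginal over the observed coordinates. Because this sketch does not rescale, pairs with fewer jointly observed coordinates tend to get smaller distances.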
---------

Co-authored-by: Bíma, Jan <jan.bima@merck.com>
slwu89 and thevolatilebit authored Dec 3, 2023
1 parent d1d09c3 commit 87ac18c
Showing 51 changed files with 4,660 additions and 491 deletions.
32 changes: 11 additions & 21 deletions Project.toml
@@ -1,19 +1,14 @@
name = "CEEDesigns"
uuid = "e939450b-799e-4198-a5f5-3f2f7fb1c671"
version = "0.3.5"
version = "0.3.6"

[deps]
Clustering = "aaaa29a8-35af-508c-8bc3-b662a17a0fe5"
Combinatorics = "861a8166-3701-5b0c-9a16-15d98fcdc6aa"
DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
Distances = "b4f34e82-e78d-54a5-968a-f98e89d6e8f7"
HTTP = "cd3eb016-35fb-5094-929b-558a96fad6f3"
JSON = "682c06a0-de6a-54ab-a142-c8b1cf79cde6"
LibPQ = "194296ae-ab2e-5f79-8cd4-7183a0a5a0d1"
LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
MCTS = "e12ccd36-dcad-5f33-8774-9175229e7b33"
MLJ = "add582a8-e3ab-11e8-2d5e-e98b27df1bc7"
POMDPSimulators = "e0d0a172-29c6-5d4e-96d0-f262df5d01fd"
POMDPTools = "7588e00f-9cae-40de-98dc-e0c70c48cdd7"
POMDPs = "a93abf59-7444-517b-a68a-c42f96afdd7d"
Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80"
@@ -25,24 +20,19 @@ Statistics = "10745b16-79ce-11e8-11f9-7d13ad32a3b2"
StatsBase = "2913bbd2-ae8a-5f71-8c99-4fb6c76f3a91"

[compat]
julia = "1.9"
Plots = "1.39"
ScientificTypes = "3.0"
POMDPTools = "0.1"
DataFrames = "1.6"
HTTP = "1.10"
LinearAlgebra = "1.9"
LibPQ = "1.17"
Combinatorics = "1.0"
Statistics = "1.9"
Random = "1.9"
Reexport = "1.2"
Distances = "0.10"
POMDPs = "0.9"
DataFrames = "1.6"
JSON = "0.21"
Clustering = "0.15"
LinearAlgebra = "1.9"
MCTS = "0.5"
MLJ = "0.20"
POMDPTools = "0.1"
POMDPs = "0.9"
Plots = "1.39"
Random = "1.9"
Reexport = "1.2"
Requires = "1.3"
POMDPSimulators = "0.3"
ScientificTypes = "3.0"
Statistics = "1.9"
StatsBase = "0.34"
julia = "1.9"
7 changes: 7 additions & 0 deletions docs/Project.toml
@@ -2,14 +2,21 @@
BetaML = "024491cd-cc6b-443e-8034-08ea7eb7db2b"
CEEDesigns = "e939450b-799e-4198-a5f5-3f2f7fb1c671"
CSV = "336ed68f-0bac-5ca0-87d4-7b16caf5d00b"
Combinatorics = "861a8166-3701-5b0c-9a16-15d98fcdc6aa"
Copulas = "ae264745-0b69-425e-9d9d-cf662c5eec93"
D3Trees = "e3df1716-f71e-5df9-9e2d-98e193103c45"
DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
Distributions = "31c24e10-a181-5473-b8eb-7969acd0382f"
Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
DocumenterMarkdown = "997ab1e6-3595-5248-9280-8efb232c3433"
Literate = "98b081ad-f1c9-55d3-8b20-4c87d4299306"
MCTS = "e12ccd36-dcad-5f33-8774-9175229e7b33"
MLJ = "add582a8-e3ab-11e8-2d5e-e98b27df1bc7"
MLJModels = "d491faf4-2d78-11e9-2867-c94bc002c0b7"
POMDPTools = "7588e00f-9cae-40de-98dc-e0c70c48cdd7"
POMDPs = "a93abf59-7444-517b-a68a-c42f96afdd7d"
Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80"
ScientificTypes = "321657f4-b219-11e9-178b-2701a2544e81"
Statistics = "10745b16-79ce-11e8-11f9-7d13ad32a3b2"
StatsBase = "2913bbd2-ae8a-5f71-8c99-4fb6c76f3a91"
StatsPlots = "f3b207a7-027a-5e70-b257-86293d7955fd"
11 changes: 9 additions & 2 deletions docs/make.jl
@@ -3,8 +3,13 @@ using CEEDesigns

# Literate for tutorials
const literate_dir = joinpath(@__DIR__, "..", "tutorials")
const tutorials_src =
["StaticDesigns.jl", "StaticDesignsFiltration.jl", "GenerativeDesigns.jl"]
const tutorials_src = [
"SimpleStatic.jl",
"SimpleGenerative.jl",
"StaticDesigns.jl",
"StaticDesignsFiltration.jl",
"GenerativeDesigns.jl",
]
const generated_dir = joinpath(@__DIR__, "src", "tutorials/")

# copy tutorials src
@@ -29,6 +34,8 @@ end
pages = [
"index.md",
"Tutorials" => [
"tutorials/SimpleStatic.md",
"tutorials/SimpleGenerative.md",
"tutorials/StaticDesigns.md",
"tutorials/StaticDesignsFiltration.md",
"tutorials/GenerativeDesigns.md",
2 changes: 1 addition & 1 deletion docs/src/api.md
@@ -33,7 +33,7 @@ CEEDesigns.GenerativeDesigns.efficient_value
CEEDesigns.GenerativeDesigns.DistanceBased
CEEDesigns.GenerativeDesigns.QuadraticDistance
CEEDesigns.GenerativeDesigns.DiscreteDistance
CEEDesigns.GenerativeDesigns.MahalanobisDistance
CEEDesigns.GenerativeDesigns.SquaredMahalanobisDistance
CEEDesigns.GenerativeDesigns.Exponential
```

6 changes: 3 additions & 3 deletions docs/src/index.md
@@ -7,11 +7,11 @@ A decision-making framework for the cost-efficient design of experiments, balanc
```

## Static experimental designs
Here we assume that the same experimental design will be used for a population of examined entities, hence the word 'static'.
Here we assume that the same experimental design will be used for a population of examined entities, hence the word "static".

For each subset of experiments, we consider an estimate of the value of acquired information. To give an example, if a set of experiments is used to predict the value of a specific target variable, our framework can leverage a built-in integration with [MLJ.jl](https://github.com/alan-turing-institute/MLJ.jl) to estimate predictive accuracies of machine learning models fitted over subset of experimental features.

In the cost-sensitive setting of CEEDesigns, a user provides the monetary cost and execution time of each experiment. Given the constraint on the maximum number of parallel experiments along with a fixed tradeoff between monetary cost and execution time, we devise an arrangement of each subset of experiments such that the expected combined cost is minimized.
In the cost-sensitive setting of CEEDesigns.jl`, a user provides the monetary cost and execution time of each experiment. Given the constraint on the maximum number of parallel experiments along with a fixed tradeoff between monetary cost and execution time, we devise an arrangement of each subset of experiments such that the expected combined cost is minimized.

Assuming the information values and optimized experimental costs for each subset of experiments, we then generate a set of cost-efficient experimental designs.

@@ -23,7 +23,7 @@ Assuming the information values and optimized experimental costs for each subset

We consider 'personalized' experimental designs that dynamically adjust based on the evidence gathered from the experiments. This approach is motivated by the fact that the value of information collected from an experiment generally differs across subpopulations of the entities involved in the triage process.

At the beginning of the triage process, an entity's prior data is used to project a range of cost-efficient experimental designs. Internally, while constructing these designs, we incorporate multiple-step-ahead lookups to model likely experimental outcomes and consider the subsequent decisions for each outcome. Then after choosing a specific decision policy from this set and acquiring additional experimental readouts (sampled from a generative model, hence the word 'generative'), we adjust the continuation based on this evidence.
At the beginning of the triage process, an entity's prior data is used to project a range of cost-efficient experimental designs. Internally, while constructing these designs, we incorporate multiple-step-ahead lookups to model likely experimental outcomes and consider the subsequent decisions for each outcome. Then after choosing a specific decision policy from this set and acquiring additional experimental readouts (sampled from a generative model, hence the word "generative"), we adjust the continuation based on this evidence.

```@raw html
<a><img src="assets/search_tree.png" align="left" alt="code" width="400"></a>