Expose DiskArrays.cache (#417)
* forward DiskArrays.cache

* add some docs

* bump version
meggart authored Jul 26, 2024
1 parent 664590f commit 3b39252
Showing 5 changed files with 179 additions and 133 deletions.
2 changes: 1 addition & 1 deletion Project.toml
@@ -1,7 +1,7 @@
name = "YAXArrays"
uuid = "c21b50f5-aa40-41ea-b809-c0f5e47bfa5c"
authors = ["Fabian Gans <fgans@bgc-jena.mpg.de>"]
-version = "0.5.9"
+version = "0.5.10"

[deps]
CFTime = "179af706-886a-5703-950a-314cd64e0468"
18 changes: 18 additions & 0 deletions docs/src/UserGuide/cache.md
@@ -0,0 +1,18 @@
# Caching YAXArrays

For some applications, such as interactive plotting of large datasets, the same data must unavoidably be accessed several times. In these cases it can be useful to store recently accessed data in a cache. In YAXArrays this can be achieved using the `cache` function. For example, to open a large dataset from a remote source and keep data in a cache of size 500 MB, one can use:

````julia
using YAXArrays, Zarr
ds = open_dataset("path/to/source")
cachesize = 500 #MB
cache(ds,maxsize = cachesize)
````

The above will wrap every array in the dataset into its own cache, where the 500 MB are distributed equally across the arrays in the dataset.
Alternatively, individual caches can be applied to single `YAXArray`s:

````julia
yax = ds.avariable
cache(yax,maxsize = 1000)
````
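
As a rough illustration of how the budget is split: the `Dataset` method added in this commit divides `maxsize` by the number of cubes using integer division. The sketch below reproduces only that arithmetic. `per_array_budget` is a hypothetical helper introduced for illustration and is not part of YAXArrays or DiskArrays.

```julia
# Hypothetical helper (not a YAXArrays function): mirrors `maxsize ÷ length(ds.cubes)`
per_array_budget(total_mb::Integer, n_arrays::Integer) = total_mb ÷ n_arrays

# A 500 MB budget shared by 3 arrays leaves 166 MB per array;
# the 2 MB remainder of the integer division is not assigned to any cache.
budget = per_array_budget(500, 3)
```

Because the division rounds down, a dataset with many variables and a small `maxsize` can end up with very small per-array caches, so it may be worth choosing the total with the number of variables in mind.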
5 changes: 3 additions & 2 deletions src/Cubes/Cubes.jl
@@ -3,7 +3,7 @@ The functions provided by YAXArrays are supposed to work on different types of c
Data types that
"""
module Cubes
-using DiskArrays: DiskArrays, eachchunk, approx_chunksize, max_chunksize, grid_offset, GridChunks
+using DiskArrays: DiskArrays, eachchunk, approx_chunksize, max_chunksize, grid_offset, GridChunks, cache
using Distributed: myid
using Dates: TimeType, Date
using IntervalSets: Interval, (..)
@@ -17,7 +17,7 @@ using Tables: istable, schema, columns
using DimensionalData: DimensionalData as DD, AbstractDimArray, NoName
import DimensionalData: name

-export concatenatecubes, caxes, subsetcube, readcubedata, renameaxis!, YAXArray, setchunks
+export concatenatecubes, caxes, subsetcube, readcubedata, renameaxis!, YAXArray, setchunks, cache

"""
This function calculates a subset of a cube's data
@@ -179,6 +179,7 @@ function Base.permutedims(c::YAXArray, p)
newchunks = DiskArrays.GridChunks(eachchunk(c).chunks[collect(dimnums)])
YAXArray(newdims, newdata, c.properties, newchunks, c.cleaner)
end
+DiskArrays.cache(a::YAXArray;maxsize=1000) = DD.rebuild(a,cache(a.data;maxsize))

# DimensionalData overloads

9 changes: 9 additions & 0 deletions src/DatasetAPI/Datasets.jl
@@ -145,6 +145,15 @@ function Base.getindex(x::Dataset, i::Vector{Symbol})
cubesnew = [j => x.cubes[j] for j in i]
Dataset(; cubesnew...)
end
+function DiskArrays.cache(ds::Dataset;maxsize=1000)
+    #Distribute cache size equally across cubes
+    maxsize = maxsize ÷ length(ds.cubes)
+    cachedcubes = OrderedDict{Symbol,YAXArray}(
+        k => DiskArrays.cache(ds.cubes[k];maxsize) for k in keys(ds.cubes)
+    )
+    Dataset(cachedcubes,ds.axes,ds.properties)
+end


function fuzzyfind(s::String, comp::Vector{String})
sl = lowercase(s)

2 comments on commit 3b39252

@meggart

@JuliaRegistrator

Registration pull request created: JuliaRegistries/General/111816

Tip: Release Notes

Did you know you can add release notes too? Just add markdown formatted text underneath the comment after the text
"Release notes:" and it will be added to the registry PR, and if TagBot is installed it will also be added to the
release that TagBot creates. i.e.

@JuliaRegistrator register

Release notes:

## Breaking changes

- blah

To add them here just re-invoke and the PR will be updated.

Tagging

After the above pull request is merged, it is recommended that a tag is created on this repository for the registered package version.

This will be done automatically if the Julia TagBot GitHub Action is installed, or can be done manually through the github interface, or via:

git tag -a v0.5.10 -m "<description of version>" 3b392522dda79beb015fbf4d68a22f21c00d0931
git push origin v0.5.10
