core: add support for other backends apart from sqlite, add file-based backend (basically jsonl) #50

Merged (2 commits) on Sep 17, 2023
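The PR title describes making the storage layer pluggable so that sqlite becomes one backend among several. Below is a minimal sketch of what such an interface could look like. `Backend`, `SqliteBackend`, `get_hash`, `read` and `write` are hypothetical names, not cachew's actual classes. The sketch also uses the stdlib `sqlite3` module, while cachew itself talks to sqlite through SQLAlchemy (see `setup.py`).

```python
import sqlite3
from abc import ABC, abstractmethod
from typing import Iterable, Iterator, Optional


class Backend(ABC):
    """Hypothetical minimal interface a cache backend might expose."""

    @abstractmethod
    def get_hash(self) -> Optional[str]:
        """Return the hash stored alongside the cached data, or None if empty."""

    @abstractmethod
    def read(self) -> Iterator[bytes]:
        """Yield serialized items in the order they were written."""

    @abstractmethod
    def write(self, new_hash: str, items: Iterable[bytes]) -> None:
        """Replace all cached data and its hash."""


class SqliteBackend(Backend):
    def __init__(self, path: str) -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute('CREATE TABLE IF NOT EXISTS hash (value TEXT)')
        self.conn.execute('CREATE TABLE IF NOT EXISTS data (blob BLOB)')

    def get_hash(self) -> Optional[str]:
        row = self.conn.execute('SELECT value FROM hash').fetchone()
        return None if row is None else row[0]

    def read(self) -> Iterator[bytes]:
        # rowid order preserves insertion order
        for (blob,) in self.conn.execute('SELECT blob FROM data ORDER BY rowid'):
            yield blob

    def write(self, new_hash: str, items: Iterable[bytes]) -> None:
        # one transaction: committed on success, rolled back if anything fails
        with self.conn:
            self.conn.execute('DELETE FROM hash')
            self.conn.execute('DELETE FROM data')
            self.conn.executemany('INSERT INTO data (blob) VALUES (?)', ((b,) for b in items))
            self.conn.execute('INSERT INTO hash (value) VALUES (?)', (new_hash,))
```

Keeping the hash and the data behind the same small interface means the caching logic never needs to know which storage it is using. A file backend then only has to implement the same three methods.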
27 changes: 14 additions & 13 deletions README.md
@@ -128,7 +128,7 @@ Cachew gives the best of two worlds and makes it both **easy and efficient**. Th
- first your objects get [converted](src/cachew/marshall/cachew.py#L34) into a simpler JSON-like representation
- after that, they are mapped into byte blobs via [`orjson`](https://github.com/ijl/orjson).
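The two serialization stages described above can be sketched for a single type. This is an illustration under assumptions: `Visit` is a hypothetical dataclass, and the conversion to a JSON-like dict is written by hand, whereas cachew derives it from type annotations. The stdlib `json` module also stands in for `orjson` in the second stage.

```python
import json
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict


@dataclass
class Visit:  # hypothetical example type
    url: str
    when: datetime
    duration_s: float


def to_jsonlike(v: Visit) -> Dict[str, Any]:
    # stage 1: reduce the object to JSON-compatible primitives
    return {'url': v.url, 'when': v.when.isoformat(), 'duration_s': v.duration_s}


def from_jsonlike(d: Dict[str, Any]) -> Visit:
    return Visit(url=d['url'], when=datetime.fromisoformat(d['when']), duration_s=d['duration_s'])


def dumps(v: Visit) -> bytes:
    # stage 2: JSON-like value -> byte blob (stdlib json standing in for orjson)
    return json.dumps(to_jsonlike(v)).encode('utf8')


def loads(blob: bytes) -> Visit:
    return from_jsonlike(json.loads(blob))
```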

-When the function is called, cachew [computes the hash of your function's arguments ](src/cachew/__init__.py:#L504)
+When the function is called, cachew [computes the hash of your function's arguments ](src/cachew/__init__.py:#L466)
and compares it against the previously stored hash value.

- If they match, it would deserialize and yield whatever is stored in the cache database
@@ -140,18 +140,18 @@ and compares it against the previously stored hash value.



-* automatic schema inference: [1](src/cachew/tests/test_cachew.py#L350), [2](src/cachew/tests/test_cachew.py#L364)
+* automatic schema inference: [1](src/cachew/tests/test_cachew.py#L371), [2](src/cachew/tests/test_cachew.py#L385)
* supported types:

* primitive: `str`, `int`, `float`, `bool`, `datetime`, `date`, `Exception`

-See [tests.test_types](src/cachew/tests/test_cachew.py#L676), [tests.test_primitive](src/cachew/tests/test_cachew.py#L710), [tests.test_dates](src/cachew/tests/test_cachew.py#L630), [tests.test_exceptions](src/cachew/tests/test_cachew.py#L1037)
-* [@dataclass and NamedTuple](src/cachew/tests/test_cachew.py#L592)
-* [Optional](src/cachew/tests/test_cachew.py#L494) types
-* [Union](src/cachew/tests/test_cachew.py#L788) types
-* [nested datatypes](src/cachew/tests/test_cachew.py#L410)
+See [tests.test_types](src/cachew/tests/test_cachew.py#L697), [tests.test_primitive](src/cachew/tests/test_cachew.py#L731), [tests.test_dates](src/cachew/tests/test_cachew.py#L651), [tests.test_exceptions](src/cachew/tests/test_cachew.py#L1073)
+* [@dataclass and NamedTuple](src/cachew/tests/test_cachew.py#L613)
+* [Optional](src/cachew/tests/test_cachew.py#L515) types
+* [Union](src/cachew/tests/test_cachew.py#L809) types
+* [nested datatypes](src/cachew/tests/test_cachew.py#L431)

-* detects [datatype schema changes](src/cachew/tests/test_cachew.py#L440) and discards old data automatically
+* detects [datatype schema changes](src/cachew/tests/test_cachew.py#L461) and discards old data automatically
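The README describes two invalidation triggers: the hash of the function's arguments no longer matching the stored one, and the datatype schema changing. Both can be illustrated with a single hash comparison. The sketch below makes several assumptions. `Store` is a hypothetical in-memory stand-in for the cache database. `schema` is a hypothetical string describing the cached type. `compute_hash` and `cached_call` are illustrative names, and cachew's real hashing differs in detail.

```python
import hashlib
from typing import Any, Callable, Iterator, List, Optional


def compute_hash(schema: str, args: tuple, kwargs: dict) -> str:
    # hypothetical: combine the type schema with the string form of the arguments
    key = f'{schema}|{args!r}|{sorted(kwargs.items())!r}'
    return hashlib.sha256(key.encode('utf8')).hexdigest()


class Store:
    """Hypothetical in-memory stand-in for the cache database."""

    def __init__(self) -> None:
        self.hash: Optional[str] = None
        self.items: List[Any] = []


def cached_call(store: Store, schema: str, func: Callable[..., Iterator[Any]],
                *args: Any, **kwargs: Any) -> Iterator[Any]:
    h = compute_hash(schema, args, kwargs)
    if store.hash == h:
        # hashes match: serve stored items without calling func at all
        yield from store.items
        return
    # mismatch (different arguments or changed schema): recompute, then overwrite
    items = []
    for x in func(*args, **kwargs):
        items.append(x)
        yield x
    # only reached if the consumer exhausted the iterator, so partial runs are never cached
    store.hash, store.items = h, items
```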


# Performance
@@ -165,20 +165,20 @@ You can find some of my performance tests in [benchmarks/](benchmarks) dir, and


# Using
-See [docstring](src/cachew/__init__.py#L329) for up-to-date documentation on parameters and return types.
+See [docstring](src/cachew/__init__.py#L281) for up-to-date documentation on parameters and return types.
You can also use [extensive unit tests](src/cachew/tests/test_cachew.py) as a reference.

Some useful (but optional) arguments of `@cachew` decorator:

-* `cache_path` can be a directory, or a callable that [returns a path](src/cachew/tests/test_cachew.py#L387) and depends on function's arguments.
+* `cache_path` can be a directory, or a callable that [returns a path](src/cachew/tests/test_cachew.py#L408) and depends on function's arguments.

By default, `settings.DEFAULT_CACHEW_DIR` is used.

* `depends_on` is a function which determines whether your inputs have changed, and the cache needs to be invalidated.

By default it just uses string representation of the arguments, you can also specify a custom callable.

-For instance, it can be used to [discard cache](src/cachew/tests/test_cachew.py#L89) if the input file was modified.
+For instance, it can be used to [discard cache](src/cachew/tests/test_cachew.py#L103) if the input file was modified.

* `cls` is the type that would be serialized.
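The `cache_path` and `depends_on` options above can be illustrated with two small helpers. The first is a `depends_on`-style callable that changes whenever the input file is modified. The second is a `cache_path`-style callable that gives each input file its own cache. Both helper names are hypothetical. The commented `@cachew(...)` usage assumes both callables receive the decorated function's arguments, so check the docstring before relying on it.

```python
import os
from typing import Callable


def mtime_dependency(input_path: str) -> str:
    # the value changes whenever the file is modified, so a hash built from it goes stale
    return f'{input_path}:{os.stat(input_path).st_mtime_ns}'


def per_input_cache_path(base_dir: str) -> Callable[[str], str]:
    # returns a callable that maps each input file to its own cache file
    def cache_path(input_path: str) -> str:
        return os.path.join(base_dir, os.path.basename(input_path) + '.cache')
    return cache_path


# Assumed usage (both callables receive the decorated function's arguments):
# @cachew(cache_path=per_input_cache_path('/tmp/cachew'), depends_on=mtime_dependency)
# def parse(input_path: str) -> Iterator[Entry]: ...
```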

@@ -251,6 +251,7 @@ def mcachew(*args, **kwargs):
        import cachew
    except ModuleNotFoundError:
        import warnings

        warnings.warn('cachew library not found. You might want to install it to speed things up. See https://github.com/karlicoss/cachew')
        return lambda orig_func: orig_func
    else:
@@ -264,9 +265,9 @@ Now you can use `@mcachew` in place of `@cachew`, and be certain things don't br
## Settings


-[cachew.settings](src/cachew/__init__.py#L68) exposes some parameters that allow you to control `cachew` behaviour:
+[cachew.settings](src/cachew/__init__.py#L66) exposes some parameters that allow you to control `cachew` behaviour:
- `ENABLE`: set to `False` if you want to disable caching without removing the decorators (useful for testing and debugging).
-You can also use [cachew.extra.disabled_cachew](src/cachew/extra.py#L18) context manager to do it temporarily.
+You can also use [cachew.extra.disabled_cachew](src/cachew/extra.py#L21) context manager to do it temporarily.
- `DEFAULT_CACHEW_DIR`: override to set a different base directory. The default is the "user cache directory" (see [appdirs docs](https://github.com/ActiveState/appdirs#some-example-output)).
- `THROW_ON_ERROR`: by default, cachew is defensive and simply attempts to call the original function on caching issues.
Set to `True` to catch errors earlier.
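The settings above describe three behaviours: a global on/off switch, a context manager that disables caching temporarily, and a defensive fallback to the original function unless errors should propagate. A minimal sketch of how these could fit together follows. `Settings`, `disabled` and `call_with_cache` are hypothetical stand-ins, not `cachew.settings` or `cachew.extra.disabled_cachew` themselves.

```python
import contextlib
import logging
from typing import Callable, Iterator, TypeVar

T = TypeVar('T')


class Settings:
    ENABLE = True           # global switch for caching
    THROW_ON_ERROR = False  # defensive by default


settings = Settings()


@contextlib.contextmanager
def disabled() -> Iterator[None]:
    # temporarily turn caching off, restoring the previous value even if the body raises
    old = settings.ENABLE
    settings.ENABLE = False
    try:
        yield
    finally:
        settings.ENABLE = old


def call_with_cache(original: Callable[[], T], from_cache: Callable[[], T]) -> T:
    if not settings.ENABLE:
        return original()
    try:
        return from_cache()
    except Exception:
        if settings.THROW_ON_ERROR:
            raise  # surface caching problems early
        logging.exception('caching failed, falling back to the original function')
        return original()
```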
18 changes: 18 additions & 0 deletions benchmarks/20230917.org
@@ -0,0 +1,18 @@
Running on @karlicoss desktop PC, =python3.10=

Just a comparison of =sqlite= and =file= backends.

#+begin_example
$ pytest --pyargs -k 'test_many and gc_off and 3000000' -s
src/cachew/tests/test_cachew.py::test_many[sqlite-gc_off-3000000] [INFO 2023-09-17 02:02:09,946 cachew __init__.py:657 ] cachew.tests.test_cachew:test_many.<locals>.iter_data: wrote 3000000 objects to cachew (sqlite:/tmp/pytest-of-karlicos/pytest-129/test_many_sqlite_gc_off_3000000/test_many)
test_many: initial write to cache took 13.6s
test_many: cache size is 229.220352Mb
[INFO 2023-09-17 02:02:10,780 cachew __init__.py:662 ] cachew.tests.test_cachew:test_many.<locals>.iter_data: loading 3000000 objects from cachew (sqlite:/tmp/pytest-of-karlicos/pytest-129/test_many_sqlite_gc_off_3000000/test_many)
test_many: reading from cache took 7.0s
PASSED
src/cachew/tests/test_cachew.py::test_many[file-gc_off-3000000] [INFO 2023-09-17 02:02:23,944 cachew __init__.py:657 ] cachew.tests.test_cachew:test_many.<locals>.iter_data: wrote 3000000 objects to cachew (file:/tmp/pytest-of-karlicos/pytest-129/test_many_file_gc_off_3000000_0/test_many)
test_many: initial write to cache took 6.1s
test_many: cache size is 202.555667Mb
[INFO 2023-09-17 02:02:23,945 cachew __init__.py:662 ] cachew.tests.test_cachew:test_many.<locals>.iter_data: loading objects from cachew (file:/tmp/pytest-of-karlicos/pytest-129/test_many_file_gc_off_3000000_0/test_many)
test_many: reading from cache took 5.4s
#+end_example
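In the run above, the `file` backend wrote 3,000,000 objects in 6.1s versus 13.6s for `sqlite`, and read them back in 5.4s versus 7.0s. The PR title describes the file backend as "basically jsonl". One plausible layout under that description is sketched below, assuming a hypothetical `JsonlBackend` class in which the first line holds the hash and every following line holds one JSON-encoded item. The write goes to a temporary file that then replaces the old cache atomically, so a crash mid-write does not leave a half-written cache behind. This is an assumption about good practice, not a description of cachew's implementation.

```python
import json
import os
import tempfile
from typing import Any, Iterable, Iterator, Optional


class JsonlBackend:
    """Hypothetical file backend: line 1 is the hash, each later line is one JSON item."""

    def __init__(self, path: str) -> None:
        self.path = path

    def get_hash(self) -> Optional[str]:
        try:
            with open(self.path, encoding='utf8') as f:
                first = f.readline()
        except FileNotFoundError:
            return None
        return json.loads(first) if first else None

    def read(self) -> Iterator[Any]:
        with open(self.path, encoding='utf8') as f:
            next(f)  # skip the hash line
            for line in f:
                yield json.loads(line)

    def write(self, new_hash: str, items: Iterable[Any]) -> None:
        # write into a temp file in the same directory, then atomically swap it in
        d = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=d, suffix='.tmp')
        try:
            with os.fdopen(fd, 'w', encoding='utf8') as f:
                f.write(json.dumps(new_hash) + '\n')
                for item in items:
                    # json.dumps escapes newlines, so each item stays on one line
                    f.write(json.dumps(item) + '\n')
            os.replace(tmp, self.path)
        except BaseException:
            os.unlink(tmp)
            raise
```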
1 change: 1 addition & 0 deletions setup.py
@@ -10,6 +10,7 @@
'appdirs' , # default cache dir
'sqlalchemy>=1.0', # cache DB interaction
'orjson', # fast json serialization
'pytz', # used to properly marshall pytz datetimes
]

