Commit

feat: rename TABLE_NAMES_CACHE_CONFIG to DATA_CACHE_CONFIG (apache#11509)

* feat: rename TABLE_NAMES_CACHE_CONFIG to DATA_CACHE_CONFIG

The corresponding cache will now also cache the query results.

* Slice use DATA_CACHE_CONFIG CACHE_DEFAULT_TIMEOUT

* Add test for default cache timeout

* rename FAR_FUTURE to ONE_YEAR_IN_SECS
ktmud authored and auxten committed Nov 20, 2020
1 parent 79f94f0 commit b507e12
Showing 22 changed files with 439 additions and 380 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -28,6 +28,7 @@ __pycache__
cover
.DS_Store
.eggs
.env
.envrc
.idea
.mypy_cache
342 changes: 175 additions & 167 deletions UPDATING.md

Large diffs are not rendered by default.

17 changes: 8 additions & 9 deletions docs/installation.rst
@@ -367,8 +367,8 @@ Caching

Superset uses `Flask-Cache <https://pythonhosted.org/Flask-Cache/>`_ for
caching purpose. Configuring your caching backend is as easy as providing
a ``CACHE_CONFIG``, constant in your ``superset_config.py`` that
complies with the Flask-Cache specifications.
``CACHE_CONFIG`` and ``DATA_CACHE_CONFIG``, constants in ``superset_config.py``
that comply with `the Flask-Cache specifications <https://flask-caching.readthedocs.io/en/latest/#configuring-flask-caching>`_.

Flask-Cache supports multiple caching backends (Redis, Memcached,
SimpleCache (in-memory), or the local filesystem). If you are going to use
@@ -378,14 +378,13 @@ the `redis <https://pypi.python.org/pypi/redis>`_ Python package: ::

pip install redis

For setting your timeouts, this is done in the Superset metadata and goes
up the "timeout searchpath", from your slice configuration, to your
data source's configuration, to your database's and ultimately falls back
into your global default defined in ``CACHE_CONFIG``.
For chart data, Superset goes up a “timeout search path”, from a slice's configuration
to the datasource’s, the database’s, then ultimately falls back to the global default
defined in ``DATA_CACHE_CONFIG``.

.. code-block:: python
CACHE_CONFIG = {
DATA_CACHE_CONFIG = {
'CACHE_TYPE': 'redis',
'CACHE_DEFAULT_TIMEOUT': 60 * 60 * 24, # 1 day default (in secs)
'CACHE_KEY_PREFIX': 'superset_results',
@@ -400,15 +399,15 @@ object that is compatible with the `Flask-Cache <https://pythonhosted.org/Flask-
from custom_caching import CustomCache
def init_cache(app):
def init_data_cache(app):
"""Takes an app instance and returns a custom cache backend"""
config = {
'CACHE_DEFAULT_TIMEOUT': 60 * 60 * 24, # 1 day default (in secs)
'CACHE_KEY_PREFIX': 'superset_results',
}
return CustomCache(app, config)
CACHE_CONFIG = init_cache
DATA_CACHE_CONFIG = init_data_cache
Superset has a Celery task that will periodically warm up the cache based on
different strategies. To use it, add the following to the `CELERYBEAT_SCHEDULE`
22 changes: 13 additions & 9 deletions docs/src/pages/docs/installation/caching.mdx
@@ -8,25 +8,29 @@ version: 1

## Caching

Superset uses [Flask-Cache](https://pythonhosted.org/Flask-Cache/) for caching purpose. Configuring
your caching backend is as easy as providing a `CACHE_CONFIG`, constant in your `superset_config.py`
that complies with the Flask-Cache specifications.
Superset uses [Flask-Cache](https://pythonhosted.org/Flask-Cache/) for caching purposes. For security reasons,
there are two separate cache configs: one for Superset's own metadata (`CACHE_CONFIG`) and one for chart data queried from
connected datasources (`DATA_CACHE_CONFIG`). Query results from SQL Lab are stored in another backend
called `RESULTS_BACKEND`; see [Async Queries via Celery](/docs/installation/async-queries-celery) for details.

Flask-Cache supports multiple caching backends (Redis, Memcached, SimpleCache (in-memory), or the
local filesystem).
Configuring caching is as easy as providing `CACHE_CONFIG` and `DATA_CACHE_CONFIG` in your
`superset_config.py` that comply with [the Flask-Cache specifications](https://flask-caching.readthedocs.io/en/latest/#configuring-flask-caching).

Flask-Cache supports various caching backends, including Redis, Memcached, SimpleCache (in-memory), or the
local filesystem.

- Memcached: we recommend using [pylibmc](https://pypi.org/project/pylibmc/) client library as
`python-memcached` does not handle storing binary data correctly.
- Redis: we recommend the [redis](https://pypi.python.org/pypi/redis) Python package

Both of these libraries can be installed using pip.

For setting your timeouts, this is done in the Superset metadata and goes up the “timeout
searchpath”, from your slice configuration, to your data source’s configuration, to your database’s
and ultimately falls back into your global default defined in `CACHE_CONFIG`.
For chart data, Superset goes up a “timeout search path”, from a slice's configuration
to the datasource’s, the database’s, then ultimately falls back to the global default
defined in `DATA_CACHE_CONFIG`.
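The search path described above can be sketched as a simple resolver. This is an illustrative reduction of the fallback order, not the Superset codebase's actual function; names and signature are assumptions:

```python
DEFAULT_TIMEOUT = 60 * 60 * 24  # stands in for DATA_CACHE_CONFIG's CACHE_DEFAULT_TIMEOUT

def resolve_cache_timeout(slice_timeout=None, datasource_timeout=None,
                          database_timeout=None):
    """Walk the timeout search path: slice -> datasource -> database -> global default."""
    for timeout in (slice_timeout, datasource_timeout, database_timeout):
        if timeout is not None:
            return timeout
    return DEFAULT_TIMEOUT

resolve_cache_timeout(database_timeout=600)  # -> 600, since no slice/datasource override
```

The first explicitly configured timeout wins; only when none is set does the `DATA_CACHE_CONFIG` default apply.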

```
CACHE_CONFIG = {
DATA_CACHE_CONFIG = {
'CACHE_TYPE': 'redis',
'CACHE_DEFAULT_TIMEOUT': 60 * 60 * 24, # 1 day default (in secs)
'CACHE_KEY_PREFIX': 'superset_results',
1 change: 1 addition & 0 deletions helm/superset/templates/_helpers.tpl
@@ -73,6 +73,7 @@ CACHE_CONFIG = {
'CACHE_REDIS_DB': 1,
'CACHE_REDIS_URL': f"redis://{env('REDIS_HOST')}:{env('REDIS_PORT')}/1"
}
DATA_CACHE_CONFIG = CACHE_CONFIG

SQLALCHEMY_DATABASE_URI = f"postgresql+psycopg2://{env('DB_USER')}:{env('DB_PASS')}@{env('DB_HOST')}:{env('DB_PORT')}/{env('DB_NAME')}"
SQLALCHEMY_TRACK_MODIFICATIONS = True
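The Helm chart simply aliases the data cache to the metadata cache (`DATA_CACHE_CONFIG = CACHE_CONFIG`). A deployment could instead keep the two caches fully separate; the sketch below is a hypothetical `superset_config.py`, with key prefixes and Redis DB numbers that are illustrative and not from this commit:

```python
# Hypothetical config: distinct backends for metadata vs. chart data.
CACHE_CONFIG = {
    'CACHE_TYPE': 'redis',
    'CACHE_KEY_PREFIX': 'superset_metadata',
    'CACHE_REDIS_URL': 'redis://localhost:6379/1',
}
DATA_CACHE_CONFIG = {
    **CACHE_CONFIG,
    'CACHE_KEY_PREFIX': 'superset_data',       # keep the two key spaces disjoint
    'CACHE_REDIS_URL': 'redis://localhost:6379/2',  # separate DB for query results
}
```

Separating the two lets chart data be evicted or flushed without touching Superset's own metadata cache.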
2 changes: 1 addition & 1 deletion superset/__init__.py
@@ -49,5 +49,5 @@
results_backend_use_msgpack = LocalProxy(
lambda: results_backend_manager.should_use_msgpack
)
tables_cache = LocalProxy(lambda: cache_manager.tables_cache)
data_cache = LocalProxy(lambda: cache_manager.data_cache)
thumbnail_cache = LocalProxy(lambda: cache_manager.thumbnail_cache)
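`data_cache` is exposed through a `LocalProxy` so modules can import it before the cache manager has been initialized with the Flask app. The stdlib-only sketch below shows the proxy idea; `LazyProxy` and this `CacheManager` are simplified stand-ins, not werkzeug's or Superset's actual implementations:

```python
class LazyProxy:
    """Minimal sketch of werkzeug's LocalProxy: resolve the target on every access."""
    def __init__(self, factory):
        self._factory = factory

    def __getattr__(self, name):
        # Look up the real object each time, so late initialization is picked up.
        return getattr(self._factory(), name)

class CacheManager:
    """Illustrative stand-in for Superset's cache manager."""
    def __init__(self):
        self._data_cache = None

    def init_app(self):
        self._data_cache = {}  # the real code builds a Flask-Caching Cache here

    @property
    def data_cache(self):
        return self._data_cache

cache_manager = CacheManager()
data_cache = LazyProxy(lambda: cache_manager.data_cache)  # safe to create pre-init
cache_manager.init_app()                                  # later, at app startup
data_cache.setdefault("key", "value")                     # proxies to the real cache
```

The import-time binding never goes stale because every attribute access re-resolves the underlying object.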
9 changes: 5 additions & 4 deletions superset/common/query_context.py
@@ -24,11 +24,12 @@
import pandas as pd
from flask_babel import gettext as _

from superset import app, cache, db, is_feature_enabled, security_manager
from superset import app, db, is_feature_enabled
from superset.common.query_object import QueryObject
from superset.connectors.base.models import BaseDatasource
from superset.connectors.connector_registry import ConnectorRegistry
from superset.exceptions import QueryObjectValidationError
from superset.extensions import cache_manager, security_manager
from superset.stats_logger import BaseStatsLogger
from superset.utils import core as utils
from superset.utils.core import DTTM_ALIAS
@@ -233,8 +234,8 @@ def get_df_payload( # pylint: disable=too-many-statements
status = None
query = ""
error_message = None
if cache_key and cache and not self.force:
cache_value = cache.get(cache_key)
if cache_key and cache_manager.data_cache and not self.force:
cache_value = cache_manager.data_cache.get(cache_key)
if cache_value:
stats_logger.incr("loading_from_cache")
try:
@@ -286,7 +287,7 @@ def get_df_payload( # pylint: disable=too-many-statements
status = utils.QueryStatus.FAILED
stacktrace = utils.get_stacktrace()

if is_loaded and cache_key and cache and status != utils.QueryStatus.FAILED:
if is_loaded and cache_key and status != utils.QueryStatus.FAILED:
set_and_log_cache(
cache_key,
df,
11 changes: 8 additions & 3 deletions superset/config.py
@@ -392,10 +392,15 @@ def _try_json_readsha( # pylint: disable=unused-argument
# Setup image size default is (300, 200, True)
# IMG_SIZE = (300, 200, True)

CACHE_DEFAULT_TIMEOUT = 60 * 60 * 24
# Default cache timeout (in seconds), applies to all cache backends unless
# specifically overridden in each cache config.
CACHE_DEFAULT_TIMEOUT = 60 * 60 * 24 # 1 day

# Default cache for Superset objects
CACHE_CONFIG: CacheConfig = {"CACHE_TYPE": "null"}
TABLE_NAMES_CACHE_CONFIG: CacheConfig = {"CACHE_TYPE": "null"}
DASHBOARD_CACHE_TIMEOUT = 60 * 60 * 24 * 365

# Cache for datasource metadata and query results
DATA_CACHE_CONFIG: CacheConfig = {"CACHE_TYPE": "null"}

# CORS Options
ENABLE_CORS = False
5 changes: 3 additions & 2 deletions superset/db_engine_specs/hive.py
@@ -31,10 +31,11 @@
from sqlalchemy.orm import Session
from sqlalchemy.sql.expression import ColumnClause, Select

from superset import app, cache, conf
from superset import app, conf
from superset.db_engine_specs.base import BaseEngineSpec
from superset.db_engine_specs.presto import PrestoEngineSpec
from superset.exceptions import SupersetException
from superset.extensions import cache_manager
from superset.models.sql_lab import Query
from superset.sql_parse import Table
from superset.utils import core as utils
@@ -514,7 +515,7 @@ def execute( # type: ignore
cursor.execute(query, **kwargs)

@classmethod
@cache.memoize()
@cache_manager.cache.memoize()
def get_function_names(cls, database: "Database") -> List[str]:
"""
Get a list of function names that are able to be called on the database.
6 changes: 3 additions & 3 deletions superset/db_engine_specs/presto.py
@@ -37,7 +37,7 @@
from sqlalchemy.orm import Session
from sqlalchemy.sql.expression import ColumnClause, Select

from superset import app, cache, is_feature_enabled, security_manager
from superset import app, cache_manager, is_feature_enabled, security_manager
from superset.db_engine_specs.base import BaseEngineSpec
from superset.errors import ErrorLevel, SupersetError, SupersetErrorType
from superset.exceptions import SupersetTemplateException
@@ -930,7 +930,7 @@ def _latest_partition_from_df(cls, df: pd.DataFrame) -> Optional[List[str]]:
return None

@classmethod
@cache.memoize(timeout=60)
@cache_manager.data_cache.memoize(timeout=60)
def latest_partition(
cls,
table_name: str,
@@ -1030,7 +1030,7 @@ def latest_sub_partition
return df.to_dict()[field_to_return][0]

@classmethod
@cache.memoize()
@cache_manager.data_cache.memoize()
def get_function_names(cls, database: "Database") -> List[str]:
"""
Get a list of function names that are able to be called on the database.
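The `@cache_manager.data_cache.memoize(timeout=60)` decorator above caches a result per argument tuple and expires it after the given timeout. A minimal timeout-aware memoizer can illustrate the mechanics; this is a sketch, not Flask-Caching's implementation, and `latest_partition` here is a toy stand-in:

```python
import functools
import time

def memoize(timeout=None):
    """Sketch of a timeout-aware memoize decorator (illustrative only)."""
    def decorator(fn):
        store = {}

        @functools.wraps(fn)
        def wrapper(*args):
            now = time.monotonic()
            if args in store:
                value, stored_at = store[args]
                if timeout is None or now - stored_at < timeout:
                    return value            # fresh cache hit
            value = fn(*args)               # miss or expired: recompute
            store[args] = (value, now)
            return value
        return wrapper
    return decorator

calls = []

@memoize(timeout=60)
def latest_partition(table_name):
    calls.append(table_name)                # track real invocations
    return f"ds=2020-11-20/{table_name}"
```

Repeated calls within the timeout return the cached value without re-running the expensive query.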
21 changes: 12 additions & 9 deletions superset/models/core.py
@@ -54,8 +54,9 @@
from sqlalchemy.sql import expression, Select
from sqlalchemy_utils import EncryptedType

from superset import app, db_engine_specs, is_feature_enabled, security_manager
from superset import app, db_engine_specs, is_feature_enabled
from superset.db_engine_specs.base import TimeGrain
from superset.extensions import cache_manager, security_manager
from superset.models.helpers import AuditMixinNullable, ImportExportMixin
from superset.models.tags import FavStarUpdater
from superset.result_set import SupersetResultSet
@@ -452,8 +453,8 @@ def inspector(self) -> Inspector:
return sqla.inspect(engine)

@cache_util.memoized_func(
key=lambda *args, **kwargs: "db:{}:schema:None:table_list",
attribute_in_key="id",
key=lambda self, *args, **kwargs: f"db:{self.id}:schema:None:table_list",
cache=cache_manager.data_cache,
)
def get_all_table_names_in_database(
self,
@@ -467,7 +468,7 @@ def get_all_table_names_in_database(
return self.db_engine_spec.get_all_datasource_names(self, "table")

@cache_util.memoized_func(
key=lambda *args, **kwargs: "db:{}:schema:None:view_list", attribute_in_key="id"
key=lambda self, *args, **kwargs: f"db:{self.id}:schema:None:view_list",
cache=cache_manager.data_cache,
)
def get_all_view_names_in_database(
self,
@@ -481,8 +483,8 @@ def get_all_view_names_in_database(
return self.db_engine_spec.get_all_datasource_names(self, "view")

@cache_util.memoized_func(
key=lambda *args, **kwargs: f"db:{{}}:schema:{kwargs.get('schema')}:table_list", # type: ignore
attribute_in_key="id",
key=lambda self, schema, *args, **kwargs: f"db:{self.id}:schema:{schema}:table_list", # type: ignore
cache=cache_manager.data_cache,
)
def get_all_table_names_in_schema(
self,
@@ -513,8 +515,8 @@ def get_all_table_names_in_schema(
logger.warning(ex)

@cache_util.memoized_func(
key=lambda *args, **kwargs: f"db:{{}}:schema:{kwargs.get('schema')}:view_list", # type: ignore
attribute_in_key="id",
key=lambda self, schema, *args, **kwargs: f"db:{self.id}:schema:{schema}:view_list", # type: ignore
cache=cache_manager.data_cache,
)
def get_all_view_names_in_schema(
self,
@@ -543,7 +545,8 @@ def get_all_view_names_in_schema(
logger.warning(ex)

@cache_util.memoized_func(
key=lambda *args, **kwargs: "db:{}:schema_list", attribute_in_key="id"
key=lambda self, *args, **kwargs: f"db:{self.id}:schema_list",
cache=cache_manager.data_cache,
)
def get_all_schema_names(
self,
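The diff above changes the `memoized_func` key lambdas to receive `self` (so the key can interpolate `self.id` directly) and adds an explicit `cache=cache_manager.data_cache` argument. A minimal sketch of how such a decorator could work follows; the signature and the dict-backed cache are assumptions for illustration, not Superset's exact `cache_util` code:

```python
def memoized_func(key, cache):
    """Sketch: memoize a method in an explicit cache under a computed key."""
    def decorator(fn):
        def wrapper(self, *args, force=False, **kwargs):
            cache_key = key(self, *args, **kwargs)
            if not force and cache_key in cache:
                return cache[cache_key]     # cache hit
            result = fn(self, *args, **kwargs)
            cache[cache_key] = result       # store under the per-object key
            return result
        return wrapper
    return decorator

data_cache = {}  # stand-in for cache_manager.data_cache

class Database:
    def __init__(self, id_):
        self.id = id_

    @memoized_func(
        key=lambda self, schema, *args, **kwargs: f"db:{self.id}:schema:{schema}:table_list",
        cache=data_cache,
    )
    def get_all_table_names_in_schema(self, schema):
        return [f"{schema}.events"]  # stand-in for a SQLAlchemy inspector call
```

Passing `self` into the key lambda removes the need for the old `attribute_in_key="id"` indirection: the key is computed in one place, directly from the instance.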
15 changes: 4 additions & 11 deletions superset/models/dashboard.py
@@ -41,17 +41,11 @@
from sqlalchemy.orm.session import object_session
from sqlalchemy.sql import join, select

from superset import (
app,
cache,
ConnectorRegistry,
db,
is_feature_enabled,
security_manager,
)
from superset import app, ConnectorRegistry, db, is_feature_enabled, security_manager
from superset.connectors.base.models import BaseDatasource
from superset.connectors.druid.models import DruidColumn, DruidMetric
from superset.connectors.sqla.models import SqlMetric, TableColumn
from superset.extensions import cache_manager
from superset.models.helpers import AuditMixinNullable, ImportExportMixin
from superset.models.slice import Slice
from superset.models.tags import DashboardUpdater
@@ -224,10 +218,9 @@ def data(self) -> Dict[str, Any]:
"last_modified_time": self.changed_on.replace(microsecond=0).timestamp(),
}

@cache.memoize(
@cache_manager.cache.memoize(
# manage cache version manually
make_name=lambda fname: f"{fname}-v2.1",
timeout=config["DASHBOARD_CACHE_TIMEOUT"],
unless=lambda: not is_feature_enabled("DASHBOARD_CACHE"),
)
def full_data(self) -> Dict[str, Any]:
@@ -267,7 +260,7 @@ def update_thumbnail(self) -> None:

@debounce(0.1)
def clear_cache(self) -> None:
cache.delete_memoized(Dashboard.full_data, self)
cache_manager.cache.delete_memoized(Dashboard.full_data, self)

@classmethod
@debounce(0.1)