fix: disable automatic eviction of diskcache at aggregation
Python diskcache, which we use for aggregating trainer updates, has its
own automatic eviction policy that depends on its size_limit and
cull_limit values. This change disables automatic eviction and adds a
logger debug line that reports the number of updates in the cache.
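
To illustrate why automatic eviction matters at aggregation time, here is a minimal sketch using a toy in-memory cache. ToyCache and count_cached_updates are hypothetical names invented for this example; this is not diskcache's implementation. The toy counts size in items rather than bytes on disk and evicts oldest-first, and it only mimics the documented meaning of the two settings: size_limit is the threshold above which culling starts, and cull_limit is the maximum number of entries removed per insert, with 0 disabling culling.

```python
from collections import OrderedDict


class ToyCache:
    """Hypothetical stand-in for a size-limited cache (not diskcache itself)."""

    def __init__(self, size_limit, cull_limit):
        self.size_limit = size_limit  # max items kept before culling starts
        self.cull_limit = cull_limit  # max items evicted per insert; 0 disables
        self._items = OrderedDict()

    def __setitem__(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        self._cull()

    def _cull(self):
        # With cull_limit == 0 nothing is ever evicted automatically.
        if self.cull_limit == 0:
            return
        evicted = 0
        while len(self._items) > self.size_limit and evicted < self.cull_limit:
            self._items.popitem(last=False)  # drop the oldest entry
            evicted += 1

    def __len__(self):
        return len(self._items)


def count_cached_updates(cache, num_trainers):
    """Store one fake update per trainer and report how many survived."""
    for end in range(num_trainers):
        cache[f"trainer-{end}"] = {"weights": [0.0] * 4}
    return len(cache)


with_eviction = count_cached_updates(ToyCache(size_limit=3, cull_limit=10), 8)
without_eviction = count_cached_updates(ToyCache(size_limit=3, cull_limit=0), 8)
print(with_eviction, without_eviction)  # prints "3 8"
```

With eviction enabled, five of the eight updates are silently dropped before aggregation; with cull_limit set to 0, all eight remain, which is the count the new debug line would report.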
jaemin-shin committed May 19, 2023
1 parent e328ebf commit 4548f06
Showing 1 changed file with 6 additions and 0 deletions.
6 changes: 6 additions & 0 deletions lib/python/flame/mode/horizontal/syncfl/top_aggregator.py
@@ -93,7 +93,11 @@ def internal_init(self) -> None:
self.metrics = dict()

# disk cache is used for saving memory in case model is large
# automatic eviction of disk cache is disabled with cull_limit 0
self.cache = Cache()
self.cache.reset('size_limit', 1e15)
self.cache.reset('cull_limit', 0)

self.optimizer = optimizer_provider.get(
self.config.optimizer.sort, **self.config.optimizer.kwargs
)
@@ -157,6 +161,8 @@ def _aggregate_weights(self, tag: str) -> None:
# save training result from trainer in a disk cache
self.cache[end] = tres

logger.debug(f"received {len(self.cache)} trainer updates in cache")

# optimizer conducts optimization (in this case, aggregation)
global_weights = self.optimizer.do(
deepcopy(self.weights),