
Releases: CentaurusInfra/regionless-storage-service

V0.2.0 Release

03 Oct 22:00
b37c505

Welcome to the V0.2.0 release of the Centaurus Regionless Storage Service (RKV).

A main focus of this release is improving I/O performance (latency and throughput) when the backend storage instances (e.g. Redis) are geo-distributed across multiple availability zones and/or regions.

Release Features

  • A novel sharding design that improves I/O latency and throughput while maintaining storage load balancing and high availability when storage instances are distributed in multiple data centers (e.g. availability zones and/or regions).

  • Adoption of asynchronous replication for sequential consistency.

  • Strong consistency validated programmatically with the linearizability checker Porcupine (similar to Jepsen); a minimal example of such a check follows this list.

  • A new in-memory storage type for more effective and cost-efficient development.

  • Improved the "one-key" deployment scripts for multi-region and large-scale testing.

  • Improved throughput by removing server-scope locking.
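
For illustration, the sketch below shows how a history of key-value operations can be checked for linearizability with Porcupine's Go API. It models a single-key register and verifies a tiny hand-written history; the operation encoding here is a simplified example for this note, not RKV's actual test harness.

```go
package main

import (
	"fmt"

	"github.com/anishathalye/porcupine"
)

// registerInput encodes a single-key register operation (illustrative only).
type registerInput struct {
	write bool
	value int
}

// registerModel is a sequential specification of a read/write register.
var registerModel = porcupine.Model{
	Init: func() interface{} { return 0 },
	Step: func(state, input, output interface{}) (bool, interface{}) {
		in := input.(registerInput)
		if in.write {
			// A write always succeeds and sets the new state.
			return true, in.value
		}
		// A read is legal only if it returns the current state.
		return output.(int) == state.(int), state
	},
}

func main() {
	// A tiny concurrent history: client 0 writes 1, client 1 reads 1
	// while the write is still in flight.
	history := []porcupine.Operation{
		{ClientId: 0, Input: registerInput{write: true, value: 1}, Call: 0, Output: 0, Return: 10},
		{ClientId: 1, Input: registerInput{write: false}, Call: 5, Output: 1, Return: 15},
	}
	ok := porcupine.CheckOperations(registerModel, history)
	fmt.Println("linearizable:", ok) // expect: true
}
```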

Performance

The following graph shows the KPIs for this release:

[KPI chart]

The storage capacity goal of 100 million keys with 3 replicas was achieved using 54 m5a.2xlarge VMs on AWS, distributed across 4 availability zones in 3 regions: 2 regions on the east coast and 1 on the west coast. With sequential consistency, write latency falls within the 20-30 ms range, bounded by the network latency between the regions on the same coast. Read latency falls within the same-region range of less than 10 ms by preferring to read from geographically close-by replicas. Storage load was evenly distributed across all 54 VMs.

With the various bug fixes in this release, the concurrency of RKV has also increased by at least 5x.
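
The read-latency result above comes from preferring geographically close-by replicas. As a simplified illustration (not RKV's actual replica-selection code), choosing the replica with the lowest measured round-trip time might look like this; the type and field names are hypothetical:

```go
package main

import (
	"fmt"
	"time"
)

// replica is an illustrative stand-in for the topology metadata a store
// could keep about each backend instance (address, zone, probed RTT).
type replica struct {
	addr string
	zone string
	rtt  time.Duration // e.g. a moving average of health-probe round trips
}

// pickNearest returns the replica with the lowest measured RTT, which is
// one simple way to keep reads inside the local zone/region when possible.
func pickNearest(replicas []replica) *replica {
	if len(replicas) == 0 {
		return nil
	}
	best := &replicas[0]
	for i := 1; i < len(replicas); i++ {
		if replicas[i].rtt < best.rtt {
			best = &replicas[i]
		}
	}
	return best
}

func main() {
	replicas := []replica{
		{addr: "10.0.1.5:6379", zone: "us-east-1a", rtt: 2 * time.Millisecond},
		{addr: "10.1.1.5:6379", zone: "us-east-2a", rtt: 12 * time.Millisecond},
		{addr: "10.2.1.5:6379", zone: "us-west-2a", rtt: 65 * time.Millisecond},
	}
	fmt.Println("read from:", pickNearest(replicas).addr)
}
```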

Known Issues

  • A manually set latency threshold is needed for grouping and selecting storage instances. Because network latency varies, this can occasionally leave too few "remote replication hosts" available when RKV starts up.
  • In large-scale tests, the YCSB host becomes a scaling bottleneck due to its connection pool limitation.

Looking forward

  • Expand RKV servers from a single region to multiple regions.
  • List-watch capability.
  • Global read implementation with multi-region RKV, and optimization with smart caching.
  • Add integration tests to CI/CD.

V0.1.0 Release

02 Aug 16:21
4c1f126

Hello World!

Welcome to the first release of the regionless storage service. This release focuses on the architecture design and key-component implementation that meet the requirements of highly scalable storage capacity, ETCD-compatible APIs, high availability with replication, and the lowest possible PUT/GET latency under the constraint of strong replication consistency.

Release Features

  • APIs Related
    • CRUD with range query and listing all revisions of a key
    • An indexer with host mapping for fast partition server look-up, key listing, and range query
  • Key-space Partition
    • A key partition algorithm based on revision bucketing and consistent hashing (see the hash-ring sketch after this list)
  • Replication & Consistency
    • Configurable data replication for HA
    • Implementation of chain replication for strong (linearizable) consistency
    • Configurable sequential consistency design
  • Storage Backend
    • Open design and an initial implementation with Redis
  • Deployment
    • "One-button" deploy scripts that provision RKV for service and benchmarking
  • Performance
    • Storage capacity scalable to 30M key-value pairs (projected to 50M+ in future testing)
    • Scalable for write/read latency
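
The consistent-hashing half of the partition algorithm above can be pictured as a minimal hash ring with virtual nodes. The sketch below is illustrative only: the revision-bucketing step and the real node metadata are omitted, and all names are hypothetical.

```go
package main

import (
	"fmt"
	"hash/crc32"
	"sort"
	"strconv"
)

// ring is a minimal consistent-hash ring with virtual nodes.
type ring struct {
	points []uint32          // sorted hash points on the ring
	owners map[uint32]string // hash point -> storage node
}

func newRing(vnodes int, nodes ...string) *ring {
	r := &ring{owners: make(map[uint32]string)}
	for _, n := range nodes {
		// Each node gets vnodes points on the ring to smooth the load.
		for i := 0; i < vnodes; i++ {
			h := crc32.ChecksumIEEE([]byte(n + "#" + strconv.Itoa(i)))
			r.points = append(r.points, h)
			r.owners[h] = n
		}
	}
	sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
	return r
}

// locate returns the node owning the key: the first hash point clockwise
// from the key's hash, wrapping around to the start of the ring.
func (r *ring) locate(key string) string {
	if len(r.points) == 0 {
		return ""
	}
	h := crc32.ChecksumIEEE([]byte(key))
	i := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= h })
	if i == len(r.points) {
		i = 0
	}
	return r.owners[r.points[i]]
}

func main() {
	r := newRing(100, "redis-a", "redis-b", "redis-c")
	fmt.Println("/pods/nginx ->", r.locate("/pods/nginx"))
}
```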

Known Issues

  • Linearizable consistency causes high PUT latency
  • Cross-region latency needs optimization
  • Concurrency of go-ycsb needs optimization

To-dos

  • Distributed indexer to break memory limitation at large data capacity
  • Watcher implementation
  • A 2-level key-sharding algorithm that allows selection of storage instance groups
  • Further txn support
  • More replication consistency optimization and options
  • Smart caching and read latency improvement