Skip to content

Latest commit

 

History

History
68 lines (41 loc) · 5.09 KB

10_附录A-SLO文档示例.md

File metadata and controls

68 lines (41 loc) · 5.09 KB

附录A SLO文档示例



本文介绍了示例游戏服务的SLO。

Status Published
Author Steven Thurgood
Date 2018-02-19
Reviewers David Ferguson
Approvers Betsy Beyer
Approval Date 2018-02-20
Revisit Date 2019-02-01

服务概述

示例游戏服务允许Android和iPhone用户彼此玩游戏。该应用程序在用户的手机上运行,并且移动通过REST API发送回API数据存储区包含所有当前和先前游戏的状态。得分管道读取该表并生成今天,本周和所有时间的最新联赛表。排行榜结果可在应用程序中通过API获得,也可以在公共HTTP服务器上获得。

SLO使用四个星期的滚动窗口。

**SLI和SLO **

Category SLI SLO
API
Availability The proportion of successful requests, as measured from the load balancer metrics. Any HTTP status other than 500–599 is considered successful.
count of "api" http_requests which do not have a 5XX status code divided by count of all "api" http_requests
97% success
延迟 The proportion of sufficiently fast requests, as measured from the load balancer metrics.
"Sufficiently fast" is defined as < 400 ms, or < 850 ms. count of "api" http_requests with a duration less than or equal to "0.4" seconds divided by count of all "api" http_requests

count of "api" http_requests with a duration less than or equal to "0.85" seconds divided by count of all "api" http_requests
90% of requests < 400 ms
99% of requests < 850 ms
HTTP server
可用性 根据负载平衡器指标衡量的成功请求的比例。除500--599以外的任何HTTP状态均被视为成功。
count of "web" http_requests which do not have a 5XX status code divided by count of all "web" http_requests
99%
Latency The proportion of sufficiently fast requests, as measured from the load balancer metrics.
“Sufficiently fast” is defined as < 200 ms, or < 1,000 ms.
count of "web" http_requests with a duration less than or equal to "0.2" seconds divided by count of all "web" http_requests

count of "web" http_requests with a duration less than or equal to "1.0" seconds divided by count of all "web" http_requests
90% of requests < 200 ms
99% of requests < 1,000 ms
Score pipeline
Freshness The proportion of records read from the league table that were updated recently. “Recently” is defined as within 1 minute, or within 10 minutes.
Uses metrics from the API and HTTP server: count of all data_requests for "api" and "web" with freshness less than or equal to 1 minute divided by count of all data_requests

count of all data_requests for "api" and "web" with freshness less than or equal to 10 minutes divided by count of all data_requests
90% of reads use data written within the previous 1 minute. 99% of reads use data written within the previous 10 minutes.
Correctness The proportion of records injected into the state table by a correctness prober that result in the correct data being read from the league table.
A correctness prober injects synthetic data, with known correct outcomes, and exports a success metric: count of all data_requests which were correct divided by count of all data_requests
99.99999% of records injected by the prober result in the correct output.
Completeness The proportion of hours in which 100% of the games in the data store were processed (no records were skipped).
Uses metrics exported by the score pipeline: count of all pipeline runs that processed 100% of the records divided by count of all pipeline runs
99% of pipeline runs cover 100% of the data.

理论

可用性和延迟SLI基于2018年1月1日至2018年1月28日之间的测量。可用性SLO向下舍入到最接近的1%,而延迟SLO时序被舍入到最接近的50 ms。作者选择了所有其他数字,并验证了这些服务正在或高于这些级别运行。

尚未尝试验证这些数字是否与用户体验密切相关。1

错误预算

每个目标都有一个单独的错误预算,定义为该目标的100%减去(-)目标。例如,如果在前四周中有1,000,000个请求发送到API服务器,则API可用性错误预算为1,000,000中的3%(100%-97%):30,000个错误。

当我们的任何目标用尽错误预算时,我们将制定错误预算政策(请参阅附录B)。

说明和警告

  • 请求指标是在负载均衡器上测量的。此度量可能无法准确度量用户请求未到达负载均衡器的情况。

  • 我们仅将HTTP 5XX状态消息视为错误代码;其他一切都算成功。

  • 正确性探测器使用的测试数据包含大约200个测试,每1秒注入一次。我们的错误预算是每四周48个错误。



Footnotes

  1. 即使SLO中的数字不是严格依据证据的,也有必要对此进行记录,以使将来的读者可以理解这一事实,并做出适当的决定。他们可能会决定值得收集更多证据的投资。