feat(core/dns): new dns client library #12305

Merged: 126 commits, merged Jul 16, 2024.
Commits
6192fb9
refactor(dns): new dns client library
chobits Feb 21, 2024
6d29383
add files to kong-3.7.0-0.rockspec
chobits Feb 21, 2024
8d1e466
30-new-dns-client/02-old_client_spec.lua: use CI nameserver instead
chobits Feb 21, 2024
bf40be8
return last answer error if no available answers
chobits Feb 21, 2024
a46b3c7
set _G.busted_legacy_dns_client for original 21-dns-client/ tests
chobits Feb 22, 2024
b40e3ec
chores: better comment
chobits Feb 22, 2024
b40e417
add changelog
chobits Feb 22, 2024
78d54a3
automatically refresh stale-but-in-use records after @stale_refresh_i…
chobits Feb 26, 2024
07fbc34
revert "automatically refresh stale-but-in-use records after @stale_r…
chobits Feb 27, 2024
e77bd8d
add kong_dns_cache{_miss} shared dict into templates/nginx_kong.lua
chobits Feb 27, 2024
2ed541e
only purge cache for test cases
chobits Feb 27, 2024
18c9c65
use kong.worker_events instead of mlcache shm based ipc
chobits Feb 28, 2024
8d22bdf
remove debug log for @tries
chobits Feb 28, 2024
3a7104d
support req_dyn_hook.run_hooks
chobits Feb 28, 2024
01dfaf2
supports __tostring of @tries table (error list)
chobits Feb 28, 2024
157650b
coding style: use a 2-space indentation and localized some variables
chobits Feb 29, 2024
cb9293c
coding style: change table.insert to table_insert
chobits Feb 29, 2024
1340b97
fix test case of stale updating task
chobits Feb 29, 2024
3b4f0c6
fix bug: should insert `nil` value as missed data into mlcache
chobits Feb 29, 2024
0c3f595
simplify injecting resolver.query logic in tests 30-new-dns-client/*
chobits Mar 5, 2024
904f8b3
optimize cache inserting logic to avoid unnecessary IPC to broadcast …
chobits Mar 6, 2024
8877f9e
avoid running callback from local worker's events and add tests for IPC
chobits Feb 29, 2024
0338f1b
coding style fix and keep the error string consistent with the previo…
chobits Mar 7, 2024
a8cc892
fix shared_dict shm size
chobits Mar 7, 2024
c7f8cba
fix typo: CACHE_ONLY_MISS_ANSWERS -> CACHE_ONLY_ANSWERS
chobits Mar 7, 2024
cb3b499
fix test case: 01-request-debug_spec.lua: dns cache hit
chobits Mar 7, 2024
50e253c
copy the provided opts table with new function copy_options
chobits Mar 8, 2024
1812538
create timer using a static function instead of recreating closures
chobits Mar 8, 2024
4a954c9
add constant LONG_LASTING_TTL for 10 years ttl value
chobits Mar 8, 2024
92b40df
add comment for maximum TTL value: 0xffffffff
chobits Mar 8, 2024
7acc0e0
fix coding styles and add more comments
chobits Mar 8, 2024
c57a449
add comment for sleep(0.2) in 04-round_robin_spec.lua
chobits Mar 8, 2024
89882dd
coding style: removed unnecessary blank line in 04-round_robin_spec.lua
chobits Mar 8, 2024
de35aff
fixed flakiness of stale updating test case in 03-old_client_cache_sp…
chobits Mar 8, 2024
49846e3
fix error message and update test case titles
chobits Mar 12, 2024
6e3fcdf
fix bug that stale records will be not updated if querying nameserver…
chobits Mar 12, 2024
cb1781d
compatible with original dns client: skip the SRV record pointing to …
chobits Mar 13, 2024
a9916e5
revert shm_miss feature, which makes source code more complex
chobits Mar 14, 2024
4a5516c
support admin API "/dns" to get statistics
chobits Mar 13, 2024
c4eedb7
fix lint error
chobits Mar 14, 2024
936a25d
complete the release file: refactor_dns_client.yml
chobits Mar 14, 2024
ada4b54
chore: assign TYPE_LAST to _M.TYPE_LAST instead of -1
chobits Mar 15, 2024
5d124db
Update release file
chobits Mar 15, 2024
8f15f13
fix text of `dns_no_sync` option in refactor_dns_client.yml
chobits Mar 18, 2024
6cc8d4e
process the scenario of timeout=0 in /etc/resolv.conf
chobits Mar 18, 2024
39d18cb
chores(*): fix coding style; add comments; make constant records read…
chobits Mar 18, 2024
1e19818
add a comment to explain of the concurrenct control of asynchronous t…
chobits Mar 18, 2024
0d63172
fixed lock_timeout: r:query() has two IO operations send() & receive()
chobits Mar 19, 2024
b7a6ccc
automatically refresh stale-but-in-use records every 60s triggered by…
chobits Mar 19, 2024
e484ecf
added kong/resty/dns_client/README.md
chobits Mar 21, 2024
ad2cdb5
change statistics API path from /dns to /status/dns
chobits Mar 21, 2024
b0afa32
d11y: add key-value "query_last_time": "<unixtime> <duration>" into s…
chobits Mar 21, 2024
3e533f7
fixed markdown format of kong/resty/dns_client/README.md
chobits Mar 21, 2024
04e5a72
fix format for kong/resty/dns_client/README.md
chobits Mar 25, 2024
47bae5f
add debug logs
chobits Mar 25, 2024
e789fc6
fix refactor_dns_client.yml to make it more user-friendly
chobits Mar 26, 2024
2ccc33d
chore: use string_lower instead of <var>:lower() for debugging
chobits Mar 26, 2024
c3bcbcf
chores: refactor variable names
chobits Mar 27, 2024
223ac01
fixed coding style(add spaces) and fix resolv.options.timeout checking
chobits Mar 27, 2024
e9a6485
move ip address answers generating logic into cache:get callback
chobits Mar 27, 2024
6668b7c
modify some table_insert to "t[i] = v" and check order instead of che…
chobits Mar 27, 2024
1ceb711
use empty table for opts as default value in _M.new()
chobits Mar 27, 2024
23f0dc5
perf: return body directly instead of creating a local variable
chobits Mar 27, 2024
f3cca27
fix status code to 501 if dns stats not implemented for API "/status/…
chobits Mar 27, 2024
22bb755
perf: convert variables (localhosts/empty_answers) to constants
chobits Mar 27, 2024
4ed3f79
perf: firstly check for tailing dot in is_fqdn
chobits Mar 28, 2024
408bbaf
chore: better comment for parseResolvConf-TODO
chobits Mar 28, 2024
c75e7d6
ensure valid_ttl doesn't exceed maximum ttl 0xffffffff
chobits Mar 28, 2024
5b758b2
chore: rename get_round_robin_answers to get_next_round_robin_answers
chobits Mar 28, 2024
bf5f756
perf: dont use table as input parameters for APIs and add a new API `…
chobits Mar 28, 2024
10be035
README.md: add apis `resolve_address` and and fix format
chobits Mar 28, 2024
f28ee49
perf: convert some variables local constants
chobits Mar 28, 2024
5242043
improve readability: list _M.TYPE_XXX value directly
chobits Mar 29, 2024
e9d570f
refactor function name and fix lint issue
chobits Mar 29, 2024
14fe836
refactor function names for better test
chobits Mar 29, 2024
6cba090
chores: do not check for r.destroy before using it
chobits Mar 29, 2024
07a2f75
1
chobits Mar 29, 2024
59fa5f3
move library path to kong/dns
chobits Mar 31, 2024
81746b8
mark it TODO to convert ipc to a module contant
chobits Apr 1, 2024
bb436a4
use do-end block to wrap init_hosts and insert_answer_into_cache
chobits Apr 1, 2024
45d6d86
add comments and test cases for API utils.ipv6_bracket
chobits Apr 1, 2024
46f68b5
remove unused kong.tools.utils requirement in test cases
chobits Apr 1, 2024
d6847bc
add comments for cwid checking and hosts
chobits Apr 1, 2024
011255b
chores: fix some coding styles
chobits Apr 1, 2024
4c224b4
Update kong/dns/README.md: remove `the` word
chobits Apr 2, 2024
7793247
remove use of readonly function for cached DNS records
chobits Apr 2, 2024
4eaace2
fix coding style: localize SWRR logic
chobits Apr 2, 2024
4f31f16
re-insert hosts entries to cache if it is evicted
chobits Apr 2, 2024
75f45c8
chores: remove = aligning
chobits Apr 7, 2024
47a895d
remove empty table creation in hot code paths
chobits Apr 7, 2024
2cb279c
fix lint error: resolve_names -> resolved_names
chobits Apr 7, 2024
4727cf7
chores: fix a couple of missing localizations
chobits Apr 7, 2024
7c5fd97
fix opts initialization in _M.init()
chobits Apr 8, 2024
444f47b
remove local variable options for r:query
chobits Apr 8, 2024
96f9329
avoid checking for `ngx.ctx.has_timing` in recursion
chobits Apr 8, 2024
afdfcb6
use `legacy_dns_client` switch to check if we need to reply 501 in /s…
chobits Apr 8, 2024
b1fa80a
added debug log for EE test cases
chobits Apr 8, 2024
8ebc538
chores: fixed lines exceeding 80 characters by a large margin
chobits Apr 8, 2024
95fc78e
compatible with the modified req dyc debug API
chobits May 30, 2024
22cd510
remove the logic of CNAME and recursive detection
chobits May 29, 2024
03aefed
remove LAST type logic
chobits Jun 4, 2024
6958eb2
only use error_ttl, remove empty_ttl logic
chobits Jun 17, 2024
f9911ea
fix type in readme.md
chobits Jun 17, 2024
34783db
change paths of test cases directory
chobits Jun 18, 2024
a0786a9
set legacy_dns_client off for some cases
chobits Jun 18, 2024
12c63fb
update changelog yml
chobits Jun 18, 2024
a50282d
disable additional section & add tests
chobits Jun 25, 2024
91c896c
further simplify code: either query A/AAAA or SRV
chobits Jun 26, 2024
3504b26
revert pathes modification for conflicts
chobits Jun 26, 2024
602ff4f
fix health check tests for SRV
chobits Jun 26, 2024
654c776
fix /status/dns test cases
chobits Jun 28, 2024
d0d196c
chores(dns): fixed coding style
chobits Jul 1, 2024
350f2bd
chores(dns): fixed coding style: MT -> _MT
chobits Jul 1, 2024
91b1f2d
chores(dns): fixed coding style: remove () from srv port
chobits Jul 2, 2024
b6955f0
chores(dns): fix coding style
chobits Jul 5, 2024
67813e2
chores(test): fix typo, return `ttl` instead of `tries`
chobits Jul 5, 2024
dd26cf2
fix conflicts: remove modification in test: 01-instrumentations_spec.lua
chobits Jul 7, 2024
b69ec7f
fix conflicts and its tests
chobits Jul 7, 2024
97705b9
chores(dns/README.md): fixed types
chobits Jul 8, 2024
28770c0
perf(dns): reduce table creation
chobits Jul 11, 2024
593f4ed
fixed coding styles: add more blanks and rename some variables
chobits Jul 12, 2024
b0d5455
add option:random_resolver and fixed docs
chobits Jul 12, 2024
7ce9599
change seperator from `:` to `|` in the output of API /status/dns
chobits Jul 12, 2024
37cf30d
add a TODO for more structured `tries`
chobits Jul 12, 2024
48994bc
doc: perf test for memory consumption
chobits Jul 12, 2024
1f0bc17
stale_ttl: fix expired time caculation
chobits Jul 12, 2024
9 changes: 9 additions & 0 deletions changelog/unreleased/kong/refactor_dns_client.yml
@@ -0,0 +1,9 @@
message: >
  Starting from this version, a new DNS client library has been implemented and added to Kong. The new DNS client library has the following changes:
  - Introduced global caching for DNS records across workers, significantly reducing the query load on DNS servers.
  - Introduced observable statistics for the new DNS client, and a new Admin API `/status/dns` to retrieve them.
  - Deprecated the `dns_no_sync` option. Multiple DNS queries for the same name are always synchronized (even across workers). This option remains functional with the legacy DNS client library.
  - Deprecated the `dns_not_found_ttl` option. The `dns_error_ttl` option is used for all error responses instead. This option remains functional with the legacy DNS client library.
  - Deprecated the `dns_order` option. By default, SRV, A, and AAAA are supported. Only names in the SRV format (`_service._proto.name`) enable resolving of DNS SRV records.
type: feature
scope: Core
4 changes: 4 additions & 0 deletions kong-3.8.0-0.rockspec
@@ -115,6 +115,10 @@ build = {

["kong.resty.dns.client"] = "kong/resty/dns/client.lua",
["kong.resty.dns.utils"] = "kong/resty/dns/utils.lua",

["kong.dns.client"] = "kong/dns/client.lua",
["kong.dns.utils"] = "kong/dns/utils.lua",

["kong.resty.ctx"] = "kong/resty/ctx.lua",

["kong.resty.mlcache"] = "kong/resty/mlcache/init.lua",
17 changes: 16 additions & 1 deletion kong/api/routes/kong.lua
@@ -269,5 +269,20 @@ return {
       }
       return kong.response.exit(200, body)
     end
-  }
+  },
+  ["/status/dns"] = {
+    GET = function (self, db, helpers)
+      if kong.configuration.legacy_dns_client then
+        return kong.response.exit(501, { message = "not implemented with the legacy DNS client" })
+      end
+
+      return kong.response.exit(200, {
+        worker = {
+          id = ngx.worker.id() or -1,
+          count = ngx.worker.count(),
+        },
+        stats = kong.dns.stats(),
+      })
+    end
+  },
 }
1 change: 1 addition & 0 deletions kong/conf_loader/constants.lua
@@ -370,6 +370,7 @@ local CONF_PARSERS = {
dns_not_found_ttl = { typ = "number" },
dns_error_ttl = { typ = "number" },
dns_no_sync = { typ = "boolean" },
legacy_dns_client = { typ = "boolean" },
privileged_worker = {
typ = "boolean",
deprecated = {
174 changes: 174 additions & 0 deletions kong/dns/README.md
@@ -0,0 +1,174 @@
Name
====

Kong DNS client: a module, currently used only by Kong, that builds on top of the `lua-resty-dns` and `lua-resty-mlcache` libraries.

Table of Contents
=================

* [Name](#name)
* [APIs](#apis)
  * [new](#new)
  * [resolve](#resolve)
  * [resolve_address](#resolve_address)
* [Performance characteristics](#performance-characteristics)
  * [Memory](#memory)

# APIs

The following APIs are for internal development use only within Kong. In the current version, the new DNS library still needs to be compatible with the original DNS library, so the functions listed below cannot be invoked directly. For example, the `_M:resolve` function documented below is replaced so that it conforms to the previous DNS library's API interface, `_M.resolve`.

## new

**syntax:** *c, err = dns_client.new(opts)*
**context:** any

**Functionality:**

Creates a DNS client object. Returns `nil` and an error message string on error.

Performs a series of initialization operations:

* parse the `hosts` file,
* parse the `resolv.conf` file (used by the underlying `lua-resty-dns` library),
* initialize multiple TTL options,
* create an mlcache object and initialize it.

**Input parameters:**

`@opts`: an options table. The following options are supported:

* TTL options:
  * `valid_ttl`: (default: `nil`)
    * By default, answers are cached using the TTL value from the response. This optional parameter (in seconds) overrides it.
  * `stale_ttl`: (default: `3600`)
    * the time in seconds to keep expired DNS records.
    * Stale data remains in use from when a record expires until either the background refresh query completes or `stale_ttl` seconds have passed. This keeps Kong resilient if the DNS server is temporarily unavailable.
  * `error_ttl`: (default: `1`)
    * the time in seconds to cache DNS error responses.
* `hosts`: (default: `/etc/hosts`)
  * the path of the `hosts` file.
* `resolv_conf`: (default: `/etc/resolv.conf`)
  * the path of the `resolv.conf` file; it is parsed and passed to the underlying `lua-resty-dns` library.
* `family`: (default: `{ "SRV", "A", "AAAA" }`)
  * the types of DNS records that the library should query; taken from the `kong.conf` option `dns_family`.
* options for the underlying `lua-resty-dns` library:
  * `retrans`: (default: `5`)
    * the total number of times the DNS request is retransmitted when waiting for a response times out (see `timeout`). Each retransmission is sent to the next nameserver in round-robin order.
    * If not given, it is taken from the `resolv.conf` option `options attempts:<value>`.
  * `timeout`: (default: `2000`)
    * the time in milliseconds to wait for the response to a single transmission attempt.
    * If not given, it is taken from the `resolv.conf` option `options timeout:<value>`. Note that `resolv.conf` expresses this value in seconds.
  * `random_resolver`: (default: `false`)
    * a boolean flag that controls whether the first nameserver to query is picked at random. If `true`, each query starts with a randomly chosen nameserver.
    * If not given, it is taken from the `resolv.conf` option `rotate`.
  * `nameservers`:
    * a list of nameservers to use. Each entry can be either a hostname string or a table holding both the hostname string and the port number, for example `{"8.8.8.8", {"8.8.4.4", 53} }`.
    * If not given, it is taken from the `resolv.conf` option `nameserver`.
* `cache_purge`: (default: `false`)
  * a boolean flag that controls whether to purge the internal cache, which is shared with other DNS client instances across workers.
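The fallback order above (explicit option, then the `resolv.conf` value, then the documented default) can be summarized in a short, self-contained sketch. It is not Kong's code: `effective_options`, `normalize_nameserver`, `resolv_timeout_ms` and the shape of the `resolv` table (already-parsed `attempts`, `timeout`, `rotate` and `nameservers` values) are hypothetical, port 53 is assumed when a nameserver entry omits the port, and clamping `valid_ttl` to `0xffffffff` follows a commit title in this PR rather than the README. Treating a non-positive `resolv.conf` timeout as unset is only a guess at how the `timeout=0` case from the commit list is handled.

```lua
-- Hypothetical sketch of the option fallbacks documented above; the
-- function names and the shape of `resolv` (already-parsed resolv.conf
-- values) are illustrative assumptions, not Kong's implementation.

local MAX_TTL = 0xffffffff  -- largest TTL a DNS record can carry (32 bits)

-- "8.8.8.8" or { "8.8.4.4", 53 } -> { host, port }; port 53 is assumed
-- when the entry does not specify one.
local function normalize_nameserver(entry)
  if type(entry) == "string" then
    return { entry, 53 }
  end
  return { entry[1], entry[2] or 53 }
end

-- resolv.conf expresses "timeout" in seconds; a non-positive value is
-- treated as unset here (a guess, not documented behaviour).
local function resolv_timeout_ms(resolv)
  if resolv.timeout and resolv.timeout > 0 then
    return resolv.timeout * 1000
  end
  return nil
end

local function effective_options(opts, resolv)
  opts = opts or {}
  resolv = resolv or {}

  local out = {
    -- an explicit valid_ttl overrides the response TTL, clamped to MAX_TTL
    valid_ttl = opts.valid_ttl and math.min(opts.valid_ttl, MAX_TTL),
    stale_ttl = opts.stale_ttl or 3600,
    error_ttl = opts.error_ttl or 1,
    -- explicit option, then resolv.conf value, then the documented default
    retrans = opts.retrans or resolv.attempts or 5,
    timeout = opts.timeout or resolv_timeout_ms(resolv) or 2000,
    random_resolver = opts.random_resolver,
    nameservers = {},
  }

  -- `false` is a valid explicit setting, so only fall back on nil
  if out.random_resolver == nil then
    out.random_resolver = resolv.rotate or false
  end

  for i, ns in ipairs(opts.nameservers or resolv.nameservers or {}) do
    out.nameservers[i] = normalize_nameserver(ns)
  end

  return out
end

-- Usage: nameservers from opts, the other values from resolv.conf.
local o = effective_options(
  { nameservers = { "8.8.8.8", { "8.8.4.4", 5353 } } },
  { attempts = 2, timeout = 3, rotate = true })
print(o.retrans, o.timeout, o.random_resolver)  -- 2, 3000 (ms) and true
```

Only a `nil` `random_resolver` falls back to `rotate`, so an explicit `false` in `opts` is respected.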

[Back to TOC](#table-of-contents)

## resolve

**syntax:** *answers, err, tries? = resolve(qname, qtype, cache_only, tries?)*
**context:** *rewrite_by_lua\*, access_by_lua\*, content_by_lua\*, ngx.timer.\**

**Functionality:**

Performs a DNS resolution.

1. Check whether `<qname>` matches the SRV format (`\_service.\_proto.name`) to determine `<qtype>` (SRV or A/AAAA), then query mlcache with the key `<qname>:<qtype>`. If a cached result is found, return it directly.
2. If no result is cached, `mlcache:get` invokes its L3 callback to query the records from the DNS servers, as follows:
   1. Check whether `<qname>` has an IP address in the `hosts` file; if so, return it.
   2. Check whether `<qname>` is itself an IP address; if so, return it.
   3. Use `mlcache:peek` to check whether the expired key still exists in the shared dictionary. If it does, return it to mlcache directly and start an asynchronous background task (`start_stale_update_task`) to refresh it. Expired data can be reused for at most `stale_ttl` seconds, but the TTL returned to mlcache never exceeds 60s. So if the background task has not refreshed the key after 60s, the next call to `resolve` from the upper layer triggers the L3 callback again, which re-runs this step and starts another background refresh.
      1. For example, with a `stale_ttl` of 3600s, if network issues keep the background task from refreshing the record while the application keeps calling `resolve` for that name, a background query is triggered about every 60s, i.e. roughly 60 background tasks in total (3600s / 60s).
   4. Query the DNS server with the `<qname>:<qtype>` combinations:
      1. `<qname>` is expanded according to the `resolv.conf` settings such as `ndots`, `search`, and `domain`.
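Two decisions in the flow above, choosing the query type from the name (step 1) and capping the TTL handed back to mlcache while a stale record is reused (step 2.3), can be illustrated in plain Lua. This is a sketch under assumptions, not the library's code: the function names are hypothetical, the SRV check is a simplified Lua pattern, and the cache key uses numeric DNS type codes (1 = A, 28 = AAAA, 33 = SRV), whereas the README only specifies `<qname>:<qtype>`. `count_refresh_attempts` reproduces the 3600s / 60s arithmetic of the example by assuming every background refresh fails and callers keep resolving the name.

```lua
-- Hypothetical sketch of two steps of the resolve flow; function names,
-- the Lua pattern and the numeric type codes in the key are assumptions.

local TYPE_A, TYPE_AAAA, TYPE_SRV = 1, 28, 33  -- standard DNS type codes
local MAX_STALE_REUSE_TTL = 60                 -- the 60s cap from step 2.3

-- Step 1: names shaped like "_service._proto.name" are looked up as SRV,
-- everything else as A/AAAA.
local function choose_qtypes(qname)
  if qname:match("^_[^.]+%._[^.]+%..+") then
    return { TYPE_SRV }
  end
  return { TYPE_A, TYPE_AAAA }
end

-- Cache key "<qname>:<qtype>" (numeric qtype assumed).
local function cache_key(qname, qtype)
  return qname .. ":" .. qtype
end

-- Step 2.3: TTL handed back to mlcache when an expired record is reused.
-- expired_at and now are in seconds; returns nil once the stale window
-- (stale_ttl seconds after expiry) is over.
local function stale_reuse_ttl(expired_at, now, stale_ttl)
  local remaining = expired_at + stale_ttl - now
  if remaining <= 0 then
    return nil
  end
  return math.min(remaining, MAX_STALE_REUSE_TTL)
end

-- Worked example: every background refresh fails and callers keep
-- resolving, so each reuse window ends with one more refresh attempt.
local function count_refresh_attempts(stale_ttl)
  local attempts, now = 0, 0  -- the record expired at t = 0
  while true do
    local ttl = stale_reuse_ttl(0, now, stale_ttl)
    if not ttl then
      break
    end
    attempts = attempts + 1  -- L3 callback runs and starts a refresh task
    now = now + ttl          -- the stale copy is served until this TTL ends
  end
  return attempts
end

print(cache_key("_http._tcp.example.test", choose_qtypes("_http._tcp.example.test")[1]))
--> _http._tcp.example.test:33
print(count_refresh_attempts(3600))  --> 60
```

Capping the reuse TTL at 60s, rather than handing mlcache the whole remaining stale window, is what turns a failing refresh into a periodic retry instead of a single attempt.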

**Return value:**

* Return value `answers, err`:
  * Returns an array-like Lua table containing all the records.
    * For example, `{{"address":"[2001:db8:3333:4444:5555:6666:7777:8888]","class":1,"name":"example.test","ttl":30,"type":28},{"address":"192.168.1.1","class":1,"name":"example.test","ttl":30,"type":1},"expire":1720765379,"ttl":30}`.
    * IPv6 addresses are enclosed in brackets (`[]`).
  * If the server returns a non-zero error code, it returns `nil` and a string describing the error.
    * For example, `nil, "dns server error: name error"` means the server returned error code 3 (NXDOMAIN).
  * In case of severe errors, such as a network error or a malformed DNS response from the server, it returns `nil` and a string describing the error. For example:
    * `nil, "dns server error: failed to send request to UDP server 10.0.0.1:53: timeout"` indicates a network issue.
* Return value and input parameter `@tries?`:
  * If provided as an empty table, it is returned as the third result. The table is an array containing the error message of each failed try (if any).
    * For example, `[["example.test:A","dns server error: 3 name error"], ["example.test:AAAA","dns server error: 3 name error"]]`: both attempts failed with DNS server error code 3 (NXDOMAIN), i.e. a name error.

**Input parameters:**

* `@qname`: the domain name to resolve.
* `@qtype`: (optional: `nil` or DNS TYPE value)
  * specifies the query type to use instead of the `self.order` types.
* `@cache_only`: (optional: `boolean`)
  * controls whether to retrieve data only from the internal cache, without querying the nameserver.
* `@tries?`: see the section "Return value and input parameter `@tries?`" above.
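To make the shape of `tries` concrete, the sketch below renders the example table from the return value description into a single log-friendly line. `format_tries` is a hypothetical helper, not part of the library, and the `; ` separator is an arbitrary choice; the PR's commit history mentions a `__tostring` for this table, whose exact output format is not documented here.

```lua
-- Hypothetical helper (not part of the library): render the documented
-- `tries` shape, an array of { "<qname>:<qtype>", "<error>" } pairs.
local function format_tries(tries)
  local parts = {}
  for i, try in ipairs(tries) do
    parts[i] = try[1] .. ": " .. try[2]
  end
  return table.concat(parts, "; ")
end

-- The example table from the return value description above.
local tries = {
  { "example.test:A", "dns server error: 3 name error" },
  { "example.test:AAAA", "dns server error: 3 name error" },
}

print(format_tries(tries))
--> example.test:A: dns server error: 3 name error; example.test:AAAA: dns server error: 3 name error
```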

[Back to TOC](#table-of-contents)

## resolve_address

**syntax:** *ip, port_or_err, tries? = resolve_address(name, port, cache_only, tries?)*
**context:** *rewrite_by_lua\*, access_by_lua\*, content_by_lua\*, ngx.timer.\**

**Functionality:**

Performs a DNS resolution and returns a single selected address (IP and port number).

When called repeatedly for cached records, it load-balances across them using a round-robin (RR) scheme. For SRV records, it uses a _weighted_ round-robin (WRR) scheme based on the records' weights, which randomizes the selection order. The round-robin schemes are applied on each level individually.
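The README does not spell out the weighted round-robin algorithm. A commit in this PR mentions "SWRR", so the sketch below assumes the smooth weighted round-robin variant known from nginx: on every pick each target gains its weight, the target with the highest running score wins, and the winner pays back the sum of all weights. `new_swrr` and `next_peer` are hypothetical names, and the sketch is deterministic, whereas the text above says the real selection is randomized.

```lua
-- Smooth weighted round-robin (SWRR), nginx-style. Assumed flavour of WRR;
-- `new_swrr` and `next_peer` are hypothetical, not the library's API.
local function new_swrr(peers)
  -- peers: array of { target = <string>, weight = <number> }
  local state = { peers = peers, current = {}, total = 0 }
  for i, p in ipairs(peers) do
    state.current[i] = 0
    state.total = state.total + p.weight
  end
  return state
end

local function next_peer(state)
  local best
  for i, p in ipairs(state.peers) do
    -- every peer gains its weight on each pick
    state.current[i] = state.current[i] + p.weight
    -- strict ">" keeps the earliest peer on ties
    if not best or state.current[i] > state.current[best] then
      best = i
    end
  end
  -- the winner pays back the total weight, which spreads picks out
  state.current[best] = state.current[best] - state.total
  return state.peers[best].target
end

-- Usage: weights 5, 1, 1 over one full cycle of 7 picks.
local rr = new_swrr({
  { target = "a", weight = 5 },
  { target = "b", weight = 1 },
  { target = "c", weight = 1 },
})
local seq = {}
for i = 1, 7 do
  seq[i] = next_peer(rr)
end
print(table.concat(seq))  --> aabacaa
```

The light targets are interleaved with the heavy one (`aabacaa`) instead of appearing in a burst such as `aaaaabc`, which is the point of the smooth variant. In this sketch a weight-0 target is never picked unless all weights are 0; how the real client treats zero-weight SRV records is not covered here.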

**Return value:**

* Return value `ip, port_or_err`:
  * Returns one IP address and port number from the records.
  * Returns `nil, err` if an error occurs, with `err` containing an error message.
* Return value and input parameter `@tries?`: same as `@tries?` of the `resolve` API.

**Input parameters:**

* `@name`: the domain name to resolve.
* `@port`: (optional: `nil` or port number)
  * the default port number to return if none was found in the lookup chain (only SRV records carry port information; SRV records with `port=0` are ignored).
* `@cache_only`: (optional: `boolean`)
  * controls whether to retrieve data only from the internal cache, without querying the nameserver.
* `@tries?`: same as `@tries?` of the `resolve` API.

[Back to TOC](#table-of-contents)

# Performance characteristics

## Memory

We evaluated the capacity of DNS records using the following resources:

* Shared memory size:
  * 5 MB (by default): `lua_shared_dict kong_dns_cache 5m`.
  * 10 MB: `lua_shared_dict kong_dns_cache 10m`.
* DNS response:
  * Each DNS resolution response contains some number of A type records.
  * Record: ~80-byte JSON string, e.g., `{address = "127.0.0.1", name = <domain>, ttl = 3600, class = 1, type = 1}`.
  * Domain: ~36-byte string, e.g., `example<n>.long.long.long.long.test`. Domain names with lengths between 10 and 36 bytes yield similar results.

The results are as follows:
Contributor (review comment): What is )? A mistake?

Contributor (review comment): Try to fix it in #13389


| Shared memory size | Records per response | Loaded responses |
|--------------------|----------------------|------------------|
| 5 MB               | 1                    | 20224            |
| 5 MB               | 2 ~ 3                | 10081            |
| 5 MB               | 4 ~ 9                | 5041             |
| 5 MB               | 10 ~ 20              | 5041             |
| 5 MB               | 21 ~ 32              | 1261             |
| 10 MB              | 1                    | 40704            |
| 10 MB              | 2 ~ 3                | 20321            |
| 10 MB              | 4 ~ 9                | 10161            |
| 10 MB              | 10 ~ 20              | 5081             |
| 10 MB              | 21 ~ 32              | 2541             |
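Dividing the shared-memory size by the number of loaded responses gives a rough per-response footprint for some of the measured rows. This is back-of-the-envelope arithmetic on the table's own numbers, not an additional measurement; `bytes_per_response` is a hypothetical helper.

```lua
-- Back-of-the-envelope arithmetic on rows from the table above.
-- `bytes_per_response` is a hypothetical helper, not part of Kong.
local MB = 1024 * 1024

local function bytes_per_response(shm_bytes, loaded_responses)
  return shm_bytes / loaded_responses
end

-- { shm size in MB, records per response, loaded responses }
local rows = {
  { 5,  "1",     20224 },
  { 5,  "2 ~ 3", 10081 },
  { 10, "1",     40704 },
  { 10, "2 ~ 3", 20321 },
}

for _, row in ipairs(rows) do
  local size_mb, records, loaded = row[1], row[2], row[3]
  print(string.format("%2d MB, %-5s records/response: ~%.0f bytes each",
    size_mb, records, bytes_per_response(size_mb * MB, loaded)))
end
```

For single-record responses this comes to roughly 260 bytes, well above the ~80-byte record plus ~36-byte name, so most of the footprint is presumably cache and shared-dictionary overhead. Doubling the shared memory roughly doubles the number of loaded responses. The table alone does not show how the overhead is split.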


[Back to TOC](#table-of-contents)