
introduce a balancer interface #2543

Merged
merged 4 commits into kubernetes:master from lua-balancer-refactoring on May 28, 2018

Conversation

@ElvinEfendi (Member) commented May 18, 2018

What this PR does / why we need it:
This PR refactors the balancer logic to be more modular, extensible and testable. It also fixes a bug with returning 503, by using a rewrite_by_lua handler. balancer.lua now has four public methods, namely init_worker(), rewrite(), balance() and log(), which get called in their respective Nginx phases.

init_worker() kicks in when an Nginx worker starts. It calls the private sync_backends() function, which takes care of initializing, updating and deleting the respective balancer instances for the backends. sync_backends() also gets called periodically, every second, to keep the backend balancers in sync for the given Nginx worker.

rewrite() takes care of returning 503 when there's no balancer for the backend the current request corresponds to.

balance() gets the balancer instance for the given request/backend and calls its :balance() function, which executes the respective load balancing algorithm and returns a host, port pair. balance() then takes care of calling the Nginx API function to set the upstream peer.

log() gets the balancer instance for the given request/backend and calls its :after_balance(). Currently only the EWMA algorithm implements after_balance(). This is useful when the LB algorithm needs to store some stats, e.g. response times or response status codes.
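A minimal sketch of how those four public methods might hang together. This is an illustration based on the description above, not the PR's literal code; helper names like sync_backends/get_balancer and the use of ngx.timer.every are simplifying assumptions.

```lua
local ngx_balancer = require("ngx.balancer")

local _M = {}
local balancers = {} -- backend name -> balancer instance

local function sync_backends()
  -- create, update or delete balancer instances to match the configured backends
end

local function get_balancer()
  return balancers[ngx.var.proxy_upstream_name]
end

function _M.init_worker()
  sync_backends() -- initial sync when the worker starts
  local ok, err = ngx.timer.every(1, sync_backends) -- resync every second
  if not ok then
    ngx.log(ngx.ERR, "error setting up backend sync timer: " .. tostring(err))
  end
end

function _M.rewrite()
  -- 503 when no balancer exists for this request's backend
  if not get_balancer() then
    ngx.status = ngx.HTTP_SERVICE_UNAVAILABLE
    return ngx.exit(ngx.status)
  end
end

function _M.balance()
  local balancer = get_balancer()
  if not balancer then
    return
  end
  local host, port = balancer:balance()
  local ok, err = ngx_balancer.set_current_peer(host, port)
  if not ok then
    ngx.log(ngx.ERR, "error while setting current upstream peer: " .. tostring(err))
  end
end

function _M.log()
  local balancer = get_balancer()
  if not balancer then
    return
  end
  if balancer.after_balance then
    balancer:after_balance()
  end
end

return _M
```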

--

With this PR, if someone wants to introduce a new load balancing algorithm they can create the respective Lua module in the balancer/ folder and register it in balancer.lua. Every new load balancing module has to implement at least new(backend), sync(backend) and balance(). Optionally it can implement after_balance() as well; a minimal sketch of such a module is shown below.
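A minimal sketch of such a module, following the interface described above. The module name, the fields kept on the instance and the algorithm body are hypothetical, and the registration step in balancer.lua is not shown.

```lua
-- balancer/my_algorithm.lua (hypothetical module and field names)
local _M = { name = "my_algorithm" } -- .name is optional, mainly useful for logging

function _M.new(self, backend)
  local instance = {
    name = self.name,
    peers = backend.endpoints, -- whatever state the algorithm needs
  }
  setmetatable(instance, self)
  self.__index = self
  return instance
end

function _M.sync(self, backend)
  -- refresh internal state when the backend changes
  self.peers = backend.endpoints
end

function _M.balance(self)
  -- run the load balancing algorithm and return a host, port pair
  local peer = self.peers[math.random(#self.peers)]
  return peer.address, peer.port
end

-- optional hook, invoked from the log phase
function _M.after_balance(self)
  -- record stats such as response time or status code if the algorithm needs them
end

return _M
```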

--

Once this PR gets merged I'll add more unit tests, which will most likely make some of the related e2e tests redundant enough that we can just delete them.

Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): fixes #

Special notes for your reviewer:

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels May 18, 2018
@ElvinEfendi ElvinEfendi changed the title [ci-skip] [WIP] introduce a balancer interface [WIP] introduce a balancer interface May 18, 2018
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 18, 2018
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels May 26, 2018
@ElvinEfendi ElvinEfendi force-pushed the lua-balancer-refactoring branch 2 times, most recently from c711cfa to 422ef31 on May 26, 2018 00:11
@codecov-io

Codecov Report

Merging #2543 into master will increase coverage by 0.13%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2543      +/-   ##
==========================================
+ Coverage   40.63%   40.77%   +0.13%     
==========================================
  Files          74       74              
  Lines        5094     5094              
==========================================
+ Hits         2070     2077       +7     
+ Misses       2743     2737       -6     
+ Partials      281      280       -1
Impacted Files Coverage Δ
cmd/nginx/main.go 29% <0%> (+5.34%) ⬆️

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a5958c0...5d12848. Read the comment docs.

@ElvinEfendi ElvinEfendi changed the title [WIP] introduce a balancer interface introduce a balancer interface May 26, 2018
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 26, 2018
@ElvinEfendi (Member Author)

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 26, 2018
@ElvinEfendi (Member Author)

/assign @aledbf

@aledbf (Member) commented May 26, 2018

@ElvinEfendi this looks really good. Just one question about this, can you add information about the ingress/service in the log messages? Not necessarily in this PR but I think we need more information than just IP/Port in case of failures. WDYT?

@ElvinEfendi (Member Author)

@andrewlouis93 @csfrancis @ibawt would be great if some of you can take a look and give feedback as well.

@ElvinEfendi (Member Author) commented May 26, 2018

Just one question about this, can you add information about the ingress/service in the log messages? Not necessarily in this PR but I think we need more information than just IP/Port in case of failures. WDYT?

@aledbf $proxy_upstream_name is already part of the Nginx log message by default, so we can tell the ingress, service and namespace by looking at proxy_upstream_name's value in the Nginx log for the request. Other errors, for example those happening in sync_backends, which is not in the request path, are not about a particular ingress/service. If I'm missing something please let me know.

Later on I'd like to make balancer.name part of the log messages in the request path though. Or maybe introduce a new Nginx variable, set it to balancer.name and make it part of the Nginx access log.
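A rough sketch of that second option, assuming a variable such as $balancer_name were declared in the generated Nginx config; both the variable name and the placement are hypothetical.

```lua
-- in a request-path phase where variable writes are allowed (e.g. rewrite),
-- and assuming `set $balancer_name '';` exists in the generated nginx.conf
local balancer = get_balancer()
if balancer and balancer.name then
  ngx.var.balancer_name = balancer.name
end
```

log_format could then include $balancer_name next to $proxy_upstream_name.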

@aledbf (Member) commented May 28, 2018

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 28, 2018
@andrewloux (Contributor)

/assign @andrewlouis93

@k8s-ci-robot (Contributor)

@andrewlouis93: GitHub didn't allow me to assign the following users: andrewlouis93.

Note that only kubernetes members and repo collaborators can be assigned.

In response to this:

/assign @andrewlouis93

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

return backend
end
local function get_implementation(backend)
local name = backend["load-balance"] or DEFAULT_LB_ALG
@andrewloux (Contributor) commented May 28, 2018

Maybe out of the scope of this PR, but could we have the DEFAULT_LB_ALG set per ingress?

If a user wanted to use a default lb algorithm, it would currently require each app behind the ingress to update its load-balance attribute. WDYT?

@ElvinEfendi (Member Author) commented May 28, 2018

@andrewlouis93 load-balance is also a configmap property, so you can have a default value for it per ingress deployment and then customize further per app using the annotation.

@andrewloux (Contributor)

Right, but currently this lua code is not reading the load-balance configmap property is it?

@ElvinEfendi (Member Author)

backend["load-balance"] is not necessary only load-balance annotation. It is whatever is set to the Backend struct in controller side. So yeah Lua will get custom load-balance algorithm when you set it in configmap.

@andrewloux (Contributor)

Gotcha, makes sense :)

if not backend then
return nil
if backend["sessionAffinityConfig"] and backend["sessionAffinityConfig"]["name"] == "cookie" then
name = "sticky"
Contributor

If a backend's load-balance algorithm is set and is overridden here because session affinity is required, should we emit a log message indicating that the load-balance implementation is being ignored?

@ElvinEfendi (Member Author) commented May 28, 2018

This is the default behaviour in non-dynamic mode as well. We can maybe document better how this annotation overrides the load-balance annotation and config, but I don't see how logging here would add any value.

That being said we can do INFO logging if you think it is valuable. But I'd rather see this becoming a necessity in practice first and then add the logging.

Contributor

That being said we can do INFO logging if you think it is valuable. But I'd rather see this becoming a necessity in practice first and then add the logging.

💯
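If that INFO logging were ever added, a sketch inside get_implementation could look like the following. The message wording and the DEFAULT_LB_ALG value are illustrative, and the mapping from the resolved name to a balancer module is omitted.

```lua
local DEFAULT_LB_ALG = "round_robin" -- assumed default for this sketch

local function get_implementation(backend)
  local name = backend["load-balance"] or DEFAULT_LB_ALG

  if backend["sessionAffinityConfig"] and backend["sessionAffinityConfig"]["name"] == "cookie" then
    if backend["load-balance"] then
      -- the optional INFO log discussed above; make the override visible
      ngx.log(ngx.INFO, "session affinity is enabled, ignoring load-balance setting: "
        .. backend["load-balance"])
    end
    name = "sticky"
  end

  -- resolving `name` to the corresponding balancer module is omitted in this sketch
  return name
end
```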

@ibawt (Contributor) commented May 28, 2018

code LGTM @ElvinEfendi, did you have any specific worries you'd like me to go over in depth?

function _M.balance()
local balancer = get_balancer()
if not balancer then
return
@andrewloux (Contributor)

Do we want to log something if not balancer here?

@ElvinEfendi (Member Author)

@andrewlouis93 https://github.com/kubernetes/ingress-nginx/pull/2543/files#diff-b00d77a6df9c8c05a483044b08e6bc50R103 short-circuits when the balancer does not exist, so this code does not even get executed and we get a 503 response code.

@andrewloux (Contributor) commented May 28, 2018

Right, I thought about that - but consider the case where a sync_backends that results in balancers = {} occurs between a request's rewrite and balance phases?
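A sketch of the defensive check being suggested for that race; illustrative only, not necessarily what the PR merged.

```lua
function _M.balance()
  local balancer = get_balancer()
  if not balancer then
    -- balancers may have been resynced between the rewrite and balance phases,
    -- so the instance found during rewrite() can be gone by now
    ngx.log(ngx.ERR, "no balancer found for " .. tostring(ngx.var.proxy_upstream_name))
    return
  end
  -- ... proceed with balancer:balance() and ngx_balancer.set_current_peer(...)
end
```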

end
end

function _M.log()
local balancer = get_balancer()
if not balancer then
Contributor

Same question here

@ElvinEfendi (Member Author)

@ibawt thanks for looking into it, reviewing the general design approach is enough for now!

@csfrancis csfrancis left a comment

I've only really looked at balancer.lua, but I mostly approve of the interface. I question if rewrite is necessary, but everything else looks good.

return
end

balancer:after_balance()

@csfrancis

I think this should be checking if the function exists, since it can be optional.
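A sketch of the suggested guard inside log(), assuming the balancer instance has already been resolved:

```lua
-- only call the hook if the implementation provides it, since it is optional
if balancer.after_balance then
  balancer:after_balance()
end
```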

end

local host, port = balance()
local host, port = balancer:balance()
if not host then

@csfrancis

Should you be checking that port is set as well?

if not host then
ngx.status = ngx.HTTP_SERVICE_UNAVAILABLE
return ngx.exit(ngx.status)
ngx.log(ngx.WARN, "no host returned, balancer: " .. balancer.name)

@csfrancis

I'm curious why you got rid of the 503 here? To me it seems like it would be better to be more explicit about this error case.

@ElvinEfendi (Member Author)

as mentioned at https://github.com/kubernetes/ingress-nginx/pull/2543/files#r191262645 it is not possible to modify status at this phase.
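A sketch combining the two points above (also checking port, and only logging since ngx.status cannot be modified in the balancer phase); illustrative, not the PR's exact code.

```lua
local host, port = balancer:balance()
if not host or not port then
  -- ngx.status cannot be modified in the balancer phase, so the most we can do
  -- here is log; Nginx will fail the request on its own when no peer is set
  ngx.log(ngx.WARN, "no peer returned, balancer: " .. tostring(balancer.name))
  return
end
```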

end

ngx_balancer.set_more_tries(1)

local ok, err = ngx_balancer.set_current_peer(host, port)
if not ok then
ngx.log(ngx.ERR, string.format("error while setting current upstream peer to %s", tostring(err)))
ngx.log(ngx.ERR, "error while setting current upstream peer to " .. tostring(err))

@csfrancis

Similar comment here ... should this return a 503?


local balancer = get_balancer()
if not balancer then
ngx.status = ngx.HTTP_SERVICE_UNAVAILABLE
return ngx.exit(ngx.status)

@csfrancis

I'm not super familiar with the ingress-nginx config, but I'm guessing this means that every configured upstream will be configured to use Lua load balancing? Likewise, will rewrite_by_lua be called for every configured site? Why not move this logic into balance instead?

@ElvinEfendi (Member Author)

@csfrancis this was in balance() before, but it turns out modifying ngx.status is not allowed in the balancer phase.

but I'm guessing this means that every configured upstream will be configured to use Lua load balancing

that's correct, it's either all or none

return
end

if getmetatable(balancer) ~= implementation then

@csfrancis

This is weird .. why are you calling getmetatable here?

@ElvinEfendi (Member Author) commented May 28, 2018

@csfrancis because of the way OOP is implemented in Lua. Another option would be to compare .name, but IMO this is more elegant.

If you notice, the new() function in the implementations returns a new Lua table and sets its metatable to self.

So this logic is to detect the situation where an app starts with, let's say, EWMA, then in the next deploy switches to Round Robin. This condition will detect that and replace the EWMA instance with a new RR instance for the backend.

@csfrancis

Weird .. that was not immediately obvious to me. More elegant, but I feel like .name is more clear. But that's just my opinion, man.

Contributor

Maybe a comment would make this clearer?

@ElvinEfendi (Member Author)

@csfrancis I wrote this with .name initially - another reason why I switched to the current version was that if we use .name for this then we have to make that field mandatory for every implementation. Currently .name is more for logging purposes and optional.

I'll extract this into a function with a meaningful name :)

@ElvinEfendi (Member Author)

or maybe just add a comment for now
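A sketch of the pattern being discussed, with the check pulled into a named helper as suggested; the module path, helper name and surrounding usage are hypothetical.

```lua
-- in an implementation module, e.g. balancer/round_robin.lua (hypothetical)
local _M = {}

function _M.new(self, backend)
  local instance = { peers = backend.endpoints }
  setmetatable(instance, self)
  self.__index = self
  return instance
end

-- in balancer.lua: detect that the configured algorithm changed for a backend
local function implementation_changed(balancer, implementation)
  -- new() sets the instance's metatable to the implementation module,
  -- so comparing metatables tells us which algorithm produced the instance
  return getmetatable(balancer) ~= implementation
end

-- inside sync_backends(), roughly:
-- if implementation_changed(balancers[backend.name], implementation) then
--   balancers[backend.name] = implementation:new(backend)
-- end
```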

@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. and removed lgtm "Looks good to me", indicates that a PR is ready to be merged. labels May 28, 2018
@ElvinEfendi (Member Author)

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 28, 2018
@aledbf (Member) commented May 28, 2018

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 28, 2018
@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: aledbf, ElvinEfendi

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit f099d37 into kubernetes:master May 28, 2018
@ElvinEfendi ElvinEfendi deleted the lua-balancer-refactoring branch May 28, 2018 21:13
Labels
approved - Indicates a PR has been approved by an approver from all required OWNERS files.
cncf-cla: yes - Indicates the PR's author has signed the CNCF CLA.
lgtm - "Looks good to me", indicates that a PR is ready to be merged.
size/XL - Denotes a PR that changes 500-999 lines, ignoring generated files.
7 participants