Connecting to Azure Redis Cluster #1074

Closed
hsadoyan opened this issue Feb 17, 2022 · 46 comments · Fixed by #1076

Comments

@hsadoyan

Hey folks,

We have a rails app we're trying to connect to azure redis. I have a test cluster provisioned, and can connect as a single instance, but trying to connect in clustered mode gives an error: Redis Client could not connect to any cluster nodes

Here's a minimal config:

host = 'hostname'
key = 'password'
port = 6380
database = 0

connection_url = "rediss://:#{CGI.escape(key)}@#{host}:#{port}/#{database}"

client = Redis.new(cluster: [connection_url])

The documentation stated that I can pass in one instance like this and it'll discover the rest of the nodes through the CLUSTER NODES command.

If I switch from cluster to url I can connect to a redis node, but of course half the writes fail with a MOVED message.

I was able to find one SO post of somebody having a similar issue in 2019 that was never resolved, but otherwise no documentation that's redis-rb and Azure Redis specific

@supercaracal
Contributor

The above way of connecting to a Redis cluster looks legitimate. I will try to improve the error messages. The failure might be caused by ACL.

@hsadoyan
Author

I'm trying to connect to a Redis6 cluster, but it doesn't support ACL. Is there another way around this?

@hsadoyan
Author

hsadoyan commented Feb 18, 2022

Thanks for the extra logging! Very helpful. The actual error I'm getting is WRONGPASS invalid username-password pair (Redis::Cluster::InitialSetupError)

I'm not actually passing a username, only a password@hostname. Interesting thing is: that configuration worked for non-clustered redis, or if I'm trying to call it as a single instance with url: connection_string. Is the client doing something different when connecting as a cluster? Or is it likely to be some config in Azure redis itself?

@supercaracal
Contributor

supercaracal commented Feb 18, 2022

It might be a bug in the cluster-mode client. I'll look into the differences in implementation and behavior between stand-alone mode and cluster mode.

uri = URI(url)
case uri.scheme
when "unix"
  defaults[:path] = uri.path
when "redis", "rediss"
  defaults[:scheme] = uri.scheme
  defaults[:host] = uri.host if uri.host
  defaults[:port] = uri.port if uri.port
  defaults[:username] = CGI.unescape(uri.user) if uri.user && !uri.user.empty?
  defaults[:password] = CGI.unescape(uri.password) if uri.password && !uri.password.empty?
  defaults[:db] = uri.path[1..-1].to_i if uri.path
  defaults[:role] = :master
else
  raise ArgumentError, "invalid uri scheme '#{uri.scheme}'"
end
defaults[:ssl] = true if uri.scheme == "rediss"

uri = URI(addr)
raise InvalidClientOptionError, "Invalid uri scheme #{addr}" unless VALID_SCHEMES.include?(uri.scheme)
db = uri.path.split('/')[1]&.to_i
{ scheme: uri.scheme, username: uri.user, password: uri.password, host: uri.host, port: uri.port, db: db }
  .reject { |_, v| v.nil? || v == '' }
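
The difference between the two snippets above can be seen without a server. A minimal sketch, assuming only Ruby's standard uri and cgi libraries and the test password used later in this thread: the stand-alone path runs CGI.unescape on the URI password, while the cluster path keeps uri.password as-is, so the percent-encoded text would be what is sent to AUTH.

```ruby
require 'uri'
require 'cgi'

# Test password taken from the reproduction later in this thread.
plain = '!&<123-abc>'
url = "redis://:#{CGI.escape(plain)}@127.0.0.1:7000/0"
uri = URI(url)

# What the cluster option parser keeps: the percent-encoded text.
raw_password = uri.password

# What the stand-alone parser keeps: the decoded text.
decoded_password = CGI.unescape(uri.password)

puts "raw:     #{raw_password}"
puts "decoded: #{decoded_password}"
```

If the two printed values differ, a server configured with the plain password would reject the raw one, which is consistent with the WRONGPASS error reported above.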

def test_option_class
  option = Redis::Cluster::Option.new(cluster: %w[redis://127.0.0.1:7000], replica: true)
  assert_equal({ '127.0.0.1:7000' => { scheme: 'redis', host: '127.0.0.1', port: 7000 } }, option.per_node_key)
  assert_equal true, option.use_replica?
  option = Redis::Cluster::Option.new(cluster: %w[redis://127.0.0.1:7000], replica: false)
  assert_equal({ '127.0.0.1:7000' => { scheme: 'redis', host: '127.0.0.1', port: 7000 } }, option.per_node_key)
  assert_equal false, option.use_replica?
  option = Redis::Cluster::Option.new(cluster: %w[redis://127.0.0.1:7000])
  assert_equal({ '127.0.0.1:7000' => { scheme: 'redis', host: '127.0.0.1', port: 7000 } }, option.per_node_key)
  assert_equal false, option.use_replica?
  option = Redis::Cluster::Option.new(cluster: %w[rediss://johndoe:foobar@127.0.0.1:7000/1/namespace])
  assert_equal({ '127.0.0.1:7000' => { scheme: 'rediss', username: 'johndoe', password: 'foobar', host: '127.0.0.1', port: 7000, db: 1 } }, option.per_node_key)
  option = Redis::Cluster::Option.new(cluster: %w[rediss://127.0.0.1:7000], scheme: 'redis')
  assert_equal({ '127.0.0.1:7000' => { scheme: 'rediss', host: '127.0.0.1', port: 7000 } }, option.per_node_key)
  option = Redis::Cluster::Option.new(cluster: %w[redis://bazzap:@127.0.0.1:7000], username: 'foobar')
  assert_equal({ '127.0.0.1:7000' => { scheme: 'redis', username: 'bazzap', host: '127.0.0.1', port: 7000 } }, option.per_node_key)
  option = Redis::Cluster::Option.new(cluster: %w[redis://:bazzap@127.0.0.1:7000], password: 'foobar')
  assert_equal({ '127.0.0.1:7000' => { scheme: 'redis', password: 'bazzap', host: '127.0.0.1', port: 7000 } }, option.per_node_key)
  option = Redis::Cluster::Option.new(cluster: %w[redis://127.0.0.1:7000/0], db: 1)
  assert_equal({ '127.0.0.1:7000' => { scheme: 'redis', host: '127.0.0.1', port: 7000, db: 0 } }, option.per_node_key)
  option = Redis::Cluster::Option.new(cluster: [{ host: '127.0.0.1', port: 7000 }])
  assert_equal({ '127.0.0.1:7000' => { host: '127.0.0.1', port: 7000 } }, option.per_node_key)
  assert_raises(Redis::InvalidClientOptionError) do
    Redis::Cluster::Option.new(cluster: nil)
  end
  assert_raises(Redis::InvalidClientOptionError) do
    Redis::Cluster::Option.new(cluster: %w[invalid_uri])
  end
  assert_raises(Redis::InvalidClientOptionError) do
    Redis::Cluster::Option.new(cluster: [{ host: '127.0.0.1' }])
  end
end

@hsadoyan
Author

I was looking at the cluster_client_options_test.rb file on my own.

option = Redis::Cluster::Option.new(cluster: %w[redis://bazzap:@127.0.0.1:7000], username: 'foobar')
assert_equal({ '127.0.0.1:7000' => { scheme: 'redis', username: 'bazzap', host: '127.0.0.1', port: 7000 } }, option.per_node_key)

option = Redis::Cluster::Option.new(cluster: %w[redis://:bazzap@127.0.0.1:7000], password: 'foobar')
assert_equal({ '127.0.0.1:7000' => { scheme: 'redis', password: 'bazzap', host: '127.0.0.1', port: 7000 } }, option.per_node_key)

These are the two forms that make sense. The former (without the leading ':') returns: Redis client could not fetch cluster information: NOAUTH Authentication required. (Redis::Cluster::InitialSetupError)

The latter returns WRONGPASS invalid username-password pair (Redis::Cluster::InitialSetupError)

I'm not aware of a username I'd even attempt to pair with the password. Someone having a similar issue with the JS client suggested that default worked for them, but it didn't work for me.

@hsadoyan
Author

I haven't been able to get the gem to build locally so I can play around with some of the tests myself, but I'll keep trying that

@hsadoyan
Author

hsadoyan commented Feb 18, 2022

option = Redis::Cluster::Option.new(cluster: [{ host: '127.0.0.1', port: 7000 }])
assert_equal({ '127.0.0.1:7000' => { host: '127.0.0.1', port: 7000 } }, option.per_node_key)

Trying to pass in a hash instead of a format string just results in a Connection timed out. (Trying to pass the same hash as a single instance does connect fine)

connection_url = { host: host, password: key, port: port, ssl: true }
client = Redis.new(cluster: [connection_url])

@supercaracal
Contributor

I reproduced it on my local machine. It seems that special characters are doubly escaped. It is a client bug in cluster mode. I'll fix it later.

success in stand-alone mode with a url option:

irb(main):010:0> r = Redis.new url: "redis://:#{CGI.escape('!&<123-abc>')}@127.0.0.1:7000/0"
=> #<Redis client v4.6.0 for redis://127.0.0.1:7000/0>

irb(main):011:0> r.ping
=> "PONG"

success in cluster mode with a password option as plain text:

irb(main):018:0> r = Redis.new cluster: [{password: '!&<123-abc>', host: '127.0.0.1', port: 7000}]
=> #<Redis client v4.6.0 for redis://127.0.0.1:7000/0 redis://127.0.0.1:7001/0 redis://127.0.0.1:7002/0>

irb(main):019:0> r.ping
=> "PONG"

failure in cluster mode with a URI string:

irb(main):012:0> r = Redis.new cluster: %W[redis://:#{CGI.escape('!&<123-abc>')}@127.0.0.1:7000/0]
/path/to/redis-rb/lib/redis/cluster/slot_loader.rb:21:in `load': Redis client could not fetch cluster information: WRONGPASS invalid username-password pair or user is disabled. (Redis::Cluster::InitialSetupError)
        from /path/to/redis-rb/lib/redis/cluster.rb:116:in `fetch_cluster_info!'
        from /path/to/redis-rb/lib/redis/cluster.rb:26:in `initialize'
        from /path/to/redis-rb/lib/redis.rb:84:in `new'
        from /path/to/redis-rb/lib/redis.rb:84:in `initialize'
        from (irb):12:in `new'
        from (irb):12:in `<main>'
        from bin/console:8:in `<main>'

server configuration:

$ diff -u makefile /tmp/redis-rb-makefile
--- makefile    2022-02-20 19:02:23.660431802 +0900
+++ /tmp/redis-rb-makefile      2022-02-20 18:16:35.986257109 +0900
@@ -18,6 +18,7 @@
 CLUSTER_PID_PATHS  := $(addprefix ${TMP}/redis,$(addsuffix .pid,${CLUSTER_PORTS}))
 CLUSTER_CONF_PATHS := $(addprefix ${TMP}/nodes,$(addsuffix .conf,${CLUSTER_PORTS}))
 CLUSTER_ADDRS      := $(addprefix 127.0.0.1:,${CLUSTER_PORTS})
+PASSWORD           := !&<123-abc>

 define kill-redis
   (ls $1 > /dev/null 2>&1 && kill $$(cat $1) && rm -f $1) || true
@@ -43,21 +44,24 @@

 start: ${BINARY}
        @${BINARY}\
-               --daemonize  yes\
-               --pidfile    ${PID_PATH}\
-               --port       ${PORT}\
-               --unixsocket ${SOCKET_PATH}
+               --daemonize   yes\
+               --pidfile     ${PID_PATH}\
+               --port        ${PORT}\
+               --unixsocket  ${SOCKET_PATH}\
+               --requirepass '${PASSWORD}'

 stop_slave:
        @$(call kill-redis,${SLAVE_PID_PATH})

 start_slave: ${BINARY}
        @${BINARY}\
-               --daemonize  yes\
-               --pidfile    ${SLAVE_PID_PATH}\
-               --port       ${SLAVE_PORT}\
-               --unixsocket ${SLAVE_SOCKET_PATH}\
-               --slaveof    127.0.0.1 ${PORT}
+               --daemonize   yes\
+               --pidfile     ${SLAVE_PID_PATH}\
+               --port        ${SLAVE_PORT}\
+               --unixsocket  ${SLAVE_SOCKET_PATH}\
+               --slaveof     127.0.0.1 ${PORT}\
+               --requirepass '${PASSWORD}'\
+               --masterauth  '${PASSWORD}'

 stop_sentinel:
        @$(call kill-redis,${SENTINEL_PID_PATHS})
@@ -72,6 +76,7 @@
                echo 'sentinel down-after-milliseconds ${HA_GROUP_NAME} 5000'                >> $$conf;\
                echo 'sentinel failover-timeout        ${HA_GROUP_NAME} 30000'               >> $$conf;\
                echo 'sentinel parallel-syncs          ${HA_GROUP_NAME} 1'                   >> $$conf;\
+               echo 'sentinel auth-pass               ${HA_GROUP_NAME} ${PASSWORD}'         >> $$conf;\
                ${BINARY} $$conf\
                        --daemonize yes\
                        --pidfile   ${TMP}/redis$$port.pid\
@@ -105,7 +110,9 @@
                        --cluster-node-timeout 5000\
                        --pidfile              ${TMP}/redis$$port.pid\
                        --port                 $$port\
-                       --unixsocket           ${TMP}/redis$$port.sock;\
+                       --unixsocket           ${TMP}/redis$$port.sock\
+                       --requirepass          '${PASSWORD}'\
+                       --masterauth           '${PASSWORD}';\
        done

 create_cluster:
$ diff -u test/support/cluster/orchestrator.rb /tmp/redis-rb-cluster-helper.rb
--- test/support/cluster/orchestrator.rb        2022-02-20 19:02:23.660431802 +0900
+++ /tmp/redis-rb-cluster-helper.rb     2022-02-20 18:16:55.590696731 +0900
@@ -11,6 +11,7 @@
     @clients = node_addrs.map do |addr|
       Redis.new(url: addr,
                 timeout: timeout,
+                password: '!&<123-abc>',
                 reconnect_attempts: 10,
                 reconnect_delay: 1.5,
                 reconnect_delay_max: 10.0)

@hsadoyan
Author

Thanks for the quick fix. After merging the latest master I no longer get the WRONGPASS error, but I'm getting a new error now.

If I try with the format string: "rediss://:#{CGI.escape(key)}@#{host}:#{port}" I get:

redis-rb-a7c2a5bfacf9/lib/redis/connection/ruby.rb:264:in `connect_nonblock': SSL_connect returned=1 errno=0 state=error: certificate verify failed (unspecified certificate verification error) (OpenSSL::SSL::SSLError)

And if I try with [{ host: host, password: key, port: port, ssl: true }] I get:

redis-rb-a7c2a5bfacf9/lib/redis/connection/ruby.rb:58:in `block in _read_from_socket': Connection timed out (Redis::TimeoutError)

Is it possible there's a different issue underneath? I can connect with the same credentials in non-clustered mode

@supercaracal
Contributor

I'll look into the issue later. There might be another bug.

@supercaracal
Contributor

supercaracal commented Feb 23, 2022

I tried to reproduce the SSL/TLS connection error with a cluster on my local machine using mutual self-signed certs, but I couldn't.

The former error is still under investigation. The latter error is a bug in cluster client. I'll fix it later.

irb(main):001:0> r = Redis.new cluster: %w[rediss://127.0.0.1:7000], ssl_params: {ca_file: File.join('test', 'support', 'ssl', 'trusted-ca.crt'),cert: OpenSSL::X509::Certificate.new(File.read(File.join('test', 'support', 'ssl', 'trusted-cert.crt'))),key: OpenSSL::PKey::RSA.new(File.read(File.join('test', 'support', 'ssl', 'trusted-cert.key')))}
=> #<Redis client v4.6.0 for redis://127.0.0.1:7000/0 redis://127.0.0.1:7001/0 redis://127.0.0.1:7002/0>

irb(main):002:0> r.ping
=> "PONG"

irb(main):003:0> r.cluster 'nodes'
=>
[{"node_id"=>"1014379650c31231d046b8163fcc5d64c1eab738", "ip_port"=>"127.0.0.1:7001@17001", "flags"=>["master"], "master_node_id"=>"-", "ping_sent"=>"0", "pong_recv"=>"1645606584000", "config_epoch"=>"2", "link_state"=>"connected", "slots"=>"5461".."10922"},
 {"node_id"=>"a3d1a4e4eb688ab795a0ddff4ec465d77b43e83c", "ip_port"=>"127.0.0.1:7005@17005", "flags"=>["slave"], "master_node_id"=>"9675ecc0378329e6d0f9903609a617f2da2a370c", "ping_sent"=>"0", "pong_recv"=>"1645606584793", "config_epoch"=>"3", "link_state"=>"connected", "slots"=>nil},
 {"node_id"=>"7ddabe0e224bf8d0846c998447a9e4108b57993a", "ip_port"=>"127.0.0.1:7004@17004", "flags"=>["slave"], "master_node_id"=>"1014379650c31231d046b8163fcc5d64c1eab738", "ping_sent"=>"0", "pong_recv"=>"1645606585597", "config_epoch"=>"2", "link_state"=>"connected", "slots"=>nil},
 {"node_id"=>"fb1e267e1e62d9b8d7d1370eada00705cf606c15", "ip_port"=>"127.0.0.1:7000@17000", "flags"=>["myself", "master"], "master_node_id"=>"-", "ping_sent"=>"0", "pong_recv"=>"1645606585000", "config_epoch"=>"1", "link_state"=>"connected", "slots"=>"0".."5460"},
 {"node_id"=>"dfe3cc76119460d0721d96f37c7d124f5ad148f7", "ip_port"=>"127.0.0.1:7003@17003", "flags"=>["slave"], "master_node_id"=>"fb1e267e1e62d9b8d7d1370eada00705cf606c15", "ping_sent"=>"0", "pong_recv"=>"1645606585000", "config_epoch"=>"1", "link_state"=>"connected", "slots"=>nil},
 {"node_id"=>"9675ecc0378329e6d0f9903609a617f2da2a370c", "ip_port"=>"127.0.0.1:7002@17002", "flags"=>["master"], "master_node_id"=>"-", "ping_sent"=>"0", "pong_recv"=>"1645606585798", "config_epoch"=>"3", "link_state"=>"connected", "slots"=>"10923".."16383"}]

irb(main):004:0> Redis.new(cluster: %w[rediss://127.0.0.1:7000]).ping
/path/to/redis-rb/lib/redis/connection/ruby.rb:264:in `connect_nonblock': SSL_connect returned=1 errno=0 state=error: certificate verify failed (self signed certificate in certificate chain) (OpenSSL::SSL::SSLError)

irb(main):005:0> Redis.new(cluster: %w[redis://127.0.0.1:7000]).ping
/path/to/redis-rb/lib/redis/cluster/slot_loader.rb:21:in `load': Redis client could not fetch cluster information: Connection lost (ECONNRESET) (Redis::Cluster::InitialSetupError)

documents:

cert files:
https://github.com/redis/redis-rb/tree/master/test/support/ssl

server configuration:

$ diff -u /tmp/redis-rb-makefile.bk makefile
--- /tmp/redis-rb-makefile.bk   2022-02-23 16:24:08.536745615 +0900
+++ makefile    2022-02-23 17:23:22.028594220 +0900
@@ -104,7 +104,13 @@
                        --cluster-config-file  ${TMP}/nodes$$port.conf\
                        --cluster-node-timeout 5000\
                        --pidfile              ${TMP}/redis$$port.pid\
-                       --port                 $$port\
+                       --port                 0\
+                       --tls-port             $$port\
+                       --tls-cert-file        $(CURDIR)/test/support/ssl/trusted-cert.crt\
+                       --tls-key-file         $(CURDIR)/test/support/ssl/trusted-cert.key\
+                       --tls-ca-cert-file     $(CURDIR)/test/support/ssl/trusted-ca.crt\
+                       --tls-cluster          yes\
+                       --logfile              /tmp/redis.log\
                        --unixsocket           ${TMP}/redis$$port.sock;\
        done
$ diff -u /tmp/redis-rb-cluster-helper.rb.bk test/support/cluster/orchestrator.rb
--- /tmp/redis-rb-cluster-helper.rb.bk  2022-02-23 18:03:21.288383625 +0900
+++ test/support/cluster/orchestrator.rb        2022-02-23 18:03:23.656437261 +0900
@@ -1,6 +1,7 @@
 # frozen_string_literal: true

 require 'redis'
+require 'openssl'

 class ClusterOrchestrator
   SLOT_SIZE = 16_384
@@ -11,6 +12,12 @@
     @clients = node_addrs.map do |addr|
       Redis.new(url: addr,
                 timeout: timeout,
+                ssl: true,
+                ssl_params: {
+                  ca_file: File.join(__dir__, '..', 'ssl', 'trusted-ca.crt'),
+                  cert: OpenSSL::X509::Certificate.new(File.read(File.join(__dir__, '..', 'ssl', 'trusted-cert.crt'))),
+                  key: OpenSSL::PKey::RSA.new(File.read(File.join(__dir__, '..', 'ssl', 'trusted-cert.key'))),
+                },
                 reconnect_attempts: 10,
                 reconnect_delay: 1.5,
                 reconnect_delay_max: 10.0)

@supercaracal
Contributor

supercaracal commented Feb 23, 2022

Ah, forget my comment about the latter. We can specify the options as follows instead.

Redis.new(cluster: [{ host: '127.0.0.1', port: 7000 }], password: 'mysecret', ssl: true)

@supercaracal
Contributor

Perhaps the X509_V_ERR_UNSPECIFIED error is related to SNI, but I don't know how the Azure Redis endpoint handles SSL termination. Would you try to connect with options like the following? @ftlc

Redis.new(cluster: [{ host: 'my-redis.example.com', port: 6379 }], password: 'mysecret', ssl: true, ssl_params: { verify_hostname: true })

ssl_sock = new(tcp_sock, ctx)
ssl_sock.hostname = host

unless ctx.verify_mode == OpenSSL::SSL::VERIFY_NONE || (
ctx.respond_to?(:verify_hostname) &&
!ctx.verify_hostname
)
ssl_sock.post_connection_check(host)
end

@hsadoyan
Author

Tried a few things. Results:

Connection timed out (Redis::TimeoutError)

client = Redis.new(cluster: [{ host: host, port: port, password: key, ssl: true }])

certificate verify failed (unspecified certificate verification error) (OpenSSL::SSL::SSLError)

client = Redis.new(cluster: [{ host: host, port: port }], password: key, ssl: true)
client = Redis.new(cluster: [{ host: host, port: port }], password: key, ssl: true, ssl_params: { verify_hostname: true })

Note: for this I used 6380, since 6379 is blocked on my azure instance (expects non ssl connections). If I try with 6379 I get a (Redis::TimeoutError) (Redis::Cluster::InitialSetupError)

I have the minimum TLS version set to 1.0 in Azure in case the client wasn't using 1.1/1.2 yet.

@supercaracal
Contributor

supercaracal commented Feb 24, 2022

I've tried to check with AWS ElastiCache. It works.

irb(main):001:0> r = Redis.new(cluster: %w[rediss://:********@clustercfg.my-redis.******.apne1.cache.amazonaws.com:6379])
=> #<Redis client v4.6.0 for redis://my-redis-0001-001.my-redis.******.apne1.cache.amazonaws.com:6379/0 redis://my-redis-0002-001.my-redis.******.apne1.cache.amazonaws.com:6379/0 redis://my-redis-0003-001.my-redis.******.apne1.cache.amazonaws.com:6379/0>

irb(main):002:0> r.ping
=> "PONG"

irb(main):003:0> r.cluster 'nodes'
=>
[{"node_id"=>"21dc8ceac97842eba7adeceff74552d8bebeaba7", "ip_port"=>"my-redis-0001-001.my-redis.******.apne1.cache.amazonaws.com:6379@1122", "flags"=>["master"], "master_node_id"=>"-", "ping_sent"=>"0", "pong_recv"=>"1645675726327", "config_epoch"=>"6", "link_state"=>"connected", "slots"=>"0".."5461"},
 {"node_id"=>"bb9f8aaa3bcbf5efea95e668efb574325c8c5582", "ip_port"=>"my-redis-0003-001.my-redis.******.apne1.cache.amazonaws.com:6379@1122", "flags"=>["myself", "master"], "master_node_id"=>"-", "ping_sent"=>"0", "pong_recv"=>"1645675725000", "config_epoch"=>"8", "link_state"=>"connected", "slots"=>"10923".."16383"},
 {"node_id"=>"a132a43e7590676d701bc3c1dea381f839bed5e5", "ip_port"=>"my-redis-0002-002.my-redis.******.apne1.cache.amazonaws.com:6379@1122", "flags"=>["slave"], "master_node_id"=>"4f84d4a5097f893fa504b34ac4c402d8e1609343", "ping_sent"=>"0", "pong_recv"=>"1645675727000", "config_epoch"=>"7", "link_state"=>"connected", "slots"=>nil},
 {"node_id"=>"9431f6f9cabfdf6d9993bb5f220e4406274cf56c", "ip_port"=>"my-redis-0003-002.my-redis.******.apne1.cache.amazonaws.com:6379@1122", "flags"=>["slave"], "master_node_id"=>"bb9f8aaa3bcbf5efea95e668efb574325c8c5582", "ping_sent"=>"0", "pong_recv"=>"1645675728349", "config_epoch"=>"8", "link_state"=>"connected", "slots"=>nil},
 {"node_id"=>"4f84d4a5097f893fa504b34ac4c402d8e1609343", "ip_port"=>"my-redis-0002-001.my-redis.******.apne1.cache.amazonaws.com:6379@1122", "flags"=>["master"], "master_node_id"=>"-", "ping_sent"=>"0", "pong_recv"=>"1645675729350", "config_epoch"=>"7", "link_state"=>"connected", "slots"=>"5462".."10922"},
 {"node_id"=>"73e7a22b9a59d528cb9b2d9511c51a7f833412b4", "ip_port"=>"my-redis-0001-002.my-redis.******.apne1.cache.amazonaws.com:6379@1122", "flags"=>["slave"], "master_node_id"=>"21dc8ceac97842eba7adeceff74552d8bebeaba7", "ping_sent"=>"0", "pong_recv"=>"1645675729000", "config_epoch"=>"6", "link_state"=>"connected", "slots"=>nil}]

irb(main):004:0>

Since you said stand-alone mode succeeds, could you share the response of the following command, with any sensitive text masked?

Redis.new(url: 'rediss://:yoursecret@yourhost:yourport').cluster(:nodes).split("\n")

@hsadoyan
Author

So this is interesting.

Redis.new(url: connection_url).cluster(:nodes).split("\n") worked. I was able to get a list of 4 nodes (2 master, 2 slave) in the same format as yours.

However, trying Redis.new(cluster: [connection_url]) gives Redis::TimeoutError

Since the first case worked, is it possible to build a cluster client manually with the list of nodes?

@supercaracal
Contributor

supercaracal commented Feb 25, 2022

I'd say building a cluster client manually is the hard way. I think the following directives in the server configuration may be related. They were added by AWS folks and have been available since Redis 7.*.

In response to the CLUSTER NODES command, do the Azure Redis servers return IP addresses or host names? AWS ElastiCache returns host names. Maybe we should use CLUSTER SLOTS instead of CLUSTER NODES.

@hsadoyan
Author

I got IP addresses instead of host names. .slots also returned IPs.
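
For anyone repeating this check, here is a rough sketch of how the address field of a raw CLUSTER NODES line could be classified as an IP address or a host name. The helper name is hypothetical and not part of redis-rb. The first sample line is rebuilt from the local cluster output earlier in this thread, and the second uses a made-up example.com host name in place of the masked ElastiCache one.

```ruby
require 'ipaddr'

# Hypothetical helper, not part of redis-rb. Returns :ip or :hostname for the
# address field ("<host>:<port>@<cport>") of one raw CLUSTER NODES line.
def node_address_kind(line)
  address = line.split(' ')[1]                           # second field is the address
  host = address.split('@').first.rpartition(':').first  # drop bus port, then client port
  IPAddr.new(host)                                       # raises unless host is an IP literal
  :ip
rescue IPAddr::InvalidAddressError
  :hostname
end

# Sample lines (illustrative only).
local = 'fb1e267e1e62d9b8d7d1370eada00705cf606c15 127.0.0.1:7000@17000 ' \
        'myself,master - 0 1645606585000 1 connected 0-5460'
named = '21dc8ceac97842eba7adeceff74552d8bebeaba7 my-redis-0001-001.example.com:6379@1122 ' \
        'master - 0 1645675726327 6 connected 0-5461'

puts node_address_kind(local)
puts node_address_kind(named)
```

A cluster that reports :ip here leaves a TLS client with nothing but an IP address to verify the certificate against, which is the situation discussed in the rest of this thread.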

@supercaracal
Contributor

supercaracal commented Feb 25, 2022

We've found a root cause, but the problem appears to be on the Azure Redis service side. I'll read up on some documents and look for solutions later.

@supercaracal
Contributor

supercaracal commented Feb 25, 2022

I found a document but it was for redis-cli.

https://docs.microsoft.com/en-us/azure/azure-cache-for-redis/cache-how-to-premium-clustering#can-i-directly-connect-to-the-individual-shards-of-my-cache

Does redis-cli work with the Azure Redis cluster over SSL/TLS? Please check the redirection behavior with the GET command.

$ redis-cli --tls -c -h <host> -p <port> -a <password>

sequenceDiagram
    participant Client
    participant Server Shard 1
    participant Server Shard 2
    participant Server Shard 3
    Client->>+Server Shard 1: CLUSTER SLOTS
    Server Shard 1-->>-Client: nodes and slots data
    Note over Client,Server Shard 1: host names needed if using SSL/TLS
    Client->>+Server Shard 1: GET key1
    Server Shard 1-->>-Client: value1
    Client->>+Server Shard 2: GET key2
    Server Shard 2-->>-Client: value2
    Client->>+Server Shard 3: GET key3
    Server Shard 3-->>-Client: value3
    Client->>+Server Shard 3: GET key1
    Server Shard 3-->>-Client: MOVED Server Shard 1
    Note over Client,Server Shard 3: Client needs to redirect to correct node
    Client->>+Server Shard 2: MGET key2 key3
    Server Shard 2-->>-Client: CROSSSLOT
    Note over Client,Server Shard 2: Cannot command across shards
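
The MOVED and CROSSSLOT replies in the diagram both follow from how Redis Cluster maps keys to hash slots. Below is a minimal sketch of that mapping as described in the Redis Cluster specification (CRC16 of the key, or of its hash tag, modulo 16384). The method names are illustrative and this is not redis-rb's own implementation.

```ruby
# CRC16 (XMODEM variant): polynomial 0x1021, initial value 0.
def crc16(data)
  crc = 0
  data.each_byte do |byte|
    crc ^= byte << 8
    8.times do
      crc = (crc & 0x8000).zero? ? (crc << 1) : ((crc << 1) ^ 0x1021)
      crc &= 0xFFFF
    end
  end
  crc
end

# Slot for a key. If the key contains a non-empty {hash tag}, only the tag is hashed.
def key_slot(key)
  open_idx = key.index('{')
  if open_idx
    close_idx = key.index('}', open_idx + 1)
    tag = key[(open_idx + 1)...close_idx] if close_idx
    key = tag if tag && !tag.empty?
  end
  crc16(key) % 16_384
end

puts key_slot('foo')
puts key_slot('{user1000}.following') == key_slot('{user1000}.followers')
```

A node answers MOVED when the key's slot belongs to another shard, and a multi-key command such as MGET is rejected with CROSSSLOT unless all of its keys land in the same slot, which hash tags can force.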

@hsadoyan
Author

hsadoyan commented Feb 25, 2022

Yeah the cluster works from the cli with same config.

I can SET and GET to both nodes.


@supercaracal
Contributor

supercaracal commented Feb 26, 2022

I'm sorry for my misunderstanding. It seems that the SSL certificate is for an IP address, not a common name. As expected, there might be some bugs in the cluster client of redis-rb. I will keep trying to find the cause.

https://datatracker.ietf.org/doc/html/rfc5280#section-4.2.1.6

@supercaracal
Contributor

supercaracal commented Feb 26, 2022

Some documents say:

https://docs.microsoft.com/en-us/azure/azure-cache-for-redis/cache-how-to-premium-clustering

Once the cache is created, you connect to it and use it just like a non-clustered cache. Redis distributes the data throughout the Cache shards.

https://docs.microsoft.com/en-us/azure/azure-cache-for-redis/cache-how-to-premium-clustering#how-do-i-connect-to-my-cache-when-clustering-is-enabled

You can connect to your cache using the same endpoints, ports, and keys that you use when connecting to a cache that doesn't have clustering enabled. Redis manages the clustering on the backend so you don't have to manage it from your client.

But:

https://docs.microsoft.com/en-us/azure/azure-cache-for-redis/cache-how-to-premium-clustering#do-all-redis-clients-support-clustering

Many clients support Redis clustering but not all.

It seems that the endpoint does not proxy requests; clients need to support the cluster protocol. The behavior of redis-cli above is evidence of this.

@supercaracal
Contributor

supercaracal commented Feb 26, 2022

Please try to connect to Azure Redis cluster with SSL/TLS by using hiredis driver for isolating the problem.

https://github.com/redis/redis-rb#hiredis

hiredis is also used by redis-cli.

https://github.com/redis/redis/blob/1dc89e2d0230f4bcadf21ee8185b79a12b001cf0/src/redis-cli.c#L49-L53

@supercaracal
Contributor

I found that, unfortunately, SSL/TLS support with hiredis is currently insufficient in redis-rb. Forget about that. I'm sorry.

@supercaracal
Contributor

supercaracal commented Feb 26, 2022

Could you try to connect to the Azure Redis cluster with SSL/TLS after patching the code as follows?

$ diff -u /tmp/redis-rb-conn-rb.rb.bk lib/redis/connection/ruby.rb
--- /tmp/redis-rb-conn-rb.rb.bk 2022-02-26 14:00:25.862435640 +0900
+++ lib/redis/connection/ruby.rb        2022-02-26 14:00:52.355047769 +0900
@@ -252,7 +252,7 @@
           ctx.set_params(ssl_params || {})

           ssl_sock = new(tcp_sock, ctx)
-          ssl_sock.hostname = host
+          #ssl_sock.hostname = host

           begin
             # Initiate the socket connection in the background. If it doesn't fail

The above patch comments out the following line.

ssl_sock.hostname = host

I assume SNI might fail when using a certificate for an IP address.

@supercaracal
Contributor

supercaracal commented Feb 26, 2022

It seems that other libraries have the same issue. We cannot know how Azure Redis cluster works internally. In general, a Redis Cluster client uses the node addresses fetched from the server, but that approach tends to fail with SSL/TLS on Azure Redis.

case PHP:

There is a workaround that always uses the host name of the endpoint. It may indeed work, but it is a bit ad hoc and serves a minor use case. It seems that Azure Redis has a single IP address and multiple ports. Internally, there might be proxy servers such as stunnel or something similar. The proxy server doesn't support redirection; it only does SSL/TLS termination.

graph TB
  client(Cluster Client)

  subgraph Azure Redis Cache
    subgraph Endpoint
      endpoint(Active)
      endpoint_sb(Standby)
    end

    subgraph Cluster
      node0(Node0)
      node1(Node1)
      node2(Node2)
    end
  end

  endpoint-.-endpoint_sb
  node0-.-node1-.-node2-.-node0
  client--rediss://vip:15000-->endpoint--redis://real:6379-->node0

On the other hand, in my opinion, AWS ElastiCache behaves in a way that is friendly to typical clients.

graph TB
  client(Cluster Client)

  subgraph AWS ElastiCache
    node0(Node0)
    node1(Node1)
    node2(Node2)
  end

  node0-.-node1-.-node2-.-node0
  client--rediss://node0:6379-->node0

case Java:

https://stackoverflow.com/questions/37471419/azure-redis-ssl-cluster-lettuce-java-edit-lettuce-version-4-2

The cluster without SSL working code looks like this:
Enabling SSL, It hangs during 1 minute and log4j logs looks like this:
Keeping SSL and disabling cluster works:
So that's not just an SSL issue, its a SSL + cluster combo issue.

https://github.com/lettuce-io/lettuce-core/releases/tag/4.2.0.Final

Redis Cluster and SSL:
You should disable the verifyPeer option if the SSL endpoints cannot provide a valid certificate.
When creating a RedisClusterClient using RedisClusterClientFactoryBean the verifyPeer option is disabled by default.
Lettuce was successfully tested with Azure Redis with SSL and authentication.

I think this is a last resort but it may also work.

Redis.new(cluster: [{ host: 'foo.example.com', port: 6379 }], password: 'bar', ssl: true, ssl_params: { verify_mode: OpenSSL::SSL::VERIFY_NONE })

Since your endpoint appears to be public, I've checked the behavior with SSL/TLS options from my local machine. It seems that our client works fine if we disable hostname verification.

irb(main):001:0> Redis.new(cluster: %w[rediss://13.82.175.253:15000]).ping
/path/to/redis-rb/lib/redis/connection/ruby.rb:264:in `connect_nonblock': SSL_connect returned=1 errno=0 state=error: certificate verify failed (Hostname mismatch) (OpenSSL::SSL::SSLError)

irb(main):002:0> Redis.new(cluster: %w[rediss://13.82.175.253:15000], ssl_params: { verify_hostname: false }).ping
/path/to/redis-rb/lib/redis/cluster/slot_loader.rb:21:in `load': Redis client could not fetch cluster information: NOAUTH Authentication required. (Redis::Cluster::InitialSetupError)

irb(main):003:0> Redis.new(cluster: %w[rediss://13.82.175.253:15000], ssl_params: { verify_mode: OpenSSL::SSL::VERIFY_NONE }).ping
/path/to/redis-rb/lib/redis/cluster/slot_loader.rb:21:in `load': Redis client could not fetch cluster information: NOAUTH Authentication required. (Redis::Cluster::InitialSetupError)

The certificate's SAN may not contain IP addresses, so we probably cannot verify it against IP addresses correctly.

$ openssl s_client -connect 13.82.175.253:15000 < /dev/null 2> /dev/null | openssl x509 -noout -text | grep -A1 -i alternative
            X509v3 Subject Alternative Name:
                DNS:*.redis.cache.windows.net, DNS:*.geo.redis.cache.windows.net

https://github.com/ruby/openssl/blob/e3a40937ac2b18ac02203e3539c4e90c539a36f9/lib/openssl/ssl.rb#L279-L293
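
The hostname check linked above can be tried offline. The following is a sketch under stated assumptions: it builds a throwaway self-signed certificate carrying the same two DNS SAN entries shown in the openssl output, then asks OpenSSL::SSL.verify_certificate_identity whether a name matches. The cache host name mycache.redis.cache.windows.net and the certificate subject are made up for illustration.

```ruby
require 'openssl'

# Throwaway self-signed certificate with the SAN entries seen on the Azure endpoint.
key = OpenSSL::PKey::RSA.new(2048)
cert = OpenSSL::X509::Certificate.new
cert.version = 2
cert.serial = 1
cert.subject = OpenSSL::X509::Name.parse('/CN=illustrative-cache')
cert.issuer = cert.subject
cert.public_key = key
cert.not_before = Time.now - 60
cert.not_after = Time.now + 3600

factory = OpenSSL::X509::ExtensionFactory.new
factory.subject_certificate = cert
factory.issuer_certificate = cert
san = 'DNS:*.redis.cache.windows.net,DNS:*.geo.redis.cache.windows.net'
cert.add_extension(factory.create_extension('subjectAltName', san, false))
cert.sign(key, OpenSSL::Digest.new('SHA256'))

# A host name under the wildcard versus the bare IP address from CLUSTER NODES.
by_name = OpenSSL::SSL.verify_certificate_identity(cert, 'mycache.redis.cache.windows.net')
by_ip = OpenSSL::SSL.verify_certificate_identity(cert, '13.82.175.253')

puts "host name matches: #{by_name}"
puts "IP address matches: #{by_ip}"
```

Because the SAN holds only DNS entries, the IP address has nothing to match against, which would explain the "Hostname mismatch" error when connecting by IP.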

Since you received a timeout error, the Azure Redis cluster nodes might be returning private or plain-text port numbers in response to the CLUSTER commands.

https://github.com/redis/redis/blob/1dc89e2d0230f4bcadf21ee8185b79a12b001cf0/redis.conf#L1709-L1744

Are your cluster nodes in Azure Redis returning 1500N port numbers in reply to the CLUSTER NODES and CLUSTER SLOTS commands? If not, redis-rb won't work even with certificate verification disabled.

@hsadoyan
Author

hsadoyan commented Feb 28, 2022

YES! { verify_mode: OpenSSL::SSL::VERIFY_NONE } did the trick. I can create a cluster client and set/get from both shards.

Are there any adverse security implications for this? Doesn't setting VERIFY_NONE open us up to MITM attacks? Should I open a ticket with the Azure Redis Cluster support team to try and address this?

The cluster nodes/slots commands are returning ports in the 1500n range.
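On the MITM question: the two `ssl_params` variants tried earlier in the thread are not equivalent. The sketch below assumes that redis-rb hands `ssl_params` to `OpenSSL::SSL::SSLContext#set_params` (an assumption about the client's internals, not something confirmed in this thread) and compares the resulting contexts.

```ruby
require 'openssl'

# Variant 1: only the hostname check is turned off.
# The certificate chain is still verified against the trusted CA store.
hostname_off = OpenSSL::SSL::SSLContext.new
hostname_off.set_params(verify_hostname: false)

# Variant 2: all certificate verification is turned off.
# Any certificate is accepted, including one presented by an attacker.
verify_off = OpenSSL::SSL::SSLContext.new
verify_off.set_params(verify_mode: OpenSSL::SSL::VERIFY_NONE)

puts "variant 1 still verifies the chain: #{hostname_off.verify_mode == OpenSSL::SSL::VERIFY_PEER}"
puts "variant 1 checks the hostname:      #{hostname_off.verify_hostname}"
puts "variant 2 verifies the chain:       #{verify_off.verify_mode == OpenSSL::SSL::VERIFY_PEER}"
```

With `VERIFY_NONE` the connection is encrypted but the peer is not authenticated, so a man-in-the-middle is possible. With only `verify_hostname: false` the peer must still present a certificate signed by a trusted CA, but that certificate may have been issued for any name.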

@supercaracal
Contributor

supercaracal commented Mar 1, 2022

In cluster mode with SSL/TLS, the Azure Redis architecture expects the client to connect using the FQDN of a single endpoint, whereas redis-rb expects the servers either to return FQDNs in reply to CLUSTER commands or to present certificates that can be verified against IP addresses. I'm not familiar with the security implications, but disabling certificate verification may currently be the only option. It might be a good idea to ask the Azure Cache for Redis support team about this issue.

@hsadoyan
Author

hsadoyan commented Mar 2, 2022

I opened a ticket with the Azure Redis team and this was their response:

Disabling the VerifyPeers flag does present an increased security risk. However, it should not be required to disable it. The most likely reason that verification is failing for connecting to the shards of the cluster is that the subject name of the certificate needs to be verified against the original hostname of the cache, not the IP address seen in the CLUSTER NODES output.

Since we have the hostname from when we initially configure the client, would that work?

@supercaracal
Contributor

supercaracal commented Mar 3, 2022

Yes, it would. However, Azure Redis servers currently return IP addresses in reply to CLUSTER NODES. Only AWS ElastiCache and Redis 7.* are able to reply with FQDNs. So the client would need to be modified so that, in cluster mode, it can somehow pass the FQDN to use for verifying certificates.

ssl_sock.hostname = host
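The idea can be sketched with plain Ruby sockets: connect to the address a node reports, but verify the certificate against the endpoint's FQDN. This is a minimal sketch under stated assumptions, not redis-rb's implementation; `verification_name` and `open_tls` are hypothetical helper names.

```ruby
require 'openssl'
require 'socket'

# Hypothetical helper: choose the name the certificate is verified against.
# Falls back to the node's own address when no fixed hostname is given.
def verification_name(node_host, fixed_hostname = nil)
  fixed_hostname.nil? || fixed_hostname.empty? ? node_host : fixed_hostname
end

# Hypothetical helper: open a TLS connection to node_host:port while
# verifying the peer certificate against the fixed hostname, if any.
def open_tls(node_host, port, fixed_hostname: nil)
  name = verification_name(node_host, fixed_hostname)

  ctx = OpenSSL::SSL::SSLContext.new
  ctx.set_params(verify_mode: OpenSSL::SSL::VERIFY_PEER)

  tcp = TCPSocket.new(node_host, port)      # TCP goes to the reported address
  ssl = OpenSSL::SSL::SSLSocket.new(tcp, ctx)
  ssl.hostname = name                       # name used for SNI and hostname verification
  ssl.sync_close = true
  ssl.connect
  ssl.post_connection_check(name)           # explicit identity check, raises on mismatch
  ssl
end
```

A call such as `open_tls('13.82.175.253', 15000, fixed_hostname: 'foo.example.com')` would then connect to the IP address but require a certificate valid for `foo.example.com` (both values are placeholders).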

@hsadoyan
Author

hsadoyan commented Mar 3, 2022

Since Azure uses the same FQDN for all the nodes (just different ports), can we bring in the config? Would that require a huge refactor?

@supercaracal
Contributor

I’m not sure about the scale of the refactoring, but I think an additional option may be needed, such as fixed_hostname: 'foo.example.com'.

@hsadoyan
Author

hsadoyan commented Mar 3, 2022

Yeah that would make sense

@supercaracal
Contributor

supercaracal commented Mar 3, 2022

Could you try to test the following version of the client? @ftlc

gem 'redis', git: 'https://github.com/supercaracal/redis-rb.git', branch: 'support-azure-cache-for-redis-with-cluster-mode-and-ssl-tls'
Redis.new(cluster: %w[rediss://foo-endpoint.example.com:6379], fixed_hostname: 'foo-endpoint.example.com')

@hsadoyan
Author

hsadoyan commented Mar 3, 2022

Tried with this client. Following config:

connection_url = { host: host, password: key, port: port, ssl: true }
client = Redis.new(cluster: [connection_url], fixed_hostname: host)

Got a timeout like before, so it looks like peer verification may still have failed.

.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/redis-rb-ab7c3714ca79/lib/redis/connection/ruby.rb:58:in `block in _read_from_socket': Connection timed out (Redis::TimeoutError)

@hsadoyan
Author

hsadoyan commented Mar 3, 2022

OH! But using a format string did work!

connection_url = "rediss://:#{CGI.escape(key)}@#{host}:#{port}"
client = Redis.new(cluster: [connection_url], fixed_hostname: host)

@supercaracal
Contributor

Thank you for testing. The timeout error in the former case is weird.

@hsadoyan
Author

hsadoyan commented Mar 3, 2022

Thanks for pushing updates so fast!

@supercaracal
Contributor

supercaracal commented Mar 3, 2022

Ah, the former may work if we specify the options like this:

connection_url = { host: host, password: key, port: port }
client = Redis.new(cluster: [connection_url], fixed_hostname: host, ssl: true)

The ssl parameter in the cluster option is ignored.
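The two configurations reported as working in this thread can be summarized in one place. The sketch below uses a hypothetical helper, `cluster_options`, which is not part of redis-rb; it only builds the keyword arguments and does not connect. It assumes the `fixed_hostname` option from the branch under test.

```ruby
require 'cgi'

# Hypothetical helper (not part of redis-rb): builds keyword arguments for
# Redis.new in either of the two shapes that worked in this thread.
def cluster_options(host:, port:, key:, as_url: true)
  if as_url
    # URL form: the rediss:// scheme enables TLS for the node.
    { cluster: ["rediss://:#{CGI.escape(key)}@#{host}:#{port}"],
      fixed_hostname: host }
  else
    # Hash form: ssl must sit at the top level, because an ssl key
    # inside the cluster entries is ignored.
    { cluster: [{ host: host, port: port, password: key }],
      fixed_hostname: host,
      ssl: true }
  end
end

url_opts = cluster_options(host: 'foo.example.com', port: 6380, key: 'p@ss')
hash_opts = cluster_options(host: 'foo.example.com', port: 6380, key: 'p@ss', as_url: false)
```

Either result could then be passed along as `Redis.new(**url_opts)` or `Redis.new(**hash_opts)`; the host, port, and key shown here are placeholders.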

@hsadoyan
Author

hsadoyan commented Mar 4, 2022

Oh yup, moving SSL out of the cluster did it. It works now!

Thanks so much! Once everything is merged I'll work on deploying it and make sure there are no issues with high traffic, but it looks perfect now.

@hsadoyan
Author

hsadoyan commented Mar 10, 2022

@supercaracal Hey, do you know when this might be merged? I'm trying to decide whether I should plan on integrating this work in the upcoming sprint.

@supercaracal
Contributor

@byroot I understand you are occupied, but I would appreciate it if you could review the following pull request.

@byroot
Collaborator

byroot commented Mar 11, 2022

@supercaracal apologies, I didn't see it was ready for review. Never hesitate to ping me for these things. I'll have a look right now.

@hsadoyan
Author

Thanks for merging it in!

When is 4.7 scheduled? For the bigger services we'll probably wait for the official release instead of pointing our Gemfile directly at master.

@supercaracal
Contributor
