DNS discovery mode not working any more after OpenShift Update from 4.7.9 to 4.7.11 #323

Closed
npomaroli opened this issue Jun 11, 2021 · 6 comments

@npomaroli

We have a StatefulSet whose pods run Java software that embeds OrientDB 3.1 in a cluster.
We configured Hazelcast to use com.hazelcast.kubernetes.HazelcastKubernetesDiscoveryStrategy with DNS discovery of the Hazelcast service in OpenShift.

Everything worked fine in OpenShift 4.7.9, but after updating to OpenShift 4.7.11 the cluster nodes could no longer find each other. The logged error messages were all of the form: DNS lookup for serviceDns 'service-name' failed: name not found.

After digging into the code, we narrowed the issue down to the method used for DNS resolution in the class com.hazelcast.kubernetes.DnsEndpointResolver. We extracted this method (which uses JNDI and com.sun.jndi.dns.DnsContextFactory) into a sample program and got the following exception:

Exception in thread "main" javax.naming.NameNotFoundException: DNS name not found [response code 3]; remaining name 'service-name'
        at jdk.naming.dns/com.sun.jndi.dns.DnsClient.checkResponseCode(Unknown Source)
        at jdk.naming.dns/com.sun.jndi.dns.DnsClient.isMatchResponse(Unknown Source)
        at jdk.naming.dns/com.sun.jndi.dns.DnsClient.doUdpQuery(Unknown Source)
        at jdk.naming.dns/com.sun.jndi.dns.DnsClient.query(Unknown Source)
        at jdk.naming.dns/com.sun.jndi.dns.Resolver.query(Unknown Source)
        at jdk.naming.dns/com.sun.jndi.dns.DnsContext.c_getAttributes(Unknown Source)
        at java.naming/com.sun.jndi.toolkit.ctx.ComponentDirContext.p_getAttributes(Unknown Source)
        at java.naming/com.sun.jndi.toolkit.ctx.PartialCompositeDirContext.getAttributes(Unknown Source)
        at java.naming/com.sun.jndi.toolkit.ctx.PartialCompositeDirContext.getAttributes(Unknown Source)
        at java.naming/javax.naming.directory.InitialDirContext.getAttributes(Unknown Source)
        at DnsTest.lookup(DnsTest.java:43)
        at DnsTest.main(DnsTest.java:30)

whereas resolving the service DNS name simply by calling

InetAddress.getAllByName(serviceDns);

always worked and returned the IP addresses of all pods.

Is there a reason why DNS resolution is done with JNDI and com.sun.jndi.dns.DnsContextFactory rather than with InetAddress.getAllByName()? Could this be changed?

The JRE we are using is

openjdk version "11.0.11" 2021-04-20
OpenJDK Runtime Environment AdoptOpenJDK-11.0.11+9 (build 11.0.11+9)
OpenJDK 64-Bit Server VM AdoptOpenJDK-11.0.11+9 (build 11.0.11+9, mixed mode) 
@npomaroli

Sample program source:

import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.Hashtable;
import java.util.List;
import java.util.Set;

import javax.naming.Context;
import javax.naming.NamingEnumeration;
import javax.naming.NamingException;
import javax.naming.directory.Attribute;
import javax.naming.directory.Attributes;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;

public class DnsTest {
	protected static DirContext dirContext;

	public static void main(String[] args) throws Exception {
		if (args.length < 1) {
			System.err.println("Give me the hostname!");
			System.exit(-1);
		}
		String host = args[0];

		dirContext = createDirContext(1);
		lookup(host);
	}

	private static DirContext createDirContext(int serviceDnsTimeout) throws NamingException {
		Hashtable<String, String> env = new Hashtable<String, String>();
		env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.dns.DnsContextFactory");
		env.put(Context.PROVIDER_URL, "dns:");
		env.put("com.sun.jndi.dns.timeout.initial", String.valueOf(serviceDnsTimeout * 1000L));
		return new InitialDirContext(env);
	}

	private static List<String> lookup(String serviceDns) throws NamingException, UnknownHostException {
		Set<String> addresses = new HashSet<String>();
		Attributes attributes = dirContext.getAttributes(serviceDns, new String[] { "SRV" });
		Attribute srvAttribute = attributes.get("srv");
		if (srvAttribute != null) {
			NamingEnumeration<?> servers = srvAttribute.getAll();
			while (servers.hasMore()) {
				String server = (String) servers.next();
				String serverHost = extractHost(server);
				InetAddress address = InetAddress.getByName(serverHost);
				addresses.add(address.getHostAddress());
				System.out.println("Found node service with address: " + address);
			}
		}

		if (addresses.size() == 0) {
			System.out.println("Could not find any service for serviceDns '" + serviceDns + "'");
			return Collections.emptyList();
		}

		return new ArrayList<>(addresses);
	}

	/**
	 * Extracts host from the DNS record.
	 * <p>
	 * Sample record: "10 25 0
	 * 6235386366386436.my-release-hazelcast.default.svc.cluster.local".
	 */
	private static String extractHost(String server) {
		String host = server.split(" ")[3];
		// Strip the trailing dot of a fully qualified name ("\\.$" is the regex \.$).
		return host.replaceAll("\\.$", "");
	}
}

Working example:

import java.net.InetAddress;

public class DnsTestPatched {
	public static void main(String[] args) throws Exception {
		if (args.length < 1) {
			System.err.println("Give me the hostname!");
			System.exit(-1);
		}
		String host = args[0];

		for (InetAddress inetAddress : InetAddress.getAllByName(host)) {
			System.out.println(inetAddress.getHostAddress());
		}
	}
}
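
The JNDI sample above sets a one-second timeout through com.sun.jndi.dns.timeout.initial, which InetAddress.getAllByName() does not offer. One way to keep a bound on the lookup is to run the call on an executor and wait on the resulting Future, which also matches the commit message later in this thread about unwrapping UnknownHostException from ExecutionException and handling TimeoutException. The following is only a sketch under that assumption, not the code from the PRs; the class name TimedDnsLookup and the method resolve are hypothetical.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper for illustration; not the implementation merged into hazelcast-kubernetes.
public class TimedDnsLookup {

	/** Resolves serviceDns with the system resolver, giving up after timeoutSeconds. */
	public static List<String> resolve(String serviceDns, long timeoutSeconds)
			throws UnknownHostException, TimeoutException, InterruptedException {
		ExecutorService executor = Executors.newSingleThreadExecutor();
		try {
			// getAllByName() takes no timeout, so run it on a worker thread and bound the wait.
			Future<InetAddress[]> future = executor.submit(() -> InetAddress.getAllByName(serviceDns));
			Set<String> addresses = new LinkedHashSet<>();
			for (InetAddress address : future.get(timeoutSeconds, TimeUnit.SECONDS)) {
				addresses.add(address.getHostAddress());
			}
			return new ArrayList<>(addresses);
		} catch (ExecutionException e) {
			// Unwrap the failure so callers see the original UnknownHostException.
			if (e.getCause() instanceof UnknownHostException) {
				throw (UnknownHostException) e.getCause();
			}
			throw new IllegalStateException("DNS lookup for '" + serviceDns + "' failed", e.getCause());
		} finally {
			// Ask the worker to stop; a lookup blocked in native code may ignore the interrupt.
			executor.shutdownNow();
		}
	}

	public static void main(String[] args) throws Exception {
		if (args.length < 1) {
			System.err.println("Give me the hostname!");
			System.exit(-1);
		}
		for (String address : resolve(args[0], 1)) {
			System.out.println(address);
		}
	}
}
```

A TimeoutException from future.get() propagates to the caller unchanged, so the caller can decide whether to retry or to treat the service as having no members yet.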

leszko self-assigned this Jun 11, 2021
@leszko

leszko commented Jun 11, 2021

I think the solution with InetAddress is fine. Would you mind sending a PR?

leszko removed their assignment Jun 11, 2021
npomaroli added a commit to npomaroli/hazelcast-kubernetes that referenced this issue Jun 14, 2021
@npomaroli

PR is here: #325

npomaroli added a commit to npomaroli/hazelcast-kubernetes that referenced this issue Jun 15, 2021
npomaroli added a commit to npomaroli/hazelcast-kubernetes that referenced this issue Jun 15, 2021
@npomaroli

Here is the other PR: #327

leszko pushed a commit that referenced this issue Jun 15, 2021
* Change DNS lookup method (#323)

* Fix checkstyle errors

* Improve exception handling.

* Unwrap and throw UnknownHostException from ExecutionException
Add testcases for handling UnknownHostException and TimeoutException

* Fix indentation
leszko pushed a second commit with the same message that referenced this issue Jun 15, 2021
leszko closed this as completed Jun 16, 2021
leszko modified the milestones: 2.2.3, 1.5.6 Jun 16, 2021
@npomaroli

Thanks for your work on this issue!

@leszko

leszko commented Jun 18, 2021

The fix is released in versions 1.5.6 and 2.2.3.
