Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lighty.io could not handle many incoming netconf callhome connections #1655

Open
sekobey opened this issue Nov 21, 2023 · 10 comments
Open

Lighty.io could not handle many incoming netconf callhome connections #1655

sekobey opened this issue Nov 21, 2023 · 10 comments
Assignees

Comments

@sekobey
Copy link

sekobey commented Nov 21, 2023

Hi,

Describe the bug
One of our microservices (running in docker container) is using lighty.io v15.1. This service is responsible for handling incoming device connections. Devices perform netconf callhome connection after SSH connection between the device and lighty.io is performed. When we run 4500 devices and the microservice in the same network in AWS (network latency is around 0,386 ms), all devices connect to the service successfully in 10 minutes. But when we moved microservice to another AWS region where network latency is around 180ms, all of these devices could not connect in a reasonable time. It takes hours to connect all of them. But when we first run 1500 devices, they connect in 10 minutes. After that we run another 1500 devices, again they connect in 10 minutes etc. However, when we run 4500 devices at the same time, they cannot connect in 1 or 2 hours.
All the device SSH public keys are loaded into the Allowed Devices store of lighty.io at the startup of the microservice. So, we see that SSH connections are performed well. We overrided onDataTreeChanged method of DataTreeChangeListener interface to get changes in topology tree when doing netconf callhome connection between device and microservice as follows:

@Service
public class CallHomeListenerService implements DataTreeChangeListener<Node> {

  public static final InstanceIdentifier<Topology> NETCONF_TOPO_IID =
      InstanceIdentifier.create(NetworkTopology.class).child(Topology.class,
              new TopologyKey(new TopologyId(TopologyNetconf.QNAME.getLocalName())));
  
  private static final Logger LOGGER = LoggerFactory.getLogger(CallHomeListenerService.class);

  @SuppressWarnings("unused")
  private DataBroker dataBroker;
  
  private DeviceHandlerRegistry deviceHandlerRegistry;
  
  private DeviceHandlerFactory deviceHandlerFactory;
  
  private Utils utils;
  
  private String ngwIdentifier;

  private NetconfDataBrokerService netconfDataBrokerService;

  @Autowired
  public CallHomeListenerService(@Qualifier("BindingDataBroker") DataBroker dataBroker, DeviceHandlerFactory deviceHandlerFactory,
      DeviceHandlerRegistry deviceHandlerRegistry, NetconfDataBrokerService netconfDataBrokerService, Utils utils) {
    this.utils = utils;
    this.dataBroker = dataBroker;
    this.deviceHandlerFactory = deviceHandlerFactory;
    this.deviceHandlerRegistry = deviceHandlerRegistry;
    this.netconfDataBrokerService = netconfDataBrokerService;
    this.ngwIdentifier = utils.getNGWIdentifier();
    dataBroker.registerDataTreeChangeListener(DataTreeIdentifier.create(LogicalDatastoreType.OPERATIONAL, NETCONF_TOPO_IID.child(Node.class)), this);
  }
  
  @Override
  public void onDataTreeChanged(@NonNull Collection<DataTreeModification<Node>> changes) {
    
    DataObjectModification<Node> rootNode;
    InstanceIdentifier<Node> instanceIdentifier;
    NodeId nodeId;
    NetconfNode netconfNode;
    ConnectionStatus connectionStatus;
    String serialNumber;
    Optional<Products> productModel;

    for (DataTreeModification<Node> change : changes) {
      rootNode = change.getRootNode();
      instanceIdentifier = change.getRootPath().getRootIdentifier();
      nodeId = getNodeId(instanceIdentifier);
      serialNumber = nodeId.getValue();
      
      productModel = utils.getProductModel(serialNumber);

      if (!productModel.isPresent()) {
        LOGGER.warn("Undefined product model for device [{}]", serialNumber);
        continue;
      }

      switch (rootNode.getModificationType()) {
        case WRITE:
        case SUBTREE_MODIFIED:
          netconfNode = rootNode.getDataAfter().augmentation(NetconfNode.class);
          connectionStatus = netconfNode.getConnectionStatus();
          if (connectionStatus == ConnectionStatus.Connecting) {
            LOGGER.info("Netconf device {} connecting...", serialNumber);
          } else if (connectionStatus == ConnectionStatus.Connected) {
            LOGGER.info("Netconf device {} connected", serialNumber);
            netconfDataBrokerService.asyncProcessCallHomeListenerConnectedState(serialNumber, ngwIdentifier, productModel.get().getLabel());
          } else if (connectionStatus == ConnectionStatus.UnableToConnect) {
            LOGGER.info("Netconf device {} could not connect", serialNumber);
          } else {
            LOGGER.info("Undefined connections state {} for device {}", connectionStatus, serialNumber);
          }
          break;
        case DELETE:
          if (isNetconfNode(rootNode.getDataBefore())) {
            LOGGER.info("Netconf device {} disconnected", serialNumber);
            netconfDataBrokerService.asyncProcessCallHomeListenerDisconnectedState(serialNumber, ngwIdentifier);
          }
          break;
        default:
          break;
      }
    }
  }
  
  private static boolean isNetconfNode(final Node node) {
    return node.augmentation(NetconfNode.class) != null;
  }
  
  private NodeId getNodeId(final InstanceIdentifier<?> path) {
    NodeKey key = path.firstKeyOf(Node.class);
    return key != null ? key.getNodeId() : null;
  }
}

When we run 4500 devices at the same time, we see "Netconf device connecting..." and "Netconf device disconnected" logs consecutively very much in the microservice logs for the same device in high latency network (~180 ms). But we do not see these logs in low latency network (~0,386 ms).

What can be the reason? Is there a configuration parameter at lighty.io to handle this situation? Like increasing some timeout value, maybe this is happening because of network latency?

Thank you.

Branch
lighty.io branch 15.1

Expected behavior
A clear and concise description of what you expected to happen.

Screenshots
If applicable, add screenshots to help explain your problem.

Environment (please complete the following information):

  • OS: Ubuntu 20.04
  • Java Version 11.0.13
  • Maven version 3.6.3

Additional context
Add any other context about the problem here.

Thank you for creating your report! We will get back to you, as soon as possible. Please note, that our support on GitHub is based on a non-guaranteed best effort. As soon as we are able, we will get back to you.

If you are in a hurry and have an inquiry regarding commercial support, please via this contact form: https://pantheon.tech/contact-us

@ihrasko
Copy link
Collaborator

ihrasko commented Nov 23, 2023

Hi,

the configuration for every call-home device is hard-coded here: https://github.com/opendaylight/netconf/blob/a2d875954fbf4a82dc0f3b91fa360dfa96ee213a/apps/callhome-provider/src/main/java/org/opendaylight/netconf/callhome/mount/CallHomeMountSessionContext.java#L78.

You can try to create "real" devices with that configuration to see if they are connecting and disconnecting consecutively in a high latency network.

If yes, you can find out the values which makes connection stable.

Then, to solve this bug an issue report into ODL netconf project is needed.

Best

@ihrasko ihrasko self-assigned this Nov 23, 2023
@rkashapov-bf
Copy link

rkashapov-bf commented Nov 23, 2023

Actually the server configuration is using netty bootstrap defaults, so these can be updated here

https://github.com/opendaylight/netconf/blob/a2d875954fbf4a82dc0f3b91fa360dfa96ee213a/apps/callhome-provider/src/main/java/org/opendaylight/netconf/callhome/mount/IetfZeroTouchCallHomeServerProvider.java#L110C1-L122C6

the code referenced by Ivan is used to create the topology node entry to track the device status,
configuration there do not affect the connection in any way

@ihrasko
Copy link
Collaborator

ihrasko commented Nov 27, 2023

@sekobey can you please provide logs to investigate why devices have been disconnected?

@ihrasko
Copy link
Collaborator

ihrasko commented Nov 27, 2023

Additionally are you able to test with lighty.io version 15.2? To be sure you have fix for https://jira.opendaylight.org/browse/NETCONF-832. In version 15.1 call-home devices are not sending keep-alive RPCs to keep connection up.

@hbilling
Copy link

Hi Experts,

There are a few questions about the code pasted above.

  1. The onDataTreeChanged() API is accepting a collection as the argument. But it has been observed that it always contains only one element, despite there being thousands of devices connecting at the same time. Is this expected by design? Or there needs to be some tuning done in the lightyio/ODL side to get the callback in batches with multiple elements in one callback?

  2. Also, there is one more observation. For a particular serial Number(Device ie the Netconf Server) , we observe the following logs .

At t0 time,
LOGGER.info("Netconf device {} connecting...", serialNumber);
At about t0+10 minutes,
LOGGER.info("Netconf device {} disconnected", serialNumber);

Would connecting log mean that the underlying TCP connection (from device to controller) + SSH connection + the Netconf connection are OK. But there is an issue thereafter (probably when the yang models/schema from device are being downloaded) which causes the device to move from connecting to disconnected state?

  1. A related observation is that the schema download seems to be happening also in a serial fashion, one after the other. Is there a configuration switch that should make it parallel to speed up the things?

@ihrasko as requested, the logs should be updated as well, today in a few hours for more in-depth analysis into this.

@fatihozkan51200
Copy link

Hi @ihrasko ,
We have tested our issue with version 15.2.0 and we have faced with same problem and same error logs. I am sharing two states of device's in log, one is UnableToConnect , the other one is disconnection(DELETE from tree) .

UnableToConnect:
`2023-11-28 13:29:50.769 INFO 1 .m.n.s.CallHomeListenerService : Netconf device 2150001479 connecting...
2023-11-28 13:29:50.770 ERROR 1 .n.t.s.AbstractNetconfTopology : Connector for 2150001479 failed

java.io.IOException: Broken pipe
at java.base/sun.nio.ch.FileDispatcherImpl.write0(Native Method)
at java.base/sun.nio.ch.SocketDispatcher.write(Unknown Source)
at java.base/sun.nio.ch.IOUtil.writeFromNativeBuffer(Unknown Source)
at java.base/sun.nio.ch.IOUtil.write(Unknown Source)
at java.base/sun.nio.ch.IOUtil.write(Unknown Source)
at java.base/sun.nio.ch.SocketChannelImpl.write(Unknown Source)
at io.netty.channel.socket.nio.NioSocketChannel.doWrite(NioSocketChannel.java:408)
at io.netty.channel.AbstractChannel$AbstractUnsafe.flush0(AbstractChannel.java:949)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.flush0(AbstractNioChannel.java:354)
at io.netty.channel.AbstractChannel$AbstractUnsafe.flush(AbstractChannel.java:913)
at io.netty.channel.DefaultChannelPipeline$HeadContext.flush(DefaultChannelPipeline.java:1372)
at io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:750)
at io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:765)
at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:790)
at io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:758)
at org.opendaylight.netconf.shaded.sshd.netty.NettyIoSession.lambda$writeBuffer$2(NettyIoSession.java:148)
at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:578)
at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:552)
at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:491)
at io.netty.util.concurrent.DefaultPromise.addListener(DefaultPromise.java:184)
at io.netty.channel.DefaultChannelPromise.addListener(DefaultChannelPromise.java:95)
at io.netty.channel.DefaultChannelPromise.addListener(DefaultChannelPromise.java:30)
at org.opendaylight.netconf.shaded.sshd.netty.NettyIoSession.writeBuffer(NettyIoSession.java:146)
at org.opendaylight.netconf.shaded.sshd.common.session.helpers.AbstractSession.doWritePacket(AbstractSession.java:1051)
at org.opendaylight.netconf.shaded.sshd.common.session.helpers.AbstractSession.writePacket(AbstractSession.java:946)
at org.opendaylight.netconf.shaded.sshd.common.channel.AbstractChannel.writePacket(AbstractChannel.java:774)
at org.opendaylight.netconf.shaded.sshd.common.channel.throttle.DefaultChannelStreamWriter.writeData(DefaultChannelStreamWriter.java:46)
at org.opendaylight.netconf.shaded.sshd.common.channel.ChannelAsyncOutputStream.doWriteIfPossible(ChannelAsyncOutputStream.java:209)
at org.opendaylight.netconf.shaded.sshd.common.channel.ChannelAsyncOutputStream.writeBuffer(ChannelAsyncOutputStream.java:106)
at org.opendaylight.netconf.nettyutil.handler.ssh.client.AsyncSshHandlerWriter.writeWithPendingDetection(AsyncSshHandlerWriter.java:100)
at org.opendaylight.netconf.nettyutil.handler.ssh.client.AsyncSshHandlerWriter.write(AsyncSshHandlerWriter.java:83)
at org.opendaylight.netconf.callhome.protocol.MinaSshNettyChannel$1.write(MinaSshNettyChannel.java:59)
at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:717)
at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:709)
at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:792)
at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:702)
at io.netty.handler.codec.MessageToByteEncoder.write(MessageToByteEncoder.java:113)
at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:717)
at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:709)
at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:792)
at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:702)
at io.netty.handler.codec.MessageToByteEncoder.write(MessageToByteEncoder.java:113)
at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:717)
at io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:764)
at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:790)
at io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:758)
at io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:808)
at io.netty.channel.DefaultChannelPipeline.writeAndFlush(DefaultChannelPipeline.java:1025)
at io.netty.channel.AbstractChannel.writeAndFlush(AbstractChannel.java:306)
at org.opendaylight.netconf.nettyutil.AbstractNetconfSessionNegotiator.sendMessage(AbstractNetconfSessionNegotiator.java:295)
at org.opendaylight.netconf.nettyutil.AbstractNetconfSessionNegotiator.start(AbstractNetconfSessionNegotiator.java:118)
at org.opendaylight.netconf.nettyutil.AbstractNetconfSessionNegotiator.startNegotiation(AbstractNetconfSessionNegotiator.java:94)
at org.opendaylight.netconf.nettyutil.AbstractNetconfSessionNegotiator.channelActive(AbstractNetconfSessionNegotiator.java:310)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:230)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:216)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelActive(AbstractChannelHandlerContext.java:209)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelActive(DefaultChannelPipeline.java:1398)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:230)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:216)
at io.netty.channel.DefaultChannelPipeline.fireChannelActive(DefaultChannelPipeline.java:895)
at io.netty.channel.AbstractChannel$AbstractUnsafe.register0(AbstractChannel.java:522)
at io.netty.channel.AbstractChannel$AbstractUnsafe.register(AbstractChannel.java:480)
at io.netty.channel.SingleThreadEventLoop.register(SingleThreadEventLoop.java:87)
at io.netty.channel.SingleThreadEventLoop.register(SingleThreadEventLoop.java:81)
at io.netty.channel.MultithreadEventLoopGroup.register(MultithreadEventLoopGroup.java:86)
at org.opendaylight.netconf.callhome.protocol.CallHomeSessionContext.doActivate(CallHomeSessionContext.java:120)
at org.opendaylight.netconf.callhome.protocol.CallHomeSessionContext.lambda$newSshFutureListener$0(CallHomeSessionContext.java:86)
at org.opendaylight.netconf.callhome.mount.CallHomeMountSessionContext.activateNetconfChannel(CallHomeMountSessionContext.java:109)
at org.opendaylight.netconf.callhome.mount.CallHomeMountDispatcher.activateChannel(CallHomeMountDispatcher.java:105)
at org.opendaylight.netconf.callhome.mount.CallHomeMountDispatcher.createReconnectingClient(CallHomeMountDispatcher.java:97)
at org.opendaylight.netconf.sal.connect.netconf.listener.NetconfDeviceCommunicator.initializeRemoteConnection(NetconfDeviceCommunicator.java:161)
at org.opendaylight.netconf.topology.spi.AbstractNetconfTopology.setupConnection(AbstractNetconfTopology.java:179)
at org.opendaylight.netconf.topology.spi.AbstractNetconfTopology.connectNode(AbstractNetconfTopology.java:134)
at org.opendaylight.netconf.callhome.mount.CallHomeMountDispatcher.onNetconfSubsystemOpened(CallHomeMountDispatcher.java:123)
at org.opendaylight.netconf.callhome.protocol.CallHomeSessionContext.lambda$newSshFutureListener$1(CallHomeSessionContext.java:85)
at org.opendaylight.netconf.shaded.sshd.common.future.AbstractSshFuture.notifyListener(AbstractSshFuture.java:159)
at org.opendaylight.netconf.shaded.sshd.common.future.DefaultSshFuture.notifyListeners(DefaultSshFuture.java:209)
at org.opendaylight.netconf.shaded.sshd.common.future.DefaultSshFuture.setValue(DefaultSshFuture.java:112)
at org.opendaylight.netconf.shaded.sshd.client.future.DefaultOpenFuture.setOpened(DefaultOpenFuture.java:68)
at org.opendaylight.netconf.shaded.sshd.client.channel.AbstractClientChannel.handleOpenSuccess(AbstractClientChannel.java:360)
at org.opendaylight.netconf.shaded.sshd.common.session.helpers.AbstractConnectionService.channelOpenConfirmation(AbstractConnectionService.java:545)
at org.opendaylight.netconf.shaded.sshd.common.session.helpers.AbstractConnectionService.process(AbstractConnectionService.java:456)
at org.opendaylight.netconf.shaded.sshd.common.session.helpers.AbstractSession.doHandleMessage(AbstractSession.java:526)
at org.opendaylight.netconf.shaded.sshd.common.session.helpers.AbstractSession.handleMessage(AbstractSession.java:452)
at org.opendaylight.netconf.shaded.sshd.common.session.helpers.AbstractSession.decode(AbstractSession.java:1524)
at org.opendaylight.netconf.shaded.sshd.common.session.helpers.AbstractSession.messageReceived(AbstractSession.java:412)
at org.opendaylight.netconf.shaded.sshd.common.session.helpers.AbstractSessionIoHandler.messageReceived(AbstractSessionIoHandler.java:64)
at org.opendaylight.netconf.shaded.sshd.netty.NettyIoSession.channelRead(NettyIoSession.java:252)
at org.opendaylight.netconf.shaded.sshd.netty.NettyIoSession$Adapter.channelRead(NettyIoSession.java:296)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:719)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:655)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:581)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Unknown Source)

2023-11-28 13:29:50.770 INFO 1 o.n.c.m.CallhomeStatusReporter : Setting failed status for callhome device id:Uri{_value=2150001479}.
2023-11-28 13:29:50.770 INFO 1 o.n.c.m.CallhomeStatusReporter : Setting failed status for callhome device id:Uri{_value=2150001479}.
2023-11-28 13:29:50.771 INFO 1 o.n.c.m.CallhomeStatusReporter : Setting failed status for callhome device id:Uri{_value=2150001479}.
2023-11-28 13:29:50.771 INFO 1 .m.n.s.CallHomeListenerService : Netconf device 2150001479 could not connect`

Disconnect:
`2023-11-28 13:31:07.657 INFO 1 .n.c.m.CallHomeMountDispatcher : Provisioning fake config Node{nodeId=Uri{_value=3600010593}, augmentation=[NetconfNode{actorResponseWaitTime=5, betweenAttemptsTimeoutMillis=2000, concurrentRpcLimit=0, connectionTimeoutMillis=20000, credentials=LoginPassword{password=omitted, username=omitted}, defaultRequestTimeoutMillis=60000, host=Host{_ipAddress=IpAddress{_ipv4Address=Ipv4Address{_value=172.23.0.5}}}, keepaliveDelay=120, maxConnectionAttempts=0, port=PortNumber{_value=50326}, protocol=Protocol{name=SSH}, reconnectOnChangedSchema=false, schemaless=false, sleepFactor=1.5, tcpOnly=false}]}
2023-11-28 13:31:07.657 INFO 1 .n.t.s.AbstractNetconfTopology : Connecting RemoteDevice{Uri{_value=3600010593}} , with config Node{nodeId=Uri{_value=3600010593}, augmentation=[NetconfNode{actorResponseWaitTime=5, betweenAttemptsTimeoutMillis=2000, concurrentRpcLimit=0, connectionTimeoutMillis=20000, credentials=***, defaultRequestTimeoutMillis=60000, host=Host{_ipAddress=IpAddress{_ipv4Address=Ipv4Address{_value=172.23.0.5}}}, keepaliveDelay=120, maxConnectionAttempts=0, port=PortNumber{_value=50326}, protocol=Protocol{name=SSH}, reconnectOnChangedSchema=false, schemaless=false, sleepFactor=1.5, tcpOnly=false}]}
2023-11-28 13:31:07.657 INFO 1 .n.t.s.AbstractNetconfTopology : Adding keepalive facade, for device Uri{_value=3600010593}
2023-11-28 13:31:07.657 WARN 1 i.DefaultSchemaResourceManager : schema-cache-directory for 3600010593 is null or empty; using the default schema
2023-11-28 13:31:07.657 INFO 1 .n.t.s.AbstractNetconfTopology : Concurrent rpc limit is smaller than 1, no limit will be enforced for device RemoteDevice{3600010593}
2023-11-28 13:31:07.658 INFO 1 o.n.c.m.CallhomeStatusReporter : Setting failed status for callhome device id:Uri{_value=3600010593}.
2023-11-28 13:31:07.658 INFO 1 o.n.c.m.CallhomeStatusReporter : Setting failed status for callhome device id:Uri{_value=3600010593}.
2023-11-28 13:31:07.658 INFO 1 o.n.c.m.CallhomeStatusReporter : Setting failed status for callhome device id:Uri{_value=3600010593}.

2023-11-28 13:31:07.658 INFO 1 .m.n.s.CallHomeListenerService : Netconf device 3600010593 connecting...

2023-11-28 13:33:22.976 INFO 1 .n.c.m.CallHomeMountDispatcher : Removing Uri{_value=3600010593} from Netconf Topology.
2023-11-28 13:33:22.977 WARN 1 .n.l.NetconfDeviceCommunicator : RemoteDevice{3600010593}: Session terminated Session closed
2023-11-28 13:33:22.977 INFO 1 .n.c.m.CallHomeMountDispatcher : Removing Uri{_value=3600010593} from Netconf Topology.
2023-11-28 13:33:22.978 INFO 1 o.n.c.m.CallhomeStatusReporter : Setting failed status for callhome device id:Uri{_value=3600010593}.
2023-11-28 13:33:22.978 INFO 1 o.n.c.m.CallhomeStatusReporter : Setting failed status for callhome device id:Uri{_value=3600010593}.
2023-11-28 13:33:22.978 INFO 1 o.n.c.m.CallhomeStatusReporter : Setting failed status for callhome device id:Uri{_value=3600010593}.
2023-11-28 13:33:23.004 INFO 1 o.n.c.m.CallhomeStatusReporter : Setting disconnected status for callhome device id:Uri{_value=3600010593}.
2023-11-28 13:33:23.004 INFO 1 o.n.c.m.CallhomeStatusReporter : Setting disconnected status for callhome device id:Uri{_value=3600010593}.
2023-11-28 13:33:23.004 INFO 1 o.n.c.m.CallhomeStatusReporter : Setting disconnected status for callhome device id:Uri{_value=3600010593}.

2023-11-28 13:33:23.005 INFO 1 .n.l.NetconfDeviceCommunicator : RemoteDevice{3600010593}: Session went down: End of input
2023-11-28 13:33:23.010 INFO 1 .m.n.s.CallHomeListenerService : Netconf device 3600010593 connecting...
2023-11-28 13:33:23.010 INFO 1 .m.n.s.CallHomeListenerService : Netconf device 3600010593 disconnected
2023-11-28 13:33:23.019 INFO 1 .m.n.s.d.DeviceHandlerRegistry : Device: 3600010593 is removed.
`

@ihrasko
Copy link
Collaborator

ihrasko commented Nov 29, 2023

@hbilling

For point 1: there is no way you can guarantee your devices are connecting/connected/unconnected at the same time (their status in operational data store is updated on the same time). For example if you check org.opendaylight.netconf.topology.impl.NetconfTopologyImpl class and make PUT request into configuration with multiple devices you get collection of how many devices you have inserted. There is no tunning for this - it depends on the events itself.

For point 2: connection termination after 10 minutes has been resolved in https://jira.opendaylight.org/browse/NETCONF-832 (lighty.io 15.2.0).

For point 3: the schema resolution is run for each device in separate thread (its like that by implementation):

Schema setup for: RemoteDevice{17833-sim-device} in instance: org.opendaylight.netconf.sal.connect.netconf.NetconfDevice@d3ac375 in thread: globalWorkerGroup-3-10
Schema setup for: RemoteDevice{17832-sim-device} in instance: org.opendaylight.netconf.sal.connect.netconf.NetconfDevice@57eba02f in thread: globalWorkerGroup-3-1
Schema setup for: RemoteDevice{17830-sim-device} in instance: org.opendaylight.netconf.sal.connect.netconf.NetconfDevice@637cbb49 in thread: globalWorkerGroup-3-11
Schema setup for: RemoteDevice{17831-sim-device} in instance: org.opendaylight.netconf.sal.connect.netconf.NetconfDevice@5b47be52 in thread: globalWorkerGroup-3-12
Schema setup for: RemoteDevice{17834-sim-device} in instance: org.opendaylight.netconf.sal.connect.netconf.NetconfDevice@402c131f in thread: globalWorkerGroup-3-2

@hbilling
Copy link

Thanks @ihrasko We are executing tests now with 15.2.0 version and will see if that helps. Appreciate the response on my queries.

@ihrasko
Copy link
Collaborator

ihrasko commented Nov 30, 2023

@fatihozkan51200 From logs I assume that the device has been never successfully connected (even for "reconnecting" scenario). It suddenly gets removed, that can happen when:

  • you remove device from allowed devices
  • device has changed port which is used for calling home
  • it has been dropped by 3rd party logic (are you using some?)

What is the state of 'http://[ODL_IP_ADDRESS]:8181/rests/data/odl-netconf-callhome-server:netconf-callhome-server'. Are you able to see 4500 device there?

What are the logs from one of connecting devices?

@ihrasko
Copy link
Collaborator

ihrasko commented Dec 1, 2023

Yes, IMO the solution here can really be make call-home devices defaults configurable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants