batchId:73 is not the firstly:72 #625

Closed
mennyzfy opened this issue May 7, 2018 · 17 comments

Comments

@mennyzfy

mennyzfy commented May 7, 2018

2018-05-06 10:26:53.805 [destination = example , address = /127.0.0.1:3306 , EventParser] WARN c.a.otter.canal.parse.inbound.mysql.MysqlEventParser - prepare to find start position just last position
{"identity":{"slaveId":-1,"sourceAddress":{"address":"localhost","port":3306}},"postion":{"included":false,"journalName":"mysql-bin.000002","position":26359074,"serverId":1,"timestamp":1525570849000}}
2018-05-06 10:26:53.845 [destination = example , address = /127.0.0.1:3306 , EventParser] WARN c.a.otter.canal.parse.inbound.mysql.MysqlEventParser - find start position : EntryPosition[included=false,journalName=mysql-bin.000002,position=26359074,serverId=1,timestamp=1525570849000]
2018-05-06 10:29:45.154 [New I/O server worker #1-1] ERROR com.alibaba.otter.canal.server.netty.NettyUtils - ErrotCode:400 , Caused by :
something goes wrong with channel:[id: 0x61b78711, /192.168.0.111:60228 => /192.168.0.111:11111], exception=com.alibaba.otter.canal.meta.exception.CanalMetaManagerException: batchId:72 is not the firstly:71

2018-05-06 10:29:45.171 [New I/O server worker #1-1] ERROR c.a.otter.canal.server.netty.handler.SessionHandler - something goes wrong with channel:[id: 0x61b78711, /192.168.0.111:60228 :> /192.168.0.111:11111], exception=java.nio.channels.ClosedChannelException
at org.jboss.netty.channel.socket.nio.NioWorker.cleanUpWriteBuffer(NioWorker.java:649)
at org.jboss.netty.channel.socket.nio.NioWorker.writeFromUserCode(NioWorker.java:370)
at org.jboss.netty.channel.socket.nio.NioServerSocketPipelineSink.handleAcceptedSocket(NioServerSocketPipelineSink.java:137)
at org.jboss.netty.channel.socket.nio.NioServerSocketPipelineSink.eventSunk(NioServerSocketPipelineSink.java:76)
at org.jboss.netty.channel.Channels.write(Channels.java:611)
at org.jboss.netty.channel.Channels.write(Channels.java:578)
at com.alibaba.otter.canal.server.netty.NettyUtils.write(NettyUtils.java:28)
at com.alibaba.otter.canal.server.netty.handler.SessionHandler.messageReceived(SessionHandler.java:144)
at org.jboss.netty.handler.timeout.IdleStateAwareChannelHandler.handleUpstream(IdleStateAwareChannelHandler.java:48)
at org.jboss.netty.handler.timeout.IdleStateHandler.messageReceived(IdleStateHandler.java:276)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:302)
at org.jboss.netty.handler.codec.replay.ReplayingDecoder.unfoldAndfireMessageReceived(ReplayingDecoder.java:526)
at org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:507)
at org.jboss.netty.handler.codec.replay.ReplayingDecoder.cleanup(ReplayingDecoder.java:542)
at org.jboss.netty.handler.codec.replay.ReplayingDecoder.channelDisconnected(ReplayingDecoder.java:450)
at org.jboss.netty.channel.Channels.fireChannelDisconnected(Channels.java:360)
at org.jboss.netty.channel.socket.nio.NioWorker.close(NioWorker.java:599)
at org.jboss.netty.channel.socket.nio.NioServerSocketPipelineSink.handleAcceptedSocket(NioServerSocketPipelineSink.java:119)
at org.jboss.netty.channel.socket.nio.NioServerSocketPipelineSink.eventSunk(NioServerSocketPipelineSink.java:76)
at org.jboss.netty.channel.Channels.close(Channels.java:720)
at org.jboss.netty.channel.AbstractChannel.close(AbstractChannel.java:208)
at org.jboss.netty.channel.ChannelFutureListener$1.operationComplete(ChannelFutureListener.java:46)
at org.jboss.netty.channel.DefaultChannelFuture.notifyListener(DefaultChannelFuture.java:381)
at org.jboss.netty.channel.DefaultChannelFuture.addListener(DefaultChannelFuture.java:148)
at com.alibaba.otter.canal.server.netty.NettyUtils.write(NettyUtils.java:30)
at com.alibaba.otter.canal.server.netty.NettyUtils.error(NettyUtils.java:51)
at com.alibaba.otter.canal.server.netty.handler.SessionHandler.messageReceived(SessionHandler.java:200)
at org.jboss.netty.handler.timeout.IdleStateAwareChannelHandler.handleUpstream(IdleStateAwareChannelHandler.java:48)
at org.jboss.netty.handler.timeout.IdleStateHandler.messageReceived(IdleStateHandler.java:276)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:302)
at org.jboss.netty.handler.codec.replay.ReplayingDecoder.unfoldAndfireMessageReceived(ReplayingDecoder.java:526)
at org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:507)
at org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:444)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:274)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:261)
at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:350)
at org.jboss.netty.channel.socket.nio.NioWorker.processSelectedKeys(NioWorker.java:281)
at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:201)
at org.jboss.netty.util.internal.IoWorkerRunnable.run(IoWorkerRunnable.java:46)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

@agapple
Member

agapple commented May 7, 2018

The acks were not sent in order.

@mennyzfy
Author

mennyzfy commented May 8, 2018

Not acked in order? What does that mean? Could you explain in more detail? Thanks.

@zwangbo

zwangbo commented May 8, 2018

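@mennyzfy When you call getWithoutAck to fetch a Message, an incrementing batchId is generated and assigned to the Message's id field. When acking, the client has to make sure it acks messages in the same order in which it fetched them.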
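
To make the rule above concrete, here is a minimal toy model of in-order ack checking. It is only a sketch and not canal's actual code: `BatchOrderModel`, `get` and `ack` are hypothetical names, and the only thing borrowed from the thread is the wording of the error message.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model of in-order ack checking; hypothetical, not canal's actual code. */
final class BatchOrderModel {
    private long nextBatchId = 1;
    // batchIds that were handed out but not yet acked, oldest first
    private final Deque<Long> outstanding = new ArrayDeque<>();

    /** Models getWithoutAck: hands out an incrementing batchId and remembers it. */
    long get() {
        long id = nextBatchId++;
        outstanding.addLast(id);
        return id;
    }

    /** Models ack: only the oldest outstanding batchId may be acked. */
    void ack(long batchId) {
        Long first = outstanding.peekFirst();
        if (first == null || first != batchId) {
            throw new IllegalStateException(
                "batchId:" + batchId + " is not the firstly:" + first);
        }
        outstanding.removeFirst();
    }

    public static void main(String[] args) {
        BatchOrderModel server = new BatchOrderModel();
        long a = server.get(); // 1
        long b = server.get(); // 2
        try {
            server.ack(b); // out of order: batch 1 is still outstanding
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // batchId:2 is not the firstly:1
        }
        server.ack(a); // in order, accepted
        server.ack(b); // now batch 2 is the oldest, accepted
    }
}
```

In this model the failing ack leaves the outstanding list untouched, so acking the batches afterwards in fetch order still succeeds.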

@mennyzfy
Author

mennyzfy commented May 8, 2018

@zwangbo I ran it following the demo. Here is the code:

    while (running) {
        try {
            MDC.put("destination", destination);
            connector.connect();
            connector.subscribe("");
            while (running) {
                Message message = connector.getWithoutAck(batchSize); // fetch up to batchSize entries
                long batchId = message.getId();
                int size = message.getEntries().size();
                if (batchId == -1 || size == 0) {
                    try {
                        Thread.sleep(1000);
                    } catch (InterruptedException e) {
                    }
                } else {
                    resolveEntry(message.getEntries());
                }
                connector.ack(batchId); // commit the ack
            }
        } catch (Exception e) {
            logger.error("process error!", e);
        } finally {
            connector.disconnect();
            MDC.remove("destination");
        }
    }

The ack is only sent after the message has been consumed, and everything runs in a single thread, so why would the order still differ?

@cjj137783

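@agapple It is indeed caused by acks not being sent in order, but this effectively costs me a lot of throughput and TPS. I was processing with multiple threads, and now I can only process with a single thread.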

@zwangbo

zwangbo commented Jul 3, 2018

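@cjj137783 For this you could do the processing asynchronously yourself and add a buffer that controls the ack order and makes sure no ack is lost. That said, I think it would also be nice to have this implemented on the canal side.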

@agapple
Member

agapple commented Jul 16, 2018

That is what otter does: it uses an asynchronous buffer to ack batchIds in order.
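
As a rough illustration of such a buffer, here is a minimal sketch. It is not otter's implementation: `OrderedAckBuffer`, `onFetched` and `onProcessed` are hypothetical names, and the `acker` callback stands in for whatever actually sends the ack (for example a call to `connector.ack`). Whether the connector may safely be called from worker threads is an assumption this sketch does not settle; if it may not, the released batchIds would have to be handed back to the fetching thread instead.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.LongConsumer;

/** Hypothetical helper: batches may finish in any order, acks go out in fetch order. */
final class OrderedAckBuffer {
    private final Deque<Long> fetched = new ArrayDeque<>(); // batchIds in fetch order
    private final Set<Long> done = new HashSet<>();          // processed but not yet acked
    private final LongConsumer acker;                        // sends the real ack

    OrderedAckBuffer(LongConsumer acker) {
        this.acker = acker;
    }

    /** Call right after fetching a batch, in fetch order. */
    synchronized void onFetched(long batchId) {
        fetched.addLast(batchId);
    }

    /** Call from any worker once a batch has been fully processed. */
    synchronized void onProcessed(long batchId) {
        done.add(batchId);
        // Ack the longest finished prefix, oldest batch first.
        while (!fetched.isEmpty() && done.remove(fetched.peekFirst())) {
            acker.accept(fetched.removeFirst());
        }
    }

    public static void main(String[] args) {
        List<Long> acked = new ArrayList<>();
        OrderedAckBuffer buffer = new OrderedAckBuffer(id -> acked.add(id));
        buffer.onFetched(1);
        buffer.onFetched(2);
        buffer.onFetched(3);
        buffer.onProcessed(2); // held back: batch 1 is not finished yet
        buffer.onProcessed(1); // releases 1, then 2
        buffer.onProcessed(3); // releases 3
        System.out.println(acked); // [1, 2, 3]
    }
}
```

A finished batch is never dropped: it stays in `done` until every earlier batch has been processed, which is the "no ack is lost" half of the idea. Unprocessed batches accumulate in `fetched`, so a real client would also want to cap how far the fetcher runs ahead of the slowest worker.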

@LongWarren

@mennyzfy How did you solve the problem?

@LongWarren

Why would a batchId get skipped?

@agapple
Member

agapple commented Jul 31, 2018

@agapple agapple closed this as completed Jul 31, 2018
@Fangioo

Fangioo commented Mar 12, 2019

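Q5: You see batchId:73 is not the firstly:72 or clientId:1001 batchId:50560 is not exist , please check

A5:

The first cause is a problem with how the client acks: the first two exceptions mean the client did not ack the corresponding batchIds in order.
The last one means the batchId being acked has already been cleaned up on the server. PS: the server only cleans up for two reasons:
the client issued a rollback
the instance was restarted on the server side, for example because scan=true detected a file change and restarted it automatically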
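
The "is not exist" case from the answer above can be pictured with a small toy model. This is a sketch under assumptions, not canal's code: `RollbackModel` and its methods are hypothetical, and the rollback or restart is modeled simply as forgetting every outstanding batchId.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of acking a batch the server has already cleaned up; hypothetical. */
final class RollbackModel {
    private long nextBatchId = 1;
    private final List<Long> outstanding = new ArrayList<>();

    /** Hands out an incrementing batchId and remembers it as outstanding. */
    long get() {
        long id = nextBatchId++;
        outstanding.add(id);
        return id;
    }

    /** Models a client rollback or an instance restart: all outstanding batches are forgotten. */
    void rollback() {
        outstanding.clear();
    }

    /** Rejects an ack for a batchId the model no longer knows about. */
    void ack(long batchId) {
        if (!outstanding.remove(Long.valueOf(batchId))) {
            throw new IllegalStateException(
                "batchId:" + batchId + " is not exist , please check");
        }
    }

    public static void main(String[] args) {
        RollbackModel server = new RollbackModel();
        long id = server.get(); // 1
        server.rollback();      // batch 1 is cleaned up before the client acks it
        try {
            server.ack(id);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // batchId:1 is not exist , please check
        }
    }
}
```

In this picture the client's ack is late rather than out of order, so the natural client-side reaction is to fetch again after the rollback or restart instead of retrying the stale ack.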

2019-03-11 17:30:59.443 [pool-3-thread-1] ERROR com.alibaba.otter.canal.server.CanalMQStarter - batchId:4 is not the firstly:3
com.alibaba.otter.canal.meta.exception.CanalMetaManagerException: batchId:4 is not the firstly:3
2019-03-11 17:30:59.443 [pool-3-thread-1] ERROR com.alibaba.otter.canal.server.CanalMQStarter - batchId:4 is not the firstly:3
com.alibaba.otter.canal.meta.exception.CanalMetaManagerException: batchId:4 is not the firstly:3
2019-03-11 17:31:03.150 [pool-3-thread-1] ERROR com.alibaba.otter.canal.server.CanalMQStarter - batchId:5 is not the firstly:3
com.alibaba.otter.canal.meta.exception.CanalMetaManagerException: batchId:5 is not the firstly:3
2019-03-11 17:31:03.150 [pool-3-thread-1] ERROR com.alibaba.otter.canal.server.CanalMQStarter - batchId:5 is not the firstly:3
com.alibaba.otter.canal.meta.exception.CanalMetaManagerException: batchId:5 is not the firstly:3
2019-03-11 17:31:06.558 [pool-3-thread-1] ERROR com.alibaba.otter.canal.server.CanalMQStarter - batchId:7 is not the firstly:3
com.alibaba.otter.canal.meta.exception.CanalMetaManagerException: batchId:7 is not the firstly:3
2019-03-11 17:31:06.558 [pool-3-thread-1] ERROR com.alibaba.otter.canal.server.CanalMQStarter - batchId:7 is not the firstly:3
com.alibaba.otter.canal.meta.exception.CanalMetaManagerException: batchId:7 is not the firstly:3

This only states the cause. What is the solution?

@agapple
Member

agapple commented Mar 21, 2019

Try the latest version, 1.1.3. Early 1.1.x versions could miss acks.

@suzawalee

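> Try the latest version, 1.1.3. Early 1.1.x versions could miss acks.

1: no rollback
2: scan=false
canal: 1.1.3, mysql: 5.8, environment: Win10, with the server and the clients all running on the local machine. When several clients are running, it still reports batchid is not the firstly.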

@agapple
Member

agapple commented Jun 26, 2019

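Multiple clients have to use cluster mode. Otherwise two clients each perform a get, their acks arrive in a different order than the gets, and the error is raised.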

@Xlinlin

Xlinlin commented Jul 23, 2019

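Deleting meta.dat and restarting fixed it. In our scenario canal only feeds a cache; if strong data consistency were required, this would not be a good fix.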

@DyzYpp

DyzYpp commented Sep 3, 2021

So how do we solve it? Most replies only give the cause, the ack order problem. What is the concrete fix?

@swatchion

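> Deleting meta.dat and restarting fixed it. In our scenario canal only feeds a cache; if strong data consistency were required, this would not be a good fix.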

+1
