Route validate as invalid using fort 1.5.3 while cloudflare/routinator validate unknown/valid #98

Open
beego89 opened this issue Jul 14, 2023 · 12 comments

@beego89

beego89 commented Jul 14, 2023

hi,

I'm having an issue when using Fort: routes validate as invalid. Fort's records also differ from the whois database.

any help?

tq

@lukastribus

An RPKI validator like Fort or Routinator does not decide whether a route is invalid or not (or unknown, for that matter). The validator just provides the list of validated prefixes; it is the BGP router that decides whether a BGP prefix is unknown, valid or invalid, based on the validated prefixes from the RPKI validator.

Which prefix exactly are you having an issue with? What is the prefix in the BGP routing table that shows as invalid?
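
To illustrate the division of labour described above, here is a minimal sketch of the origin-validation decision a router makes from the validator's list, following the procedure in RFC 6811. The function name, the tuple layout and the sample prefixes and AS numbers (taken from the documentation ranges) are assumptions made for illustration; this is not code from Fort or from any router.

```python
import ipaddress


def validate_origin(route_prefix, origin_asn, vrps):
    """Classify a BGP route as 'valid', 'invalid' or 'unknown'.

    vrps is an iterable of (prefix, max_length, asn) tuples, i.e. the
    validated prefixes a router receives from its RPKI validator.
    """
    route = ipaddress.ip_network(route_prefix)
    covered = False
    for vrp_prefix, max_length, vrp_asn in vrps:
        vrp_net = ipaddress.ip_network(vrp_prefix)
        if vrp_net.version != route.version:
            continue
        # A VRP covers the route when the route is equal to, or more
        # specific than, the VRP prefix.
        if not route.subnet_of(vrp_net):
            continue
        covered = True
        # The route matches when the origin AS agrees and the route is
        # not more specific than the VRP's maximum length allows.
        if vrp_asn == origin_asn and route.prefixlen <= max_length:
            return "valid"
    # Covered by at least one VRP but matched by none: invalid.
    # Covered by no VRP at all: unknown ("not found" in RFC 6811).
    return "invalid" if covered else "unknown"


sample_vrps = [("192.0.2.0/24", 24, 64500)]
print(validate_origin("192.0.2.0/24", 64500, sample_vrps))     # valid
print(validate_origin("192.0.2.0/25", 64500, sample_vrps))     # invalid
print(validate_origin("192.0.2.0/24", 64501, sample_vrps))     # invalid
print(validate_origin("198.51.100.0/24", 64500, sample_vrps))  # unknown
```

Under this model, two routers can only disagree about the same route if they hold different lists of validated prefixes, which is why the questions in this thread focus on the Fort version and on stale data.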

@beego89
Author

beego89 commented Jul 15, 2023

The total count of invalid route prefixes is more than 700. A random sample for comparison is in the attached file:

Fort.xlsx

@lukastribus

I checked the first 3 or 4 prefixes in the Excel file, and Fort's output matches that of other validators.

Perhaps you are running an old version of Fort or perhaps your Fort instance is serving stale data?

Did you check if you are running fort 1.5.4? When was the last time your running Fort instance completed a validation cycle?

@beego89
Author

beego89 commented Jul 17, 2023

I checked the log and found the error "ERR: poll() returned revents 32." Can you advise whether I made a mistake during installation or in my configuration?

Fort Journalctl.txt

I'm not running Fort 1.5.4 yet; I'll try updating if there is no other solution.

I have a minimal config:
{
	"tal": "/etc/fort/tal",
	"local-repository": "/var/lib/fort/repository",
	"slurm": "/etc/fort/slurm",
	"server": {
		"address": ["172.24.58.2"],
		"port": "8323"
	},
	"log": {
		"output": "syslog"
	}
}

tq
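
Since the log points at connection trouble, it can help to confirm from the router's side of the network that the RTR endpoint in this configuration accepts TCP connections. The following is a rough sketch that reads the `server` section in the shape posted above (a list of addresses and a port given as a string). The helper names `rtr_endpoints` and `can_connect` are hypothetical, and other address formats Fort may accept are not handled.

```python
import json
import socket


def rtr_endpoints(config_text):
    """Return (address, port) pairs from the config's "server" section."""
    server = json.loads(config_text)["server"]
    addresses = server["address"]
    if isinstance(addresses, str):
        # Assumption: a single address may also be written as a plain string.
        addresses = [addresses]
    # The posted config stores the port as a string, so convert it.
    port = int(server["port"])
    return [(address, port) for address in addresses]


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


posted_config = """
{
    "server": {"address": ["172.24.58.2"], "port": "8323"}
}
"""
for host, port in rtr_endpoints(posted_config):
    print(host, port)
    # can_connect(host, port) would then be run from a machine that
    # shares the routers' network path to Fort.
```

A successful connection only shows that the port is reachable at that moment; it says nothing about whether the data being served is fresh.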

@lukastribus

What release are you running currently? I would definitely suggest upgrading to latest.

@ydahhrk
Member

ydahhrk commented Jul 17, 2023

ERR: poll() returned revents 32.

Ok, this looks like a programming error. In my environment, "revents 32" is POLLNVAL, which means

Invalid request: fd not open

@beego89 Just to make sure: Can you please post the output of grep "#define\s\+POLL" /usr/include -r? I need to check whether my 32 is the same as your 32.

It seems Fort is closing the File Descriptor before it's done sending the table to the router, although it's strange that the router is seemingly not dropping the table as a result. Will investigate.
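
For readers following the "revents 32" reasoning: `revents` is a bitmask, so the decimal value in the log has to be split into its flag bits. Below is a small sketch that does this using the flag values commonly found in Linux's asm-generic/poll.h. Those values are an assumption on other platforms, and `decode_revents` is a hypothetical helper name, not part of Fort.

```python
# Flag values as defined on Linux (asm-generic/poll.h). Other platforms
# may use different numbers, which is why the maintainer asks for the
# reporter's own header definitions.
POLL_FLAGS = {
    0x0001: "POLLIN",
    0x0002: "POLLPRI",
    0x0004: "POLLOUT",
    0x0008: "POLLERR",
    0x0010: "POLLHUP",
    0x0020: "POLLNVAL",
}


def decode_revents(revents):
    """Return the names of the poll flags set in a revents bitmask."""
    return [name for bit, name in sorted(POLL_FLAGS.items()) if revents & bit]


print(decode_revents(32))    # ['POLLNVAL']
print(decode_revents(0x19))  # ['POLLIN', 'POLLERR', 'POLLHUP']
```

With these values, 32 is 0x0020, which corresponds to POLLNVAL alone.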

@beego89
Author

beego89 commented Jul 18, 2023

hi ydahhrk,

output as below:

[xxx@vkcprpkprdap200 ~]$ grep "#define\s\+POLL" /usr/include -r
/usr/include/asm-generic/poll.h:#define POLLIN 0x0001
/usr/include/asm-generic/poll.h:#define POLLPRI 0x0002
/usr/include/asm-generic/poll.h:#define POLLOUT 0x0004
/usr/include/asm-generic/poll.h:#define POLLERR 0x0008
/usr/include/asm-generic/poll.h:#define POLLHUP 0x0010
/usr/include/asm-generic/poll.h:#define POLLNVAL 0x0020
/usr/include/asm-generic/poll.h:#define POLLRDNORM 0x0040
/usr/include/asm-generic/poll.h:#define POLLRDBAND 0x0080
/usr/include/asm-generic/poll.h:#define POLLWRNORM 0x0100
/usr/include/asm-generic/poll.h:#define POLLWRBAND 0x0200
/usr/include/asm-generic/poll.h:#define POLLMSG 0x0400
/usr/include/asm-generic/poll.h:#define POLLREMOVE 0x1000
/usr/include/asm-generic/poll.h:#define POLLRDHUP 0x2000
/usr/include/asm-generic/poll.h:#define POLLFREE 0x4000 /* currently only for epoll */
/usr/include/asm-generic/poll.h:#define POLL_BUSY_LOOP 0x8000
/usr/include/asm-generic/siginfo.h:#define POLL_IN (__SI_POLL|1) /* data input available */
/usr/include/asm-generic/siginfo.h:#define POLL_OUT (__SI_POLL|2) /* output buffers available */
/usr/include/asm-generic/siginfo.h:#define POLL_MSG (__SI_POLL|3) /* input message available */
/usr/include/asm-generic/siginfo.h:#define POLL_ERR (__SI_POLL|4) /* i/o error */
/usr/include/asm-generic/siginfo.h:#define POLL_PRI (__SI_POLL|5) /* high priority input available */
/usr/include/asm-generic/siginfo.h:#define POLL_HUP (__SI_POLL|6) /* device disconnected */
/usr/include/bits/poll.h:#define POLLIN 0x001 /* There is data to read. */
/usr/include/bits/poll.h:#define POLLPRI 0x002 /* There is urgent data to read. */
/usr/include/bits/poll.h:#define POLLOUT 0x004 /* Writing now will not block. */
/usr/include/bits/poll.h:#define POLLERR 0x008 /* Error condition. */
/usr/include/bits/poll.h:#define POLLHUP 0x010 /* Hung up. */
/usr/include/bits/poll.h:#define POLLNVAL 0x020 /* Invalid polling request. */
[xxx@vkcprpkprdap200 ~]$

@beego89
Author

beego89 commented Jul 18, 2023

@lukastribus Sure, I will upgrade to 1.5.4 and see whether that resolves the problem. tq

ydahhrk added a commit that referenced this issue Jul 21, 2023
On closer inspection, none of the error messages logged in #98 imply a
problem, so I have reduced their severities, removed the stack traces
and improved the error messages.

Fixes the error messages half of #98. I still need to look into the
alleged discrepancies with Routinator and Cloudflare.
@ydahhrk
Member

ydahhrk commented Jul 21, 2023

Report:

Thanks for the output; we're in sync.

The error messages you're getting in the log (such as ERR: poll() returned revents 32) are all inoffensive. They simply mean some router happened to get disconnected in the middle of a data transfer. I have reduced their severities and improved the strings in the latest commit so they stop confusing people.

On a different note, it's weird that you're getting so many of those error messages, even if they're scattered through several days. Is there something in your network that might be disconnecting Fort and the Routers every now and then?

I will start investigating the discrepancies with Routinator and Cloudflare now. If you can provide the list of ROAs that don't match, or at least a reasonable subset of them, I should be able to figure it out faster.
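
One way to produce the list of mismatches asked for above is to diff the validated prefix sets held by the two validators. This is a minimal sketch, assuming each validator's output has already been read into (ASN, prefix, max length) tuples. The function names are hypothetical and nothing here relies on a particular export format of Fort or Routinator.

```python
import ipaddress


def normalize_vrp(asn, prefix, max_length):
    """Bring one record into a canonical (int, str, int) form."""
    asn_text = str(asn).strip().upper()
    if asn_text.startswith("AS"):
        asn_text = asn_text[2:]
    # ip_network() canonicalizes the spelling, e.g. lowercase IPv6.
    network = ipaddress.ip_network(prefix.strip())
    return (int(asn_text), str(network), int(max_length))


def diff_vrps(vrps_a, vrps_b):
    """Return (only_in_a, only_in_b) as sorted lists of records."""
    set_a = {normalize_vrp(*record) for record in vrps_a}
    set_b = {normalize_vrp(*record) for record in vrps_b}
    return sorted(set_a - set_b), sorted(set_b - set_a)


validator_a = [("AS64500", "192.0.2.0/24", 24), ("64501", "2001:DB8::/32", "48")]
validator_b = [(64500, "192.0.2.0/24", 24), (64502, "198.51.100.0/24", 24)]
only_a, only_b = diff_vrps(validator_a, validator_b)
print(only_a)  # [(64501, '2001:db8::/32', 48)]
print(only_b)  # [(64502, '198.51.100.0/24', 24)]
```

Normalizing before comparing avoids false mismatches caused by purely cosmetic differences such as an "AS" prefix or uppercase IPv6 digits.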

@ydahhrk
Member

ydahhrk commented Jul 22, 2023

Ok, sorry for taking so long. I agree with everything @lukastribus has said.

@beego89 I just realized I might be lagging in understanding the problem. In the title of the issue, you say Fort yields "invalid," but all the seemingly relevant records in your spreadsheet are marked "valid."

What's the problem?

@beego89
Author

beego89 commented Jul 26, 2023

hi @ydahhrk, thanks for checking on the error. We have multiple upstream/peering gateway (GW) routers across regions, so the disconnects may be due to a faulty router.

@lukastribus The prefix samples in the Excel file are the "invalid" output from our GW router using Fort 1.5.3. When compared with Cloudflare and Routinator, they yield different validation states: "unknown" and "valid". Perhaps you can compare the output from your Fort validator with my Fort output in the Excel file to see if it's the same.

tq

@lukastribus

I did that when you posted the excel file, as explained in my answer to that.

Again, I suggest you check when Fort last completed a validation cycle. Also perhaps you want to enable validation logging and post the full output.
