TF NaN errors when starting navigation after amcl #1458

Closed
maxlein opened this issue Jan 10, 2020 · 32 comments

@maxlein
Contributor

maxlein commented Jan 10, 2020

I am passing the initial pose by parameter at startup.
But occasionally I get this error:

[amcl-2] 1578667890.521587533: [osg_1.amcl] [WARN]  AMCL covariance or pose is NaN, likely due to an invalid configuration or faulty sensor measurements! Pose is not available!
[amcl-2] Error:   TF_NAN_INPUT: Ignoring transform for child_frame_id "odom" from authority "Authority undetectable" because of a nan value in the transform (-nan -nan -nan) (-nan -nan -nan -nan)
[amcl-2]          at line 282 in /tmp/binarydeb/ros-eloquent-tf2-0.12.4/src/buffer_core.cpp

The only solution that avoids restarting amcl is to pass a new initial pose, with rviz for example.

I also found that if I start amcl with a delay, there is no problem.
So I think it has something to do with odometry missing when amcl is setting the initial pose.
The odom covariance is then not set and so the filter is not correct, or something like that.

I am setting these parameters:

  set_initial_pose
  initial_pose.x
  initial_pose.y
  initial_pose.z
  initial_pose.yaw

Is setting initial pose via parameters supported or is this something from ROS1 times?

Bug report

Required Info:

  • Operating System:
    • Ubuntu 18.04
  • Version or commit hash:
    • Eloquent
  • DDS implementation:
    • CycloneDDS

Steps to reproduce issue

Set initial pose parameters, start amcl without odometry, and pass it a laser scan.

Expected behavior

AMCL waits till all data is correctly set and then starts up

Actual behavior

AMCL covariance or pose is NaN, likely due to an invalid configuration or faulty sensor measurements! Pose is not available!

@SteveMacenski
Member

What specifically are the values you’re putting into it?

NaNs usually show up from poor conditioning of quaternions.

@maxlein
Contributor Author

maxlein commented Jan 10, 2020

I don't have them at hand right now, but the parameter values passed in are definitely correct.
So only odometry is left. The covariance is fixed there, so that is not the problem. That leaves the odom position and maybe some uninitialized values in amcl, perhaps when calculating the covariance for the published amcl pose. I have to look at the odom position, but I don't think the problem is there either: we have worked with the driver for a few months now, and NaNs in the data "should" have been detected by now.

@SteveMacenski
Member

Try to find those values for me. Also make sure your TF tree is fully defined. If you think about how homogeneous transforms are calculated, the only way you get NaN in both position and orientation is by having messed-up quaternions.
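
As a rough illustration of the kind of sanity check that applies here, the sketch below rejects quaternions that contain non-finite components or whose norm is far from 1. The function name and the tolerance are assumptions made for this example, not the tf2 API.

```python
import math


def is_valid_quaternion(x, y, z, w, tol=1e-2):
    """Return True if the quaternion is finite and approximately unit length.

    `tol` is an assumed tolerance on |norm^2 - 1|, chosen for illustration.
    """
    components = (x, y, z, w)
    if not all(math.isfinite(c) for c in components):
        return False  # NaN or inf anywhere makes the rotation meaningless
    norm_sq = sum(c * c for c in components)
    return abs(norm_sq - 1.0) <= tol


print(is_valid_quaternion(0.0, 0.0, 0.0, 1.0))            # identity rotation
print(is_valid_quaternion(0.0, 0.0, 0.0, 0.0))            # zero norm, e.g. uninitialized
print(is_valid_quaternion(float("nan"), 0.0, 0.0, 1.0))   # NaN component
```

Running a check like this on every quaternion fed into the system (odometry, static transforms, initial pose) is one way to narrow down where an invalid value first appears.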

@maxlein
Contributor Author

maxlein commented Jan 13, 2020

These are my inputs:

Odom - base link tf:

At time 1578907806.396800475
- Translation: [0.000, 0.000, 0.000]
- Rotation: in Quaternion [0.000, 0.000, 0.000, 1.000]
[INFO]   Activating
[INFO]   initialPoseReceived
[INFO]   Setting pose (1578907703.734933): -0.008 3.613 1.659

And after trying to reproduce this, I found out that if I start odom driver, amcl and laser, all is fine.
But if I start the navigation stack afterwards in another screen, the errors occur... Every time.

TF_DENORMALIZED_QUATERNION: Ignoring transform for child_frame_id "odom" from authority "Authority undetectable" because of an invalid quaternion in the transform (-nan -nan -nan -nan)
[amcl-2]          at line 295 in /tmp/binarydeb/ros-eloquent-tf2-0.12.4/src/buffer_core.cpp

If I start navigation before amcl, it's fine also.
FYI: In my setup the localization is split from navigation in different screens.
Screen 1( map_server, amcl )
Screen 2( controller_server, planner_server, recoveries_server, bt_navigator, waypoint_follower )

@maxlein maxlein changed the title from "AMCL: Initial pose error when odom was not received first" to "TF NaN errors when starting navigation after amcl" Jan 13, 2020
@SteveMacenski
Member

Mhm. Is someone sending a transform on startup with an uninitialized quaternion, (0,0,0,0) or something?

I'm not going to have cycles to look into this this week, but it sounds like you have a pretty good bet of where the issue is. PRs always welcome :-)

@SteveMacenski
Member

https://github.com/ros-planning/navigation2/blob/master/nav2_amcl/src/amcl_node.cpp#L600

That line shows that it should exit if it can't get odometry. Above it also checks for the laser -> base link transform.

@maxlein
Contributor Author

maxlein commented Jan 14, 2020

> I'm not going to have cycles to look into this this week, but it sounds like you have a pretty good bet of where the issue is.

No, I am not really sure anymore. :-)
None of the navigation nodes should have any influence on the tf tree or its behavior, but it looks like they do.
I'm also not sure how best to debug this.
Maybe run the nav stack node by node and see when the error occurs...

Update
So I don't think it has anything to do with the nav stack.
I started mapping (slam_toolbox), and as soon as it's up I get these errors on every screen:
Error: TF_DENORMALIZED_QUATERNION: Ignoring transform for child_frame_id "odom" from authority "Authority undetectable" because of an invalid quaternion in the transform (-nan -nan -nan -nan)
[slam_toolbox-1]          at line 295 in /tmp/binarydeb/ros-eloquent-tf2-0.12.4/src/buffer_core.cpp

@SteveMacenski
Member

SteveMacenski commented Jan 14, 2020

Wait, are you saying you have both a SLAM library and AMCL running? That’s not how this works. AMCL is a localizer and ST is a mapper. They both produce map->odom transformations, essentially replaceable with each other. Having both run at the same time in the same namespace will have undefined behavior.

Ok, from that comment, I need to know a lot more about your setup.

@maxlein
Contributor Author

maxlein commented Jan 14, 2020

Oh, no, I don't think so, because I switched off publishing tf in the mapping node. I need to check the parameter tomorrow, but I am quite sure.
But it doesn't change anything regarding running the nav stack, as there was never a mapping node running there. Just amcl.

@SteveMacenski
Member

Then why mention or run SLAM?

Can you try this without that running anywhere?

@maxlein
Contributor Author

maxlein commented Jan 14, 2020

It was only coincidence. I was just running the mapping process as I wanted to try something out and then got the error again.
SLAM was never running during my other tests here.

@SteveMacenski
Member

SteveMacenski commented Jan 14, 2020

Ah ok. So you think it's something upstream of AMCL, and also ST. That would place it squarely on the TF from base->sensor or the TF from odom->base. It could actually be an invalid report from the first reading of the base driver's odometry or gazebo. Is this simulation or hardware? Any info you can provide there?

And I think it would make sense in gazebo if navigation isn’t launched yet. I think that provides the robot state publisher (not sure off hand) so there’s no valid transformations which the gazebo plugin might not check for.

@maxlein
Contributor Author

maxlein commented Jan 14, 2020

It's on real hardware. I didn't try to reproduce it in gazebo yet. I posted the first reading of the driver's odometry a few comments earlier.

@SteveMacenski
Member

Try in gazebo. Let's try to isolate where it's coming from. I highly suspect it's the odometry source.

@crdelsey
Contributor

It might also be worthwhile to log the transforms by doing ros2 topic echo /tf > tf_logfile.txt. We might be able to catch the offending transform and see exactly what's going on.
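
Once such a log exists, a small script can pick out the offending transform. The sketch below is only an illustration: it assumes the YAML-like text layout that `ros2 topic echo` prints (a `frame_id:` line, then a `child_frame_id:` line, then the numeric fields), and `find_nan_transforms` is a hypothetical helper name.

```python
def find_nan_transforms(log_text):
    """Return unique (frame_id, child_frame_id) pairs whose transform holds a NaN.

    Assumes each transform lists frame_id, then child_frame_id, then its
    numeric fields, as in the text output of `ros2 topic echo /tf`.
    """
    offenders = []
    frame_id = child_frame_id = None
    for raw_line in log_text.splitlines():
        key, _, value = raw_line.strip().partition(":")
        value = value.strip()
        if key == "frame_id":
            # A new transform starts; forget the previous child frame.
            frame_id, child_frame_id = value, None
        elif key == "child_frame_id":
            child_frame_id = value
        elif value in (".nan", "nan", "-nan"):
            pair = (frame_id, child_frame_id)
            if pair not in offenders:
                offenders.append(pair)
    return offenders


sample = """\
transforms:
- header:
    frame_id: odom
  child_frame_id: base_link
  transform:
    translation:
      x: 0.0
---
transforms:
- header:
    frame_id: map
  child_frame_id: odom
  transform:
    translation:
      x: .nan
---
"""
print(find_nan_transforms(sample))
```

For a real log, the same function can be applied to the contents of tf_logfile.txt read as a string.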

@SteveMacenski
Member

Well, the problem is that it might not be a direct transform: NaNs could also be introduced when multiplying transforms. I do agree that logging is likely to find the culprit, but note that if you don't see it there, that doesn't rule it out.
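
To make the point about multiplication concrete, here is a simplified planar sketch (tf2 itself works in 3D with quaternions). A single NaN in one input, here only the yaw of a made-up map->odom transform, contaminates every component of the composed result. The function and variable names are illustrative.

```python
import math


def compose_2d(a, b):
    """Compose planar transforms a * b, each given as (x, y, yaw)."""
    ax, ay, ayaw = a
    bx, by, byaw = b
    x = ax + math.cos(ayaw) * bx - math.sin(ayaw) * by
    y = ay + math.sin(ayaw) * bx + math.cos(ayaw) * by
    return (x, y, ayaw + byaw)


map_to_odom = (-0.008, 3.613, float("nan"))  # only the yaw is invalid
odom_to_base = (0.0, 0.0, 0.0)               # a perfectly valid identity
map_to_base = compose_2d(map_to_odom, odom_to_base)
print(map_to_base)  # every component is NaN, including x and y
```

This is why a NaN can appear in a looked-up transform even when each value published on /tf looks almost entirely sane.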

@maxlein
Contributor Author

maxlein commented Jan 15, 2020

So on the real hardware I recorded tf.
Without the navigation stack, all looks fine, no errors.
As soon as I start navigation, there are NaN tf's published from map->odom.

NaN transforms after amcl is running ok and navigation is started
transforms:
- header:
    stamp:
      sec: 1579076054
      nanosec: 513660785
    frame_id: odom
  child_frame_id: base_link
  transform:
    translation:
      x: 0.0
      y: 0.0
      z: 0.0
    rotation:
      x: 0.0
      y: 0.0
      z: 0.0
      w: 1.0
---
transforms:
- header:
    stamp:
      sec: 1579076055
      nanosec: 484485973
    frame_id: map
  child_frame_id: odom
  transform:
    translation:
      x: -0.008200000000000318
      y: 3.613400000000122
      z: 0.0
    rotation:
      x: -0.0
      y: -0.0
      z: 0.7376276023126299
      w: 0.6752077608458901
---
transforms:
- header:
    stamp:
      sec: 1579076054
      nanosec: 563661504
    frame_id: odom
  child_frame_id: base_link
  transform:
    translation:
      x: 0.0
      y: 0.0
      z: 0.0
    rotation:
      x: 0.0
      y: 0.0
      z: 0.0
      w: 1.0
---
transforms:
- header:
    stamp:
      sec: 1579076055
      nanosec: 561734325
    frame_id: map
  child_frame_id: odom
  transform:
    translation:
      x: -0.008200000000000318
      y: 3.613400000000122
      z: 0.0
    rotation:
      x: -0.0
      y: -0.0
      z: 0.7376276023126299
      w: 0.6752077608458901
---
transforms:
- header:
    stamp:
      sec: 1579076054
      nanosec: 613665905
    frame_id: odom
  child_frame_id: base_link
  transform:
    translation:
      x: 0.0
      y: 0.0
      z: 0.0
    rotation:
      x: 0.0
      y: 0.0
      z: 0.0
      w: 1.0
---

...

transforms:
- header:
    stamp:
      sec: 1579076054
      nanosec: 813662176
    frame_id: odom
  child_frame_id: base_link
  transform:
    translation:
      x: 0.0
      y: 0.0
      z: 0.0
    rotation:
      x: 0.0
      y: 0.0
      z: 0.0
      w: 1.0
---
transforms:
- header:
    stamp:
      sec: 1579076055
      nanosec: 799623705
    frame_id: map
  child_frame_id: odom
  transform:
    translation:
      x: .nan
      y: .nan
      z: .nan
    rotation:
      x: .nan
      y: .nan
      z: .nan
      w: .nan
---

Complete logfiles:
tf_log_nan.txt
tf_log_startup_ok.txt

Update
Somehow it looks like it has something to do with fetching the robot pose from tf.
Restarting amcl cleans the tf tree from the invalid transform.

Next update
When I run with ST localization node instead of amcl, I don't have these errors.

@SteveMacenski
Member

Please post your amcl configuration, TF tree from RQT, and a sample laser message

@maxlein
Contributor Author

maxlein commented Jan 15, 2020

Scan msg
header:
  stamp:
    sec: 1579108738
    nanosec: 805037969
  frame_id: scan_link
angle_min: -2.399827718734741
angle_max: 2.3998003005981445
angle_increment: 0.00672220578417182
time_increment: 4.3000000005122274e-05
scan_time: 0.03999999910593033
range_min: 0.5
range_max: 40.0
ranges: [1.3259999752044678, 1.3320000171661377, 1.3389999866485596, 1.347000002861023, 1.3539999723434448, 1.3669999837875366, 1.4390000104904175, 1.781000018119812, 1.9220000505447388, 1.9390000104904175, 1.95200002193$
intensities: [81.0, 81.0, 80.0, 80.0, 81.0, 81.0, 51.0, 82.0, 82.0, 80.0, 80.0, 80.0, 80.0, 69.0, 69.0, 85.0, 84.0, 85.0, 85.0, 85.0, 85.0, 84.0, 85.0, 85.0, 84.0, 84.0, 81.0, 79.0, 81.0, 78.0, 79.0, 78.0, 76.0, 78.0, 81$
Amcl config
                    "set_initial_pose": True,
                    "initial_pose.x": x,
                    "initial_pose.y": y,
                    "initial_pose.z": 0.,
                    "initial_pose.yaw": yaw,
                    "save_pose_rate": 5.0,
                    "alpha1": 0.005,
                    "alpha2": 0.005,
                    "alpha3": 0.005,
                    "alpha4": 0.005,
                    "alpha5": 0.005,
                    "base_frame_id": "base_link",
                    "beam_skip_distance": 0.5,
                    "beam_skip_error_threshold": 0.9,
                    "beam_skip_threshold": 0.3,
                    "do_beamskip": False,
                    "global_frame_id": "map",
                    "lambda_short": 0.1,
                    "laser_likelihood_max_dist": 2.0,
                    "laser_max_range": 30.0,
                    "laser_min_range": -1.0,
                    "laser_model_type": "likelihood_field",
                    "max_beams": 60,
                    "max_particles": 2000,
                    "min_particles": 500,
                    "odom_frame_id": "odom",
                    "pf_err": 0.05,
                    "pf_z": 0.99,
                    "recovery_alpha_fast": 0.0,
                    "recovery_alpha_slow": 0.0,
                    "resample_interval": 1,
                    "robot_model_type": "differential",
                    "save_pose_rate": 0.5,
                    "sigma_hit": 0.2,
                    "tf_broadcast": True,
                    "transform_tolerance": 1.0,
                    "update_min_a": 0.2,
                    "update_min_d": 0.25,
                    "z_hit": 0.5,
                    "z_max": 0.05,
                    "z_rand": 0.5,
                    "z_short": 0.05

TF tree coming soon. Can't see RQT plugin atm...

tf monitor
Frames:
Frame: /camera_link, published by <no authority available>, Average Delay: 0.0237926, Max Delay: 0.979575
Frame: /caster_back_left_link, published by <no authority available>, Average Delay: 0.023788, Max Delay: 0.979574
Frame: /caster_back_right_link, published by <no authority available>, Average Delay: 0.0237885, Max Delay: 0.979574
Frame: /caster_front_left_link, published by <no authority available>, Average Delay: 0.0237878, Max Delay: 0.979574
Frame: /caster_front_right_link, published by <no authority available>, Average Delay: 0.0237878, Max Delay: 0.979574
Frame: /ifm_camera_link, published by <no authority available>, Average Delay: 0.0237869, Max Delay: 0.979574
Frame: /imu_link, published by <no authority available>, Average Delay: 0.0237869, Max Delay: 0.979574
Frame: /scan_0, published by <no authority available>, Average Delay: 0.0238071, Max Delay: 0.979573
Frame: /scan_link, published by <no authority available>, Average Delay: 0.0238071, Max Delay: 0.979573
Frame: base_link, published by <no authority available>, Average Delay: 0.00129464, Max Delay: 0.00788069
Frame: camera_optical_link, published by <no authority available>, Average Delay: 1629.37, Max Delay: 1629.37
Frame: odom, published by <no authority available>, Average Delay: -0.969808, Max Delay: 0

@SteveMacenski
Member

SteveMacenski commented Jan 15, 2020

Your config has x/y/yaw, what are those specific values? I might guess that your yaw is invalid.

Your z_* variables, the ones in use for your specific sensor model, must add up to exactly 1.0. Make sure that is true, I forget off hand which ones your sensor model uses.
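
A quick way to check this is sketched below. Which weights each laser model actually mixes is an assumption in this example (beam: all four; likelihood_field: z_hit and z_rand) and should be verified against the AMCL version in use. The values are the ones from the posted config.

```python
import math

# Assumed mapping from laser model to the mixture weights it uses.
WEIGHTS_BY_MODEL = {
    "beam": ("z_hit", "z_short", "z_max", "z_rand"),
    "likelihood_field": ("z_hit", "z_rand"),
}


def weights_sum_to_one(params):
    """Return (ok, total) for the weights used by the configured laser model."""
    names = WEIGHTS_BY_MODEL[params["laser_model_type"]]
    total = sum(params[name] for name in names)
    return math.isclose(total, 1.0, abs_tol=1e-9), total


config = {
    "laser_model_type": "likelihood_field",
    "z_hit": 0.5,
    "z_max": 0.05,
    "z_rand": 0.5,
    "z_short": 0.05,
}
print(weights_sum_to_one(config))
```

Under that assumption the posted likelihood_field config passes, while the same numbers with the beam model would sum to 1.1 and fail.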

Post your TF tree when you can.

@maxlein
Contributor Author

maxlein commented Jan 15, 2020

Setting pose (1578907703.734933): -0.008 3.613 1.659

The last one is yaw.
I will look tomorrow.
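
As a sanity check on that yaw, a rotation about z can be converted to a quaternion by hand. The sketch below uses the standard half-angle formula; the function name is illustrative. The result for 1.659 rad is close to the map->odom rotation recorded in the tf log earlier in this thread (z about 0.7376, w about 0.6752), which suggests the initial yaw itself is a valid value.

```python
import math


def yaw_to_quaternion(yaw):
    """Quaternion (x, y, z, w) for a rotation of `yaw` radians about the z axis."""
    half = yaw / 2.0
    return (0.0, 0.0, math.sin(half), math.cos(half))


x, y, z, w = yaw_to_quaternion(1.659)
print(z, w)
print(x * x + y * y + z * z + w * w)  # squared norm, should be very close to 1
```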

@SteveMacenski
Member

What's your scanner to base link transform? I don't see it in tf (it'll be in tf_static).

@maxlein
Contributor Author

maxlein commented Jan 16, 2020

- header:
    stamp:
      sec: 1579166761
      nanosec: 272669373
    frame_id: /base_link
  child_frame_id: /scan_link
  transform:
    translation:
      x: 0.363
      y: 0.0
      z: 1.015
    rotation:
      x: 1.0
      y: 0.0
      z: 0.0
      w: 6.123233995736766e-17

TF graph is not working as tf2_frames server is not released yet.

@SteveMacenski
Member

Your frames are incorrect.

You have base_link in your TF publishers and /base_link as the frame ID for your scan transform. Those are 2 different frames in TF2.

If you look at your TF monitor, you'll see that all but 3 frames have the leading / on them. Really none should have it, but if some do, they all should. TF2 does not strip the leading slash the way TF1 did.
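
A mixed naming scheme like this is easy to spot mechanically. The sketch below partitions a list of frame names by whether they carry a leading slash; the helper names are hypothetical and the frame list is a subset of the tf monitor output posted above. `normalize` is only a comparison aid for this example, not something TF2 does for you.

```python
def split_by_leading_slash(frame_names):
    """Partition frame names into those with and without a leading '/'."""
    with_slash = sorted(n for n in frame_names if n.startswith("/"))
    without_slash = sorted(n for n in frame_names if not n.startswith("/"))
    return with_slash, without_slash


def normalize(frame_name):
    """Strip leading slashes so '/base_link' and 'base_link' compare equal."""
    return frame_name.lstrip("/")


frames = [
    "/camera_link", "/imu_link", "/scan_link",
    "base_link", "camera_optical_link", "odom",
]
with_slash, without_slash = split_by_leading_slash(frames)
print(with_slash)
print(without_slash)
```

If both lists are non-empty, the tree mixes the two conventions and is worth cleaning up at the publisher side.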

@maxlein
Contributor Author

maxlein commented Jan 17, 2020

Well I don't do that intentionally.
I just use default ROS tools/ways of publishing tf's.

So robot_state_publisher is adding a slash.
Static tf's launched like this ( doesn't matter if I add namespace or not ):

# Eloquent-era argument names (node_namespace, node_name, node_executable)
Node(package='robot_state_publisher',
     node_namespace=LaunchConfiguration('namespace'),
     node_name='robot_state_publisher',
     node_executable='robot_state_publisher',
     arguments=[LaunchConfiguration('urdf')],
     output='screen'),

If you say that base_link and /base_link are two different frames, then I shouldn't be able to transform from map to camera for example, because they are part of different trees.
But I can do that.

@SteveMacenski
Member

Frankly at this point, I'm not exactly sure. I'd need you to give me exact instructions to reproduce to look any further.

Do you see this with the default stack or is it only with your specific hardware / configurations / software? If you start with default stuff and change things block-by-block until it occurs again that will at least give you an idea of where to look. Unfortunately, from what you've described above, there's not much more I can do without more context or ability to trigger myself.

@maxlein
Contributor Author

maxlein commented Jan 31, 2020

Can someone explain the idea behind this here:

void
AmclNode::parameterEventCallback(const rcl_interfaces::msg::ParameterEvent::SharedPtr & /*event*/)
{
    initParameters();
    initMessageFilters();
    initOdometry();
    initParticleFilter();
}

So when I start another node, in my case navigation stack, the parameter callback of amcl is triggered, because we can't subscribe to a specific parameter yet.
But amcl doesn't care what parameter was changed and calls all the init functions.

So amcl

  • doesn't check which parameter actually changed
  • and even if a relevant parameter did change, handling it this way can lead to an invalid pose

[amcl-6] nbefore lasers_update_ pf sample[0] pose x:-4.441800 y:-2.963400
[amcl-6] n1580482229.778642658: [osg_1.amcl] [DEBUG]        Parameter event received for node: 
[amcl-6] 0 -nan -nan -nan

For now I just commented the functions in parameterEventCallback() and it works...
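
One possible alternative to commenting the calls out is to filter the event before reinitializing anything. The sketch below is plain Python without any ROS dependency: the dataclasses are stand-ins that mirror a few fields of rcl_interfaces/msg/ParameterEvent (node, new_parameters, changed_parameters), and `should_reinitialize`, the node names, and the watched parameter list are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Parameter:
    name: str


@dataclass
class ParameterEvent:
    """Stand-in mirroring a few fields of rcl_interfaces/msg/ParameterEvent."""
    node: str
    new_parameters: List[Parameter] = field(default_factory=list)
    changed_parameters: List[Parameter] = field(default_factory=list)


def should_reinitialize(event, own_node_name, watched_names):
    """True only if the event is for this node and touches a watched parameter."""
    if event.node != own_node_name:
        return False  # another node started or changed its own parameters
    touched = {p.name for p in event.new_parameters + event.changed_parameters}
    return bool(touched & set(watched_names))


watched = ["alpha1", "alpha2", "laser_model_type"]
nav_event = ParameterEvent(
    node="/osg_1/controller_server",
    new_parameters=[Parameter("use_sim_time")],
)
amcl_event = ParameterEvent(
    node="/osg_1/amcl",
    changed_parameters=[Parameter("alpha1")],
)
print(should_reinitialize(nav_event, "/osg_1/amcl", watched))
print(should_reinitialize(amcl_event, "/osg_1/amcl", watched))
```

With a guard like this, starting the navigation stack would no longer retrigger the init functions in amcl, since those events carry a different node name.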

@SteveMacenski
Member

In AMCL (https://github.com/ros-planning/navigation/blob/melodic-devel/amcl/src/amcl_node.cpp#L499) for ROS1, the dynamic reconfigure (the way we think about updating parameters) callback would change all the parameters, initialize a new particle filter, odometry model, etc.

For ROS2, that's done with the parameter callbacks. What you point out there is essentially the same process but factored into steps. This is, give or take, a direct port.

#1417

This PR reverted the function you flagged above on the current master branch, so that method is no longer even in the codebase.

@maxlein
Contributor Author

maxlein commented Jan 31, 2020

Omfg, why didn't I see this PR ...!?
Sorry...

@maxlein maxlein closed this as completed Jan 31, 2020
@maxlein
Contributor Author

maxlein commented Jan 31, 2020

Well, the PR was only for master.
Can we have this for Eloquent too?

@maxlein maxlein reopened this Jan 31, 2020
@SteveMacenski
Member

Huh, he closed it, well, that's cool. I guess he's running master but had some issue on an old hash in his workspace

reopens ticket

Ah, as I expected :-)

Commentary aside, that would be lovely. I think we're at a point for an Eloquent sync. @crdelsey do you mind setting up an Eloquent release with commits?

@SteveMacenski
Member

Closing ticket, we've figured out the issue. Next time Eloquent is synced it will be handled there.
