Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[meta] Potential fix for issue #801 #802

Closed
wants to merge 1 commit into from
Closed

[meta] Potential fix for issue #801 #802

wants to merge 1 commit into from

Conversation

rajkumar38
Copy link
Contributor

Signed-off-by: Rajkumar Pennadam Ramamoorthy rpennadamram@marvell.com

Issue:
#801

Signed-off-by: Rajkumar Pennadam Ramamoorthy <rpennadamram@marvell.com>
@@ -1268,7 +1268,8 @@ std::string sai_serialize_acl_resource_list(
return j.dump();
}

json arr = json::array();
json::array_t a;
json arr(a);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you explain why this fixes the existing issue? also if you grep this file by json::array() its used in 7 places, so it would require to be fixed in all of them

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line, "json arr = json::array();" calls constructor, "basic_json(init, false, value_t::array);" in file json.hpp:1615, whereas, "json::array_t a; json arr(a);" invokes different constructor "basic_json(const array_t& val)".

Here, the issue appears to be stack corruption, where passed parameter seems to have a value different from the one passed by the calling method. We see this issue only in this flow on API "sai_serialize_acl_resource_list".
When modified the constructor call to pass only 2 arg and assign 3 arg from default value as below, issue is not seen.
"basic_json(init, false);" in file json.hpp:1615

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure if this is actual root cause, since the same exact line is used in different serialization methods in this file as well, and if they don't cause same error, then this could potentially by compiler bug or some memory alignment, for example same constructor us used in sai_serialize_json_fdb_event_notification_data and sai_serialize_fdb_event_ntf or sai_serialize_port_oper_status_ntf. So if you don't hit this error anywhere else, then it ether means that you don't receive those notifications in your case (if error is persistent), or the error is related to some compiler/optimization. Also, this error is not triggered on x86/64 platform, only on armhf, which would suggest armhf platform is flawed.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will you be able to see if you will hit this error without -O2 enabled ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, do you have idea which actual line in json.hpp is causing this error? maybe it's better to fix this in json.hpp rather than here, if many different components may use json.hpp and this array constructor

Copy link
Contributor

@tahmed-dev tahmed-dev Mar 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @lguohan, it is not clear to me from the stack backtrace why boost v 1.71 is at play here. @kcudnik are we using API across libraries that are using boost? My understanding is the swss-common had compile time dependency on boost smart pointers.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

boost was recently added as dependency as am' aware, and sairedis library is not currently using any boost bindings

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that was in azure pipeline build docker. Wondering if @rajkumar38 has mismatched binaries by any chance???

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that was in azure pipeline build docker. Wondering if @rajkumar38 has mismatched binaries by any chance???

I believe all packages are installed in build docker using apt-get command and I don't see any package mismatch (verified with dpkg -l in build docker env)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will you be able to see if you will hit this error without -O2 enabled ?

With changes from PR 6532, I recompiled libswsscommon, libsairedis and libsaimetadata debs with O2 flag disabled. With this, I'm not seeing the crash.
Issue is seen when PR 6532 and O2 flag enabled together.
I don't think any compiler settings changed recently and is not clear how PR 6532 is causing the regression.

@lguohan
Copy link
Contributor

lguohan commented Feb 26, 2021

do we need this for 202012 release?

@carl-nokia
Copy link

Yes this is needing backport to 202012 -because I believe the boost upgrade PR ( PR6532 ) was backported to 202012

Copy link
Collaborator

@kcudnik kcudnik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, do you have idea which actual line in json.hpp is causing this error? maybe it's better to fix this in json.hpp rather than here, if many different components may use json.hpp and this array constructor

@rajkumar38
Copy link
Contributor Author

rajkumar38 commented Mar 3, 2021

also, do you have idea which actual line in json.hpp is causing this error? maybe it's better to fix this in json.hpp rather than here, if many different components may use json.hpp and this array constructor

also, do you have idea which actual line in json.hpp is causing this error? maybe it's better to fix this in json.hpp rather than here, if many different components may use json.hpp and this array constructor

We can have below changes in json.hpp. Let me know if we can have this fix, I will raise PR accordingly.

diff --git a/common/json.hpp b/common/json.hpp
index b231601..039c2a2 100644
--- a/common/json.hpp
+++ b/common/json.hpp
@@ -1612,7 +1612,7 @@ class basic_json
     static basic_json array(std::initializer_list<basic_json> init =
                                 std::initializer_list<basic_json>())
     {
-        return basic_json(init, false, value_t::array);
+        return basic_json(init, false);
     }

     /*!

@rajkumar38
Copy link
Contributor Author

rajkumar38 commented Mar 10, 2021

@kcudnik @lguohan . Please comment. Let know if we can raise the PR with above changes (diff) in json.hpp ?

@kcudnik
Copy link
Collaborator

kcudnik commented Mar 10, 2021

yes, seems good to me, please rise pr for swss-common

@rajkumar38
Copy link
Contributor Author

yes, seems good to me, please rise pr for swss-common

Thanks Kamil.
Raised PR sonic-net/sonic-swss-common#466 in swss-common repo.

@kcudnik
Copy link
Collaborator

kcudnik commented Apr 3, 2021

potential fixes:
sonic-net/sonic-swss#1689
sonic-net/sonic-swss-common#471

as analysis progresses on #801, probably cross compiler issue, if those 2 above will be successful, this one could be closed

@rajkumar38
Copy link
Contributor Author

Closing this PR as analysis and progress tracked in #801.

@rajkumar38 rajkumar38 closed this Apr 5, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants