This issue is triggered by aws-sdk-cpp >= 1.10.18 and is due to the legacy TransferManager which may still have threads running after the SDK has shut down and thus after aws_json_module_cleanup() has been called. More context is in aws/aws-sdk-cpp#2274 - the implications for aws-c-common are described below.
Problem description
With aws-sdk-cpp >= 1.10.18 we are repeatedly seeing failed program traces like this:
Fatal error condition occurred in external/aws-c-common/source/allocator.c:209: allocator != ((void *)0)
Exiting Application
#19 at 0x5643bc8cb05a in aws_mem_release
#18 at 0x5643bc8d844b in cJSON_Delete
#17 at 0x5643bc8d8485 in cJSON_Delete
#16 at 0x5643bc8c781a in s_endpoints_ruleset_destroy
#15 at 0x5643bc8ce995 in aws_ref_count_release
#14 at 0x5643bc8c9a42 in aws_endpoints_ruleset_release
#13 at 0x5643bc8c084a in s_endpoints_rule_engine_destroy
#12 at 0x5643bc8ce995 in aws_ref_count_release
#11 at 0x5643bc8c1a12 in aws_endpoints_rule_engine_release
#10 at 0x5643bc11cf46 in std::_Sp_counted_base<>::_M_release()
#9 at 0x5643bc278077 in Aws::S3::S3Client::~S3Client()
#8 at 0x5643bc11cf46 in std::_Sp_counted_base<>::_M_release()
#7 at 0x5643bc25e648 in Aws::Transfer::TransferManager::~TransferManager()
#6 at 0x5643bc11cf46 in std::_Sp_counted_base<>::_M_release()
#5 at 0x5643bc258dc3 in std::_Function_base::_Base_manager<>::_M_manager()
#4 at 0x5643bc244b26 in Aws::S3::Model::GetObjectRequest::~GetObjectRequest()
#3 at 0x5643bc2ef17d in std::_Function_base::_Base_manager<>::_M_manager()
#2 at 0x5643bc8123cd in std::thread::_State_impl<>::~_State_impl()
#1 at 0x7fa5e10c76e8 in <?>
Also a more deeply-nested JSON document de-allocation failure:
1/6/2023, 7:16:56 AM UTC stderr Fatal error condition occurred in external/aws-c-common/source/allocator.c:209: allocator != ((void *)0)
1/6/2023, 7:16:56 AM UTC stderr Exiting Application
1/6/2023, 7:16:56 AM UTC stderr #32 at 0x55d0184e7eca in aws_mem_release
1/6/2023, 7:16:56 AM UTC stderr #31 at 0x55d0184f5294 in cJSON_Delete
1/6/2023, 7:16:56 AM UTC stderr #30 at 0x55d0184f52a5 in cJSON_Delete
1/6/2023, 7:16:56 AM UTC stderr #29 at 0x55d0184f52a5 in cJSON_Delete
1/6/2023, 7:16:56 AM UTC stderr #28 at 0x55d0184f52a5 in cJSON_Delete
1/6/2023, 7:16:56 AM UTC stderr #27 at 0x55d0184f52a5 in cJSON_Delete
1/6/2023, 7:16:56 AM UTC stderr #26 at 0x55d0184f52a5 in cJSON_Delete
1/6/2023, 7:16:56 AM UTC stderr #25 at 0x55d0184f52a5 in cJSON_Delete
1/6/2023, 7:16:56 AM UTC stderr #24 at 0x55d0184f52a5 in cJSON_Delete
1/6/2023, 7:16:56 AM UTC stderr #23 at 0x55d0184f52a5 in cJSON_Delete
1/6/2023, 7:16:56 AM UTC stderr #22 at 0x55d0184f52a5 in cJSON_Delete
1/6/2023, 7:16:56 AM UTC stderr #21 at 0x55d0184f52a5 in cJSON_Delete
1/6/2023, 7:16:56 AM UTC stderr #20 at 0x55d0184f52a5 in cJSON_Delete
1/6/2023, 7:16:56 AM UTC stderr #19 at 0x55d0184f52a5 in cJSON_Delete
1/6/2023, 7:16:56 AM UTC stderr #18 at 0x55d0184f52a5 in cJSON_Delete
1/6/2023, 7:16:56 AM UTC stderr #17 at 0x55d0184f52a5 in cJSON_Delete
1/6/2023, 7:16:56 AM UTC stderr #16 at 0x55d0184f52a5 in cJSON_Delete
1/6/2023, 7:16:56 AM UTC stderr #15 at 0x55d0184f52a5 in cJSON_Delete
1/6/2023, 7:16:56 AM UTC stderr #14 at 0x55d0184f52a5 in cJSON_Delete
1/6/2023, 7:16:56 AM UTC stderr #13 at 0x55d0184f52a5 in cJSON_Delete
1/6/2023, 7:16:56 AM UTC stderr #12 at 0x55d0184f52a5 in cJSON_Delete
1/6/2023, 7:16:56 AM UTC stderr #11 at 0x55d0184f52a5 in cJSON_Delete
1/6/2023, 7:16:56 AM UTC stderr #10 at 0x55d0184f52a5 in cJSON_Delete
1/6/2023, 7:16:56 AM UTC stderr #9 at 0x55d0184f52a5 in cJSON_Delete
1/6/2023, 7:16:56 AM UTC stderr #8 at 0x55d0184f52a5 in cJSON_Delete
1/6/2023, 7:16:56 AM UTC stderr #7 at 0x55d0184f52a5 in cJSON_Delete
1/6/2023, 7:16:56 AM UTC stderr #6 at 0x55d0184f52a5 in cJSON_Delete
1/6/2023, 7:16:56 AM UTC stderr #5 at 0x55d0184f52a5 in cJSON_Delete
1/6/2023, 7:16:56 AM UTC stderr #4 at 0x55d0184f52a5 in cJSON_Delete
1/6/2023, 7:16:56 AM UTC stderr #3 at 0x55d0184f52a5 in cJSON_Delete
1/6/2023, 7:16:56 AM UTC stderr #2 at 0x55d0184f52a5 in cJSON_Delete
1/6/2023, 7:16:56 AM UTC stderr #1 at 0x55d0184f52a5 in cJSON_Delete
Analysis
The failed assertion occurs after main() has returned and after aws_json_module_cleanup() has set the s_aws_json_module_allocator to NULL (see aws/aws-sdk-cpp#2274 for details):
    // aws-c-common/source/allocator.c
    void aws_mem_release(struct aws_allocator *allocator, void *ptr) {
        AWS_FATAL_PRECONDITION(allocator != NULL); // <== LINE 209
        AWS_FATAL_PRECONDITION(allocator->mem_release != NULL);
        if (ptr != NULL) {
            allocator->mem_release(allocator, ptr);
        }
    }
The cJSON_Delete / aws_mem_release call was initiated through aws_endpoints_rule_engine_release, the aws-c-sdkutils dependency calling into aws-c-common.
As described in aws/aws-sdk-cpp#2274, the aws-sdk-cpp may still have threads running after aws_json_module_cleanup() has been called. These threads want to deallocate memory. There is currently no straightforward solution to coordinate/await shutdown of legacy TransferManager threads before the SDK performs API shutdown calls.
The failed assertion in aws_mem_release causes (potentially long-running) programs to fail after they already successfully completed their main() routine.
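The ordering problem can be reproduced in miniature. The following is a minimal sketch, not the aws-c-common code: `demo_allocator`, `s_demo_json_allocator`, and all `demo_*` functions are hypothetical stand-ins for `struct aws_allocator`, `s_aws_json_module_allocator`, and the module init/cleanup functions. It only models the state that a late release observes, and returns a flag instead of aborting.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for struct aws_allocator. */
struct demo_allocator {
    void (*mem_release)(struct demo_allocator *allocator, void *ptr);
};

static void demo_free(struct demo_allocator *allocator, void *ptr) {
    (void)allocator;
    free(ptr);
}

static struct demo_allocator s_demo_default = { demo_free };

/* Stand-in for the module-level s_aws_json_module_allocator. */
static struct demo_allocator *s_demo_json_allocator = NULL;

void demo_json_module_init(void) { s_demo_json_allocator = &s_demo_default; }

/* Models the cleanup behavior described above: the pointer is reset to NULL. */
void demo_json_module_cleanup(void) { s_demo_json_allocator = NULL; }

/* Mirrors the two preconditions of aws_mem_release without aborting:
 * returns 1 if a release would pass them, 0 if it would trip the assertion. */
int demo_release_would_pass(void) {
    return s_demo_json_allocator != NULL &&
           s_demo_json_allocator->mem_release != NULL;
}
```

Calling `demo_release_would_pass()` after `demo_json_module_cleanup()` corresponds to a late thread reaching `aws_mem_release` once the SDK has shut down: the allocator pointer it reads is already NULL.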
What to do
Simply ignoring the NULL allocator in aws_mem_release would not help, since the skipped deallocation would become a memory leak that sanitizers would pick up on.
To make aws-c-common robust against the problems described in aws/aws-sdk-cpp#2274, the best work-around for the moment would be to not set the module_allocator to NULL. The rationale is that aws_json_module_cleanup() is called at the very end, as a shutdown function, so a subsequent call to aws_json_module_init() is extremely unlikely. Not clearing the module_allocator would allow "late threads" to de-allocate properly.
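To compare the two options, here is a small sketch under stated assumptions. None of these names exist in aws-c-common: `demo_module_*` and the `outstanding` counter are hypothetical and exist only to make a leak visible. Variant A clears the allocator on cleanup and the release silently ignores NULL; variant B leaves the allocator in place, as proposed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical allocator state; `outstanding` counts live allocations. */
struct demo_allocator {
    int outstanding;
};

static struct demo_allocator s_demo_default = {0};
static struct demo_allocator *s_demo_module_allocator = NULL;

void demo_module_init(void) {
    s_demo_default.outstanding = 0;
    s_demo_module_allocator = &s_demo_default;
}

/* Variant A: clear the module allocator on cleanup. */
void demo_module_cleanup_clearing(void) { s_demo_module_allocator = NULL; }

/* Variant B (the proposed work-around): leave the pointer untouched. */
void demo_module_cleanup_keeping(void) { /* intentionally empty */ }

void *demo_module_acquire(size_t size) {
    assert(s_demo_module_allocator != NULL);
    void *ptr = malloc(size);
    if (ptr != NULL) {
        s_demo_module_allocator->outstanding++;
    }
    return ptr;
}

/* A release that tolerates a NULL allocator instead of asserting.
 * Returns 1 if the memory was freed, 0 if the call was skipped (leak). */
int demo_module_release_ignoring_null(void *ptr) {
    if (s_demo_module_allocator == NULL) {
        return 0;
    }
    free(ptr);
    s_demo_module_allocator->outstanding--;
    return 1;
}

int demo_outstanding(void) { return s_demo_default.outstanding; }
```

With variant A, a late release is skipped and the allocation stays outstanding, which is the leak a sanitizer would report. With variant B, the same late release frees the memory and the counter returns to zero.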
grrtrr added a commit to grrtrr/aws-c-common that referenced this issue Jan 6, 2023
This keeps the JSON module_allocator alive even after clean-up, to prevent
late-deallocation issues occurring in aws-sdk-cpp from causing programs to
fail after main() has returned.
Resolves awslabs#964.