[aws_json_module] AWS C++ SDK needs module_allocator to be alive even after cleanup #964

Closed
grrtrr opened this issue Jan 6, 2023 · 1 comment

Comments


grrtrr commented Jan 6, 2023

This issue is triggered by aws-sdk-cpp >= 1.10.18. It is caused by the legacy TransferManager, which may still have threads running after the SDK has shut down, and thus after aws_json_module_cleanup() has been called. More context is in aws/aws-sdk-cpp#2274; the implications for aws-c-common are described below.

Problem description

With aws-sdk-cpp >= 1.10.18 we are repeatedly seeing failed program traces like this:

Fatal error condition occurred in external/aws-c-common/source/allocator.c:209: allocator != ((void *)0)
Exiting Application

#19      at 0x5643bc8cb05a in aws_mem_release
#18      at 0x5643bc8d844b in cJSON_Delete
#17      at 0x5643bc8d8485 in cJSON_Delete
#16      at 0x5643bc8c781a in s_endpoints_ruleset_destroy
#15      at 0x5643bc8ce995 in aws_ref_count_release
#14      at 0x5643bc8c9a42 in aws_endpoints_ruleset_release
#13      at 0x5643bc8c084a in s_endpoints_rule_engine_destroy
#12      at 0x5643bc8ce995 in aws_ref_count_release
#11      at 0x5643bc8c1a12 in aws_endpoints_rule_engine_release
#10      at 0x5643bc11cf46 in std::_Sp_counted_base<>::_M_release()
#9       at 0x5643bc278077 in Aws::S3::S3Client::~S3Client()
#8       at 0x5643bc11cf46 in std::_Sp_counted_base<>::_M_release()
#7       at 0x5643bc25e648 in Aws::Transfer::TransferManager::~TransferManager()
#6       at 0x5643bc11cf46 in std::_Sp_counted_base<>::_M_release()
#5       at 0x5643bc258dc3 in std::_Function_base::_Base_manager<>::_M_manager()
#4       at 0x5643bc244b26 in Aws::S3::Model::GetObjectRequest::~GetObjectRequest()
#3       at 0x5643bc2ef17d in std::_Function_base::_Base_manager<>::_M_manager()
#2       at 0x5643bc8123cd in std::thread::_State_impl<>::~_State_impl()
#1       at 0x7fa5e10c76e8 in <?>

We also see the same failure while de-allocating a more deeply nested JSON document:

1/6/2023, 7:16:56 AM UTC	stderr	Fatal error condition occurred in external/aws-c-common/source/allocator.c:209: allocator != ((void *)0)
1/6/2023, 7:16:56 AM UTC	stderr	Exiting Application

1/6/2023, 7:16:56 AM UTC	stderr	#32	 at 0x55d0184e7eca in aws_mem_release
1/6/2023, 7:16:56 AM UTC	stderr	#31	 at 0x55d0184f5294 in cJSON_Delete
1/6/2023, 7:16:56 AM UTC	stderr	#30	 at 0x55d0184f52a5 in cJSON_Delete
1/6/2023, 7:16:56 AM UTC	stderr	#29	 at 0x55d0184f52a5 in cJSON_Delete
1/6/2023, 7:16:56 AM UTC	stderr	#28	 at 0x55d0184f52a5 in cJSON_Delete
1/6/2023, 7:16:56 AM UTC	stderr	#27	 at 0x55d0184f52a5 in cJSON_Delete
1/6/2023, 7:16:56 AM UTC	stderr	#26	 at 0x55d0184f52a5 in cJSON_Delete
1/6/2023, 7:16:56 AM UTC	stderr	#25	 at 0x55d0184f52a5 in cJSON_Delete
1/6/2023, 7:16:56 AM UTC	stderr	#24	 at 0x55d0184f52a5 in cJSON_Delete
1/6/2023, 7:16:56 AM UTC	stderr	#23	 at 0x55d0184f52a5 in cJSON_Delete
1/6/2023, 7:16:56 AM UTC	stderr	#22	 at 0x55d0184f52a5 in cJSON_Delete
1/6/2023, 7:16:56 AM UTC	stderr	#21	 at 0x55d0184f52a5 in cJSON_Delete
1/6/2023, 7:16:56 AM UTC	stderr	#20	 at 0x55d0184f52a5 in cJSON_Delete
1/6/2023, 7:16:56 AM UTC	stderr	#19	 at 0x55d0184f52a5 in cJSON_Delete
1/6/2023, 7:16:56 AM UTC	stderr	#18	 at 0x55d0184f52a5 in cJSON_Delete
1/6/2023, 7:16:56 AM UTC	stderr	#17	 at 0x55d0184f52a5 in cJSON_Delete
1/6/2023, 7:16:56 AM UTC	stderr	#16	 at 0x55d0184f52a5 in cJSON_Delete
1/6/2023, 7:16:56 AM UTC	stderr	#15	 at 0x55d0184f52a5 in cJSON_Delete
1/6/2023, 7:16:56 AM UTC	stderr	#14	 at 0x55d0184f52a5 in cJSON_Delete
1/6/2023, 7:16:56 AM UTC	stderr	#13	 at 0x55d0184f52a5 in cJSON_Delete
1/6/2023, 7:16:56 AM UTC	stderr	#12	 at 0x55d0184f52a5 in cJSON_Delete
1/6/2023, 7:16:56 AM UTC	stderr	#11	 at 0x55d0184f52a5 in cJSON_Delete
1/6/2023, 7:16:56 AM UTC	stderr	#10	 at 0x55d0184f52a5 in cJSON_Delete
1/6/2023, 7:16:56 AM UTC	stderr	#9	 at 0x55d0184f52a5 in cJSON_Delete
1/6/2023, 7:16:56 AM UTC	stderr	#8	 at 0x55d0184f52a5 in cJSON_Delete
1/6/2023, 7:16:56 AM UTC	stderr	#7	 at 0x55d0184f52a5 in cJSON_Delete
1/6/2023, 7:16:56 AM UTC	stderr	#6	 at 0x55d0184f52a5 in cJSON_Delete
1/6/2023, 7:16:56 AM UTC	stderr	#5	 at 0x55d0184f52a5 in cJSON_Delete
1/6/2023, 7:16:56 AM UTC	stderr	#4	 at 0x55d0184f52a5 in cJSON_Delete
1/6/2023, 7:16:56 AM UTC	stderr	#3	 at 0x55d0184f52a5 in cJSON_Delete
1/6/2023, 7:16:56 AM UTC	stderr	#2	 at 0x55d0184f52a5 in cJSON_Delete
1/6/2023, 7:16:56 AM UTC	stderr	#1	 at 0x55d0184f52a5 in cJSON_Delete

Analysis

The failed assertion occurs after main() has returned and after aws_json_module_cleanup() has set the s_aws_json_module_allocator to NULL (see aws/aws-sdk-cpp#2274 for details):

// aws-c-common/source/allocator.c
void aws_mem_release(struct aws_allocator *allocator, void *ptr) {
    AWS_FATAL_PRECONDITION(allocator != NULL);  // <== LINE 209
    AWS_FATAL_PRECONDITION(allocator->mem_release != NULL);

    if (ptr != NULL) {
        allocator->mem_release(allocator, ptr);
    }
}

The cJSON_Delete / aws_mem_release call was initiated through aws_endpoints_rule_engine_release, that is, by the aws-c-sdkutils dependency calling into aws-c-common.

As described in aws/aws-sdk-cpp#2274, the aws-sdk-cpp may still have threads running after aws_json_module_cleanup() has been called. These threads want to deallocate memory. There is currently no straightforward solution to coordinate/await shutdown of legacy TransferManager threads before the SDK performs API shutdown calls.

The failed assertion in aws_mem_release causes (potentially long-running) programs to fail after they already successfully completed their main() routine.
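The sequence above can be modelled in a few lines of plain C. This is only a sketch under assumptions: `demo_allocator`, `demo_module_init`, `demo_module_cleanup` and `demo_module_free` are hypothetical stand-ins rather than aws-c-common APIs, and the late release returns -1 where the real aws_mem_release would hit the fatal precondition and abort.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct aws_allocator; not the real type. */
struct demo_allocator {
    void (*mem_release)(struct demo_allocator *allocator, void *ptr);
    size_t release_count;
};

static void s_demo_release(struct demo_allocator *allocator, void *ptr) {
    allocator->release_count++;
    free(ptr);
}

static struct demo_allocator s_demo_default = {s_demo_release, 0};

/* Module-level pointer, modelled on s_aws_json_module_allocator. */
static struct demo_allocator *s_demo_module_allocator = NULL;

void demo_module_init(void) {
    s_demo_module_allocator = &s_demo_default;
}

/* Models the behaviour described in this issue: cleanup clears the pointer. */
void demo_module_cleanup(void) {
    s_demo_module_allocator = NULL;
}

/* Returns 0 on success. Returns -1 in the situation where the real
 * aws_mem_release would fail AWS_FATAL_PRECONDITION(allocator != NULL). */
int demo_module_free(void *ptr) {
    if (s_demo_module_allocator == NULL) {
        return -1;
    }
    s_demo_module_allocator->mem_release(s_demo_module_allocator, ptr);
    return 0;
}
```

A release before `demo_module_cleanup()` succeeds; the same call made afterwards, as a late TransferManager thread would make it, reports the failure.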

What to do

Simply ignoring the NULL allocator in aws_mem_release would not help, since the memory would then never be released, creating a leak that sanitizers would pick up on.

To make aws-c-common robust against the problems described in aws/aws-sdk-cpp#2274, the best work-around for the moment would be to not set the module_allocator to NULL. The rationale is that aws_json_module_cleanup() is called at the very end, as a shutdown function, so another call to aws_json_module_init() is extremely unlikely. Not clearing the module_allocator would allow "late threads" to de-allocate properly.
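A minimal sketch of that work-around, again using hypothetical names (`sketch_allocator`, `sketch_module_*`) instead of the real aws-c-common symbols: cleanup marks the module as shut down but leaves the allocator pointer in place, so a release that arrives late still frees the memory and nothing is leaked. The `outstanding` counter exists only to make that visible; it is an assumption of this sketch.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct aws_allocator that counts live allocations. */
struct sketch_allocator {
    size_t outstanding;
};

static struct sketch_allocator s_sketch_default = {0};
static struct sketch_allocator *s_sketch_module_allocator = NULL;
static int s_sketch_initialized = 0;

void sketch_module_init(void) {
    s_sketch_module_allocator = &s_sketch_default;
    s_sketch_initialized = 1;
}

/* Proposed behaviour: record the shutdown, but keep the allocator pointer
 * so that threads outliving the shutdown can still release memory. */
void sketch_module_cleanup(void) {
    s_sketch_initialized = 0;
    /* intentionally NOT done: s_sketch_module_allocator = NULL; */
}

int sketch_module_is_initialized(void) {
    return s_sketch_initialized;
}

void *sketch_module_acquire(size_t size) {
    if (s_sketch_module_allocator == NULL) {
        return NULL;
    }
    void *ptr = malloc(size);
    if (ptr != NULL) {
        s_sketch_module_allocator->outstanding++;
    }
    return ptr;
}

/* Returns 0 on success, -1 if the module was never initialised. */
int sketch_module_release(void *ptr) {
    if (s_sketch_module_allocator == NULL) {
        return -1;
    }
    if (ptr != NULL) {
        free(ptr);
        s_sketch_module_allocator->outstanding--;
    }
    return 0;
}

size_t sketch_outstanding(void) {
    return s_sketch_default.outstanding;
}
```

The trade-off is that the module keeps a pointer to the allocator past its nominal lifetime, which is only safe if the allocator itself outlives the late threads.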

grrtrr added a commit to grrtrr/aws-c-common that referenced this issue Jan 6, 2023
This keeps the JSON module_allocator alive even after clean-up, to prevent
late-deallocation issues occurring in aws-sdk-cpp from causing programs to
fail after main() has returned.

Resolves awslabs#964.

grrtrr commented Feb 2, 2023

After upgrading aws-sdk-cpp to 1.10.54 (which includes aws/aws-sdk-cpp#2291), I am no longer able to reproduce this condition.

@grrtrr grrtrr closed this as completed Feb 2, 2023