Better vectorization and crc64 #79

Merged: 119 commits, merged Sep 5, 2024
119 commits
c6248e0
Added CRC32C AVX512 support.
javazque Jun 22, 2023
cf22bca
Fixed routine name to indicate crc32c
pbadari Jun 27, 2023
def3a68
Merge pull request #1 from pbadari/avx512_support
pbadari Jun 27, 2023
375fa35
Add sse42 avx512_intrinsics support
javazque Jul 14, 2023
9e18b50
Merge pull request #2 from pbadari/sse42_avx512_intrinsics
pbadari Jul 14, 2023
469ef71
Merge branch 'main' into main
JonathanHenson Jul 14, 2023
2eb5578
Refactoring work for the AVX512 code path. Testing shows it not quite…
JonathanHenson Jul 19, 2023
837d5a1
Keep the naive avx512 path on for figuring out codebuild capabilities…
JonathanHenson Jul 19, 2023
2289c96
Fix build and do correct cpu feature detection.
JonathanHenson Jul 19, 2023
ee3e5da
fix 32-bit builds and builds that need to work without intrinsics ava…
JonathanHenson Jul 19, 2023
005ed7c
Not sure how the avx512 code got called. hopefully coedebuild is just…
JonathanHenson Jul 19, 2023
39094d4
Found why the wrong build files were being used at least.
JonathanHenson Jul 19, 2023
d4ffdc1
Make test pass when it passes.
JonathanHenson Jul 19, 2023
bf79936
Try it again.
JonathanHenson Jul 19, 2023
1e24d06
fix leftover symbol collision.
JonathanHenson Jul 19, 2023
907e721
Added more compile gates and assertions.
JonathanHenson Jul 19, 2023
5ab0046
Fix osx build.
JonathanHenson Jul 20, 2023
ca43c51
make the bitflips uniform.
JonathanHenson Jul 20, 2023
28dde8b
add additional runtime cpuid check and run formatter.
JonathanHenson Jul 20, 2023
a00a8e3
work around nasty bitflipping logic.
JonathanHenson Jul 20, 2023
5138407
Addressed review comments, use ternary logic instructions and optimiz…
pbadari Jan 1, 2024
2a084ac
Added crc64 implementations for arm and intel, Added some avx512 code…
JonathanHenson Jan 18, 2024
3f0092f
ran formatter.
JonathanHenson Jan 18, 2024
d78dcb8
Updated formatter and intel compile branch.
JonathanHenson Jan 18, 2024
1cda49a
lets try again.
JonathanHenson Jan 18, 2024
145267d
Manually format and updated compiler flag.
JonathanHenson Jan 18, 2024
53fb00f
msvc fix, more formatting, append compiler flag rather than resetting…
JonathanHenson Jan 18, 2024
1e50c18
sucks to only have an arm machine to test this on.
JonathanHenson Jan 18, 2024
fca8adc
See if cleaning this up helps.
JonathanHenson Jan 22, 2024
dab6a82
Merge branch 'main' into better_vectorization_and_crc64
JonathanHenson Jan 23, 2024
05799a5
Another clmul typo.
JonathanHenson Jan 23, 2024
ae55f55
who knows.
JonathanHenson Jan 23, 2024
160d587
use the new macros
JonathanHenson Jan 23, 2024
dd35f50
missed one.
JonathanHenson Jan 23, 2024
765313f
okay, now we're back to code being broken babuy
JonathanHenson Jan 23, 2024
166ddd9
there's the vl we needed.
JonathanHenson Jan 23, 2024
9bad62d
maybe this is all i needed.
JonathanHenson Jan 23, 2024
543c487
add sse4.2 flag back to the avx512 build for gcc8.
JonathanHenson Jan 23, 2024
ea25508
windows build fixes, as well as x86 build mismatch.
JonathanHenson Jan 23, 2024
7b63f06
more windows fixes and macros.
JonathanHenson Jan 23, 2024
58ece21
non-standard intrinsics headers?
JonathanHenson Jan 23, 2024
c6e65ea
incorrect macro syntax.
JonathanHenson Jan 23, 2024
b074483
run formatter, fix function delcarations.
JonathanHenson Jan 23, 2024
a3ab193
Maybe the headers i need for older compilers.
JonathanHenson Jan 23, 2024
d7ccb7d
more formatter fixes.
JonathanHenson Jan 23, 2024
4a04be9
linters.
JonathanHenson Jan 23, 2024
42f6e10
Magical IDE tabs are the frickin worst.
JonathanHenson Jan 23, 2024
b0875b6
Check 64-bit arch before assuming can use arm8.1
JonathanHenson Jan 23, 2024
3456c37
don't compile the crc64 arm stuff if not 64 bit.
JonathanHenson Jan 23, 2024
032a7e5
use consistent header includes.
JonathanHenson Jan 23, 2024
5473f71
use an actually widely documented intrinsic.
JonathanHenson Jan 23, 2024
d61841a
fix test build and exported symbol needed for tests.
JonathanHenson Jan 23, 2024
f9a7709
Use sse2 on the msvc version when 4.2. is specified.
JonathanHenson Jan 23, 2024
856e2a2
work around old microsoft compiler.
JonathanHenson Jan 23, 2024
7ad72d5
add it to the correct file this time.
JonathanHenson Jan 23, 2024
0cda5eb
get your types right dude.
JonathanHenson Jan 23, 2024
73330e1
don't use the clmul version on old msvc.
JonathanHenson Jan 23, 2024
af2952e
run formatter.
JonathanHenson Jan 23, 2024
24165d7
format again.
JonathanHenson Jan 23, 2024
bfb8600
Added more thorough testing to make sure all the hw accelerated branc…
JonathanHenson Jan 24, 2024
86ca022
msvc compiler errors.
JonathanHenson Jan 24, 2024
82a8ed3
i think the warning was actually right.
JonathanHenson Jan 24, 2024
449c8b4
Use runtime cpu checks for arm.
JonathanHenson Jan 24, 2024
995ea61
Clean up cmake.
JonathanHenson Jan 24, 2024
1a3d6bd
Remove unneeded glob.
JonathanHenson Jan 24, 2024
52ed9e7
make sure tests use the testing allocator.
JonathanHenson Jan 24, 2024
a8fecf8
fix windows test compiler watning.
JonathanHenson Jan 24, 2024
c91a81d
restructured the code so the fallthroughs are less complicated.
JonathanHenson Jan 29, 2024
4319bab
typo.
JonathanHenson Jan 29, 2024
34de264
put the asm file back.
JonathanHenson Jan 29, 2024
1fa581a
compile guard on the sse cmul fallback.
JonathanHenson Jan 29, 2024
aaa5a02
make sure that uber file has the right flags.
JonathanHenson Jan 29, 2024
048bf58
Move includes inside their macro guards.
JonathanHenson Jan 29, 2024
333a6d3
Add the null implementation back
JonathanHenson Jan 30, 2024
7021062
fix the null inclusion in cmake.
JonathanHenson Jan 30, 2024
e028f3e
add the generic implementation, renamed from null.
JonathanHenson Jan 30, 2024
8516cc9
Shave that yak!
JonathanHenson Jan 30, 2024
a43f739
just learned why a c-style cast is memory unsafe even when you know y…
JonathanHenson Jan 31, 2024
43e87ad
try just making sure the data is aligned first.
JonathanHenson Jan 31, 2024
4ac54af
fix constness.
JonathanHenson Jan 31, 2024
a7d22dd
run formatter and fix conditional.
JonathanHenson Jan 31, 2024
0e31476
Use the correct branch this time.
JonathanHenson Jan 31, 2024
4870411
see what happens without an alignment on those arrays.
JonathanHenson Feb 1, 2024
0d4f728
see what happens without an alignment on those arrays.
JonathanHenson Feb 1, 2024
ef83ed9
Try not telling ASAN quite so much info about the type and see if it…
JonathanHenson Feb 1, 2024
86604f0
restrict the input size to always hit the sw implmentation on smaller…
JonathanHenson Feb 1, 2024
19d5344
Try intrinsics we can actually use everywhere.
JonathanHenson Feb 1, 2024
2e73b17
zmm, not xmm.
JonathanHenson Feb 1, 2024
778ed1d
Fix xlmuil intel build.
JonathanHenson Feb 1, 2024
96067a6
Use cmake function more widely available.
JonathanHenson Feb 1, 2024
3dfaaf6
update crc32c and clean up macros.
JonathanHenson Feb 5, 2024
1c2625c
Don't do the generic fallback file as its unneeded.
JonathanHenson Feb 5, 2024
4c51525
Remove unneeded clang format stuff.
JonathanHenson Feb 5, 2024
729bf32
visual studio not saving without an explicit ctrl+s is some b.s.
JonathanHenson Feb 5, 2024
0ab865e
More build fixes.
JonathanHenson Feb 5, 2024
2e61be3
Update tests to have randomized input data.
JonathanHenson Feb 5, 2024
d86e10c
Fixed avx512 detection for compiling crc64_avx512 impl.
JonathanHenson Feb 6, 2024
9cd74b9
Added runner for apple arm.
JonathanHenson Feb 8, 2024
8fd01c7
fix build of profiler run.
JonathanHenson Feb 8, 2024
4eb218d
use branch name for builder.
JonathanHenson Feb 8, 2024
c776234
use actual branch name.
JonathanHenson Feb 8, 2024
8b11a21
why is it not using the host arch
JonathanHenson Feb 8, 2024
3f37ea6
read the source code i guess...
JonathanHenson Feb 8, 2024
49fefa1
specify target.
JonathanHenson Feb 8, 2024
49e1aa0
specify target.
JonathanHenson Feb 8, 2024
95bd82e
try a different target.
JonathanHenson Feb 8, 2024
5983351
Fix memory read for stack allocated buffers.
JonathanHenson Feb 8, 2024
94b6f5d
Fix windows conversion errors on profile run.
JonathanHenson Feb 8, 2024
1657ac5
run profiler as part of tests.
JonathanHenson Feb 8, 2024
499ec69
run profiler as part of tests.
JonathanHenson Feb 8, 2024
d86d672
use correct value name for test steaps.
JonathanHenson Feb 8, 2024
4a61537
try just invoking ctest.
JonathanHenson Feb 8, 2024
eb95b28
use default tester path.
JonathanHenson Feb 8, 2024
1657807
switch over crc64 to nvme flavor
DmitriyMusatkin Sep 3, 2024
15b9716
fix benchmark
DmitriyMusatkin Sep 3, 2024
4915177
Merge branch 'main' into better_vectorization_and_crc64
DmitriyMusatkin Sep 3, 2024
0517612
lint
DmitriyMusatkin Sep 3, 2024
478c6cb
lets try again
DmitriyMusatkin Sep 3, 2024
2f55d46
minor
DmitriyMusatkin Sep 5, 2024
14 changes: 13 additions & 1 deletion .github/workflows/ci.yml
@@ -6,7 +6,7 @@ on:
       - 'main'

 env:
-  BUILDER_VERSION: v0.9.62
+  BUILDER_VERSION: v0.9.63
   BUILDER_SOURCE: releases
   BUILDER_HOST: https://d19elf31gohf1l.cloudfront.net
   PACKAGE_NAME: aws-checksums
@@ -146,6 +146,18 @@ jobs:
         chmod a+x builder
         ./builder build -p ${{ env.PACKAGE_NAME }}

+  osx-m1:
+    runs-on: macos-14-xlarge # latest arm build
+    strategy:
+      matrix:
+        arch: [ macos-armv8 ]
+    steps:
+    - name: Build ${{ env.PACKAGE_NAME }} + consumers
+      run: |
+        python3 -c "from urllib.request import urlretrieve; urlretrieve('${{ env.BUILDER_HOST }}/${{ env.BUILDER_SOURCE }}/${{ env.BUILDER_VERSION }}/builder.pyz?run=${{ env.RUN }}', 'builder')"
+        chmod a+x builder
+        ./builder build -p ${{ env.PACKAGE_NAME }} --target=${{matrix.arch}}
+
   macos-x64:
     runs-on: macos-14-large # latest
     steps:
109 changes: 60 additions & 49 deletions CMakeLists.txt
@@ -28,6 +28,7 @@ string(REPLACE ";" "${AWS_MODULE_DIR};" AWS_MODULE_PATH "${CMAKE_PREFIX_PATH}${A
 # Append that generated list to the module search path
 list(APPEND CMAKE_MODULE_PATH ${AWS_MODULE_PATH})

+include(AwsSIMD)
 include(AwsCFlags)
 include(AwsCheckHeaders)
 include(AwsSharedLibSetup)
@@ -53,54 +54,6 @@ if(MSVC)
     source_group("Source Files" FILES ${AWS_CHECKSUMS_SRC})
 endif()

-file(GLOB AWS_ARCH_SRC
-    "source/generic/*.c"
-)
-
-if (USE_CPU_EXTENSIONS)
-    if(AWS_ARCH_INTEL)
-        # First, check if inline assembly is available. Inline assembly can also be supported by MSVC if the compiler in use is Clang.
-        if(AWS_HAVE_GCC_INLINE_ASM)
-            file(GLOB AWS_ARCH_SRC
-                "source/intel/asm/*.c"
-            )
-        elseif (MSVC)
-            file(GLOB AWS_ARCH_SRC
-                "source/intel/visualc/*.c"
-            )
-            source_group("Source Files\\intel\\visualc" FILES ${AWS_ARCH_SRC})
-        endif()
-    endif()
-
-    if (MSVC AND AWS_ARCH_ARM64)
-        file(GLOB AWS_ARCH_SRC
-            "source/arm/*.c"
-        )
-        source_group("Source Files\\arm" FILES ${AWS_ARCH_SRC})
-
-    elseif (AWS_ARCH_ARM64)
-        file(GLOB AWS_ARCH_SRC
-            "source/arm/*.c"
-        )
-        SET_SOURCE_FILES_PROPERTIES(source/arm/crc32c_arm.c PROPERTIES COMPILE_FLAGS -march=armv8-a+crc )
-    elseif ((NOT MSVC) AND AWS_ARCH_ARM32)
-        set(CMAKE_REQUIRED_FLAGS "-march=armv8-a+crc -Werror")
-        check_c_source_compiles("
-            #include <arm_acle.h>
-            int main() {
-            int crc = __crc32d(0, 1);
-            return 0;
-            }" AWS_ARM32_CRC)
-        unset(CMAKE_REQUIRED_FLAGS)
-        if (AWS_ARM32_CRC)
-            file(GLOB AWS_ARCH_SRC
-                "source/arm/*.c"
-            )
-            SET_SOURCE_FILES_PROPERTIES(source/arm/crc32c_arm.c PROPERTIES COMPILE_FLAGS -march=armv8-a+crc )
-        endif()
-    endif()
-endif()

 file(GLOB CHECKSUMS_COMBINED_HEADERS
     ${AWS_CHECKSUMS_HEADERS}
     ${AWS_CHECKSUMS_PRIV_HEADERS}
@@ -109,11 +62,11 @@ file(GLOB CHECKSUMS_COMBINED_HEADERS
 file(GLOB CHECKSUMS_COMBINED_SRC
     ${AWS_CHECKSUMS_SRC}
     ${AWS_CHECKSUMS_PLATFORM_SOURCE}
-    ${AWS_ARCH_SRC}
 )


 add_library(${PROJECT_NAME} ${CHECKSUMS_COMBINED_HEADERS} ${CHECKSUMS_COMBINED_SRC})

 aws_set_common_properties(${PROJECT_NAME})
 aws_prepare_symbol_visibility_args(${PROJECT_NAME} "AWS_CHECKSUMS")
 aws_check_headers(${PROJECT_NAME} ${AWS_CHECKSUMS_HEADERS})
@@ -123,6 +76,63 @@ aws_add_sanitizers(${PROJECT_NAME})
 # We are not ABI stable yet
 set_target_properties(${PROJECT_NAME} PROPERTIES VERSION 1.0.0)

+if (USE_CPU_EXTENSIONS)
+    if (AWS_ARCH_INTEL)
+        file (GLOB AWS_ARCH_INTEL_SRC
+            "source/intel/*.c"
+        )
+
+        if (MSVC)
+            file(GLOB AWS_ARCH_INTRIN_SRC
+                "source/intel/intrin/*.c"
+            )
+
+            source_group("Source Files\\intel" FILES ${AWS_ARCH_INTEL_SRC})
+            source_group("Source Files\\intel\\intrin" FILES ${AWS_ARCH_INTRIN_SRC})
+        else()
+            if (AWS_HAVE_GCC_INLINE_ASM)
+                simd_append_source_and_features(${PROJECT_NAME} "source/intel/asm/crc32c_sse42_asm.c" ${AWS_SSE4_2_FLAG})
+            endif()
+        endif()
+
+
+        set(UBER_FILE_FLAGS "")
+        if (AWS_HAVE_AVX512_INTRINSICS)
+            list(APPEND UBER_FILE_FLAGS ${AWS_AVX512_FLAG})
+            list(APPEND UBER_FILE_FLAGS ${AWS_AVX512vL_FLAG})
+            list(APPEND UBER_FILE_FLAGS ${AWS_AVX2_FLAG})
+            simd_append_source_and_features(${PROJECT_NAME} "source/intel/intrin/crc64nvme_avx512.c" ${AWS_AVX512_FLAG} ${AWS_AVX512vL_FLAG} ${AWS_AVX2_FLAG} ${AWS_CLMUL_FLAG} ${AWS_SSE4_2_FLAG})
+
+        endif()
+
+        if (AWS_HAVE_CLMUL)
+            list(APPEND UBER_FILE_FLAGS ${AWS_CLMUL_FLAG})
+        endif()
+
+        list(APPEND UBER_FILE_FLAGS "${AWS_SSE4_2_FLAG}")
+
+        # this file routes all of the implementations together based on available cpu features. It gets built regardless
+        # of which flags exist. The c file sorts it out.
+        simd_append_source_and_features(${PROJECT_NAME} "source/intel/intrin/crc32c_sse42_avx512.c" ${UBER_FILE_FLAGS})
+
+        if (AWS_HAVE_CLMUL)
+            simd_append_source_and_features(${PROJECT_NAME} "source/intel/intrin/crc64nvme_clmul.c" ${AWS_AVX2_FLAG} ${AWS_CLMUL_FLAG} ${AWS_SSE4_2_FLAG})
+        endif()
+
+
+    elseif(AWS_ARCH_ARM64 OR (AWS_ARCH_ARM32 AND AWS_HAVE_ARM32_CRC))
+        simd_append_source_and_features(${PROJECT_NAME} "source/arm/crc32c_arm.c" ${AWS_ARMv8_1_FLAG})
+        simd_append_source_and_features(${PROJECT_NAME} "source/arm/crc64_arm.c" ${AWS_ARMv8_1_FLAG})
+
+        if (MSVC)
+            file(GLOB AWS_ARCH_SRC
+                "source/arm/*.c"
+            )
+            source_group("Source Files\\arm" FILES ${AWS_ARCH_SRC})
+        endif()
+    endif()
+endif()

 target_include_directories(${PROJECT_NAME} PUBLIC
     $<BUILD_INTERFACE:${CMAKE_CURRENT_SOURCE_DIR}/include>
     $<INSTALL_INTERFACE:include>)
@@ -156,4 +166,5 @@ install(FILES "${CMAKE_CURRENT_BINARY_DIR}/${PROJECT_NAME}-config.cmake"
 include(CTest)
 if (BUILD_TESTING)
     add_subdirectory(tests)
+    add_subdirectory(bin/benchmark)
 endif ()
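The CMake comment above says `crc32c_sse42_avx512.c` is always built and that "the c file sorts it out", i.e. the implementation is chosen at run time from the available CPU features; one of the commits also mentions routing smaller inputs to the software implementation. A minimal sketch of that kind of dispatch in plain C, under stated assumptions: `s_cpu_has_fast_crc`, `s_crc32c_accel`, `SMALL_INPUT_THRESHOLD` and `crc32c_dispatch` are hypothetical names, not aws-checksums APIs, the feature probe is a stub, and the "accelerated" path simply forwards to a bitwise CRC32C.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a runtime CPU feature probe (CPUID on x86,
 * getauxval/sysctl on ARM). Always reports "no acceleration" here. */
static int s_cpu_has_fast_crc(void) {
    return 0;
}

/* Portable bitwise CRC32C (Castagnoli, reflected polynomial 0x82F63B78).
 * Pass 0 to start, or a previous result to continue a running CRC. */
static uint32_t s_crc32c_sw(const uint8_t *input, size_t length, uint32_t previous) {
    uint32_t crc = ~previous;
    for (size_t i = 0; i < length; ++i) {
        crc ^= input[i];
        for (int bit = 0; bit < 8; ++bit) {
            /* Shift right; if the bit shifted out was 1, fold in the polynomial. */
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

/* Hypothetical accelerated path. A real build would compile this in its own
 * translation unit with SIMD flags; here it forwards to the software path. */
static uint32_t s_crc32c_accel(const uint8_t *input, size_t length, uint32_t previous) {
    return s_crc32c_sw(input, length, previous);
}

typedef uint32_t(crc32c_fn)(const uint8_t *input, size_t length, uint32_t previous);

/* Hypothetical cutoff: below it, vector setup is assumed to cost more than it saves. */
#define SMALL_INPUT_THRESHOLD 64

static crc32c_fn *s_crc32c_impl = NULL;

uint32_t crc32c_dispatch(const uint8_t *input, size_t length, uint32_t previous) {
    if (length < SMALL_INPUT_THRESHOLD) {
        return s_crc32c_sw(input, length, previous);
    }
    if (s_crc32c_impl == NULL) {
        /* Probe once and cache the choice. Not thread-safe as written; a real
         * library would use an atomic or a once-only initializer. */
        s_crc32c_impl = s_cpu_has_fast_crc() ? s_crc32c_accel : s_crc32c_sw;
    }
    return s_crc32c_impl(input, length, previous);
}
```

Because every path computes the same CRC, a caller can split a buffer at any point (including across the size cutoff) and chain the results through the `previous` argument.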
29 changes: 29 additions & 0 deletions bin/benchmark/CMakeLists.txt
@@ -0,0 +1,29 @@
project(checksum-profile C)

list(APPEND CMAKE_MODULE_PATH "${CMAKE_INSTALL_PREFIX}/lib/cmake")

file(GLOB PROFILE_SRC
    "*.c"
)

set(PROFILE_PROJECT_NAME checksum-profile)
add_executable(${PROFILE_PROJECT_NAME} ${PROFILE_SRC})
aws_set_common_properties(${PROFILE_PROJECT_NAME})


target_include_directories(${PROFILE_PROJECT_NAME} PUBLIC
    $<BUILD_INTERFACE:${CMAKE_CURRENT_SOURCE_DIR}/include>
    $<INSTALL_INTERFACE:include>)

target_link_libraries(${PROFILE_PROJECT_NAME} PRIVATE aws-checksums)

if (BUILD_SHARED_LIBS AND NOT WIN32)
    message(INFO " checksum-profile will be built with shared libs, but you may need to set LD_LIBRARY_PATH=${CMAKE_INSTALL_PREFIX}/lib to run the application")
endif()

install(TARGETS ${PROFILE_PROJECT_NAME}
    EXPORT ${PROFILE_PROJECT_NAME}-targets
    COMPONENT Runtime
    RUNTIME
    DESTINATION bin
    COMPONENT Runtime)
127 changes: 127 additions & 0 deletions bin/benchmark/main.c
@@ -0,0 +1,127 @@
/**
 * Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
 * SPDX-License-Identifier: Apache-2.0.
 */

#include <aws/checksums/crc.h>
#include <aws/checksums/private/crc64_priv.h>
#include <aws/checksums/private/crc_priv.h>

#include <aws/common/allocator.h>
#include <aws/common/byte_buf.h>
#include <aws/common/clock.h>
#include <aws/common/cpuid.h>
#include <aws/common/device_random.h>

#include <inttypes.h>

struct aws_allocator_types {
    struct aws_allocator *allocator;
    const char *name;
};

struct checksum_profile_run {
    void (*profile_run)(struct aws_byte_cursor checksum_this);
    const char *name;
};

static void s_runcrc32_sw(struct aws_byte_cursor checksum_this) {
    uint32_t crc = aws_checksums_crc32_sw(checksum_this.ptr, (int)checksum_this.len, 0);
    (void)crc;
}

static void s_runcrc32(struct aws_byte_cursor checksum_this) {
    uint32_t crc = aws_checksums_crc32(checksum_this.ptr, (int)checksum_this.len, 0);
    (void)crc;
}

static void s_runcrc32c_sw(struct aws_byte_cursor checksum_this) {
    uint32_t crc = aws_checksums_crc32c_sw(checksum_this.ptr, (int)checksum_this.len, 0);
    (void)crc;
}

static void s_runcrc32c(struct aws_byte_cursor checksum_this) {
    uint32_t crc = aws_checksums_crc32c(checksum_this.ptr, (int)checksum_this.len, 0);
    (void)crc;
}

static void s_runcrc64_sw(struct aws_byte_cursor checksum_this) {
    uint64_t crc = aws_checksums_crc64nvme_sw(checksum_this.ptr, (int)checksum_this.len, 0);
    (void)crc;
}

static void s_runcrc64(struct aws_byte_cursor checksum_this) {
    uint64_t crc = aws_checksums_crc64nvme(checksum_this.ptr, (int)checksum_this.len, 0);
    (void)crc;
}

int main(void) {

    fprintf(stdout, "hw features for this run:\n");
    fprintf(stdout, "clmul: %s\n", aws_cpu_has_feature(AWS_CPU_FEATURE_CLMUL) ? "true" : "false");
    fprintf(stdout, "sse4.1: %s\n", aws_cpu_has_feature(AWS_CPU_FEATURE_SSE_4_1) ? "true" : "false");
    fprintf(stdout, "sse4.2: %s\n", aws_cpu_has_feature(AWS_CPU_FEATURE_SSE_4_2) ? "true" : "false");
    fprintf(stdout, "avx2: %s\n", aws_cpu_has_feature(AWS_CPU_FEATURE_AVX2) ? "true" : "false");
    fprintf(stdout, "avx512: %s\n", aws_cpu_has_feature(AWS_CPU_FEATURE_AVX512) ? "true" : "false");
    fprintf(stdout, "arm crc: %s\n", aws_cpu_has_feature(AWS_CPU_FEATURE_ARM_CRC) ? "true" : "false");
    fprintf(stdout, "bmi2: %s\n", aws_cpu_has_feature(AWS_CPU_FEATURE_BMI2) ? "true" : "false");
    fprintf(stdout, "vpclmul: %s\n", aws_cpu_has_feature(AWS_CPU_FEATURE_VPCLMULQDQ) ? "true" : "false");
    fprintf(stdout, "arm pmull: %s\n", aws_cpu_has_feature(AWS_CPU_FEATURE_ARM_PMULL) ? "true" : "false");
    fprintf(stdout, "arm crypto: %s\n\n", aws_cpu_has_feature(AWS_CPU_FEATURE_ARM_CRYPTO) ? "true" : "false");

    struct aws_allocator_types allocators[2];
    allocators[0].allocator = aws_default_allocator();
    allocators[0].name = "Default runtime allocator";
    allocators[1].allocator = aws_aligned_allocator();
    allocators[1].name = "Aligned allocator";

    struct checksum_profile_run profile_runs[] = {
        {.profile_run = s_runcrc32_sw, .name = "crc32 C only"},
        {.profile_run = s_runcrc32, .name = "crc32 with hw optimizations"},
        {.profile_run = s_runcrc32c_sw, .name = "crc32c C only"},
        {.profile_run = s_runcrc32c, .name = "crc32c with hw optimizations"},
        {.profile_run = s_runcrc64_sw, .name = "crc64nvme C only"},
        {.profile_run = s_runcrc64, .name = "crc64nvme with hw optimizations"},
    };

    const size_t allocators_array_size = AWS_ARRAY_SIZE(allocators);
    const size_t profile_runs_size = AWS_ARRAY_SIZE(profile_runs);

    for (size_t i = 0; i < profile_runs_size; ++i) {
        fprintf(stdout, "--------Profile %s---------\n", profile_runs[i].name);

        for (size_t j = 0; j < allocators_array_size; ++j) {
            fprintf(stdout, "%s\n\n", allocators[j].name);

            struct aws_allocator *allocator = allocators[j].allocator;

            // get buffer sizes large enough that all the simd code paths get hit hard, but
            // also measure the smaller buffer paths since they often can't be optimized as thoroughly.
            size_t buffer_sizes[] = {8, 16, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384, 32768, 65536};
            size_t buffer_sizes_len = AWS_ARRAY_SIZE(buffer_sizes);

            // warm it up to factor out the cpuid checks:
            struct aws_byte_cursor warmup_cur = aws_byte_cursor_from_array(buffer_sizes, buffer_sizes_len);
            profile_runs[i].profile_run(warmup_cur);

            for (size_t k = 0; k < buffer_sizes_len; ++k) {
                struct aws_byte_buf x_bytes;
                aws_byte_buf_init(&x_bytes, allocator, buffer_sizes[k]);
                aws_device_random_buffer(&x_bytes);
                uint64_t start_time = 0;
                aws_high_res_clock_get_ticks(&start_time);
                profile_runs[i].profile_run(aws_byte_cursor_from_buf(&x_bytes));
                uint64_t end_time = 0;
                aws_high_res_clock_get_ticks(&end_time);
                fprintf(
                    stdout,
                    "buffer size %zu (bytes), latency: %" PRIu64 " ns\n",
                    buffer_sizes[k],
                    end_time - start_time);
                aws_byte_buf_clean_up(&x_bytes);
            }
            fprintf(stdout, "\n");
        }
    }
    return 0;
}
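The benchmark above times each buffer size with a single call between two `aws_high_res_clock_get_ticks` reads, after one warm-up call. A single sample of sub-microsecond work is sensitive to interrupts and cache state; one common alternative is to repeat the call and keep the fastest run. A portable-C sketch of that idea, assuming a bitwise CRC32 as a stand-in for the library calls; `crc32_bitwise`, `bench_min_latency_ns` and `s_now_ns` are hypothetical helpers, not part of aws-checksums or aws-c-common.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Signature shared by the 32-bit checksum functions being timed. */
typedef uint32_t (*checksum32_fn)(const uint8_t *input, size_t length, uint32_t previous);

/* Reference CRC32 (IEEE 802.3, reflected polynomial 0xEDB88320), bitwise. */
uint32_t crc32_bitwise(const uint8_t *input, size_t length, uint32_t previous) {
    uint32_t crc = ~previous;
    for (size_t i = 0; i < length; ++i) {
        crc ^= input[i];
        for (int bit = 0; bit < 8; ++bit) {
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

/* C11 wall-clock timestamp in nanoseconds. A monotonic clock such as POSIX
 * clock_gettime(CLOCK_MONOTONIC) is preferable where it is available. */
static uint64_t s_now_ns(void) {
    struct timespec ts = {0, 0};
    timespec_get(&ts, TIME_UTC);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Times `iterations` calls of fn over the same buffer and returns the fastest
 * single call in nanoseconds. *result receives the last checksum so the
 * computed value is actually used. */
uint64_t bench_min_latency_ns(
    checksum32_fn fn, const uint8_t *buf, size_t len, int iterations, uint32_t *result) {
    uint64_t best = UINT64_MAX;
    uint32_t crc = fn(buf, len, 0); /* warm-up call */
    for (int i = 0; i < iterations; ++i) {
        uint64_t start = s_now_ns();
        crc = fn(buf, len, 0);
        uint64_t elapsed = s_now_ns() - start;
        if (elapsed < best) {
            best = elapsed;
        }
    }
    *result = crc;
    return best;
}
```

Taking the minimum approximates the best-case cost of the code path; reporting a median alongside it would show how noisy the machine is.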
4 changes: 4 additions & 0 deletions builder.json
@@ -6,5 +6,9 @@
     "downstream": [
         { "name": "aws-c-event-stream" },
         { "name": "aws-c-s3" }
-    ]
+    ],
+    "test_steps": [
+        "test",
+        "{install_dir}/bin/checksum-profile{exe}"
+    ]
 }
14 changes: 12 additions & 2 deletions include/aws/checksums/crc.h
@@ -18,15 +18,25 @@ AWS_EXTERN_C_BEGIN
  * Pass 0 in the previousCrc32 parameter as an initial value unless continuing
  * to update a running crc in a subsequent call.
  */
-AWS_CHECKSUMS_API uint32_t aws_checksums_crc32(const uint8_t *input, int length, uint32_t previousCrc32);
+AWS_CHECKSUMS_API uint32_t aws_checksums_crc32(const uint8_t *input, int length, uint32_t previous_crc32);

 /**
  * The entry point function to perform a Castagnoli CRC32c (iSCSI) computation.
  * Selects a suitable implementation based on hardware capabilities.
  * Pass 0 in the previousCrc32 parameter as an initial value unless continuing
  * to update a running crc in a subsequent call.
  */
-AWS_CHECKSUMS_API uint32_t aws_checksums_crc32c(const uint8_t *input, int length, uint32_t previousCrc32);
+AWS_CHECKSUMS_API uint32_t aws_checksums_crc32c(const uint8_t *input, int length, uint32_t previous_crc32c);
+
+/**
+ * The entry point function to perform a CRC64-NVME (a.k.a. CRC64-Rocksoft) computation.
+ * Selects a suitable implementation based on hardware capabilities.
+ * Pass 0 in the previousCrc64 parameter as an initial value unless continuing
+ * to update a running crc in a subsequent call.
+ * There are many variants of CRC64 algorithms. This CRC64 variant is bit-reflected (based on
+ * the non bit-reflected polynomial 0xad93d23594c93659) and inverts the CRC input and output bits.
+ */
+AWS_CHECKSUMS_API uint64_t aws_checksums_crc64nvme(const uint8_t *input, int length, uint64_t previous_crc64);

 AWS_EXTERN_C_END
 AWS_POP_SANE_WARNING_LEVEL
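The new header comment defines CRC64-NVME in words: bit-reflected, built on the polynomial 0xad93d23594c93659, with the CRC inverted on input and output, and continued by passing the previous result back in. A minimal table-free sketch of those rules, useful as a slow cross-check rather than as the library's implementation (which the PR routes to CLMUL, AVX-512 and ARM code paths). `crc64nvme_bitwise`, `reverse_bits64` and `CRC64NVME_POLY_REFLECTED` are hypothetical names; the reflected constant is simply the bit reversal of the polynomial quoted in the header.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit-reversed form of the CRC64-NVME polynomial 0xad93d23594c93659. */
#define CRC64NVME_POLY_REFLECTED 0x9a6c9329ac4bc9b5ULL

/* Reverses the bit order of a 64-bit value (used to check the constant). */
uint64_t reverse_bits64(uint64_t v) {
    uint64_t r = 0;
    for (int i = 0; i < 64; ++i) {
        r = (r << 1) | (v & 1u);
        v >>= 1;
    }
    return r;
}

/* Bitwise reflected CRC64-NVME following the header's calling convention:
 * pass 0 to start, or a previous result to continue. Inverting on entry
 * turns 0 into an all-ones initial register; inverting on exit undoes it. */
uint64_t crc64nvme_bitwise(const uint8_t *input, size_t length, uint64_t previous_crc64) {
    uint64_t crc = ~previous_crc64;
    for (size_t i = 0; i < length; ++i) {
        crc ^= input[i];
        for (int bit = 0; bit < 8; ++bit) {
            /* Shift right; if the bit shifted out was 1, fold in the polynomial. */
            crc = (crc >> 1) ^ (CRC64NVME_POLY_REFLECTED & (0ULL - (crc & 1ULL)));
        }
    }
    return ~crc;
}
```

Because the output inversion is undone by the next call's input inversion, checksumming a buffer in two pieces (feeding the first result in as `previous_crc64`) gives the same value as checksumming it in one call.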