Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: Use spans to solve a number of memory safety issues #4148

Merged
merged 2 commits into from
Mar 2, 2024

Conversation

lgritz
Copy link
Collaborator

@lgritz lgritz commented Feb 10, 2024

  • exif.cpp: use spans to avoid several possible buffer overruns
  • paramlist_test.cpp: use spans to be sure we're ok on memory bounds
  • jpeg: use spans in jpeg icc profile manipulation for memory safety
  • psd output: switch cmyk conversion to use spans for memory safety

I've been cobbling these together bit by bit over the last several weeks, tracking down the memory safety issues identified by the Sonar static analysis. I think that mostly they are not real bugs, but the code is confusing enough that it's hard to verify. This refactor makes it safer, and even if the code was previously correct but merely hard to analyze, the bounds checking in span (for debug mode) gives the static analyzer the clues it needs to know this is safe.

As further explanation, these situations generally involve something that looks like this, schematically:

caller () {
    T buffer[known_size];
    call function(buffer, ...params...);
}

function(T* buffer, ...params...) {
    i, j = ...very complex things, hard to analyze...
    foo = buffer[i];
    buffer[j] = bar;
    // Did we do a buffer overrun??? ¯\_(ツ)_/¯
    // This function doesn't necessarily have knowldege of the
    // true bounds, even if it wanted to check every access.
}

By rewriting this function as follows:

function(span<T> buffer) {
    // all indexed access to buffer will be bounds checked in debug
}

we are passing the bounds as well, and at least for debug builds, are also checking bounds with an assertion on every indexed access into buffer, without the function needing a lot of clutter. Fire and forget.

* exif.cpp: use spans to avoid several possible buffer overruns
* paramlist_test.cpp: use spans to be sure we're ok on memory bounds
* jpeg: use spans in jpeg icc profile manipulation for memory safety
* psd output: switch cmyk conversion to use spans for memory safety

I've been cobbling these together bit by bit over the last several
weeks, tracking down the memory safety issues identified by the Sonar
static analysis. I think that *mostly* they are not real bugs, but the
code is confusing enough that it's hard to verify. This refactor makes
it safer, and even if the code was previously correct but merely hard
to analyze, the bounds checking in span (for debug mode) gives the
static analyzer the clues it needs to know this is safe.

As further explanation, these situations generally involve something
that looks like this, schematically:

    caller () {
        T buffer[known_size];
        call function(buffer, ...params...);
    }

    function(T* buffer, ...params) {
        i, j = ...very complex things, hard to analyze...
        foo = buffer[i];
        buffer[j] = bar;
        // Did we do a buffer overrun??? ¯\_(ツ)_/¯
        // This function doesn't necessarily have knowldege of the
        // explicit bounds, even if it wanted to check every access.
    }

By rewriting this idiom as follows:

    function(span<T> buffer) {
        // buffer carries its bounds, and every access will be
        // checked in debug builds.
    }

we are passing the bounds as well, and at least for debug builds, are
also checking bounds with an assertion on every indexed access into
buffer, without the function needing a lot of clutter. Fire and forget.

Signed-off-by: Larry Gritz <lg@larrygritz.com>
@lgritz
Copy link
Collaborator Author

lgritz commented Feb 20, 2024

Any objections?

Copy link
Collaborator

@ThiagoIze ThiagoIze left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the span usage. Just a couple questions for you.

if (swab)
swap_endian(d.data(), d.size());
spec.attribute(name, type, d.data());
cspan<uint8_t> dspan
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is dspan 8-bit when it's given a 16-bit span and dswab uses it as if it was 16-bit?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because it's in the middle of some byte buffer, they aren't actually necessarily aligned on 16-bit boundaries. We're just copying things around, a pile of bytes is ok.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add some comments to clarify.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated with better comments.

Comment on lines +650 to +651
std::vector<uint16_t> dswab((const uint16_t*)dspan.begin(),
(const uint16_t*)dspan.end());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is making a copy of dspan when previously no copy was used. Is that acceptable time and memory overhead?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it's the opposite. Previously, it was ALWAYS making a copy into a vector (a few lines above). Now it only does so if it needs to mutate the data by byteswapping it. Now when there is no swap needed, there is no copy.

Signed-off-by: Larry Gritz <lg@larrygritz.com>
Copy link
Collaborator

@ThiagoIze ThiagoIze left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

@lgritz lgritz merged commit 758b5a7 into AcademySoftwareFoundation:master Mar 2, 2024
24 of 26 checks passed
@lgritz lgritz deleted the lg-spans branch March 2, 2024 00:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants