Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feature-1659: Initial draft #1680

Conversation

pawellhasa
Copy link
Contributor

What does this PR do?

Initial draft for encryption implementation at serialisation level

Motivation

Provide REST encryption

Related issues

#1659

Additional Notes

This is draft, so I'm looking for suggestions

Checklist

  • I have run the build using mvn clean package command
  • [-] My unit tests cover both failure and success scenarios <- I didn't cover all data types yet

@lvca lvca added the enhancement New feature or request label Aug 1, 2024
@lvca lvca self-requested a review August 1, 2024 16:48
decipher.init(Cipher.DECRYPT_MODE, secretKey, new GCMParameterSpec(tagSize, ivBytes));
return decipher.doFinal(encryptedData);
} catch (Exception e) {
throw new RuntimeException("Error while decrypting data", e);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WDYT about creating a new exception class for encryption, like EncryptionException that extends ArcadeDBException (so it's a RuntimeException)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, will do like that

if (value == null)
return;
Binary content = dataEncryption != null ? new Binary() : serialized;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need to create a new Binary() object in the case of encryption content?

case BinaryTypes.TYPE_RID:
break;
default:
serialized.putBytes(dataEncryption.encrypt(content.toByteArray()));
Copy link
Contributor

@lvca lvca Aug 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, you're encrypting every single value. Why don't encrypting the whole Binary at the end, once all the values are serialized? This would be much faster and you don't need to differentiate special cases like RIDs. WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I'll have a look

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was looking inside the code to find best place to do it, as Binary is referred in plenty of places, and it seemed that interactions with BaseRecord#buffer is what I am looking for. That however also is fairly nested inside many classes meaning it is difficult to intercept right moment to perform write/read as it gets passed around and often buffer is accessed directly to perform partial read.

In the end, I chosen implementing encryption at BinarySerializer in serializeProperties(), deserializeProperties(), deserializeProperty and hasProperty(). Idea is to write just all properties as encrypted at once and vice-versa. The issue I ran into is that serializeProperties() writes to two types: header and content. Header stores information about propertiesCount and about each propertyKey with its bufferPosition for the value. This allows reading just the value from whole buffer, and not all properties to find one. However this isn't possible with the goal above. I have to either re-factor the code to read all properties from buffer to decode, regardless of looking for specific one and then filter by key, or leave existing behaviour.

I could also re-factor structure of BaseRecord#binary to distinct between meta content (like ID) and data (encrypted) but that could be even more re-factoring, and it wouldn't resolve cherry picking properties anyway.

I understand that current implementation with encryption of each value adds performance cost, I didn't measure how this would affect our product yet, but it is mandatory requirement for us anyway.

What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay on this review. It makes sense, it's better to avoid encrypting the metadata for fast accessing to the properties you're requesting. Also, this makes easier to encrypt single property by adding some configuration (in the Schema -> Type -> Property)

@lvca lvca added this to the 24.8.1 milestone Sep 5, 2024
@lvca
Copy link
Contributor

lvca commented Sep 5, 2024

Ok, I'm doing some tests before merging it. I'd like to provide a way to define encryption without using Java so it would work also in a standard client/server configuration.

@gramian WDYT about it?

@lvca lvca merged commit 15edca8 into ArcadeData:main Sep 5, 2024
6 of 7 checks passed
@lvca
Copy link
Contributor

lvca commented Sep 5, 2024

Thanks, @pawellhasa for your contribution!

@gramian
Copy link
Collaborator

gramian commented Sep 5, 2024

I think the feature is useful. I have some questions:

  • Should there be some kind of performance testing to have at least rough idea about the compute cost? I mean something like a simple CRUD test of enough records with and without encryption.
  • I didn't' see any changes to the settings (maybe I overlooked them). How is encryption controlled just by using it with the embedded server?

@lvca
Copy link
Contributor

lvca commented Sep 5, 2024

To set the encryption now you have to call this API before using the database (right after open):

database.setDataEncryption(DefaultDataEncryption.useDefaults(DefaultDataEncryption.getSecretKeyFromPasswordUsingDefaults(password, salt)));

Good idea about a micro benchmark to see what's the cost of writing and reading with the default settings.

Also, if we provide a new boolean property encrypt (default=false) at both property and type levels, we can define which type/property to encrypt:

  • if encrypt is true on the type, then all the properties are encrypted
  • if encrypt is true on a property, then that property is encrypted

@pawellhasa WDYT?

@pawellhasa
Copy link
Contributor Author

Selecting which properties to encrypt can help avoiding encryption computing cost. I am away so won't be able to properly think about it till next week. I'll find out impact on our end and whether we would consider encryption for just part of data worth it.

@lvca
Copy link
Contributor

lvca commented Sep 9, 2024

NP. I prefer to complete this part and be done in terms of API (and SQL) before updating the docs = other users start using it.

@pawellhasa
Copy link
Contributor Author

Atm we don't want to exclude properties from encrypting in our product, if something changes, we'll discuss it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants