@custom_serialize / @custom_deserialize DSL (#234)
* @custom_serialize / @custom_deserialize DSL

e.g. `; @custom_serialize write_hex_bytes`

For specifying externally-provided functions for arbitrary
encodings/cbor details for (de)serialization.

Allowed at both the type-level (affecting everywhere it's used) and at
the field-level (overrides type-level if present).

Example use-case: CML's PlutusData's Bytes (and BigInt) variants don't use arbitrary
CBOR byte strings but instead follow a specific chunking format. We
used to hand code this but now we can just put this in the DSL.

This is particularly useful for people generating plutus-datum-based
CDDLs. It could also be used to allow for utf8 text (rust API) to be
(de)serialized to bytes to be encodable as a datum.

TODO:
[ ] tests for preserve-encodings
[ ] tests for tagged/otherwise extra encoding details over top of this

* preserve-encodings=true tests

* docs + more test cases + misc fixes

* clarify where the custom_serialize string comes from
rooooooooob authored May 18, 2024
1 parent 6688c82 commit 200f1fe
Showing 13 changed files with 2,958 additions and 2,117 deletions.
57 changes: 57 additions & 0 deletions docs/docs/comment_dsl.mdx
@@ -104,6 +104,63 @@ foo = uint ; @newtype @custom_json

Avoids generating and/or deriving json-related traits under the assumption that the user will supply their own implementation to be used in the generated library.

## @custom_serialize / @custom_deserialize

```cddl
custom_bytes = bytes ; @custom_serialize custom_serialize_bytes @custom_deserialize custom_deserialize_bytes
struct_with_custom_serialization = [
    custom_bytes,
    field: bytes, ; @custom_serialize custom_serialize_bytes @custom_deserialize custom_deserialize_bytes
    overridden: custom_bytes, ; @custom_serialize write_hex_string @custom_deserialize read_hex_string
    tagged1: #6.9(custom_bytes),
    tagged2: #6.9(uint), ; @custom_serialize write_tagged_uint_str @custom_deserialize read_tagged_uint_str
]
```

This allows the overriding of serialization and/or deserialization for when a specific format must be maintained. This works even with primitives where _CDDL_CODEGEN_EXTERN_TYPE_ would require making a wrapper type to use.

The string after `@custom_serialize`/`@custom_deserialize` is called directly as a function in place of the regular serialization/deserialization code. As such it must either be a fully qualified path, e.g. `@custom_serialize crate::utils::custom_serialize_function`, or be imported into the generated serialization code by hand after generation, e.g. by adding `use crate::utils::custom_serialize_function;`.

With `--preserve-encodings=true` the encoding variables must be passed in the same order that cddl-codegen uses them in regular serialization. They are passed as `Option<cbor_event::Sz>` for integers/tags, `LenEncoding` for lengths and `StringEncoding` for text/bytes. These are the same types that are stored in the generated `*Encoding` structs. Deserialization must return the same encoding variables: when there are none the deserialized value is returned directly, otherwise a tuple of the value followed by its encoding variables is returned.

There are two ways to use this comment DSL:

* Type level: e.g. `custom_bytes`. This will replace the (de)serialization everywhere you use this type.
* Field level: e.g. `struct_with_custom_serialization.field`. This will entirely replace the (de)serialization logic for the entire field, including other encoding operations like tags, `.cbor`, etc.
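
The precedence between the two levels can be pictured as a simple fallback. This is a minimal sketch rather than cddl-codegen's actual code, and `resolve_custom_fn` is a hypothetical helper introduced only for illustration:

```rust
/// Hypothetical helper (not part of cddl-codegen): picks the function name
/// to call for one field. A field-level annotation wins when present,
/// otherwise the annotation on the field's type is used.
fn resolve_custom_fn<'a>(
    field_level: Option<&'a str>,
    type_level: Option<&'a str>,
) -> Option<&'a str> {
    field_level.or(type_level)
}
```

Under this reading, the unannotated `custom_bytes` field in the CDDL above would use `custom_serialize_bytes`, while `overridden` would use `write_hex_string`.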

Example function signatures for `--preserve-encodings=false` for `custom_serialize_bytes` / `custom_deserialize_bytes` above:

```rust
pub fn custom_serialize_bytes<'se, W: std::io::Write>(
    serializer: &'se mut cbor_event::se::Serializer<W>,
    bytes: &[u8],
) -> cbor_event::Result<&'se mut cbor_event::se::Serializer<W>>

pub fn custom_deserialize_bytes<R: std::io::BufRead + std::io::Seek>(
    raw: &mut cbor_event::de::Deserializer<R>,
) -> Result<Vec<u8>, DeserializeError>
```
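
The bodies of these functions are up to the user. As one illustration of a format that has to be maintained by hand, here is a minimal sketch of a chunked byte-string encoding, assuming the commonly described PlutusData rule (definite-length up to 64 bytes, otherwise an indefinite-length string of 64-byte chunks). It writes raw CBOR into a `Vec<u8>` instead of going through `cbor_event`, and `MAX_CHUNK`, `write_bytes_header` and `encode_chunked_bytes` are hypothetical names, not anything generated by cddl-codegen:

```rust
/// Chunk size assumed by this sketch (an assumption, not taken from the docs above).
const MAX_CHUNK: usize = 64;

/// Writes a CBOR byte-string header (major type 2) for lengths up to 64.
fn write_bytes_header(out: &mut Vec<u8>, len: usize) {
    assert!(len <= MAX_CHUNK);
    if len < 24 {
        out.push(0x40 | len as u8); // length fits in the initial byte
    } else {
        out.push(0x58); // a one-byte length follows
        out.push(len as u8);
    }
}

/// Hypothetical chunked encoding: definite-length when short,
/// otherwise an indefinite-length byte string made of chunks.
fn encode_chunked_bytes(bytes: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    if bytes.len() <= MAX_CHUNK {
        write_bytes_header(&mut out, bytes.len());
        out.extend_from_slice(bytes);
    } else {
        out.push(0x5f); // start of an indefinite-length byte string
        for chunk in bytes.chunks(MAX_CHUNK) {
            write_bytes_header(&mut out, chunk.len());
            out.extend_from_slice(chunk);
        }
        out.push(0xff); // break marker
    }
    out
}
```

A real `custom_serialize_bytes` would emit the same structure through the `Serializer` it is handed and return that serializer.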

Example function signatures for `--preserve-encodings=true` for `write_tagged_uint_str` / `read_tagged_uint_str` above:

```rust
pub fn write_tagged_uint_str<'se, W: std::io::Write>(
    serializer: &'se mut cbor_event::se::Serializer<W>,
    uint: &u64,
    tag_encoding: Option<cbor_event::Sz>,
    text_encoding: Option<cbor_event::Sz>,
) -> cbor_event::Result<&'se mut cbor_event::se::Serializer<W>>

pub fn read_tagged_uint_str<R: std::io::BufRead + std::io::Seek>(
    raw: &mut cbor_event::de::Deserializer<R>,
) -> Result<(u64, Option<cbor_event::Sz>, Option<cbor_event::Sz>), DeserializeError>
```

Note that as this is at the field-level it must handle the tag as well as the `uint`.
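
To make the "handles the tag too" point concrete, here is a minimal sketch of the bytes such a pair might produce and consume. What `write_tagged_uint_str` really does is not shown here; this assumes, going by its name, that the `uint` is written as decimal text under tag 9. It ignores the encoding variables and `cbor_event` entirely, and `encode_tagged_uint_str` / `decode_tagged_uint_str` are hypothetical names:

```rust
/// Hypothetical encoder: tag 9 wrapping the uint written as decimal text.
fn encode_tagged_uint_str(value: u64) -> Vec<u8> {
    // A u64 has at most 20 decimal digits, so the length fits in the initial byte.
    let text = value.to_string();
    let mut out = vec![0xc9u8]; // major type 6 (tag), tag number 9
    out.push(0x60 | text.len() as u8); // major type 3 (text), short length
    out.extend_from_slice(text.as_bytes());
    out
}

/// Hypothetical decoder for the bytes produced above.
fn decode_tagged_uint_str(raw: &[u8]) -> Result<u64, String> {
    let (&tag, rest) = raw.split_first().ok_or("empty input")?;
    if tag != 0xc9 {
        return Err(format!("expected tag 9, found byte {:#04x}", tag));
    }
    let (&header, body) = rest.split_first().ok_or("missing text header")?;
    // Only short text strings (length below 24) are accepted in this sketch.
    if (header & 0xe0) != 0x60 || (header & 0x1f) >= 24 {
        return Err(format!("expected short text string, found byte {:#04x}", header));
    }
    if body.len() != (header & 0x1f) as usize {
        return Err("text length does not match header".to_string());
    }
    let text = std::str::from_utf8(body).map_err(|e| e.to_string())?;
    text.parse::<u64>().map_err(|e| e.to_string())
}
```

The field-level function is responsible for both the tag byte and the payload, which is why the decoder checks the tag before reading the text.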

For more examples see `tests/custom_serialization` (used in the `core` and `core_no_wasm` tests) and `tests/custom_serialization_preserve` (used in the `preserve-encodings` test).

## _CDDL_CODEGEN_EXTERN_TYPE_

While not a comment, this allows you to compose hand-written structs into a CDDL spec.
134 changes: 133 additions & 1 deletion src/comment_ast.rs
@@ -6,13 +6,15 @@ use nom::{
    IResult,
};

-#[derive(Default, Debug, PartialEq)]
+#[derive(Clone, Default, Debug, PartialEq)]
pub struct RuleMetadata {
    pub name: Option<String>,
    pub is_newtype: bool,
    pub no_alias: bool,
    pub used_as_key: bool,
    pub custom_json: bool,
    pub custom_serialize: Option<String>,
    pub custom_deserialize: Option<String>,
}

pub fn merge_metadata(r1: &RuleMetadata, r2: &RuleMetadata) -> RuleMetadata {
@@ -28,6 +30,29 @@ pub fn merge_metadata(r1: &RuleMetadata, r2: &RuleMetadata) -> RuleMetadata {
        no_alias: r1.no_alias || r2.no_alias,
        used_as_key: r1.used_as_key || r2.used_as_key,
        custom_json: r1.custom_json || r2.custom_json,
        custom_serialize: match (r1.custom_serialize.as_ref(), r2.custom_serialize.as_ref()) {
            (Some(val1), Some(val2)) => {
                panic!(
                    "Key \"custom_serialize\" specified twice: {:?} {:?}",
                    val1, val2
                )
            }
            (val @ Some(_), _) => val.cloned(),
            (_, val) => val.cloned(),
        },
        custom_deserialize: match (
            r1.custom_deserialize.as_ref(),
            r2.custom_deserialize.as_ref(),
        ) {
            (Some(val1), Some(val2)) => {
                panic!(
                    "Key \"custom_deserialize\" specified twice: {:?} {:?}",
                    val1, val2
                )
            }
            (val @ Some(_), _) => val.cloned(),
            (_, val) => val.cloned(),
        },
    };
    merged.verify();
    merged
@@ -39,6 +64,8 @@ enum ParseResult {
    DontGenAlias,
    UsedAsKey,
    CustomJson,
    CustomSerialize(String),
    CustomDeserialize(String),
}

impl RuleMetadata {
@@ -67,6 +94,32 @@ impl RuleMetadata {
                ParseResult::CustomJson => {
                    base.custom_json = true;
                }
                ParseResult::CustomSerialize(custom_serialize) => {
                    match base.custom_serialize.as_ref() {
                        Some(old) => {
                            panic!(
                                "Key \"custom_serialize\" specified twice: {:?} {:?}",
                                old, custom_serialize
                            )
                        }
                        None => {
                            base.custom_serialize = Some(custom_serialize.to_string());
                        }
                    }
                }
                ParseResult::CustomDeserialize(custom_deserialize) => {
                    match base.custom_deserialize.as_ref() {
                        Some(old) => {
                            panic!(
                                "Key \"custom_deserialize\" specified twice: {:?} {:?}",
                                old, custom_deserialize
                            )
                        }
                        None => {
                            base.custom_deserialize = Some(custom_deserialize.to_string());
                        }
                    }
                }
            }
        }
        base.verify();
Expand Down Expand Up @@ -113,6 +166,28 @@ fn tag_custom_json(input: &str) -> IResult<&str, ParseResult> {
Ok((input, ParseResult::CustomJson))
}

fn tag_custom_serialize(input: &str) -> IResult<&str, ParseResult> {
let (input, _) = tag("@custom_serialize")(input)?;
let (input, _) = take_while(char::is_whitespace)(input)?;
let (input, custom_serialize) = take_while1(|ch| !char::is_whitespace(ch))(input)?;

Ok((
input,
ParseResult::CustomSerialize(custom_serialize.to_string()),
))
}

fn tag_custom_deserialize(input: &str) -> IResult<&str, ParseResult> {
let (input, _) = tag("@custom_deserialize")(input)?;
let (input, _) = take_while(char::is_whitespace)(input)?;
let (input, custom_deserialize) = take_while1(|ch| !char::is_whitespace(ch))(input)?;

Ok((
input,
ParseResult::CustomDeserialize(custom_deserialize.to_string()),
))
}
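
The two parsers above follow the same three steps: match the literal tag, skip any whitespace, then take one or more non-whitespace characters as the function path. A std-only sketch of those steps, without nom, might look like the following. `parse_custom_serialize` is a hypothetical name used only for illustration and returns an `Option` instead of nom's `IResult`:

```rust
/// Hypothetical std-only counterpart of `tag_custom_serialize`:
/// returns the remaining input and the function path on success.
fn parse_custom_serialize(input: &str) -> Option<(&str, String)> {
    // Step 1: the literal tag must come first.
    let rest = input.strip_prefix("@custom_serialize")?;
    // Step 2: skip zero or more whitespace characters.
    let rest = rest.trim_start();
    // Step 3: take everything up to the next whitespace (or the end of input).
    let end = rest.find(char::is_whitespace).unwrap_or(rest.len());
    if end == 0 {
        return None; // at least one character is required for the path
    }
    Some((&rest[end..], rest[..end].to_string()))
}
```

Because the path is simply "everything up to the next whitespace", a fully qualified path such as `crate::utils::write_hex` is captured whole, and the remaining input is left for the next tag parser.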

fn whitespace_then_tag(input: &str) -> IResult<&str, ParseResult> {
    let (input, _) = take_while(char::is_whitespace)(input)?;
    let (input, result) = alt((
@@ -121,6 +196,8 @@ fn whitespace_then_tag(input: &str) -> IResult<&str, ParseResult> {
        tag_no_alias,
        tag_used_as_key,
        tag_custom_json,
        tag_custom_serialize,
        tag_custom_deserialize,
    ))(input)?;

    Ok((input, result))
@@ -163,6 +240,8 @@ fn parse_comment_name() {
                no_alias: false,
                used_as_key: false,
                custom_json: false,
                custom_serialize: None,
                custom_deserialize: None,
            }
        ))
    );
@@ -180,6 +259,8 @@ fn parse_comment_newtype() {
                no_alias: false,
                used_as_key: false,
                custom_json: false,
                custom_serialize: None,
                custom_deserialize: None,
            }
        ))
    );
@@ -197,6 +278,8 @@ fn parse_comment_newtype_and_name() {
                no_alias: false,
                used_as_key: false,
                custom_json: false,
                custom_serialize: None,
                custom_deserialize: None,
            }
        ))
    );
@@ -214,6 +297,8 @@ fn parse_comment_newtype_and_name_and_used_as_key() {
                no_alias: false,
                used_as_key: true,
                custom_json: false,
                custom_serialize: None,
                custom_deserialize: None,
            }
        ))
    );
@@ -231,6 +316,8 @@ fn parse_comment_used_as_key() {
                no_alias: false,
                used_as_key: true,
                custom_json: false,
                custom_serialize: None,
                custom_deserialize: None,
            }
        ))
    );
@@ -248,6 +335,8 @@ fn parse_comment_newtype_and_name_inverse() {
                no_alias: false,
                used_as_key: false,
                custom_json: false,
                custom_serialize: None,
                custom_deserialize: None,
            }
        ))
    );
@@ -265,6 +354,8 @@ fn parse_comment_name_noalias() {
                no_alias: true,
                used_as_key: false,
                custom_json: false,
                custom_serialize: None,
                custom_deserialize: None,
            }
        ))
    );
@@ -282,6 +373,8 @@ fn parse_comment_newtype_and_custom_json() {
                no_alias: false,
                used_as_key: false,
                custom_json: true,
                custom_serialize: None,
                custom_deserialize: None,
            }
        ))
    );
@@ -292,3 +385,42 @@ fn parse_comment_newtype_and_custom_json() {
fn parse_comment_noalias_newtype() {
    let _ = rule_metadata("@no_alias @newtype");
}

#[test]
fn parse_comment_custom_serialize_deserialize() {
    assert_eq!(
        rule_metadata("@custom_serialize foo @custom_deserialize bar"),
        Ok((
            "",
            RuleMetadata {
                name: None,
                is_newtype: false,
                no_alias: false,
                used_as_key: false,
                custom_json: false,
                custom_serialize: Some("foo".to_string()),
                custom_deserialize: Some("bar".to_string()),
            }
        ))
    );
}

// can't have all since @no_alias and @newtype are mutually exclusive
#[test]
fn parse_comment_all_except_no_alias() {
    assert_eq!(
        rule_metadata("@newtype @name baz @custom_serialize foo @custom_deserialize bar @used_as_key @custom_json"),
        Ok((
            "",
            RuleMetadata {
                name: Some("baz".to_string()),
                is_newtype: true,
                no_alias: false,
                used_as_key: true,
                custom_json: true,
                custom_serialize: Some("foo".to_string()),
                custom_deserialize: Some("bar".to_string()),
            }
        ))
    );
}
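
The duplicate-key rule that `merge_metadata` applies to `custom_serialize` and `custom_deserialize` is the same for both fields: two `Some` values are an error, otherwise whichever side is set wins. As a minimal sketch, it could be factored into one helper. `merge_once` is a hypothetical name and is not part of this commit:

```rust
/// Hypothetical helper mirroring the match arms in `merge_metadata`:
/// panics when both sides specify the key, otherwise keeps whichever is set.
fn merge_once(key: &str, a: Option<&String>, b: Option<&String>) -> Option<String> {
    match (a, b) {
        (Some(v1), Some(v2)) => {
            panic!("Key {:?} specified twice: {:?} {:?}", key, v1, v2)
        }
        // At most one side is `Some` here, so `or` picks it (or yields `None`).
        (a, b) => a.or(b).cloned(),
    }
}
```

With such a helper each field in `merge_metadata` would reduce to a single call, e.g. `merge_once("custom_serialize", r1.custom_serialize.as_ref(), r2.custom_serialize.as_ref())`.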