Unified data syntax #365

FWDekker · 2021-01-05T17:42:36Z

A unified data syntax (UDS) allows expressing any sort of random data (strings, integers, words, phone numbers, coordinates, arrays, etc.). The existing schemes for any data can be converted into a UDS string, which can then be parsed to create random data. This is somewhat similar to "inverting" a regex (i.e. generating a random string that matches a regex), except that the regex syntax is not well-suited for this problem (no native support for ranges, hard to read, dictionaries result in very long strings).

Instead of giving each data action a generateStrings method, each scheme can be compiled to a UDS string which has the #random(Random) method to instantiate a random string that matches the UDS.

This system will pave the way to creating arbitrary new data types, and in particular will allow users to create their own data types. It will also make it easier to add many presets that do not necessarily match any particular data type, but without cluttering any menus. Note that this issue is not about implementing new data types; the goal here is only to introduce the UDS.

Currently, I envision UDS to be similar to the string representations of Kotlin data classes, except that default values can be left out. For example, to create an integer in the range [0, 15], the UDS would be Int[min=0, max=15]. Two integers in different ranges with a dash in between would then be written as Int[min=0, max=15]-Int[min=5, max=15]. Naturally, it should be possible to escape this syntax so that users can insert the literal string Int[min=0, max=15], and spaces should be ignored (except in strings).
Other native types should be Dec (decimal), Str (string), Word (word), Time (to insert current time?), and maybe UUID if this one cannot be represented with the other types.
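
To make the idea concrete, here is a minimal Python sketch of how a UDS string with inline Int[...] definitions could be turned into random text. Everything in it is an assumption for illustration: the function names `parse_args` and `generate`, the default range used when `min` or `max` is left out, and the choice to handle only the Int type. Escaping is not handled.

```python
import random
import re

# Hypothetical pattern: the type name Int followed by bracketed key=value pairs.
# Other types (Dec, Str, Word, Time) are left out of this sketch.
TOKEN = re.compile(r"Int\[([^\]]*)\]")


def parse_args(arg_text):
    """Turn 'min=0, max=15' into {'min': 0, 'max': 15}; spaces are ignored."""
    args = {}
    for pair in arg_text.replace(" ", "").split(","):
        if pair:
            key, value = pair.split("=")
            args[key] = int(value)
    return args


def generate(uds, rng):
    """Replace every Int[...] in the UDS string with a random integer."""
    def replace(match):
        args = parse_args(match.group(1))
        # Assumed defaults for parameters that are left out.
        low = args.get("min", 0)
        high = args.get("max", 1000)
        return str(rng.randint(low, high))

    return TOKEN.sub(replace, uds)


rng = random.Random(42)
print(generate("Int[min=0, max=15]-Int[min=5, max=15]", rng))
```

Text outside the Int[...] definitions, such as the dash, is copied through unchanged, which is what lets a single string describe both the literal parts and the random parts.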

I should start with a basic syntax. After this issue, no new functionalities are directly exposed to users; this issue does not yet give users the ability to insert new data types.


solonovamax · 2021-03-23T16:37:00Z

(crosspost from #305, but went more in-depth on my ideas + added some stuff.)

Here are some ideas for the UDS type:

Make it so that only the UDS type exists. All the other types could be made using the UDS.

And here are some ideas for UDS syntax:
Rather than using the (more ambiguous) Int[key=value] syntax, you could use %{name} or ${name} (and perhaps you could omit the {} if the name consists only of [a-zA-Z] characters and is followed by a non-[a-zA-Z] character). You'd then have a separate menu to add, remove, and configure the different "substitutions" you have access to in a specific type. So if you wanted to make a UUID type, you'd do so like this:

Substitutions:
- 4 Digit Hex
  - Name: 4Hex
  - Base: 16
  - Minimum Value: 0
  - Maximum Value: 65535
  - Padding Character: 0
  - Minimum Characters: 4 // this here means that they must be at least 4 characters long, and will be padded with 0s.
  - Capitalization: UPPER
Template:
```
%{4Hex}%{4Hex}-%{4Hex}-%{4Hex}-%{4Hex}-%{4Hex}%{4Hex}%{4Hex}
```
The template is written in a (theoretical) template language, which could also add items in a loop. This would be highly extensible and would let users define things like arrays with a variable number of elements and complex logic to simplify types, instead of just writing something like new int[]{%I, %I, %I, %I, %I, %I, %I, %I, %I, %I, %I, %I, %I, %I, %I, %I}.
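
As a rough illustration of what the 4Hex substitution would do, the sketch below formats a random integer with the listed settings (base 16, range 0 to 65535, padded with 0 to at least 4 characters, upper case) and joins eight of them in the usual 8-4-4-4-12 UUID layout. The helper names `to_base`, `four_hex`, and `uuid_like` are made up for this example, and the result is only UUID-shaped: it does not set the version or variant bits of a real UUID.

```python
import random

DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def to_base(value, base):
    """Convert a non-negative integer to a lowercase string in the given base (2..36)."""
    if value == 0:
        return "0"
    digits = []
    while value > 0:
        value, remainder = divmod(value, base)
        digits.append(DIGITS[remainder])
    return "".join(reversed(digits))


def four_hex(rng):
    """The '4Hex' substitution: base 16, 0..65535, padded with '0' to 4 characters, upper case."""
    text = to_base(rng.randint(0, 65535), 16)
    return text.rjust(4, "0").upper()


def uuid_like(rng):
    """Fill the 8-4-4-4-12 layout with eight independent 4Hex values."""
    groups = [four_hex(rng) for _ in range(8)]
    return "{}{}-{}-{}-{}-{}{}{}".format(*groups)


print(uuid_like(random.Random(7)))
```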

Here is pretty much how (I think) it should work:

You can define different "substitutions", which are injected into the template. (In the code, you could have them defined as functions which call, for example, java.util.Random.)
- each substitution has settings for what is currently in the "Integer", "Words", and "String" types, along with some new "Time" (and possibly other types?)
  Here are the different config values for each: (+ there's also a "Name" value which is the name it uses in the template)
  - Integer
    - Minimum: Int from Long.MIN_VALUE to Long.MAX_VALUE.
    - Maximum: Int from Long.MIN_VALUE to Long.MAX_VALUE.
    - Base: Int from 2 to 36. Acts like the existing "Base" config.
    - Grouping Separator: an input field for any single character. (or you can just select a preset value from the list: ., ,, _.)
    - Capitalization: select from "lower", "UPPER", "RanDoM", "Sentence", or "First Letter". ("First Letter" is the same as "Sentence")
    - Padding Character: any single character.
    - Minimum Characters: Int from 0 to Int.MAX_VALUE (maybe soft-constrain it to 2^8 or something?). This is the minimum number of characters a number may have. If it has fewer than this, it is padded with the padding character.
  - String
    - Minimum Length: Int from 0 to Long.MAX_VALUE.
    - Maximum Length: Int from 0 to Long.MAX_VALUE.
    - Capitalization: select from "lower", "UPPER", "RanDoM", "Sentence", or "First Letter". ("First Letter" is the same as "Sentence" unless you have spaces enabled.)
    - Symbol Sets: the symbol sets menu that already exists is extensible enough.
    - Exclude Lookalike: true/false.
  - Words
    - Minimum Length: Int from 0 to Long.MAX_VALUE.
    - Maximum Length: Int from 0 to Long.MAX_VALUE.
    - Capitalization: select from "lower", "UPPER", "RanDoM", "Sentence", or "First Letter". ("First Letter" is the same as "Sentence" unless you have spaces enabled.)
    - Word Selection: (There's a tickbox for each item in this list. If you select it, it enables that. So the default string would have "Dictionary" ticked. But if you wanted to make a string that was always "yes", "no", or "maybe", then you'd use "Words")
      - Dictionaries: The dictionary selector that currently exists. (Though, if you could also integrate with IntelliJ's "saved words" dictionary that exists per-project and application-wide, that might be nice too. Also, maybe bundle/have available for download some dictionaries for a few other languages?)
      - Words: Basically, it's like the "File → Settings → Editor → Proofreading → Spelling → Accepted Words" menu. You can just add/remove words, and then all the words in the list are used.
      - (Optionally) some way of adding weights to the words in the dictionary and word list would be cool, but lower priority. (People wouldn't really use this as much)
  - Time
    - Format: this is just an input field that can have any text you want in it. It is interpreted like the format string of the Linux date command-line utility. So, for example, %T could produce 11:50:14, %A could produce Tuesday, %s could produce 1616514687, %D could produce 03/23/21, %Y-%m-%d could produce 2021-03-23, %A %B %d %T %y could produce Tuesday March 23 11:53:38 21, etc.
      This here is really just a convenience utility, since it's effectively syntactic sugar. If you really wanted to, you'd be able to recreate this entirely from scratch, using only a bunch of String and Integer types for a template. (String can be used for the days/months, Integer for everything else.)
- Then, once you have the different "substitutions" defined, you can write out a template into an input field. Templates are written in a template language that has access to the substitutions you defined earlier. (If a substitution occurs multiple times, each time is a unique value. So %I %I (where I is the name of an integer from 0 to 99) could produce: 83 54. Or maybe have a setting to disable/enable this?)
- Also, you can use more complex functions in a template, like for loops. Why? Because I want to make writing arrays more bearable. (The syntax I propose could look like this: %{for number in list}{repetition}, where number is the name of the variable used as the index/iteration item in the loop, list is a list, and repetition is what is repeated. So if you wanted to print the numbers from 1 to 5, you'd do it like this: %{for number in 1..5}{%number }, and that would append the text "%number " five times, where each time %number is one higher. So, 1 2 3 4 5 would be produced. (Trailing space because I cba to do anything more complex.))
  
  And here's a few more types that aren't needed, but sound cool to have (I mean, why not? At this point, this entire plugin is "I want to show off that us programmers can do really cool and fancy things") (not going to list config value ideas for this because I cba)
  - Gaussian (generates a number using a gaussian distribution)
  - Binomial Distribution (generates a number using a binomial distribution)
  - Just open up this Wikipedia page, roll a dice, and pick a few.
  - Random Name Generator? Idk, could be cool for generating random project names.
  - Random Noise Generation? This could be interesting. eg: generate an array of random noise values.

At this point it's basically just flexing lol.
But using those 4 different types, you can make the following types (as a substitution):

UUID
Decimal
Hex Color
Java Color Object
Longs/Ints in Java
Random Booleans
Arrays

And pretty much anything else you could possibly think of.

Here's a few definitions for different types:

Arrays (using integer arrays in this example)

Substitutions:
- Integer
  - Name: I
  - Base: 10
  - Minimum Value: 0
  - Maximum Value: 99
  - Padding Character: 0
  - Minimum Characters: 0

Template:

new int[]{ %I%{for number in 1..15}{, %I} }
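
A minimal sketch of how such a loop could be expanded, assuming the proposed %{for NAME in START..END}{BODY} form with an inclusive numeric range. The function names `expand_loops` and `fill` are hypothetical, nested loops and braces inside the body are not supported, and %I is hard-coded as an integer from 0 to 99 to match the substitution defined above.

```python
import random
import re

# Hypothetical loop syntax from the proposal: %{for NAME in START..END}{BODY}
LOOP = re.compile(r"%\{for (\w+) in (\d+)\.\.(\d+)\}\{([^{}]*)\}")


def expand_loops(template):
    """Replace each loop by its body repeated once per value in START..END (inclusive)."""
    def unroll(match):
        name, start, end, body = match.groups()
        parts = []
        for value in range(int(start), int(end) + 1):
            # Inside the body, %NAME refers to the current loop value.
            parts.append(re.sub(r"%" + name + r"\b", str(value), body))
        return "".join(parts)

    return LOOP.sub(unroll, template)


def fill(template, rng):
    """Expand loops first, then give every %I its own random value in 0..99."""
    expanded = expand_loops(template)
    return re.sub(r"%I\b", lambda m: str(rng.randint(0, 99)), expanded)


print(expand_loops("%{for number in 1..5}{%number }"))
print(fill("new int[]{ %I%{for number in 1..15}{, %I} }", random.Random(1)))
```

Expanding the loops before resolving substitutions means each repeated %I is drawn independently, which matches the "each time is a unique value" behaviour described earlier.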

Decimal

Substitutions:
- Integer
  - Name: Pre
  - Base: 10
  - Minimum Value: 0
  - Maximum Value: 9999
  - Padding Character: 0
  - Minimum Characters: 0
- Integer
  - Name: Post
  - Base: 10
  - Minimum Value: 0
  - Maximum Value: 9999
  - Padding Character: 0
  - Minimum Characters: 4
Template:
```
%Pre.%Post
```

Hex Color

Substitutions:
- Integer
  - Name: H
  - Base: 16
  - Minimum Value: 0
  - Maximum Value: 16777215
  - Padding Character: 0
  - Minimum Characters: 6
Template:
```
#%H
```

Java Color Object

Substitutions:
- Integer
  - Name: c
  - Base: 16
  - Minimum Value: 0
  - Maximum Value: 255
  - Padding Character: 0
  - Minimum Characters: 0
Template:
```
new Color(0x%c, 0x%c, 0x%c);
```

Random Booleans

Substitutions:
- Words
  - Name: b
  - Minimum Length: 0
  - Maximum Length: 99
  - Capitalization: lower
  - Words:
    - true
    - false
Template:
```
%b
```

FWDekker · 2021-03-23T17:09:09Z

I think the core difference in your proposal is that substitution definitions are separated from the templates, and substitutions cannot be parameterised. (Oh, and you suggest the use of for loops, which I had not completely thought out yet in my design.)

I think that the main disadvantage of having users create substitutions is that while it is less verbose, it is more difficult to understand because users first have to define all the substitutions they want to use, and then insert those into the template. With my proposal, the substitution definitions are inlined into the template. This may result in longer template strings, but this issue can be resolved by allowing users to use templates in other templates, using loops, or using other constructs.

Whether the syntax is %asdf or Asdf() isn't that important, but I think that the number of concepts that the user has to understand should be kept to a minimum. I prefer using only templates that are based on the "primitive" types such as integers, strings, time, probability distributions, etc. Usability and simplicity should be paramount in the design of the plugin.

solonovamax · 2021-03-23T22:27:56Z

I just feel it might look quite a bit messier. Also, by default, the plugin would come with a whole bunch of different things already set up, which the user can then fine-tune.

For example:

#Int[base=16, min=0, max=16777215, paddingChar=0, minChars=6]

is how you'd write the Hex Color template.
or

Int[min=0, max=9999].Int[min=0, max=9999, paddingChar=0, minChars=4]

is how you'd write the Decimal template.

So I feel like it can get quite messy if you have a few variables or many parameters. But in the end, it's up to you. (Also, I feel like it's actually easier than having these magic key-value pairs that you have no docs for. Whereas, it'd be comparatively simpler to explain how the substitution works to someone.)

As for %asdf vs Asdf(), I definitely feel like a control character is important. Because you need some way to escape functions, or else what happens when you want to write the literal text Asdf()? Whereas with %asdf, you can do %%asdf (2×% escapes the percent sign) to write it literally. You can always use $ (escaped with $$), \ (like latex) (escaped with \\), {[your command here]} (escaped with {{[your text here]}}), or really any other character, but I think a control character is necessary. Also, I proposed optionally delimiting longer commands with { and } for a reason: what happens if you want to have a command directly beside a string? If you had %IL where I was the command and L was the string, how do you figure that out? It's much easier to just say "do %{I}L" in those scenarios than to do the more complex (and in some scenarios impossible) parsing route.
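
To show what the control character buys you, here is a small sketch of a renderer that recognises %% as an escaped percent sign, %{name} as a delimited substitution, and %name as a bare one. The `render` function and the dict-of-functions representation of substitutions are assumptions made for this example, not an existing API.

```python
import re

# One alternative per token kind, tried left to right:
#   %%        -> a literal percent sign
#   %{name}   -> a delimited substitution
#   %name     -> a bare substitution (letters only)
TOKEN = re.compile(r"%%|%\{([^{}]+)\}|%([a-zA-Z]+)")


def render(template, substitutions):
    """Replace substitution tokens using a dict of name -> zero-argument function."""
    def replace(match):
        if match.group(0) == "%%":
            return "%"
        name = match.group(1) or match.group(2)
        return substitutions[name]()

    return TOKEN.sub(replace, template)


# A fixed value stands in for a random integer so the output is predictable.
subs = {"I": lambda: "7"}
print(render("%{I}L", subs))  # prints "7L": the braces separate the command I from the text L
print(render("%%I", subs))    # prints "%I": the doubled percent sign is an escape
```

With this sketch, the undelimited form %IL is read as a substitution named IL and fails with a KeyError unless such a substitution exists, which is the ambiguity the braces are meant to avoid.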

FWDekker · 2021-03-24T12:24:24Z

> I feel like it's actually easier than having these magic key-value pairs that you have no docs for.

That's a pretty good argument. To alleviate that issue, I think there could be a small wizard of sorts. No need to manage the defined types, but still an easy interface to build types with. I'll have to look into that. Either way, I'm going to experiment with the solution before it's rolled out, nothing is final yet.

FWDekker added the enhancement Improvement of existing feature label Jan 5, 2021

FWDekker added this to the v3.0.0 milestone Jan 5, 2021

FWDekker self-assigned this Jan 5, 2021

This was referenced Jan 5, 2021

UDS dialog #366

Closed

Add new data types #305

Closed

This was referenced Aug 3, 2021

Basic unified data syntax implementation #388

Merged

Template dialog #391

Merged

FWDekker mentioned this issue Aug 27, 2021

Custom data types, UDS, and templates #400

Merged

FWDekker closed this as completed in #400 Aug 27, 2021
