
Process XOR/ROL optimisations #28

Open · wants to merge 4 commits into master
Conversation

@arekbulski (Member):

Effectuates:
kaitai-io/kaitai_struct#390
kaitai-io/kaitai_struct#397

I ran the test suite, but please confirm it. Also, I leave running the benchmarks to you, because I am not able to run those.

@arekbulski force-pushed the patch-seek branch 2 times, most recently from a9d9d9a to c6d5696 on April 1, 2018, 18:25
kaitaistruct.py (outdated):

```python
        if PY2:
            return bytes(bytearray(v ^ key for v in bytearray(data)))
        else:
            return bytes(v ^ key for v in data)

    @staticmethod
    def process_xor_many(data, key):
        if key == b'\x00' * len(key):
```
@KOLANICH (Contributor), Apr 1, 2018:

Overhead on every call. I think it is unneeded: comparing every byte of one array to each byte of another array (where each byte is zero) has the same complexity as XORing the two arrays.

@arekbulski (Member, Author):

If the key is a few bytes and the data is millions of bytes, then this check has negligible cost. However... you are right. It might be better to use something like `not any(key)`, which would not read the entire key but only up to the first non-zero byte.
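The `not any(key)` idea can be sketched as follows (a Python 3 sketch; on Python 2 the key would need wrapping in `bytearray` first, since iterating a `str` yields truthy one-character strings; the function name is illustrative):

```python
# Sketch of the suggested short-circuit zero-key test. Unlike comparing
# against b'\x00' * len(key), any() stops at the first non-zero byte
# and allocates no temporary key-sized buffer.
def is_zero_key(key):
    # bytes iterate as integers on Python 3, and any() short-circuits
    return not any(key)

print(is_zero_key(b"\x00\x00\x00"))  # True
print(is_zero_key(b"\x00\x01\x00"))  # False: stops at the second byte
```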

@KOLANICH (Contributor), Apr 1, 2018:

You are also right. So maybe check the two sizes and decide what to do based on them?

```python
if len(key) < len(data) // someCoefficientDependingOnTheProbabiltyToMeetZeroAndDistributionOfKeysLengths
```

?

kaitaistruct.py (outdated):

```python
        if PY2:
            return bytes(bytearray(a ^ b for a, b in zip(bytearray(data), itertools.cycle(bytearray(key)))))
        else:
            return bytes(a ^ b for a, b in zip(data, itertools.cycle(key)))

    precomputed_rotations = {(amount, 1): [(i << amount) & 0xff | (i >> (-amount & 7)) for i in range(256)] for amount in range(9)}
```
@KOLANICH (Contributor):

Not sure that a memory lookup (especially in an interpreted, fully object-oriented language, where I guess the CPU cache helps little) is faster than a bit shift. Could you provide some benchmark results?

@arekbulski (Member, Author):

I am unable to run a benchmark.

@arekbulski (Member, Author):

I also need to withdraw one of the commits.

@KOLANICH (Contributor), Apr 1, 2018:

> I am unable to run a benchmark.

There is no need to run the Kaitai benchmark; you can just run one of your own.

@arekbulski (Member, Author):

Benchmarks as requested:

```python
>>> from timeit import *
>>> timeit('[d[i] for i in range(256)]', 'd=bytearray(range(256))')
12.807483946000048
>>> timeit('[d[i] for i in range(256)]', 'd=bytearray(range(256))')
13.325101311000026
>>> timeit('[d[i] for i in range(256)]', 'd=list(range(256))')
12.117929144000072
>>> timeit('[d[i] for i in range(256)]', 'd=list(range(256))')
12.115412523000032
>>> timeit('[d[i] for i in range(256)]', 'd={i:i for i in range(256)}')
14.979522146999898
>>> timeit('[d[i] for i in range(256)]', 'd={i:i for i in range(256)}')
14.68503412799987
>>> timeit('[(i<<amount)&0xff|(i>>(-amount&7)) for i in range(256)]', 'amount=1')
64.24798052300002
>>> timeit('[(i<<amount)&0xff|(i>>(-amount&7)) for i in range(256)]', 'amount=1')
64.24846534499989
```
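For what it's worth, the fastest way to apply such a 256-entry table in CPython is usually `bytes.translate`, which performs the per-byte lookup in C rather than in a Python loop. A minimal sketch (the table expression mirrors the one in `precomputed_rotations`; the variable names are illustrative):

```python
# Build an 8-bit rotate-left table for a fixed amount once, then apply
# it to a whole buffer in a single C-level call via bytes.translate.
amount = 3
table = bytes((i << amount) & 0xFF | (i >> (-amount & 7)) for i in range(256))

data = bytes([0b00000001, 0b10000000])
rotated = data.translate(table)
print(list(rotated))  # [8, 4]
```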

@arekbulski (Member, Author):

All suggestions from @KOLANICH included and resolved. Ready to merge.

@KOLANICH (Contributor), Apr 2, 2018:

> 64.24798052300002

It's a shame that Python doesn't have `>>>` and `<<<` operators, since both rotations have been implemented in hardware (more concretely: in both x86 and ARM) since ancient times, and we have to use these hacks. Could you create an issue in their bug tracker about it?
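The "hacks" in question are the usual shift-and-mask stand-ins for the hardware ROL/ROR instructions; a sketch for 8-bit values:

```python
# Pure-Python emulation of 8-bit rotate instructions. The (-n & 7)
# trick makes the complementary shift come out as 0 when n == 0,
# so the expression stays well-defined for every rotation amount.
def rol8(v, n):
    n &= 7
    return ((v << n) & 0xFF) | (v >> (-n & 7))

def ror8(v, n):
    n &= 7
    return (v >> n) | ((v << (-n & 7)) & 0xFF)

print(hex(rol8(0x81, 1)))  # 0x3
print(hex(ror8(0x03, 1)))  # 0x81
```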

@arekbulski (Member, Author):

I would reach out to the python-ideas mailing list. The people there have authoritative knowledge about such things, and they may say that adding those operators will not happen or was already rejected. I will ask them.

OTOH, that would not affect us. Since the runtime needs to support older Python runtimes (including 2.7), even if they added `>>>` and `<<<` we could not use them.

@KOLANICH (Contributor), Apr 2, 2018:

> Since the runtime needs to support older Python runtimes (including 2.7), even if they added `>>>` and `<<<` we could not use them.

I wonder if it is possible to extend Python syntax with a native module, or at least add some built-in functions without much overhead.

@arekbulski (Member, Author):

I added some helper code that allows me to avoid maintaining two code paths.

@arekbulski (Member, Author):

Finished implementing the rotation, including larger groups.
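A hypothetical sketch of what per-group rotation might look like (Python 3; big-endian groups and a data length divisible by `group_size` are assumed, and the function name is illustrative, not necessarily the runtime's actual API):

```python
def rotate_left_groups(data, amount, group_size):
    # Rotate bits left within each fixed-size group of bytes;
    # group_size == 1 degenerates to the classic per-byte ROL.
    bits = 8 * group_size
    amount %= bits
    mask = (1 << bits) - 1
    out = bytearray()
    for i in range(0, len(data), group_size):
        group = int.from_bytes(data[i:i + group_size], "big")
        rotated = ((group << amount) & mask) | (group >> ((bits - amount) % bits))
        out += rotated.to_bytes(group_size, "big")
    return bytes(out)

print(rotate_left_groups(b"\x80\x00", 1, 2).hex())  # '0001'
```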

```python
return bytes(a ^ b for a, b in zip(data, itertools.cycle(key)))
if len(key) == 1:
    return KaitaiStream.process_xor_one(data, ord(key))
if len(key) <= 64 and key == b'\x00' * len(key):
```
@KOLANICH (Contributor), Apr 4, 2018:

Why 64? Why not, for example, a quarter of the message length?

@arekbulski (Member, Author):

It's to make it O(1) time, in case both key and data are megabyte-sized. If so, it skips the check and just does the XORing.
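Put together, the capped fast path under discussion looks roughly like this (an illustrative Python 3 sketch, not the runtime's exact code):

```python
import itertools

def process_xor_many(data, key):
    # The zero-key scan is capped at 64 bytes, so its cost is O(1)
    # with respect to the data; XOR with an all-zero key is a no-op,
    # so the whole loop over the data can be skipped.
    if len(key) <= 64 and not any(key):
        return data
    return bytes(a ^ b for a, b in zip(data, itertools.cycle(key)))

print(process_xor_many(b"abc", b"\x00\x00"))       # b'abc' (fast path)
print(process_xor_many(b"\x01\x02\x03", b"\x01"))  # b'\x00\x03\x02'
```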

@KOLANICH (Contributor), Apr 5, 2018:

> It's to make it O(1) time, in case both key and data are megabyte-sized. If so, it skips the check and just does the XORing.

Again, IMHO it is a waste of resources. I assume XORing with a zero string to be improbable, so in most cases it is a waste of resources.

Let the probability of a zero key be p, let N be the bit length of the data, and let K be the bit length of the key.

The complexity in the zero case is O(K) (in fact 2*K, because we create an array and then compare against it); otherwise it is O(K + N).

So the average complexity of the zero-checking case (where we check the key for zero) is

    c = K + (1 - p) * N

And the average complexity of the non-checking case is always N.

So we should enable the check when

    K + (1 - p) * N < N
    K < p * N

Now let's model p. If each bit of the key is randomly chosen from {0, 1}, then

    p = 2^-K
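Plugging the uniform-key model back into the break-even rule gives a feel for the numbers: with p = 2^-K, the check pays off only when K < p*N, i.e. when N exceeds K * 2^K bits. A quick evaluation of that inequality (illustrative only, and only valid under the uniform-key assumption):

```python
# Break-even data size under the uniform-key model: the zero-check is
# worthwhile only when N (data bits) exceeds K * 2**K, where K is the
# key length in bits. The threshold explodes as the key grows.
for key_bytes in (1, 2, 4):
    K = 8 * key_bytes
    break_even_bytes = K * 2**K // 8
    print(key_bytes, "byte key ->", break_even_bytes, "bytes of data")
```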

@arekbulski (Member, Author):

Please read the reply below and withdraw this review.

@arekbulski (Member, Author), Apr 5, 2018:

@KOLANICH Please stop right there. What you are doing is a pointless mathematical exercise that does not prove anything. You are making huge implicit assumptions that aren't justified, and overall doing it wrong.

For one, you are assuming that keys follow a uniform distribution, which I think is wrong. You also seem to be assuming that key lengths are as likely to be long as they are to be short. That's also probably wrong. In my personal experience, programmers tend to use very short keys, a few bytes at most. It seems unlikely to find kilobyte-sized keys in real formats, just due to how software developers think and work. It's also somewhat likely that all-zero keys can be found in real usage. Just because the format includes the possibility of XORing some data doesn't mean that it is actually and actively being used. Programmers often tend to use zero keys when they don't have a particular need to use a different one. A safe default, so to speak. Note that I am not presenting these conjectures to justify my code, but rather to disprove your argument. Your argument resembles this kind of thinking: we have some real-world numbers, and digits are taken from a uniform distribution, then a few pages of equations... until someone points out that digits in real statistical data follow Benford's law.

For two, you are applying asymptotic analysis to code that is specifically limited to the smallest inputs. That is just plain wrong. Asymptotic equality is BY DEFINITION not applicable to the small(est) inputs. Not only that, but you are using asymptotic notation that ignores constants, meaning it doesn't differentiate between N and 2N. Sedgewick, and I am sure you are familiar with this guy's work, points out in literally EVERY LECTURE that he gave that using tilde notation would be more appropriate for that reason. That is, assuming it would be justified to use asymptotic notation in the first place.

The justification for the checks you are talking about is the following, and it DOES NOT depend on the conjectures I mentioned earlier:

  • The check is limited to 64 bytes, so in absolute terms it is cheap regardless of the key.
  • If the data is short, then the entire method takes little time, so it doesn't really matter. For example, if the data length is 1 and the key length is 64, then the checks take more time than the main loop, but it doesn't matter.
  • If the data is much longer, say megabytes, then the check is negligible in comparison to the main loop, and the potential gain is huge.

@KOLANICH (Contributor), Apr 5, 2018:

> For one, you are assuming that keys follow a uniform distribution, which I think is wrong.

Of course it's wrong, since XOR in file formats is usually not used for crypto, but we can plug in any distribution we like by introducing an attribute giving the probability of a zero string. We may even specify simple distributions as expressions in the KS language (of course it will require more math functions).

> You also seem to be assuming that key lengths are as likely to be long as they are to be short.

I have no assumption on key length in the derivation, except that it is a nonnegative number. The derivation should work for any length.

> For two, you are applying asymptotic analysis to code that is specifically limited to the smallest inputs. That is just plain wrong. Asymptotic equality is BY DEFINITION not applicable to the small(est) inputs. Not only that, but you are using asymptotic notation that ignores constants, meaning it doesn't differentiate between N and 2N.

I can use any function under O(); remember, it's just a math notation meaning "the function in question is bounded by the function we write in the brackets after O". When we want to estimate the complexity of algorithms we don't distinguish between N and 2*N, because 2*N < N^2 for any N > 2. But we, as you mentioned, are dealing with the real world, where the difference between N, N+64 and 2*N is significant to us. And we want to choose the optimal algorithm for the situation: the one with the lesser complexity.

@arekbulski (Member, Author), Apr 5, 2018:

> And we want to choose the optimal algorithm for the situation: the one with the lesser complexity.

Half wrong. We want to choose the implementation that is realistically faster, not the one with the smaller theoretical asymptotic complexity.

I think you are completely missing the point, and you just keep producing more equations.

@arekbulski (Member, Author):

> I can use any function under O(); remember, it's just a math notation meaning "the function in question is bounded by the function we write in the brackets after O". When we want to estimate the complexity of algorithms we don't distinguish between N and 2*N, because 2*N < N^2 for any N > 2. But we, as you mentioned, are dealing with the real world, where the difference between N, N+64 and 2*N is significant to us. And we want to choose the optimal algorithm for the situation: the one with the lesser complexity.

Listen, I like occasional math too. In fact I find this slightly nostalgic, because I do have a CS diploma and had this theory in classes too. Aside from the fact that we probably reside in different countries, I would not mind doing some calculus just for fun, over some marshmallows at a campfire. But this does not make your arguments valid. What you said in the quote is true, but it is also irrelevant.

I implore you to instead provide a counterargument to mine, if there is one:

  • If the data is short, then the entire method takes little time, so it doesn't really matter. For example, if the data length is 1 and the key length is 64, then the checks take more time than the main loop, but it doesn't matter.
  • If the data is much longer, say megabytes, then the check is negligible in comparison to the main loop, and the potential gain is significant.

@arekbulski (Member, Author):

Just added a small check that groupsize is not negative.

@arekbulski (Member, Author):

I would prefer to remove the unused implementation before merging, but I left it so you can look at it.

@arekbulski (Member, Author):

Added optimisations from kaitai-io/kaitai_struct#411 .

@GreyCat (Member), Apr 6, 2018:

Please stop adding more and more stuff to a single PR. This greatly raises the chance that it won't be applied. And no, it's not an "optimization": it breaks the functionality of reading files which are being appended to during parsing, like live binary logs, packet captures, etc.

@arekbulski changed the title from "seek, process xor, and process rotate optimisations" to "Process XOR/ROL optimisations" on Apr 6, 2018
@arekbulski (Member, Author):

I apologise for putting so much stuff into a single PR. Please consider this PR frozen at this point, and related only to Process XOR/ROL.

I advise reading it one commit at a time, and in split view.
