
Process XOR/ROL optimisations #28

Open · wants to merge 4 commits into master
Conversation

@arekbulski (Member):

Effectuates:
kaitai-io/kaitai_struct#390
kaitai-io/kaitai_struct#397

I ran the test suite, but please confirm it. Also, I leave running the benchmarks to you, because I am not able to run those.

@arekbulski force-pushed the patch-seek branch 2 times, most recently from a9d9d9a to c6d5696 on April 1, 2018, 18:25
kaitaistruct.py (outdated):

```python
        if PY2:
            return bytes(bytearray(v ^ key for v in bytearray(data)))
        else:
            return bytes(v ^ key for v in data)

    @staticmethod
    def process_xor_many(data, key):
        if key == b'\x00' * len(key):
```
@KOLANICH (Contributor), Apr 1, 2018:

Overhead on every call. I think it is unneeded: comparing every byte of one array to each byte of another array (where each byte is zero) has the same complexity as XORing the two arrays.

@arekbulski (Member, Author):

If the key is a few bytes and the data is millions of bytes, then this check has negligible cost. However... you are right. It might be better to use something like `not any(key)`, which would not read the entire key but only up to the first non-zero byte.
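The `not any(key)` idea can be sketched as follows (a Python 3 sketch; on Python 2 the key would need wrapping in `bytearray` first, since iterating a `str` yields truthy one-character strings; the function name is illustrative):

```python
# Sketch of the suggested short-circuit zero-key test. Unlike comparing
# against b'\x00' * len(key), any() stops at the first non-zero byte
# and allocates no temporary key-sized buffer.
def is_zero_key(key):
    # bytes iterate as integers on Python 3, and any() short-circuits
    return not any(key)

print(is_zero_key(b"\x00\x00\x00"))  # True
print(is_zero_key(b"\x00\x01\x00"))  # False: stops at the second byte
```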

@KOLANICH (Contributor), Apr 1, 2018:

You are also right. So maybe check the two sizes and decide what to do based on them?

```python
if len(key) < len(data) // someCoefficientDependingOnTheProbabiltyToMeetZeroAndDistributionOfKeysLengths
```

?

kaitaistruct.py (outdated):

```python
        if PY2:
            return bytes(bytearray(a ^ b for a, b in zip(bytearray(data), itertools.cycle(bytearray(key)))))
        else:
            return bytes(a ^ b for a, b in zip(data, itertools.cycle(key)))

    precomputed_rotations = {(amount, 1): [(i << amount) & 0xff | (i >> (-amount & 7)) for i in range(256)] for amount in range(9)}
```
@KOLANICH (Contributor):

Not sure that a memory lookup (especially in an interpreted, fully object-oriented language, where I guess the CPU cache helps little) is faster than a bit shift. Could you provide some benchmark results?

@arekbulski (Member, Author):

I am unable to run a benchmark.

@arekbulski (Member, Author):

I also need to withdraw one of the commits.

@KOLANICH (Contributor), Apr 1, 2018:

> I am unable to run a benchmark.

There is no need to run the Kaitai benchmark; you can just run one of your own.

@arekbulski (Member, Author):

Benchmarks as requested:

```python
>>> from timeit import *
>>> timeit('[d[i] for i in range(256)]', 'd=bytearray(range(256))')
12.807483946000048
>>> timeit('[d[i] for i in range(256)]', 'd=bytearray(range(256))')
13.325101311000026
>>> timeit('[d[i] for i in range(256)]', 'd=list(range(256))')
12.117929144000072
>>> timeit('[d[i] for i in range(256)]', 'd=list(range(256))')
12.115412523000032
>>> timeit('[d[i] for i in range(256)]', 'd={i:i for i in range(256)}')
14.979522146999898
>>> timeit('[d[i] for i in range(256)]', 'd={i:i for i in range(256)}')
14.68503412799987
>>> timeit('[(i<<amount)&0xff|(i>>(-amount&7)) for i in range(256)]', 'amount=1')
64.24798052300002
>>> timeit('[(i<<amount)&0xff|(i>>(-amount&7)) for i in range(256)]', 'amount=1')
64.24846534499989
```
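For what it's worth, the fastest way to apply such a 256-entry table in CPython is usually `bytes.translate`, which performs the per-byte lookup in C rather than in a Python loop. A minimal sketch (the table expression mirrors the one in `precomputed_rotations`; the variable names are illustrative):

```python
# Build an 8-bit rotate-left table for a fixed amount once, then apply
# it to a whole buffer in a single C-level call via bytes.translate.
amount = 3
table = bytes((i << amount) & 0xFF | (i >> (-amount & 7)) for i in range(256))

data = bytes([0b00000001, 0b10000000])
rotated = data.translate(table)
print(list(rotated))  # [8, 4]
```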

@arekbulski (Member, Author):

All suggestions from @KOLANICH included and resolved. Ready to merge.

@KOLANICH (Contributor), Apr 2, 2018:

> 64.24798052300002

It's a shame that Python doesn't have `>>>` and `<<<` operators, since both rotations have been implemented in hardware (more concretely: in both x86 and ARM) since ancient times, and we have to use these hacks. Could you create an issue in their bug tracker about it?
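The "hacks" in question are the usual shift-and-mask stand-ins for the hardware ROL/ROR instructions; a sketch for 8-bit values:

```python
# Pure-Python emulation of 8-bit rotate instructions. The (-n & 7)
# trick makes the complementary shift come out as 0 when n == 0,
# so the expression stays well-defined for every rotation amount.
def rol8(v, n):
    n &= 7
    return ((v << n) & 0xFF) | (v >> (-n & 7))

def ror8(v, n):
    n &= 7
    return (v >> n) | ((v << (-n & 7)) & 0xFF)

print(hex(rol8(0x81, 1)))  # 0x3
print(hex(ror8(0x03, 1)))  # 0x81
```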

@arekbulski (Member, Author):

I would reach out to the python-ideas mailing list. The people there have authoritative knowledge about such things, and they may say that adding those operators will not happen or was already rejected. I will ask them.

OTOH, that would not affect us. Since the runtime needs to support older Python runtimes (including 2.7), even if they added `>>>` and `<<<` we could not use them.

@KOLANICH (Contributor), Apr 2, 2018:

> Since the runtime needs to support older Python runtimes (including 2.7), even if they added `>>>` and `<<<` we could not use them.

I wonder if it is possible to extend Python syntax with a native module, or at least add some built-in functions without much overhead.

@arekbulski (Member, Author):

I added some helper code that allows me to avoid maintaining two code paths.

@arekbulski (Member, Author):

Finished implementing the rotation, including larger groups.
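A hypothetical sketch of what per-group rotation might look like (Python 3; big-endian groups and a data length divisible by `group_size` are assumed, and the function name is illustrative, not necessarily the runtime's actual API):

```python
def rotate_left_groups(data, amount, group_size):
    # Rotate bits left within each fixed-size group of bytes;
    # group_size == 1 degenerates to the classic per-byte ROL.
    bits = 8 * group_size
    amount %= bits
    mask = (1 << bits) - 1
    out = bytearray()
    for i in range(0, len(data), group_size):
        group = int.from_bytes(data[i:i + group_size], "big")
        rotated = ((group << amount) & mask) | (group >> ((bits - amount) % bits))
        out += rotated.to_bytes(group_size, "big")
    return bytes(out)

print(rotate_left_groups(b"\x80\x00", 1, 2).hex())  # '0001'
```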

```python
return bytes(a ^ b for a, b in zip(data, itertools.cycle(key)))
if len(key) == 1:
    return KaitaiStream.process_xor_one(data, ord(key))
if len(key) <= 64 and key == b'\x00' * len(key):
```
@KOLANICH (Contributor), Apr 4, 2018:

Why 64? Why not, for example, a quarter of the message length?

@arekbulski (Member, Author):

It's to make it O(1) time, in case both key and data are megabyte-sized. If so, it skips the check and just does the XORing.
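Put together, the capped fast path under discussion looks roughly like this (an illustrative Python 3 sketch, not the runtime's exact code):

```python
import itertools

def process_xor_many(data, key):
    # The zero-key scan is capped at 64 bytes, so its cost is O(1)
    # with respect to the data; XOR with an all-zero key is a no-op,
    # so the whole loop over the data can be skipped.
    if len(key) <= 64 and not any(key):
        return data
    return bytes(a ^ b for a, b in zip(data, itertools.cycle(key)))

print(process_xor_many(b"abc", b"\x00\x00"))       # b'abc' (fast path)
print(process_xor_many(b"\x01\x02\x03", b"\x01"))  # b'\x00\x03\x02'
```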

@KOLANICH (Contributor), Apr 5, 2018:

> It's to make it O(1) time, in case both key and data are megabyte-sized. If so, it skips the check and just does the XORing.

Again, IMHO it is a waste of resources. I assume XORing with a zero string to be improbable, so in most cases it is a waste of resources.

Let the probability of a zero key be p, let N be the bit length of the data, and let K be the bit length of the key.

The complexity in the zero case is O(K) (in fact 2*K, because we create an array and then compare against it); otherwise it is O(K + N).

So the average complexity of the zero-checking case (where we check the key for zero) is

    c = K + (1 - p) * N

And the average complexity of the non-checking case is always N.

So we should enable the check when

    K + (1 - p) * N < N
    K < p * N

Now let's model p. If each bit of the key is randomly chosen from {0, 1}, then

    p = 2^-K
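Plugging the uniform-key model back into the break-even rule gives a feel for the numbers: with p = 2^-K, the check pays off only when K < p*N, i.e. when N exceeds K * 2^K bits. A quick evaluation of that inequality (illustrative only, and only valid under the uniform-key assumption):

```python
# Break-even data size under the uniform-key model: the zero-check is
# worthwhile only when N (data bits) exceeds K * 2**K, where K is the
# key length in bits. The threshold explodes as the key grows.
for key_bytes in (1, 2, 4):
    K = 8 * key_bytes
    break_even_bytes = K * 2**K // 8
    print(key_bytes, "byte key ->", break_even_bytes, "bytes of data")
```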

@arekbulski (Member, Author):

Please read the reply below and withdraw this review.

@arekbulski (Member, Author), Apr 5, 2018:

@KOLANICH Please stop right there. What you are doing is a pointless mathematical exercise that does not prove anything. You are making huge implicit assumptions that aren't justified, and overall doing it wrong.

For one, you are assuming that keys follow a uniform distribution, which I think is wrong. You also seem to be assuming that key lengths are as likely to be long as they are to be short. That's also probably wrong. In my personal experience, programmers tend to use very short keys, a few bytes at most. It seems unlikely to find kilobyte-sized keys in real formats, just due to how software developers think and work. It's also somewhat likely that all-zero keys can be found in real usage. Just because the format includes the possibility of XORing some data doesn't mean that it is actually and actively being used. Programmers often tend to use zero keys when they don't have a particular need to use a different one. A safe default, so to speak. Note that I am not presenting these conjectures to justify my code, but rather to disprove your argument. Your argument resembles this kind of thinking: we have some real-world numbers, and digits are taken from a uniform distribution, then a few pages of equations... until someone points out that digits in real statistical data follow Benford's law.

For two, you are applying asymptotic analysis to code that is specifically limited to the smallest inputs. That is just plain wrong. Asymptotic equality is BY DEFINITION not applicable to the small(est) inputs. Not only that, but you are using asymptotic notation that ignores constants, meaning it doesn't differentiate between N and 2N. Sedgewick, and I am sure you are familiar with this guy's work, points out in literally EVERY LECTURE that he gave that using tilde notation would be more appropriate for that reason. That is, assuming it would be justified to use asymptotic notation in the first place.

The justification for the checks you are talking about is the following, and it DOES NOT depend on the conjectures I mentioned earlier:

  • The check is limited to 64 bytes, so in absolute terms it is cheap regardless of the key.
  • If the data is short, then the entire method takes little time, so it doesn't really matter. For example, if the data length is 1 and the key length is 64, then the checks take more time than the main loop, but it doesn't matter.
  • If the data is much longer, say megabytes, then the check is negligible in comparison to the main loop, and the potential gain is huge.

@KOLANICH (Contributor), Apr 5, 2018:

> For one, you are assuming that keys follow a uniform distribution, which I think is wrong.

Of course it's wrong, since XOR in file formats is usually not used for crypto, but we can plug in any distribution we like by introducing an attribute giving the probability of a zero string. We may even specify simple distributions as expressions in the KS language (of course it will require more math functions).

> You also seem to be assuming that key lengths are as likely to be long as they are to be short.

I have no assumption on key length in the derivation, except that it is a nonnegative number. The derivation should work for any length.

> For two, you are applying asymptotic analysis to code that is specifically limited to the smallest inputs. That is just plain wrong. Asymptotic equality is BY DEFINITION not applicable to the small(est) inputs. Not only that, but you are using asymptotic notation that ignores constants, meaning it doesn't differentiate between N and 2N.

I can use any function under O(); remember, it's just a math notation meaning "the function in question is bounded by the function we write in the brackets after O". When we want to estimate the complexity of algorithms we don't distinguish between N and 2*N, because 2*N < N^2 for any N > 2. But we, as you mentioned, are dealing with the real world, where the difference between N, N+64 and 2*N is significant to us. And we want to choose the optimal algorithm for the situation: the one with the lesser complexity.

@arekbulski (Member, Author), Apr 5, 2018:

> And we want to choose the optimal algorithm for the situation: the one with the lesser complexity.

Half wrong. We want to choose the implementation that is realistically faster, not the one with the smaller theoretical asymptotic complexity.

I think you are completely missing the point, and you just keep producing more equations.

@arekbulski (Member, Author):

> I can use any function under O(); remember, it's just a math notation meaning "the function in question is bounded by the function we write in the brackets after O". When we want to estimate the complexity of algorithms we don't distinguish between N and 2*N, because 2*N < N^2 for any N > 2. But we, as you mentioned, are dealing with the real world, where the difference between N, N+64 and 2*N is significant to us. And we want to choose the optimal algorithm for the situation: the one with the lesser complexity.

Listen, I like occasional math too. In fact I find this slightly nostalgic, because I do have a CS diploma and had this theory in classes too. Aside from the fact that we probably reside in different countries, I would not mind doing some calculus just for fun, over some marshmallows at a campfire. But this does not make your arguments valid. What you said in the quote is true, but it is also irrelevant.

I implore you to instead provide a counterargument to mine, if there is one:

  • If the data is short, then the entire method takes little time, so it doesn't really matter. For example, if the data length is 1 and the key length is 64, then the checks take more time than the main loop, but it doesn't matter.
  • If the data is much longer, say megabytes, then the check is negligible in comparison to the main loop, and the potential gain is significant.

@arekbulski (Member, Author):

Just added a small check that groupsize is not negative.

@arekbulski (Member, Author):

I would prefer to remove the unused implementation before merging, but I left it so you can look at it.

@arekbulski (Member, Author):

Added optimisations from kaitai-io/kaitai_struct#411 .

@GreyCat (Member), Apr 6, 2018:

Please stop adding more and more stuff to a single PR. This greatly raises the chance that it won't be applied. And no, it's not an "optimization": it breaks the functionality of reading files which are being appended to during parsing, like live binary logs, packet captures, etc.

@arekbulski changed the title from "seek, process xor, and process rotate optimisations" to "Process XOR/ROL optimisations" on Apr 6, 2018
@arekbulski (Member, Author):

I apologise for putting so much stuff into a single PR. Please consider this PR frozen at this point, and related only to Process XOR/ROL.

I advise reading it one commit at a time, and in split view.
