Java discussion forum to accompany the Javamex web site.

Great tutorial on Java encryption.  I keep coming back to it for reference.  Perhaps you could clarify something for me though regarding AES CTR block mode initalization vector.  I haven't been able to find anything on the web about Sun's format for the IV.  You mention that the lowest order bytes are the ones that toggle:

This IV would be compatible with Sun's implementation of CTR mode, where the counter is big endian (i.e. the highest byte of the IV actually represents the lowest-order part of the counter, and is the "part that incremenst first"). The advantage of this solution is that in many cases, you may be keeping a message counter anyway.


Isn't this little endian?  I think the example also needs to specify the endian for the ByteBuffer, or else it uses the native (which is typically little, on most PCs). 

ByteBuffer bb = ByteBuffer.allocate(16);

bb.putLong(0, messageNo);


Does Sun use a 8 byte counter (long)?


I've seen elsewhere on the web where a nonce is added to the upper 8 bytes instead of filling them with zeroes but no clear explanation of why.  Thoughts?



Views: 2461

Reply to This

Replies to This Discussion

OK, re the nonce first: the basic idea is that one way or another, you must not ever re-use the same counter value with the same encryption key. If you did that, then XORing the resulting encrypted blocks together would give you the two corresponding plaintexts XORed together, which could obviously leak significant information about the plaintext under some circumstances.


So, to arrange for this to happen when encrypting with a given key, you essentially have two solutions: either start your count at some random number (or in other words, start at zero but add a nonce: it amounts to the same thing), or else guarantee that you start it at 1 plus where you last left off last time you encrypted a conversation with that key.


[Of course, if you start with a random number, then there's a miniscule chance that at some point you will encrypt with the same counter value/key combination as some other block previously encrypted, but the point is that with a cryptographically strong random number generator the chance is in fact so miniscule that it would be impractical for an attacker to find which block corresponded to which.]


Re the endianness: I don't honestly remember off-hand, but I have a vague recollection that the endianness was "not what I expected", so it may well be big-endian as the article says. I will chcek again when I get a moment and correct if necessary. Either way, you're absolutely correct about the buffer: you should always specify the endianness of the buffer where it matters, and so this is definitely a mistake.

Ok I think I figured this out by writing some test code.  It looks like Sun uses a big endian counter in the last 8 bytes of the IV.  That leaves the first 8 bytes to be used as a nonce.  Test code is:


byte[] IV = new byte[16];        

ByteBuffer buf = ByteBuffer.wrap(IV);        

buf.order(ByteOrder.BIG_ENDIAN);  //Sun implements counter as big-endian        

buf.position(8);  //Counter is the last 8 bytes        



To test, I created an IV with a random nonce in the first 8 bytes, and a counter that increments by 1.  I encrypted the same 16-byte plaintext message 3 times for each IV, with the first two using the .update() method, followed by the .doFinal().  Thus, the algorithm would increment the counter for each 16-byte block.


In the output below, you can see that the ciphertext output with the same counter values match.


IV A= e68ae946374e7f670000000000000001       

CipherTextA1 = 0342d5bf22a0751d58f84dc1f45b37ea       <--counter = 1

CipherTextA2 = 2cab699cc0ee7df3c42ee2b08a0b5fbf        <--counter = 2

CipherTextA3 = 2d46c86ed955d753b28b597e93886054    <--counter = 3


IV B= e68ae946374e7f670000000000000002

CipherTextB1 = 2cab699cc0ee7df3c42ee2b08a0b5fbf         <--counter = 2

CipherTextB2 = 2d46c86ed955d753b28b597e93886054    <--counter = 3

CipherTextB3 = c539b28492c4c1b2347702be5fcff7c7         <--counter = 4


IV C= e68ae946374e7f670000000000000003

CipherTextC1 = 2d46c86ed955d753b28b597e93886054     <--counter = 3

CipherTextC2 = c539b28492c4c1b2347702be5fcff7c7          <--counter = 4

CipherTextC3 = ae19494cf58df69377d33e1398be7415         <--counter = 5





Many thanks for this clarification!

Neil Coffey said:

OK, re the nonce first: the basic idea is that one way or another, you must not ever re-use the same counter value with the same encryption key. If you did that, then XORing the resulting encrypted blocks together would give you the two corresponding plaintexts XORed together, which could obviously leak significant information about the plaintext under some circumstances.



I associated your hint at XORing as a tool for attacks with your suggestion of XORing as a default method to combine hash values in your tutorial on hashing, which I enjoyed reading. If two of the data fields are of the same type and are not unlikely to hold the same values, there may be multiple hash collisions.

Imagine, for example, a vector (in the mathematical sense). Points on a diagonal line through the origin have commmon coordinate values: (A,A,0), (B,B,0), (C,C,0). The same is true if the diagonal is at any other height (A,A,Z), (B,B,Z), (C,C,Z) or direction (A,Y,A), (B,Y,B), (C,Y,C). Any point on such lines would have the same hash value since hash(A)^hash(A)^hash(Z) = hash(Z).

Another example is RGB color codes:  If the hash is built by XORing component hash values, #3333XX, #6666XX, ... FFFFXX would collide.

Is it a good idea just to implement different permutations of bits for the component hashes (in my case, permutation is a fast FPGA operation) if I have no information on any structure of the data?

Thank you. 

Thanks for your point and sorry for the delay in responding. I think there are a few things worth mentioning here.


The first thing that is worth mentioning is that, while there is a connection, the calculation of hash codes for use in a hash table generally has a fundamental design difference to cryptographic use of encryption or hashes, namely that in a hash table, we are not generaly designing against an adversary trying to deliberately "break" our hash table, but rather for something like "randomly chosen keys from a typical possible set of keys".


If you do need to design against a deliberate attack on a hash table, then you may need to consider keying on a cryptographically strong hash and possibly going beyond a simple hash table. Cryptographic hashes are generally not used in ordinary hash tables for performance reasons, but they're hash codes at the end of the day and could be used as the key to a hash table. But note that hash tables are typically small enough that it is possible to use brute force to find, say, several thousand items that resolve to the same hash code, cryptographic hash code or not.


Beyond deliberate attacks, there may well be cases where you know something about your data that means you need to design your hash codes in a particular way. In your colour example, it depends a bit on if you think that the colours/vectors are going to be "genuinely random" or if the pattern of equal components that you mention are liable to occur in reality. If they are likely to have that specific pattern, or indeed various other "simple" relationships between components, then using different permutations (in effect, you could do something like run each component through one cycle of an XORShift generator and choose different shift values for each component-- I assume you mean something along those lines) could solve the problem, but increases the complexity of the hash code.


So for any weakness in the hash function, it ultimately depends on how likely you think that weakness is to be exposed, and how much complexity it is worth adding to the hash code to overcome that weakness.

My case is about performance (thank you, Neil, for your hint at XORShift generators, which have been invented after my graduation) and about safety (not security). It should just not happen due to a dull mistake in the construction of the hash algo that the rate of collisions depend on the likelihood of patterns in the data (it may become larger than expected by orders of magnitude).


Reply to Discussion


© 2017   Created by Neil Coffey.   Powered by

Report an Issue  |  Terms of Service