Rating changes for last rounds are temporarily rolled back. They will be returned soon. ×

whatthemomooofun1729's blog

By whatthemomooofun1729, history, 7 months ago, In English

Hi, I am trying to check if an array (consisting of the numbers from 1 to N, where N is the length of the array) is a permutation of N elements using hashing. My idea is to assign a randomized value as a hash for each value between 1 and N. The hash value of an array would be them sum of these individual hashes, and I compare this sum to the sum of the hashes for a permutation of length N.

I am wondering, what is the probability of collisions using this type of hashing? I couldn't find much online other than the probability for rolling hashes. Thank you!

»
7 months ago, # |
  Vote: I like it +11 Vote: I do not like it

Why not use xor-sum instead of normal sum? The probability is easier to bound this way. Assume that the hashes that you generate are h[1..n]. Suppose that your hashes retain $$$B$$$ bits of information and that they are randomly chosen from an uniform distribution: 0 <= h[i] < 2**B, P(h[i] = X | 0 <= X < 2**B) = 1 / 2**B.

Notice that this means that P(the jth bit of h[i] = 0) = P(the jth bit of h[i] = 1) = 1/2 for any 1 <= i <= n, 0 <= j < B.

Suppose that the array we are checking is v[1..n] and we know that 1 <= v[i] <= n.

The correct answer for a permutation would be h_ok = h[1] ^ h[2] ^ .. ^ h[n]. Suppose that v[1..n] is not a permutation and we want to calculate the probability that h[v[1]] ^ h[v[2]] ^ .. ^ h[v[n]] = h_ok.

By moving all terms on one side, we would be left with an equality like {0, 1} * h[1] ^ {0, 1} * h[2] ^ .. ^ {0, 1} * h[n] = 0. (as in h[1] either once or none ...)

For example, if v[1..7] = [3 4 5 3 3 2 7], we would have the equality h[3] ^ h[3] ^ h[1] ^ h[6] = 0 <=> h[1] ^ h[6] = 0.

We will prove that the number of terms that we have on the left hand side doesn't matter and the probability of equality is $$$1/2^B$$$.

Let's prove first that the probability of the LSB to be equal to $$$0$$$ is $$$1/2$$$ no matter how many terms we have on the LHS.

If we have $$$1 \leq k \leq n$$$ terms on the left hand side, then the result is $$$0$$$ if either $$$0, 2, 4, ... $$$ out of the $$$k$$$ bits are $$$1$$$:

$$$P(LHS = 0) = \frac{{k \choose 0} + {k \choose 2} + {k \choose 4} + ...}{2^k} = \frac{2^{k-1}}{2^k} = \frac{1}{2}$$$.

Now, since the hashes have been uniformly chosen in $$$[0 .. 2^B)$$$, the bits' values are independent, so the probability for the LHS to be $$$0$$$ taking in consideration all of the bits is $$$(1/2) ^ B = \frac{1}{2 ^ B}$$$.

So if you use $$$B = 64$$$, the collision probability is $$$< 10^{-19}$$$.

Monte-Carlo simulation to test the expected probability:
»
7 months ago, # |
  Vote: I like it 0 Vote: I do not like it

It's possible,but it's easy to fail in distinguishing different arrays.The different arrays which have same hash value appear frequently.

However, you can use different principles to caculate the hash value to make it safer.

I hope my comments would help you.