A Never Seen Before Data Structure -- O(log(n)) Range Queries

TLDR: just take this blog of mine and sprinkle some canonical skew-binary numbers and boom, you have a weird version of the Fenwick tree that is asymptotically faster than the Fenwick tree on an obscure operation.

It also turns out that we can perform point updates without commutativity or inverse operations, which the plain Fenwick tree can't do by itself.

The idea

I explained in my previous blog that we can think of the Fenwick tree as a set of intervals, and that we can use those intervals to answer any range query in $$$O(\log(N)^{2})$$$. But there is no real reason to use exactly the same intervals that the Fenwick tree uses. We can try some other set of intervals to make things more efficient (at least in an asymptotic sense).

Specifically, we use intervals based on the jump pointers described on that one blog from the catalog. That blog claims that we can jump any distance from any node in $$$O(\log(N))$$$ by following a greedy algorithm (the same greedy I used to perform range queries on a Fenwick tree). It follows that we can use the same kind of thing to perform range queries in $$$O(\log(N))$$$ time.

To be clear, here it is in bullet points:

- We want to compute range sum queries on some array A
- The data structure stores tree[i] = sum of A(i-jmp[i], i] (yes, an open-closed interval)
- The jumps are defined recursively: jmp[1] = jmp[2] = 1, and then if jmp[i] == jmp[i-jmp[i]] then jmp[i+1] = 2*jmp[i]+1 and otherwise jmp[i+1] = 1
- The greedy algorithm to perform queries goes as follows:

int A[MAXN+1], jmp[MAXN+1], tree[MAXN+1];

int range(int i, int j) { // inclusive, 1-indexed
  int ans = 0;
  while (j-i+1 > 0) {
    if (j-i+1 >= jmp[j]) {
      ans += tree[j];
      j -= jmp[j];
    } else {
      ans += A[j];
      j -= 1;
    }
  }
  return ans;
}

The above algorithm performs any query in $$$O(\log(N))$$$ time. This is due to some properties of skew-binary numbers (trust me bro...). Sadly, I haven't figured out how to find the jumps on-line but we can just precompute them:

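void init() {
  jmp[1] = jmp[2] = 1;
  // jmp[0] stays 0 (global array), so the comparison fails when i == jmp[i]
  for (int i = 2; i < MAXN; ++i) { // stop at MAXN-1: the body writes jmp[i+1]
    if (jmp[i] == jmp[i-jmp[i]]) {
      jmp[i+1] = 2*jmp[i]+1;
    } else {
      jmp[i+1] = 1;
    }
  }
}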
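To make the recurrence concrete, here is a small sketch that builds the jumps for a given size and lists the piece lengths the greedy query would consume. The helper names `build_jumps` and `query_pieces` are made up for this illustration, and it uses `std::vector` instead of the global arrays. For n = 15 the jumps come out as 1, 1, 3, 1, 1, 3, 7, 1, 1, 3, 1, 1, 3, 7, 15.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: builds jmp[0..n] with the recurrence from the text.
// jmp[0] is left at 0 so that the comparison jmp[i] == jmp[i - jmp[i]]
// fails whenever i - jmp[i] == 0.
std::vector<int> build_jumps(int n) {
    assert(n >= 2);
    std::vector<int> jmp(n + 1, 0);
    jmp[1] = jmp[2] = 1;
    for (int i = 2; i < n; ++i) {
        jmp[i + 1] = (jmp[i] == jmp[i - jmp[i]]) ? 2 * jmp[i] + 1 : 1;
    }
    return jmp;
}

// Hypothetical helper: lengths of the pieces the greedy query consumes for
// the inclusive range [i, j], in the order they are taken (right to left).
std::vector<int> query_pieces(const std::vector<int>& jmp, int i, int j) {
    std::vector<int> pieces;
    while (j - i + 1 > 0) {
        // Take the whole stored interval if it fits, otherwise one element.
        int step = (j - i + 1 >= jmp[j]) ? jmp[j] : 1;
        pieces.push_back(step);
        j -= step;
    }
    return pieces;
}
```

For example, `query_pieces(build_jumps(15), 2, 14)` gives 7, 1, 3, 1, 1: the interval of length 7 ending at 14, then A[7], the interval of length 3 ending at 6, then A[3] and the single-element interval at 2.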

Some observations:

- An interval is either a single element or the concatenation of two intervals plus one more element
- The intervals are well-nested and they have power of two (minus one) length
- It follows that each position is covered by up to $$$\log(N)+O(1)$$$ intervals
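These observations can be sanity-checked by brute force. The sketch below (`max_cover_count` is a hypothetical name) rebuilds the jumps, checks the first two observations for every interval, and counts how many intervals cover each position using a difference array. It is only a check on concrete sizes, not a proof.

```cpp
#include <cassert>
#include <vector>

// Hypothetical checker for positions 1..n.
// Returns the maximum number of intervals (i - jmp[i], i] covering any
// position, or -1 if some length is not 2^k - 1 or some interval is not
// "two equal intervals plus one element".
int max_cover_count(int n) {
    assert(n >= 2);
    std::vector<int> jmp(n + 1, 0);
    jmp[1] = jmp[2] = 1;
    for (int i = 2; i < n; ++i)
        jmp[i + 1] = (jmp[i] == jmp[i - jmp[i]]) ? 2 * jmp[i] + 1 : 1;

    std::vector<int> diff(n + 2, 0);  // difference array of cover counts
    for (int i = 1; i <= n; ++i) {
        int len = jmp[i];
        if ((len & (len + 1)) != 0) return -1;  // len + 1 must be a power of two
        if (len > 1) {
            int half = (len - 1) / 2;
            // The two halves end at i-1 and at i-1-half.
            if (jmp[i - 1] != half || jmp[i - 1 - half] != half) return -1;
        }
        diff[i - len + 1] += 1;  // interval covers positions i-len+1 .. i
        diff[i + 1] -= 1;
    }
    int best = 0, cur = 0;
    for (int p = 1; p <= n; ++p) {
        cur += diff[p];
        if (cur > best) best = cur;
    }
    return best;
}
```

For n = 15 it returns 4 and for n = 1023 it returns 10, which is $$$\log_2(n+1)$$$ for these sizes.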

Taking advantage of these facts, we will do updates by recomputing all the intervals that cover the updated position. In particular, we will first recompute the interval that ends at the updated position, then the smallest interval that strictly covers it and so on.

The implementation is straightforward: just update the array then walk the sequence of increasingly larger intervals and recompute them.

void update(int i, int x) { // assign A[i] = x
  A[i] = x;
  if (jmp[i] == 1) tree[i] = x, i = cover(i);
  for (; i <= MAXN; i = cover(i)) {
    tree[i] = tree[i-1 - jmp[i-1]] + tree[i-1] + A[i];
  }
}

It is possible to find the next covering interval on-line (thanks bicsi!). Remember that we construct an interval of length $$$2k+1$$$ exactly when there are two adjacent intervals of length $$$k$$$.

This tells us that if the interval that ends at position i-jmp[i] has the same length as the one that ends at position i (namely jmp[i] == jmp[i-jmp[i]]), then these two will be covered by an interval of length 2*jmp[i]+1 that ends at position i+1.

In the opposite case we must have jmp[i] == jmp[i+jmp[i]] (imagine the jmp array is infinite), so the interval ending at i is the left one of the pair and it will be covered by an interval that ends at position i+jmp[i]+1.

int cover(int i) {
  return jmp[i] == jmp[i-jmp[i]] ? i + 1 : i + jmp[i] + 1;
}
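As an illustration, this sketch lists the chain of intervals that update would visit for a given position. `cover_chain` is a hypothetical helper; it repeats the same rule as cover on a locally built jmp array of size n.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: end positions of all intervals that cover position i
// among positions 1..n, from the smallest interval to the largest.
std::vector<int> cover_chain(int n, int i) {
    assert(n >= 2 && 1 <= i && i <= n);
    std::vector<int> jmp(n + 1, 0);
    jmp[1] = jmp[2] = 1;
    for (int k = 2; k < n; ++k)
        jmp[k + 1] = (jmp[k] == jmp[k - jmp[k]]) ? 2 * jmp[k] + 1 : 1;

    std::vector<int> chain;
    // Position i is the right end of exactly one interval; start there.
    for (int p = i; p <= n; ) {
        chain.push_back(p);
        // Same rule as cover(): if the interval ending at p is the right one
        // of a pair, the parent ends at p + 1; otherwise it is the left one
        // and the parent ends at p + jmp[p] + 1.
        p = (jmp[p] == jmp[p - jmp[p]]) ? p + 1 : p + jmp[p] + 1;
    }
    return chain;
}
```

With n = 15, `cover_chain(15, 4)` gives 4, 6, 7, 15, that is, the intervals (3,4], (3,6], (0,7] and (0,15].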

Conclusion

We have invented a Fenwick tree-flavored data structure that does $$$O(\log(N))$$$ range queries and updates, and works with non-commutative operations that don't have an inverse. I had never heard of this particular data structure so I will claim this is a completely new, never seen before data structure.

~~But surprise: it's dog slow in practice. Don't use it to perform range queries.~~ But maybe the idea could be useful in some ad-hoc problem or whatever.

UPD: After some changes I measured this to be about as fast as a recursive segment tree. (I will admit I was very sloppy, my measurement was just submitting a problem on CSES: https://cses.fi/problemset/task/1648/)

Thanks to ponysalvaje for explaining to me how the Fenwick tree works.

Thanks to bicsi for the idea of how to implement non-additive updates and finding the covering intervals on-line.

Full code

Here is the full compacted implementation:

int const MAXN = 200000;
int A[MAXN+1]; long long tree[MAXN+1]; int jmp[MAXN+1];
 
long long range(int i, int j) { // range query, inclusive, 1-indexed -- O(log(N))
	long long ans = 0;
	while (j-i+1 > 0) {
		if (j-i+1 >= jmp[j]) ans += tree[j], j -= jmp[j];
		else                 ans += A[j],    j -= 1;
	}
	return ans;
}
 
int cover(int i) {
	return jmp[i] == jmp[i-jmp[i]] ? i + 1 : i + jmp[i] + 1;
}
 
void update(int i, int x) { // assign A[i] = x, 1-indexed -- O(log(N))
	A[i] = x;
	if (jmp[i] == 1) tree[i] = x, i = cover(i);
	for (; i <= MAXN; i = cover(i)) {
		tree[i] = tree[i-1 - jmp[i-1]] + tree[i-1] + A[i];
	}
}
 
void init() { // init jmp[] -- O(N)
	jmp[1] = jmp[2] = 1;
	for (int i = 2; i < MAXN; ++i) { // stop at MAXN-1: the body writes jmp[i+1]
		jmp[i+1] = jmp[i] == jmp[i-jmp[i]] ? 2*jmp[i]+1 : 1;
	}
}
 
void updateall() { // init tree[] using A[] -- O(N)
	for (int i = 1; i <= MAXN; ++i) {
		tree[i] = jmp[i] == 1 ? A[i] : tree[i-1 - jmp[i-1]] + tree[i-1] + A[i];
	}
}
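The same code should carry over to any associative operation, as long as the pieces are combined in left-to-right order. Below is a sketch over strings with concatenation, which is neither commutative nor invertible. The struct name `SkewConcat` and its interface are assumptions made for this example. Note that range has to prepend each piece, because the greedy walks from right to left.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Sketch: the same structure over std::string with concatenation as the
// operation. SkewConcat and its members are hypothetical names.
struct SkewConcat {
    int n;
    std::vector<int> jmp;
    std::vector<std::string> a, tree;

    // init is 0-indexed; internally everything is 1-indexed.
    explicit SkewConcat(const std::vector<std::string>& init)
        : n((int)init.size()), jmp(n + 2, 0), a(n + 1), tree(n + 1) {
        assert(n >= 1);
        jmp[1] = jmp[2] = 1;  // jmp has n + 2 slots, so this is safe for n == 1
        for (int i = 2; i < n; ++i)
            jmp[i + 1] = (jmp[i] == jmp[i - jmp[i]]) ? 2 * jmp[i] + 1 : 1;
        for (int i = 1; i <= n; ++i) {
            a[i] = init[i - 1];
            tree[i] = (jmp[i] == 1)
                ? a[i]
                : tree[i - 1 - jmp[i - 1]] + tree[i - 1] + a[i];
        }
    }

    // Concatenation of a[i..j], inclusive, 1-indexed.
    std::string range(int i, int j) const {
        std::string ans;
        while (j - i + 1 > 0) {
            // Pieces arrive right to left, so each one goes in front.
            if (j - i + 1 >= jmp[j]) { ans = tree[j] + ans; j -= jmp[j]; }
            else                     { ans = a[j] + ans;    j -= 1; }
        }
        return ans;
    }

    int cover(int i) const {
        return jmp[i] == jmp[i - jmp[i]] ? i + 1 : i + jmp[i] + 1;
    }

    // Assign a[i] = x and recompute every interval that covers i.
    void update(int i, const std::string& x) {
        a[i] = x;
        if (jmp[i] == 1) { tree[i] = x; i = cover(i); }
        for (; i <= n; i = cover(i))
            tree[i] = tree[i - 1 - jmp[i - 1]] + tree[i - 1] + a[i];
    }
};
```

For instance, building it from the ten one-letter strings "a" through "j" and calling `update(4, "XY")` makes `range(2, 6)` return "bcXYef". No subtraction or reordering is needed anywhere, which is why an inverse or commutativity is not required.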

Rev.	By	When	Δ	Comment
en10	estoy-re-sebado	2023-11-22 23:15:33	3715	Make the data structure use O(N) memory, add section on better point updates.
en9	estoy-re-sebado	2023-11-22 20:54:36	32
en8	estoy-re-sebado	2023-11-22 18:07:39	88
en7	estoy-re-sebado	2023-11-22 06:01:59	209
en6	estoy-re-sebado	2023-11-22 05:31:38	13	Tiny change: 'mp[MAXN+1];\nlong long tree[MAXN' -> 'mp[MAXN+1], tree[MAXN'
en5	estoy-re-sebado	2023-11-22 05:30:43	31
en4	estoy-re-sebado	2023-11-22 05:23:45	170
en3	estoy-re-sebado	2023-11-22 05:21:40	97
en2	estoy-re-sebado	2023-11-22 05:20:28	3085	Tiny change: 'onstructed, but this ' -> 'onstructed but this ' (published)
en1	estoy-re-sebado	2023-10-16 19:19:12	1161	Initial revision (saved to drafts)
