Help with Suffix Arrays

Блог пользователя megalodon

Автор megalodon, 11 лет назад, По-английски

Hello. Can anybody please provide me a clean tutorial of how to implement a Suffix Array in Java? Thanks

suffix array, strings, sortings

-9

megalodon
11 лет назад
13

Комментарии (13)

Написать комментарий?

phantom11

11 лет назад, # |

May be this helps.. It has a clean implementation along with lots of practice problems

→ Ответить

megalodon

11 лет назад, # ^ |

thanks, but the suffix array presented is too slow. You need something at least O(n logn) for a contest.

→ Ответить

dalex

11 лет назад, # |

+20

https://sites.google.com/site/indy256/algo/suffix_array — just an implementation.

Also, I don't think that it's time to learn suffix array with your green colour...

→ Ответить

Burunduk1

11 лет назад, # ^ |

that it's time to learn suffix array with your green colour...

Maybe, it's true. But...

For example, not all my students in university (they had AC on "Suffix Array task" as part of practice) where good in olympiads. Even clever students. Some of them even never participated :-)

Or university is not good place/time to learn Suffix Array? Or not "true olympiad people" should not know about it? :)

→ Ответить

dalex

11 лет назад, # ^ |

I'm sure that author wants to learn it for olympiads. But if so, there are many easier and more helpful algorithms.

Your university is good place to learn suffix array. I guess most of my groupmates will be in the army now if there's a similar course in my university. And I have a feeling that most of the universities are closer to my than to yours :D

→ Ответить

Zlobober

10 лет назад, # ^ |

Re-read your comments. Sounds arrogant, isn't it?

→ Ответить

AlexanderBolshakov

10 лет назад, # ^ |

It doesn't matter how it sounds, because the truth looks exactly as dalex said 1.5 years ago.

Just go to a random university and try to explain suffix array to a random CS student. You'll be heavily surprised that this student very likely won't understand you. After this you will possibly start to appreciate the fact that you study in MSU and are surrounded with lots of smart people.

And yes, I really know what I say.

→ Ответить

megalodon

11 лет назад, # ^ |

← Rev. 2 →

It's always time to learn. Thanks for the implementation anyway it's good enough.

→ Ответить

Dixtosa

11 лет назад, # ^ |

Incorrect attitude bro, studying classical algorithms also teaches me quite productively how to think.

→ Ответить

halfo

11 лет назад, # |

Well,, I think this two (1 and 2) links provide a good number of resources on suffix array. You can also take help from Stack Overflow too.

Hope these help you. And sorry for my poor English. Thanks :)

→ Ответить

allthecode

10 лет назад, # |

← Rev. 3 →

Naive implementation is many times just fine for a contest.

→ Ответить

ltaravilse

10 лет назад, # |

I think that the question asked on a tutorial of how to implement it and not of an implementation already coded. So here's my brief explanation on how you can implement it by yourself (you always learn more by implementing an algorithm your self than by just using it).

You'll group all the suffixes in buckets, and you'll do it by comparing their prefixes of sizes 2^i for all i until 2^i >= length of the string. Let's say you have the string "ABRACADABRA", then you'll have all this suffixes: "ABRACADABRA", "BRACADABRA", "RACADABRA", "ACADABRA", "CADABRA", "ADABRA", "DABRA", "ABRA", "BRA", "RA" and "A".

First of all you'll group them in buckets for the first letter, so the buckets will be:

["ABRACADABRA", "ACADABRA", "ADABRA", "ABRA", "A"]

["BRACADABRA", "BRA"]

["CADABRA"]

["DABRA"]

["RACADABRA", "RA"]

Note that in each bucket the strings aren't sorted but two strings are properly sorted between each other if they belong to different buckets. This comparison is O(n log n) as you only sort comparing by one character per suffix (The first character of the suffix). So now you have the strings sorted by the first character, and you want to sort them by the first two characters. What you'll do now is sorting the suffixes that belong to the same bucket between each other, but not comparing with between suffixes of different buckets, as you already know how they compare each other.

Let's say we want to sort the first bucket ["ABRACADABRA", "ACADABRA", "ADABRA", "ABRA", "A"]. For sorting it we know that the first letter is the same in all the suffixes, so we compare the second letter, but how we compare it? We don't compare characters but suffixes, for example, if we want to compare "ABRACADABRA" with "ACADABRA", we know how "BRACADABRA" and "CADABRA" compare each other by the first letter of the suffix, so we look for this comparison. Now we have the following buckets for all this strings:

["A"] (note that because the "second letter" of this string is the end of string we consider it to come before any other letter).

["ABRACADABRA", "ABRA"]

["ACADBRA"]

["ADABRA"]

This is how we do the second step. Now look that the second bucket (["BRACADABRA","BRA"]) remained the same, because "BR" == "BR", so let's see how we sort it in the third step. Now we want to compare the suffixes by their prefixes of length 4 (or less if the suffix has less than 4 characters), so "BRA" < "BRAC" but we need to do two comparisons, as we know they share the first two characters, but we have to compare "A" == "A" and End of String < "C", but let's see how we do it in one step. We know how "A" and "ACADABRA" compare to each other according to the prefix of length at most 2, that is "A" < "ACADABRA", so we just look for this comparison once, and we know then that "BRA" < "BRACADABRA".

Now let's use a bigger example to make it more clear. Let's say we have the strings "ABCDEFGHIABCDEFGIJK" and "ABCDEFGIJK" in the same bucket and we've already compared the first 4 letters of the suffixes. Now we want to compare "ABCDEFGI" with "ABCDEFGH" (the first 8 letters of the suffixes). We know that "EFGHIABCDEFGIJK" < "EFGIJK" according to the first 4 letters of the suffixes, so we just make look for this comparison and we compare 4 letters in only one operation.

I hope it's clear enough for you to being able to implement it.

→ Ответить