Блог пользователя megalodon

Автор megalodon, 11 лет назад, По-английски

Hello. Can anybody please provide me a clean tutorial of how to implement a Suffix Array in Java? Thanks

  • Проголосовать: нравится
  • -9
  • Проголосовать: не нравится

»
11 лет назад, # |
  Проголосовать: нравится 0 Проголосовать: не нравится

May be this helps.. It has a clean implementation along with lots of practice problems

  • »
    »
    11 лет назад, # ^ |
      Проголосовать: нравится +2 Проголосовать: не нравится

    thanks, but the suffix array presented is too slow. You need something at least O(n logn) for a contest.

»
11 лет назад, # |
  Проголосовать: нравится +20 Проголосовать: не нравится

https://sites.google.com/site/indy256/algo/suffix_array — just an implementation.

Also, I don't think that it's time to learn suffix array with your green colour...

  • »
    »
    11 лет назад, # ^ |
      Проголосовать: нравится +3 Проголосовать: не нравится

    that it's time to learn suffix array with your green colour...

    Maybe, it's true. But...

    For example, not all my students in university (they had AC on "Suffix Array task" as part of practice) where good in olympiads. Even clever students. Some of them even never participated :-)

    Or university is not good place/time to learn Suffix Array? Or not "true olympiad people" should not know about it? :)

    • »
      »
      »
      11 лет назад, # ^ |
        Проголосовать: нравится +8 Проголосовать: не нравится

      I'm sure that author wants to learn it for olympiads. But if so, there are many easier and more helpful algorithms.

      Your university is good place to learn suffix array. I guess most of my groupmates will be in the army now if there's a similar course in my university. And I have a feeling that most of the universities are closer to my than to yours :D

      • »
        »
        »
        »
        10 лет назад, # ^ |
          Проголосовать: нравится 0 Проголосовать: не нравится

        Re-read your comments. Sounds arrogant, isn't it?

        • »
          »
          »
          »
          »
          10 лет назад, # ^ |
            Проголосовать: нравится 0 Проголосовать: не нравится

          It doesn't matter how it sounds, because the truth looks exactly as dalex said 1.5 years ago.

          Just go to a random university and try to explain suffix array to a random CS student. You'll be heavily surprised that this student very likely won't understand you. After this you will possibly start to appreciate the fact that you study in MSU and are surrounded with lots of smart people.

          And yes, I really know what I say.

  • »
    »
    11 лет назад, # ^ |
    Rev. 2   Проголосовать: нравится +5 Проголосовать: не нравится

    It's always time to learn. Thanks for the implementation anyway it's good enough.

  • »
    »
    11 лет назад, # ^ |
      Проголосовать: нравится 0 Проголосовать: не нравится

    Incorrect attitude bro, studying classical algorithms also teaches me quite productively how to think.

»
11 лет назад, # |
  Проголосовать: нравится 0 Проголосовать: не нравится

Well,, I think this two (1 and 2) links provide a good number of resources on suffix array. You can also take help from Stack Overflow too.

Hope these help you. And sorry for my poor English. Thanks :)

»
10 лет назад, # |
Rev. 3   Проголосовать: нравится 0 Проголосовать: не нравится

Naive implementation is many times just fine for a contest.

»
10 лет назад, # |
  Проголосовать: нравится 0 Проголосовать: не нравится

I think that the question asked on a tutorial of how to implement it and not of an implementation already coded. So here's my brief explanation on how you can implement it by yourself (you always learn more by implementing an algorithm your self than by just using it).

You'll group all the suffixes in buckets, and you'll do it by comparing their prefixes of sizes 2^i for all i until 2^i >= length of the string. Let's say you have the string "ABRACADABRA", then you'll have all this suffixes: "ABRACADABRA", "BRACADABRA", "RACADABRA", "ACADABRA", "CADABRA", "ADABRA", "DABRA", "ABRA", "BRA", "RA" and "A".

First of all you'll group them in buckets for the first letter, so the buckets will be:

["ABRACADABRA", "ACADABRA", "ADABRA", "ABRA", "A"]

["BRACADABRA", "BRA"]

["CADABRA"]

["DABRA"]

["RACADABRA", "RA"]

Note that in each bucket the strings aren't sorted but two strings are properly sorted between each other if they belong to different buckets. This comparison is O(n log n) as you only sort comparing by one character per suffix (The first character of the suffix). So now you have the strings sorted by the first character, and you want to sort them by the first two characters. What you'll do now is sorting the suffixes that belong to the same bucket between each other, but not comparing with between suffixes of different buckets, as you already know how they compare each other.

Let's say we want to sort the first bucket ["ABRACADABRA", "ACADABRA", "ADABRA", "ABRA", "A"]. For sorting it we know that the first letter is the same in all the suffixes, so we compare the second letter, but how we compare it? We don't compare characters but suffixes, for example, if we want to compare "ABRACADABRA" with "ACADABRA", we know how "BRACADABRA" and "CADABRA" compare each other by the first letter of the suffix, so we look for this comparison. Now we have the following buckets for all this strings:

["A"] (note that because the "second letter" of this string is the end of string we consider it to come before any other letter).

["ABRACADABRA", "ABRA"]

["ACADBRA"]

["ADABRA"]

This is how we do the second step. Now look that the second bucket (["BRACADABRA","BRA"]) remained the same, because "BR" == "BR", so let's see how we sort it in the third step. Now we want to compare the suffixes by their prefixes of length 4 (or less if the suffix has less than 4 characters), so "BRA" < "BRAC" but we need to do two comparisons, as we know they share the first two characters, but we have to compare "A" == "A" and End of String < "C", but let's see how we do it in one step. We know how "A" and "ACADABRA" compare to each other according to the prefix of length at most 2, that is "A" < "ACADABRA", so we just look for this comparison once, and we know then that "BRA" < "BRACADABRA".

Now let's use a bigger example to make it more clear. Let's say we have the strings "ABCDEFGHIABCDEFGIJK" and "ABCDEFGIJK" in the same bucket and we've already compared the first 4 letters of the suffixes. Now we want to compare "ABCDEFGI" with "ABCDEFGH" (the first 8 letters of the suffixes). We know that "EFGHIABCDEFGIJK" < "EFGIJK" according to the first 4 letters of the suffixes, so we just make look for this comparison and we compare 4 letters in only one operation.

I hope it's clear enough for you to being able to implement it.

»
10 лет назад, # |
  Проголосовать: нравится 0 Проголосовать: не нравится

Thanks Man !!