Finding the LCS of N strings

asif_iut's blog

By asif_iut, 13 years ago, in English

How can I find the Longest Common Subsequence (LCS) of N strings? Is there a DP recurrence?

lcs, strings

Comments (25)

_jte_

13 years ago, # |

Never mind.

olpetOdessaONU

13 years ago, # ^ |

There is one problem. Suppose we have 3 strings:

AAAAAABC
AAAAAADC
EEEEEEEFC

The first step gives us the LCS "AAAAAA" and the second one gives the empty LCS. But the right answer is "C".
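The proposal this reply answers has since been edited away, so the following is only a guess at it: a minimal Python sketch assuming the idea was to fold the strings pairwise, with LCS read as longest common substring. All function names are illustrative.

```python
def longest_common_substring(a, b):
    """Longest common substring of two strings (classic O(len(a) * len(b)) DP)."""
    best = ""
    # prev[j] = length of the longest common suffix of a[:i-1] and b[:j]
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > len(best):
                    best = a[i - cur[j]:i]
        prev = cur
    return best


def fold_pairwise(strings):
    """Assumed greedy idea: LCS of the first two, then of that result with the next."""
    acc = strings[0]
    for s in strings[1:]:
        acc = longest_common_substring(acc, s)
    return acc


def brute_force_common_substring(strings):
    """Reference answer: try every substring of the first string, longest first."""
    base = strings[0]
    for size in range(len(base), 0, -1):
        for start in range(len(base) - size + 1):
            cand = base[start:start + size]
            if all(cand in s for s in strings):
                return cand
    return ""


strings = ["AAAAAABC", "AAAAAADC", "EEEEEEEFC"]
print(repr(fold_pairwise(strings)))                  # greedy result
print(repr(brute_force_common_substring(strings)))   # reference result
```

The greedy fold commits to "AAAAAA" after the first step and loses "C", which is the only substring all three strings share.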

olpetOdessaONU

13 years ago, # ^ |

I'm sorry, does LCS mean Longest Common String or Longest Common Suffix?

_jte_

13 years ago, # ^ |

Yes, you're right. This will fail.

olpetOdessaONU

13 years ago, # ^ |

But if we maintain the set of non-extendable common substrings, it can work.

_jte_

13 years ago, # ^ |

I've got your idea, thanks.

subscriber

13 years ago, # |

If I have the right idea, ~~this~~ not this task can be solved with a suffix array in about O(S log S), where S is the sum of all the strings' lengths.

It looks correct and very simple, so I'll share.
Build the suffix array of the concatenation of all strings.
Sweep over it with two pointers that mark the beginning and end of the current segment. Also keep the LCPs of all consecutive suffixes in the segment (in a heap, for example).
If the suffixes in the current segment belong to all N strings, check the minimum element of the heap as a candidate answer and shrink the segment (move the left pointer). Otherwise, extend the segment (move the right pointer).
No guarantee, maybe I'm wrong.
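The two-pointer sweep described above can be sketched roughly as follows. This is a simplified Python illustration, not the O(S log S) version: the suffix array and LCP array are built naively, and the window minimum is taken with a plain `min` where the comment suggests a heap. Unique integer separators between the strings are an assumption of this sketch, and it solves the longest common substring problem.

```python
def longest_common_substring_sa(strings):
    """Longest common substring of all strings: suffix array plus two pointers."""
    n_str = len(strings)
    if n_str == 0:
        return ""
    if n_str == 1:
        return strings[0]
    # Encode the concatenation as integers: string k is followed by the unique
    # separator k, and real characters are shifted above all separators.
    text, owner = [], []
    for k, s in enumerate(strings):
        text.extend(ord(c) + n_str for c in s)
        owner.extend([k] * len(s))
        text.append(k)
        owner.append(-1)  # separators belong to no string
    n = len(text)
    # Naive suffix array (sorting slices): fine for a sketch, not O(S log S).
    sa = sorted(range(n), key=lambda i: text[i:])

    def lcp_len(i, j):
        h = 0
        while i + h < n and j + h < n and text[i + h] == text[j + h]:
            h += 1
        return h

    # lcp[k] = common prefix length of the suffixes at sa[k - 1] and sa[k].
    # Unique separators guarantee a common prefix never crosses a separator.
    lcp = [0] + [lcp_len(sa[k - 1], sa[k]) for k in range(1, n)]

    count = [0] * n_str   # suffixes of each string inside the window
    covered = 0           # number of strings with count > 0
    best_len, best_pos = 0, 0
    lo = 0
    for hi in range(n):
        k = owner[sa[hi]]
        if k >= 0:
            count[k] += 1
            if count[k] == 1:
                covered += 1
        while covered == n_str:
            # A heap or monotonic deque would replace this linear-time min.
            window_min = min(lcp[lo + 1:hi + 1])
            if window_min > best_len:
                best_len, best_pos = window_min, sa[hi]
            k = owner[sa[lo]]
            if k >= 0:
                count[k] -= 1
                if count[k] == 0:
                    covered -= 1
            lo += 1
    return "".join(chr(x - n_str) for x in text[best_pos:best_pos + best_len])


print(longest_common_substring_sa(["abcde", "xbcdy", "zzbcd"]))
```

Shrinking from the left while the window still covers every string is safe because dropping a suffix can only raise the window minimum.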

aropan

13 years ago, # ^ |

What you find is the Longest Common Substring.

subscriber

13 years ago, # ^ |

You're right. Looks like a game of "guess the problem".

CherryTree

13 years ago, # |

It can be done almost the same way as for two strings, with a suffix automaton. The complexity is O(total length of all strings).

_jte_

13 years ago, # ^ |

Can you describe your algorithm?

CherryTree

13 years ago, # ^ |

It is described at e-maxx.ru how to solve it for two strings. The only difference is that for each state of the automaton you have to store a bitmask recording which strings have a substring that ends in this state.

goo.gl_SsAhv

13 years ago, # ^ |

-8

If you are talking about bitmasks, the complexity is not O(total length of all strings) but at least O(2^(number of strings)).

CherryTree

13 years ago, # ^ |

It takes O(length of the first string) time to build the automaton and O(total length of all strings) to compute, for each string, all states of the automaton that can be reached from it (these are the substrings of that string). Actually, the bitmask is not necessary: for each state you just have to store whether it is still reachable from all processed strings.
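A rough Python sketch of this suffix automaton idea, for the longest common substring. One deliberate deviation: instead of a boolean "still reachable" flag, each state keeps the longest match length seen in every processed string, which is the usual way to recover the length of the answer. The variable names and the `first_end` bookkeeping used to extract the substring are assumptions of this sketch.

```python
def longest_common_substring_sam(strings):
    """Longest common substring using a suffix automaton of the first string."""
    if not strings:
        return ""
    base = strings[0]
    # Parallel arrays describing the automaton states; state 0 is the root.
    trans, link, length, first_end = [{}], [-1], [0], [-1]
    last = 0
    for i, ch in enumerate(base):
        cur = len(trans)
        trans.append({})
        link.append(-1)
        length.append(length[last] + 1)
        first_end.append(i)
        p = last
        while p != -1 and ch not in trans[p]:
            trans[p][ch] = cur
            p = link[p]
        if p == -1:
            link[cur] = 0
        else:
            q = trans[p][ch]
            if length[p] + 1 == length[q]:
                link[cur] = q
            else:
                clone = len(trans)
                trans.append(dict(trans[q]))
                link.append(link[q])
                length.append(length[p] + 1)
                first_end.append(first_end[q])
                while p != -1 and trans[p].get(ch) == q:
                    trans[p][ch] = clone
                    p = link[p]
                link[q] = clone
                link[cur] = clone
        last = cur
    # Non-root states, longest first, so matches propagate down suffix links.
    order = sorted(range(1, len(trans)), key=lambda v: -length[v])
    best = length[:]  # best[v] = longest match in state v common to all strings so far
    for s in strings[1:]:
        match = [0] * len(trans)
        v, cur_len = 0, 0
        for ch in s:
            while v != 0 and ch not in trans[v]:
                v = link[v]
                cur_len = length[v]
            if ch in trans[v]:
                v = trans[v][ch]
                cur_len += 1
            else:
                cur_len = 0
            match[v] = max(match[v], cur_len)
        for v in order:
            if match[v] > 0:
                # Every suffix-link ancestor of a matched state is fully matched.
                match[link[v]] = length[link[v]]
            best[v] = min(best[v], match[v])
    v = max(range(len(trans)), key=lambda u: best[u])
    end = first_end[v]
    return base[end - best[v] + 1:end + 1]


print(longest_common_substring_sam(["abcde", "xbcdy", "zzbcd"]))
```

The sort over states is the only non-linear step here; a counting sort by `length` would keep the whole thing linear in the total input length.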

asif_iut

13 years ago, # |

By LCS I mean Longest Common Subsequence. Sorry for not mentioning this before.

CherryTree

13 years ago, # ^ |

As far as I know, it is an NP-hard problem.

jlcastrillon

13 years ago, # ^ |

-14

It can also be done using a suffix array, but that is not as efficient as the suffix automaton approach.

ftc

13 years ago, # ^ |

0

BTW, I don't know anything better than O(product of all lengths). And with that complexity there is a simple DP approach.
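The simple DP is not spelled out in the thread; one plausible reading is the two-string recurrence extended to a tuple of indices, one per string. The Python sketch below assumes that reading. It visits up to (product of all lengths) states, so it is only practical for a few short strings.

```python
from functools import lru_cache


def lcs_of_many(strings):
    """Longest common subsequence of several strings via memoised index tuples."""
    strings = tuple(strings)
    if not strings:
        return ""

    @lru_cache(maxsize=None)
    def solve(idx):
        # idx[k] is the current position in strings[k].
        if any(i == len(s) for i, s in zip(idx, strings)):
            return ""
        first = strings[0][idx[0]]
        if all(s[i] == first for i, s in zip(idx, strings)):
            # All current characters agree: taking the match is always optimal.
            return first + solve(tuple(i + 1 for i in idx))
        best = ""
        # Otherwise at least one string must skip its current character.
        for k in range(len(idx)):
            cand = solve(idx[:k] + (idx[k] + 1,) + idx[k + 1:])
            if len(cand) > len(best):
                best = cand
        return best

    return solve((0,) * len(strings))


print(lcs_of_many(["AAAAAABC", "AAAAAADC", "EEEEEEEFC"]))
```

Storing whole strings in the memo table keeps the sketch short; a tuned version would store lengths and reconstruct the answer afterwards.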

dj3500

13 years ago, # |

The problem is NP-hard.

jlcastrillon

13 years ago, # ^ |

-16

You guys misunderstood what I said. I was talking about the longest common substring of n strings. As Sergey said, the longest common subsequence of n strings is NP-hard.
The longest common substring, as far as I know, can be solved using a suffix array, a suffix tree, or a suffix automaton.

dj3500

13 years ago, # ^ |

The author asked for LCS (subsequence). And I was responding to his post, not yours.

The NP-hardness can be shown by a reduction from the Vertex Cover problem.

ColdRobot

6 years ago, # |

I know this is old, but can't we find the LCS of 2 strings, then the LCS of another 2 strings, and so on for every pair, and then take the LCS of the 2 LCSes?

Imagine 4 strings: we take the LCS of 1&2 and of 3&4, and whatever LCSes we get, we take the LCS of them both.

-Morass-

6 years ago, # ^ |

+21

Good day to you,

Well, I'm not sure I understood you correctly, but let's imagine the following strings:

aaabb
bbaaa
cccbb
bbccc

First: LCS(aaabb,bbaaa)==aaa [imho it doesn't matter here whether subsequence or substring was meant :P ]

Second: LCS(cccbb,bbccc)==ccc

So we take the result LCS(aaa,ccc)==""

Which is not correct, if I'm not mistaken, since LCS(aaabb,bbaaa,cccbb,bbccc)=="bb"

Wish you a nice day!
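The counterexample above can be checked with a few lines of Python. This is a small sketch using the subsequence reading of LCS; `lcs2` is the classical two-string DP and `is_subsequence` is a helper introduced here for the check.

```python
def lcs2(a, b):
    """Classical O(len(a) * len(b)) longest common subsequence of two strings."""
    n, m = len(a), len(b)
    # dp[i][j] = LCS length of a[i:] and b[j:]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table to rebuild one optimal subsequence.
    out, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return "".join(out)


def is_subsequence(sub, s):
    """True if sub can be obtained from s by deleting characters."""
    it = iter(s)
    return all(c in it for c in sub)


strings = ["aaabb", "bbaaa", "cccbb", "bbccc"]
left = lcs2(strings[0], strings[1])    # LCS of strings 1 and 2
right = lcs2(strings[2], strings[3])   # LCS of strings 3 and 4
print(repr(left), repr(right), repr(lcs2(left, right)))
print(all(is_subsequence("bb", s) for s in strings))
```

The pairwise results share nothing, even though "bb" is a subsequence of all four inputs.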

ColdRobot

6 years ago, # ^ |

Thanks. How about finding the LCS between every pair of strings/sequences and storing the one with minimum length, e.g. checking (1,2), (1,3), (1,4), (2,3), (2,4) and (3,4) here?

I am new to the world of algorithms and am learning LCS for the first time, so it would be of enormous help. Thanks.

-Morass-

6 years ago, # ^ |

Uh, I'm sorry, but I'm afraid I don't understand your algorithm now.

Mostly the part "finding the LONGEST COMMON SUBSTRING/SUBSEQUENCE between all strings/sequences and storing the one with minimum length": I'm not sure what we are storing here :'(

I'm not sure what exactly you want/expect from an algorithm.

Anyway, in case you are looking for interesting algorithms (to learn, or just out of interest), for Longest Common Subsequence:

The most useful (due to its simplicity) is the classical dynamic programming, which was mentioned by CherryTree above.

If you would like to go "deeper" (sorry, I never thought about generalising these algorithms, so I'll mention them for 2 strings only), you might be interested in:

A DP which reduces the complexity to O(N² + M) instead of O(N * M) (usable for a long + short string)
The Hunt-Szymanski algorithm, which is pretty sexy, taking into account the character layout of the strings
LCS using the Four Russians method (at least I guess that is what it is called), which is also very sexy (yet imho a greater pain :p) and provides an awesome speedup, letting us solve "big" subsequences.

If you are interested in Longest Common Substring, there is a much bigger diversity of complexities. I will again talk about LCS for 2 strings, yet HERE it is usually "easily" (or somehow) generalisable:

1. Obviously (but it is nothing interesting), you can go with a very naive algorithm: for each substring of one string, check whether it occurs in the other string. I hope I didn't make a big mistake, but it would be O(N⁴) (considering both strings to be of size N)? It is easily generalised to multiple strings, and the complexity should not rise significantly (you have more strings, but the "opponent" has to make them shorter... just speculation).
2. There are some "interesting" speedups: for example, if you use a trie or hashing, you can easily get rid of one N in the complexity. This also isn't interesting, yet imho it is a good brain-teaser if you are a beginner in this field, since you can get rid of another N if you continue in a similar manner.
3. Here, the first interesting method (not the fastest, yet imho not hard) is using hashing and binary search. A big "+" of this method is that it is very easily generalised to multiple strings. The cost is O(N log(N)) (and I think it is fairly possible to come up with this idea yourself ^_^ ).
4. Finally, if you would like to go "deeper" or "faster", you could use some suffix data structures. The problem is that most of them (imho) are not "easy" if you want to reach O(N). An example is the usage of Suffix Array + LCP, or something the guys above mentioned; again, generalisable to multiple strings.

Sorry for the wall of text. As you are a beginner in this topic, I think for the first problem the key is the basic dynamic programming, which solves most of the problems one meets, and for the second problem imho it is necessary to understand point 3.

Wish you a Nice Day!

Good Luck

~/Morass
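The hashing plus binary search method for the longest common substring can be sketched as below. This is a minimal Python illustration under some assumptions not stated in the comment: a single polynomial rolling hash with an arbitrarily chosen base and modulus, and no handling of hash collisions (a careful version would verify candidates or use double hashing).

```python
def longest_common_substring_hash(strings):
    """Longest common substring of several strings: binary search on the length."""
    if not strings:
        return ""
    MOD = (1 << 61) - 1   # assumed modulus, a large Mersenne prime
    BASE = 911382323      # assumed base, any value coprime with MOD works

    def hashes_of_length(s, size):
        """Map rolling hash -> first start index for every window of `size` in s."""
        if size > len(s):
            return {}
        h = 0
        for c in s[:size]:
            h = (h * BASE + ord(c) + 1) % MOD
        top = pow(BASE, size - 1, MOD)
        found = {h: 0}
        for i in range(size, len(s)):
            h = (h - (ord(s[i - size]) + 1) * top) % MOD  # drop the left character
            h = (h * BASE + ord(s[i]) + 1) % MOD          # append the right character
            found.setdefault(h, i - size + 1)
        return found

    def common_of_length(size):
        """Some substring of `size` shared by all strings, or None."""
        first = hashes_of_length(strings[0], size)
        shared = set(first)
        for s in strings[1:]:
            shared.intersection_update(hashes_of_length(s, size))
        if not shared:
            return None
        start = first[next(iter(shared))]
        return strings[0][start:start + size]

    # If a common substring of length L exists, so does one of length L - 1,
    # which is what makes binary search on the length valid.
    lo, hi, best = 0, min(len(s) for s in strings), ""
    while lo < hi:
        mid = (lo + hi + 1) // 2
        cand = common_of_length(mid)
        if cand is not None:
            best, lo = cand, mid
        else:
            hi = mid - 1
    return best


print(longest_common_substring_hash(["aaabb", "bbaaa", "cccbb", "bbccc"]))
```

Each binary search step hashes every string once, so the total work is roughly (total length) × log(shortest length), in line with the estimate in point 3.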
