Longest Common Substring(最长公共子串)
题目描述
A string is finite sequence of characters over a non-empty finite set Σ.
In this problem, Σ is the set of lowercase letters.
Substring, also called factor, is a consecutive sequence of characters occurrences at least once in a string.
Now your task is simple, for two given strings, find the length of the longest common substring of them.
Here common substring means a substring of two or more strings.
输入格式
The input contains exactly two lines, each line consists of no more than 250000 lowercase letters, representing a string.
输出格式
The length of the longest common substring. If such string doesn’t exist, print “0” instead.
题意翻译
输入2 个长度不大于250000的字符串,输出这2 个字符串的最长公共子串。如果没有公共子串则输出0 。
Translated by @xyz32768
输入输出样例
输入
alsdfkjfjkdsal
fdjskalajfkdsla
输出
3
有两个字符串$A$和$B$,我们要求它们的最长公共连续子串。
首先,我们对$A$建立一个$SAM$。
定义:$Len$为当前$B$的前$i$个字符组成的子串与$A$的最长公共子序列,$State$为当前状态,初始化为0(初始状态)。$next(State,ch)$为在$State$状态节点处往$ch$道路走的下一状态。
匹配:
然后我们从$B[i=0]$开始,在$A$的$SAM$上走,一个一个匹配,若:
- 当前状态朝着$B[i]$往下走有路,说明可以继续往下匹配,就接着走,即$State=next(State,B[i]),++Len$。
- 如果没有路了,就跳到当前状态在后缀连接树上的父节点,如果父节点还是没有$B[i]$的路,就一直往上跳(即$State=link(State)$),直到遇到能往下走的边。此时就令$Len=len(当前状态)+1$,$State=next(State,B[i])$。
- 如果跳到头了都没有能走的路的话,就说明要从B[i]开始重新匹配,令$Len=0,State=0$。
原理:如果$B[i]$在当前位置下失配(无路可走),那么说明当前状态下的所有子串都失配了,但是它的后缀连接树上的父节点不一定失配,就继续往上找,即相当于当前已经匹配的$A$的子串的左边界往右移,然后继续找路。如果一直没路,就一直往上找,直到达到 初始状态 ,如果此时仍没有路的话,说名在当前$Len$长度下已经是$B[上一次从初始状态开始匹配的i]$开始的最长公共子串了,无法在加长了。那就让$Ans=max(Ans,Len)$,让后以$B[i]$为新的开头从初始状态重新开始匹配。即整个过程就是再找$B$的所有前缀的后缀最长能和$A$匹配多少。
*[初始状态]:A构建的SAM的初始状态。1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
using namespace std;
const int MAXN = 250005;
int n;
char
A[MAXN],
B[MAXN];
struct SAM {
int size, last;
struct Node {
int len = 0, link = 0;
int next[26];
void clear() {
len = link = 0;
memset(next, 0, sizeof(next));
}
} node[MAXN * 2];
void init() {
for (int i = 0; i < size; i++) {
node[i].clear();
}
node[0].link = -1;
size = 1;
last = 0;
}
void insert(char x) {
int ch = x - 'a';
int cur = size++;
node[cur].len = node[last].len + 1;
int p = last;
while (p != -1 && !node[p].next[ch]) {
node[p].next[ch] = cur;
p = node[p].link;
}
if (p == -1) {
node[cur].link = 0;
}
else {
int q = node[p].next[ch];
if (node[p].len + 1 == node[q].len) {
node[cur].link = q;
}
else {
int clone = size++;
node[clone] = node[q];
node[clone].len = node[p].len + 1;
while (p != -1 && node[p].next[ch] == q) {
node[p].next[ch] = clone;
p = node[p].link;
}
node[q].link = node[cur].link = clone;
}
}
last = cur;
}
}sam;
int getNextState(const int& CurState,int Loc) {
return sam.node[CurState].next[Loc - 'a'];
}
int Compute(int n) {
int
&& Ans = 0,
&& CurState = 0,
&& Len = 0;
for (int i = 0; i < n; ++i) {
//如果有路可走,就走噻
if (getNextState(CurState, B[i])) {
CurState = getNextState(CurState, B[i]);
++Len;
}
//否则
else {
//跳link
for (CurState = sam.node[CurState].link;; CurState = sam.node[CurState].link) {
//如果跳到了
if (CurState > 0 && getNextState(CurState, B[i])) {
Len = sam.node[CurState].len + 1;
CurState = getNextState(CurState, B[i]);
break;
}
//如果跳到初始状态。
else if (CurState <= 0) {
Len = 0;
CurState = 0;
break;
}
}
}
Ans = max(Ans, Len);
}
return Ans;
}
int main() {
scanf("%s%s", &A, &B);
int Len_A = strlen(A);
sam.init();
for (int i = 0; i < Len_A; ++i) {
sam.insert(A[i]);
}
printf("%d", Compute(strlen(B)));
return 0;
}