KMP Notes

Source: 99网

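KMP puzzled me for a long time. I kept meaning to set aside time to work through it and never did. I now understand it reasonably well, but time is tight, so this is only an outline for my own future review. KMP is efficient because it exploits repetition inside the pattern. Let S be the text (the main string) and T the pattern to be matched. The point is that T contains repeated substrings, for example:
T = "abababc". The repetition in abababc is easy to see: the first four characters, abab, are identical to the substring at positions 2-5 (counting from 0).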
According to other references, the next array of T is:
 a  b  a  b  a  b  c
-1  0  0  1  2  3  4
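Here the next array starts from -1. Some blogs start it from 1 instead, which I think does not fit the habit of traversing strings from index 0, so starting from -1 seems better to me.
Without further ado, here is the code: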
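The table can be checked directly against the definition: next[i] is the length of the longest proper prefix of T[0..i-1] that is also a suffix of it, with next[0] fixed to -1. The following is only a brute-force sketch for verification, not the KMP construction itself; the function name nextByDefinition is made up for this note.

```cpp
#include <string>
#include <vector>

// Brute-force next table, computed straight from the definition.
// next[0] is -1 by convention; next[i] is the longest k < i such that
// t[0..k-1] equals t[i-k..i-1].
std::vector<int> nextByDefinition(const std::string& t) {
    const int len = static_cast<int>(t.size());
    std::vector<int> next(len, 0);
    if (len == 0) return next;
    next[0] = -1;
    for (int i = 1; i < len; ++i) {
        // Try the longest candidate first; k < i keeps the prefix proper.
        for (int k = i - 1; k > 0; --k) {
            if (t.compare(0, k, t, i - k, k) == 0) {
                next[i] = k;
                break;
            }
        }
    }
    return next;
}
```

For "abababc" this returns -1 0 0 1 2 3 4. It takes cubic time in the worst case, so it is only useful for checking a faster implementation on small inputs.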

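#include <cstring>

// Fills next[0..strlen(T)-1]; the caller must supply at least strlen(T) ints.
int getNext(const char *T, int *next){
    int len = (int)strlen(T);   // compute the length once, not on every iteration
    if (len == 0) return 0;     // nothing to fill for an empty pattern
    int i = 0, j = -1;
    next[0] = -1;
    while (i < len - 1){        // stop early: next[i] is written after ++i
        if (-1 == j || T[i] == T[j]){
            ++i;
            ++j;
            next[i] = j;        // T[0..j-1] matches the j characters before T[i]
        }
        else
            j = next[j];        // fall back to the next shorter border
    }
    return 0;
}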

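That is the code for the KMP next array. It has one weakness, though: for a pattern such as "aaaaaab", a mismatch at T[j] falls back to next[j], where the character is the same 'a' and is bound to mismatch again, so the search wastes comparisons. Hence the following improved code: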

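// Returns a table allocated with new[]; the caller releases it with delete[].
int *getNext(const char *T){
    int len = (int)strlen(T);
    int *next = new int[len + 1];   // +1 so next[0] is valid even for an empty pattern
    int i = 0, j = -1;
    next[0] = -1;
    while (i < len - 1){            // stop early: next[i] is written after ++i
        if (-1 == j || T[i] == T[j]){
            ++i;
            ++j;
            if (T[i] != T[j])
                next[i] = j;
            else
                next[i] = next[j];  // same character would mismatch again, so skip it
        }
        else
            j = next[j];
    }
    return next;
}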
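To see what the improvement changes, the two tables can be built side by side for the same pattern. This is a minimal sketch using std::string and std::vector instead of raw arrays; the names plainNext and optimizedNext are made up for this note.

```cpp
#include <string>
#include <vector>

// Plain table: next[i] is the longest border length of t[0..i-1].
std::vector<int> plainNext(const std::string& t) {
    const int len = static_cast<int>(t.size());
    std::vector<int> next(len, -1);
    int i = 0, j = -1;
    while (i < len - 1) {
        if (j == -1 || t[i] == t[j]) {
            ++i;
            ++j;
            next[i] = j;
        } else {
            j = next[j];
        }
    }
    return next;
}

// Improved table: when t[i] equals t[j], falling back to j would compare
// the same character again, so the fallback of j is reused instead.
std::vector<int> optimizedNext(const std::string& t) {
    const int len = static_cast<int>(t.size());
    std::vector<int> next(len, -1);
    int i = 0, j = -1;
    while (i < len - 1) {
        if (j == -1 || t[i] == t[j]) {
            ++i;
            ++j;
            next[i] = (t[i] != t[j]) ? j : next[j];
        } else {
            j = next[j];
        }
    }
    return next;
}
```

For "aaaaaab" the plain table is -1 0 1 2 3 4 5, while the improved one is -1 -1 -1 -1 -1 -1 5: a mismatch on any of the leading 'a' characters now moves the text pointer forward immediately instead of retrying every shorter run of 'a'.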

With the next array in hand, the code that checks whether T occurs in S is:

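// Returns the index of the first occurrence of T in S, or -1 if there is none.
int index_KMP(const char *T, const char *S){
    // Use int lengths: comparing j == -1 against an unsigned strlen() result
    // would convert -1 to a huge value and end the loop too early.
    int tlen = (int)strlen(T);
    int slen = (int)strlen(S);
    if (tlen == 0) return 0;        // an empty pattern matches at position 0
    int *next = getNext(T);
    int i = 0, j = 0;
    while (i < slen && j < tlen){
        if (-1 == j || S[i] == T[j]){
            ++i;
            ++j;
        }
        else
            j = next[j];            // i never moves backwards
    }
    delete[] next;                  // getNext allocates with new[]
    return (j >= tlen) ? i - tlen : -1;
}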
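One way to see the benefit is to count character comparisons for KMP and for the naive search on the same input. The following is only an illustrative sketch: buildNext, kmpFind and naiveFind are names made up for this note, and the plain (unimproved) table is used to keep it short.

```cpp
#include <string>
#include <vector>

// Plain next table with next[0] = -1.
std::vector<int> buildNext(const std::string& t) {
    const int len = static_cast<int>(t.size());
    std::vector<int> next(len, -1);
    int i = 0, j = -1;
    while (i < len - 1) {
        if (j == -1 || t[i] == t[j]) { ++i; ++j; next[i] = j; }
        else j = next[j];
    }
    return next;
}

// KMP search; returns the first match index or -1 and reports how many
// character comparisons were made.
int kmpFind(const std::string& s, const std::string& t, long& comparisons) {
    comparisons = 0;
    if (t.empty()) return 0;
    const std::vector<int> next = buildNext(t);
    const int n = static_cast<int>(s.size());
    const int m = static_cast<int>(t.size());
    int i = 0, j = 0;
    while (i < n && j < m) {
        if (j == -1) { ++i; ++j; continue; }  // no border left: advance the text
        ++comparisons;
        if (s[i] == t[j]) { ++i; ++j; }
        else j = next[j];                      // i stays where it is
    }
    return (j >= m) ? i - m : -1;
}

// Naive search: restart the pattern at every text position.
int naiveFind(const std::string& s, const std::string& t, long& comparisons) {
    comparisons = 0;
    const int n = static_cast<int>(s.size());
    const int m = static_cast<int>(t.size());
    for (int start = 0; start + m <= n; ++start) {
        int j = 0;
        while (j < m) {
            ++comparisons;
            if (s[start + j] != t[j]) break;
            ++j;
        }
        if (j == m) return start;
    }
    return -1;
}
```

On a text of twenty 'a' characters followed by 'b', with the pattern "aaaab", both functions find the same position, but the naive version repeats comparisons at every start position, whereas the KMP count stays within roughly twice the text length because the text index never moves backwards.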

The above may not be fully rigorous; it is only meant for my own review.
