

使用ngram分词解析器创建全文索引
1、对title字段建立全文索引(该字段没有固定的stopwords 分词,使用ngram分词解析器)
需先在my.cnf 配置文件中设置ngram_token_size(默认为2,2个字符作为ngram 的关键词),并重启mysql服务
这里使用默认的 2
select title from article limit 10; +------------------------------------------------------------------------------+ | title | +------------------------------------------------------------------------------+ | worth IT | |Launchpad 江南皮革厂小show | |Raw 幕后罕见一刻 “疯子”被抬回后台 | |Raw:公子大骂老爸你就是个绿茶 公子以一打四 | |四组30平米精装小户型,海量图片,附户型图 | |夜店女王性感烟熏猫眼妆 | |大秀哥重摔“巨石”强森 | |少女时代 崔秀英 服饰科普 林允儿 黄美英 金泰妍 郑秀晶 | |德阳户外踏青,花田自助烧烤 | +------------------------------------------------------------------------------+
2、对title字段创建全文索引
alter table article add fulltext index ft_index_title(title) with parser ngram; Query OK, 0 rows affected (3 min 29.22 sec) Records: 0 Duplicates: 0 Warnings: 0
3、会创建倒排索引(title字段越长长,创建的倒排索引越大)
112M Jul 5 21:46 FTS_00000000000000a7_00000000000000cd_INDEX_1.ibd
28M Jul 5 21:46 FTS_00000000000000a7_00000000000000cd_INDEX_2.ibd
20M Jul 5 21:46 FTS_00000000000000a7_00000000000000cd_INDEX_3.ibd
140M Jul 5 21:46 FTS_00000000000000a7_00000000000000cd_INDEX_4.ibd
128M Jul 5 21:46 FTS_00000000000000a7_00000000000000cd_INDEX_5.ibd
668M Jul 5 21:46 FTS_00000000000000a7_00000000000000cd_INDEX_6.ibd
4、不建立全文索引搜索title的某个关键词
select count(*) from article where title like '%户外%'; +----------+ | count(*) | +----------+ | 22058 | +----------+ 1 row in set (8.60 sec) select count(*) from article where title like '%后台%'; +----------+ | count(*) | +----------+ | 1142 | +----------+
5、使用全文索引搜索某个关键词
响应时间有很大的提升
select count(*) from article where match(title) against('户外');
+----------+
| count(*) |
+----------+
| 22058 |
+----------+
1 row in set (0.07 sec)
select count(*) from article where title like '%后台%';
+----------+
| count(*) |
+----------+
| 1142 |
+----------+
1 row in set (8.31 sec)6、注意当搜索的关键词字符数大于2 (ngram_token_size定义大小)会出现不一致问题
普通搜索,实际中出现该关键词的记录数为6
select count(*) from article where title like '%公子大%';
+----------+
| count(*) |
+----------+
| 6 |
+----------+
1 row in set (8.40 sec)
全文搜索,出现关键字的记录数为9443
select count(*) from article where match(title) against('公子大');
+----------+
| count(*) |
+----------+
| 9443 |
+----------+
1 row in set (0.06 sec)
实际出现该关键字的记录数为1
select count(*) from article where title like '%花田自助%';
+----------+
| count(*) |
+----------+
| 1 |
+----------+
1 row in set (8.33 sec)
全文搜索出现该关键词的记录数为3202
select count(*) from article where match(title) against('花田自助');
+----------+
| count(*) |
+----------+
| 3202 |
+----------+
1 row in set (0.06 sec)结论
参考
InnoDB FULLTEXT Indexes
总结
