Achieving the same Chinese word segmentation result with different versions of IKAnalyzer

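Recently our company has been building a question-bank feature that needs tools for segmenting both Chinese text and formulas. The first version of the Chinese segmentation tool was built with IKAnalyzer 2012f + Lucene 6.5.1, as follows: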

1. Chinese word segmentation with IKAnalyzer 2012f + Lucene 6.5.1

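import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Splits content into terms with the given analyzer; returns null for empty input or on error.
// Assumes a class-level logger (e.g. SLF4J) named "logger".
public static List<String> analysisByIK(Analyzer analyzer, String field, String content) {
    if (content == null || content.isEmpty()) {
        return null;
    }
    TokenStream ts = null;
    try {
        ts = analyzer.tokenStream(field, new StringReader(content));
        // The text of the current token is read through this attribute
        CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
        ts.reset(); // required before the first incrementToken()
        List<String> vocabularies = new ArrayList<>();
        while (ts.incrementToken()) {
            vocabularies.add(term.toString());
        }
        ts.end();
        return vocabularies;
    } catch (Exception e) {
        logger.error(e.getMessage(), e);
    } finally {
        if (ts != null) {
            try {
                ts.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
    return null;
}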

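Usage. The input is a Chinese multiple-choice geometry question: "In triangle ABC, angle A equals angle B plus angle C; triangle ABC is A. acute B. right C. obtuse D. cannot be determined".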

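String str = "已知三角形abc中，角a等于角b加角c，那么三角形abc是 a、锐角三角形 b、直角三角形 c、钝角三角形 d、不能确定";

// IKAnalyzer (org.wltea.analyzer.lucene.IKAnalyzer); true enables smart (coarse-grained) mode
Analyzer analyzer = new IKAnalyzer(true);

List<String> listAnalyzer = new ArrayList<>();
List<String> ikList = analysisByIK(analyzer, "myfield", str);
if (ikList != null) { // analysisByIK returns null on empty input or on error
    listAnalyzer.addAll(ikList);
}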

Resulting contents of listAnalyzer:

[已知, 三角形, abc, 中, 角, a, 等于, 角, b, 加, 角, c, 那么, 三角形, abc, 是, a, 锐角三角形, b, 直角三角形, c, 钝角三角形, d, 不能, 确定]
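IKAnalyzer is dictionary-based, which is why compound terms such as 三角形 and 锐角三角形 come out as single tokens above. As a rough illustration only, here is a minimal sketch of plain forward maximum matching over a tiny hypothetical dictionary. IK's smart mode additionally arbitrates between overlapping candidates, so this is a simplified stand-in for the longest-match idea, not IK's actual algorithm. The class name ForwardMaxMatch and the dictionary are illustrative assumptions. Unlike IK, which keeps a run of letters such as abc together, this sketch emits unmatched characters one at a time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of forward maximum matching (not IK's real implementation).
public class ForwardMaxMatch {
    private final Set<String> dict;
    // Length of the longest dictionary word: the widest window tried at each position.
    private final int maxWordLen;

    public ForwardMaxMatch(Set<String> dict) {
        this.dict = dict;
        int max = 1;
        for (String w : dict) {
            max = Math.max(max, w.length());
        }
        this.maxWordLen = max;
    }

    public List<String> segment(String text) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int end = Math.min(text.length(), i + maxWordLen);
            // Shrink the window until it is a dictionary word or a single character.
            while (end > i + 1 && !dict.contains(text.substring(i, end))) {
                end--;
            }
            tokens.add(text.substring(i, end));
            i = end;
        }
        return tokens;
    }

    public static void main(String[] args) {
        Set<String> dict = new HashSet<>(Arrays.asList(
                "已知", "三角形", "锐角三角形", "等于", "那么", "不能", "确定"));
        ForwardMaxMatch fmm = new ForwardMaxMatch(dict);
        System.out.println(fmm.segment("已知三角形")); // [已知, 三角形]
        System.out.println(fmm.segment("锐角三角形")); // [锐角三角形]
    }
}
```

Because the longest candidate is tried first, 锐角三角形 wins over the shorter 三角形 whenever both are in the dictionary.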

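However, the formula tokenizer had been written earlier by one of the company's senior engineers, and IKAnalyzer 2012f turned out to be incompatible with it: we could not keep formula tokenization working while using 2012f for Chinese. So I tried other versions and finally chose IKAnalyzer 3.2.8, which is compatible.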
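Since the goal is for a different IK version to produce the same tokens as before, one option is to hide the segmenter behind a small version-neutral interface. You can then check any replacement against the old output. The following is a minimal sketch under that assumption. The names SegmenterSwap, ChineseSegmenter and mismatches are hypothetical. The two lambdas in main are placeholders for real adapters around IKAnalyzer 2012f and 3.2.8, which are not shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Hypothetical sketch: isolate the segmenter so IK versions can be swapped and compared.
public class SegmenterSwap {

    /** Version-neutral contract: callers only see text in, ordered tokens out. */
    public interface ChineseSegmenter {
        List<String> segment(String text);
    }

    /**
     * Runs both segmenters on the same input and returns the token indexes
     * where they disagree; an empty list means the outputs are identical.
     */
    public static List<Integer> mismatches(ChineseSegmenter expected,
                                           ChineseSegmenter candidate,
                                           String text) {
        List<String> a = expected.segment(text);
        List<String> b = candidate.segment(text);
        List<Integer> diff = new ArrayList<>();
        int n = Math.max(a.size(), b.size());
        for (int i = 0; i < n; i++) {
            String left = i < a.size() ? a.get(i) : null;
            String right = i < b.size() ? b.get(i) : null;
            if (!Objects.equals(left, right)) {
                diff.add(i);
            }
        }
        return diff;
    }

    public static void main(String[] args) {
        // Placeholder stand-ins; real adapters would wrap the two IK versions.
        ChineseSegmenter oldVersion = text -> Arrays.asList(text.split(" "));
        ChineseSegmenter newVersion = text -> Arrays.asList(text.trim().split("\\s+"));
        System.out.println(mismatches(oldVersion, newVersion, "已知 三角形 abc")); // []
    }
}
```

Running a fixed set of sample questions through mismatches is a cheap regression check when switching versions.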

2. Compatible version: IKAnalyzer 3.2.8 + Lucene 3.1.0