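- Standard - the default analyzer; splits text on word boundaries (based on Unicode text segmentation, so it works for many languages) and lowercases tokens
- Simple - splits on any non-letter character and lowercases (`UU-a` becomes `uu` and `a`; `doni't` becomes `doni` and `t`)
- Stop - lowercases and removes stop words such as `the`, `a`, `is`; like Simple, it splits on non-letters, so digits such as `2` are dropped as well
- Whitespace - splits on whitespace only and does not lowercase
- Keyword - does no tokenization; the whole input is emitted as a single token
- Pattern - splits on a custom regular-expression delimiter, by default `\W+` (runs of non-word characters), and lowercases by default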
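The splitting rules in the list above can be approximated in plain Python. This is a minimal sketch, not Elasticsearch's implementation. It assumes that the regex `[^\W\d_]+` is close enough to Lucene's notion of a "letter", and `STOP_WORDS` is a small hypothetical subset of an English stop-word list, not the full default list. Python's `\W` is Unicode-aware while Java's is ASCII-only by default, so `pattern` only mirrors Elasticsearch for ASCII input. All function names are hypothetical.

```python
import re

# Assumption: a tiny illustrative subset of English stop words,
# NOT the full list Elasticsearch's stop analyzer uses.
STOP_WORDS = {"a", "an", "and", "is", "of", "the", "to"}


def simple(text: str) -> list[str]:
    """Split on every non-letter character, then lowercase."""
    # [^\W\d_]+ matches runs of word characters that are not digits or "_",
    # i.e. roughly runs of letters.
    return [t.lower() for t in re.findall(r"[^\W\d_]+", text)]


def stop(text: str) -> list[str]:
    """Same splitting as simple(), then drop stop words."""
    return [t for t in simple(text) if t not in STOP_WORDS]


def whitespace(text: str) -> list[str]:
    """Split on runs of whitespace; case is preserved."""
    return text.split()


def keyword(text: str) -> list[str]:
    """No tokenization: the whole input is one token."""
    return [text]


def pattern(text: str, regex: str = r"\W+", lowercase: bool = True) -> list[str]:
    """Split on a regex delimiter (default \\W+), optionally lowercasing."""
    tokens = [t for t in re.split(regex, text) if t]  # drop empty pieces
    return [t.lower() for t in tokens] if lowercase else tokens
```

For example, `simple("UU-a doni't")` returns `['uu', 'a', 'doni', 't']`, matching the Simple bullet, and `stop("The cat is 2 years old")` returns `['cat', 'years', 'old']`: `the` and `is` are stop words, and `2` never becomes a token because it contains no letters.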
Example request, analyzing a sample string with the standard analyzer:

```
POST _analyze
{
  "analyzer": "standard",
  "text": "the sadf ss 2 Aeis"
}
```
The response lists each token with its character offsets, type, and position. `the` is kept because the standard analyzer does not remove stop words by default, `Aeis` is lowercased, and `2` is typed `<NUM>`:

```json
{
  "tokens": [
    {
      "token": "the",
      "start_offset": 0,
      "end_offset": 3,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "sadf",
      "start_offset": 4,
      "end_offset": 8,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "ss",
      "start_offset": 9,
      "end_offset": 11,
      "type": "<ALPHANUM>",
      "position": 2
    },
    {
      "token": "2",
      "start_offset": 12,
      "end_offset": 13,
      "type": "<NUM>",
      "position": 3
    },
    {
      "token": "aeis",
      "start_offset": 14,
      "end_offset": 18,
      "type": "<ALPHANUM>",
      "position": 4
    }
  ]
}
```
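To show where the offsets, types, and positions in the response come from, here is a rough approximation of the standard analyzer for ASCII input. It is only a sketch: the real standard tokenizer implements Unicode word segmentation (UAX #29), which handles apostrophes, decimal numbers, and CJK text differently from the `\w+` regex assumed here. `standard_approx` is a hypothetical helper name. Like the default standard analyzer, it lowercases but does not drop stop words.

```python
import re


def standard_approx(text: str) -> list[dict]:
    """Approximate the standard analyzer's token stream for ASCII text."""
    tokens = []
    # Assumption: runs of word characters stand in for UAX #29 word boundaries.
    for position, m in enumerate(re.finditer(r"\w+", text)):
        raw = m.group()
        tokens.append({
            "token": raw.lower(),           # lowercase filter
            "start_offset": m.start(),      # offset of first char in the input
            "end_offset": m.end(),          # offset just past the last char
            "type": "<NUM>" if raw.isdigit() else "<ALPHANUM>",
            "position": position,           # 0-based token index
        })
    return tokens
```

Calling `standard_approx("the sadf ss 2 Aeis")` yields the same five tokens, offsets, types, and positions as the response above.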