ElasticSearch安装使用

标签：count type 使用 value doc ElasticSearch offset 安装 match

要吃多少根冰棍才能说出如此冰冷刺骨的话语

简介

有了mysql，为什么还要用elasticsearch?

mysql更多是用来存储数据，在数据量过多的时候，使用ES来检索数据（快）。

ES基本概念

Index（db库）——> type（table 表）——> document（一行数据）

ES检索数据为什么这么快

核心：倒排索引

如：保存记录

红海行动
探索红海行动
红海特别行动
红海记录片
特工红海特别探索

将内容分词记录到索引中

词	记录
红海	1,2,3,4,5
行动	1,2,3
探索	2,5
特别	3,5
纪录片	4,
特工	5

查询红海特工行动：查出后计算相关性得分，3号记录命中了2次，且3号本身才有3个单词，2/3，所以3号最匹配。

ES安装

下载ES（数据存储与检索，相当于mysql)，kibana(可视化检索，相当于navicat)

docker pull elasticsearch:7.17.6
docker pull kibana:7.17.6
版本要统一

容器配置

# 将docker里的目录挂载到linux的/mydata目录中
# 修改/mydata就可以改掉docker里的
mkdir -p /home/docker/elasticsearch/config
mkdir -p /home/docker/elasticsearch/data

# es可以被远程任何机器访问
echo "http.host: 0.0.0.0" >/home/docker/elasticsearch/config/elasticsearch.yml

# 递归更改权限，es需要访问
chmod -R 777 /home/docker/elasticsearch/

启动容器

# 9200是用户交互端口 9300是集群心跳端口
# -e指定是单阶段运行(单机)
# -e指定占用的内存大小，生产时可以设置32G
docker run --name elasticsearch -p 9200:9200 -p 9300:9300 \
-e  "discovery.type=single-node" \
-e ES_JAVA_OPTS="-Xms64m -Xmx512m" \
-v /home/docker/elasticsearch/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml \
-v /home/docker/elasticsearch/data:/usr/share/elasticsearch/data \
-v  /home/docker/elasticsearch/plugins:/usr/share/elasticsearch/plugins \
-d elasticsearch:7.17.6


# 设置开机启动elasticsearch
docker update elasticsearch --restart=always

# kibana指定了了ES交互端口9200  # 5600位kibana主页端口
docker run --name kibana -e ELASTICSEARCH_HOSTS=http://ip:9200 -p 5601:5601 -d kibana:7.17.6


# 设置开机启动kibana
docker update kibana  --restart=always

docker使用小技巧：

在启动docker容器的时候，如果容器运行不起来或者起来马上挂掉，可以查看启动日志

dockerlogs '容器id/容器name'

启动测试

# 查看ES是否正常启动
# 浏览器访问：http://ip:9200

{
    "name": "66718a266132",
    "cluster_name": "elasticsearch",
    "cluster_uuid": "xhDnsLynQ3WyRdYmQk5xhQ",
    "version": {
        "number": "7.4.2",
        "build_flavor": "default",
        "build_type": "docker",
        "build_hash": "2f90bbf7b93631e52bafb59b3b049cb44ec25e96",
        "build_date": "2019-10-28T20:40:44.881551Z",
        "build_snapshot": false,
        "lucene_version": "8.2.0",
        "minimum_wire_compatibility_version": "6.8.0",
        "minimum_index_compatibility_version": "6.0.0-beta1"
    },
    "tagline": "You Know, for Search"
}

# 查看kibana是否正常启动
# 浏览器访问： http://ip:5601/app/kibana

ES基础操作之批量操作——`bulk`

在kibana的dev tools里进行操作

POST /_bulk
{"delete":{"_index":"website","_type":"blog","_id":"123"}}
{"create":{"_index":"website","_type":"blog","_id":"123"}}
{"title":"my first blog post"}
{"index":{"_index":"website","_type":"blog"}}
{"title":"my second blog post"}
{"update":{"_index":"website","_type":"blog","_id":"123"}}
{"doc":{"title":"my updated blog post"}}

#! Deprecation: [types removal] Specifying types in bulk requests is deprecated.
{
  "took" : 304,
  "errors" : false,
  "items" : [
    {
      "delete" : { 删除
        "_index" : "website",
        "_type" : "blog",
        "_id" : "123",
        "_version" : 1,
        "result" : "not_found", 没有该记录
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 0,
        "_primary_term" : 1,
        "status" : 404 没有该
      }
    },
    {
      "create" : {  创建
        "_index" : "website",
        "_type" : "blog",
        "_id" : "123",
        "_version" : 2,
        "result" : "created",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 1,
        "_primary_term" : 1,
        "status" : 201
      }
    },
    {
      "index" : {  保存
        "_index" : "website",
        "_type" : "blog",
        "_id" : "5sKNvncBKdY1wAQmeQNo",
        "_version" : 1,
        "result" : "created",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 2,
        "_primary_term" : 1,
        "status" : 201
      }
    },
    {
      "update" : { 更新
        "_index" : "website",
        "_type" : "blog",
        "_id" : "123",
        "_version" : 3,
        "result" : "updated",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 3,
        "_primary_term" : 1,
        "status" : 200
      }
    }
  ]
}

ES进阶检索

es支持两种基本方式检索

通过uri + 检索参数检索文档
通过uri + 请求体检索文档

通过`uri + 检索参数`检索文档

请求示例：

GET bank/_search?q=*&sort=account_number:asc
# 参数说明
q*: 查询所有
sort: 排序字段
asc: 升序

# 检索bank下所有信息，包括type和docs
GET bank/_search

通过`uri + 请求体`检索文档

GET /bank/_search
{
  "query": { "match_all": {} },
  "sort": [
    { "account_number": "asc" },
    { "balance":"desc"}
  ]
}

查询返回内容

took – 花费多少ms搜索
timed_out – 是否超时
shards – 多少分片被搜索了，以及多少成功/失败的搜索分片
max_score –文档相关性最高得分
hits.total.value - 多少匹配文档被找到
hits.sort - 结果的排序key（列），没有的话按照score排序
hits._score - 相关得分 (not applicable when using match_all)

ES特定查询语言DSL

es提供的一个可以执行查询Json风格的DSL（domain-specific language)。

基本语法格式

典型查询结构

{
    QUERY_NAME:{ #使用的功能
        FIELD_NAME:{ #功能参数
            ARGUMENT:VALUE,
            ARGUMENT:VALUE,...
        }
    }
}

示例：

GET bank/_search
{
    "query" : { #查询的字段
        "match_all":{}
    },
    "from":0, #从第几条文档开始查
    "size":5,
    "_source":["balabce"], #要返回的字段
    "sort":[
    {
        "account_number":{ #返回结果按哪个列排序
            "order":"desc"
        }
    }
    ]
}

参数说明：

match_all：查询类型【代表查询所有的索引】，es中可以在query中组合非常多的查询类型完成复杂查询。
除了query参数外，可以传递其它参数过滤查询结果。
from + size限定，完成分页功能。
sort排序，多字段排序，会在前序字段相等时后续字段内部排序，否则以前序为准。

查询结果：

{
  "took" : 18,  #   花了18ms
  "timed_out" : false,  # 没有超时
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1000,  # 命令1000条
      "relation" : "eq"   
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "999",  # 第一条数据id是999
        "_score" : null,  # 得分信息
        "_source" : {
          "firstname" : "Dorothy",
          "balance" : 6087
        },
        "sort" : [  #  排序字段的值
          999
        ]
      },
      省略......

query/match匹配查询

如果是非字符串，会进行精确匹配。如果是字符串，会进行全文检索。

基本类型（非字符串），精确匹配

GET bank/_search
{
    "query":{
        "match":{
            "account_number":"20"
        }
    }
}

查询结果：返回account_number=20的数据

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,  // 得到一条
      "relation" : "eq"
    },
    "max_score" : 1.0,  # 最大得分
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "20",
        "_score" : 1.0,
        "_source" : {  # 该条文档信息
          "account_number" : 20,
          "balance" : 16418,
          "firstname" : "Elinor",
          "lastname" : "Ratliff",
          "age" : 36,
          "gender" : "M",
          "address" : "282 Kings Place",
          "employer" : "Scentric",
          "email" : "elinorratliff@scentric.com",
          "city" : "Ribera",
          "state" : "WA"
        }
      }
    ]
  }
}

字符串，全文检索

GET bank/_search
{
    "query": {
        "match": {
            "address":"kings"
        }
    }
}

查询结果：最终会按照评分进行排序，会对检索条件进行分词匹配

{
  "took" : 30,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 5.990829,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "20",
        "_score" : 5.990829,
        "_source" : {
          "account_number" : 20,
          "balance" : 16418,
          "firstname" : "Elinor",
          "lastname" : "Ratliff",
          "age" : 36,
          "gender" : "M",
          "address" : "282 Kings Place",
          "employer" : "Scentric",
          "email" : "elinorratliff@scentric.com",
          "city" : "Ribera",
          "state" : "WA"
        }
      },
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "722",
        "_score" : 5.990829,
        "_source" : {
          "account_number" : 722,
          "balance" : 27256,
          "firstname" : "Roberts",
          "lastname" : "Beasley",
          "age" : 34,
          "gender" : "F",
          "address" : "305 Kings Hwy",
          "employer" : "Quintity",
          "email" : "robertsbeasley@quintity.com",
          "city" : "Hayden",
          "state" : "PA"
        }
      }
    ]
  }
}

query/match_phrase[不拆分匹配]

将需要匹配的值当成一整个单词（不分词）进行检索

match_phrase：不拆分字符串进行检索
字段.keyword：必须全匹配上才检索成功

两者区别：

使用keyword,匹配的条件就是要显示字段的全部值，精确匹配。

match_phrase是做短语匹配，只要文本中包含匹配条件，就能匹配到。

使用示例：

GET bank/_search
{
  "query": {
    "match_phrase": {
      "address": "mill road"   #  就是说不要匹配只有mill或只有road的，要匹配mill road一整个子串
    }
  }
}

{
  "took" : 32,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 8.926605,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "970",
        "_score" : 8.926605,
        "_source" : {
          "account_number" : 970,
          "balance" : 19648,
          "firstname" : "Forbes",
          "lastname" : "Wallace",
          "age" : 28,
          "gender" : "M",
          "address" : "990 Mill Road", # "mill road"
          "employer" : "Pheast",
          "email" : "forbeswallace@pheast.com",
          "city" : "Lopezo",
          "state" : "AK"
        }
      }
    ]
  }
}

GET bank/_search
{
  "query": {
    "match": {
      "address.keyword": "990 Mill"  # 字段后面加上 .keyword
    }
  }
}

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0, # 因为要求完全equal，所以匹配不到
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}

query/multi_math[多字段匹配]

如：state或address中包含mill（查询过程中，会对查询条件进行分词）

GET bank/_search
{
  "query": {
    "multi_match": {  # 前面的match仅指定了一个字段。
      "query": "mill",
      "fields": [ # state和address有mill子串  不要求都有
        "state",
        "address"
      ]
    }
  }
}

{
  "took" : 28,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 4,
      "relation" : "eq"
    },
    "max_score" : 5.4032025,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "970",
        "_score" : 5.4032025,
        "_source" : {
          "account_number" : 970,
          "balance" : 19648,
          "firstname" : "Forbes",
          "lastname" : "Wallace",
          "age" : 28,
          "gender" : "M",
          "address" : "990 Mill Road",  # 有mill
          "employer" : "Pheast",
          "email" : "forbeswallace@pheast.com",
          "city" : "Lopezo",
          "state" : "AK"  # 没有mill
        }
      },
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "136",
        "_score" : 5.4032025,
        "_source" : {
          "account_number" : 136,
          "balance" : 45801,
          "firstname" : "Winnie",
          "lastname" : "Holland",
          "age" : 38,
          "gender" : "M",
          "address" : "198 Mill Lane", # mill
          "employer" : "Neteria",
          "email" : "winnieholland@neteria.com",
          "city" : "Urie",
          "state" : "IL"  # 没有mill
        }
      },
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "345",
        "_score" : 5.4032025,
        "_source" : {
          "account_number" : 345,
          "balance" : 9812,
          "firstname" : "Parker",
          "lastname" : "Hines",
          "age" : 38,
          "gender" : "M",
          "address" : "715 Mill Avenue",  # 
          "employer" : "Baluba",
          "email" : "parkerhines@baluba.com",
          "city" : "Blackgum",
          "state" : "KY"  # 没有mill
        }
      },
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "472",
        "_score" : 5.4032025,
        "_source" : {
          "account_number" : 472,
          "balance" : 25571,
          "firstname" : "Lee",
          "lastname" : "Long",
          "age" : 32,
          "gender" : "F",
          "address" : "288 Mill Street", #
          "employer" : "Comverges",
          "email" : "leelong@comverges.com",
          "city" : "Movico",
          "state" : "MT" # 没有mill
        }
      }
    ]
  }
}

query/bool/must复合查询

must：必须达到must所列举的所有条件
must_not：必须不匹配must_not所列举的所有条件
should：应该满足should所列举的条件，越满足得分越高

使用示例：

GET bank/_search
{
   "query":{
        "bool":{  # 
             "must":[ # 必须有这些字段
              {"match":{"address":"mill"}},
              {"match":{"gender":"M"}}
             ]
         }
    }
}

GET bank/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "gender": "M" }},
        { "match": {"address": "mill"}}
      ],
      "must_not": [  # 不可以是指定值
        { "match": { "age": "38" }}
      ]
   }
}

GET bank/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "gender": "M"
          }
        },
        {
          "match": {
            "address": "mill"
          }
        }
      ],
      "must_not": [
        {
          "match": {
            "age": "18"
          }
        }
      ],
      "should": [
        {
          "match": {
            "lastname": "Wallace"
          }
        }
      ]
    }
  }
}

query/filter[结果过滤]

must 贡献得分
should 贡献得分
must_not 不贡献得分
filter 不贡献得分

并不是所有的查询都需要产生分数，特别是那些仅用于filter过滤的文档。

为了不计算分数，es会自动检查场景并且优化查询的执行。

GET bank/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": {"address": "mill" } }
      ],
      "filter": {  # query.bool.filter
        "range": {
          "balance": {  # 哪个字段
            "gte": "10000",
            "lte": "20000"
          }
        }
      }
    }
  }
}

先是查询所有匹配address=mill的文档，然后再根据10000<=balance<=20000进行过滤查询结果。

query/term

和match一样，匹配某个属性的值。

区别：

全文检索字段用match
其他非text字段匹配用term

使用示例：

GET bank/_search
{
  "query": {
    "term": {
      "address": "mill Road"
    }
  }
}

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0, # 没有
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}

更换为match进行检索

{
  "took" : 5,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 32,
      "relation" : "eq"
    },
    "max_score" : 8.926605,
    "hits" : [

aggs/agg1聚合

聚合提供了从数据中分组和提取数据的能力。

aggs：执行聚合

"aggs":{ #聚合
    "aggs_name":{ # 这次聚合的名字，方便展示在结果集中
        "AGG_TYPE" {} #聚合的类型（avg,term,terms)
    }
}

# terms: 看值的可能性分布，会合并锁查字段，给出计数即可
# avg: 看值的平均分布

使用示例：

address中包含mill的所有人的年龄分布以及平均年龄，但不显示这些人的详情

# 分别为包含mill、，平均年龄、
GET bank/_search
{
  "query": { # 查询出包含mill的
    "match": {
      "address": "Mill"
    }
  },
  "aggs": { #基于查询聚合
    "ageAgg": {  # 聚合的名字，随便起
      "terms": { # 看值的可能性分布
        "field": "age",
        "size": 10
      }
    },
    "ageAvg": { 
      "avg": { # 看age值的平均
        "field": "age"
      }
    },
    "balanceAvg": {
      "avg": { # 看balance的平均
        "field": "balance"
      }
    }
  },
  "size": 0  # 不看详情
}

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 4, // 命中4条
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "ageAgg" : { // 第一个聚合的结果
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : 38,  # age为38的有2条
          "doc_count" : 2
        },
        {
          "key" : 28,
          "doc_count" : 1
        },
        {
          "key" : 32,
          "doc_count" : 1
        }
      ]
    },
    "ageAvg" : { // 第二个聚合的结果
      "value" : 34.0  # age字段的平均值是34
    },
    "balanceAvg" : {
      "value" : 25208.0
    }
  }
}

aggs/aggName/aggs/aggName子聚合

按照年龄聚合，并且求这些年龄段的这些人的平均薪资

GET bank/_search
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "ageAgg": {
      "terms": { # 看分布
        "field": "age",
        "size": 100
      },
      "aggs": { # 与terms并列
        "ageAvg": { #平均
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  },
  "size": 0
}

{
  "took" : 49,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1000,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "ageAgg" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : 31,
          "doc_count" : 61,
          "ageAvg" : {
            "value" : 28312.918032786885
          }
        },
        {
          "key" : 39,
          "doc_count" : 60,
          "ageAvg" : {
            "value" : 25269.583333333332
          }
        },
        {
          "key" : 26,
          "doc_count" : 59,
          "ageAvg" : {
            "value" : 23194.813559322032
          }
        },
        {
          "key" : 32,
          "doc_count" : 52,
          "ageAvg" : {
            "value" : 23951.346153846152
          }
        },
        {
          "key" : 35,
          "doc_count" : 52,
          "ageAvg" : {
            "value" : 22136.69230769231
          }
        },
        {
          "key" : 36,
          "doc_count" : 52,
          "ageAvg" : {
            "value" : 22174.71153846154
          }
        },
        {
          "key" : 22,
          "doc_count" : 51,
          "ageAvg" : {
            "value" : 24731.07843137255
          }
        },
        {
          "key" : 28,
          "doc_count" : 51,
          "ageAvg" : {
            "value" : 28273.882352941175
          }
        },
        {
          "key" : 33,
          "doc_count" : 50,
          "ageAvg" : {
            "value" : 25093.94
          }
        },
        {
          "key" : 34,
          "doc_count" : 49,
          "ageAvg" : {
            "value" : 26809.95918367347
          }
        },
        {
          "key" : 30,
          "doc_count" : 47,
          "ageAvg" : {
            "value" : 22841.106382978724
          }
        },
        {
          "key" : 21,
          "doc_count" : 46,
          "ageAvg" : {
            "value" : 26981.434782608696
          }
        },
        {
          "key" : 40,
          "doc_count" : 45,
          "ageAvg" : {
            "value" : 27183.17777777778
          }
        },
        {
          "key" : 20,
          "doc_count" : 44,
          "ageAvg" : {
            "value" : 27741.227272727272
          }
        },
        {
          "key" : 23,
          "doc_count" : 42,
          "ageAvg" : {
            "value" : 27314.214285714286
          }
        },
        {
          "key" : 24,
          "doc_count" : 42,
          "ageAvg" : {
            "value" : 28519.04761904762
          }
        },
        {
          "key" : 25,
          "doc_count" : 42,
          "ageAvg" : {
            "value" : 27445.214285714286
          }
        },
        {
          "key" : 37,
          "doc_count" : 42,
          "ageAvg" : {
            "value" : 27022.261904761905
          }
        },
        {
          "key" : 27,
          "doc_count" : 39,
          "ageAvg" : {
            "value" : 21471.871794871793
          }
        },
        {
          "key" : 38,
          "doc_count" : 39,
          "ageAvg" : {
            "value" : 26187.17948717949
          }
        },
        {
          "key" : 29,
          "doc_count" : 35,
          "ageAvg" : {
            "value" : 29483.14285714286
          }
        }
      ]
    }
  }
}

nested对象聚合

属性是"type": “nested”,因为是内部的属性进行检索

数组类型的对象会被扁平化处理（对象的每个属性会分别存储到一起）

user.name=["aaa","bbb"]
user.addr=["ccc","ddd"]

这种存储方式，可能会发生如下错误：
错误检索到{aaa,ddd}，这个组合是不存在的

数组的扁平化处理会使检索能检索到本身不存在的，为了解决这个问题，就采用了嵌入式属性，数组里是对象时用嵌入式属性（不是对象无需用嵌入式属性）。

GET articles/_search
{
  "size": 0, 
  "aggs": {
    "nested": { # 
      "nested": { #
        "path": "payment"
      },
      "aggs": {
        "amount_avg": {
          "avg": {
            "field": "payment.amount"
          }
        }
      }
    }
  }
}

Mapping字段映射

es字段类型

参考： https://www.elastic.co/guide/en/elasticsearch/reference/7.16/mapping-types.html

映射是用来定义一个文档，以及它所包含的属性是如何存储和索引的。

如：使用mapping来定义：

哪些字符串应该被看做全文本属性
哪些属性包含数字、日期或地理位置
文档中的所有属性是否都能被索引
日期的格式
自定义映射规则来执行动态添加属性
查看mapping信息：GET bank/_mapping

 {
    "bank" : {
      "mappings" : {
        "properties" : {
          "account_number" : {
            "type" : "long" # long类型
          },
          "address" : {
            "type" : "text", # 文本类型，会进行全文检索，进行分词
            "fields" : {
              "keyword" : { # addrss.keyword
                "type" : "keyword",  # 该字段必须全部匹配到
                "ignore_above" : 256
              }
            }
          },
          "age" : {
            "type" : "long"
          },
          "balance" : {
            "type" : "long"
          },
          "city" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "email" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "employer" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "firstname" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "gender" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "lastname" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "state" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          }
        }
      }
    }
  }

es的不同版本

ElasticSearch7-去掉type概念

关系型数据库中两个数据表示是独立的，即使他们里面有相同名称的列也不影响使用，但ES中不是这样的。elasticsearch是基于Lucene开发的搜索引擎，而ES中不同type下名称相同的filed最终在Lucene中的处理方式是一样的。

两个不同type下的两个user_name，在ES同一个索引下其实被认为是同一个filed，你必须在两个不同的type中定义相同的filed映射。否则，不同type中的相同字段名称就会在处理中出现冲突的情况，导致Lucene处理效率下降。
去掉type就是为了提高ES处理数据的效率。

Elasticsearch 7.x URL中的type参数为可选。比如，索引一个文档不再要求提供文档类型。

Elasticsearch 8.x 不再支持URL中的type参数。

解决：将索引从多类型迁移到单类型，每种类型文档一个独立索引

将已存在的索引下的类型数据，全部迁移到指定位置即可。详见数据迁移

分词

一个tokenizer分词器接收一个字符流，将之分割为独立的tokens词元（通常是独立的单词），然后输出tokens流。

如：

POST _analyze
{
  "analyzer": "standard",
  "text": "The 2 Brown-Foxes bone."
}

{
  "tokens" : [
    {
      "token" : "the",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "2",
      "start_offset" : 4,
      "end_offset" : 5,
      "type" : "<NUM>",
      "position" : 1
    },
    {
      "token" : "brown",
      "start_offset" : 6,
      "end_offset" : 11,
      "type" : "<ALPHANUM>",
      "position" : 2
    },
    {
      "token" : "foxes",
      "start_offset" : 12,
      "end_offset" : 17,
      "type" : "<ALPHANUM>",
      "position" : 3
    },
    {
      "token" : "bone",
      "start_offset" : 18,
      "end_offset" : 22,
      "type" : "<ALPHANUM>",
      "position" : 4
    }
  ]
}

对于中文，需要安装额外的分词器。

ik分词器

安装

所有的语言分词，默认使用的都是"standard analyzer"，这些分词器针对中文的分词不友好，因此，需要安装中文的分词器。

下载

下载路径： https://github.com/medcl/elasticsearch-analysis-ik/releases

创建ik文件夹，并将下载好的zip文件包解压，放入到ik文件夹

注意，这里不创建ik文件夹直接解压在plugins目录下的话，服务起不了，会报错。

#进入es安装目录下的plugins下，创建ik文件夹
cd /home/docker/elasticsearch/plugins
mkdir ik
 
#将下载好的zip包放入ik文件夹下，执行解压
unzip elasticsearch-analysis-ik-7.17.6.zip

重启服务
验证

http://192.168.106.130:9200/_cat/plugins

f70ffa187733 analysis-ik 7.17.6

分词器测试

使用默认的分词器

GET _analyze
{
   "text":"我是中国人"
}

{
  "tokens" : [
    {
      "token" : "我",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "<IDEOGRAPHIC>",
      "position" : 0
    },
    {
      "token" : "是",
      "start_offset" : 1,
      "end_offset" : 2,
      "type" : "<IDEOGRAPHIC>",
      "position" : 1
    },
    {
      "token" : "中",
      "start_offset" : 2,
      "end_offset" : 3,
      "type" : "<IDEOGRAPHIC>",
      "position" : 2
    },
    {
      "token" : "国",
      "start_offset" : 3,
      "end_offset" : 4,
      "type" : "<IDEOGRAPHIC>",
      "position" : 3
    },
    {
      "token" : "人",
      "start_offset" : 4,
      "end_offset" : 5,
      "type" : "<IDEOGRAPHIC>",
      "position" : 4
    }
  ]
}

GET _analyze
{
   "analyzer": "ik_smart", 
   "text":"我是中国人"
}

{
  "tokens" : [
    {
      "token" : "我",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "CN_CHAR",
      "position" : 0
    },
    {
      "token" : "是",
      "start_offset" : 1,
      "end_offset" : 2,
      "type" : "CN_CHAR",
      "position" : 1
    },
    {
      "token" : "中国人",
      "start_offset" : 2,
      "end_offset" : 5,
      "type" : "CN_WORD",
      "position" : 2
    }
  ]
}

GET _analyze
{
   "analyzer": "ik_max_word", 
   "text":"我是中国人"
}

{
  "tokens" : [
    {
      "token" : "我",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "CN_CHAR",
      "position" : 0
    },
    {
      "token" : "是",
      "start_offset" : 1,
      "end_offset" : 2,
      "type" : "CN_CHAR",
      "position" : 1
    },
    {
      "token" : "中国人",
      "start_offset" : 2,
      "end_offset" : 5,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "中国",
      "start_offset" : 2,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 3
    },
    {
      "token" : "国人",
      "start_offset" : 3,
      "end_offset" : 5,
      "type" : "CN_WORD",
      "position" : 4
    }
  ]
}

自定义词库

修改/usr/share/elasticsearch/plugins/ik/config中的IKAnalyzer.cfg.xml

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer 扩展配置</comment>
    <!--用户可以在这里配置自己的扩展字典 -->
    <entry key="ext_dict"></entry>
     <!--用户可以在这里配置自己的扩展停止词字典-->
    <entry key="ext_stopwords"></entry>
    <!--用户可以在这里配置远程扩展字典 -->
    <entry key="remote_ext_dict">http://ip/es/fenci.txt</entry> 
    <!--用户可以在这里配置远程扩展停止词字典-->
    <!-- <entry key="remote_ext_stopwords">words_location</entry> -->
</properties>

修改完成后，重启elasticsearch容器。

更新完成后，es只会对于新增的数据用更新分词。历史数据是不会重新分词的。如果想要历史数据重新分词，需要执行：

POST my_index/_update_by_query?conflicts=proceed

标签：count,type,使用,value,doc,ElasticSearch,offset,安装,match
From： https://www.cnblogs.com/l12138h/p/16716351.html

要吃多少根冰棍才能说出如此冰冷刺骨的话语

简介

ES基本概念

ES检索数据为什么这么快

ES基础操作之批量操作——`bulk`

ES进阶检索

通过`uri + 检索参数`检索文档

通过`uri + 请求体`检索文档

查询返回内容

ES特定查询语言DSL

基本语法格式

query/match匹配查询

query/match_phrase[不拆分匹配]

query/multi_math[多字段匹配]

query/bool/must复合查询

query/filter[结果过滤]

query/term

aggs/agg1聚合

aggs/aggName/aggs/aggName子聚合

nested对象聚合

Mapping字段映射

es的不同版本

分词

ik分词器

安装

分词器测试

自定义词库

相关文章

赞助商

阅读排行

ElasticSearch安装使用

要吃多少根冰棍才能说出如此冰冷刺骨的话语

简介

ES基本概念

ES检索数据为什么这么快

ES基础操作之批量操作——bulk

ES进阶检索

通过uri + 检索参数检索文档

通过uri + 请求体检索文档

查询返回内容

ES特定查询语言DSL

基本语法格式

query/match匹配查询

query/match_phrase[不拆分匹配]

query/multi_math[多字段匹配]

query/bool/must复合查询

query/filter[结果过滤]

query/term

aggs/agg1聚合

aggs/aggName/aggs/aggName子聚合

nested对象聚合

Mapping字段映射

es的不同版本

分词

ik分词器

安装

分词器测试

自定义词库

相关文章

赞助商

阅读排行

ES基础操作之批量操作——`bulk`

通过`uri + 检索参数`检索文档

通过`uri + 请求体`检索文档