ES tokenization and search problem; could an expert take a look?

ES mapping fields

"title": {
  "type": "text",
  "analyzer": "ik_max_word",
  "copy_to": [
    "fullText"
  ]
},
"titleZh": {
  "type": "text",
  "analyzer": "ik_max_word",
  "copy_to": [
    "fullText"
  ]
},
"content": {
  "type": "text",
  "analyzer": "ik_max_word",
  "copy_to": [
    "fullText"
  ],
  "term_vector": "with_positions_offsets"
},
"contentZh": {
  "type": "text",
  "analyzer": "ik_max_word",
  "copy_to": [
    "fullText"
  ],
  "term_vector": "with_positions_offsets"
}

Search query

{
    "bool" : {
      "should" : [
        {
          "match" : {
            "title" : {
              "query" : "8500万",
              "operator" : "OR",
              "analyzer" : "ik_smart",
              "prefix_length" : 0,
              "max_expansions" : 5,
              "minimum_should_match" : "1",
              "fuzzy_transpositions" : false,
              "lenient" : false,
              "zero_terms_query" : "NONE",
              "auto_generate_synonyms_phrase_query" : false,
              "boost" : 1.0
            }
          }
        },
        {
          "match_phrase" : {
            "title" : {
              "query" : "8500万",
              "analyzer" : "ik_smart",
              "slop" : 5,
              "zero_terms_query" : "NONE",
              "boost" : 1.0
            }
          }
        },
        {
          "match" : {
            "titleZh" : {
              "query" : "8500万",
              "operator" : "OR",
              "analyzer" : "ik_smart",
              "prefix_length" : 0,
              "max_expansions" : 5,
              "minimum_should_match" : "1",
              "fuzzy_transpositions" : false,
              "lenient" : false,
              "zero_terms_query" : "NONE",
              "auto_generate_synonyms_phrase_query" : false,
              "boost" : 1.0
            }
          }
        },
        {
          "match_phrase" : {
            "titleZh" : {
              "query" : "8500万",
              "analyzer" : "ik_smart",
              "slop" : 5,
              "zero_terms_query" : "NONE",
              "boost" : 1.0
            }
          }
        },
        {
          "match" : {
            "content" : {
              "query" : "8500万",
              "operator" : "OR",
              "analyzer" : "ik_smart",
              "prefix_length" : 0,
              "max_expansions" : 5,
              "minimum_should_match" : "1",
              "fuzzy_transpositions" : false,
              "lenient" : false,
              "zero_terms_query" : "NONE",
              "auto_generate_synonyms_phrase_query" : false,
              "boost" : 1.0
            }
          }
        },
        {
          "match_phrase" : {
            "content" : {
              "query" : "8500万",
              "analyzer" : "ik_smart",
              "slop" : 5,
              "zero_terms_query" : "NONE",
              "boost" : 1.0
            }
          }
        },
        {
          "match" : {
            "contentZh" : {
              "query" : "8500万",
              "operator" : "OR",
              "analyzer" : "ik_smart",
              "prefix_length" : 0,
              "max_expansions" : 5,
              "minimum_should_match" : "1",
              "fuzzy_transpositions" : false,
              "lenient" : false,
              "zero_terms_query" : "NONE",
              "auto_generate_synonyms_phrase_query" : false,
              "boost" : 1.0
            }
          }
        },
        {
          "match_phrase" : {
            "contentZh" : {
              "query" : "8500万",
              "analyzer" : "ik_smart",
              "slop" : 5,
              "zero_terms_query" : "NONE",
              "boost" : 1.0
            }
          }
        }
      ],
      "adjust_pure_negative" : true,
      "boost" : 1.0
    }
  }
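
The query above analyzes "8500万" with ik_smart, while the mapping indexes the fields with ik_max_word, so one way to investigate the missing highlight is to compare what the two analyzers emit for the same text. A minimal sketch of a request body for the `_analyze` API (sent as `GET /_analyze`), assuming the IK plugin is installed on the node that handles the request:

```json
{
  "analyzer": "ik_smart",
  "text": "8500万"
}
```

Sending the same body again with "ik_max_word" and comparing the two token lists shows whether the term produced at query time actually exists among the indexed terms. No output is shown here because it depends on the installed IK version and dictionary.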

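Search results

Questions

  • "8500万" is not highlighted
    • I know this is caused by tokenization, but how can I get everything that was searched for highlighted without changing the tokenization approach?
  • Relevance ranking
    • How can I raise the weights so that results are ordered by how well they match the query?
  • Search optimization
    • This is a separate question. In one scenario I search using the full text of a title, but the search is slow and also matches a lot of unrelated documents based on the query text. How can I improve precision so that a full-text match returns only a handful of documents?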

Bump, bump, bump!

Bumping to help.

Bumping again.

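Question 1: As I understand it, this cannot be done with ES features alone; only text that matches a token produced by the analyzer gets highlighted.
Question 2: When no other sort is specified, the results ES returns already carry a relevance score (`_score`) and are ordered by it. If you want to change how the score is computed, you can use function_score.
Question 3: Honestly, you could just ask GPT about all of these.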
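
For the second point, a function_score query could look roughly like the sketch below. It assumes the goal is to rank documents whose title contains the query as a phrase above the rest; the weight of 5 and the choice of "title" as the boosted field are illustrative assumptions, not values taken from the original post.

```json
{
  "query": {
    "function_score": {
      "query": {
        "match": {
          "content": {
            "query": "8500万",
            "analyzer": "ik_smart"
          }
        }
      },
      "functions": [
        {
          "filter": {
            "match_phrase": {
              "title": {
                "query": "8500万",
                "analyzer": "ik_smart"
              }
            }
          },
          "weight": 5
        }
      ],
      "score_mode": "sum",
      "boost_mode": "multiply"
    }
  }
}
```

With "boost_mode": "multiply", documents that match the filter have their relevance score multiplied by the weight, while documents that match no function keep their original score.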


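Also, I think the query you posted above could be written as a single multi_match instead of a separate match per field, and that would also let you control the relative weights of the fields.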
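
As a rough sketch of that idea, the four match clauses could collapse into something like the following. The `^3` boosts on the two title fields are assumed values for illustration only, and the match_phrase clauses from the original query are not covered here.

```json
{
  "query": {
    "multi_match": {
      "query": "8500万",
      "type": "best_fields",
      "fields": ["title^3", "titleZh^3", "content", "contentZh"],
      "analyzer": "ik_smart"
    }
  }
}
```

A second multi_match with "type": "phrase" and "slop": 5 inside a bool/should would be one way to keep the phrase matching as well.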
Finally, my understanding of and experience with ES are limited, so corrections and discussion are welcome.
