Elasticsearch Term Vectors (Term Frequency Statistics)

mac  2024-03-31

The Term Vectors API returns information and statistics about the terms in the fields of a particular document. The document can either be stored in the index or be provided artificially by the user in the request.

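Purpose: full-text search goes from a term to the documents that contain it, using the inverted index. Term Vectors work in the other direction: for one particular document, they list the terms a field contains, together with statistics about those terms.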
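
Full-text search relies on an inverted index (term → documents that contain it), while a term vector describes a single document (term → how often it occurs in that document). The minimal Java sketch below builds both structures over a hypothetical three-document corpus. The naive lowercase whitespace split is an assumption standing in for a real Elasticsearch analyzer, and the class and method names are illustrative only.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class IndexDirections {
    // Naive analyzer (assumption): lowercase, then split on whitespace
    static String[] analyze(String text) {
        return text.toLowerCase().trim().split("\\s+");
    }

    // Inverted index: term -> sorted IDs of the documents that contain it
    static Map<String, Set<Integer>> invertedIndex(List<String> docs) {
        Map<String, Set<Integer>> index = new TreeMap<>();
        for (int id = 0; id < docs.size(); id++) {
            for (String term : analyze(docs.get(id))) {
                index.computeIfAbsent(term, t -> new TreeSet<>()).add(id);
            }
        }
        return index;
    }

    // Term vector of one document: term -> occurrences within that document
    static Map<String, Integer> termVector(String doc) {
        Map<String, Integer> vector = new TreeMap<>();
        for (String term : analyze(doc)) {
            vector.merge(term, 1, Integer::sum);
        }
        return vector;
    }

    public static void main(String[] args) {
        List<String> docs = List.of(
                "elasticsearch term vectors",
                "term statistics in elasticsearch",
                "java client");
        // term -> documents
        System.out.println(invertedIndex(docs).get("elasticsearch"));      // [0, 1]
        // document -> terms with their frequencies
        System.out.println(termVector("term vectors count each term"));    // {count=1, each=1, term=2, vectors=1}
    }
}
```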

Information recorded in Term Vectors:

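doc_freq: the number of documents in the index that contain the term
ttf: total term frequency, i.e. the number of times the term occurs across all documents
term_freq: the number of times the term occurs in the current document
(doc_freq and ttf are returned only when term statistics are enabled.)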
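
The three statistics can be checked by hand on a small corpus. The Java sketch below computes doc_freq, ttf and term_freq for one term, assuming a hypothetical corpus and a naive lowercase whitespace tokenizer in place of a real analyzer. Real Elasticsearch figures are only retrieved from the shard that holds the requested document and do not account for deleted documents, so they can differ from this toy calculation.

```java
import java.util.List;

class TermStats {
    // Naive analyzer (assumption): lowercase, then split on whitespace
    static String[] analyze(String text) {
        return text.toLowerCase().trim().split("\\s+");
    }

    // term_freq: occurrences of the term in a single document
    static int termFreq(String doc, String term) {
        int count = 0;
        for (String t : analyze(doc)) {
            if (t.equals(term)) count++;
        }
        return count;
    }

    // doc_freq: number of documents containing the term at least once
    static int docFreq(List<String> docs, String term) {
        int count = 0;
        for (String doc : docs) {
            if (termFreq(doc, term) > 0) count++;
        }
        return count;
    }

    // ttf (total term frequency): occurrences of the term summed over all documents
    static long totalTermFreq(List<String> docs, String term) {
        long total = 0;
        for (String doc : docs) {
            total += termFreq(doc, term);
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> docs = List.of(
                "term vectors return term statistics",
                "the term appears once here",
                "no match in this one");
        String term = "term";
        System.out.println("doc_freq  = " + docFreq(docs, term));         // 2
        System.out.println("ttf       = " + totalTermFreq(docs, term));   // 3
        System.out.println("term_freq = " + termFreq(docs.get(0), term)); // 2 (document 0)
    }
}
```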

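tokens: for each occurrence of the term, position (its position in the token stream), start_offset (the character offset where it starts) and end_offset (the character offset where it ends)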
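
The Java sketch below mimics the tokens array for a simple case. It assumes a hypothetical whitespace-only tokenizer (real analyzers usually also lowercase, strip punctuation, and so on) and records each token's 0-based position together with its character offsets, where end_offset points one past the token's last character, the same convention Elasticsearch uses.

```java
import java.util.ArrayList;
import java.util.List;

class TokenOffsets {
    // One token as it would appear in a term vector's "tokens" array
    static final class Token {
        final String term;
        final int position;
        final int startOffset;
        final int endOffset;

        Token(String term, int position, int startOffset, int endOffset) {
            this.term = term;
            this.position = position;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }

        @Override
        public String toString() {
            return term + " position=" + position
                    + " start_offset=" + startOffset + " end_offset=" + endOffset;
        }
    }

    // Whitespace tokenizer (assumption): 0-based positions, exclusive end offsets
    static List<Token> tokenize(String text) {
        List<Token> tokens = new ArrayList<>();
        int position = 0;
        int i = 0;
        while (i < text.length()) {
            // Skip whitespace between tokens
            while (i < text.length() && Character.isWhitespace(text.charAt(i))) i++;
            if (i >= text.length()) break;
            int start = i;
            // Consume the token's characters
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) i++;
            tokens.add(new Token(text.substring(start, i), position++, start, i));
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (Token t : tokenize("quick brown  fox")) {
            System.out.println(t);
        }
        // quick position=0 start_offset=0 end_offset=5
        // brown position=1 start_offset=6 end_offset=11
        // fox position=2 start_offset=13 end_offset=16
    }
}
```

The double space before "fox" shifts its offsets (13 to 16) but not its position (2): positions count tokens, while offsets count characters in the original text.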

Java (Elasticsearch 7 High Level REST Client):

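```java
import java.io.IOException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.client.core.TermVectorsRequest;
import org.elasticsearch.client.core.TermVectorsResponse;
import org.elasticsearch.client.core.TermVectorsResponse.TermVector;
import org.elasticsearch.client.core.TermVectorsResponse.TermVector.FieldStatistics;
import org.elasticsearch.client.core.TermVectorsResponse.TermVector.Term;
import org.elasticsearch.client.core.TermVectorsResponse.TermVector.Token;
import org.junit.jupiter.api.Test; // JUnit 5 assumed; use org.junit.Test with JUnit 4

// Assumed to live in a test class that holds a configured client, e.g.
// @Autowired private RestHighLevelClient client;

@Test
public void testTermVectors() throws IOException {
    // Term vectors for the "keyword" field of document "15" in "article_index"
    TermVectorsRequest request = new TermVectorsRequest("article_index", "15");
    request.setFields("keyword");
    request.setFieldStatistics(true); // doc_count, sum_doc_freq, sum_ttf of the field
    request.setTermStatistics(true);  // doc_freq and ttf of each term
    request.setPositions(true);
    request.setOffsets(true);
    request.setPayloads(false);

    // Filter: return at most 3 terms (ranked by tf-idf) within these bounds
    Map<String, Integer> filterSettings = new HashMap<>();
    filterSettings.put("max_num_terms", 3);
    filterSettings.put("min_term_freq", 1);
    filterSettings.put("max_term_freq", 10);
    filterSettings.put("min_doc_freq", 1);
    filterSettings.put("max_doc_freq", 100);
    filterSettings.put("min_word_length", 1);
    filterSettings.put("max_word_length", 10);
    request.setFilterSettings(filterSettings);

    TermVectorsResponse response = client.termvectors(request, RequestOptions.DEFAULT);

    List<TermVector> termVectorList = response.getTermVectorsList();
    if (termVectorList == null) {
        return; // nothing returned, e.g. the document was not found
    }
    for (TermVector termVector : termVectorList) {
        FieldStatistics fieldStatistics = termVector.getFieldStatistics();
        System.out.println("field=" + termVector.getFieldName()
                + " docCount=" + fieldStatistics.getDocCount()
                + " sumDocFreq=" + fieldStatistics.getSumDocFreq()
                + " sumTtf=" + fieldStatistics.getSumTotalTermFreq());
        for (Term term : termVector.getTerms()) {
            System.out.println("term=" + term.getTerm()
                    + " docFreq=" + term.getDocFreq()
                    + " ttf=" + term.getTotalTermFreq()
                    + " termFreq=" + term.getTermFreq());
            // Print each token's position and offsets field by field
            for (Token token : term.getTokens()) {
                System.out.println("    position=" + token.getPosition()
                        + " start_offset=" + token.getStartOffset()
                        + " end_offset=" + token.getEndOffset());
            }
        }
    }
}
```
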
For more code, see: https://github.com/hsn999/Elasticsearch_7_springboot_demo