1. Download Solr 5.3.1
http://mirror.bit.edu.cn/apache/lucene/solr/5.3.1/
wget http://mirror.bit.edu.cn/apache/lucene/solr/5.3.1/solr-5.3.1.tgz
2. Extract the archive
tar zxf solr-5.3.1.tgz
or
unzip solr-5.3.1.zip
3. Configure Solr
1. Copy the Solr web application files
mkdir -p /data/web/solr/solr_app/
cp -r /data/solr-5.3.1/server/solr-webapp/webapp/* /data/web/solr/solr_app/
2. Copy the library files (JARs)
cp /data/solr-5.3.1/server/lib/ext/* /data/web/solr/solr_app/WEB-INF/lib/
3. Copy the logging configuration file
mkdir /data/web/solr/solr_app/WEB-INF/classes
cp /data/solr-5.3.1/server/resources/log4j.properties /data/web/solr/solr_app/WEB-INF/classes/
4. Change where solr.log is stored (the default is /root/logs/solr.log)
vim /data/web/solr/solr_app/WEB-INF/classes/log4j.properties
Change it to your own log path.
5. Copy solr.xml to the path given by <env-entry-value> in web.xml
mkdir -p /data/web/solr/solr_app/WEB-INF/solr_home
cp /data/solr-5.3.1/example/example-DIH/solr/solr.xml /data/web/solr/solr_app/WEB-INF/solr_home/
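6. Configure solr_home
vim /data/web/solr/solr_app/WEB-INF/web.xml
-- Set env-entry-value to: /data/web/solr/solr_app/WEB-INF/solr_home
Tomcat configuration: in server.xml, the Connector uses connectionTimeout="20000". For reasons I have not worked out, with a larger value the Solr page failed to load after Tomcat started.
Start Tomcat. At this point no cores exist yet.
[Screenshot: Solr admin page with no cores]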
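The copy steps above (1, 2, 3 and 5) can be collected into one small shell function. This is only a sketch: deploy_solr_webapp is a made-up helper name, and it assumes the extracted Solr directory has the same layout as the paths used in this article. It does not edit log4j.properties or web.xml; those two edits stay manual.

```shell
#!/bin/sh
# Sketch only: deploy_solr_webapp is a hypothetical helper, not part of Solr.
# $1 = extracted Solr directory, e.g. /data/solr-5.3.1
# $2 = target webapp directory,  e.g. /data/web/solr/solr_app
deploy_solr_webapp() {
    solr_dist=$1
    app_dir=$2

    # Step 1: copy the web application itself.
    mkdir -p "$app_dir" || return 1
    cp -r "$solr_dist"/server/solr-webapp/webapp/* "$app_dir"/ || return 1

    # Step 2: copy the JARs from server/lib/ext.
    mkdir -p "$app_dir/WEB-INF/lib" || return 1
    cp "$solr_dist"/server/lib/ext/* "$app_dir/WEB-INF/lib/" || return 1

    # Step 3: copy the log4j configuration.
    mkdir -p "$app_dir/WEB-INF/classes" || return 1
    cp "$solr_dist/server/resources/log4j.properties" "$app_dir/WEB-INF/classes/" || return 1

    # Step 5: create solr_home and copy solr.xml into it.
    mkdir -p "$app_dir/WEB-INF/solr_home" || return 1
    cp "$solr_dist/example/example-DIH/solr/solr.xml" "$app_dir/WEB-INF/solr_home/" || return 1
}

# Usage with the paths from this article:
# deploy_solr_webapp /data/solr-5.3.1 /data/web/solr/solr_app
```

Each command returns early on failure, so a missing source directory is reported by cp instead of leaving a half-copied webapp unnoticed.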
4. Configure the Solr collection (core)
1. Go into solr_home and start configuring Solr's index, tokenizer, data source and scheduled task:
cd /data/web/solr/solr_app/WEB-INF/solr_home/
2. Create a Solr configuration for one language. This first needs a directory for that language, for example English:
mkdir pc_EN
cd pc_EN
touch core.properties
mkdir conf
mkdir data
3. Edit core.properties to set the index name and where the index is stored:
vim core.properties
-- Specify where the index files are stored (the solr_index directory can be created now: mkdir -p /data/web/solr/solr_app/WEB-INF/solr_index)
-- File contents:
name=pc_EN
dataDir=/data/web/solr/solr_app/WEB-INF/solr_index/master/pc_EN/data
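Since the same layout is needed for every language, steps 2 and 3 can be sketched as a reusable function. create_core is a hypothetical helper name. Creating the dataDir in advance is my own addition and may not be required.

```shell
#!/bin/sh
# Sketch only: create_core is a hypothetical helper, not part of Solr.
# $1 = solr_home, e.g. /data/web/solr/solr_app/WEB-INF/solr_home
# $2 = core name, e.g. pc_EN
# $3 = index root, e.g. /data/web/solr/solr_app/WEB-INF/solr_index
create_core() {
    solr_home=$1
    core=$2
    index_root=$3

    # Core directory with its conf and data subdirectories.
    mkdir -p "$solr_home/$core/conf" "$solr_home/$core/data" || return 1

    # Assumption: create the index directory up front.
    mkdir -p "$index_root/master/$core/data" || return 1

    # core.properties: the core name and the directory that holds the index.
    printf 'name=%s\ndataDir=%s\n' "$core" "$index_root/master/$core/data" \
        > "$solr_home/$core/core.properties"
}

# Usage with the paths from this article:
# create_core /data/web/solr/solr_app/WEB-INF/solr_home pc_EN /data/web/solr/solr_app/WEB-INF/solr_index
```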
4. Go into the conf directory and set the index data format and the data source
cd conf
find /data -name solrconfig.xml
Copy the solrconfig.xml from the rss folder into the pc_EN/conf directory:
cp /data/solr-5.3.1/example/example-DIH/solr/rss/conf/solrconfig.xml solrconfig.xml
Point solrconfig.xml at the website-data-config.xml file:
vim solrconfig.xml
-- Search for name="/dataimport"
In solrconfig.xml, set the format of Solr search results to xml.
Link solrconfig.xml to the schema.xml file by adding the following:
<requestHandler name="/replication" class="solr.ReplicationHandler" >
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="replicateAfter">startup</str>
    <str name="confFiles">schema.xml</str>
  </lst>
</requestHandler>
The complete solrconfig.xml file:
<?xml version="1.0" encoding="UTF-8" ?>
<!--
 Licensed to the Apache Software Foundation (ASF) under one or more
 contributor license agreements.  See the NOTICE file distributed with
 this work for additional information regarding copyright ownership.
 The ASF licenses this file to You under the Apache License, Version 2.0
 (the "License"); you may not use this file except in compliance with
 the License.  You may obtain a copy of the License at

     http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
-->

<!--
 This is a stripped down config file used for a simple example...
 It is *not* a good example to work from.
-->
<config>
  <luceneMatchVersion>5.3.1</luceneMatchVersion>
  <!-- The DirectoryFactory to use for indexes.
       solr.StandardDirectoryFactory, the default, is filesystem based.
       solr.RAMDirectoryFactory is memory based, not persistent, and doesn't work with replication. -->
  <directoryFactory name="DirectoryFactory" class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>

  <dataDir>${solr.data.dir:}</dataDir>

  <!-- To enable dynamic schema REST APIs, use the following for <schemaFactory>:

       <schemaFactory class="ManagedIndexSchemaFactory">
         <bool name="mutable">true</bool>
         <str name="managedSchemaResourceName">managed-schema</str>
       </schemaFactory>

       When ManagedIndexSchemaFactory is specified, Solr will load the schema from
       the resource named in 'managedSchemaResourceName', rather than from schema.xml.
       Note that the managed schema resource CANNOT be named schema.xml.  If the managed
       schema does not exist, Solr will create it after reading schema.xml, then rename
       'schema.xml' to 'schema.xml.bak'.

       Do NOT hand edit the managed schema - external modifications will be ignored and
       overwritten as a result of schema modification REST API calls.

       When ManagedIndexSchemaFactory is specified with mutable = true, schema
       modification REST API calls will be allowed; otherwise, error responses will be
       sent back for these requests.
  -->
  <codecFactory class="solr.SchemaCodecFactory"/>
  <schemaFactory class="ClassicIndexSchemaFactory"/>

  <updateHandler class="solr.DirectUpdateHandler2">
    <updateLog>
      <str name="dir">${solr.data.dir:}</str>
      <int name="numVersionBuckets">${solr.ulog.numVersionBuckets:65536}</int>
    </updateLog>
  </updateHandler>

  <query>
    <!-- Max Boolean Clauses

         Maximum number of clauses in each BooleanQuery, an exception
         is thrown if exceeded.

         ** WARNING **

         This option actually modifies a global Lucene property that
         will affect all SolrCores.  If multiple solrconfig.xml files
         disagree on this property, the value at any given moment will
         be based on the last SolrCore to be initialized.

      -->
    <maxBooleanClauses>1024</maxBooleanClauses>


    <!-- Solr Internal Query Caches

         There are two implementations of cache available for Solr,
         LRUCache, based on a synchronized LinkedHashMap, and
         FastLRUCache, based on a ConcurrentHashMap.

         FastLRUCache has faster gets and slower puts in single
         threaded operation and thus is generally faster than LRUCache
         when the hit ratio of the cache is high (> 75%), and may be
         faster under other scenarios on multi-cpu systems.
    -->

    <!-- Filter Cache

         Cache used by SolrIndexSearcher for filters (DocSets),
         unordered sets of *all* documents that match a query.  When a
         new searcher is opened, its caches may be prepopulated or
         "autowarmed" using data from caches in the old searcher.
         autowarmCount is the number of items to prepopulate.  For
         LRUCache, the autowarmed items will be the most recently
         accessed items.

         Parameters:
           class - the SolrCache implementation LRUCache or
               (LRUCache or FastLRUCache)
           size - the maximum number of entries in the cache
           initialSize - the initial capacity (number of entries) of
               the cache.  (see java.util.HashMap)
           autowarmCount - the number of entries to prepopulate from
               an old cache.
      -->
    <filterCache class="solr.FastLRUCache"
                 size="512"
                 initialSize="512"
                 autowarmCount="0"/>

    <!-- Query Result Cache

         Caches results of searches - ordered lists of document ids
         (DocList) based on a query, a sort, and the range of documents requested.
         Additional supported parameter by LRUCache:
            maxRamMB - the maximum amount of RAM (in MB) that this cache is allowed
                       to occupy
      -->
    <queryResultCache class="solr.LRUCache"
                      size="512"
                      initialSize="512"
                      autowarmCount="0"/>

    <!-- Document Cache

         Caches Lucene Document objects (the stored fields for each
         document).  Since Lucene internal document ids are transient,
         this cache will not be autowarmed.
      -->
    <documentCache class="solr.LRUCache"
                   size="512"
                   initialSize="512"
                   autowarmCount="0"/>

    <!-- custom cache currently used by block join -->
    <cache name="perSegFilter"
           class="solr.search.LRUCache"
           size="30"
           initialSize="0"
           autowarmCount="30"
           regenerator="solr.NoOpRegenerator" />

    <!-- Lazy Field Loading

         If true, stored fields that are not requested will be loaded
         lazily.  This can result in a significant speed improvement
         if the usual case is to not load all stored fields,
         especially if the skipped fields are large compressed text
         fields.
    -->
    <enableLazyFieldLoading>true</enableLazyFieldLoading>

    <!-- Result Window Size

         An optimization for use with the queryResultCache.  When a search
         is requested, a superset of the requested number of document ids
         are collected.  For example, if a search for a particular query
         requests matching documents 10 through 19, and queryWindowSize is 50,
         then documents 0 through 49 will be collected and cached.  Any further
         requests in that range can be satisfied via the cache.
      -->
    <queryResultWindowSize>20</queryResultWindowSize>

    <!-- Maximum number of documents to cache for any entry in the
         queryResultCache.
      -->
    <queryResultMaxDocsCached>200</queryResultMaxDocsCached>

    <!-- Use Cold Searcher

         If a search request comes in and there is no current
         registered searcher, then immediately register the still
         warming searcher and use it.  If "false" then all requests
         will block until the first searcher is done warming.
      -->
    <useColdSearcher>false</useColdSearcher>

    <!-- Max Warming Searchers

         Maximum number of searchers that may be warming in the
         background concurrently.  An error is returned if this limit
         is exceeded.

         Recommend values of 1-2 for read-only slaves, higher for
         masters w/o cache warming.
      -->
    <maxWarmingSearchers>2</maxWarmingSearchers>

  </query>

  <requestDispatcher handleSelect="true" >
    <requestParsers enableRemoteStreaming="true" multipartUploadLimitInKB="2048" formdataUploadLimitInKB="2048" />
  </requestDispatcher>

  <requestHandler name="/select" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <str name="wt">xml</str>
      <str name="indent">true</str>
      <int name="rows">10</int>
    </lst>
  </requestHandler>

  <requestHandler name="/analysis/field" startup="lazy" class="solr.FieldAnalysisRequestHandler" />

  <requestHandler name="/admin/ping" class="solr.PingRequestHandler">
    <lst name="invariants">
      <str name="q">*:*</str>
    </lst>
    <lst name="defaults">
      <str name="echoParams">all</str>
    </lst>
  </requestHandler>

  <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
      <str name="config">website-data-config.xml</str>
    </lst>
  </requestHandler>

  <requestHandler name="/replication" class="solr.ReplicationHandler" >
    <lst name="master">
      <str name="replicateAfter">commit</str>
      <str name="replicateAfter">startup</str>
      <str name="confFiles">schema.xml</str>
    </lst>
  </requestHandler>

  <!-- config for the admin interface -->
  <admin>
    <defaultQuery>*:*</defaultQuery>
  </admin>

</config>
schema.xml defines the fields that Solr needs to index.
The complete schema.xml:
<?xml version="1.0" ?>

<schema name="website" version="1.5">
  <types>
    <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true" />
    <fieldType name="boolean" class="solr.BoolField" sortMissingLast="true" omitNorms="true" />
    <fieldType name="booleans" class="solr.BoolField" sortMissingLast="true" multiValued="true"/>
    <fieldType name="int" class="solr.TrieIntField" precisionStep="0" omitNorms="true" positionIncrementGap="0" />
    <fieldType name="float" class="solr.TrieFloatField" precisionStep="0" omitNorms="true" positionIncrementGap="0" />
    <fieldType name="long" class="solr.TrieLongField" precisionStep="0" omitNorms="true" positionIncrementGap="0" />
    <fieldType name="double" class="solr.TrieDoubleField" precisionStep="0" omitNorms="true" positionIncrementGap="0" />
    <fieldType name="date" class="solr.TrieDateField" omitNorms="true" precisionStep="0" positionIncrementGap="0" />
    <fieldType name="tint" class="solr.TrieIntField" precisionStep="8" omitNorms="true" positionIncrementGap="0" />
    <fieldType name="sfloat" class="solr.TrieFloatField" precisionStep="8" omitNorms="true" positionIncrementGap="0" />
    <fieldType name="tlong" class="solr.TrieLongField" precisionStep="8" omitNorms="true" positionIncrementGap="0" />
    <fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8" omitNorms="true" positionIncrementGap="0" />
    <fieldType name="tdate" class="solr.TrieDateField" omitNorms="true" precisionStep="6" positionIncrementGap="0" />
    <fieldType name="tdates" class="solr.TrieDateField" precisionStep="6" positionIncrementGap="0" multiValued="true"/>
    <fieldType name="tints" class="solr.TrieIntField" precisionStep="8" positionIncrementGap="0" multiValued="true"/>
    <fieldType name="tfloats" class="solr.TrieFloatField" precisionStep="8" positionIncrementGap="0" multiValued="true"/>
    <fieldType name="tlongs" class="solr.TrieLongField" precisionStep="8" positionIncrementGap="0" multiValued="true"/>
    <fieldType name="tdoubles" class="solr.TrieDoubleField" precisionStep="8" positionIncrementGap="0" multiValued="true"/>
    <fieldType name="text" class="solr.TextField">
      <analyzer type="index" class="org.apache.lucene.analysis.en.EnglishAnalyzer"/>
      <analyzer type="query" class="org.apache.lucene.analysis.en.EnglishAnalyzer"/>
    </fieldType>
  </types>
  <!-- general -->
  <fields>
    <field name="_version_" type="long" indexed="true" stored="true"/>
    <field name="CultureID" type="int" indexed="false" stored="true" />
    <field name="DescriptionFull" type="text" indexed="true" stored="false" />
    <field name="DescriptionShort" type="text" indexed="true" stored="false" />
    <field name="ImageJSON" type="text" indexed="false" stored="true" />
    <field name="IsHot" type="int" indexed="false" stored="true" />
    <field name="IsMutilColor" type="int" indexed="false" stored="true" default="" />
    <field name="LeiMuNameJSON" type="text" indexed="true" stored="true" />
    <field name="PID" type="string" indexed="true" stored="true" />
    <field name="PropertyText" type="text" indexed="true" stored="true" />
    <field name="RequiredText" type="text" indexed="true" stored="true" />
    <field name="SPUID" type="long" indexed="true" stored="true" />
    <field name="Sort" type="int" indexed="true" stored="true" />
    <field name="Status" type="int" indexed="true" stored="true" />
    <field name="Title" type="text" indexed="true" stored="true" />
    <field name="UpTime" type="date" indexed="true" stored="true" />
    <field name="Price" type="double" indexed="true" stored="true" />
    <field name="SaleCount" type="long" indexed="true" stored="true" />
    <field name="CustomerRatingCount" type="long" indexed="false" stored="true" />
    <field name="DisCount" type="double" indexed="true" stored="true" />
    <field name="Basic_search" type="text" indexed="true" stored="false" multiValued="true"/>
  </fields>

  <!-- field to use to determine and enforce document uniqueness. -->
  <uniqueKey>SPUID</uniqueKey>
  <!-- field for the QueryParser to use when an explicit fieldname is absent -->
  <defaultSearchField>Basic_search</defaultSearchField>
  <!-- SolrQueryParser configuration: defaultOperator="AND|OR" -->
  <solrQueryParser defaultOperator="OR"/>
  <copyField source="PID" dest="Basic_search" />
  <copyField source="DescriptionFull" dest="Basic_search" />
  <copyField source="DescriptionShort" dest="Basic_search" />
  <copyField source="LeiMuNameJSON" dest="Basic_search" />
  <copyField source="PropertyText" dest="Basic_search" />
  <copyField source="RequiredText" dest="Basic_search" />
  <copyField source="Title" dest="Basic_search" />
</schema>
website-data-config.xml defines the data source and maps the data source format to the fields in schema.xml.
The complete website-data-config.xml:
<?xml version="1.0" encoding="UTF-8" ?>
<dataConfig>
  <dataSource type="URLDataSource" encoding="UTF-8" />
  <document>
    <entity name="website"
            processor="XPathEntityProcessor"
            forEach="/LuceneSpuXmlModel/LuceneSpuModelList/LuceneSpuModel |/LuceneSpuXmlModel"
            url="http://url/product?cultureId=1&amp;pageSize=100&amp;pageIndex=1&amp;siteId=6&amp;platform=1"
            transformer="RegexTransformer,DateFormatTransformer"
            connectionTimeout="120000"
            readTimeout="300000"
            stream="true">
      <field column="SPUID" xpath="/LuceneSpuXmlModel/LuceneSpuModelList/LuceneSpuModel/SPUID" />
      <field column="PID" xpath="/LuceneSpuXmlModel/LuceneSpuModelList/LuceneSpuModel/PID" />
      <field column="Title" xpath="/LuceneSpuXmlModel/LuceneSpuModelList/LuceneSpuModel/Title" />
      <field column="Status" xpath="/LuceneSpuXmlModel/LuceneSpuModelList/LuceneSpuModel/Status" />
      <field column="CultureID" xpath="/LuceneSpuXmlModel/LuceneSpuModelList/LuceneSpuModel/CultureID" commonField="true" />
      <field column="LeiMuNameJSON" xpath="/LuceneSpuXmlModel/LuceneSpuModelList/LuceneSpuModel/LeiMuNameJSON" />
      <field column="DescriptionShort" xpath="/LuceneSpuXmlModel/LuceneSpuModelList/LuceneSpuModel/DescriptionShort" commonField="true" />
      <field column="DescriptionFull" xpath="/LuceneSpuXmlModel/LuceneSpuModelList/LuceneSpuModel/DescriptionFull" commonField="true" />
      <field column="Sort" xpath="/LuceneSpuXmlModel/LuceneSpuModelList/LuceneSpuModel/Sort" />
      <field column="ImageJSON" xpath="/LuceneSpuXmlModel/LuceneSpuModelList/LuceneSpuModel/ImageJSON" />
      <field column="PropertyText" xpath="/LuceneSpuXmlModel/LuceneSpuModelList/LuceneSpuModel/PropertyText" />
      <field column="RequiredText" xpath="/LuceneSpuXmlModel/LuceneSpuModelList/LuceneSpuModel/RequiredText" />
      <field column="IsHot" xpath="/LuceneSpuXmlModel/LuceneSpuModelList/LuceneSpuModel/IsHot" />
      <field column="IsMutilColor" xpath="/LuceneSpuXmlModel/LuceneSpuModelList/LuceneSpuModel/IsMutilColor" />
      <field column="UpTime" xpath="/LuceneSpuXmlModel/LuceneSpuModelList/LuceneSpuModel/UpTime" dateTimeFormat="yyyy-MM-dd'T'hh:mm:ss"/>
      <field column="Price" xpath="/LuceneSpuXmlModel/LuceneSpuModelList/LuceneSpuModel/Price" />
      <field column="SaleCount" xpath="/LuceneSpuXmlModel/LuceneSpuModelList/LuceneSpuModel/SaleCount" />
      <field column="CustomerRatingCount" xpath="/LuceneSpuXmlModel/LuceneSpuModelList/LuceneSpuModel/CustomerRatingCount" />
      <field column="DisCount" xpath="/LuceneSpuXmlModel/LuceneSpuModelList/LuceneSpuModel/DisCount" />

      <field column="$hasMore" xpath="/LuceneSpuXmlModel/HasMore" />
      <field column="$nextUrl" xpath="/LuceneSpuXmlModel/NextPageUrl" />
    </entity>
  </document>
</dataConfig>
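Starting Tomcat and opening Solr now fails with an error, because the DataImportHandler class referenced in solrconfig.xml is not yet on the webapp's classpath.
Copy the data import JARs: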
cp /data/solr-5.3.1/dist/solr-dataimporthandler-* /data/web/solr/solr_app/WEB-INF/lib/
After that, Tomcat starts Solr successfully.
[Screenshot: Solr admin page after a successful start]
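With Solr running, the import defined in website-data-config.xml can be started by hand through the /dataimport handler registered in solrconfig.xml. The sketch below only builds the request URL. dataimport_url is a hypothetical helper, and the base URL http://localhost:8080/solr is an assumption that depends on the Tomcat port and the context path the webapp is deployed under.

```shell
#!/bin/sh
# Sketch only: dataimport_url is a hypothetical helper.
# $1 = base URL of the Solr webapp (assumed, e.g. http://localhost:8080/solr)
# $2 = core name, e.g. pc_EN
# $3 = DataImportHandler command, e.g. full-import or status
dataimport_url() {
    printf '%s/%s/dataimport?command=%s\n' "$1" "$2" "$3"
}

# Usage (needs curl and a running Tomcat):
# curl "$(dataimport_url http://localhost:8080/solr pc_EN full-import)"
# curl "$(dataimport_url http://localhost:8080/solr pc_EN status)"
```

The status command reports whether an import is still running, which helps when checking that the paged feed was followed through $hasMore and $nextUrl.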
5. Set up the Solr scheduled task
1. Copy the data import JARs (if they have not been copied already)
cp /data/solr-5.3.1/dist/solr-dataimporthandler-* /data/web/solr/solr_app/WEB-INF/lib/
2. One more JAR also needs to be copied into /data/web/solr/solr_app/WEB-INF/lib/:
apache-solr-dataimportscheduler-1.0.jar
3. Edit web.xml and add this configuration node:
<listener>
<listener-class>
org.apache.solr.handler.dataimport.scheduler.ApplicationListener
</listener-class>
</listener>
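4. Go back to the solr_home directory, create a conf directory, and create the dataimport.properties scheduled-task file in it:
5. Edit the dataimport.properties scheduled-task file:
a. Set syncCores, server and port
b. Set the interval and the start time: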
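As a rough illustration of steps 4 and 5, the function below writes a dataimport.properties file. The property names follow the template commonly distributed with apache-solr-dataimportscheduler-1.0.jar, so please check them against the JAR you actually use. write_scheduler_props is a hypothetical helper, and every value (host, port, webapp name, intervals, start time) is an assumption to adapt.

```shell
#!/bin/sh
# Sketch only: write_scheduler_props is a hypothetical helper.
# $1 = solr_home, e.g. /data/web/solr/solr_app/WEB-INF/solr_home
# $2 = comma-separated core names, e.g. pc_EN
write_scheduler_props() {
    mkdir -p "$1/conf" || return 1
    cat > "$1/conf/dataimport.properties" <<EOF
# 1 = scheduling enabled, anything else = disabled
syncEnabled=1
# cores to schedule, comma-separated
syncCores=$2
# Tomcat host and port (assumed values)
server=localhost
port=8080
# webapp context name (assumed)
webapp=solr
# request sent on every scheduled run
params=/dataimport?command=full-import&clean=false&commit=true
# minutes between two runs (assumed value)
interval=30
# minutes between full rebuilds, the rebuild request, and its first start time
reBuildIndexInterval=7200
reBuildIndexParams=/dataimport?command=full-import&clean=true&commit=true
reBuildIndexBeginTime=03:10:00
EOF
}

# Usage with the paths from this article:
# write_scheduler_props /data/web/solr/solr_app/WEB-INF/solr_home pc_EN
```

The regular run uses full-import with clean=false here because the entity in website-data-config.xml reads from a URL feed and defines no delta query. That is a design guess on my part, not something the scheduler requires.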
http://my.oschina.net/lsf930709/blog/620738 (reference article)
Reposted from: https://www.cnblogs.com/qiyebao/p/5432201.html