The stack is Spring 3.1, Hibernate 3.6, Lucene 3.0.3, and IKAnalyzer 3.2.3, with MySQL as the database and DBCP as the connection pool. The main jars are as follows:
Key Spring bean configuration:
<!-- Scheduled task -->
<bean id="bagnetTask" class="org.springframework.scheduling.quartz.MethodInvokingJobDetailFactoryBean">
    <property name="targetObject">
        <ref bean="bagnetJob" />
    </property>
    <property name="targetMethod">
        <value>runJobs</value>
    </property>
    <!-- keep the job from running while the previous one hasn't finished yet -->
    <property name="concurrent" value="false" />
</bean>

<!-- Trigger for the scheduled task -->
<bean id="jobTrigger" class="org.springframework.scheduling.quartz.CronTriggerBean">
    <property name="jobDetail">
        <ref bean="bagnetTask" />
    </property>
    <!-- every 10 mins in the working day from 9:00 to 19:00 we create/update index -->
    <property name="cronExpression">
        <value>0 0/10 9-19 * * ?</value>
    </property>
</bean>

<!-- Scheduler that fires the trigger -->
<bean autowire="no" class="org.springframework.scheduling.quartz.SchedulerFactoryBean">
    <property name="triggers">
        <list>
            <ref local="jobTrigger" />
        </list>
    </property>
</bean>

<!-- Transaction manager -->
<bean id="transactionManger" class="org.springframework.orm.hibernate3.HibernateTransactionManager">
    <property name="sessionFactory">
        <ref bean="sessionFactory" />
    </property>
</bean>

<!-- Transaction interceptor -->
<bean id="transactionInterceptor" class="org.springframework.transaction.interceptor.TransactionInterceptor">
    <property name="transactionManager">
        <ref bean="transactionManger" />
    </property>
    <!-- Transaction propagation attributes -->
    <property name="transactionAttributes">
        <props>
            <prop key="find*">PROPAGATION_REQUIRED</prop>
            <prop key="delete*">PROPAGATION_REQUIRED</prop>
            <prop key="add*">PROPAGATION_REQUIRED</prop>
            <prop key="update*">PROPAGATION_REQUIRED</prop>
            <prop key="do*">PROPAGATION_REQUIRED</prop>
        </props>
    </property>
</bean>

<!-- Automatic proxying -->
<bean id="autoBeanNameProxyCreator" class="org.springframework.aop.framework.autoproxy.BeanNameAutoProxyCreator">
    <property name="beanNames">
        <list>
            <value>*Service</value>
        </list>
    </property>
    <property name="interceptorNames">
        <list>
            <idref local="transactionInterceptor" />
        </list>
    </property>
    <!-- This setting is required, otherwise the proxy type conversion fails.
         It makes CGLIB generate the proxies. -->
    <property name="proxyTargetClass" value="true" />
</bean>
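The Quartz cron expression above has six fields; the breakdown below is my annotation, not part of the original configuration:

```text
0     0/10   9-19   *            *      ?
sec   min    hour   day-of-month month  day-of-week
```

That is: fire at second 0 of every 10th minute during hours 9 through 19, every day of every month; the `?` means "no specific value" for day-of-week, which Quartz requires when day-of-month is already `*`.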
Lucene utility class:
package com.dx.bags.util;

import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.wltea.analyzer.lucene.IKAnalyzer;

public class LuceneUtil {

    private static LuceneUtil instance;

    private Analyzer analyzer;
    private Directory picDirectory;
    private Directory topDirectory;

    private LuceneUtil() {
        analyzer = new IKAnalyzer();
        try {
            picDirectory = FSDirectory.open(new File(DXConstants.PIC_INDEX_DIR));
            topDirectory = FSDirectory.open(new File(DXConstants.TOP_INDEX_DIR));
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static LuceneUtil getInstance() {
        if (null == instance)
            instance = new LuceneUtil();
        return instance;
    }

    public Analyzer getAnalyzer() {
        return analyzer;
    }

    public void setAnalyzer(Analyzer analyzer) {
        this.analyzer = analyzer;
    }

    public Directory getPicDirectory() {
        return picDirectory;
    }

    public void setPicDirectory(Directory picDirectory) {
        this.picDirectory = picDirectory;
    }

    public Directory getTopDirectory() {
        return topDirectory;
    }

    public void setTopDirectory(Directory topDirectory) {
        this.topDirectory = topDirectory;
    }
}
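For completeness, reading the index back goes through the same singleton. Below is a minimal search sketch against the Lucene 3.x API; the class name, the choice of `title` as the default field, and the result handling are illustrative assumptions, not code from the original project, and it needs the same Lucene 3.x jars on the classpath:

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.util.Version;

public class TopSearchExample {

    public static void search(String keyword) throws Exception {
        // Open a read-only searcher over the same directory the writer uses.
        IndexSearcher searcher = new IndexSearcher(
                LuceneUtil.getInstance().getTopDirectory(), true);
        // Parse the keyword with the same IKAnalyzer used at index time,
        // so query terms are segmented the same way as the indexed text.
        QueryParser parser = new QueryParser(Version.LUCENE_30, "title",
                LuceneUtil.getInstance().getAnalyzer());
        Query query = parser.parse(keyword);
        TopDocs hits = searcher.search(query, 10);
        for (ScoreDoc sd : hits.scoreDocs) {
            Document doc = searcher.doc(sd.doc);
            System.out.println(doc.get("id") + " " + doc.get("title"));
        }
        searcher.close();
    }
}
```

Using the same analyzer for indexing and querying is what makes the Chinese word segmentation line up on both sides.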
Method that builds the index:
public static void createTopIndex(List<SearchTop> list, boolean isFirstTime) {
    RAMDirectory ramDirectory = new RAMDirectory();
    try {
        IndexWriter indexwriter = new IndexWriter(ramDirectory,
                LuceneUtil.getInstance().getAnalyzer(), true,
                IndexWriter.MaxFieldLength.LIMITED);
        if (list != null && list.size() > 0) {
            for (int i = 0; i < list.size(); i++) {
                Document doc = new Document();
                SearchTop searchTop = list.get(i);

                Field field = new Field("id", String.valueOf(searchTop.getId()),
                        Field.Store.YES, Field.Index.NO);
                doc.add(field);

                String title = DXUtil.characterUtil(searchTop.getTitle());
                if (title != null && !"".equals(title)) {
                    field = new Field("title", title,
                            Field.Store.YES, Field.Index.ANALYZED);
                    doc.add(field);
                }

                String description = searchTop.getDescription();
                if (description != null && !"".equals(description)) {
                    field = new Field("description", description,
                            Field.Store.NO, Field.Index.ANALYZED);
                    doc.add(field);
                }

                field = new Field("cid", String.valueOf(searchTop.getCid()),
                        Field.Store.YES, Field.Index.NOT_ANALYZED);
                doc.add(field);

                field = new Field("addtime", String.valueOf(searchTop.getCreateDate()),
                        Field.Store.YES, Field.Index.NOT_ANALYZED);
                doc.add(field);

                String content = DXUtil.characterUtil(searchTop.getContents());
                if (content != null && !"".equals(content)) {
                    field = new Field("contents", content,
                            Field.Store.NO, Field.Index.ANALYZED);
                    doc.add(field);
                }

                field = new Field("sex", String.valueOf(searchTop.getSex()),
                        Field.Store.YES, Field.Index.NOT_ANALYZED);
                doc.add(field);

                field = new Field("tid", String.valueOf(searchTop.getTid()),
                        Field.Store.YES, Field.Index.NOT_ANALYZED);
                doc.add(field);

                field = new Field("topicid", String.valueOf(searchTop.getTopicId()),
                        Field.Store.YES, Field.Index.ANALYZED);
                doc.add(field);

                field = new Field("piccount", String.valueOf(searchTop.getPiccount()),
                        Field.Store.YES, Field.Index.NOT_ANALYZED);
                doc.add(field);

                String city = DXUtil.characterUtil(searchTop.getCityName());
                if (city != null && !"".equals(city)) {
                    field = new Field("cityname", city,
                            Field.Store.YES, Field.Index.ANALYZED);
                    doc.add(field);
                }

                field = new Field("purview", dealPurview(searchTop.getSex(), searchTop.getCid()),
                        Field.Store.YES, Field.Index.NOT_ANALYZED);
                doc.add(field);

                String seasonname = DXUtil.characterUtil(searchTop.getSeasonName());
                if (seasonname != null && !"".equals(seasonname)) {
                    field = new Field("seasonname", seasonname,
                            Field.Store.YES, Field.Index.ANALYZED);
                    doc.add(field);
                }

                String recordurl = dealTopUrl(searchTop.getSex(), searchTop.getCid(), searchTop.getTid());
                if (recordurl != null && !"".equals(recordurl)) {
                    field = new Field("recordurl", recordurl,
                            Field.Store.YES, Field.Index.NOT_ANALYZED);
                    doc.add(field);
                }

                indexwriter.addDocument(doc);
            }
        }
        indexwriter.optimize();
        indexwriter.close();

        IndexWriter writer = new IndexWriter(
                LuceneUtil.getInstance().getTopDirectory(),
                LuceneUtil.getInstance().getAnalyzer(), isFirstTime,
                IndexWriter.MaxFieldLength.LIMITED);
        writer.addIndexesNoOptimize(new Directory[] { ramDirectory });
        writer.close();
    } catch (CorruptIndexException e) {
        throw new RuntimeException(e);
    } catch (IOException ex) {
        throw new RuntimeException(ex);
    }
}
Looping over incremental data to build the index:
private void createTopIndex() {
    long count = searchTopService.findTotalRecordNum();
    boolean flag = DXUtil.hasNotFile(new File(DXConstants.TOP_INDEX_DIR));
    List<SearchTop> tops = new ArrayList<SearchTop>();
    int max = DXConstants.PAGE_SIZE;
    // Resume from the offset recorded in the checkpoint file.
    int first = Integer.parseInt(DXUtil.getRecordNumFromFile(DXConstants.TOP_TXT));
    while (first < count) {
        tops = searchTopService.findTopToIndex(first, max);
        first = first + max;
        IndexCreator.createTopIndex(tops, flag);
        if (flag)
            flag = false;
        // Persist the progress after each batch so indexing can resume here after a restart.
        DXUtil.writeContent(String.valueOf(first), DXConstants.TOP_TXT);
    }
    if (first >= count)
        DXUtil.writeContent(String.valueOf(count), DXConstants.TOP_TXT);
    optimiseTopicIndex(flag);
}

// Optimize the index
private void optimiseTopicIndex(boolean flag) {
    IndexWriter writer = null;
    try {
        writer = new IndexWriter(
                LuceneUtil.getInstance().getTopDirectory(),
                LuceneUtil.getInstance().getAnalyzer(), flag,
                IndexWriter.MaxFieldLength.LIMITED);
        writer.optimize();
        writer.close();
    } catch (Exception e) {
        // Swallowing the exception hides index problems; it should at least be logged.
    }
}
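The `DXUtil` checkpoint helpers (`getRecordNumFromFile`, `writeContent`) are not shown in the post. Below is a self-contained sketch of what reading and writing such an offset file might look like; the class name `OffsetCheckpoint` and its exact behavior (missing file means "start from 0") are my assumptions, not the original implementation:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class OffsetCheckpoint {

    // Read the last indexed offset; treat a missing or empty file as "start from 0".
    public static int readOffset(Path file) throws IOException {
        if (!Files.exists(file)) {
            return 0;
        }
        String text = new String(Files.readAllBytes(file), "UTF-8").trim();
        return text.isEmpty() ? 0 : Integer.parseInt(text);
    }

    // Overwrite the checkpoint after each batch so a crash resumes
    // from the last completed batch boundary instead of from scratch.
    public static void writeOffset(Path file, int offset) throws IOException {
        Files.write(file, String.valueOf(offset).getBytes("UTF-8"));
    }

    public static void main(String[] args) throws IOException {
        Path ckpt = Files.createTempFile("top", ".txt");
        writeOffset(ckpt, 500);
        System.out.println(readOffset(ckpt)); // prints 500
        Files.deleteIfExists(ckpt);
    }
}
```

Note the trade-off this scheme shares with the loop above: the checkpoint is written after `createTopIndex` returns, so a crash mid-batch re-indexes at most one batch of `PAGE_SIZE` records.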
That is the main code, recorded here for future reference.