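Cite this article: HUANG Ming-xuan, ZHU Li-na. Cross language query post-translation expansion based on the SRCSAC evaluation framework mining[J]. Control and Decision, 2020, 35(11): 2787-2796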
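Cross language query post-translation expansion based on the SRCSAC evaluation framework mining
HUANG Ming-xuan1, ZHU Li-na2
(1. Guangxi Key Laboratory of Cross-border E-commerce Intelligent Information Processing, Guangxi University of Finance and Economics, Nanning 530003; 2. School of Information and Statistics, Guangxi University of Finance and Economics, Nanning 530003)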
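Abstract:
This paper proposes a weighted association rule mining algorithm for query expansion based on the SRCSAC (support-relevancy-chi-square analysis-confidence) evaluation framework, presents models for cross language query post-translation expansion together with a new method for computing expansion term weights, and proposes a cross language query post-translation expansion algorithm based on SRCSAC framework mining. The algorithm mines effective frequent itemsets using the support-relevancy framework and a new pruning strategy, extracts weighted association rules from those itemsets using the chi-square analysis-confidence framework, and obtains high-quality expansion terms from the rules according to the expansion models, thereby carrying out cross language post-translation expansion. Experimental results show that the proposed algorithm effectively curbs query topic drift and term mismatch. Compared with the baseline retrieval, the minimum average MAP increases of antecedent expansion, consequent expansion and hybrid expansion are 86.85%, 86.04% and 86.00%, respectively. Compared with the contrast methods, the minimum average MAP increases for long-query retrieval reach 12.23%, 9.06% and 12.6%, respectively, all higher than those for short-query retrieval. Compared with the consequent expansion algorithm, antecedent expansion and hybrid expansion raise MAP by up to 5.5%. Confidence helps improve the retrieval performance of the antecedent and hybrid expansion algorithms, relevancy benefits the retrieval performance of the consequent expansion algorithm, and support and relevancy are more effective for short-query retrieval with the consequent expansion algorithm.
Keywords:  information retrieval  query expansion  cross language information retrieval  natural language processing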
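The gains above are all stated as percentage increases in MAP (mean average precision). As a reminder of what that metric measures, here is a minimal sketch using the standard textbook definition with binary relevance. The function names and toy inputs are illustrative assumptions, and the abstract does not specify how the per-run figures are aggregated into a "minimum average increase", so that step is not reproduced here.

```python
from typing import Iterable, Sequence, Set, Tuple


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Average precision of one ranked list under binary relevance."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            # Precision at this rank, counted only at relevant positions.
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(runs: Iterable[Tuple[Sequence[str], Set[str]]]) -> float:
    """Mean of the per-query average precision values."""
    runs = list(runs)
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


def relative_increase(new_map: float, base_map: float) -> float:
    """Percentage increase of new_map over base_map."""
    return (new_map - base_map) / base_map * 100.0
```

For example, a system whose MAP rises from 0.20 to 0.30 shows a relative increase of about 50%; the figures in the abstract are percentages of this kind.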
DOI:10.13195/j.kzyjc.2018.1647
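Classification number: TP391
Funding: National Natural Science Foundation of China (61762006, 61562004); Open Project of the Guangxi First-class Discipline (Cultivation) in Applied Economics (2018MA07); Open Project of the Guangxi (ASEAN) Research Center of Finance and Economics (2018DMCJYB08).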
Cross language query post-translation expansion based on the SRCSAC evaluation framework mining
HUANG Ming-xuan1, ZHU Li-na2
(1. Guangxi Key Laboratory of Cross-border E-commerce Intelligent Information Processing, Guangxi University of Finance and Economics, Nanning 530003, China; 2. School of Information and Statistics, Guangxi University of Finance and Economics, Nanning 530003, China)
Abstract:
A weighted association rule mining algorithm for query expansion is proposed, based on the support-relevancy-chi-square analysis-confidence (SRCSAC) evaluation framework. Models of cross language query post-translation expansion (CLQPTE) are presented, together with a new method for computing expansion term weights, and a CLQPTE algorithm based on SRCSAC framework mining is then proposed. The algorithm uses the support-relevancy framework and a pruning method to mine effective frequent itemsets, and extracts weighted association rules from these itemsets using the chi-square analysis-confidence framework. High-quality expansion terms are obtained from the association rules according to the expansion models in order to carry out CLQPTE. The experimental results show that the proposed algorithms effectively restrain query topic drift and term mismatch. Compared with the benchmark retrieval, the minimum average increases (MAIs) in MAP of the proposed antecedent expansion (AE), consequent expansion (CE) and hybrid expansion (HE) of the association rules are 86.85%, 86.04% and 86.00%, respectively. Compared with the contrast methods, the MAP MAIs of long-query retrieval for the proposed AE, CE and HE algorithms reach 12.23%, 9.06% and 12.6%, respectively, all higher than those of short-query retrieval. Compared with the CE algorithm, the maximum MAP increase of AE and HE is up to 5.5%. Confidence helps improve the retrieval performance of the AE and HE algorithms, whereas relevancy is more conducive to improving that of the CE algorithm. Support and relevancy are more effective for short-query retrieval based on the CE algorithm.
Key words:  information retrieval  query expansion  cross language information retrieval  natural language processing
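To make the mining pipeline in the abstract more concrete, the following is a rough sketch of consequent expansion over rules of the form "query term → candidate term". It is not the authors' algorithm: it uses the standard unweighted definitions of support, confidence and the 2x2 chi-square statistic, whereas the paper mines weighted rules whose formulas are not given in the abstract. The relevancy measure and the pruning strategy are omitted for the same reason. All function names, thresholds and the toy documents are hypothetical.

```python
from typing import Dict, List, Set


def support(docs: List[Set[str]], items: Set[str]) -> float:
    """Fraction of documents containing every term in items."""
    return sum(1 for d in docs if items <= d) / len(docs)


def confidence(docs: List[Set[str]], antecedent: Set[str], consequent: Set[str]) -> float:
    """Standard (unweighted) confidence of the rule antecedent -> consequent."""
    base = support(docs, antecedent)
    return support(docs, antecedent | consequent) / base if base else 0.0


def chi_square(docs: List[Set[str]], x: str, y: str) -> float:
    """Chi-square statistic of the 2x2 co-occurrence table of terms x and y."""
    n = len(docs)
    a = sum(1 for d in docs if x in d and y in d)          # both terms
    b = sum(1 for d in docs if x in d and y not in d)      # x only
    c = sum(1 for d in docs if x not in d and y in d)      # y only
    neither = n - a - b - c
    denom = (a + b) * (c + neither) * (a + c) * (b + neither)
    return n * (a * neither - b * c) ** 2 / denom if denom else 0.0


def consequent_expansion(docs: List[Set[str]], query_terms: Set[str],
                         min_sup: float, min_chi2: float, min_conf: float) -> Dict[str, float]:
    """Keep candidate terms t for which the rule q -> t passes all thresholds.

    The returned score is the best rule confidence per term; this scoring is an
    assumption and does not reproduce the paper's expansion-term weighting.
    """
    candidates = set().union(*docs) - query_terms
    scores: Dict[str, float] = {}
    for q in query_terms:
        for t in candidates:
            if support(docs, {q, t}) < min_sup:
                continue
            if chi_square(docs, q, t) < min_chi2:
                continue
            conf = confidence(docs, {q}, {t})
            if conf >= min_conf:
                scores[t] = max(scores.get(t, 0.0), conf)
    return scores


# Hypothetical toy collection: each document is the set of its index terms.
TOY_DOCS = [
    {"query", "expansion", "retrieval"},
    {"query", "expansion"},
    {"query", "expansion", "rule"},
    {"retrieval", "rule"},
    {"rule", "mining"},
    {"mining", "retrieval"},
]
```

With `consequent_expansion(TOY_DOCS, {"query"}, min_sup=0.3, min_chi2=3.84, min_conf=0.6)`, only "expansion" survives the three filters. The value 3.84 is the usual 5% critical value for one degree of freedom and is chosen here purely for illustration. Antecedent expansion would instead take terms from the left-hand side of rules whose right-hand side is a query term, and hybrid expansion would combine both sets.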
