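N-gram: Principles, Uses, and Research
Basic principles of N-gram
N-gram is a concept from computational linguistics and probability theory. It refers to a sequence of N items in a given piece of text or speech. An item can be a syllable, a letter, a word, or a base pair. N-grams are usually extracted from a text or a corpus.
When N=1 the sequence is called a unigram, when N=2 a bigram, when N=3 a trigram, and so on.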
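For example, treating "information retrieval" as a piece of text, its 5-gram items are, in order (the space is written as _):
infor, nform, forma, ormat, rmati, matio, ation, tion_, ion_r, on_re, n_ret, _retr, retri, etrie, triev, rieva, ieval
Sometimes, for ease of analysis, spaces are also added in front of the string, which gives 4 more items: ____i, ___in, __inf, _info
For the Chinese sentence "你今天休假了吗", the bigrams are, in order:
你今, 今天, 天休, 休假, 假了, 了吗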
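The item lists above can be reproduced with a few lines of Python. This is a minimal sketch, not a reference implementation: the function name `char_ngrams` and its `pad` and `pad_char` parameters are assumptions made for illustration, and the space in "information retrieval" is written as `_` so that it stays visible.

```python
def char_ngrams(text, n, pad=False, pad_char="_"):
    """Return the character n-grams of text, in order.

    With pad=True, n-1 copies of pad_char are prepended, which yields the
    extra leading items (for example "____i" when n is 5).
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if pad:
        text = pad_char * (n - 1) + text
    return [text[i:i + n] for i in range(len(text) - n + 1)]


# The space between the two words is written as "_" (an illustration choice).
english = char_ngrams("information_retrieval", 5)
padded = char_ngrams("information_retrieval", 5, pad=True)[:4]
chinese = char_ngrams("你今天休假了吗", 2)
```

A string of length L yields L - N + 1 items, so the 21-character English example gives 17 5-grams and the 7-character Chinese sentence gives 6 bigrams.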
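The language model rests on the following idea: in the language as a whole, the probability of a sentence T is built from the probabilities of the N items that make up T, as the formula below shows.
P(T) = P(W1 W2 W3 ... Wn) = P(W1) P(W2|W1) P(W3|W1 W2) ... P(Wn|W1 W2 ... Wn-1)
This formula is hard to apply in practice. This is where the Markov model comes in: it assumes that the occurrence of a word depends only on the few words that precede it, which greatly simplifies the formula. When each word depends only on the one word before it:
P(W1) P(W2|W1) P(W3|W1 W2) ... P(Wn|W1 W2 ... Wn-1) ≈ P(W1) P(W2|W1) P(W3|W2) ... P(Wn|Wn-1)
In practice, bigrams and trigrams are usually used for the computation.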
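The bigram approximation can be sketched in Python as follows. The conditional probabilities are estimated here by relative frequency, count(Wk-1 Wk) / count(Wk-1), which is an assumption of this sketch (the text does not say how they are estimated). The function names and the toy corpus are invented for illustration.

```python
from collections import Counter


def train_bigram(sentences):
    """Count unigrams and bigrams over a list of tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for words in sentences:
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams


def sentence_probability(words, unigrams, bigrams):
    """P(W1) * P(W2|W1) * ... * P(Wn|Wn-1); words must be non-empty."""
    total = sum(unigrams.values())
    prob = unigrams[words[0]] / total
    for prev, cur in zip(words, words[1:]):
        if unigrams[prev] == 0:
            return 0.0  # the history word was never seen
        # Relative-frequency estimate of P(cur | prev).
        prob *= bigrams[(prev, cur)] / unigrams[prev]
    return prob


corpus = [["a", "b", "a", "c"], ["a", "b"]]  # toy data, for illustration only
unigrams, bigrams = train_bigram(corpus)
p_seen = sentence_probability(["a", "b"], unigrams, bigrams)
p_unseen = sentence_probability(["c", "a"], unigrams, bigrams)
```

In the toy corpus "a" makes up 3 of the 6 tokens and is followed by "b" in 2 of its 3 occurrences, so `p_seen` is 1/2 × 2/3 = 1/3. The pair ("c", "a") never occurs, so `p_unseen` is 0.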
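Uses of N-gram
From the 1980s to the early 1990s, n-gram techniques were widely used for text compression, spelling-error checking, faster string search, and identifying the language of a document. In the 1990s the technique found new applications in automated natural language processing, such as automatic classification, automatic indexing, automatic generation of hyperlinks, document retrieval, and segmentation of text in languages without delimiters.
At present, the most useful application of N-gram is the automatic classification of natural-language text. N-gram based methods fall into two groups: classification with human intervention (Classification), and classification without human intervention (Clustering). In the first, people define the categories in advance (for example, the hierarchical categories of Yahoo!), and the computer then uses a specific algorithm to assign each document newly added to the database to one of those categories. The drawback is that people must already have knowledge of the whole document collection. In the second, the computer identifies groups (sets) of documents on its own, and no prior knowledge of the whole collection is required.
In an age of information explosion, resources grow exponentially, and identifying and classifying information by hand has become impractical, so automatic classification of natural-language documents has become a practical need. KDA or SVM can be used to let a machine learn from training data and form a classification function, which is then run on testing data to check its accuracy.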
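The train-then-test workflow for classification with predefined categories can be illustrated with a much simpler stand-in than KDA or SVM: each category gets a character-bigram count profile built from its training texts, and a new text is assigned to the category whose profile is closest by cosine similarity. This is only a sketch under that assumption. The function names and the tiny training set are invented for illustration.

```python
import math
from collections import Counter


def profile(text, n=2):
    """Character n-gram counts of text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(p, q):
    """Cosine similarity between two count profiles."""
    dot = sum(count * q[gram] for gram, count in p.items())
    norm_p = math.sqrt(sum(c * c for c in p.values()))
    norm_q = math.sqrt(sum(c * c for c in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


def classify(text, training):
    """Return the label whose merged training profile is most similar to text."""
    class_profiles = {}
    for label, sample in training:
        class_profiles.setdefault(label, Counter()).update(profile(sample))
    target = profile(text)
    return max(class_profiles, key=lambda label: cosine(target, class_profiles[label]))


# Toy training data, invented for illustration only.
training = [
    ("english", "the quick brown fox jumps over the lazy dog"),
    ("english", "information retrieval and text classification"),
    ("german", "der schnelle braune fuchs springt ueber den faulen hund"),
    ("german", "automatische klassifikation von dokumenten"),
]
label_en = classify("the classification of the text", training)
label_de = classify("der fuchs und der hund", training)
```

A real system would hold out separate testing data and report accuracy on it. The two calls at the end only show the interface.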
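Data smoothing in N-gram
Suppose a text contains 2,000 distinct words. A bigram model then produces a 2000 × 2000 matrix, and a trigram model a 2000 × 2000 × 2000 matrix. Most of the entries are 0, so the matrix is sparse. Data smoothing is then needed so that no P(Wk) is 0.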
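The text does not name a particular smoothing method. As one common illustration, add-one (Laplace) smoothing adds 1 to every bigram count and adds the vocabulary size V to the denominator, so unseen pairs get a small non-zero probability. The sketch below assumes that method. The function name and toy data are invented for illustration.

```python
from collections import Counter


def smoothed_bigram_prob(prev, cur, unigrams, bigrams, vocab_size):
    """Add-one estimate: (count(prev, cur) + 1) / (count(prev) + V)."""
    return (bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size)


words = ["a", "b", "a", "c"]  # toy data, for illustration only
unigrams = Counter(words)
bigrams = Counter(zip(words, words[1:]))
vocab_size = len(unigrams)  # V = 3 distinct words

seen = smoothed_bigram_prob("a", "b", unigrams, bigrams, vocab_size)
unseen = smoothed_bigram_prob("c", "a", unigrams, bigrams, vocab_size)

# Number of cells in the count tables for a 2,000-word vocabulary.
bigram_cells = 2000 ** 2
trigram_cells = 2000 ** 3
```

Here `seen` is (1 + 1) / (2 + 3) = 0.4 and `unseen` is (0 + 1) / (1 + 3) = 0.25 instead of 0. The table sizes come to 4,000,000 cells for bigrams and 8,000,000,000 for trigrams, which is why most cells stay empty for any realistic text.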
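Research on N-gram
There are many introductions to the topic online, and they are quite interesting. The following can be consulted for further study.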
Figure: N-gram
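He Hao, Yang Haitang. A Method for Automatic Classification of Chinese Documents Based on N-Gram Technology. Journal of the China Society for Scientific and Technical Information (情报学报). http://study.hbecrc.org/lcq/xueshuyanjiu/UploadFiles_9984/200704/20070417110725112.pdf
George Doddington. Automatic Evaluation of Machine Translation Quality Using N-gram Co-Occurrence Statistics. http://dl.acm.org/citation.cfm?id=1289189.1289273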
http://blog.sciencenet.cn/blog-713101-797384.html