Wiki-based 通称辞書生成ツール
Wiki-based Alternative name dictionary generation tool


Instruction Manual

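(1) Introduction
The Wiki-based Alternative name dictionary generation tool (hereafter "this tool") uses Wikipedia dump data
to build a dictionary of alternative names for formal names.
This tool extracts alternative name candidates for each formal name, using three types of keywords in the
Wikipedia article data (the third of which is "alternative name") as clues.
It then checks and filters the extracted candidates against Wikipedia redirect information, which yields a more
precise alternative name dictionary.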


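(2) Usage

1. Preparing the dump files
Download the files needed to build the alternative name dictionary from the Japanese Wikipedia dump site
and decompress them.
Check that the decompressed files are named as shown below, and place them all in the same folder.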

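・Wikipedia dump data download site
http://dumps.wikimedia.org/jawiki/
・Required files
jawiki-[xxxxx]-pages-articles.xml    : Wikipedia article data. It is compressed and uploaded in split parts.
jawiki-[xxxxx]-redirect.sql          : Data on Wikipedia redirects.
jawiki-[xxxxx]-page.sql              : Data on the titles and page numbers of all Wikipedia pages.
[xxxxx] is the date timestamp of the dump.
Use the same timestamp for all three files above.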
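
The file check described above can be sketched in Python as follows. This is only an illustration and not part of this tool: the function names expected_dump_files and missing_dump_files are hypothetical, and the sketch assumes the decompressed files carry exactly the names listed above.

```python
from pathlib import Path


def expected_dump_files(timestamp):
    """Return the three file names listed above for one dump timestamp."""
    return [
        f"jawiki-{timestamp}-pages-articles.xml",
        f"jawiki-{timestamp}-redirect.sql",
        f"jawiki-{timestamp}-page.sql",
    ]


def missing_dump_files(folder, timestamp):
    """Return the expected file names that are not present in the folder."""
    base = Path(folder)
    return [
        name
        for name in expected_dump_files(timestamp)
        if not (base / name).is_file()
    ]
```

An empty list from missing_dump_files means all three files with the same timestamp are in place.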



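2. Running the alternative name dictionary tool
Run the alternative name dictionary tool, make_abbreviated_dictionary.pl, as a Perl program.
When running the tool, use the first and second arguments to configure the dictionary to be built,
and the third and fourth arguments to specify the dump data to use.

Example (ActivePerl)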
 perl make_abbreviated_dictionary.pl b111 0 C:\wiki-dumpdata\data 20140503


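・First argument (b111): keyword selection for the alternative name dictionary
Sets the keywords used when extracting alternative names from the Wikipedia dump data.
This tool supports three types of keywords (the third of which is "alternative name"), and each keyword
can be switched on or off individually.
Keyword use is set with a three-digit binary number, as follows. Digits are counted from the right.
First keyword: (on/off  b**1/b**0) Set the 1st digit to 1 to use it, or to 0 not to use it.
Second keyword: (on/off  b*1*/b*0*) Set the 2nd digit to 1 to use it, or to 0 not to use it.
"Alternative name": (on/off  b1**/b0**) Set the 3rd digit to 1 to use it, or to 0 not to use it.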
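
To make the digit positions concrete, here is a small Python sketch that decodes a first-argument value. It is an illustration only and does not reproduce how make_abbreviated_dictionary.pl parses the value. The function name and the dictionary keys (keyword_1, keyword_2, alternative_name) are hypothetical labels for the three keywords, and the digits are counted from the right as in the b**1 / b*1* / b1** patterns above.

```python
def parse_keyword_flags(arg):
    """Decode a value such as 'b111' into one on/off flag per keyword."""
    if len(arg) != 4 or arg[0] != "b" or any(c not in "01" for c in arg[1:]):
        raise ValueError(f"expected 'b' plus three binary digits, got {arg!r}")
    bits = arg[1:]
    return {
        "keyword_1": bits[2] == "1",         # 1st digit (rightmost): b**1
        "keyword_2": bits[1] == "1",         # 2nd digit (middle):    b*1*
        "alternative_name": bits[0] == "1",  # 3rd digit (leftmost):  b1**
    }
```

For example, b101 switches the first and third keywords on and the second keyword off.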



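・Second argument (0)
Sets whether the alternative name candidates extracted with the keywords are checked against redirect information.
Checking the raw candidates against the redirect information makes it possible to extract only those alternative names
that are likely to be in general use.

Set the second argument to 1 to perform the check, or to 0 to skip it.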
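
The idea of checking candidates against redirect information can be sketched in Python as below. This README does not spell out the exact matching rule, so the sketch shows only one plausible reading: a candidate is kept when a redirect with that title points to the page it was extracted for. The function name and both data structures are hypothetical.

```python
def filter_by_redirects(candidates, redirects):
    """Keep only candidates whose redirect target is their own formal name.

    candidates: dict mapping a formal name to a list of candidate names
    redirects:  dict mapping a redirect source title to its target title
    """
    kept = {}
    for formal_name, names in candidates.items():
        confirmed = [n for n in names if redirects.get(n) == formal_name]
        if confirmed:
            # Entries with no confirmed candidate are left out entirely.
            kept[formal_name] = confirmed
    return kept
```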



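・Third argument (C:\wiki-dumpdata\data)
Sets the folder where the prepared Wikipedia dump data is stored.
The tool reads the dump data from this folder and builds the dictionary.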



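・Fourth argument (20140503)
Specifies the timestamp of the prepared Wikipedia dump data.
The tool reads the dump data with this timestamp from the folder given as the third argument and builds the dictionary.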



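3. Generated files
Running this tool creates the alternative name dictionary file together with intermediate files used while building the dictionary.


・Intermediate files
abstracted_page_data.dat       : Data extracted from all article data, containing only the articles designated as redirect targets
redirect_title_list.txt        : Extracted pairs of redirect source and redirect target page titles
redirect_title_list_sorted.txt : Redirect source page titles grouped by redirect target


・Alternative name dictionary file
abbreviate_word_*1-*2.txt : The generated alternative name dictionary file
 *1: The value given as the first argument, converted to decimal
 *2: The value given as the second argument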
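
As a worked example of the naming rule above, the following Python sketch derives the dictionary file name from the first two arguments. The function name is hypothetical and the sketch is not taken from the tool. It assumes the first argument has the form "b" followed by binary digits.

```python
def dictionary_file_name(keyword_arg, redirect_arg):
    """Build abbreviate_word_*1-*2.txt from the first two arguments."""
    decimal_value = int(keyword_arg[1:], 2)  # drop the leading 'b', read as binary
    return f"abbreviate_word_{decimal_value}-{redirect_arg}.txt"


print(dictionary_file_name("b111", "0"))  # abbreviate_word_7-0.txt
```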


The tool is set up for Windows. To use it on Linux, replace every "\\" (two yen marks, that is, two backslashes)
in util.pl with "/".

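The replacement described above can be sketched in Python as a plain text substitution. This is an illustration only: the function name is hypothetical, the sample Perl line in the usage note is made up, and the sketch works on a string so that the character encoding of util.pl is left to the reader.

```python
def use_forward_slashes(perl_source):
    """Replace every pair of backslashes in the given source text with '/'."""
    # In Python source, "\\\\" denotes exactly two backslash characters.
    return perl_source.replace("\\\\", "/")
```

For instance, a made-up line such as $path = $dir . "\\" . $name; becomes $path = $dir . "/" . $name; after the substitution.
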
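(3) Alternative name dictionary file
The alternative name dictionary created by running this tool has the following format.
 formal name -> alternative name 1,,alternative name 2,,alternative name 3...
When more than one alternative name is extracted, they are separated by ",,".
Only words for which at least one alternative name was extracted are listed.

Example 1: one alternative name extracted
 formal name -> alternative name 1

Example 2: three alternative names extracted
 formal name -> alternative name 1,,alternative name 2,,alternative name 3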

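A line in this format can be read back with a few lines of Python. The sketch below is an illustration, not part of this tool: the function name is hypothetical, the sample text in the usage line is made up, and whitespace around "->" is simply stripped because the exact spacing in the generated file is an assumption here.

```python
def parse_dictionary_line(line):
    """Split 'formal name -> alt 1,,alt 2' into (formal name, [alt 1, alt 2])."""
    head, arrow, tail = line.rstrip("\r\n").partition("->")
    if not arrow:
        raise ValueError(f"no '->' separator in line: {line!r}")
    # Alternative names are separated by two commas; drop empty pieces.
    names = [name.strip() for name in tail.split(",,") if name.strip()]
    return head.strip(), names
```

Calling parse_dictionary_line("formal name -> alt 1,,alt 2,,alt 3") returns the formal name together with a list of the three alternative names.
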

