Impact of higher-level region information appearing later in an address on standardization #165

Open
Borber opened this issue Jan 6, 2022 · 5 comments

Borber commented Jan 6, 2022

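Removing the higher-level region information that reappears later in the address raises the similarity substantially. Could the author optimize for this case?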

String t1 = "海南省海口市灵山镇海榆大道4号绿地城.润园海口市灵山西片去旧改项目A-32地块11#楼(栋)2(单元)2(层)203(号)";
String t2 = "海南省海口市灵山镇海榆大道4号绿地城.润园11#楼2单元203";

Result:

海南省海口市灵山镇海榆大道4号绿地城.润园海口市灵山西片去旧改项目A-32地块11#楼(栋)2(单元)2(层)203(号)
addr1 >>>> Address(
	provinceId=460000000000, province=海南省, 
	cityId=460100000000, city=海口市, 
	districtId=460108000000, district=美兰区, 
	streetId=460108101000, street=灵山镇, 
	townId=460108101000, town=灵山镇, 
	villageId=null, village=null, 
	road=null, 
	roadNum=null, 
	buildingNum=A-32, 
	text=西片去旧改项目地块11#楼22203栋单元层号
)
>>>>>>>>>>>>>>>>>
海南省海口市灵山镇海榆大道4号绿地城.润园11#楼2单元203
addr2 >>>> Address(
	provinceId=460000000000, province=海南省, 
	cityId=460100000000, city=海口市, 
	districtId=460108000000, district=美兰区, 
	streetId=460108101000, street=灵山镇, 
	townId=460108101000, town=灵山镇, 
	villageId=null, village=null, 
	road=海榆大道, 
	roadNum=4号, 
	buildingNum=11#楼2单元203, 
	text=绿地城润园
)
加载扩展词典:dic/region.dic
加载扩展词典:dic/community.dic
加载扩展停止词典:dic/stop.dic
相似度结果分析 >>>>>>>>> MatchedResult(
	doc1=Document(terms=[Term(灵山镇), Term(A), Term(32), Term(西片), Term(去), Term(旧), Term(改), Term(项目), Term(地块), Term(11#), Term(楼), Term(22203), Term(栋), Term(单元), Term(层), Term(号)], town=Term(灵山镇), village=null, road=null, roadNum=null, roadNumValue=0), 
	doc2=Document(terms=[Term(灵山镇), Term(海榆大道), Term(4号), Term(11), Term(2), Term(203), Term(绿地城), Term(润园)], town=Term(灵山镇), village=null, road=Term(海榆大道), roadNum=Term(4号), roadNumValue=4), 
	terms=[io.patamon.geocoding.similarity.MatchedTerm@2cfb4a64], 
	similarity=0.4886777774252209
)

After removing the second 海口市 (Haikou City):

String t1 = "海南省海口市灵山镇海榆大道4号绿地城.润园灵山西片去旧改项目A-32地块11#楼(栋)2(单元)2(层)203(号)";
String t2 = "海南省海口市灵山镇海榆大道4号绿地城.润园11#楼2单元203";

Result:

海南省海口市灵山镇海榆大道4号绿地城.润园灵山西片去旧改项目A-32地块11#楼(栋)2(单元)2(层)203(号)
addr1 >>>> Address(
	provinceId=460000000000, province=海南省, 
	cityId=460100000000, city=海口市, 
	districtId=460108000000, district=美兰区, 
	streetId=460108101000, street=灵山镇, 
	townId=460108101000, town=灵山镇, 
	villageId=null, village=null, 
	road=海榆大道, 
	roadNum=4号, 
	buildingNum=A-32, 
	text=绿地城润园灵山西片去旧改项目地块11#楼22203栋单元层号
)
>>>>>>>>>>>>>>>>>
海南省海口市灵山镇海榆大道4号绿地城.润园11#楼2单元203
addr2 >>>> Address(
	provinceId=460000000000, province=海南省, 
	cityId=460100000000, city=海口市, 
	districtId=460108000000, district=美兰区, 
	streetId=460108101000, street=灵山镇, 
	townId=460108101000, town=灵山镇, 
	villageId=null, village=null, 
	road=海榆大道, 
	roadNum=4号, 
	buildingNum=11#楼2单元203, 
	text=绿地城润园
)
加载扩展词典:dic/region.dic
加载扩展词典:dic/community.dic
加载扩展停止词典:dic/stop.dic
相似度结果分析 >>>>>>>>> MatchedResult(
	doc1=Document(terms=[Term(灵山镇), Term(海榆大道), Term(4号), Term(A), Term(32), Term(绿地城), Term(润园), Term(灵山), Term(西片), Term(去), Term(旧), Term(改), Term(项目), Term(地块), Term(11#), Term(楼), Term(22203), Term(栋), Term(单元), Term(层), Term(号)], town=Term(灵山镇), village=null, road=Term(海榆大道), roadNum=Term(4号), roadNumValue=4), 
	doc2=Document(terms=[Term(灵山镇), Term(海榆大道), Term(4号), Term(11), Term(2), Term(203), Term(绿地城), Term(润园)], town=Term(灵山镇), village=null, road=Term(海榆大道), roadNum=Term(4号), roadNumValue=4), 
	terms=[io.patamon.geocoding.similarity.MatchedTerm@4b6995df, io.patamon.geocoding.similarity.MatchedTerm@2fc14f68, io.patamon.geocoding.similarity.MatchedTerm@591f989e, io.patamon.geocoding.similarity.MatchedTerm@66048bfd, io.patamon.geocoding.similarity.MatchedTerm@61443d8f], 
	similarity=0.7152705001057788
)
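
Until this is handled inside the library, one possible workaround is to strip the repeated region name before normalizing. The following is only a minimal sketch under stated assumptions: the class `RepeatedRegionStripper` and the method `stripRepeatedRegions` are hypothetical names, not part of the geocoding library, and the caller is assumed to already know which region names to deduplicate (for example the `province` and `city` values from a first normalization pass).

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the geocoding library.
final class RepeatedRegionStripper {

    /**
     * Keeps the first occurrence of each region name and removes every
     * later occurrence of the same name from the address string.
     */
    static String stripRepeatedRegions(String address, List<String> regionNames) {
        String result = address;
        for (String name : regionNames) {
            if (name == null || name.isEmpty()) {
                continue;
            }
            int first = result.indexOf(name);
            if (first < 0) {
                continue; // this region name does not occur at all
            }
            int tailStart = first + name.length();
            String head = result.substring(0, tailStart);
            // Drop any repeats that come after the first occurrence.
            String tail = result.substring(tailStart).replace(name, "");
            result = head + tail;
        }
        return result;
    }

    public static void main(String[] args) {
        String t1 = "海南省海口市灵山镇海榆大道4号绿地城.润园海口市灵山西片去旧改项目A-32地块11#楼(栋)2(单元)2(层)203(号)";
        // Region names are assumed to be known in advance.
        System.out.println(stripRepeatedRegions(t1, Arrays.asList("海南省", "海口市")));
    }
}
```

For the first `t1` above this yields the same string as the second `t1`, with the repeated 海口市 removed. It is a plain substring match, so it would also delete a legitimate later occurrence (for example a road or market whose name contains the city name), and it does not handle shortened forms such as 灵山 for 灵山镇.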
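
Borber changed the title from "Impact of city-level information repeated in an address on standardization" to "Impact of higher-level region information appearing later in an address on standardization" on Jan 6, 2022
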
IceMimosa (Member) commented

Whoa, that is more complicated than I expected. The parser was probably thrown off by the second 海口市. 😂

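Borber (Author) commented Jan 6, 2022

> Whoa, that is more complicated than I expected. The parser was probably thrown off by the second 海口市. 😂

Yes, exactly. In the second run above I removed the second occurrence, and the similarity came out noticeably higher.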

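IceMimosa (Member) commented

OK, I'll see whether this can be optimized when I have time. By the way, did you generate addresses from the national standard (GB) data on your side? Could you contribute them? I'm not sure how accurate they are. 😏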

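Borber (Author) commented Jan 6, 2022

> OK, I'll see whether this can be optimized when I have time. By the way, did you generate addresses from the national standard (GB) data on your side? Could you contribute them? I'm not sure how accurate they are. 😏

The national standard data does not seem much more precise. In my testing the results were about the same.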

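IceMimosa (Member) commented

@Borber OK. My main concern is newly added addresses and changes to old ones. We would probably need to compare against the addresses already in the library to see the differences.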
