Description
This issue covers the case where the input address contains both a street name and a street number.
Problem
Currently, our accuracy test compares the distance between the source coordinates and the coordinates of our result. This requires a source of truth with correct coordinates, and it still fails to catch mistakes such as:
input address => our result
九龍利得街11號 => 九龍利得街48號海角樓 (OGCIO doesn't have number 11, but has 10 and 12)
北帝街 123號 => 九龍北帝街111號福祥大廈 (OGCIO has number 123)
深水埗窩仔街13號地下 => 九龍窩仔街120號美賢樓(第2座) (OGCIO has number 13)
海泓道10號 => 九龍海泓道8號香港管理專業協會李國寶中學 (OGCIO has number 10)
大埔墟寶鄉街62至66號地下 => (OGCIO has number 62)
Our accuracy test still marks these results as correct, because the distance is shorter than the threshold (currently 0.1 km). Note that if we increase the threshold, the accuracy reported by the test rises to almost 90%.
But what matters is the correctness of the street number.
The desired result should be:
九龍利得街11號 => 九龍利得街10號/九龍利得街12號/未有九龍利得街11號 (to be defined)
Proposed solution
Please refer to this simple program.
Both the parser and the accuracy test should take the street number into account:
Parser
- extract the street number with a regex, keep only the integer part, and compare it with the street number in the input address so that the chosen result is as close to it as possible
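The parser step could be sketched roughly as follows. This is a minimal illustration rather than the project's actual code: the names `extract_street_number` and `closest_candidate` are hypothetical, and it assumes candidates are plain address strings in which the street number appears right before `號`. Ties (such as 10 vs. 12 for an input of 11) simply resolve to the first candidate here, since the desired behaviour is still to be defined.

```python
import re

# Assumption: the street number is the run of digits before "號",
# optionally followed by a letter suffix or a range such as "62至66號".
STREET_NUMBER_RE = re.compile(
    r"(\d+)\s*[A-Za-z]?\s*(?:[至\-]\s*\d+\s*[A-Za-z]?\s*)?號"
)


def extract_street_number(address):
    """Return the integer part of the first street number, or None."""
    match = STREET_NUMBER_RE.search(address)
    return int(match.group(1)) if match else None


def closest_candidate(input_address, candidates):
    """Pick the candidate whose street number is closest to the input's."""
    target = extract_street_number(input_address)
    if target is None:
        return None
    numbered = [(extract_street_number(c), c) for c in candidates]
    numbered = [(n, c) for n, c in numbered if n is not None]
    if not numbered:
        return None
    # min() keeps the first candidate on a tie.
    return min(numbered, key=lambda pair: abs(pair[0] - target))[1]


candidates = ["九龍北帝街111號福祥大廈", "九龍北帝街123號"]
print(closest_candidate("北帝街 123號", candidates))
```

For a range such as `62至66號`, this sketch keeps only the first number (62); whether the whole range should be matched is a separate design decision.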
Accuracy test
- the test should also take into account the difference between the street numbers, instead of relying only on the distance between coordinates.
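A street-number check for the accuracy test might look like the sketch below. The function names and the `max_diff` parameter are assumptions made for illustration, not part of the existing test suite. How strict the tolerance should be (for example, whether a neighbouring number counts as a pass) is left open.

```python
import re

# Assumption: the street number is the run of digits before "號";
# for a range such as "62至66號" only the first number is used.
STREET_NUMBER_RE = re.compile(r"(\d+)\s*(?:至\s*\d+\s*)?號")


def street_number(address):
    """Return the integer street number found in the address, or None."""
    match = STREET_NUMBER_RE.search(address)
    return int(match.group(1)) if match else None


def street_number_diff(input_address, result_address):
    """Absolute difference between the two street numbers, or None if either is missing."""
    expected = street_number(input_address)
    actual = street_number(result_address)
    if expected is None or actual is None:
        return None
    return abs(expected - actual)


def passes_street_number_check(input_address, result_address, max_diff=0):
    """Treat a result as correct only if its street number is within max_diff."""
    diff = street_number_diff(input_address, result_address)
    return diff is not None and diff <= max_diff


# One of the cases above: currently passes on distance, fails on street number.
print(passes_street_number_check("北帝街 123號", "九龍北帝街111號福祥大廈"))
```

This check could run alongside the existing distance threshold, so that a result must pass both to be counted as correct.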