Consider street number for parser algorithm and testing  #97

@cswbrian

This is for the case where the input address contains both a street name and a street number.

Problem

Currently, our accuracy test compares the distance between the source coordinates and our result, which requires a source of truth with correct coordinates. However, it sometimes fails to catch mistakes like:

input address => our result
九龍利得街11號 => 九龍利得街48號海角樓 (OGCIO doesn't have number 11, but has 10 and 12)
北帝街 123號 => 九龍北帝街111號福祥大廈 (OGCIO has number 123)
深水埗窩仔街13號地下 => 九龍窩仔街120號美賢樓(第2座) (OGCIO has number 13)
海泓道10號 => 九龍海泓道8號香港管理專業協會李國寶中學 (OGCIO has number 10)
大埔墟寶鄉街62至66號地下 => (OGCIO has number 62)

Our accuracy test still marks these results as correct, because the distance is below the threshold (0.1 km). P.S.: if we increase the threshold, the accuracy reported by the test rises to almost 90%.

What matters, however, is whether the street number is correct.
The desired result should be:
九龍利得街11號 => 九龍利得街10號 / 九龍利得街12號 / 未有九龍利得街11號 (i.e. "no 九龍利得街11號 on record") (to be defined)

Proposed solution

Please refer to this simple program.
The street number should be taken into account in both the parser and the accuracy test:

Parser

  • Extract the street number using a regex, keep the integer part, and compare it with the street number in the input address, so that the chosen result is as close to it as possible.
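
The parser step above could be sketched roughly as follows. This is only an illustration, not the project's actual code: the function names `extract_street_number` and `pick_closest`, the regex, and the assumption that the parser already has a list of candidate address strings are all hypothetical. For a range such as 62至66號, the sketch keeps the first number.

```python
import re
from typing import Optional, Sequence

# Hypothetical pattern: digits right before "號", optionally followed by a
# letter suffix and/or a range such as "62至66號" (the first number is kept).
_NUMBER_RE = re.compile(
    r"(\d+)\s*[A-Za-z]?\s*(?:(?:至|-)\s*\d+\s*[A-Za-z]?\s*)?號"
)


def extract_street_number(address: str) -> Optional[int]:
    """Return the integer part of the street number, or None if absent."""
    match = _NUMBER_RE.search(address)
    return int(match.group(1)) if match else None


def pick_closest(input_address: str, candidates: Sequence[str]) -> Optional[str]:
    """Pick the candidate whose street number is closest to the input's.

    Falls back to the first candidate when no street number can be compared.
    Ties are resolved in favour of the earlier candidate.
    """
    if not candidates:
        return None
    target = extract_street_number(input_address)
    if target is None:
        return candidates[0]

    best: Optional[str] = None
    best_diff: Optional[int] = None
    for candidate in candidates:
        number = extract_street_number(candidate)
        if number is None:
            continue
        diff = abs(number - target)
        if best_diff is None or diff < best_diff:
            best, best_diff = candidate, diff
    return best if best is not None else candidates[0]


if __name__ == "__main__":
    candidates = ["九龍利得街48號海角樓", "九龍利得街10號", "九龍利得街12號"]
    print(pick_closest("九龍利得街11號", candidates))
```

Whether a tie between 10號 and 12號 should return one of them, both, or a "not found" message is still open ("to be defined" above); the sketch simply returns the first of the closest candidates.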

Accuracy test

  • The test should also take into account the difference between the street numbers, rather than relying only on the distance between coordinates.
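
A street-number-aware check for the accuracy test might look like the sketch below. It is a minimal illustration under assumptions: the names `street_number_diff`, `is_street_number_correct` and `accuracy` are hypothetical, the regex is a simplified guess at the address format, and a result is counted as correct only when the street numbers match exactly.

```python
import re
from typing import Iterable, Optional, Tuple

# Hypothetical pattern: the first run of digits that leads up to "號",
# allowing a range such as "62至66號" (the first number is used).
_NUMBER_RE = re.compile(r"(\d+)\s*(?:(?:至|-)\s*\d+\s*)?號")


def _street_number(address: str) -> Optional[int]:
    match = _NUMBER_RE.search(address)
    return int(match.group(1)) if match else None


def street_number_diff(input_address: str, result_address: str) -> Optional[int]:
    """Absolute difference of the street numbers, or None if either is missing."""
    expected = _street_number(input_address)
    actual = _street_number(result_address)
    if expected is None or actual is None:
        return None
    return abs(expected - actual)


def is_street_number_correct(input_address: str, result_address: str) -> bool:
    """Assumed rule: correct only when both street numbers are identical."""
    return street_number_diff(input_address, result_address) == 0


def accuracy(pairs: Iterable[Tuple[str, str]]) -> float:
    """Fraction of (input, result) pairs whose street numbers match."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    correct = sum(is_street_number_correct(i, r) for i, r in pairs)
    return correct / len(pairs)


if __name__ == "__main__":
    cases = [
        ("北帝街 123號", "九龍北帝街111號福祥大廈"),
        ("海泓道10號", "九龍海泓道8號香港管理專業協會李國寶中學"),
    ]
    for source, result in cases:
        print(source, "=>", result, "diff:", street_number_diff(source, result))
```

With this check, the examples listed under Problem would be flagged as wrong even though their coordinates fall within the 0.1 km threshold.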
