According to the pre-EDA, the pickup and dropoff locations are likely to be significant variables for the analysis. They are stored as geographic coordinates, i.e. longitude and latitude. However, using these raw values directly may be meaningless and may bias a regression, since they label a position mathematically rather than measure a quantity. Finding a meaningful and efficient way to label locations is therefore an open issue. Here is a basic idea: categorize each location by the district it belongs to instead of using continuous values:
- Category types/ranges: Boroughs (5) > Community areas/boards, CB (max. 18 per borough) > Neighborhoods (?)
- Encode each label as binary bits, e.g. 010 110 for one of the CBs (59 in total).
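As a rough illustration of the binary labeling idea above, here is a minimal sketch. It assumes each CB has already been given a 0-based integer index; the function name `binary_encode` and the index 22 are hypothetical and only chosen to reproduce the 010 110 example.

```python
def binary_encode(index: int, n_categories: int) -> list[int]:
    """Return the fixed-width binary code (most significant bit first)
    of a 0-based category index."""
    if not 0 <= index < n_categories:
        raise ValueError(f"index {index} is outside 0..{n_categories - 1}")
    # Number of bits needed to represent the largest index (at least 1).
    width = max(1, (n_categories - 1).bit_length())
    return [(index >> shift) & 1 for shift in reversed(range(width))]


N_COMMUNITY_BOARDS = 59  # total CB count stated in this issue

# Hypothetical CB index 22 -> 010 110 (6 bits cover up to 64 categories).
print(binary_encode(22, N_COMMUNITY_BOARDS))  # [0, 1, 0, 1, 1, 0]
```

With 59 CBs this gives 6 feature columns instead of the 59 that one-hot encoding would need. The trade-off is that unrelated districts end up sharing bits, which a linear regression may read as a relationship between them.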
Discussion:
- Should we use multiple category levels or a single one?
- Use CBs or neighborhoods?
- How do we extract the neighborhood numbers?
- How do we map geographic coordinates to a particular district category?
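For the last question, one possible approach is a point-in-polygon test against district boundaries. The sketch below uses a plain ray-casting test and two toy rectangles; `TOY_DISTRICTS`, the district names, and their coordinates are placeholders, not real borough or CB boundaries. It assumes the real boundary polygons would be loaded from an official boundary file.

```python
from typing import Optional

Point = tuple[float, float]  # (longitude, latitude)


def point_in_polygon(point: Point, polygon: list[Point]) -> bool:
    """Ray-casting test: count how many polygon edges a horizontal ray
    starting at the point crosses; an odd count means the point is inside."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the ray's latitude can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_district(point: Point, districts: dict[str, list[Point]]) -> Optional[str]:
    """Return the name of the first district containing the point, or None."""
    for name, polygon in districts.items():
        if point_in_polygon(point, polygon):
            return name
    return None


# Placeholder rectangles for illustration only (NOT real boundaries).
TOY_DISTRICTS = {
    "district_A": [(-74.02, 40.70), (-73.97, 40.70), (-73.97, 40.75), (-74.02, 40.75)],
    "district_B": [(-73.97, 40.70), (-73.92, 40.70), (-73.92, 40.75), (-73.97, 40.75)],
}

print(assign_district((-74.00, 40.72), TOY_DISTRICTS))  # district_A
print(assign_district((-73.80, 40.72), TOY_DISTRICTS))  # None
```

Looping over every polygon for every trip is slow on a large dataset, so a spatial index or a library-level spatial join would probably be needed in practice. Points that fall outside every district (returned as `None` here) would also need their own category or be dropped.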