Build a dictionary to normalize variant Chinese characters (simplified vs traditional forms, e.g. 温 vs 溫 and 廈 vs 厦) and fullwidth character forms
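A minimal sketch in Python of one way this could work, under stated assumptions. The names `VARIANT_MAP` and `normalize_text` are hypothetical. Choosing the simplified character as the canonical form is also an assumption; the task does not say which direction to fold. Fullwidth forms do not need a hand-written table, because Unicode NFKC normalization (`unicodedata.normalize`) already maps them to their ASCII equivalents. Pairs like 溫/温 are separate code points rather than compatibility equivalents, so NFKC leaves them unchanged. Those pairs are the part the dictionary has to cover.

```python
import unicodedata

# Hypothetical variant table: each key is folded to its value.
# Assumption: the simplified character is treated as canonical.
VARIANT_MAP = {
    "溫": "温",
    "廈": "厦",
}

# str.maketrans accepts a dict keyed by single characters and
# builds the code-point table that str.translate expects.
_VARIANT_TABLE = str.maketrans(VARIANT_MAP)


def normalize_text(text: str) -> str:
    """Fold fullwidth forms and listed character variants (sketch)."""
    # NFKC folds compatibility forms: fullwidth ASCII such as
    # "ＡＢＣ１２３" becomes "ABC123", and the ideographic space
    # U+3000 becomes an ordinary space.
    text = unicodedata.normalize("NFKC", text)
    # Variant pairs such as 溫/温 are distinct characters, not
    # compatibility equivalents, so NFKC does not touch them;
    # fold them with the hand-built dictionary instead.
    return text.translate(_VARIANT_TABLE)


if __name__ == "__main__":
    print(normalize_text("廈門\u3000ＡＢＣ１２３溫"))  # -> "厦門 ABC123温"
```

In this example, 門 passes through unchanged because it is not in the table. Variant coverage is therefore only as good as the dictionary. NFKC also rewrites other compatibility characters (for example ① becomes 1 and halfwidth katakana becomes fullwidth), so check whether that is acceptable for the data before applying it wholesale.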