The address parser is responsible for parsing an input address into a standardized format. This is necessary because the address database stores address data in a particular format: to geocode an address, the input address must first be converted into a recognizable format.
The parser parses the input address string by recognizing certain patterns of U.S. addresses. It expects the input to be a reasonably formatted address in the following format:
<name> <number> <predir> <street> <designator> <postdir> <line 2> <city> <state> <zip>
Almost all of these components are optional. For example, the following are all valid inputs:
The parser splits the input address by looking for certain separator keywords. For example, the number token is treated as the separator between the <name> component and the <street> component, and street designators such as 'st', 'ave' and 'blvd' signal the end of the <street> component. State names are also treated as keywords. Keywords must be spelled correctly; otherwise the parser cannot identify them. For instance, the misspelled 'streeet' in '123 main streeet' will not be recognized as the street designator. State abbreviations (NJ, PA, DC, etc.) must also be spelled correctly. This requirement is less strict for full state names (New Jersey, California, etc.), whose spelling errors the parser can correct automatically.
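The keyword-driven splitting described above can be sketched as follows. This is a minimal illustration under stated assumptions, not JGeocoder's actual implementation: the class `KeywordParserSketch`, the `Part` enum and the abbreviated keyword tables are hypothetical, and the sketch ignores the <predir>, <postdir> and <line 2> components. It reads the zip and state from the right, treats the first all-digit token as the <number> separator, and ends <street> at the first designator keyword after it.

```java
import java.util.Arrays;
import java.util.EnumMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

final class KeywordParserSketch {
    // Hypothetical subset of the address components; predir, postdir and line 2 are omitted.
    enum Part { NAME, NUMBER, STREET, DESIGNATOR, CITY, STATE, ZIP }

    // Abbreviated, hypothetical keyword tables; a real parser needs complete lists.
    static final Set<String> DESIGNATORS =
            Set.of("st", "street", "ave", "avenue", "blvd", "rd", "ln", "lane", "dr");
    static final Set<String> STATES = Set.of("nj", "pa", "ny", "dc", "ca");

    static Map<Part, String> parse(String input) {
        // Lower-case, treat commas as whitespace, and split into tokens.
        String[] t = input.toLowerCase(Locale.ROOT).replace(",", " ").trim().split("\\s+");
        Map<Part, String> out = new EnumMap<>(Part.class);
        int end = t.length; // exclusive end of the tokens not yet consumed

        // <zip> and <state> are read from the right-hand end of the input.
        if (end > 0 && t[end - 1].matches("\\d{5}")) out.put(Part.ZIP, t[--end]);
        if (end > 0 && STATES.contains(t[end - 1])) out.put(Part.STATE, t[--end]);

        // The first all-digit token is the <number>; anything before it is the <name>.
        int num = 0;
        while (num < end && !t[num].matches("\\d+")) num++;
        if (num == end) return out; // no number token: this sketch gives up
        if (num > 0) out.put(Part.NAME, join(t, 0, num));
        out.put(Part.NUMBER, t[num]);

        // The first designator keyword after at least one street token ends <street>.
        int d = num + 2;
        while (d < end && !DESIGNATORS.contains(t[d])) d++;
        if (d >= end) {
            // No recognizable designator: everything left is lumped into <street>.
            if (num + 1 < end) out.put(Part.STREET, join(t, num + 1, end));
            return out;
        }
        out.put(Part.STREET, join(t, num + 1, d));
        out.put(Part.DESIGNATOR, t[d]);
        if (d + 1 < end) out.put(Part.CITY, join(t, d + 1, end)); // the rest is the <city>
        return out;
    }

    private static String join(String[] tokens, int from, int to) {
        return String.join(" ", Arrays.copyOfRange(tokens, from, to));
    }
}
```

With the misspelled '123 main streeet trenton nj', no designator keyword is found, so this sketch lumps 'main streeet trenton' into the street and reports no city, which illustrates why keywords must be spelled correctly.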
The parser performs spelling correction on state names, but only on spelled-out state names; state abbreviations are not auto-corrected. For example, a misspelled 'PENSYLVANA' in the input is automatically corrected in the parser's output.
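One common way to implement this kind of correction is to pick the full state name with the smallest Levenshtein (edit) distance to the input. The sketch below assumes that technique, which the source does not confirm; the class `StateSpellingSketch`, the shortened state table and the distance threshold are all hypothetical. Two-letter tokens are treated as abbreviations and must match exactly, following the rule that abbreviations are not auto-corrected.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

final class StateSpellingSketch {
    // Abbreviated, hypothetical table (full name -> abbreviation); a real one lists every state.
    static final Map<String, String> STATES = new LinkedHashMap<>();
    static {
        STATES.put("NEW JERSEY", "NJ");
        STATES.put("PENNSYLVANIA", "PA");
        STATES.put("NEW YORK", "NY");
        STATES.put("CALIFORNIA", "CA");
        STATES.put("DISTRICT OF COLUMBIA", "DC");
    }

    /** Returns a canonical full name or abbreviation, or null when nothing matches closely enough. */
    static String correctState(String raw) {
        String s = raw.trim().toUpperCase(Locale.ROOT).replaceAll("\\s+", " ");
        if (s.length() == 2) {
            // Abbreviations are never auto-corrected: exact match or nothing.
            return STATES.containsValue(s) ? s : null;
        }
        // Assumed tolerance: roughly one edit per four characters, at least one.
        int maxDistance = Math.max(1, s.length() / 4);
        String best = null;
        int bestDistance = Integer.MAX_VALUE;
        for (String name : STATES.keySet()) {
            int d = levenshtein(s, name);
            if (d < bestDistance) {
                best = name;
                bestDistance = d;
            }
        }
        return bestDistance <= maxDistance ? best : null;
    }

    /** Classic two-row dynamic-programming edit distance (insert, delete, substitute). */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }
}
```

With this threshold, 'PENSYLVANA' (two missing letters) maps to PENNSYLVANIA, while an unknown abbreviation such as 'NX' returns null rather than being guessed.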
In most cases, the parser works at the syntactic level without considering the semantics of the input. This means the parser cannot resolve ambiguous inputs such as '123 center lane st valley pa'. Without knowing whether 'valley' or 'st valley' is a valid city name in PA, the parser cannot decide whether to parse 'st valley' as the city and leave '123 center lane' as the street address, or to parse 'valley' as the city and leave '123 center lane st' as the street address.
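To make the ambiguity concrete, the hypothetical sketch below enumerates every syntactically valid split of an input of the form <number> <street> <designator> <city> <state>. The class `AmbiguitySketch` and its tiny designator list (in which both 'lane' and 'st' are designators) are assumptions for illustration, and the other components are ignored. Without a comma, '123 center lane st valley pa' yields two candidates; a comma after the street address leaves only one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class AmbiguitySketch {
    // Hypothetical, abbreviated designator list for illustration only.
    static final Set<String> DESIGNATORS = Set.of("st", "ave", "blvd", "rd", "ln", "lane", "dr");

    /** One way to split the text between the house number and the state. */
    record Split(String street, String designator, String city) {}

    /** Lists every split; assumes a leading house number and a trailing state token. */
    static List<Split> candidates(String input) {
        String lower = input.toLowerCase(Locale.ROOT).trim();
        List<Split> result = new ArrayList<>();
        int comma = lower.indexOf(',');
        if (comma >= 0) {
            // A comma pins the boundary: the designator must be the last token before it.
            String[] left = lower.substring(0, comma).trim().split("\\s+");
            String[] right = lower.substring(comma + 1).trim().split("\\s+");
            String last = left[left.length - 1];
            if (left.length >= 3 && DESIGNATORS.contains(last) && right.length >= 2) {
                result.add(new Split(join(left, 1, left.length - 1), last,
                        join(right, 0, right.length - 1)));
            }
            return result;
        }
        String[] t = lower.split("\\s+");
        // t[0] is the number and t[t.length - 1] the state; any designator in between
        // that leaves at least one street token and one city token is a candidate.
        for (int i = 2; i < t.length - 2; i++) {
            if (DESIGNATORS.contains(t[i])) {
                result.add(new Split(join(t, 1, i), t[i], join(t, i + 1, t.length - 1)));
            }
        }
        return result;
    }

    private static String join(String[] tokens, int from, int to) {
        return String.join(" ", Arrays.copyOfRange(tokens, from, to));
    }
}
```

A purely syntactic parser has no basis for choosing between the two comma-free candidates; it needs either a separator in the input or knowledge of which city names exist in PA.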
Many other geocoders have trouble with such ambiguous address inputs because their parsers work in a similar fashion; for this reason, they require a comma to separate the street address from the city. JGeocoder's parser has the same problem when no comma separates the street address from the city, but to a much lesser extent. For instance, the parser correctly parses the following inputs:
Unlike many geocoders such as Google Maps and geocoder.us, JGeocoder's parser is designed to be friendly to business addresses. In business addresses, people often put the business name before the first line of the address and, just as often, after the street line. Here are two examples of such business addresses:
Try these two addresses on Google Maps and you will find that it is not able to understand them.
JGeocoder's parser, on the other hand, is capable of handling such business addresses. Given the two addresses above, JGeocoder's parser produces the following outputs: