
RegEx to Format and Abbreviate Street Names

Here is a large set of regular expressions designed to replace street suffixes and secondary unit designators with their standardized USPS abbreviations (see the "Street Suffixes" and "Secondary Units" tabs). It takes an address such as "15800 Countrydrive Avenue blvd., apt. 1506" and formats it to "15800 Countrydrive Ave Blvd Apt 1506". It isn't perfect: a word that is meant to be part of the street name can be mistaken for a suffix, so "Avenue" becomes "Ave" here and some of the meaning is lost. However, if a suffix only appears inside a longer word (such as the "drive" in "Countrydrive"), the abbreviation is correctly NOT applied.
Regular Expressions - Street Suffixes & Secondary Unit Designators.xlsx (UPDATED BY A LATER POST)
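
To make the behaviour described above concrete, here is a minimal Python sketch. The RULES list and the abbreviate function are hypothetical stand-ins for a handful of dictionary rows, not the contents of the spreadsheet, and it assumes every entry is applied in order to the whole address.

```python
import re

# Hypothetical stand-ins for a few dictionary rows: (replacement, pattern).
# The real spreadsheet rows are longer; these only illustrate the idea.
RULES = [
    ("Ave", r"(?i)\b(avenue|ave|av)\b"),
    ("Blvd", r"(?i)\b(boulevard|blvd)\b"),
    ("Dr", r"(?i)\b(drive|drv|dr)\b"),
    ("Apt", r"(?i)\b(apartment|apt)\b"),
    ("", r","),    # BLANK entry: drop commas
    ("", r"\."),   # BLANK entry: drop periods
]

def abbreviate(address: str) -> str:
    """Apply every rule in order, as assumed for the dictionary."""
    for replacement, pattern in RULES:
        address = re.sub(pattern, replacement, address)
    return address

print(abbreviate("15800 Countrydrive Avenue blvd., apt. 1506"))
# prints: 15800 Countrydrive Ave Blvd Apt 1506
```

The \b word boundaries are what keep "Countrydrive" intact: there is no boundary between "y" and "d", so the Dr rule never fires inside the word. The same boundaries are why the standalone word "Avenue" is abbreviated even though it is part of the street name.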

The list includes 202 Street Suffixes and 24 Secondary Unit Designators. The "Use Regular Expressions" check box must be painstakingly ticked for each item. The form needs a "Check All" button or link.

image


There are two regular expressions that should also be included, BLANK and PO Box. They are:

1. BLANK --> , AND \. (two match-on entries)
2. PO Box --> (?i)\b((P\.O\.)|(P\. O\.)|(PO BOX)|(POBOX)|(BOX)|(POB)|(P O BOX))\b
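
As a quick check of the PO Box entry, the sketch below runs the same pattern through Python's re module. The normalize_po_box helper is hypothetical, and the engine behind the dictionary may differ from Python's in details, so treat the results as indicative only.

```python
import re

# The PO Box pattern exactly as posted above.
PO_BOX = re.compile(
    r"(?i)\b((P\.O\.)|(P\. O\.)|(PO BOX)|(POBOX)|(BOX)|(POB)|(P O BOX))\b"
)

def normalize_po_box(text: str) -> str:
    """Hypothetical helper: replace every match with 'PO Box'."""
    return PO_BOX.sub("PO Box", text)

for sample in ["POB 123", "pobox 123", "P O Box 123", "P.O. Box 123"]:
    print(repr(sample), "->", repr(normalize_po_box(sample)))
```

In Python the first three samples come out as "PO Box 123", but "P.O. Box 123" becomes "P.O. PO Box 123". The dotted alternatives are followed by \b, and there is no word boundary between a period and a space, so only the bare "Box" alternative matches. It may be worth testing the dotted forms in the dictionary's own Test box.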

Hopefully someone can modify the regular expression format to restrict alterations to only the last word before the Secondary Unit Designator. That would be helpful.
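
One possible way to approach this restriction, sketched in Python: add a lookahead so that a suffix is only replaced when it is followed by a unit designator, a "#", or the end of the line. The UNITS list, the suffix_pattern helper, and the function name are all assumptions made for illustration, and the unit list is deliberately short.

```python
import re

# Assumed, abbreviated list of secondary unit designators.
UNITS = r"(?:apt|apartment|ste|suite|unit)"

def suffix_pattern(variants: str) -> re.Pattern:
    """Match a suffix (plus optional period) only when a unit
    designator, a '#', or the end of the string comes next."""
    return re.compile(
        rf"(?i)\b(?:{variants})\b\.?"
        rf"(?=,?\s+{UNITS}\b|,?\s*#|\s*$)"
    )

AVE = suffix_pattern("avenue|ave|av")
BLVD = suffix_pattern("boulevard|blvd")

def abbreviate_last_suffix(address: str) -> str:
    address = AVE.sub("Ave", address)
    return BLVD.sub("Blvd", address)

print(abbreviate_last_suffix("15800 Countrydrive Avenue blvd., apt. 1506"))
# prints: 15800 Countrydrive Avenue Blvd, apt. 1506
```

Here "Avenue" is left alone because the next word is not a unit designator, while "blvd." is replaced because "apt" follows it. The comma and the unit designator are untouched by these two rules and would still be handled by their own entries.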


Those look ok to me. I would suggest just trying one line, like the Road one, and getting that to work first.

Make sure to try a full address and not just "rd" because it might be looking for only rd that is at the end of an address line.
Wayne is correct. The regular expressions are looking for a space before "rd" and it being at the end of the line or being followed by "," or " #".
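
The conditions described above can be checked with a short Python sketch. The pattern is the Road entry posted later in this thread; the expand_road helper is hypothetical, and Python's re module stands in for whatever engine the dictionary uses.

```python
import re

# The Road entry from the dictionary: requires a space before the suffix,
# and then end of line, a comma, or " #" after it.
ROAD = re.compile(
    r"(?i)(?<= )(road\.?$|road\.?(?=,)|road\.?(?= #)"
    r"|rd\.?$|rd\.?(?=,)|rd\.?(?= #))"
)

def expand_road(text: str) -> str:
    """Hypothetical helper: replace every match with 'Road'."""
    return ROAD.sub("Road", text)

for sample in ["rd", "18 Bell rd", "18 Bell Rd., Sydney",
               "18 Bell rd #4", "18 rd Street"]:
    print(repr(sample), "->", repr(expand_road(sample)))
```

A bare "rd" is left unchanged because nothing precedes it, and "18 rd Street" is left unchanged because "rd" is followed by another word rather than the end of the line, a comma, or " #".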
Thank you! I tried the full address and it works, so it was the space I was forgetting about.

You may be able to help me with something else. Here in Australia our addresses look like this, e.g.
Level 1, 18 Bell Road or Unit 6, 22 Smith Drive

I want to be able to convert, say lvl 1 to Level 1 or U 6 to Unit 6 but still keep the comma.
I am using the following:
Replacement Value: Level
Value to match on: (?i)\b(level\.?|lvl\.?|lev\.?)\b
But I end up with Level 1 18 Bell Road. I can live with this but wondered if there was a way to get the comma in there.

Many thanks
Jenny


Jenny,

This should take care of it for you.

(level\.?(?=,? )|lvl\.?(?=,? )|lev\.?(?=,? ))
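
For comparison, here is a minimal Python sketch of the two Level patterns side by side. The lookahead version was posted without (?i), so case-insensitive matching is added here as an assumption; the variable names are hypothetical.

```python
import re

# Pattern from the original question (word boundaries on both sides).
BOUNDARY = re.compile(r"(?i)\b(level\.?|lvl\.?|lev\.?)\b")

# Pattern from the reply (lookahead for an optional comma, then a space).
# IGNORECASE is an assumption; the posted pattern has no (?i).
LOOKAHEAD = re.compile(
    r"(level\.?(?=,? )|lvl\.?(?=,? )|lev\.?(?=,? ))", re.IGNORECASE
)

for sample in ["lvl 11, 515 John Street", "lev. 3, 10 King St"]:
    print(repr(sample))
    print("  boundary ->", repr(BOUNDARY.sub("Level", sample)))
    print("  lookahead ->", repr(LOOKAHEAD.sub("Level", sample)))
```

Neither pattern consumes the comma after the number, so both give "Level 11, 515 John Street" for the first sample. They differ on "lev.": the lookahead version swallows the period, while the boundary version leaves it behind ("Level. 3, ...") because \b cannot match between a period and a space.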
Thanks Nic, appreciate your response, but I'm afraid it doesn't work for me.

To make the issue a little clearer:
The raw data reads: lvl 11, 515 John Street

So in the dictionary I'm using:
Replacement Value: Level
Value to match on: (?i)\b(level\.?|lvl\.?|lev\.?)\b
which gives me: Level 11 515 John Street

Basically I need to change " lvl" to "Level" without losing the comma.

Many thanks
Jenny

Jenny,


Here are the results from the expression I posted.  They appear to be working in the example you gave.

image

Jenny, can you right-click on your "Street abbrev" dictionary in the list of dictionaries to export it to CSV and post it here? We'll take a look at it.

image


Thanks,
Jeff

Here it is, thanks, Jenny 

Building	(?i)\b(building\.?|bldg\.?)\b	TRUE
Floor	(?i)\b(floor\.?|fl\.?)\b	TRUE
Level	(?i)\b(level\.?|lvl\.?|lev\.?)\b	TRUE
Lot	(?i)\b(lot\.?|lt\.?)\b	TRUE
Room	(?i)\b(room\.?|rm\.?)\b	TRUE
Suite	(?i)\b(suite\.?|ste\.?)\b	TRUE
Unit	(?i)\b(unit\.?|u\.?)\b	TRUE
,	 ,	TRUE
Avenue	(?i)(?<= )(avnue\.?$|avnue\.?(?=,)|avnue\.?(?= #)|avn\.?$|avn\.?(?=,)|avn\.?(?= #)|avenue\.?$|avenue\.?(?=,)|avenue\.?(?= #)|avenu\.?$|avenu\.?(?=,)|avenu\.?(?= #)|aven\.?$|aven\.?(?=,)|aven\.?(?= #)|ave\.?$|ave\.?(?=,)|ave\.?(?= #)|av\.?$|av\.?(?=,)|av\.?(?= #))	TRUE
Boulevard	(?i)(?<= )(boulv\.?$|boulv\.?(?=,)|boulv\.?(?= #)|boulevard\.?$|boulevard\.?(?=,)|boulevard\.?(?= #)|boul\.?$|boul\.?(?=,)|boul\.?(?= #)|blvd\.?$|blvd\.?(?=,)|blvd\.?(?= #))	TRUE
Circle	(?i)(?<= )(crcle\.?$|crcle\.?(?=,)|crcle\.?(?= #)|crcl\.?$|crcl\.?(?=,)|crcl\.?(?= #)|circle\.?$|circle\.?(?=,)|circle\.?(?= #)|circl\.?$|circl\.?(?=,)|circl\.?(?= #)|circ\.?$|circ\.?(?=,)|circ\.?(?= #)|cir\.?$|cir\.?(?=,)|cir\.?(?= #))	TRUE
Court	(?i)(?<= )(ct\.?$|ct\.?(?=,)|ct\.?(?= #)|crt\.?$|crt\.?(?=,)|crt\.?(?= #)|court\.?$|court\.?(?=,)|court\.?(?= #))	TRUE
Crescent	(?i)(?<= )(cres\.?$|cres\.?(?=,)|cres\.?(?= #)|cr\.?$|cr\.?(?=,)|cr\.?(?= #)|crscnt\.?$|crscnt\.?(?=,)|crscnt\.?(?= #)|cresent\.?$|cresent\.?(?=,)|cresent\.?(?= #)|crescent\.?$|crescent\.?(?=,)|crescent\.?(?= #)|cres\.?$|cres\.?(?=,)|cres\.?(?= #)|crecent\.?$|crecent\.?(?=,)|crecent\.?(?= #))	TRUE
Drive	(?i)(?<= )(drv\.?$|drv\.?(?=,)|drv\.?(?= #)|drive\.?$|drive\.?(?=,)|drive\.?(?= #)|driv\.?$|driv\.?(?=,)|driv\.?(?= #)|dr\.?$|dr\.?(?=,)|dr\.?(?= #))	TRUE
Estate	(?i)(?<= )(estate\.?$|estate\.?(?=,)|estate\.?(?= #)|est\.?$|est\.?(?=,)|est\.?(?= #))	TRUE
Estates	(?i)(?<= )(ests\.?$|ests\.?(?=,)|ests\.?(?= #)|estates\.?$|estates\.?(?=,)|estates\.?(?= #))	TRUE
Gardens	(?i)(?<= )(grdns\.?$|grdns\.?(?=,)|grdns\.?(?= #)|gdns\.?$|gdns\.?(?=,)|gdns\.?(?= #)|gardens\.?$|gardens\.?(?=,)|gardens\.?(?= #))	TRUE
Heights	(?i)(?<= )(hts\.?$|hts\.?(?=,)|hts\.?(?= #)|ht\.?$|ht\.?(?=,)|ht\.?(?= #)|hgts\.?$|hgts\.?(?=,)|hgts\.?(?= #)|heights\.?$|heights\.?(?=,)|heights\.?(?= #)|height\.?$|height\.?(?=,)|height\.?(?= #))	TRUE
Lane	(?i)(?<= )(ln\.?$|ln\.?(?=,)|ln\.?(?= #)|lanes\.?$|lanes\.?(?=,)|lanes\.?(?= #)|lane\.?$|lane\.?(?=,)|lane\.?(?= #)|la\.?$|la\.?(?=,)|la\.?(?= #))	TRUE
Place	(?i)(?<= )(place\.?$|place\.?(?=,)|place\.?(?= #)|pl\.?$|pl\.?(?=,)|pl\.?(?= #))	TRUE
Road	(?i)(?<= )(road\.?$|road\.?(?=,)|road\.?(?= #)|rd\.?$|rd\.?(?=,)|rd\.?(?= #))	TRUE
Roads	(?i)(?<= )(roads\.?$|roads\.?(?=,)|roads\.?(?= #)|rds\.?$|rds\.?(?=,)|rds\.?(?= #))	TRUE
Street	(?i)(?<= )(strt\.?$|strt\.?(?=,)|strt\.?(?= #)|street\.?$|street\.?(?=,)|street\.?(?= #)|str\.?$|str\.?(?=,)|str\.?(?= #)|st\.?$|st\.?(?=,)|st\.?(?= #))	TRUE
Streets	(?i)(?<= )(streets\.?$|streets\.?(?=,)|streets\.?(?= #))	TRUE
Terrace	(?i)(?<= )(tce\.?$|tce\.?(?=,)|tce\.?(?= #)|terr\.?$|terr\.?(?=,)|terr\.?(?= #)|ter\.?$|ter\.?(?=,)|ter\.?(?= #))	TRUE
\.	TRUE
\,	TRUE
PO Box	(?i)\b((P\.O\.)|(P\. O\.)|(PO BOX)|(POBOX)|(BOX)|(POB)|(P O BOX))\b

  

Jenny,

The level regex is correct. However you have another line that is causing the issue.

You are replacing "," and "." with blank (your last 2 entries)
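
A small Python sketch of why the comma disappears, assuming the dictionary applies every entry in turn to the whole address (the apply_dictionary helper is hypothetical):

```python
import re

# The Level entry and the two blank entries from the exported dictionary.
LEVEL = ("Level", r"(?i)\b(level\.?|lvl\.?|lev\.?)\b")
BLANKS = [("", r"\."), ("", r"\,")]

def apply_dictionary(text: str, entries) -> str:
    """Apply each (replacement, pattern) entry in order."""
    for replacement, pattern in entries:
        text = re.sub(pattern, replacement, text)
    return text

raw = "lvl 11, 515 John Street"
print(apply_dictionary(raw, [LEVEL] + BLANKS))  # Level 11 515 John Street
print(apply_dictionary(raw, [LEVEL]))           # Level 11, 515 John Street
```

The Level entry on its own leaves the comma in place; it is the blank entry for "," that strips it afterwards.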

Thanks very much Nick!  Have deleted the "blank" entry and it all works now. Your patience is appreciated by this RegEx newbie.

Jenny

I'm having trouble getting my dictionary to convert Ave. to Avenue or St. to Street. It is leaving in the period. I am not a programmer so I don't know what to look for :)
Posted By Rhonda Derrick on 02 Jul 2014 01:46 PM


Can you post the lines from your library that refers to Street and Avenue? Much as Jenny did above, but you don't need to post the whole library - just the lines that are giving you trouble.

Then I'm sure one of us will spot the problem.
Sorry for the late reply, I have been on vacation. Here are the lines from my dictionary for Ave. to Avenue and St. to Street.

(?i)\b(avnue\.?|avn\.?|avenue\.?|avenu\.?|aven\.?|ave\.?|av\.?|Ave.\.?)\b 

(?i)\b(strt\.?|street\.?|str\.?|st\.?|st.\.?)\b 

Any help you can give would be greatly appreciated.
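
One likely cause, sketched in Python: the trailing \b comes after the optional period, and there is no word boundary between a period and a space (or the end of the line), so the match backs off and leaves the period behind. The VARIANT pattern below is only a suggestion, not a tested dictionary entry, and it assumes the dictionary's engine treats \b the way Python's re module does.

```python
import re

# The Street pattern as posted above.
POSTED = re.compile(r"(?i)\b(strt\.?|street\.?|str\.?|st\.?|st.\.?)\b")

# Suggested variant (an assumption): put the word boundary before
# the optional period so the period is consumed with the match.
VARIANT = re.compile(r"(?i)\b(strt|street|str|st)\b\.?")

for sample in ["12 Main St.", "12 Main St, Apt 4"]:
    print(repr(sample))
    print("  posted  ->", repr(POSTED.sub("Street", sample)))
    print("  variant ->", repr(VARIANT.sub("Street", sample)))
```

With the posted pattern "12 Main St." becomes "12 Main Street." in Python, which matches the symptom described. The variant gives "12 Main Street". The same change would apply to the Avenue line.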


Greetings - I am trying to re-create the long abbreviations from the posted Excel spreadsheet and am having some difficulties. When I test an expression, for example by entering "st" and clicking the Test button, I get "st" back unchanged.

Below is the expression I have.  I get this same issue for all others as well.  Thanks ahead of time. 


(?i)(?<= )(strt\.?$|strt\.?(?=,)|strt\.?(?= #)|street\.?$|street\.?(?=,)|street\.?(?= #)|str\.?$|str\.?(?=,)|str\.?(?= #)|st\.?$|st\.?(?=,)|st\.?(?= #))

 


NEVER MIND - I WAS HAVING A "DUH" moment - it's Friday, happy weekend. 


Greetings! If anyone can help: I have set up a dictionary using the long address file that was posted. The "Use Regular Expressions" check box is ticked and we are on version 2.7.1.6. It had been working since I created the dictionary several months ago, but it stopped working recently. If I try to test it, the incoming text does not change after clicking Test. It wasn't working before we installed the latest version of IOM, but I'm not sure whether it stopped working after the version prior. Any ideas? Thanks ahead of time.

 

Marcy Mirkin 

Director of Operations 

AFMDA

mmirkin@afmda.org 
