Start a new topic

Creating Regions from UK Postcodes with RegEx

Hi there,

 

I’m having trouble using RegEx to convert incoming post codes into a table of regions.

 

I have pasted the dictionary values as they have been inserted but I don’t understand how some of the replacement values are coming out. For example N8 7NX returns as North West despite this replacement value not even having a value that begins with N.

 

Any help is much appreciated.

 

Thanks!

Region Area Regex
London UB.* TRUE
North East DH.* TRUE
North West M.* TRUE
London NW.* TRUE
East CO.* TRUE
London N.* TRUE
London CR.* TRUE
London SW.* TRUE
South West TQ.* TRUE
London W.* TRUE
West Midlands B.* TRUE
East Midlands DE.* TRUE
Wales LL.* TRUE
East Midlands LE.* TRUE
West Midlands WR.* TRUE
London IG.* TRUE
East CB.* TRUE
Yorkshire and Humber S.* TRUE
South East CT.* TRUE
Wales CF.* TRUE
South East BN.* TRUE
East NR.* TRUE
North West L.* TRUE
London HA.* TRUE
London E.* TRUE
South West GL.* TRUE
South East RG.* TRUE
South West BH.* TRUE
South East ME.* TRUE
Yorkshire and Humber BD.* TRUE
Yorkshire and Humber YO.* TRUE
London SE.* TRUE
London BR.* TRUE
Isle of Man IM.* TRUE
Wales NP.* TRUE
North West SK.* TRUE
London EN.* TRUE
East Midlands NN.* TRUE
London EC.* TRUE
London WC.* TRUE
South East OX.* TRUE
London SM.* TRUE
South West BS.* TRUE
East CM.* TRUE
North West WA.* TRUE
Scotland EH.* TRUE
South West TR.* TRUE
South East GU.* TRUE
East SG.* TRUE
East PE.* TRUE
London TW.* TRUE
West Midlands WV.* TRUE
East SS.* TRUE
South West TA.* TRUE
North West BL.* TRUE
London KT.* TRUE
North East NE.* TRUE
North West PR.* TRUE
South East HP.* TRUE
West Midlands CV.* TRUE
North West OL.* TRUE
South West SN.* TRUE
Yorkshire and Humber LA.* TRUE
South East TN.* TRUE
South East SO.* TRUE
South East RH.* TRUE
East LU.* TRUE
London DA.* TRUE
London RM.* TRUE
East AL.* TRUE
North West CH.* TRUE
South West EX.* TRUE
Yorkshire and Humber HX.* TRUE
East Midlands NG.* TRUE
West Midlands TF.* TRUE
South West PL.* TRUE
South West SP.* TRUE
Yorkshire and Humber DN.* TRUE
South East PO.* TRUE
South West DT.* TRUE
Yorkshire and Humber HD.* TRUE
Wales SA.* TRUE
Scotland DG.* TRUE
East MK.* TRUE
West Midlands HR.* TRUE
South West BA.* TRUE
Wales LD.* TRUE
Yorkshire and Humber LS.* TRUE
Scotland ML.* TRUE
Yorkshire and Humber HU.* TRUE
Scotland FK.* TRUE
Yorkshire and Humber HG.* TRUE
North East SR.* TRUE
Scotland KY.* TRUE
Scotland KW.* TRUE
East IP.* TRUE
London WD.* TRUE
Scotland HS.* TRUE
South East SL.* TRUE
North West FY.* TRUE
North East TS.* TRUE
Scotland IV.* TRUE
West Midlands WS.* TRUE
North West CA.* TRUE
Wales SY.* TRUE
Scotland G.* TRUE
Scotland PH.* TRUE
Scotland TD.* TRUE
West Midlands ST.* TRUE
Scotland AB.* TRUE
West Midlands DY.* TRUE
Yorkshire and Humber DL.* TRUE
Yorkshire and Humber WF.* TRUE
East LN.* TRUE
Scotland PA.* TRUE
Scotland KA.* TRUE
Northern Ireland

Hi AC,

I think what may be happening here is that the post code is first translating "N" to "London," then the dictionary picks up the "L" from the beginning of London to translate it to North West. Are you seeing any others follow this pattern?

There are a couple ways to resolve it. Sometimes, just rearranging the order of the entries will work, or you could change the value to match to be the letter followed by any number (rather than the letter followed by any character).

I hope this gives you something to go on, but let us know if you are still stuck.

Amanda Tetanich, bCRE
Associate Product Manager | Omatic Software
I think the easiest way to fix this would be to put a random character at the beginning of your replacement values. I'll use @ for this example. Value to match on UB.* replacment value @London. Next line value to match on DH.* replacement value @North East...and so on. Then on the very last line of the dictionary value to match on @ replacement value -BLANK- (with RegEx turned on) to remove the @ symbol.

Thanks,

John
John - that's a cool idea and would work well here, it seems!
Thanks for your suggestions guys. I really liked the idea of the random character but I couldn't make it work, it was giving me the same problem as before but Amanda's suggestion worked so instead of just UB.* I have UB[0-9] .* and it's feeding back the proper values.
Thinking about it further you'd also need to put ^ at the beginning of all of your values to match on (along with the special character in the replacement value). It was still matching on L because it was somewhere in the string, but ^ would force it to only look at the beginning of the string for a match. As long as it's working one way or the other that's all that matters!
Login or Signup to post a comment