Sunday, December 4, 2011

Programmers' Advent Calendar: #2 Back references with regular expressions

I am a little late writing my advent calendar posts, but I will try to keep up. For this episode, let me tell you about a really useful feature that's available in most search/replace tools (e.g. sed, Ruby regular expressions ...).


Imagine this scenario: I am writing a novel whose main character and her sister are called "Sarah" and "Judy" "Williams". There are other characters called "Sarah" and "Judy"; however, those have other last names. There are also other characters whose last name is "Williams" but who have different first names. I always refer to my characters as "FirstName LastName" in my novel.


Now, my editor and I have decided to change the sisters' names to "Sarah" and "Judy" "Brians". And it's a disaster, because now I have to go and change the names by hand. I cannot replace "Sarah ***" or "Judy ***" by "Sarah Brians" and "Judy Brians" because I will then mess up the other names. I also can't just replace "Williams" by "Brians" because I will again mess up the other names. What do I do? One solution would be to replace "Sarah Williams" by "Sarah Brians" and then replace "Judy Williams" by "Judy Brians". However, what if there were 20 characters from the same family? Back references solve this problem.


Back references simply select a part of the search term and save it, so that I can use this same part in the replace pattern. This is very useful whenever the place to change is identified by something bigger than the part being changed. In this example, I would replace "\(Sarah\|Judy\) Williams" by "\1 Brians" (this is GNU sed's basic syntax; with extended regular expressions, as in Ruby or sed -E, the search term is written "(Sarah|Judy) Williams" without the backslashes). \1 will refer back to whatever was matched between "\(" and "\)". If you have more pairs of "\(" and "\)", you can refer back to them using "\2", "\3" ... in the order in which they were opened. Come back tomorrow for another little useful piece of information. :)
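P.S. For the curious, here is a minimal sketch of the same replacement in Python. It is only an illustration, not the one true way to do it: the sample sentence and the extra characters "Sarah Connor" and "Peter Williams" are made up for the demo, and Python's re module writes groups with plain "(" and ")" rather than "\(" and "\)".

```python
import re

# Made-up sample text: two sisters plus two unrelated characters
# who share either a first name or the last name.
text = (
    "Sarah Williams met Judy Williams at the station. "
    "Sarah Connor and Peter Williams arrived later."
)

# The parentheses capture the first name; \1 in the replacement
# refers back to whatever that group matched.
pattern = re.compile(r"\b(Sarah|Judy) Williams\b")
renamed = pattern.sub(r"\1 Brians", text)
print(renamed)

# With two groups, \1 and \2 refer to them in the order they were opened.
swapped = re.sub(r"(Sarah|Judy) (Williams)", r"\2, \1", "Judy Williams")
print(swapped)
```

Only the two sisters are renamed; "Sarah Connor" and "Peter Williams" are left alone because neither matches the whole search term.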
