When I first had to use regular expressions in Java I made some fairly common mistakes.
Let’s start out with a simple search.
simple string matching
We want to search the string asdfdfdasdfdfdf for occurrences of dfd. Counting by hand, I can find it four times in the string (starting at indices 2, 4, 9 and 11).
Let’s evaluate what our little Java program says.
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class RegexCoding {
        public static void main(String[] args) {
            String source = "asdfdfdasdfdfdf";
            Pattern pattern = Pattern.compile("dfd");
            Matcher matcher = pattern.matcher(source);
            int hits = 0;
            // find() moves on to the next match each time it is called
            while (matcher.find()) {
                hits++;
            }
            System.out.println(hits);
        }
    }

The program prints 2, not 4. So either our code is wrong, or the matching works differently than expected. And indeed, it does.
The first important rule of regular expressions in Java is: the search runs from left to right, and once a character has been consumed by a match it is not reused for the next one. So in dfdfd the first three letters form a match, the next search starts right after them, and only fd remains, which is no longer a match.
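If you do want the overlapping count, one possible approach is to restart the search one character after the start of each match, using Matcher.find(int). The following is only a sketch: the class name OverlappingMatches and both helper methods are made up for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original program.
class OverlappingMatches {

    // Default behaviour: every match consumes its characters.
    static int countNonOverlapping(String source, String regex) {
        Matcher matcher = Pattern.compile(regex).matcher(source);
        int hits = 0;
        while (matcher.find()) {
            hits++;
        }
        return hits;
    }

    // Restart the search one character after the start of the previous
    // match, so characters can take part in more than one match.
    static int countOverlapping(String source, String regex) {
        Matcher matcher = Pattern.compile(regex).matcher(source);
        int hits = 0;
        int from = 0;
        // find(int) resets the matcher and searches from the given index;
        // the guard keeps the index within the input length.
        while (from <= source.length() && matcher.find(from)) {
            hits++;
            from = matcher.start() + 1;
        }
        return hits;
    }

    public static void main(String[] args) {
        String source = "asdfdfdasdfdfdf";
        System.out.println(countNonOverlapping(source, "dfd")); // prints 2
        System.out.println(countOverlapping(source, "dfd"));    // prints 4
    }
}
```

A zero-width lookahead such as (?=dfd) with a plain find() loop is another common way to get the same count of 4, because the lookahead itself does not consume any characters.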
getting IP addresses out of some input
Let’s pretend your application wants to read the IP addresses out of some garbled text you got from a service.
First of all we have to define what an IP address looks like. We have four numbers separated by dots, each ranging from 0 to 255.
Let’s write this as a regular expression. This might not be the best solution; if you have a better or more elegant one, feel free to leave a comment.
Pattern: \b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b (this one is from http://www.regular-expressions.info/examples.html , thanks). This is IPv4 only.
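It takes the pattern for one allowed number (250 to 255, 200 to 249, or 0 to 199) followed by a dot, and repeats that group three times with the quantifier {3}. The final block uses the same number pattern but without the trailing dot. The \b on either side is a word boundary, so a match cannot start or end in the middle of a longer run of digits or letters.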
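To make that structure easier to see, the same pattern can be assembled from a single number block. This is just a sketch; the names Ipv4PatternParts, OCTET, IPV4 and isIpv4 are invented for this example, and matches() is used here so that a whole string is tested rather than searched.

```java
import java.util.regex.Pattern;

// Hypothetical helper class that builds the pattern from its parts.
class Ipv4PatternParts {

    // One block: 250-255, 200-249, or 0-199 (leading zeros such as 01 are accepted).
    static final String OCTET = "(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)";

    // Three "block plus dot" groups, then a final block, between word boundaries.
    static final Pattern IPV4 =
            Pattern.compile("\\b(?:" + OCTET + "\\.){3}" + OCTET + "\\b");

    // matches() requires the entire candidate string to fit the pattern.
    static boolean isIpv4(String candidate) {
        return IPV4.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        System.out.println(isIpv4("215.2.125.32")); // prints true
        System.out.println(isIpv4("256.1.1.1"));    // prints false
        System.out.println(isIpv4("1.2.3"));        // prints false
    }
}
```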
So let’s see if it works:
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class RegexCoding {
        public static void main(String[] args) {
            String logtext = "asdfesgewg 215.2.125.32 alkejo 234 oij8982jld"
                    + "kja.lkjwech . 24.33.125.234 kadfjeladfjeladkj";
            // Backslashes are doubled because this is a Java string literal.
            String regexPattern = "\\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}"
                    + "(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\b";
            Pattern pattern = Pattern.compile(regexPattern);
            Matcher matcher = pattern.matcher(logtext);
            while (matcher.find()) {
                // group() returns the text of the current match
                System.out.println(matcher.group());
            }
        }
    }
As you can see, it prints the IP addresses found in the string.
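You have to take care to escape the \ properly: inside a Java string literal every regex backslash must be doubled, so \b is written as \\b and \. as \\. in the source code. Otherwise you’ll end up with an empty result (a single \b in a string literal is a backspace character, not a word boundary) or an invalid escape sequence error at compile time (for \.).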
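The difference is easy to check in a few lines. The sketch below uses a made-up class EscapeDemo and a made-up helper countMatches; the input strings are arbitrary examples.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for the escaping pitfalls.
class EscapeDemo {

    // Counts how often the regex is found in the input.
    static int countMatches(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        int hits = 0;
        while (matcher.find()) {
            hits++;
        }
        return hits;
    }

    public static void main(String[] args) {
        // "\\b" is two characters (\ and b): the regex word boundary.
        // "\b" is one character: a backspace (U+0008).
        System.out.println("\\b".length()); // prints 2
        System.out.println("\b".length());  // prints 1

        String input = "ip 10.0.0.1 end";
        System.out.println(countMatches("\\b10\\b", input)); // prints 1
        System.out.println(countMatches("\b10\b", input));   // prints 0

        // An unescaped dot matches any character; "\\." matches only a dot.
        // Writing "\." in Java source does not compile at all.
        System.out.println(countMatches("1.2", "1x2"));   // prints 1
        System.out.println(countMatches("1\\.2", "1x2")); // prints 0
    }
}
```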
further reading
http://java.sun.com/javase/6/docs/api/java/util/regex/Pattern.html
http://en.wikipedia.org/wiki/Regular_expression
http://www.gamedev.net/community/forums/topic.asp?topic_id=357270

Nico Heid
