Finding an IP address in a text or string is a simpler task with Python than with Java. Only the regex itself is no shorter than in the Java regex example!
First an example with Python: build a regex object for faster matching and then loop over the result iterator.
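import re

logText = 'asdfesgewg 215.2.125.32 alkejo 234 oij8982jldkja.lkjwech . 24.33.125.234 kadfjeladfjeladkj'
# One byte of the address: 0 to 255, with an optional leading zero
bytePattern = r"([01]?\d\d?|2[0-4]\d|25[0-5])"
# Join four byte patterns with literal dots and compile the result once
regObj = re.compile(r"\.".join([bytePattern] * 4))
for match in regObj.finditer(logText):
    print(match.group())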
A regex like
/\d+\.\d+\.\d+\.\d+/
won't work, because it matches “999.999.111.000” too. But as far as usage in Python goes, that is it! Using a regular expression feels more native in Python than in Java. Or in JavaScript, or in Perl, or in ASP.NET…
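To make the difference concrete, here is a minimal sketch that compares the naive pattern with the byte-range pattern. The sample string and the names `naive`, `strict` and `bounded` are made up for illustration. The lookarounds in `bounded` are an optional addition, not part of the example above: they keep a match from starting or ending in the middle of a longer run of digits.

```python
import re

# Hypothetical sample text: one impossible address, one valid address,
# and one address with too many digits in the last block.
sample = "bad 999.999.111.000 good 24.33.125.234 long 10.0.0.1234"

# One byte of the address (0 to 255), written as a non-capturing group
# so that findall() returns whole matches.
byte = r"(?:25[0-5]|2[0-4]\d|[01]?\d\d?)"

naive = re.compile(r"\d+\.\d+\.\d+\.\d+")
strict = re.compile(r"\.".join([byte] * 4))
# Assumption: matches glued to further digits should be rejected as well.
bounded = re.compile(r"(?<!\d)" + r"\.".join([byte] * 4) + r"(?!\d)")

print(naive.findall(sample))    # ['999.999.111.000', '24.33.125.234', '10.0.0.1234']
print(strict.findall(sample))   # ['24.33.125.234', '10.0.0.123']
print(bounded.findall(sample))  # ['24.33.125.234']
```

Note that the byte-range pattern on its own still reports `10.0.0.123` as a partial match, so whether the extra boundary checks are worth it depends on how messy the input text is.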
And how to find it with JavaScript?
It looks much like the small Python example. Build the RegExp object for faster matching and use a loop to find all matches.
var logText = 'asdfesgewg 215.2.125.32 alkejo 234 oij8982jldkja.lkjwech . 24.33.125.234 kadfjeladfjeladkj';
// A regex literal keeps "\." as an escaped dot; the "g" flag lets exec() advance through the text.
var regObj = /(?:(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])\.){3}(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])/g;
var result;
while ((result = regObj.exec(logText)) !== null) {
    alert("Matched: " + result[0]);
}
For a more detailed example have a look at experts-exchange.com.
the native playground: perl
$txt = "asdfesgewg 215.2.125.32 alkejo 234 oij8982jldkja.lkjwech . 24.33.125.234 kadfjeladfjeladkj";
while ($txt =~ /(([01]?\d\d?|2[0-4]\d|25[0-5])\.([01]?\d\d?|2[0-4]\d|25[0-5])\.([01]?\d\d?|2[0-4]\d|25[0-5])\.([01]?\d\d?|2[0-4]\d|25[0-5]))/g) {
    print $1."\n";
}
Some regex basics can be found at trap17.com, and more regex examples in the Perl Cookbook.
a more complex script: IP – log file statistic
But finding an IP address in some irregular text is not a common use case. An Apache log file is well formatted, so the IP part can be found directly.
Simply split every line, and the first part should be the IP address. To check whether the first element matches the IP pattern, use a ready-made function (in this example, one from the socket module). As an additional and much more complex example, I will try to find the top 10 IPs with their request counts from the log file.
import re, socket

hits = {}
#212.174.187.49 - - [13/Jul/2009:01:06:38 +0200] "GET /index.html HTTP/1.1" 400 335 - "-" "-" "-"
with open("all.log") as fp:
    for line in fp:
        elements = re.split(r"\s+", line)
        try:
            socket.inet_aton(elements[0])  # raises socket.error if this is not an IP address
            hits[elements[0]] += 1
        except KeyError:
            hits[elements[0]] = 1  # first hit for this IP
        except socket.error:
            pass  # no IP at the start of the log line

# Sort the IPs by hit count, highest first
ipKeys = sorted(hits, key=hits.get, reverse=True)
for ip in ipKeys[:10]:
    print("%10dx %s" % (hits[ip], ip))
The result looks like this. The script runs 0.7 seconds for a log file of 10,000 lines (on an Intel Atom N270).
1406x 10.72.199.111
1291x 10.214.141.196
 937x 10.43.81.243
 569x 10.43.205.83
 302x 10.235.116.128
 260x 10.121.239.210
 164x 10.145.232.155
 125x 10.106.120.225
 113x 10.93.210.194
 104x 10.174.153.69
The first block of the real addresses has been replaced with the number 10 for anonymity.
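The same counting can be sketched more compactly with the standard library. This is an alternative under assumptions, not the script that was timed above: `collections.Counter` stands in for the hand-written dictionary, and `ipaddress.IPv4Address` stands in for `socket.inet_aton`, which also accepts shortened forms such as `127.1`. The function name `top_ips` and the sample lines are hypothetical.

```python
from collections import Counter
from ipaddress import AddressValueError, IPv4Address


def top_ips(lines, n=10):
    """Count the leading IPv4 address of each log line; return the n most common."""
    hits = Counter()
    for line in lines:
        fields = line.split(None, 1)  # split on whitespace, keep the first field apart
        if not fields:
            continue  # empty line
        try:
            IPv4Address(fields[0])  # strict check: exactly four decimal octets
        except AddressValueError:
            continue  # the line does not start with an IPv4 address
        hits[fields[0]] += 1
    return hits.most_common(n)


# Hypothetical sample lines in Apache log style
sample = [
    '10.72.199.111 - - [13/Jul/2009:01:06:38 +0200] "GET /index.html HTTP/1.1" 200 335',
    '10.43.81.243 - - [13/Jul/2009:01:06:39 +0200] "GET /a.html HTTP/1.1" 200 120',
    '10.72.199.111 - - [13/Jul/2009:01:06:40 +0200] "GET /b.html HTTP/1.1" 404 90',
    'garbage line without an address',
]

for ip, count in top_ips(sample):
    print("%10dx %s" % (count, ip))
```

For a real log file, the open file object can be passed directly, for example `with open("all.log") as fp: top_ips(fp)`, since the function only iterates over lines.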
And the same IP script as command line with perl
The Python script could be made shorter and uglier: find all IPs, sort them, count them and print the top 10 IPs.
perl -wlne 'print $1 if /(([01]?\d\d?|2[0-4]\d|25[0-5])\.([01]?\d\d?|2[0-4]\d|25[0-5])\.([01]?\d\d?|2[0-4]\d|25[0-5])\.([01]?\d\d?|2[0-4]\d|25[0-5]))/' all.log |sort|uniq -c | sort -n -r|head -n 10
The perl/shell example needs only 0.2 seconds for the same logfile.
conclusion
I like to use regex and Python for getting some statistics, but a few command line Unix tools are useful too!

Christian Harms
