In the 2009 Google Code Jam qualification round there was a simple problem with a nice background story:
After years of study, scientists at Google Labs have discovered an alien language transmitted from a faraway planet. The alien language is very unique in that every word consists of exactly L lowercase letters. Also, there are exactly D words in this language.
Once the dictionary of all the words in the alien language was built, the next breakthrough was to discover that the aliens have been transmitting messages to Earth for the past decade. Unfortunately, these signals are weakened due to the distance between our two planets and some of the words may be misinterpreted. In order to help them decipher these messages, the scientists have asked you to devise an algorithm that will determine the number of possible interpretations for a given pattern.
A pattern consists of exactly L tokens. Each token is either a single lowercase letter (the scientists are very sure that this is the letter) or a group of unique lowercase letters surrounded by parenthesis ( and ). For example: (ab)d(dc) means the first letter is either a or b, the second letter is definitely d and the last letter is either d or c. Therefore, the pattern (ab)d(dc) can stand for either one of these 4 possibilities: add, adc, bdd, bdc.
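To make the token rules concrete, here is a minimal sketch that splits a pattern into its tokens and lists every word it can stand for. The helper names `tokenize` and `expand` are hypothetical and chosen only for this illustration; they are not part of the problem statement.

```python
import re
from itertools import product

def tokenize(pattern):
    """Split a pattern such as '(ab)d(dc)' into one string of candidate letters per position."""
    # Each token is either a parenthesised group or a single lowercase letter;
    # findall returns a (group, letter) tuple where the unused part is ''.
    return [group or letter
            for group, letter in re.findall(r"\(([a-z]+)\)|([a-z])", pattern)]

def expand(pattern):
    """List every word the pattern can stand for (fine for short patterns only)."""
    return ["".join(letters) for letters in product(*tokenize(pattern))]

print(tokenize("(ab)d(dc)"))  # ['ab', 'd', 'dc']
print(expand("(ab)d(dc)"))    # ['add', 'adc', 'bdd', 'bdc']
```

Expanding every possibility grows exponentially with the number of groups, so this is only meant to illustrate the rules, not to solve the large input.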
Solution with regexp
We have a set of correct words and a list of patterns that should match these words. How many words are matched by each pattern?
words = ["abc", "bca", "dac", "dbc", "cba"]
The first pattern “(ab)(bc)(ca)” means that the first character can be “a” or “b”, the second “b” or “c”, and the third “c” or “a”.
The solution should print out how many words in the alien language match the pattern. After so many “match” and “pattern” you know the solution: a regular expression! Each pattern can be converted into a regular expression by turning the parentheses into the square brackets of a character class:
searchStr = line.replace("(", "[").replace(")", "]")
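As a quick check of this conversion, the following sketch counts the matching words for the word list above. The function name `count_matches` is hypothetical, and the last three patterns are extra inputs added for illustration; only the first one appears in the text above.

```python
import re

def count_matches(pattern, words):
    # Turn each "(...)" group into a regex character class "[...]".
    regex = re.compile(pattern.replace("(", "[").replace(")", "]"))
    # fullmatch requires the whole word to fit the pattern.
    return sum(1 for word in words if regex.fullmatch(word))

words = ["abc", "bca", "dac", "dbc", "cba"]
for pattern in ["(ab)(bc)(ca)", "abc", "(abc)(abc)(abc)", "(zyx)bc"]:
    print(pattern, count_matches(pattern, words))
# (ab)(bc)(ca) 2
# abc 1
# (abc)(abc)(abc) 3
# (zyx)bc 0
```

Because every word and every pattern has exactly L positions, `search` and `fullmatch` give the same counts here; `fullmatch` just makes the intent explicit.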
The complete solution uses the filter function again (read more about the functional part of Python) to shrink the code:
import sys, re

# the input file name is the first command-line argument
fp = open(sys.argv[1])
# read params: word length, number of words, number of patterns
(l, d, n) = [int(x) for x in next(fp).split()]
# read words
words = [next(fp) for x in range(d)]
# read patterns and count the words matched by each one
for i in range(1, n+1):
    searchStr = next(fp).replace("(","[").replace(")","]")
    searchIt = re.compile(searchStr).search
    print("Case #%d: %d" % (i, len(list(filter(searchIt, words)))))
fp.close()
25 words with 10 characters and 10 patterns to check:
time python alien.py alien_small.in > alien_small.out
real 0m0.120s
user 0m0.108s
sys 0m0.012s
5000 words with 15 characters and 500 patterns to check:
time python alien.py alien_large.in > alien_large.out
real 0m13.398s
user 0m12.821s
sys 0m0.316s
I found (longer) solutions in other programming languages. Feel free to read and comment on them, or offer an alternative or even better solution, and we will link your article.
- the regexp solution in a long Perl version: www.technicalypto.com
- Java, C++ and C# solutions at necessaryandsufficient.net
- a typical algorithm that finds the answer in num_chars*num_words*strlen(testcase) steps, at intellitures.com
- my regexp solution in Java: anuj-mehta.blogspot.com
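For comparison, an approach without regular expressions, in the spirit of the "typical algorithm" mentioned in the list above, could look like the sketch below. This is my own reading of that idea, not the code from any of the linked articles, and the names `parse_pattern` and `count_matches_direct` are hypothetical.

```python
def parse_pattern(pattern):
    """Return one set of allowed letters per position of the pattern."""
    allowed, i = [], 0
    while i < len(pattern):
        if pattern[i] == "(":
            # A group: collect everything up to the closing parenthesis.
            close = pattern.index(")", i)
            allowed.append(set(pattern[i + 1:close]))
            i = close + 1
        else:
            # A single letter the scientists are sure about.
            allowed.append({pattern[i]})
            i += 1
    return allowed

def count_matches_direct(pattern, words):
    """Count the words whose every letter is allowed at its position."""
    allowed = parse_pattern(pattern)
    return sum(
        1 for word in words
        if len(word) == len(allowed)
        and all(ch in letters for ch, letters in zip(word, allowed))
    )

words = ["abc", "bca", "dac", "dbc", "cba"]
print(count_matches_direct("(ab)(bc)(ca)", words))  # 2
```

Each pattern is parsed once, and every word is then checked letter by letter against the sets, so the work per pattern is roughly the number of words times the word length.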