Hey, this time I have a little quiz. The implementation will be in different languages, but the task is always the same.
This also makes a good interview question: it shows how strong the candidate is in several areas, and it's brief enough for a phone interview, since the candidate only needs to outline the idea and does not have to implement the code.
We will use bash, a scripting or higher-level language, and database-specific solutions.
the task
We have a text file with 13 million entries, all numbers in the range from 1,000,000 to 100,000,000. Each line contains exactly one number. The file is unordered and about 1% of the entries are duplicates. Remove the duplicates.
the task: remove the duplicates. the challenge: be efficient and brief
- stay in your shell and filter the file, either with a script or with the provided tools
- use a scripting language of your choice to filter the file
- use a high-level language of your choice to filter the file (I’m really looking forward to some Java solutions)
- use SQL (by importing the data, or assume the data is already in a table)
(in the interview it would be: give me one answer per category; in the comments it's enough to post one answer, but the more the better)
There’s no limit on what you’re allowed to do. Just try not to stress the CPU or memory unnecessarily.
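To give you an idea of how brief a solution can be, here is a minimal sketch for the shell category, assuming the numbers live in a file called numbers.txt (the filenames are just placeholders, not part of the task):

```bash
# GNU sort spills to temporary files when needed, so 13 million lines
# won't exhaust memory. -n sorts numerically, -u drops duplicate lines.
sort -n -u numbers.txt > numbers_unique.txt
```

Note that this sorts the output as a side effect, which is fine here since the input is unordered anyway and the task only asks to remove duplicates. The other categories are left as an exercise, I still want to see your scripting, Java and SQL solutions.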
I’m excited to see your solutions.

Nico Heid
