Regular Expression To Check For Prime Numbers

The code is in python, but the concept is the same for any language with regular expression support (perl, sed, awk, vim, etc..)

1
2
3
import re
def is_prime(n):
        return re.match(r'^1?$|^(11+?)\1+$', "1" * n) == None;

That’s it :) using a regular expression we can check if a number is prime or not.

Here is a short explanation of why it works (though it will ruin the magic part):

The trick is first to “flatten” the number. i.e., for 7, we’ll pass (“1″ * 7) to the regexp, that is: “1111111″.
The first part of the regexp “^1?$” matches a string with an optional “1″, that is, it matches the empty string, and the string “1″, which are the “flat” representations of 0 and 1 respectively.

The second part is more tricky:
It tries to match a string of “start of line”, then two 1′s or more ^(11+?), then it tries to find one or more times the previous match (\1+). then end of line $.
If you imaging “sticks” instead of the 1′s, then what we did is: represent 7 (for example) with 7 sticks, then try to divide them into two or more groups of two, or two or more groups of three, etc.. in general: try to divide the sticks into two or more groups of “two or more”. Which is exactly the definition of a non-prime number (a.k.a composite number).
So, if the regexp succeeds to match the pattern, the number is not prime. If it fails, the number is prime.

So easy, so elegant ;-)

Note about the ^(11+?): This is almost the same as ^(11+) but, with one difference: using (+?) instead of (+) means “match minimally”, as opposed to “match greedily”.
For example, consider the string “aaaa” and the regexp ‘(a+)\1+’ the regexp will match the string, the value of \1 will be “aa”, since the a+ pattern matched the longest string which allows the regexp to match the string.
The regexp ‘(a+?)\1+’ will also match the string, but the value of \1 will be “a”, because this time (a+?) looked for the minimal match which will allow the regexp to match the string.
In out case, the use of (+?) instead of (+) does not affect correctness (that is, we could use (+) and the regexp will still work correctly). The only difference is related to performance, because our case tries small groups first, while the other case tries big groups first.

More information and explanation in the original article.

If you enjoyed this post, make sure you subscribe to my RSS feed!

Post to Twitter Post to Delicious Post to Digg Post to Facebook Post to Reddit Post to StumbleUpon

Related posts:

  1. Trimming Bash Variables – A Summary
  2. Strip Leading Characters Off A String
  3. Backup Your Facebook Data
  4. Finding The Median Of Two Sorted Arrays
  5. Bash Tip – Separate a Bash Variable From Surrounding Letters

Tags: ,

Saturday, July 24th, 2010 Regular Expressions

Leave a Reply

my email
my photo
Hi,
My name is Amir Watad. I have a BSc. in biomedical engineering from The Biomedical Engineering school , Technion , Israel, and a BSc. in electrical engineering from The Electrical Engineering school , Technion , Israel.
I'm a Verification Engineer in Mellanox Technologies Ltd.
I love Linux, the Command Line and the OpenSource Community.
I used to write Poems (Arabic) when I was able to find time for this.
July 2010
S M T W T F S
« Jun   Aug »
 123
45678910
11121314151617
18192021222324
25262728293031
SEO Powered by Platinum SEO from Techblissonline

Twitter links powered by Tweet This v1.6.1, a WordPress plugin for Twitter.