Python does not currently have an equivalent to scanf(). Regular expressions are generally more powerful, though also more verbose, than scanf() format strings. The table below offers some more-or-less equivalent mappings between scanf() format tokens and regular expressions.
scanf() Token | Regular Expression |
---|---|
%c | `.` |
%5c | `.{5}` |
%d | `[-+]?\d+` |
%e, %E, %f, %g | `[-+]?(\d+(\.\d*)?\|\d*\.\d+)([eE][-+]?\d+)?` |
%i | `[-+]?(0[xX][\dA-Fa-f]+\|0[0-7]*\|\d+)` |
%o | `0[0-7]*` |
%s | `\S+` |
%u | `\d+` |
%x, %X | `0[xX][\dA-Fa-f]+` |
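Unlike scanf(), a regular expression only returns matched text, so the conversion to a number is a separate step. The following is a minimal sketch of that step, using patterns with an optional leading sign for %d and %e and a one-or-more hex-digit pattern for %x. The group names (`count`, `ratio`, `mask`) and the sample input are invented for illustration.

```python
import re

# Building blocks corresponding to %d, %e/%f/%g and %x.
INT = r"[-+]?\d+"
FLOAT = r"[-+]?(\d+(\.\d*)?|\d*\.\d+)([eE][-+]?\d+)?"
HEX = r"0[xX][\dA-Fa-f]+"

# Roughly the scanf() format "%d %e %x"; named groups keep the
# inner groups of FLOAT from shifting the group numbering.
pattern = re.compile(
    rf"(?P<count>{INT}) (?P<ratio>{FLOAT}) (?P<mask>{HEX})"
)

match = pattern.fullmatch("-42 6.02e23 0x1F")
if match is None:
    raise ValueError("input does not match the expected layout")

# scanf() would convert for us; here the conversion is explicit.
count = int(match.group("count"))
ratio = float(match.group("ratio"))
mask = int(match.group("mask"), 16)  # base 16 accepts the 0x prefix
```

Named groups are a design choice here, not a requirement; plain numbered groups work as long as the nested groups inside the floating-point pattern are counted.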
To extract the filename and numbers from a string like
/usr/sbin/sendmail - 0 errors, 4 warnings
you would use a scanf() format like
%s - %d errors, %d warnings
The equivalent regular expression would be
(\S+) - (\d+) errors, (\d+) warnings
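As a short sketch of how that expression might be used, the snippet below applies it with `re.search()` and converts the two counts to integers. The variable names are illustrative only.

```python
import re

line = "/usr/sbin/sendmail - 0 errors, 4 warnings"

# Same expression as above; each pair of parentheses captures one field.
match = re.search(r"(\S+) - (\d+) errors, (\d+) warnings", line)
if match is None:
    raise ValueError("line does not have the expected layout")

filename = match.group(1)
error_count = int(match.group(2))    # groups are strings until converted
warning_count = int(match.group(3))
```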
If you create regular expressions that require the engine to perform a lot of backtracking, you may encounter a RuntimeError exception with the message "maximum recursion limit exceeded". For example,
    >>> import re
    >>> s = "<" + "that's a very big string!"*1000 + ">"
    >>> re.match('<.*?>', s)
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
      File "/usr/local/lib/python2.3/sre.py", line 132, in match
        return _compile(pattern, flags).match(string)
    RuntimeError: maximum recursion limit exceeded
You can often restructure your regular expression to avoid backtracking. The above regular expression can be recast as <[^>]*>. As a further benefit, such regular expressions will run faster than their backtracking equivalents.
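The sketch below compares the two forms on the same input. It assumes nothing about which interpreter is in use: the non-greedy call is wrapped in `try`/`except` because whether it raises depends on the Python version, while the negated character class is expected to match in either case. Note that the two patterns are not strictly interchangeable, since `.` does not match a newline by default but `[^>]` does.

```python
import re

s = "<" + "that's a very big string!" * 1000 + ">"

# Non-greedy form: may raise RuntimeError on interpreters whose regex
# engine recurses for each step of the match.
try:
    lazy = re.match(r"<.*?>", s)
except RuntimeError:
    lazy = None

# Negated character class: consumes everything up to the first ">"
# without trying the alternative at every character.
negated = re.match(r"<[^>]*>", s)

matched_text = negated.group(0)
```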