Perl : Strings and Regular Expressions

String Processing with Regular Expressions
Perl's most famous strength is string manipulation with regular expressions. Perl has a huge number of string processing features -- we'll just cover the main ones here. The simple syntax to search for a pattern in a string is...
($string =~ /pattern/)          ## true if the pattern is found somewhere in the string
("binky" =~ /ink/)         ==> TRUE
("binky" =~ /onk/)         ==> FALSE
In the simplest case, the exact characters in the regular expression pattern must occur in the string somewhere. All of the characters in the pattern must be matched, but the pattern does not need to be right at the start or end of the string, and the pattern does not need to use all the characters in the string.
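A minimal runnable sketch of the =~ test above, reusing the "binky" examples. The variable names are only for illustration. It also shows !~, Perl's negated form of =~, which is true when the pattern is not found.

```perl
use strict;
use warnings;

my $string = "binky";

# =~ applies the match operator to $string; the result is true when the
# pattern occurs anywhere in the string.
my $found_ink = ($string =~ /ink/) ? 1 : 0;   # "ink" is inside "binky"
my $found_onk = ($string =~ /onk/) ? 1 : 0;   # "onk" is not

print "ink: $found_ink\n";   # prints "ink: 1"
print "onk: $found_onk\n";   # prints "onk: 0"

# !~ is the negated form: true when the pattern is NOT found.
if ($string !~ /onk/) {
    print "no onk in $string\n";
}
```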

Character Codes

The power of regular expressions is that they can specify patterns, not just fixed characters. First, there are special matching characters...
        a, X, 9 -- ordinary characters just match that character exactly
        . (a period) -- matches any single character except "\n"
        \w -- (lowercase w) matches a "word" character: a letter, digit, or underscore [a-zA-Z0-9_]
        \W -- (uppercase W) any non-word character
        \s -- (lowercase s) matches a single whitespace character -- space, tab, newline, return, form feed [ \t\n\r\f]
        \S -- (uppercase S) any non-whitespace character
        \t, \n, \r -- tab, newline, return
        \d -- decimal digit [0-9]
        \ -- inhibit the "specialness" of a character. So, for example, use \. to match a period or \\ to match a backslash. If you are unsure whether a character has special meaning, such as '@', you can always put a backslash in front of it (\@) to make sure it is treated as an ordinary character.
"piiig" =~ /p...g/               ==> TRUE        . = any char (except \n)
"piiig" =~ /.../                 ==> TRUE        need not use up the whole string
"piiig" =~ /p....g/              ==> FALSE       must use up the whole pattern (the g is not matched)
"piiig" =~ /p\w\w\wg/            ==> TRUE        \w = any letter, digit, or underscore
"p123g" =~ /p\d\d\dg/            ==> TRUE        \d = 0..9 digit
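The character codes above can be checked with a small table-driven sketch. Each case pairs a string with a pattern and the expected result; a few cases beyond the originals (an underscore for \w, a tab for \s, an escaped period) are added for illustration. The pattern text is kept in a variable and interpolated into the match, which works as described in the interpolation paragraph further down.

```perl
use strict;
use warnings;

# Each case: [string, pattern text, expected result (1 = TRUE, 0 = FALSE)].
# Patterns are single-quoted so the backslashes reach the regex engine intact.
my @cases = (
    ['piiig', 'p...g',     1],  # . matches any char except "\n"
    ['piiig', 'p....g',    0],  # one dot too many: the final g is never matched
    ['p_1Ag', 'p\w\w\wg',  1],  # \w accepts letters, digits and underscore
    ['p123g', 'p\d\d\dg',  1],  # \d accepts 0-9 only
    ["a\tb",  'a\sb',      1],  # \s matches the tab
    ['3x14',  '3.14',      1],  # an unescaped . happily matches the "x"
    ['3x14',  '3\.14',     0],  # \. only matches a literal period
);

my $mismatches = 0;
for my $case (@cases) {
    my ($string, $pattern, $expected) = @$case;
    my $got = ($string =~ /$pattern/) ? 1 : 0;   # pattern interpolated from a variable
    printf "%-6s =~ /%s/ ==> %s\n", $string, $pattern, $got ? 'TRUE' : 'FALSE';
    $mismatches++ if $got != $expected;
}
print "mismatches: $mismatches\n";
```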
The modifier "i" after the last / means the match should be case insensitive...
"PiIIg" =~ /pIiig/               ==> FALSE
"PiIIg" =~ /pIiig/i              ==> TRUE
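A short sketch of the "i" modifier, first on the example above and then inside grep. The list of names is made up for illustration. Inside grep, a bare /pattern/ with no =~ matches against $_, which grep sets to each element in turn.

```perl
use strict;
use warnings;

my $word = "PiIIg";
my $exact   = ($word =~ /pIiig/)  ? 1 : 0;   # case must match exactly: fails
my $ignored = ($word =~ /pIiig/i) ? 1 : 0;   # /i ignores case: succeeds
print "exact=$exact ignored=$ignored\n";     # prints "exact=0 ignored=1"

# A bare /pattern/ matches against $_, which grep sets to each element.
my @names = ("Perl", "PERL", "perl", "Python");
my @perls = grep { /perl/i } @names;
print "@perls\n";                            # prints "Perl PERL perl"
```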
String interpolation works in regular expression patterns. The variable values are pasted into the expression once before it is evaluated. Characters like * and + continue to have their special meanings in the pattern after interpolation, unless the pattern is bracketed with a \Q..\E. The following examples test if the pattern in $target occurs within brackets < > in $string...
$string =~ /<$target>/                     ## Look for <$target>; '.', '*' etc. in $target
                                           ## keep their special meanings
$string =~ /<\Q$target\E>/                 ## \Q..\E puts a backslash in front of every non-word char,
                                           ## so '.', '*' etc. in $target will not have their special meanings
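To see the difference between the two forms above, here is a minimal sketch with a $target that contains a period. The strings are invented for the example; the point is that plain interpolation lets '.' match any character, while \Q..\E forces it to match only a literal period.

```perl
use strict;
use warnings;

my $target = "a.b";                 # contains '.', a regex metacharacter
my $string = "see <axb> here";

# Plain interpolation: '.' in $target still means "any character",
# so <a.b> matches the text <axb>.
my $loose  = ($string =~ /<$target>/)     ? 1 : 0;

# \Q..\E escapes the non-word characters of $target, so '.' must
# match a literal period and <axb> no longer matches.
my $strict = ($string =~ /<\Q$target\E>/) ? 1 : 0;

print "loose=$loose strict=$strict\n";    # prints "loose=1 strict=0"
```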
Similar to the \Q..\E form, the quotemeta() function returns a copy of its argument with every non-word character backslash-escaped.

There is an optional "m" (for "match") that can come before the first /. If the "m" is used, most punctuation characters can be used as the delimiter instead of / -- so you could use " or # to delimit the pattern. This is handy if what you are trying to match has a lot of /'s in it. If the delimiter is the single quote (') then interpolation is suppressed. The following expressions are all equivalent...

"piiig" =~ m/piiig/
"piiig" =~ m"piiig"
"piiig" =~ m#piiig#
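A sketch of why an alternative delimiter helps, using a Unix-style path chosen purely for illustration, followed by quotemeta() applied to a string full of metacharacters. The escaped result can then be interpolated into a pattern and matched literally.

```perl
use strict;
use warnings;

my $path = "/usr/local/bin/perl";

# With the default / delimiter, each / inside the pattern must be escaped...
my $slashes = ($path =~ /\/usr\/local\//) ? 1 : 0;
# ...whereas with m#...# the slashes can appear as-is.
my $hashes  = ($path =~ m#/usr/local/#) ? 1 : 0;
print "slashes=$slashes hashes=$hashes\n";   # prints "slashes=1 hashes=1"

# quotemeta() backslashes every non-word character, the same escaping
# that \Q..\E applies inside a pattern.
my $quoted  = quotemeta("1+1=2?");
print "$quoted\n";                           # prints 1\+1\=2\?
my $literal = ("is 1+1=2? yes" =~ /$quoted/) ? 1 : 0;   # matches the text literally
```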

Control Codes

Things get really interesting when you add in control codes to the regular expression pattern...
        ? -- match 0 or 1 occurrences of the pattern to its left
        * -- 0 or more occurrences of the pattern to its left
        + -- 1 or more occurrences of the pattern to its left
        | -- (vertical bar) logical or -- matches the pattern either on its left or right
        parentheses ( ) -- group sequences of patterns
        ^ -- matches the start of the string
        $ -- matches the end of the string (or just before a newline at the very end)
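The control codes can be exercised with the same table-driven approach. The strings and patterns below are illustrative choices, one or two per control code, each paired with the result Perl gives.

```perl
use strict;
use warnings;

# [string, pattern text, expected] triples exercising the control codes.
my @cases = (
    ['color',  'colou?r',  1],  # ?  : the u is optional
    ['colour', 'colou?r',  1],
    ['pg',     'pi*g',     1],  # *  : zero i's is fine
    ['pg',     'pi+g',     0],  # +  : needs at least one i
    ['piiig',  'pi+g',     1],
    ['dog',    'cat|dog',  1],  # |  : either alternative
    ['abab',   '^(ab)+$',  1],  # () : groups "ab" so + repeats the pair
    ['abba',   '^(ab)+$',  0],
    ['xpiiig', '^pi+g',    0],  # ^  : match must begin at the start
    ['piiigx', 'pi+g$',    0],  # $  : match must finish at the end
);

my $mismatches = 0;
for my $case (@cases) {
    my ($string, $pattern, $expected) = @$case;
    my $got = ($string =~ /$pattern/) ? 1 : 0;
    printf "%-7s =~ /%s/ ==> %s\n", $string, $pattern, $got ? 'TRUE' : 'FALSE';
    $mismatches++ if $got != $expected;
}
print "mismatches: $mismatches\n";
```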
Leftmost & Largest
First, Perl tries to find the leftmost match for the pattern, and second it tries to use up as much of the string as possible -- i.e. let + and * use up as many characters as possible.
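A sketch of the leftmost and largest rules, using parentheses to capture what was matched. In list context a successful match returns the captured text, so `my ($x) = $str =~ /(...)/;` stores it in $x. The sample strings are made up for the example. The last case shows that leftmost takes priority over largest.

```perl
use strict;
use warnings;

# Leftmost: matching starts at the earliest position that can succeed,
# even if a longer match exists further right.
my ($first_run) = "xxaaa yy aaaaa" =~ /(a+)/;
print "first run: $first_run\n";            # prints "first run: aaa"

# Largest: once started, + and * take as many characters as possible,
# so .* runs to the LAST '>' rather than the first.
my ($greedy) = "<b>bold</b>" =~ /<(.*)>/;
print "greedy: $greedy\n";                  # prints "greedy: b>bold</b"

# Leftmost beats largest: a* can match zero characters at position 0,
# so the match succeeds there with an empty string and never reaches "aaa".
my ($empty) = "baaa" =~ /(a*)/;
print "empty: '$empty'\n";                  # prints "empty: ''"
```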

This article was written/submitted by puneet (Puneet Aggarwal).
