A Regex (Regular Expression) is a pattern that is used to check whether a given string matches that pattern.
In Ruby, regex patterns are defined between two forward slashes (/pattern/). For example,
# A regex pattern
/^m.t$/
The above pattern indicates a three-character string where,
- ^ marks the start of the string (here, the string starts with m).
- . indicates any single character.
- $ marks the end of the string (here, the string ends with t).
For example, strings like "mat" and "mit" match the above regex pattern.
However, strings like "mom" and "magnet" don't match because they are not three-letter words that start with m and end with t.
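As a quick check of the claims above, the following sketch tests the four sample words against the pattern. The variable names (pattern, words, results) are only illustrative.

```ruby
# Check a few sample strings against the three-letter pattern.
pattern = /^m.t$/
words = ["mat", "mit", "mom", "magnet"]

# Map each word to true/false depending on whether it matches.
results = words.map { |word| [word, word.match?(pattern)] }.to_h

results.each do |word, matched|
  puts "#{word}: #{matched ? 'matches' : 'does not match'}"
end
```

Only "mat" and "mit" should be reported as matching.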
Ruby Regexp Class
Ruby provides the Regexp class to work with regular expressions. You can create a regex pattern using either the literal syntax or the Regexp.new method.
Syntax
# Using literal syntax
/pattern/
# Using Regexp.new
Regexp.new("pattern")
Here, pattern is the regular expression pattern you want to match.
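The two forms can be compared directly. The sketch below assumes the same pattern is written both ways, and also shows one situation where Regexp.new is convenient: building a pattern from a variable (first_letter is a hypothetical input).

```ruby
literal = /^m.t$/
built = Regexp.new("^m.t$")

# Two Regexp objects are equal when their source and options are the same.
puts literal == built        # true
puts built.source            # ^m.t$

# Regexp.new is handy when part of the pattern comes from a variable.
first_letter = "m"           # hypothetical input
dynamic = Regexp.new("^#{first_letter}.t$")
puts "mat".match?(dynamic)   # true
puts "cat".match?(dynamic)   # false
```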
Regex Flags in Ruby
Regex flags modify how patterns are interpreted. They are added after the closing slash of the regex.
Some commonly used flags are:
- i - Case-insensitive match
- m - Multiline mode (the dot also matches a newline character)
Let's see an example,
/^m/i
This pattern matches strings that start with "m" or "M".
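A small sketch of both flags in action. The sample strings are chosen only for illustration.

```ruby
# Without a flag, matching is case-sensitive.
puts "Man".match?(/^m/)        # false
puts "Man".match?(/^m/i)       # true

# By default the dot does not match a newline; the m flag changes that.
two_lines = "a\nb"
puts two_lines.match?(/a.b/)   # false
puts two_lines.match?(/a.b/m)  # true
```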
Example: Ruby Regex
# A regular expression pattern for a five-letter word
# that starts with "a" and ends with "e"
pattern = /^a...e$/
# Test string
word = "apple"
# Check if the word matches the pattern
if word.match?(pattern)
  puts "String matches the pattern"
else
  puts "String doesn't match the pattern"
end
Output
String matches the pattern
In the above example, we have checked whether the string "apple" matches the defined regular expression pattern.
The pattern /^a...e$/ indicates any five-character string starting with a and ending with e.
Here, the match? method returns true if the string matches the regex pattern that we pass, and false otherwise.
If we use another string like "apache", it won't match the pattern because "apache" has more than five characters.
Metacharacters
To specify regular expressions, metacharacters are used. Metacharacters are characters that are interpreted in a special way by a regex engine.
Some basic metacharacters are:
Metacharacter | Description | Example |
---|---|---|
[] | Specifies a set of characters you wish to match. | /[abc]/ |
. | Specifies any single character (except newline '\n'). | /.../ |
^ | Specifies that the string starts with a certain character. | /^m/ |
$ | Specifies that the string ends with a certain character. | /y$/ |
* | Matches zero or more occurrences of the preceding element. | /ca*t/ |
+ | Matches one or more occurrences of the preceding element. | /ma+t/ |
? | Matches zero or one occurrence of the preceding element. | /ma?n/ |
{} | Specifies the number of repetitions of the preceding element. | /a{2}/ |
\| | Used as an or operator. | /a\|b/ |
() | Groups sub-patterns in a regular expression. | /(ab)+/ |
Square Brackets: []
[] specifies a set of characters you wish to match. For example,
/[abc]/
This pattern matches any string that contains any of the given characters: a, b, or c.
Let's check if the following strings match the regex pattern [abc].
String | Matched? | Reason |
---|---|---|
"a" | 1 Match | String contains a. |
"ac" | 2 Matches | String contains a and c. |
"jim" | 0 Matches | String doesn't contain any of a, b or c. |
"abc" | 3 Matches | String contains all three: a, b and c. |
Note: You can also specify a range of characters using - inside square brackets. For example,
- [a-e] is the same as [abcde].
- [0-3] is the same as [0123].
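The match counts in the table can be reproduced with scan, which returns every match as an array. This is a minimal sketch; the string "badge" is an extra sample added here to illustrate a range.

```ruby
pattern = /[abc]/

# Count how many characters of each string fall in the set.
counts = ["a", "ac", "jim", "abc"].map { |s| [s, s.scan(pattern).length] }.to_h
counts.each { |string, count| puts "#{string.inspect}: #{count}" }

# A range such as [a-e] behaves like [abcde].
range_matches = "badge".scan(/[a-e]/)
puts range_matches.inspect   # ["b", "a", "d", "e"]
```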
Period: .
A period specifies any single character (except newline '\n'). For example,
/.../
The pattern above matches any three-character substring that contains no newline.
Let's check if the following strings match the regex pattern ..., which represents a group of three characters.
String | Matched? | Reason |
---|---|---|
"abs" | 1 Match | String contains three characters (a, b, s). |
"ac" | 0 Matches | String doesn't contain three characters. |
"jim" | 1 Match | String contains three characters. |
"abcd" | 1 Match | String contains four characters, out of which the first three are counted as a match. |
"abcjkl" | 2 Matches | String contains six characters (3+3). |
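A short sketch that reproduces some of these rows with scan. The last string, which contains a newline, is an extra sample added here to show that the period skips newlines.

```ruby
pattern = /.../

six = "abcjkl".scan(pattern)
four = "abcd".scan(pattern)
with_newline = "ab\ncd".scan(pattern)

puts six.inspect           # ["abc", "jkl"]
puts four.inspect          # ["abc"]
# No run of three characters without a newline exists here.
puts with_newline.inspect  # []
```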
Caret: ^
The caret symbol ^ specifies that the string starts with a certain character. For example,
/^m/
This pattern matches strings starting with the letter "m".
Note: In Ruby, ^ matches at the start of every line, not only at the start of the string. To anchor at the very beginning of a multi-line string, use \A.
Let's check if the following strings match the regex pattern ^m.
String | Matched? | Reason |
---|---|---|
"man" | 1 Match | It starts with "m". |
"m" | 1 Match | It starts with "m". |
"Man" | 0 Matches | It doesn't start with "m". |
"sms" | 0 Matches | It doesn't start with "m". |
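The sketch below checks the table rows and then contrasts ^ with \A on a two-line string. The two-line sample is an addition for illustration.

```ruby
puts "man".match?(/^m/)   # true
puts "Man".match?(/^m/)   # false (case-sensitive)
puts "sms".match?(/^m/)   # false

# ^ anchors at the start of each line, \A only at the start of the string.
two_lines = "sms\nman"
puts two_lines.match?(/^m/)   # true: the second line starts with m
puts two_lines.match?(/\Am/)  # false: the string itself starts with s
```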
Dollar: $
The dollar symbol $ specifies that the string ends with a certain character. For example,
/y$/
This pattern matches strings ending with the letter "y".
Note: In Ruby, $ matches at the end of every line, not only at the end of the string. To anchor at the very end of a multi-line string, use \z.
Let's check if the following strings match the regex pattern y$.
String | Matched? | Reason |
---|---|---|
"monday" | 1 Match | It ends with "y". |
"say" | 1 Match | It ends with "y". |
"myname" | 0 Matches | It doesn't end with "y". |
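One way to apply this pattern is to filter a list of words, as in the sketch below. The two-line string at the end is an extra sample that contrasts $ with \z, which anchors only at the very end of the string.

```ruby
words = ["monday", "say", "myname"]

# Keep only the words that end with y.
ending_in_y = words.select { |word| word.match?(/y$/) }
puts ending_in_y.inspect      # ["monday", "say"]

two_lines = "say\nno"
puts two_lines.match?(/y$/)   # true: the first line ends with y
puts two_lines.match?(/y\z/)  # false: the string itself ends with o
```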
Star: *
The star symbol * matches zero or more occurrences of the preceding element. For example,
/ca*t/
Here, a is the element that precedes *. So, this pattern matches strings that have any number (including zero) of a in between c and t.
Let's check if the following strings match the regex pattern ca*t.
String | Matched? | Reason |
---|---|---|
"cat" | 1 Match | It has one a between c and t. |
"ct" | 1 Match | It has zero a between c and t. |
"caaaat" | 1 Match | It has four a between c and t. |
"crt" | 0 Matches | It has the letter r (not a) between c and t. |
"caatcaaat" | 2 Matches | It has a in two places (caat and caaat). |
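A brief sketch that confirms a few of these rows, using scan to list the matched substrings:

```ruby
pattern = /ca*t/

found = "caatcaaat".scan(pattern)
puts found.inspect           # ["caat", "caaat"]

puts "ct".match?(pattern)    # true: zero a's are allowed
puts "crt".match?(pattern)   # false: r is not a
```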
Plus: +
The plus symbol + matches one or more occurrences of the preceding element. For example,
/ma+t/
This pattern matches strings that have one or more occurrences of a in between m and t.
Let's check if the following strings match the regex pattern ma+t.
String | Matched? | Reason |
---|---|---|
"mat" | 1 Match | It has one a between m and t. |
"mt" | 0 Matches | It doesn't have a between m and t. |
"matemaat" | 2 Matches | It has two matching substrings (mat and maat). |
"mart" | 0 Matches | The a is followed by r, not t. |
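The difference between * and + shows up when the preceding element is absent. The sketch below compares the two on "mt" (the pattern /ma*t/ is added here only for contrast) and then checks two rows from the table.

```ruby
# * accepts zero a's, + requires at least one.
puts "mt".match?(/ma*t/)   # true
puts "mt".match?(/ma+t/)   # false

found = "matemaat".scan(/ma+t/)
puts found.inspect         # ["mat", "maat"]

puts "mart".match?(/ma+t/) # false
```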
Question Mark: ?
The question mark symbol ? matches zero or one occurrence of the preceding element. For example,
/ma?n/
This pattern matches strings that have zero or one a in between m and n.
Let's check if the following strings match the regex pattern ma?n.
String | Matched? | Reason |
---|---|---|
"man" | 1 Match | It has one a between m and n. |
"mn" | 1 Match | It has zero a between m and n. |
"maaaaan" | 0 Matches | It has more than one a between m and n. |
"woman" | 1 Match | It has one a between m and n. |
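To see which part of each string is matched, the sketch below uses String#[] with a regex, which returns the matched substring or nil.

```ruby
pattern = /ma?n/

in_woman = "woman"[pattern]
in_mn = "mn"[pattern]
in_long = "maaaaan"[pattern]

puts in_woman.inspect   # "man"
puts in_mn.inspect      # "mn"
puts in_long.inspect    # nil
```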
Braces: {}
The braces symbol {} is used to specify the number of repetitions of the preceding element. For example,
/a{2}/
This pattern matches strings that contain two a characters in a row.
Braces {} can be used as:
# Exactly n times
{n}
# At least n times
{n,}
# Between n and m times
{n,m}
Let's check if the following string examples match the regex pattern /a{2}/.
String | Matched? | Reason |
---|---|---|
"aa" | 1 Match | It contains two a in a row. |
"abcdaat" | 1 Match | It contains two a in a row (the aa before t). |
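The three brace forms can be compared on one string. In this sketch the sample text is made up for illustration, and scan lists what each form picks out.

```ruby
text = "a aa aaa aaaa"

exactly_two = text.scan(/a{2}/)
at_least_two = text.scan(/a{2,}/)
two_to_three = text.scan(/a{2,3}/)

# {2} takes pairs, so "aaaa" yields two separate matches.
puts exactly_two.inspect    # ["aa", "aa", "aa", "aa"]
# {2,} takes each run of two or more a's whole.
puts at_least_two.inspect   # ["aa", "aaa", "aaaa"]
# {2,3} takes at most three at a time.
puts two_to_three.inspect   # ["aa", "aaa", "aaa"]
```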
Alternation: |
The vertical bar | is used as an or operator. For example,
/a|b/
This pattern matches strings that have either a or b.
Let's check if the following strings match the regex pattern a|b.
String | Matched? | Reason |
---|---|---|
"cde" | 0 Matches | It doesn't have either a or b. |
"ade" | 1 Match | There is one a in the string. |
"acdbea" | 3 Matches | Matches each individual a or b in the string. |
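Alternation also works with whole words, not just single characters. The sketch below reproduces one row of the table and then tries word alternatives; the cat/dog patterns and the "hotdog" sample are additions for illustration. It also shows that | applies to everything on either side of it, so anchors need a group to cover both alternatives.

```ruby
letters = "acdbea".scan(/a|b/)
puts letters.inspect   # ["a", "b", "a"]

animals = "I have 2 cats and 3 dogs".scan(/cat|dog/)
puts animals.inspect   # ["cat", "dog"]

# /^cat|dog$/ means "starts with cat" OR "ends with dog".
puts "hotdog".match?(/^cat|dog$/)     # true
# Grouping makes the anchors apply to both alternatives.
puts "hotdog".match?(/^(cat|dog)$/)   # false
```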
Parentheses: ()
Parentheses () are used to group sub-patterns in a regular expression.
Grouping allows you to apply quantifiers (like +, *, or {}) to the entire group. For example,
/(ab)+/
This pattern matches one or more consecutive occurrences of the substring ab.
Let's check if the following strings match the regex pattern (ab)+.
String | Matched? | Reason |
---|---|---|
"ab" | 1 Match | It contains one occurrence of "ab". |
"abab" | 1 Match | It has a contiguous match of two "ab" sequences together. |
"ababab" | 1 Match | It has one long match of three "ab" sequences together. |
"aabb" | 1 Match | It contains one "ab" in the middle (the second a followed by the first b). |
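One detail worth knowing when checking these rows in Ruby: parentheses also capture, and scan returns the captured groups rather than the whole match when the pattern contains a group. The sketch below uses match to read the whole match, and the non-capturing form (?:...) with scan.

```ruby
m = "ababab".match(/(ab)+/)
puts m[0]   # ababab (the whole match)
puts m[1]   # ab (the last repetition captured by the group)

# With a capturing group, scan returns the captures.
captured = "ababab".scan(/(ab)+/)
puts captured.inspect   # [["ab"]]

# A non-capturing group makes scan return whole matches.
whole = "ababab".scan(/(?:ab)+/)
puts whole.inspect      # ["ababab"]

in_aabb = "aabb"[/(ab)+/]
puts in_aabb            # ab
```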
Special Sequences
Ruby regex also supports special sequences for common character sets.
Pattern | Description |
---|---|
\d | Any digit (same as [0-9]) |
\D | Any non-digit |
\w | Word character (letters, digits, underscore) |
\W | Non-word character |
\s | Whitespace character |
\S | Non-whitespace character |
Example: Matching Digits
The \d sequence matches a single digit. Combined with +, it matches a run of one or more digits. For example,
/\d+/
This pattern matches one or more consecutive digits.
Let's check if the following string examples match the regex pattern \d+.
String | Matched? | Reason |
---|---|---|
"123" | 1 Match | The string consists entirely of digits, which are matched together as a single match. |
"a1b2c3" | 3 Matches | Matches 1, 2, and 3 separately since there are alphabetic characters in between. |
"abc" | 0 Matches | No digits present. |
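The other special sequences work the same way. The sketch below checks two of the \d+ rows and then tries \w, \s, and \D on sample strings made up for illustration.

```ruby
digits_apart = "a1b2c3".scan(/\d+/)
digits_together = "123".scan(/\d+/)
puts digits_apart.inspect      # ["1", "2", "3"]
puts digits_together.inspect   # ["123"]

# \w+ picks out runs of letters, digits, and underscores.
words = "snake_case and camelCase!".scan(/\w+/)
puts words.inspect             # ["snake_case", "and", "camelCase"]

# \s+ matches any run of spaces, tabs, or newlines.
parts = "a  b\tc\nd".split(/\s+/)
puts parts.inspect             # ["a", "b", "c", "d"]

# \D matches everything that is not a digit.
only_digits = "a1 b2".gsub(/\D/, "")
puts only_digits               # 12
```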
Escape Sequences
If you want to match special characters literally (like . or *), you need to escape them with a backslash \. For example,
/\./
This matches a literal period (.).
Let's check if the following strings match the regex pattern \.
String | Matched? | Reason |
---|---|---|
"hello." | 1 Match | Matches the dot character present at the end of the string. |
"hello" | 0 Matches | No dot present. |
"hi.." | 2 Matches | Two literal dots matched. |
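The sketch below contrasts an escaped and an unescaped period. It also uses Regexp.escape, a standard Ruby method not covered above, which escapes every special character in a string for you; the string "1.5*2" is a made-up sample.

```ruby
literal_dots = "hi..".scan(/\./)
puts literal_dots.inspect        # [".", "."]

# Unescaped, the period matches every character.
puts "hello".scan(/./).length    # 5
puts "hello".scan(/\./).length   # 0

# Regexp.escape adds the backslashes automatically.
escaped = Regexp.escape("1.5*2")
puts escaped                     # 1\.5\*2

safe = Regexp.new(escaped)
puts "1.5*2".match?(safe)        # true
puts "1x2".match?(safe)          # false
puts "1x2".match?(/1.5*2/)       # true: . and * kept their special meaning
```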
Regex Methods
Ruby provides several methods to work with regex:
Method | Description |
---|---|
=~ | Returns the index of the first match or nil. |
match | Returns a MatchData object or nil. |
match? (>= 2.4) | Returns true if a match is found and false if it's not. |
scan | Returns an array of all matching substrings. |
gsub | Substitutes all matches. |
Example: Using Regex Methods
text = "I have 2 cats and 3 dogs"
# Find first match
# Output: 7
puts text =~ /\d/
# Get all digits
# Output: 2, 3
puts text.scan(/\d/).join(", ")
# Replace digits with '#'
# Output: I have # cats and # dogs
puts text.gsub(/\d/, '#')
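The example above does not use match, so here is a minimal sketch of it on the same sentence. The pattern with two groups is made up for illustration; it captures a digit and the word that follows it.

```ruby
text = "I have 2 cats and 3 dogs"

# match returns a MatchData object describing the first match.
m = text.match(/(\d) (\w+)/)
puts m[0]          # 2 cats (the whole match)
puts m[1]          # 2      (first group)
puts m[2]          # cats   (second group)
puts m.pre_match   # the text before the match: "I have "

# When nothing matches, match returns nil.
no_match = text.match(/\d{5}/)
puts no_match.inspect   # nil
```

Because match can return nil, it is usually worth checking the result before reading groups from it.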
Use Cases of Regex
Regular expressions are commonly used in real-world programming tasks, such as:
- Validating input formats (e.g., emails, phone numbers).
- Searching for specific keywords in text.
- Extracting parts of strings (e.g., digits, words).
- Replacing matched patterns with new values.
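Each of these tasks can be sketched in a line or two. The helper valid_phone? and the phone format it accepts (three digits, a hyphen, four digits) are hypothetical; real-world validation rules are usually stricter, and the sample strings are invented for illustration.

```ruby
# Validating: \A and \z make the pattern cover the whole input.
def valid_phone?(input)
  input.match?(/\A\d{3}-\d{4}\z/)
end

puts valid_phone?("555-1234")   # true
puts valid_phone?("5551234")    # false

# Searching for a keyword, ignoring case.
puts "Ruby makes Regex easy".match?(/regex/i)   # true

# Extracting all numbers from a string.
numbers = "Order 66 shipped in 3 days".scan(/\d+/).map(&:to_i)
puts numbers.inspect            # [66, 3]

# Replacing every hyphen with a slash.
date = "2024-01-15".gsub(/-/, "/")
puts date                       # 2024/01/15
```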