Regular Expressions Tutorial: From Basics to Advanced Patterns
Regular expressions (regex or regexp) are powerful pattern-matching tools supported by virtually every programming language. Whether you need to validate user input, search through logs, or transform text, regex is an essential skill for developers. This tutorial takes you from basic syntax to advanced techniques with practical examples.
What Are Regular Expressions?
A regular expression is a sequence of characters that defines a search pattern. Regex engines scan text and find substrings that match the specified pattern. Most languages provide built-in regex support โ JavaScript has /pattern/flags, Python has the re module, and Java has java.util.regex.
Basic Metacharacters
Metacharacters are special characters with predefined meanings in regex:
.โ Matches any single character (except newline)^โ Matches the start of a string$โ Matches the end of a string\dโ Matches any digit (equivalent to[0-9])\wโ Matches any word character ([a-zA-Z0-9_])\sโ Matches any whitespace character\bโ Matches a word boundary
// JavaScript examples
"abc123".match(/\d+/); // ["123"]
"hello world".match(/^\w+/); // ["hello"]
"file.txt".match(/.+\.txt$/); // ["file.txt"]
Quantifiers
Quantifiers specify how many times a character or group should repeat:
*โ Zero or more times+โ One or more times?โ Zero or one time (optional){n}โ Exactly n times{n,}โ At least n times{n,m}โ Between n and m times
// Quantifier examples
/colou?r/ // Matches "color" and "colour"
/\d{3}-\d{4}/ // Matches "555-1234"
/a{2,4}/ // Matches "aa", "aaa", or "aaaa"
/https?:\/\// // Matches "http://" and "https://"
Greedy vs Lazy
By default, quantifiers are greedy โ they match as much text as possible. Adding ? after a quantifier makes it lazy (matching as little as possible):
const html = '<b>bold</b> and <i>italic</i>';
html.match(/<.+>/); // Greedy: ["<b>bold</b> and <i>italic</i>"]
html.match(/<.+?>/); // Lazy: ["<b>"]
Character Classes
Square brackets define a set of characters to match:
/[aeiou]/ // Matches any vowel
/[^aeiou]/ // Matches any non-vowel (^ negates inside [])
/[a-zA-Z]/ // Matches any letter
/[0-9a-fA-F]/ // Matches any hexadecimal digit
Groups & Capturing
Parentheses create groups for capturing and applying quantifiers:
// Capturing groups
const datePattern = /(\d{4})-(\d{2})-(\d{2})/;
const match = "2026-03-12".match(datePattern);
// match[0] = "2026-03-12" (full match)
// match[1] = "2026" (year)
// match[2] = "03" (month)
// match[3] = "12" (day)
// Non-capturing group: (?:...)
/(?:https?|ftp):\/\// // Groups without capturing
// Named groups (ES2018+)
const re = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const { groups } = "2026-03-12".match(re);
console.log(groups.year); // "2026"
Lookaheads & Lookbehinds
Lookaround assertions match a position without consuming characters:
(?=...)โ Positive lookahead: followed by ...(?!...)โ Negative lookahead: NOT followed by ...(?<=...)โ Positive lookbehind: preceded by ...(?<!...)โ Negative lookbehind: NOT preceded by ...
// Match a number followed by "px"
/\d+(?=px)/.exec("12px 3em"); // ["12"]
// Match a digit NOT followed by a letter
/\d(?![a-z])/g.exec("1a 2 3b"); // Matches "2"
// Match "$" preceded amount (lookbehind)
/(?<=\$)\d+/.exec("Price: $42"); // ["42"]
Flags / Modifiers
Flags change how the regex engine processes the pattern:
gโ Global: find all matches, not just the firstiโ Case-insensitive matchingmโ Multi-line:^/$match line starts/endssโ Dotall:.matches newlines toouโ Unicode: proper Unicode support
/hello/i.test("Hello World"); // true
"aAbBcC".match(/[a-z]/g); // ["a", "b", "c"]
"line1\nline2".match(/^\w+/gm); // ["line1", "line2"]
Real-World Patterns
Email Validation (Simplified)
/^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/
URL Matching
/https?:\/\/[\w.-]+(?:\.[\w.-]+)+[\w\-._~:/?#[\]@!$&'()*+,;=]*/
IPv4 Address
/^(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)$/
Password Strength
// At least 8 chars, one uppercase, one lowercase, one digit, one special
/^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$/
(a+)+ that can cause catastrophic backtracking.
Regex in Python
import re
# Search for a pattern
result = re.search(r'\d{4}-\d{2}-\d{2}', 'Date: 2026-03-12')
print(result.group()) # "2026-03-12"
# Find all matches
emails = re.findall(r'[\w.+-]+@[\w-]+\.[\w.]+', text)
# Replace with regex
cleaned = re.sub(r'\s+', ' ', ' too many spaces ')
print(cleaned) # "too many spaces"
# Compile for reuse
pattern = re.compile(r'^[A-Z]{2}\d{4}$')
pattern.match("AB1234") # Match object
r"...") in Python regex to avoid double-escaping backslashes.
๐ ๏ธ Test Your Regex Patterns Online
Open Regex Tool โFurther Reading
- JSON Formatting Guide โ Use regex to search and transform JSON data
- URL Encoding Guide โ URL patterns are common regex targets
- Hash Functions Explained โ Validate hash formats with regex patterns