Regular Expressions Tutorial: From Basics to Advanced Patterns

Regular expressions (regex or regexp) are powerful pattern-matching tools supported by virtually every programming language. Whether you need to validate user input, search through logs, or transform text, regex is an essential skill for developers. This tutorial takes you from basic syntax to advanced techniques with practical examples.

What Are Regular Expressions?

A regular expression is a sequence of characters that defines a search pattern. Regex engines scan text and find substrings that match the specified pattern. Most languages provide built-in regex support โ€” JavaScript has /pattern/flags, Python has the re module, and Java has java.util.regex.

Basic Metacharacters

Metacharacters are special characters with predefined meanings in regex:

  • . โ€” Matches any single character (except newline)
  • ^ โ€” Matches the start of a string
  • $ โ€” Matches the end of a string
  • \d โ€” Matches any digit (equivalent to [0-9])
  • \w โ€” Matches any word character ([a-zA-Z0-9_])
  • \s โ€” Matches any whitespace character
  • \b โ€” Matches a word boundary
// JavaScript examples
"abc123".match(/\d+/);     // ["123"]
"hello world".match(/^\w+/); // ["hello"]
"file.txt".match(/.+\.txt$/); // ["file.txt"]

Quantifiers

Quantifiers specify how many times a character or group should repeat:

  • * โ€” Zero or more times
  • + โ€” One or more times
  • ? โ€” Zero or one time (optional)
  • {n} โ€” Exactly n times
  • {n,} โ€” At least n times
  • {n,m} โ€” Between n and m times
// Quantifier examples
/colou?r/        // Matches "color" and "colour"
/\d{3}-\d{4}/    // Matches "555-1234"
/a{2,4}/         // Matches "aa", "aaa", or "aaaa"
/https?:\/\//    // Matches "http://" and "https://"

Greedy vs Lazy

By default, quantifiers are greedy โ€” they match as much text as possible. Adding ? after a quantifier makes it lazy (matching as little as possible):

const html = '<b>bold</b> and <i>italic</i>';

html.match(/<.+>/);   // Greedy: ["<b>bold</b> and <i>italic</i>"]
html.match(/<.+?>/);  // Lazy:   ["<b>"]

Character Classes

Square brackets define a set of characters to match:

/[aeiou]/        // Matches any vowel
/[^aeiou]/       // Matches any non-vowel (^ negates inside [])
/[a-zA-Z]/       // Matches any letter
/[0-9a-fA-F]/    // Matches any hexadecimal digit

Groups & Capturing

Parentheses create groups for capturing and applying quantifiers:

// Capturing groups
const datePattern = /(\d{4})-(\d{2})-(\d{2})/;
const match = "2026-03-12".match(datePattern);
// match[0] = "2026-03-12" (full match)
// match[1] = "2026" (year)
// match[2] = "03"   (month)
// match[3] = "12"   (day)

// Non-capturing group: (?:...)
/(?:https?|ftp):\/\//  // Groups without capturing

// Named groups (ES2018+)
const re = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const { groups } = "2026-03-12".match(re);
console.log(groups.year); // "2026"

Lookaheads & Lookbehinds

Lookaround assertions match a position without consuming characters:

  • (?=...) โ€” Positive lookahead: followed by ...
  • (?!...) โ€” Negative lookahead: NOT followed by ...
  • (?<=...) โ€” Positive lookbehind: preceded by ...
  • (?<!...) โ€” Negative lookbehind: NOT preceded by ...
// Match a number followed by "px"
/\d+(?=px)/.exec("12px 3em"); // ["12"]

// Match a digit NOT followed by a letter
/\d(?![a-z])/g.exec("1a 2 3b"); // Matches "2"

// Match "$" preceded amount (lookbehind)
/(?<=\$)\d+/.exec("Price: $42"); // ["42"]

Flags / Modifiers

Flags change how the regex engine processes the pattern:

  • g โ€” Global: find all matches, not just the first
  • i โ€” Case-insensitive matching
  • m โ€” Multi-line: ^/$ match line starts/ends
  • s โ€” Dotall: . matches newlines too
  • u โ€” Unicode: proper Unicode support
/hello/i.test("Hello World"); // true
"aAbBcC".match(/[a-z]/g);    // ["a", "b", "c"]
"line1\nline2".match(/^\w+/gm); // ["line1", "line2"]

Real-World Patterns

Email Validation (Simplified)

/^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/

URL Matching

/https?:\/\/[\w.-]+(?:\.[\w.-]+)+[\w\-._~:/?#[\]@!$&'()*+,;=]*/

IPv4 Address

/^(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)$/

Password Strength

// At least 8 chars, one uppercase, one lowercase, one digit, one special
/^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$/
โš ๏ธ Regex and security: Never rely solely on client-side regex for input validation. Always validate on the server as well. Also beware of ReDoS (Regular Expression Denial of Service) โ€” avoid nested quantifiers like (a+)+ that can cause catastrophic backtracking.

Regex in Python

import re

# Search for a pattern
result = re.search(r'\d{4}-\d{2}-\d{2}', 'Date: 2026-03-12')
print(result.group())  # "2026-03-12"

# Find all matches
emails = re.findall(r'[\w.+-]+@[\w-]+\.[\w.]+', text)

# Replace with regex
cleaned = re.sub(r'\s+', ' ', '  too   many   spaces  ')
print(cleaned)  # "too many spaces"

# Compile for reuse
pattern = re.compile(r'^[A-Z]{2}\d{4}$')
pattern.match("AB1234")  # Match object
๐Ÿ’ก Tip: Use raw strings (r"...") in Python regex to avoid double-escaping backslashes.

๐Ÿ› ๏ธ Test Your Regex Patterns Online

Open Regex Tool โ†’

Further Reading