Regex Cheat Sheet

Regular expressions are a pattern-matching syntax supported by nearly every major programming language, text editor, and command-line tool, although engines differ in the details. This reference sheet is designed to be scanned, not read, so keep it open next to your editor.

Character Classes & Metacharacters

A character class matches exactly one character from a defined set. Metacharacters are symbols with special meaning inside a pattern.

Pattern	Matches	Notes
`abc`	Literal “abc” in sequence	Case-sensitive by default
`.`	Any character except newline	Matches newline with `re.DOTALL` / `/s` flag
`\d`	Any digit `[0-9]`	In Python 3, also matches other Unicode digits unless `re.ASCII` is set
`\D`	Any non-digit	Inverse of `\d`
`\w`	Word character `[a-zA-Z0-9_]`	In Python 3, also matches Unicode letters and digits unless `re.ASCII` is set
`\W`	Non-word character	Inverse of `\w`
`\s`	Whitespace: space, tab, `\n`, `\r`, `\f`, `\v`	In Python 3, also matches Unicode whitespace unless `re.ASCII` is set
`\S`	Non-whitespace	Inverse of `\s`
`[abc]`	One of: a, b, or c	Most metacharacters lose their meaning inside `[…]`
`[^abc]`	Any character except a, b, or c	Negated class
`[a-z]`	Any lowercase letter	Range syntax
`[A-Z]`	Any uppercase letter
`[0-9]`	Any digit (same as `\d`)
`[a-zA-Z0-9_]`	Any word character (same as `\w`)
`\t`	Tab character
`\n`	Newline character
`\r`	Carriage return
`\\`	Literal backslash	Must also escape in non-raw strings

import re

re.findall(r'\d+', 'Order 42, item 7')
## => ['42', '7']

re.findall(r'\w+', 'hello_world 123')
## => ['hello_world', '123']

re.findall(r'[aeiou]+', 'beautiful')
## => ['eau', 'i', 'u']

re.findall(r'[^a-zA-Z0-9]+', 'hello, world! 42')
## => [', ', '! ']
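In Python 3, `str` patterns are Unicode-aware by default, so `\d` and `\w` match more than the ASCII ranges listed in the table. The following is a minimal sketch of how `re.ASCII` narrows them; the sample strings are made up for illustration.

```python
import re

# Arabic-Indic digits (U+0664, U+0662) followed by ASCII digits; illustrative text only.
sample = '\u0664\u0662 and 42'

unicode_digits = re.findall(r'\d+', sample)          # default: Unicode-aware
ascii_digits = re.findall(r'\d+', sample, re.ASCII)  # restricted to [0-9]

unicode_words = re.findall(r'\w+', 'na\u00efve caf\u00e9')
ascii_words = re.findall(r'\w+', 'na\u00efve caf\u00e9', re.ASCII)

print(unicode_digits)  # => ['٤٢', '42']
print(ascii_digits)    # => ['42']
print(unicode_words)   # => ['naïve', 'café']
print(ascii_words)     # => ['na', 've', 'caf']
```

If a pattern validates input that must be plain ASCII (IDs, port numbers), passing `re.ASCII` or spelling out `[0-9]` is usually the safer choice.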

Anchors & Boundaries

Anchors assert a position in the string — they consume no characters.

Pattern	Position Matched	Notes
`^`	Start of string	Start of each line with `re.MULTILINE` / `m` flag
`$`	End of string (in Python, also just before a final newline)	End of each line with `re.MULTILINE` / `m` flag
`\b`	Word boundary	Between `\w` and `\W`, or at string edge
`\B`	Non-word boundary	Inside a continuous run of word characters
`\A`	Absolute start of string	Unaffected by `re.MULTILINE`
`\Z`	Absolute end of string	Unaffected by `re.MULTILINE`

text = "apple\nbanana\napricot"

re.findall(r'^a\w+', text, re.MULTILINE)
## => ['apple', 'apricot']

re.findall(r'\bcat\b', 'the cat in scatter')
## => ['cat']  — 'cat' inside 'scatter' is not matched

re.findall(r'\Aapple', text, re.MULTILINE)
## => ['apple']  — only the absolute start of the string

bool(re.fullmatch(r'\d{5}', '90210'))   ## => True
bool(re.fullmatch(r'\d{5}', 'ABC'))     ## => False
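One subtlety worth seeing in code: in Python, `$` also matches just before a newline at the very end of the string, while `\Z` and `re.fullmatch` do not tolerate it. A small sketch, assuming the input is a line that still carries its trailing newline:

```python
import re

line = '90210\n'  # e.g. a line read from a file, newline still attached

dollar = re.search(r'^\d{5}$', line)    # `$` also matches just before a final newline
strict = re.search(r'^\d{5}\Z', line)   # `\Z` matches only at the very end
full = re.fullmatch(r'\d{5}', line)     # fullmatch must consume the whole string

print(bool(dollar), bool(strict), bool(full))  # => True False False
```

For validation, `re.fullmatch` or `\Z` avoids accepting values with a stray trailing newline.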

Quantifiers: Greedy and Lazy

Quantifiers control how many times the preceding element repeats. Greedy quantifiers consume as much input as possible; lazy (non-greedy) quantifiers consume as little as possible.

Quantifier	Meaning	Mode
`*`	0 or more	Greedy
`+`	1 or more	Greedy
`?`	0 or 1	Greedy
`{n}`	Exactly n times	Greedy
`{n,}`	n or more times	Greedy
`{n,m}`	Between n and m times (inclusive)	Greedy
`*?`	0 or more	Lazy
`+?`	1 or more	Lazy
`??`	0 or 1	Lazy
`{n,m}?`	Between n and m	Lazy

text = '<b>bold</b> and <i>italic</i>'

re.findall(r'<.+>', text)
## => ['<b>bold</b> and <i>italic</i>']   — greedy, one giant match

re.findall(r'<.+?>', text)
## => ['<b>', '</b>', '<i>', '</i>']   — lazy, each tag separately

re.findall(r'\b\d{2,4}\b', 'a 5 b 12 c 1234 d 99999')
## => ['12', '1234']

re.findall(r'https?://\S+', 'Visit http://a.com or https://b.com')
## => ['http://a.com', 'https://b.com']
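The lazy forms of `{n,m}` and `?` from the table are not exercised above. As a rough illustration with a made-up input, the same pattern splits a digit run differently depending on the mode:

```python
import re

digits = '123456'

greedy = re.findall(r'\d{2,4}', digits)   # takes up to 4 digits per match
lazy = re.findall(r'\d{2,4}?', digits)    # stops at the minimum of 2

# `??` prefers zero repetitions, so the optional 's' is skipped when nothing follows it
greedy_opt = re.match(r'https?', 'https').group()
lazy_opt = re.match(r'https??', 'https').group()

print(greedy)      # => ['1234', '56']
print(lazy)        # => ['12', '34', '56']
print(greedy_opt)  # => 'https'
print(lazy_opt)    # => 'http'
```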

Groups, Capturing & Backreferences

Groups let you apply quantifiers to multi-character sequences, capture submatches, and refer back to earlier matches in the same pattern or in a replacement string.

Pattern	Description
`(abc)`	Capturing group — saves the matched text
`(?:abc)`	Non-capturing group — groups without saving
`(?P<name>abc)`	Named capturing group — Python/PCRE syntax
`(?<name>abc)`	Named capturing group — JavaScript ES2018/PCRE2
`a|b`	Alternation — match “a” or “b”
`\1`	Backreference to group 1 (by number)
`\g<1>`	Backreference to group 1 in `re.sub` replacement
`(?P=name)`	Named backreference — Python
`\k<name>`	Named backreference — JavaScript/PCRE2

pattern = r'(?P<protocol>https?)://(?P<domain>[^/]+)(?P<path>/[^\s]*)?'
m = re.match(pattern, 'https://devnook.dev/guides/')
m.group('protocol')  ## 'https'
m.group('domain')    ## 'devnook.dev'
m.group('path')      ## '/guides/'

re.findall(r'\b(\w+)\s+\1\b', 'the the quick brown fox fox')
## => ['the', 'fox']   — repeated words

re.findall(r'colo(?:u|)r', 'colour and color')
## => ['colour', 'color']   — alternation in non-capturing group

re.sub(r'(\w+)\s+(\w+)', r'\2 \1', 'John Smith')
## => 'Smith John'   — swap groups in replacement
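The named forms in the table, `(?P=name)` inside a pattern and `\g<name>` inside a replacement, can be sketched as follows. The sample strings and group names are illustrative, not taken from any particular codebase.

```python
import re

# (?P=quote) must match the same text that the named group `quote` captured,
# so a string opened with " must be closed with ", and ' with '.
quoted = re.compile(r'''(?P<quote>['"])(?P<body>.*?)(?P=quote)''')
bodies = [m.group('body') for m in quoted.finditer('''say "hi" or 'it"s' now''')]
print(bodies)  # => ['hi', 'it"s']

# \g<name> refers to a named group in the replacement string.
iso = re.sub(r'(?P<d>\d{2})/(?P<m>\d{2})/(?P<y>\d{4})',
             r'\g<y>-\g<m>-\g<d>', '09/06/2026')
print(iso)  # => '2026-06-09'

# \g<1> avoids ambiguity when a literal digit follows the group reference;
# r'\10' would be read as a reference to group 10.
padded = re.sub(r'(\d)', r'\g<1>0', 'a1 b2')
print(padded)  # => 'a10 b20'
```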

Lookaheads & Lookbehinds

Lookarounds are zero-width assertions — they check what surrounds the current position without consuming any characters.

Pattern	Type	What it asserts
`(?=abc)`	Positive lookahead	Current position is followed by “abc”
`(?!abc)`	Negative lookahead	Current position is NOT followed by “abc”
`(?<=abc)`	Positive lookbehind	Current position is preceded by “abc”
`(?<!abc)`	Negative lookbehind	Current position is NOT preceded by “abc”

re.findall(r'\d+(?=px)', '12px 5em 100px 3rem')
## => ['12', '100']   — digits followed by 'px', 'px' not captured

re.findall(r'new(?!line)', 'newline and new feature')
## => ['new']   — 'new' not followed by 'line'

re.findall(r'(?<=name=)\w+', 'id=42 name=alice role=admin')
## => ['alice']

re.findall(r'(?<!no )error', 'no error here; another error exists')
## => ['error']   — only the second occurrence

re.findall(r'(?<=\$)\d+(?=\s*USD)', '$100 USD and $50 EUR')
## => ['100']   — combined lookahead + lookbehind
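Because lookarounds consume nothing, a pattern made only of lookarounds matches positions rather than text. One common use, sketched here with a hypothetical helper name, is inserting thousands separators into a string of digits:

```python
import re

def add_thousands(number: str) -> str:
    """Hypothetical helper: insert commas into a string of digits."""
    # Match each position that is preceded by a digit and followed by
    # a multiple of three digits running to the end of the string.
    return re.sub(r'(?<=\d)(?=(?:\d{3})+$)', ',', number)

print(add_thousands('1234567'))  # => '1,234,567'
print(add_thousands('1000'))     # => '1,000'
print(add_thousands('999'))      # => '999'
```

Note that Python's `re` module requires lookbehind patterns to have a fixed width, so `(?<=\d)` is fine but `(?<=\d+)` raises an error.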

Regex Flags

Flags (also called modifiers) alter how the engine interprets the entire pattern. Combine multiple flags with | in Python, or use inline (?imsx) syntax to embed them inside the pattern itself.

Flag	Python constant	Python inline	JavaScript	What it changes
Case-insensitive	`re.IGNORECASE` / `re.I`	`(?i)`	`i`	Letters match any case
Multiline	`re.MULTILINE` / `re.M`	`(?m)`	`m`	`^`/`$` match line starts and ends
Dot-all	`re.DOTALL` / `re.S`	`(?s)`	`s`	`.` matches `\n` too
Verbose	`re.VERBOSE` / `re.X`	`(?x)`	—	Whitespace and `#` comments ignored
Global	—	—	`g`	Find all matches, not just the first
Unicode	`re.UNICODE` / `re.U`	`(?u)`	`u`	Python 3: already the default for `str` patterns; JavaScript: code-point matching and `\p{…}` escapes
ASCII	`re.ASCII` / `re.A`	`(?a)`	—	Force `\w`, `\d`, `\s` to ASCII-only
Sticky	—	—	`y`	Match only at `lastIndex` position

re.findall(r'(?i)hello', 'Hello HELLO hello')
## => ['Hello', 'HELLO', 'hello']   — inline case-insensitive flag

text = "Error: disk full\nerror: timeout\nWARN: retrying"
re.findall(r'^error:.+$', text, re.IGNORECASE | re.MULTILINE)
## => ['Error: disk full', 'error: timeout']

date_re = re.compile(r"""
    (?P<year>\d{4})            (?# 4-digit year)
    -
    (?P<month>0[1-9]|1[0-2])  (?# month 01-12)
    -
    (?P<day>0[1-9]|[12]\d|3[01])  (?# day 01-31)
""", re.VERBOSE)

date_re.match('2026-06-09').groupdict()
## => {'year': '2026', 'month': '06', 'day': '09'}
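The dot-all flag is the one Python flag in the table without an example so far. A minimal sketch, using a made-up HTML fragment, of how it changes what `.` can cross:

```python
import re

html = "<p>one</p>\n<!-- multi\nline comment -->\n<p>two</p>"

without_flag = re.findall(r'<!--.*?-->', html)           # `.` stops at '\n'
with_flag = re.findall(r'<!--.*?-->', html, re.DOTALL)   # `.` spans lines
inline = re.findall(r'(?s)<!--.*?-->', html)             # same, as an inline flag

print(without_flag)  # => []
print(with_flag)     # => ['<!-- multi\nline comment -->']
print(inline == with_flag)  # => True
```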

'Hello HELLO hello'.match(/hello/gi);
// => ['Hello', 'HELLO', 'hello']

const { groups: { year, month, day } } =
  '2026-06-09'.match(/(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/);
// year='2026', month='06', day='09'

const sticky = /\d+/y;
sticky.lastIndex = 7;
sticky.exec('Order: 42');
// => ['42']  — matched exactly at index 7

Common Regex Patterns

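Commonly used patterns for frequent validation and extraction tasks. Most are deliberately simplified (the email and IPv4 patterns, for example, accept some invalid values), so paste them into the Java Regex Tester and verify matches against your own data before shipping.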

Use Case	Pattern
Email address	`^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$`
URL (HTTP/HTTPS)	`https?://[^\s/$.?#].[^\s]*`
IPv4 address	`\b(?:\d{1,3}\.){3}\d{1,3}\b`
IPv6 (simplified)	`(?:[0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}`
Date YYYY-MM-DD	`\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])`
US phone number	`\(?\d{3}\)?[-.\s]\d{3}[-.\s]\d{4}`
Hex colour	`#(?:[0-9a-fA-F]{3}){1,2}\b`
Slug (URL-safe)	`^[a-z0-9]+(?:-[a-z0-9]+)*$`
Positive integer	`^[1-9]\d*$`
HTML tag (basic)	`<([a-z][a-z0-9]*)\b[^>]*>.*?</\1>`
UUID v4	`[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}`
Semantic version	`\bv?\d+\.\d+\.\d+\b`
JWT token	`^[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+$`
Whitespace-only string	`^\s*$`
C-style block comment	`/\*[\s\S]*?\*/`

import re

email_re = re.compile(r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$')
email_re.match('user@example.com')  ## => Match object
email_re.match('not-an-email')      ## => None

css = "color: #fff; background: #1a2b3c; border: 1px solid #aabbcc;"
re.findall(r'#(?:[0-9a-fA-F]{3}){1,2}\b', css)
## => ['#fff', '#1a2b3c', '#aabbcc']

changelog = "Released v1.2.3; fixed bug from v1.2.1; target is v2.0.0"
re.findall(r'\bv?\d+\.\d+\.\d+\b', changelog)
## => ['v1.2.3', 'v1.2.1', 'v2.0.0']
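The IPv4 pattern in the table checks only the shape of an address, so it also accepts values such as `999.999.999.999`. One possible way to tighten it, sketched here with a hypothetical `is_ipv4` helper, is to let the regex check the shape and plain Python check the numeric range:

```python
import re

# Shape only: four dot-separated groups of 1-3 ASCII digits.
IPV4_SHAPE = re.compile(r'(?:\d{1,3}\.){3}\d{1,3}', re.ASCII)

def is_ipv4(text: str) -> bool:
    """Hypothetical helper: regex for the shape, integer comparison for the range."""
    if not IPV4_SHAPE.fullmatch(text):
        return False
    return all(0 <= int(octet) <= 255 for octet in text.split('.'))

print(is_ipv4('192.168.0.1'))      # => True
print(is_ipv4('999.999.999.999'))  # => False
print(is_ipv4('1.2.3'))            # => False
```

Keeping the range check outside the pattern keeps the regex short; a pure-regex version is possible but noticeably harder to read.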

Regex in Python: re Module Reference

The Python re module is the standard library’s full regex API. Always use raw strings (r'…') for patterns to avoid double-escaping backslashes.

Function / Method	What it returns	Notes
`re.search(pat, s)`	First match anywhere	Returns `None` if no match
`re.match(pat, s)`	Match at start of string only	Does not scan the whole string
`re.fullmatch(pat, s)`	Match spanning entire string	Strictest option for validation
`re.findall(pat, s)`	List of all non-overlapping matches	Returns list of strings or tuples
`re.finditer(pat, s)`	Iterator of Match objects	Use when you need `.start()` / `.end()`
`re.sub(pat, repl, s)`	String with substitutions	`repl` can be a string or callable
`re.subn(pat, repl, s)`	`(new_string, count)` tuple	Count = number of replacements made
`re.split(pat, s)`	List of substrings	Capturing groups appear in the result
`re.compile(pat, flags)`	Compiled Pattern object	Reuse for performance-critical loops
`m.group(n)`	Captured group n (0 = full match)	`None` if group did not participate
`m.groups()`	All captured groups as tuple
`m.groupdict()`	Named groups as `{name: value}` dict
`m.start()` / `m.end()`	Start / end position in string	Integer index
`m.span()`	`(start, end)` tuple	Equivalent to `(m.start(), m.end())`

import re

text = "2026-06-09: Released version 2.1.4"

for m in re.finditer(r'\d+', text):
    print(f"'{m.group()}' at {m.span()}")
## '2026' at (0, 4)
## '06' at (5, 7)  ... etc.

def bump(m):
    return str(int(m.group()) + 1)

re.sub(r'\b\d+\b', bump, 'a=1 b=2 c=3')
## => 'a=2 b=3 c=4'

re.split(r'[,;\s]+', 'one, two;three  four')
## => ['one', 'two', 'three', 'four']

slug_re = re.compile(r'^[a-z0-9]+(?:-[a-z0-9]+)*$')
[s for s in ['hello-world', 'Bad Slug', 'ok-123'] if slug_re.match(s)]
## => ['hello-world', 'ok-123']
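Three behaviours from the table are easy to miss: `re.subn` reports the replacement count, `re.split` keeps captured separators, and a group that did not participate comes back as `None`. A short sketch with made-up inputs:

```python
import re

# subn returns the new string together with the number of replacements
new_text, count = re.subn(r'\s+', ' ', 'too   many    spaces')
print(new_text, count)  # => too many spaces 2

# With a capturing group, the separators are kept in the result
with_ops = re.split(r'\s*([+-])\s*', '1 + 2 - 3')
without_ops = re.split(r'\s*[+-]\s*', '1 + 2 - 3')
print(with_ops)     # => ['1', '+', '2', '-', '3']
print(without_ops)  # => ['1', '2', '3']

# A group on the unused side of an alternation yields None
groups = re.match(r'(a)|(b)', 'b').groups()
print(groups)  # => (None, 'b')
```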

For more Python string operations that complement regex, see Python String Methods Cheat Sheet: split, join, replace & More.

Regex in JavaScript

JavaScript regex uses literal syntax /pattern/flags or new RegExp('pattern', 'flags'). The MDN Regular Expressions guide covers every detail of the spec.

Method	Called on	What it does
`str.match(re)`	String	First match (or all with `/g`), returns array or `null`
`str.matchAll(re)`	String	Iterator of all Match objects — requires `/g` flag
`str.search(re)`	String	Index of first match, or `-1`
`str.replace(re, sub)`	String	Replaces first match (or all with `/g`)
`str.replaceAll(re, sub)`	String	Replaces all matches — requires `/g` or a string
`str.split(re)`	String	Splits on each match, returns array
`re.test(str)`	RegExp	`true` if the pattern matches anywhere
`re.exec(str)`	RegExp	Next match object (stateful with `/g` or `/y`)

// Named groups (ES2018+) with destructuring
const dateRe = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const { groups: { year, month, day } } = '2026-06-09'.match(dateRe);
// year='2026', month='06', day='09'

// replace with a callback function (the /g flag makes it replace every match)
const result = 'a=1, b=2, c=3'.replace(/(\w+)=(\d+)/g, (_, k, v) => `${k}=${+v * 10}`);
// => 'a=10, b=20, c=30'

// matchAll: iterate all matches and extract groups
const str = 'cat bat hat mat';
const matches = [...str.matchAll(/(?<word>[cbhm]at)/g)];
matches.map(m => m.groups.word);
// => ['cat', 'bat', 'hat', 'mat']

// test() for fast boolean validation
const emailRe = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;
emailRe.test('user@example.com');  // true
emailRe.test('not-an-email');      // false

For a deeper look at JavaScript’s native regex capabilities, see “What is Regex Pattern Checking in JavaScript?” If you use regex in shell scripts or with CLI tools like grep, sed, and awk, the Linux Commands Cheat Sheet has the flags and syntax for POSIX and extended regex modes.

Escaping Special Characters

These characters carry special meaning in regex syntax and must be escaped with a backslash when you want to match them literally.

Character	Escaped Form	Normal Meaning in Regex
`.`	`\.`	Match any character
`*`	`\*`	0-or-more quantifier
`+`	`\+`	1-or-more quantifier
`?`	`\?`	0-or-1 quantifier / lazy modifier
`(`	`\(`	Open capturing group
`)`	`\)`	Close capturing group
`[`	`\[`	Open character class
`]`	`\]`	Close character class
`{`	`\{`	Open repetition count
`}`	`\}`	Close repetition count
`^`	`\^`	Start anchor / negation inside `[…]`
`$`	`\$`	End anchor
`|`	`\|`	Alternation operator
`\`	`\\`	Backslash itself
`/`	`\/`	Pattern delimiter in JavaScript literals

import re

re.escape('3.14 * x^2')
## => '3\\.14\\ \\*\\ x\\^2'

pattern = re.compile(re.escape('3.14 * x^2'))
pattern.search("result: 3.14 * x^2 + 1")
## => Match object — safe to use with user-supplied text

def is_valid_regex(s):
    try:
        re.compile(s)
        return True
    except re.error:
        return False

is_valid_regex(r'\d+')       ## => True
is_valid_regex(r'[unclosed') ## => False
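`re.escape` is most useful when a literal, possibly user-supplied, term has to be embedded in a larger pattern. The sketch below uses a hypothetical `count_whole_word` helper and made-up text to show the difference escaping makes:

```python
import re

def count_whole_word(term: str, text: str) -> int:
    """Hypothetical helper: count occurrences of `term` as literal text."""
    # Lookarounds stand in for \b so terms that start or end with
    # punctuation still work; re.escape neutralises metacharacters in `term`.
    pattern = re.compile(r'(?<!\w)' + re.escape(term) + r'(?!\w)', re.IGNORECASE)
    return len(pattern.findall(text))

text = 'pi is 3.14, not 3x14'

print(count_whole_word('3.14', text))  # => 1
print(re.findall('3.14', text))        # => ['3.14', '3x14']  (unescaped '.' matches 'x')
```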
