Regular Expressions

Why This Matters - Pattern Matching as a Production Superpower

💡 Real-world Context: Regular expressions are the unsung heroes of production systems. From validating user input in registration forms to parsing log files for security monitoring, from routing URLs in web frameworks to extracting data from text streams - pattern matching silently powers countless critical operations that keep applications running smoothly.

⚠️ Production Reality: Poor regex usage causes real system failures:

  • Security Vulnerabilities: Catastrophic backtracking allowing DoS attacks (ReDoS)
  • Performance Disasters: Regular expressions that take exponential time on certain inputs
  • Data Loss: Malformed patterns silently dropping valid data during extraction
  • Maintenance Hell: Complex patterns that no one can understand or modify
  • False Positives/Negatives: Validation patterns that block legitimate users or allow malicious input
  • Integration Failures: Incompatible regex flavors breaking when migrating between systems

Go's RE2-based regexp package provides linear-time guarantees, preventing catastrophic backtracking and making regex safe for production use.

Learning Objectives

By the end of this article, you will:

  • Master regex syntax with production-ready patterns for common use cases
  • Implement efficient compilation and caching strategies for high-performance systems
  • Build validation systems with clear error messages and edge case handling
  • Create extractors for structured data parsing
  • Understand Go's regex limitations and when to use alternatives
  • Develop security-aware pattern matching that prevents ReDoS attacks
  • Build maintainable regex with documentation and testing strategies
  • Integrate regex with real-world applications efficiently

Core Concepts - Understanding the Pattern Language

The Foundation: Literals, Wildcards, and Quantifiers

Regular expressions are a declarative language for describing text patterns. Think of them as blueprints for text matching rather than imperative code that searches character by character.

 1package main
 2
 3import (
 4    "fmt"
 5    "regexp"
 6)
 7
 8func main() {
 9    // Literals match exact characters
10    literalRe := regexp.MustCompile(`hello`)
11    fmt.Printf("Literal 'hello' in 'hello world': %v\n", literalRe.MatchString("hello world"))
12    fmt.Printf("Literal 'hello' in 'Hello world': %v\n", literalRe.MatchString("Hello world"))
13
14    // Wildcard . matches any single character except newline
15    wildcardRe := regexp.MustCompile(`h.llo`)
16    fmt.Printf("\nWildcard 'h.llo' in 'hallo': %v\n", wildcardRe.MatchString("hallo"))
17    fmt.Printf("Wildcard 'h.llo' in 'h?llo': %v\n", wildcardRe.MatchString("h?llo"))
18    fmt.Printf("Wildcard 'h.llo' in 'hello': %v\n", wildcardRe.MatchString("hello"))
19
20    // Quantifiers specify repetition
21    // * = zero or more, + = one or more, ? = zero or one, {n,m} = between n and m
22    quantifierRe := regexp.MustCompile(`colou?r`)
23    fmt.Printf("\nQuantifier 'colou?r' matches 'color': %v\n", quantifierRe.MatchString("color"))
24    fmt.Printf("Quantifier 'colou?r' matches 'colour': %v\n", quantifierRe.MatchString("colour"))
25    fmt.Printf("Quantifier 'colou?r' matches 'colouur': %v\n", quantifierRe.MatchString("colouur"))
26
27    // Plus quantifier (one or more)
28    plusRe := regexp.MustCompile(`go+gle`)
29    fmt.Printf("\nPlus 'go+gle' matches 'gogle': %v\n", plusRe.MatchString("gogle"))
30    fmt.Printf("Plus 'go+gle' matches 'google': %v\n", plusRe.MatchString("google"))
31    fmt.Printf("Plus 'go+gle' matches 'gooogle': %v\n", plusRe.MatchString("gooogle"))
32
33    // Star quantifier (zero or more)
34    starRe := regexp.MustCompile(`go*gle`)
35    fmt.Printf("\nStar 'go*gle' matches 'ggle': %v\n", starRe.MatchString("ggle"))
36    fmt.Printf("Star 'go*gle' matches 'google': %v\n", starRe.MatchString("google"))
37
38    // Specific repetition counts
39    countRe := regexp.MustCompile(`\d{3}-\d{4}`)
40    fmt.Printf("\nCount pattern '\\d{3}-\\d{4}' matches '555-1234': %v\n", countRe.MatchString("555-1234"))
41    fmt.Printf("Count pattern '\\d{3}-\\d{4}' matches '55-1234': %v\n", countRe.MatchString("55-1234"))
42
43    // Character classes match sets of characters
44    classRe := regexp.MustCompile(`[aeiou]`)
45    fmt.Printf("\nVowel class in 'hello': %v\n", classRe.MatchString("hello"))
46    fmt.Printf("Vowel class in 'rhythm': %v\n", classRe.MatchString("rhythm"))
47
48    // Negated character classes
49    negatedRe := regexp.MustCompile(`[^aeiou]`)
50    fmt.Printf("\nNon-vowel class in 'hello': %v\n", negatedRe.MatchString("hello"))
51}
52// run

💡 Key Insight: Regular expressions are greedy by default - they match as much as possible while still allowing the overall pattern to match. This can cause unexpected behavior if not carefully controlled.
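
A small sketch of that default behavior (the dedicated greedy vs non-greedy section later covers it in depth): with the input below, the greedy form runs to the last possible b, while the lazy form stops at the first one.

package main

import (
    "fmt"
    "regexp"
)

func main() {
    input := "a1b2b3b"

    // Greedy: .* takes as much as possible while still letting the final b match.
    fmt.Println(regexp.MustCompile(`a.*b`).FindString(input)) // a1b2b3b

    // Lazy: .*? stops at the first b that completes the match.
    fmt.Println(regexp.MustCompile(`a.*?b`).FindString(input)) // a1b
}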

Character Classes and Predefined Sets

Character classes are one of the most powerful features in regex, allowing you to match specific sets of characters efficiently.

 1package main
 2
 3import (
 4    "fmt"
 5    "regexp"
 6)
 7
 8func demonstrateCharacterClasses() {
 9    fmt.Println("=== Predefined Character Classes ===")
10
11    // \d = digits [0-9]
12    digitRe := regexp.MustCompile(`\d+`)
13    fmt.Printf("Digits in 'abc123def456': %v\n", digitRe.FindAllString("abc123def456", -1))
14
15    // \D = non-digits [^0-9]
16    nonDigitRe := regexp.MustCompile(`\D+`)
17    fmt.Printf("Non-digits in 'abc123def456': %v\n", nonDigitRe.FindAllString("abc123def456", -1))
18
19    // \w = word characters [A-Za-z0-9_]
20    wordRe := regexp.MustCompile(`\w+`)
21    fmt.Printf("Words in 'hello_world 123!': %v\n", wordRe.FindAllString("hello_world 123!", -1))
22
23    // \W = non-word characters [^A-Za-z0-9_]
24    nonWordRe := regexp.MustCompile(`\W+`)
25    fmt.Printf("Non-words in 'hello_world 123!': %v\n", nonWordRe.FindAllString("hello_world 123!", -1))
26
27    // \s = whitespace [ \t\n\r\f]
28    spaceRe := regexp.MustCompile(`\s+`)
29    fmt.Printf("Whitespace in 'hello   world\\n': %v\n", spaceRe.FindAllString("hello   world\n", -1))
30
31    // \S = non-whitespace [^ \t\n\r\f]
32    nonSpaceRe := regexp.MustCompile(`\S+`)
33    fmt.Printf("Non-whitespace in 'hello   world': %v\n", nonSpaceRe.FindAllString("hello   world", -1))
34
35    fmt.Println("\n=== Custom Character Classes ===")
36
37    // Range-based classes
38    lowerRe := regexp.MustCompile(`[a-z]+`)
39    fmt.Printf("Lowercase in 'Hello World': %v\n", lowerRe.FindAllString("Hello World", -1))
40
41    upperRe := regexp.MustCompile(`[A-Z]+`)
42    fmt.Printf("Uppercase in 'Hello World': %v\n", upperRe.FindAllString("Hello World", -1))
43
44    // Multiple ranges
45    alphanumericRe := regexp.MustCompile(`[A-Za-z0-9]+`)
46    fmt.Printf("Alphanumeric in 'Test123!': %v\n", alphanumericRe.FindAllString("Test123!", -1))
47
48    // Negated classes
49    notVowelRe := regexp.MustCompile(`[^aeiouAEIOU]`)
50    fmt.Printf("First non-vowel in 'apple': %s\n", notVowelRe.FindString("apple"))
51
52    // Special characters in classes (need escaping)
53    specialRe := regexp.MustCompile(`[\[\]{}()]`)
54    fmt.Printf("Brackets in 'test[123]{456}': %v\n", specialRe.FindAllString("test[123]{456}", -1))
55}
56
57func main() {
58    demonstrateCharacterClasses()
59}
60// run

💡 Production Pattern: Use character classes instead of alternation when possible. [abc] is more efficient than (a|b|c).
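
One practical difference behind that advice, sketched below: (a|b|c) also introduces a capturing group, which shifts submatch indices and adds bookkeeping, while [abc] (or a non-capturing (?:a|b|c)) does not.

package main

import (
    "fmt"
    "regexp"
)

func main() {
    input := "cat"

    classRe := regexp.MustCompile(`[abc]at`)
    altRe := regexp.MustCompile(`(a|b|c)at`)

    // The character class version has no capture groups...
    fmt.Println(classRe.FindStringSubmatch(input)) // [cat]

    // ...while the alternation version also captures whichever alternative matched.
    fmt.Println(altRe.FindStringSubmatch(input)) // [cat c]

    // If you only need grouping, (?:...) avoids the extra capture.
    fmt.Println(regexp.MustCompile(`(?:a|b|c)at`).FindStringSubmatch(input)) // [cat]
}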

Anchors and Word Boundaries

Anchors don't match characters - they match positions in the string. This is crucial for precise pattern matching.

🎯 Production Pattern: Use anchors to prevent partial matches that could cause security issues.

 1package main
 2
 3import (
 4    "fmt"
 5    "regexp"
 6)
 7
 8func demonstrateAnchors() {
 9    fmt.Println("=== Start and End Anchors ===")
10
11    // Without anchors - matches anywhere
12    partialRe := regexp.MustCompile(`admin`)
13    text := "useradmin123"
14    fmt.Printf("Partial match '%s' in '%s': %v\n", "admin", text, partialRe.MatchString(text))
15
16    // Start anchor ^ - matches at beginning
17    startRe := regexp.MustCompile(`^admin`)
18    fmt.Printf("Start anchor '^admin' in '%s': %v\n", text, startRe.MatchString(text))
19    fmt.Printf("Start anchor '^admin' in 'admin123': %v\n", startRe.MatchString("admin123"))
20
21    // End anchor $ - matches at end
22    endRe := regexp.MustCompile(`admin$`)
23    fmt.Printf("End anchor 'admin$' in '%s': %v\n", text, endRe.MatchString(text))
24    fmt.Printf("End anchor 'admin$' in 'useradmin': %v\n", endRe.MatchString("useradmin"))
25
26    // Both anchors - exact match only
27    exactRe := regexp.MustCompile(`^admin$`)
28    fmt.Printf("Exact match '^admin$' in '%s': %v\n", text, exactRe.MatchString(text))
29    fmt.Printf("Exact match '^admin$' in 'admin': %v\n", exactRe.MatchString("admin"))
30
31    fmt.Println("\n=== Word Boundaries ===")
32
33    // Word boundaries - matches whole words only
34    wordRe := regexp.MustCompile(`\badmin\b`)
35    text2 := "admin user and superadmin"
36    matches := wordRe.FindAllString(text2, -1)
37    fmt.Printf("Word boundary '\\badmin\\b' in '%s': %v\n", text2, matches)
38
39    // Extract whole words
40    wholeWordRe := regexp.MustCompile(`\b\w+\b`)
41    text3 := "hello, world! how are you?"
42    words := wholeWordRe.FindAllString(text3, -1)
43    fmt.Printf("All words in '%s': %v\n", text3, words)
44
45    fmt.Println("\n=== Email Validation with Anchors ===")
46
47    // Email validation with word boundaries
48    emailRe := regexp.MustCompile(`\b[\w.%+-]+@[\w.-]+\.[a-z]{2,}\b`)
49    text4 := "Contact user@example.com or admin@test.org for help"
50    emails := emailRe.FindAllString(text4, -1)
51    fmt.Printf("Emails found in '%s': %v\n", text4, emails)
52
53    // Without word boundaries (matches may bleed into adjacent word characters)
54    badEmailRe := regexp.MustCompile(`[\w.%+-]+@[\w.-]+\.[a-z]{2,}`)
55    text5 := "email:user@example.com,admin@test.org"
56    badEmails := badEmailRe.FindAllString(text5, -1)
57    fmt.Printf("Matches without word boundaries: %v\n", badEmails)
58}
59
60func main() {
61    demonstrateAnchors()
62}
63// run

⚠️ Security Warning: Always use anchors or word boundaries for validation to prevent partial matches that could bypass security checks. For example, ^admin$ ensures exact match, while admin would match "administrator", "admins", "useradmin", etc.
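
A compact check of that warning, using a hypothetical role string: the unanchored pattern accepts anything that merely contains "admin", while the anchored version accepts only the exact value.

package main

import (
    "fmt"
    "regexp"
)

var (
    looseRe = regexp.MustCompile(`admin`)   // substring match anywhere
    exactRe = regexp.MustCompile(`^admin$`) // entire string must equal "admin"
)

func main() {
    for _, role := range []string{"admin", "administrator", "useradmin", "admins"} {
        fmt.Printf("%-14s loose=%v exact=%v\n",
            role, looseRe.MatchString(role), exactRe.MatchString(role))
    }
}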

Capture Groups and Extraction

Capture groups are parentheses that not only group patterns but also capture the matched text for extraction.

 1package main
 2
 3import (
 4    "fmt"
 5    "regexp"
 6)
 7
 8func demonstrateCaptureGroups() {
 9    fmt.Println("=== Basic Capture Groups ===")
10
11    // Simple capture
12    dateRe := regexp.MustCompile(`(\d{4})-(\d{2})-(\d{2})`)
13    date := "Today is 2024-03-15"
14    matches := dateRe.FindStringSubmatch(date)
15    fmt.Printf("Full match: %s\n", matches[0])
16    fmt.Printf("Year: %s, Month: %s, Day: %s\n", matches[1], matches[2], matches[3])
17
18    fmt.Println("\n=== Named Capture Groups ===")
19
20    // Named groups for clarity
21    namedDateRe := regexp.MustCompile(`(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})`)
22    namedMatches := namedDateRe.FindStringSubmatch(date)
23    names := namedDateRe.SubexpNames()
24
25    result := make(map[string]string)
26    for i, name := range names {
27        if i > 0 && name != "" {
28            result[name] = namedMatches[i]
29        }
30    }
31
32    fmt.Printf("Named captures: %v\n", result)
33
34    fmt.Println("\n=== Non-Capturing Groups ===")
35
36    // Non-capturing groups (?:...) for grouping without capture
37    urlRe := regexp.MustCompile(`^(?:https?://)?([^/]+)(/.*)?$`)
38    url := "https://example.com/path/to/page"
39    urlMatches := urlRe.FindStringSubmatch(url)
40    fmt.Printf("Domain: %s\n", urlMatches[1])
41    if len(urlMatches) > 2 {
42        fmt.Printf("Path: %s\n", urlMatches[2])
43    }
44
45    fmt.Println("\n=== Multiple Matches with Groups ===")
46
47    // Find all emails with user and domain parts
48    emailRe := regexp.MustCompile(`(\w+)@([\w.]+)`)
49    text := "Contact: john@example.com or jane@test.org"
50    allMatches := emailRe.FindAllStringSubmatch(text, -1)
51
52    for i, match := range allMatches {
53        fmt.Printf("Email %d: Full=%s, User=%s, Domain=%s\n",
54            i+1, match[0], match[1], match[2])
55    }
56
57    fmt.Println("\n=== Extracting Structured Data ===")
58
59    // Parse log entries
60    logRe := regexp.MustCompile(`(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[(?P<level>\w+)\] (?P<message>.*)`)
61    logLine := "2024-03-15 10:30:45 [ERROR] Database connection failed"
62    logMatches := logRe.FindStringSubmatch(logLine)
63    logNames := logRe.SubexpNames()
64
65    logEntry := make(map[string]string)
66    for i, name := range logNames {
67        if i > 0 && name != "" && i < len(logMatches) {
68            logEntry[name] = logMatches[i]
69        }
70    }
71
72    fmt.Printf("Parsed log entry:\n")
73    fmt.Printf("  Timestamp: %s\n", logEntry["timestamp"])
74    fmt.Printf("  Level: %s\n", logEntry["level"])
75    fmt.Printf("  Message: %s\n", logEntry["message"])
76}
77
78func main() {
79    demonstrateCaptureGroups()
80}
81// run

💡 Production Pattern: Use named capture groups (?P<name>...) for complex patterns to make code maintainable and self-documenting.
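
Because the SubexpNames loop above tends to repeat at every call site, many codebases wrap it in a small helper. A sketch (the namedGroups name is illustrative, not a standard-library function):

package main

import (
    "fmt"
    "regexp"
)

// namedGroups returns the named capture groups of the first match of re in s,
// or nil if the pattern does not match.
func namedGroups(re *regexp.Regexp, s string) map[string]string {
    match := re.FindStringSubmatch(s)
    if match == nil {
        return nil
    }

    groups := make(map[string]string)
    for i, name := range re.SubexpNames() {
        if i > 0 && name != "" {
            groups[name] = match[i]
        }
    }
    return groups
}

func main() {
    logRe := regexp.MustCompile(`(?P<level>\w+): (?P<message>.+)`)
    fmt.Println(namedGroups(logRe, "ERROR: disk full"))
    // map[level:ERROR message:disk full]
}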

Practical Examples - From Patterns to Production Code

Email Validation with Clear Error Messages

🎯 Production Pattern: Create comprehensive validation that provides specific feedback for common mistakes.

  1package main
  2
  3import (
  4    "fmt"
  5    "regexp"
  6    "strings"
  7)
  8
  9type EmailValidator struct {
 10    pattern *regexp.Regexp
 11}
 12
 13func NewEmailValidator() *EmailValidator {
 14    // Pattern explanation:
 15    // ^                  - Start of string
 16    // [\w.%+-]+          - Local part: word chars, dots, percents, pluses, hyphens
 17    // @                  - Literal @
 18    // [\w.-]+            - Domain: word chars, dots, hyphens
 19    // \.                 - Literal dot before TLD
 20    // [a-z]{2,}         - TLD: 2+ letters
 21    // $                  - End of string
 22    pattern := regexp.MustCompile(`^[\w.%+-]+@[\w.-]+\.[a-z]{2,}$`)
 23
 24    return &EmailValidator{pattern: pattern}
 25}
 26
 27func (ev *EmailValidator) ValidateWithFeedback(email string) (bool, []string) {
 28    var errors []string
 29
 30    // Normalize
 31    email = strings.TrimSpace(strings.ToLower(email))
 32
 33    // Length checks first
 34    if len(email) < 5 {
 35        errors = append(errors, "Email too short (minimum 5 characters)")
 36    }
 37    if len(email) > 254 {
 38        errors = append(errors, "Email too long (maximum 254 characters)")
 39    }
 40
 41    // Character checks
 42    atCount := strings.Count(email, "@")
 43    if atCount == 0 {
 44        errors = append(errors, "Email must contain @ symbol")
 45    } else if atCount > 1 {
 46        errors = append(errors, "Email must contain exactly one @ symbol")
 47    }
 48
 49    // Format validation
 50    if !ev.pattern.MatchString(email) {
 51        errors = append(errors, "Invalid email format")
 52    }
 53
 54    // The specific checks below run even when the broad pattern matches,
 55    // because [\w.%+-]+ still permits leading, trailing, and consecutive
 56    // dots that valid addresses should not contain.
 57    if strings.Contains(email, "..") {
 58        errors = append(errors, "Cannot contain consecutive dots")
 59    }
 60    if strings.HasPrefix(email, ".") {
 61        errors = append(errors, "Cannot start with a dot")
 62    }
 63    if strings.HasSuffix(email, ".") {
 64        errors = append(errors, "Cannot end with a dot")
 65    }
 66    if strings.Contains(email, "@.") || strings.Contains(email, ".@") {
 67        errors = append(errors, "Dot cannot be adjacent to @ symbol")
 68    }
 69    if strings.Contains(email, " ") {
 70        errors = append(errors, "Email cannot contain spaces")
 71    }
 72
 73    // Reject characters outside the allowed set
 74    if regexp.MustCompile(`[^A-Za-z0-9._%+\-@]`).MatchString(email) {
 75        errors = append(errors, "Email contains invalid characters")
 76    }
 77
 78    // Domain-specific checks
 79    if strings.Contains(email, "@") {
 80        parts := strings.Split(email, "@")
 81        domain := parts[1]
 82
 83        // Check for valid TLD
 84        if !regexp.MustCompile(`\.[a-z]{2,}$`).MatchString(domain) {
 85            errors = append(errors, "Domain must have a valid top-level domain (e.g., .com, .org)")
 86        }
 87
 88        // Check local part length
 89        localPart := parts[0]
 90        if len(localPart) > 64 {
 91            errors = append(errors, "Local part (before @) exceeds 64 characters")
 92        }
 93        if len(localPart) == 0 {
 94            errors = append(errors, "Local part (before @) cannot be empty")
 95        }
 96    }
 97
 98    return len(errors) == 0, errors
 99}
100
101func main() {
102    validator := NewEmailValidator()
103
104    testEmails := []string{
105        "valid@example.com",
106        "user.name+tag@example.co.uk",
107        "invalid",                   // Missing @ and domain
108        "user@",                    // Missing domain
109        "@example.com",             // Missing local part
110        "user..name@example.com",   // Consecutive dots
111        "user@.example.com",        // Starts with dot
112        ".user@example.com",        // Local part starts with dot
113        "user@example",             // Missing TLD
114        "user name@example.com",    // Contains space
115        "user@@example.com",        // Multiple @ symbols
116        "a@b.c",                   // Too short domain
117        strings.Repeat("a", 65) + "@example.com", // Local part too long
118    }
119
120    for _, email := range testEmails {
121        fmt.Printf("\nEmail: %s\n", email)
122        valid, errors := validator.ValidateWithFeedback(email)
123        if valid {
124            fmt.Println("  ✓ Valid")
125        } else {
126            fmt.Println("  ✗ Invalid:")
127            for _, err := range errors {
128                fmt.Printf("    - %s\n", err)
129            }
130        }
131    }
132}
133// run

Log File Parser with Multiple Formats

🎯 Production Pattern: Build flexible parsers that handle multiple log formats without breaking on unknown patterns.

  1package main
  2
  3import (
  4    "fmt"
  5    "regexp"
  6    "strings"
  7    "time"
  8)
  9
 10type LogEntry struct {
 11    Timestamp time.Time
 12    Level     string
 13    Message   string
 14    Component string
 15    UserID    string
 16    RequestID string
 17}
 18
 19type LogParser struct {
 20    patterns []*regexp.Regexp
 21}
 22
 23func NewLogParser() *LogParser {
 24    // Multiple log formats commonly found in production
 25    patterns := []string{
 26        // Apache/Nginx style: IP - - [timestamp] "method path" status size
 27        `^(?P<ip>[\d\.]+) - - \[(?P<timestamp>[^\]]+)\] "(?P<method>\w+) (?P<path>[^"]+)" (?P<status>\d+) (?P<size>\d+)$`,
 28
 29        // JSON style: {"timestamp":"...","level":"...","message":"..."}
 30        `^{.*?"timestamp":"(?P<timestamp>[^"]+)".*?"level":"(?P<level>[^"]+)".*?"message":"(?P<message>[^"]+)".*}$`,
 31
 32        // Structured style: timestamp [level] component: message
 33        `^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[(?P<level>\w+)\] (?P<component>\w+): (?P<message>.*)$`,
 34
 35        // Application style: timestamp|level|user|request_id|message
 36        `^(?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}[^|]*)\|(?P<level>\w+)\|(?P<user>[^|]*)\|(?P<request_id>[^|]*)\|(?P<message>.*)$`,
 37
 38        // Syslog style: timestamp hostname service[pid]: message
 39        `^(?P<timestamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) (?P<hostname>\S+) (?P<service>\w+)\[(?P<pid>\d+)\]: (?P<message>.*)$`,
 40    }
 41
 42    compiled := make([]*regexp.Regexp, len(patterns))
 43    for i, pattern := range patterns {
 44        compiled[i] = regexp.MustCompile(pattern)
 45    }
 46
 47    return &LogParser{patterns: compiled}
 48}
 49
 50func (lp *LogParser) Parse(line string) (*LogEntry, error) {
 51    for _, re := range lp.patterns {
 52        matches := re.FindStringSubmatch(line)
 53        if matches == nil {
 54            continue
 55        }
 56
 57        entry := &LogEntry{}
 58        names := re.SubexpNames()
 59
 60        for i, name := range names {
 61            if i == 0 || name == "" {
 62                continue
 63            }
 64
 65            value := matches[i]
 66            switch name {
 67            case "timestamp":
 68                // Try different timestamp formats
 69                formats := []string{
 70                    "2006-01-02 15:04:05",
 71                    "2006-01-02T15:04:05Z07:00",
 72                    time.RFC3339,
 73                    "02/Jan/2006:15:04:05 -0700",
 74                    "Jan  2 15:04:05",
 75                }
 76                for _, format := range formats {
 77                    if t, err := time.Parse(format, value); err == nil {
 78                        entry.Timestamp = t
 79                        break
 80                    }
 81                }
 82                if entry.Timestamp.IsZero() {
 83                    entry.Timestamp = time.Now() // Fallback
 84                }
 85            case "level":
 86                entry.Level = strings.ToUpper(value)
 87            case "message":
 88                entry.Message = value
 89            case "component", "service":
 90                entry.Component = value
 91            case "user":
 92                entry.UserID = value
 93            case "request_id":
 94                entry.RequestID = value
 95            }
 96        }
 97
 98        return entry, nil
 99    }
100
101    // No pattern matched - create basic entry
102    return &LogEntry{
103        Timestamp: time.Now(),
104        Level:     "UNKNOWN",
105        Message:   line,
106    }, fmt.Errorf("no pattern matched")
107}
108
109func main() {
110    parser := NewLogParser()
111
112    logLines := []string{
113        `192.168.1.1 - - [15/Oct/2024:10:30:45 +0000] "GET /api/users" 200 1234`,
114        `{"timestamp":"2024-10-15T10:30:45Z","level":"INFO","message":"User logged in"}`,
115        `2024-10-15 10:30:45 [INFO] auth: User authentication successful`,
116        `2024-10-15T10:30:45.123Z|INFO|user123|req_456|Payment processed successfully`,
117        `Oct 15 10:30:45 server app[12345]: Database connection established`,
118        `unstructured log line that doesn't match any pattern`,
119    }
120
121    for i, line := range logLines {
122        fmt.Printf("Line %d: %s\n", i+1, line)
123        entry, err := parser.Parse(line)
124        if err != nil {
125            fmt.Printf("  Parse warning: %v\n", err)
126        }
127
128        fmt.Printf("  Timestamp: %s\n", entry.Timestamp.Format("2006-01-02 15:04:05"))
129        fmt.Printf("  Level: %s\n", entry.Level)
130        if entry.Component != "" {
131            fmt.Printf("  Component: %s\n", entry.Component)
132        }
133        if entry.UserID != "" {
134            fmt.Printf("  User ID: %s\n", entry.UserID)
135        }
136        if entry.RequestID != "" {
137            fmt.Printf("  Request ID: %s\n", entry.RequestID)
138        }
139        fmt.Printf("  Message: %s\n", entry.Message)
140        fmt.Println()
141    }
142}
143// run

Security-Focused URL Parser

🎯 Production Pattern: Extract and validate URLs with security considerations to prevent SSRF and injection attacks.

  1package main
  2
  3import (
  4    "fmt"
  5    "regexp"
  6    "strconv"
  7    "strings"
  8)
  9
 10type URLInfo struct {
 11    Original string
 12    Protocol string
 13    Host     string
 14    Port     string
 15    Path     string
 16    Query    string
 17    Fragment string
 18    IsValid  bool
 19    Warnings []string
 20}
 21
 22type SecureURLParser struct {
 23    allowedProtocols []string
 24    blockedHosts     []string
 25    maxURLLength     int
 26}
 27
 28func NewSecureURLParser() *SecureURLParser {
 29    return &SecureURLParser{
 30        allowedProtocols: []string{"http", "https"},
 31        blockedHosts:     []string{"localhost", "127.0.0.1", "0.0.0.0", "169.254.169.254"}, // Including AWS metadata service
 32        maxURLLength:     2048,
 33    }
 34}
 35
 36func (sup *SecureURLParser) Parse(inputURL string) URLInfo {
 37    info := URLInfo{
 38        Original: inputURL,
 39        IsValid:  true,
 40        Warnings: []string{},
 41    }
 42
 43    // Length check
 44    if len(inputURL) > sup.maxURLLength {
 45        info.IsValid = false
 46        info.Warnings = append(info.Warnings, fmt.Sprintf("URL exceeds maximum length of %d characters", sup.maxURLLength))
 47        return info
 48    }
 49
 50    if len(inputURL) == 0 {
 51        info.IsValid = false
 52        info.Warnings = append(info.Warnings, "URL is empty")
 53        return info
 54    }
 55
 56    // Basic URL pattern with named groups
 57    urlPattern := regexp.MustCompile(`^(?P<protocol>https?)://(?P<host>[^:/\s]+)(?::(?P<port>\d+))?(?P<path>/[^\s?#]*)?(?:\?(?P<query>[^\s#]*))?(?:#(?P<fragment>[\w-]+))?$`)
 58
 59    matches := urlPattern.FindStringSubmatch(inputURL)
 60    if matches == nil {
 61        info.IsValid = false
 62        info.Warnings = append(info.Warnings, "Invalid URL format - must be http:// or https:// with valid structure")
 63        return info
 64    }
 65
 66    names := urlPattern.SubexpNames()
 67    for i, name := range names {
 68        if i == 0 || name == "" {
 69            continue
 70        }
 71
 72        value := matches[i]
 73        switch name {
 74        case "protocol":
 75            info.Protocol = value
 76            // Check allowed protocols
 77            allowed := false
 78            for _, protocol := range sup.allowedProtocols {
 79                if strings.EqualFold(value, protocol) {
 80                    allowed = true
 81                    break
 82                }
 83            }
 84            if !allowed {
 85                info.IsValid = false
 86                info.Warnings = append(info.Warnings, fmt.Sprintf("Protocol '%s' not allowed (only http, https)", value))
 87            }
 88
 89        case "host":
 90            info.Host = strings.ToLower(value)
 91
 92            // Check blocked hosts (SSRF protection)
 93            for _, blocked := range sup.blockedHosts {
 94                if strings.Contains(info.Host, blocked) {
 95                    info.IsValid = false
 96                    info.Warnings = append(info.Warnings, fmt.Sprintf("Host '%s' is blocked (potential SSRF vulnerability)", blocked))
 97                }
 98            }
 99
100            // Check for IP address ranges (private networks)
101            if sup.isPrivateIP(info.Host) {
102                info.IsValid = false
103                info.Warnings = append(info.Warnings, "Private IP addresses are blocked")
104            }
105
106            // Check for suspicious patterns
107            if strings.Contains(value, "..") {
108                info.IsValid = false
109                info.Warnings = append(info.Warnings, "Host contains path traversal pattern (..)")
110            }
111
112            // Validate hostname format
113            if !regexp.MustCompile(`^[a-zA-Z0-9]([a-zA-Z0-9-]*[a-zA-Z0-9])?(\.[a-zA-Z0-9]([a-zA-Z0-9-]*[a-zA-Z0-9])?)*$`).MatchString(info.Host) {
114                info.IsValid = false
115                info.Warnings = append(info.Warnings, "Invalid hostname format")
116            }
117
118        case "port":
119            info.Port = value
120            if value != "" {
121                port, err := strconv.Atoi(value)
122                if err != nil || port < 1 || port > 65535 {
123                    info.IsValid = false
124                    info.Warnings = append(info.Warnings, fmt.Sprintf("Invalid port: %s (must be 1-65535)", value))
125                }
126            }
127
128        case "path":
129            info.Path = value
130            // Security checks for path
131            if strings.Contains(value, "..") {
132                info.IsValid = false
133                info.Warnings = append(info.Warnings, "Path contains directory traversal (..)")
134            }
135            if regexp.MustCompile(`<script|javascript:|on\w+=`).MatchString(strings.ToLower(value)) {
136                info.IsValid = false
137                info.Warnings = append(info.Warnings, "Path contains potential XSS payload")
138            }
139            // Check for null bytes
140            if strings.Contains(value, "\x00") {
141                info.IsValid = false
142                info.Warnings = append(info.Warnings, "Path contains null byte")
143            }
144
145        case "query":
146            info.Query = value
147            // Check query string for common injection patterns
148            if regexp.MustCompile(`<script|javascript:|on\w+=`).MatchString(strings.ToLower(value)) {
149                info.IsValid = false
150                info.Warnings = append(info.Warnings, "Query string contains potential XSS payload")
151            }
152
153        case "fragment":
154            info.Fragment = value
155        }
156    }
157
158    return info
159}
160
161func (sup *SecureURLParser) isPrivateIP(host string) bool {
162    // Check for common private IP ranges
163    privateRanges := []string{
164        "10.",
165        "172.16.", "172.17.", "172.18.", "172.19.", "172.20.", "172.21.", "172.22.", "172.23.",
166        "172.24.", "172.25.", "172.26.", "172.27.", "172.28.", "172.29.", "172.30.", "172.31.",
167        "192.168.",
168    }
169
170    for _, prefix := range privateRanges {
171        if strings.HasPrefix(host, prefix) {
172            return true
173        }
174    }
175
176    return false
177}
178
179func main() {
180    parser := NewSecureURLParser()
181
182    testURLs := []string{
183        "https://example.com/path/to/resource",
184        "http://localhost:8080/admin",
185        "https://192.168.1.1/internal",
186        "https://example.com/../../etc/passwd",
187        "https://example.com/path?param=<script>alert('xss')</script>",
188        "https://169.254.169.254/latest/meta-data/", // AWS metadata
189        "https://example.com:99999/invalid-port",
190        "ftp://example.com/file",
191        strings.Repeat("a", 3000) + ".com",
192        "https://valid-domain.com/safe/path?id=123",
193    }
194
195    for _, testURL := range testURLs {
196        fmt.Printf("Parsing: %s\n", testURL)
197        info := parser.Parse(testURL)
198
199        if info.Protocol != "" {
200            fmt.Printf("  Protocol: %s\n", info.Protocol)
201        }
202        if info.Host != "" {
203            fmt.Printf("  Host: %s\n", info.Host)
204        }
205        if info.Port != "" {
206            fmt.Printf("  Port: %s\n", info.Port)
207        }
208        if info.Path != "" {
209            fmt.Printf("  Path: %s\n", info.Path)
210        }
211        fmt.Printf("  Valid: %v\n", info.IsValid)
212
213        if len(info.Warnings) > 0 {
214            fmt.Println("  Warnings:")
215            for _, warning := range info.Warnings {
216                fmt.Printf("    ⚠ %s\n", warning)
217            }
218        }
219        fmt.Println()
220    }
221}
222// run

Common Patterns and Pitfalls

Performance Optimization: Compile Once, Use Many Times

⚠️ Critical Performance Issue: Compiling regex patterns is expensive. Never compile in hot paths.

  1package main
  2
  3import (
  4    "fmt"
  5    "regexp"
  6    "sync"
  7    "time"
  8)
  9
 10// BAD: Compiling regex in loop
 11func badValidation(emails []string) []bool {
 12    results := make([]bool, len(emails))
 13
 14    start := time.Now()
 15    for i, email := range emails {
 16        // Compiles regex EVERY iteration - very slow!
 17        re := regexp.MustCompile(`^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$`)
 18        results[i] = re.MatchString(email)
 19    }
 20    duration := time.Since(start)
 21
 22    fmt.Printf("BAD approach: %v for %d emails\n", duration, len(emails))
 23    return results
 24}
 25
 26// GOOD: Pre-compile regex
 27func goodValidation(emails []string) []bool {
 28    // Compile once
 29    emailRegex := regexp.MustCompile(`^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$`)
 30    results := make([]bool, len(emails))
 31
 32    start := time.Now()
 33    for i, email := range emails {
 34        results[i] = emailRegex.MatchString(email) // Reuse compiled regex
 35    }
 36    duration := time.Since(start)
 37
 38    fmt.Printf("GOOD approach: %v for %d emails\n", duration, len(emails))
 39    return results
 40}
 41
 42// Production-ready validator with caching
 43type RegexCache struct {
 44    cache map[string]*regexp.Regexp
 45    mu    sync.RWMutex
 46}
 47
 48func NewRegexCache() *RegexCache {
 49    return &RegexCache{
 50        cache: make(map[string]*regexp.Regexp),
 51    }
 52}
 53
 54func (rc *RegexCache) Get(pattern string) (*regexp.Regexp, error) {
 55    // Check cache with read lock
 56    rc.mu.RLock()
 57    if re, exists := rc.cache[pattern]; exists {
 58        rc.mu.RUnlock()
 59        return re, nil
 60    }
 61    rc.mu.RUnlock()
 62
 63    // Compile and cache with write lock
 64    rc.mu.Lock()
 65    defer rc.mu.Unlock()
 66
 67    // Double-check after acquiring write lock
 68    if re, exists := rc.cache[pattern]; exists {
 69        return re, nil
 70    }
 71
 72    re, err := regexp.Compile(pattern)
 73    if err != nil {
 74        return nil, fmt.Errorf("invalid regex pattern '%s': %w", pattern, err)
 75    }
 76
 77    rc.cache[pattern] = re
 78    return re, nil
 79}
 80
 81// Best practice: Use package-level variables for common patterns
 82var (
 83    emailRegex = regexp.MustCompile(`^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$`)
 84    phoneRegex = regexp.MustCompile(`^\+?1?\d{10}$`)
 85    urlRegex   = regexp.MustCompile(`^https?://[^\s]+$`)
 86)
 87
 88func bestPracticeValidation(emails []string) []bool {
 89    results := make([]bool, len(emails))
 90
 91    start := time.Now()
 92    for i, email := range emails {
 93        results[i] = emailRegex.MatchString(email)
 94    }
 95    duration := time.Since(start)
 96
 97    fmt.Printf("BEST approach (package-level): %v for %d emails\n", duration, len(emails))
 98    return results
 99}
100
101func main() {
102    // Generate test emails
103    emails := make([]string, 1000)
104    for i := range emails {
105        emails[i] = fmt.Sprintf("user%d@example.com", i)
106    }
107
108    // Benchmark approaches
109    fmt.Println("=== Performance Comparison ===\n")
110
111    badValidation(emails[:10]) // Only 10 to avoid slowness
112    goodValidation(emails)
113    bestPracticeValidation(emails)
114
115    // Demonstrate cache usage
116    fmt.Println("\n=== Regex Cache Example ===")
117    cache := NewRegexCache()
118
119    patterns := []string{
120        `^\+?1?\d{10}$`,
121        `^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$`,
122        `^\+?1?\d{10}$`, // Duplicate - will use cached version
123    }
124
125    for i, pattern := range patterns {
126        re, err := cache.Get(pattern)
127        if err != nil {
128            fmt.Printf("Pattern %d: Error - %v\n", i+1, err)
129        } else {
130            fmt.Printf("Pattern %d: Compiled successfully (from cache: %v)\n",
131                i+1, i == 2) // Third pattern is duplicate
132            _ = re
133        }
134    }
135}
136// run

💡 Production Pattern:

  1. Compile regex at package initialization time
  2. Use sync.Once for lazy initialization if needed (see the sketch after this list)
  3. Implement a cache for dynamic patterns
  4. Never compile regex in request handlers or loops
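
A minimal sketch of point 2 above, assuming a UUID pattern that is only occasionally needed: wrapping compilation in sync.Once defers the cost to first use while staying safe for concurrent callers.

package main

import (
    "fmt"
    "regexp"
    "sync"
)

var (
    uuidOnce sync.Once
    uuidRe   *regexp.Regexp
)

// uuidRegex compiles the pattern on first use only, then returns the cached value,
// so rarely used patterns do not add to program startup cost.
func uuidRegex() *regexp.Regexp {
    uuidOnce.Do(func() {
        uuidRe = regexp.MustCompile(`^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$`)
    })
    return uuidRe
}

func main() {
    fmt.Println(uuidRegex().MatchString("123e4567-e89b-12d3-a456-426614174000")) // true
    fmt.Println(uuidRegex().MatchString("not-a-uuid"))                           // false
}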

Avoiding Catastrophic Backtracking

⚠️ Security Critical: Certain regex patterns can cause exponential time complexity, leading to DoS attacks (ReDoS).

  1package main
  2
  3import (
  4    "fmt"
  5    "regexp"
  6    "strings"
  7    "time"
  8)
  9
 10// Demonstrate catastrophic backtracking (or lack thereof in Go's RE2)
 11func demonstrateBacktracking() {
 12    fmt.Println("=== Go's RE2 Linear-Time Guarantee ===\n")
 13
 14    // Patterns that WOULD be vulnerable in other regex engines
 15    vulnerablePatterns := []struct {
 16        name    string
 17        pattern string
 18    }{
 19        {"Nested quantifiers", `^(a+)+$`},
 20        {"Alternation overlap", `^(.*|a.*|ab.*)$`},
 21        {"Repeated groups", `^(a|a)*$`},
 22        {"Complex nesting", `^(a+)+(b+)+(c+)+$`},
 23    }
 24
 25    testInputs := []string{
 26        strings.Repeat("a", 20),
 27        strings.Repeat("a", 30),
 28        strings.Repeat("a", 40) + "b", // Won't match
 29    }
 30
 31    for _, vp := range vulnerablePatterns {
 32        fmt.Printf("Testing '%s' pattern: %s\n", vp.name, vp.pattern)
 33        re := regexp.MustCompile(vp.pattern)
 34
 35        for _, input := range testInputs {
 36            start := time.Now()
 37            result := re.MatchString(input)
 38            duration := time.Since(start)
 39
 40            fmt.Printf("  Input length %d: match=%v, time=%v\n",
 41                len(input), result, duration)
 42        }
 43        fmt.Println()
 44    }
 45}
 46
 47// Safe pattern alternatives
 48func demonstrateSafeAlternatives() {
 49    fmt.Println("=== Safe Pattern Alternatives ===\n")
 50
 51    examples := []struct {
 52        name   string
 53        unsafe string
 54        safe   string
 55        test   string
 56    }{
 57        {
 58            name:   "Repeated groups",
 59            unsafe: `(a+)*`,
 60            safe:   `a*`,
 61            test:   "aaaaaaaaaa",
 62        },
 63        {
 64            name:   "Alternation with overlap",
 65            unsafe: `(.*|a.*|ab.*)`,
 66            safe:   `(ab.*|a.*|.*)`,
 67            test:   "abcdefghij",
 68        },
 69        {
 70            name:   "Wildcard repetition",
 71            unsafe: `.+.+`,
 72            safe:   `.{2,}`,
 73            test:   "hello world",
 74        },
 75        {
 76            name:   "Optional repetition",
 77            unsafe: `(x+)?(x+)?x`,
 78            safe:   `x+`,
 79            test:   "xxxxxxxxxx",
 80        },
 81    }
 82
 83    for _, ex := range examples {
 84        fmt.Printf("%s:\n", ex.name)
 85        fmt.Printf("  Unsafe pattern: %s\n", ex.unsafe)
 86        fmt.Printf("  Safe pattern:   %s\n", ex.safe)
 87
 88        unsafeRe := regexp.MustCompile(ex.unsafe)
 89        safeRe := regexp.MustCompile(ex.safe)
 90
 91        // Test with progressively longer strings
 92        for length := 10; length <= 50; length += 10 {
 93            testStr := strings.Repeat(ex.test[:1], length)
 94
 95            start := time.Now()
 96            unsafeRe.MatchString(testStr)
 97            unsafeDuration := time.Since(start)
 98
 99            start = time.Now()
100            safeRe.MatchString(testStr)
101            safeDuration := time.Since(start)
102
103            fmt.Printf("  Length %d: unsafe=%v, safe=%v\n",
104                length, unsafeDuration, safeDuration)
105        }
106        fmt.Println()
107    }
108}
109
110// Best practices for safe patterns
111func demonstrateBestPractices() {
112    fmt.Println("=== Best Practices for Safe Patterns ===\n")
113
114    tips := []struct {
115        tip     string
116        bad     string
117        good    string
118        example string
119    }{
120        {
121            tip:     "Use specific quantifiers instead of +/*",
122            bad:     `\w+`,
123            good:    `\w{1,50}`,
124            example: "username",
125        },
126        {
127            tip:     "Avoid nested quantifiers",
128            bad:     `(a+)+`,
129            good:    `a+`,
130            example: "aaaa",
131        },
132        {
133            tip:     "Use character classes instead of alternation",
134            bad:     `(a|b|c|d)`,
135            good:    `[a-d]`,
136            example: "abcd",
137        },
138        {
139            tip:     "Be specific with wildcards",
140            bad:     `.*`,
141            good:    `[^\n]*`,
142            example: "text\nmore",
143        },
144        {
145            tip:     "Anchor patterns when possible",
146            bad:     `\d+`,
147            good:    `^\d+$`,
148            example: "12345",
149        },
150    }
151
152    for i, tip := range tips {
153        fmt.Printf("%d. %s\n", i+1, tip.tip)
154        fmt.Printf("   Bad:  %s\n", tip.bad)
155        fmt.Printf("   Good: %s\n", tip.good)
156
157        goodRe := regexp.MustCompile(tip.good)
158        fmt.Printf("   Example '%s' matches: %v\n\n", tip.example, goodRe.MatchString(tip.example))
159    }
160}
161
162func main() {
163    demonstrateBacktracking()
164    demonstrateSafeAlternatives()
165    demonstrateBestPractices()
166}
167// run

💡 Go's Safety: Go's regexp package implements the RE2 algorithm, which guarantees matching time linear in the size of the input. Poorly written patterns can still be slow - just not exponentially slow, as they can be in backtracking engines.
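
The same design explains a limitation worth knowing (one of the learning objectives above): lookarounds and backreferences are not supported, and regexp.Compile reports an error rather than silently misbehaving. A short sketch:

package main

import (
    "fmt"
    "regexp"
)

func main() {
    // Lookaheads such as (?=...) are Perl/PCRE features that RE2-style engines omit.
    if _, err := regexp.Compile(`^(?=.*[A-Z]).*$`); err != nil {
        fmt.Println("lookahead rejected:", err)
    }

    // Backreferences like \1 are also unsupported.
    if _, err := regexp.Compile(`(\w+) \1`); err != nil {
        fmt.Println("backreference rejected:", err)
    }

    // The usual workaround is to express each requirement as a separate check.
    hasUpper := regexp.MustCompile(`[A-Z]`)
    hasDigit := regexp.MustCompile(`\d`)
    password := "Secret123"
    fmt.Println("upper:", hasUpper.MatchString(password), "digit:", hasDigit.MatchString(password))
}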

Greedy vs Non-Greedy Matching

Understanding quantifier behavior is crucial for correct pattern matching.

 1package main
 2
 3import (
 4    "fmt"
 5    "regexp"
 6)
 7
 8func demonstrateGreedy() {
 9    fmt.Println("=== Greedy Matching (default) ===\n")
10
11    // Greedy quantifiers match as much as possible
12    greedyRe := regexp.MustCompile(`<.*>`)
13    html := "<div>content</div><span>more</span>"
14
15    match := greedyRe.FindString(html)
16    fmt.Printf("Pattern: %s\n", `<.*>`)
17    fmt.Printf("Text: %s\n", html)
18    fmt.Printf("Greedy match: %s\n", match)
19    fmt.Printf("  → Matched from first < to last >\n\n")
20
21    fmt.Println("=== Non-Greedy Matching ===\n")
22
23    // Non-greedy quantifiers (? suffix) match as little as possible
24    nonGreedyRe := regexp.MustCompile(`<.*?>`)
25    matches := nonGreedyRe.FindAllString(html, -1)
26
27    fmt.Printf("Pattern: %s\n", `<.*?>`)
28    fmt.Printf("Text: %s\n", html)
29    fmt.Printf("Non-greedy matches: %v\n", matches)
30    fmt.Printf("  → Matched smallest possible strings\n\n")
31
32    fmt.Println("=== Practical Examples ===\n")
33
34    // Example 1: Extracting quoted strings
35    text1 := `He said "hello" and she said "goodbye"`
36
37    greedyQuote := regexp.MustCompile(`".*"`)
38    nonGreedyQuote := regexp.MustCompile(`".*?"`)
39
40    fmt.Printf("Text: %s\n", text1)
41    fmt.Printf("Greedy \".*\":     %s\n", greedyQuote.FindString(text1))
42    fmt.Printf("Non-greedy \".*?\": %v\n\n", nonGreedyQuote.FindAllString(text1, -1))
43
44    // Example 2: HTML tag extraction
45    html2 := "<b>bold</b> and <i>italic</i>"
46
47    greedyTag := regexp.MustCompile(`<.+>`)
48    nonGreedyTag := regexp.MustCompile(`<.+?>`)
49
50    fmt.Printf("HTML: %s\n", html2)
51    fmt.Printf("Greedy <.+>:     %s\n", greedyTag.FindString(html2))
52    fmt.Printf("Non-greedy <.+?>: %v\n\n", nonGreedyTag.FindAllString(html2, -1))
53
54    // Example 3: Path extraction
55    path := "/users/123/posts/456/comments/789"
56
57    greedyPath := regexp.MustCompile(`/\w+/\d+`)
58    nonGreedyPath := regexp.MustCompile(`/\w+?/\d+`)
59
60    fmt.Printf("Path: %s\n", path)
61    fmt.Printf("Greedy pattern:     %v\n", greedyPath.FindAllString(path, -1))
62    fmt.Printf("Non-greedy pattern: %v\n\n", nonGreedyPath.FindAllString(path, -1))
63
64    fmt.Println("=== Quantifier Reference ===")
65    fmt.Println("  *    = greedy (0 or more)")
66    fmt.Println("  *?   = non-greedy (0 or more)")
67    fmt.Println("  +    = greedy (1 or more)")
68    fmt.Println("  +?   = non-greedy (1 or more)")
69    fmt.Println("  ?    = greedy (0 or 1)")
70    fmt.Println("  ??   = non-greedy (0 or 1)")
71    fmt.Println("  {n,m} = greedy (n to m)")
72    fmt.Println("  {n,m}? = non-greedy (n to m)")
73}
74
75func main() {
76    demonstrateGreedy()
77}
78// run

💡 Production Pattern: Use non-greedy quantifiers when extracting content between delimiters to avoid matching too much.
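
As a small application of that advice, a lazy quantifier keeps each captured value inside its own pair of quotes when pulling key="value" attributes out of markup-like text (a sketch, not a substitute for a real HTML parser):

package main

import (
    "fmt"
    "regexp"
)

func main() {
    // Lazy .*? keeps each value match inside its own pair of quotes.
    attrRe := regexp.MustCompile(`(\w+)="(.*?)"`)

    input := `<img src="logo.png" alt="Company logo">`
    for _, m := range attrRe.FindAllStringSubmatch(input, -1) {
        fmt.Printf("%s => %s\n", m[1], m[2])
    }
    // src => logo.png
    // alt => Company logo
}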

Integration and Mastery

Production-Ready Input Validation Framework

🎯 Production Pattern: Create a comprehensive validation system that handles multiple input types with consistent error handling.

  1package main
  2
  3import (
  4    "fmt"
  5    "regexp"
  6    "strings"
  7)
  8
  9type ValidationError struct {
 10    Field   string
 11    Value   string
 12    Message string
 13    Code    string
 14}
 15
 16type ValidationRule struct {
 17    Pattern   *regexp.Regexp
 18    Required  bool
 19    MinLength int
 20    MaxLength int
 21    Message   string
 22    Code      string
 23}
 24
 25type Validator struct {
 26    rules map[string]*ValidationRule
 27}
 28
 29func NewValidator() *Validator {
 30    return &Validator{
 31        rules: make(map[string]*ValidationRule),
 32    }
 33}
 34
 35func (v *Validator) AddRule(field, pattern string, options map[string]interface{}) error {
 36    re, err := regexp.Compile(pattern)
 37    if err != nil {
 38        return fmt.Errorf("invalid pattern for field '%s': %w", field, err)
 39    }
 40
 41    rule := &ValidationRule{
 42        Pattern:  re,
 43        Required: true,
 44        Code:     "INVALID_FORMAT",
 45    }
 46
 47    // Apply options
 48    for key, value := range options {
 49        switch key {
 50        case "required":
 51            rule.Required = value.(bool)
 52        case "min_length":
 53            rule.MinLength = value.(int)
 54        case "max_length":
 55            rule.MaxLength = value.(int)
 56        case "message":
 57            rule.Message = value.(string)
 58        case "code":
 59            rule.Code = value.(string)
 60        }
 61    }
 62
 63    v.rules[field] = rule
 64    return nil
 65}
 66
 67func (v *Validator) Validate(data map[string]string) []ValidationError {
 68    var errors []ValidationError
 69
 70    // Validate present fields
 71    for field, value := range data {
 72        rule, exists := v.rules[field]
 73        if !exists {
 74            continue
 75        }
 76
 77        errors = append(errors, v.validateField(field, value, rule)...)
 78    }
 79
 80    // Check for missing required fields
 81    for field, rule := range v.rules {
 82        if rule.Required {
 83            if _, exists := data[field]; !exists {
 84                errors = append(errors, ValidationError{
 85                    Field:   field,
 86                    Value:   "",
 87                    Message: fmt.Sprintf("%s is required", field),
 88                    Code:    "REQUIRED",
 89                })
 90            }
 91        }
 92    }
 93
 94    return errors
 95}
 96
 97func (v *Validator) validateField(field, value string, rule *ValidationRule) []ValidationError {
 98    var errors []ValidationError
 99
100    // Trim whitespace
101    value = strings.TrimSpace(value)
102
103    // Required check
104    if rule.Required && value == "" {
105        errors = append(errors, ValidationError{
106            Field:   field,
107            Value:   value,
108            Message: fmt.Sprintf("%s is required", field),
109            Code:    "REQUIRED",
110        })
111        return errors // No point checking other rules
112    }
113
114    // Skip other validations if empty and not required
115    if value == "" && !rule.Required {
116        return errors
117    }
118
119    // Length checks
120    if rule.MinLength > 0 && len(value) < rule.MinLength {
121        errors = append(errors, ValidationError{
122            Field:   field,
123            Value:   value,
124            Message: fmt.Sprintf("%s must be at least %d characters", field, rule.MinLength),
125            Code:    "TOO_SHORT",
126        })
127    }
128
129    if rule.MaxLength > 0 && len(value) > rule.MaxLength {
130        errors = append(errors, ValidationError{
131            Field:   field,
132            Value:   value,
133            Message: fmt.Sprintf("%s must be at most %d characters", field, rule.MaxLength),
134            Code:    "TOO_LONG",
135        })
136    }
137
138    // Pattern validation
139    if !rule.Pattern.MatchString(value) {
140        message := rule.Message
141        if message == "" {
142            message = fmt.Sprintf("%s has invalid format", field)
143        }
144
145        errors = append(errors, ValidationError{
146            Field:   field,
147            Value:   value,
148            Message: message,
149            Code:    rule.Code,
150        })
151    }
152
153    return errors
154}
155
156func main() {
157    // Create validator with common patterns
158    validator := NewValidator()
159
160    // Add validation rules
161    validator.AddRule("email", `^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$`, map[string]interface{}{
162        "required":   true,
163        "max_length": 254,
164        "message":    "Please enter a valid email address",
165        "code":       "INVALID_EMAIL",
166    })
167
168    validator.AddRule("username", `^[a-zA-Z0-9_-]{3,16}$`, map[string]interface{}{
169        "required":   true,
170        "min_length": 3,
171        "max_length": 16,
172        "message":    "Username must be 3-16 alphanumeric characters, hyphens, or underscores",
173        "code":       "INVALID_USERNAME",
174    })
175
176    validator.AddRule("phone", `^\+?[1-9]\d{1,14}$`, map[string]interface{}{
177        "required": false,
178        "message":  "Please enter a valid international phone number",
179        "code":     "INVALID_PHONE",
180    })
181
182    // Go's RE2-based regexp rejects lookaheads such as (?=.*[A-Z]); composition
183    // rules (must contain upper, lower, digit, symbol) need separate checks.
184    validator.AddRule("password", `^[A-Za-z\d@$!%*?&]{8,}$`, map[string]interface{}{
185        "required": true, "min_length": 8, "code": "WEAK_PASSWORD",
186        "message": "Password must be at least 8 characters using letters, digits, and @$!%*?&",
187    })
188
189    validator.AddRule("zipcode", `^\d{5}(-\d{4})?$`, map[string]interface{}{
190        "required": false,
191        "message":  "ZIP code must be 5 digits or 5+4 format",
192        "code":     "INVALID_ZIPCODE",
193    })
194
195    // Test validation
196    testCases := []map[string]string{
197        {
198            "email":    "user@example.com",
199            "username": "validuser123",
200            "phone":    "+1234567890",
201            "password": "SecurePass123!",
202            "zipcode":  "12345-6789",
203        },
204        {
205            "email":    "invalid-email",
206            "username": "ab",
207            "phone":    "123",
208            "password": "weak",
209        },
210        {
211            "username": "toolongusername12345",
212            "password": "NoNumber!",
213        },
214        {}, // Missing required fields
215    }
216
217    for i, testCase := range testCases {
218        fmt.Printf("=== Test Case %d ===\n", i+1)
219        fmt.Printf("Input: %v\n", testCase)
220
221        errors := validator.Validate(testCase)
222
223        if len(errors) == 0 {
224            fmt.Println("✓ All validations passed")
225        } else {
226            fmt.Printf("✗ %d validation error(s):\n", len(errors))
227            for _, err := range errors {
228                fmt.Printf("  Field: %s\n", err.Field)
229                if err.Value != "" {
230                    fmt.Printf("  Value: %s\n", err.Value)
231                }
232                fmt.Printf("  Error: %s\n", err.Message)
233                fmt.Printf("  Code: %s\n", err.Code)
234                fmt.Println()
235            }
236        }
237        fmt.Println()
238    }
239}
240// run

Text Processing Pipeline with Regex

Build a complete text processing system demonstrating real-world regex integration.

  1package main
  2
  3import (
  4    "fmt"
  5    "regexp"
  6    "strings"
  7)
  8
  9type TextProcessor struct {
 10    linkRe      *regexp.Regexp
 11    emailRe     *regexp.Regexp
 12    hashtagRe   *regexp.Regexp
 13    mentionRe   *regexp.Regexp
 14    codeBlockRe *regexp.Regexp
 15}
 16
 17func NewTextProcessor() *TextProcessor {
 18    return &TextProcessor{
 19        linkRe:      regexp.MustCompile(`https?://[^\s]+`),
 20        emailRe:     regexp.MustCompile(`\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b`),
 21        hashtagRe:   regexp.MustCompile(`#\w+`),
 22        mentionRe:   regexp.MustCompile(`@\w+`),
 23        codeBlockRe: regexp.MustCompile("`([^`]+)`"),
 24    }
 25}
 26
 27// Extract all entities from text
 28func (tp *TextProcessor) ExtractEntities(text string) map[string][]string {
 29    return map[string][]string{
 30        "links":      tp.linkRe.FindAllString(text, -1),
 31        "emails":     tp.emailRe.FindAllString(text, -1),
 32        "hashtags":   tp.hashtagRe.FindAllString(text, -1),
 33        "mentions":   tp.mentionRe.FindAllString(text, -1),
 34        "code":       tp.extractCodeBlocks(text),
 35    }
 36}
 37
 38func (tp *TextProcessor) extractCodeBlocks(text string) []string {
 39    matches := tp.codeBlockRe.FindAllStringSubmatch(text, -1)
 40    result := make([]string, len(matches))
 41    for i, match := range matches {
 42        result[i] = match[1]
 43    }
 44    return result
 45}
 46
 47// Sanitize text by removing potentially harmful content
 48func (tp *TextProcessor) Sanitize(text string) string {
 49    // Remove script tags
 50    scriptRe := regexp.MustCompile(`(?i)<script[^>]*>.*?</script>`)
 51    text = scriptRe.ReplaceAllString(text, "")
 52
 53    // Remove event handlers
 54    eventRe := regexp.MustCompile(`(?i)\son\w+\s*=\s*["'][^"']*["']`)
 55    text = eventRe.ReplaceAllString(text, "")
 56
 57    // Remove javascript: URLs
 58    jsUrlRe := regexp.MustCompile(`(?i)javascript:`)
 59    text = jsUrlRe.ReplaceAllString(text, "")
 60
 61    return strings.TrimSpace(text)
 62}
 63
 64// Format text by converting entities to HTML
 65func (tp *TextProcessor) FormatHTML(text string) string {
 66    // Convert links to anchor tags
 67    text = tp.linkRe.ReplaceAllStringFunc(text, func(link string) string {
 68        return fmt.Sprintf(`<a href="%s" target="_blank">%s</a>`, link, link)
 69    })
 70
 71    // Convert code blocks to code tags
 72    text = tp.codeBlockRe.ReplaceAllStringFunc(text, func(code string) string {
 73        inner := tp.codeBlockRe.FindStringSubmatch(code)[1]
 74        return fmt.Sprintf("<code>%s</code>", inner)
 75    })
 76
 77    // Convert hashtags to links
 78    text = tp.hashtagRe.ReplaceAllStringFunc(text, func(hashtag string) string {
 79        tag := strings.TrimPrefix(hashtag, "#")
 80        return fmt.Sprintf(`<a href="/tags/%s">%s</a>`, tag, hashtag)
 81    })
 82
 83    // Convert mentions to links
 84    text = tp.mentionRe.ReplaceAllStringFunc(text, func(mention string) string {
 85        username := strings.TrimPrefix(mention, "@")
 86        return fmt.Sprintf(`<a href="/users/%s">%s</a>`, username, mention)
 87    })
 88
 89    return text
 90}
 91
 92// Highlight search terms in text
 93func (tp *TextProcessor) Highlight(text, searchTerm string) string {
 94    if searchTerm == "" {
 95        return text
 96    }
 97
 98    // Escape special regex characters in search term
 99    escapedTerm := regexp.QuoteMeta(searchTerm)
100    highlightRe := regexp.MustCompile(`(?i)(` + escapedTerm + `)`)
101
102    return highlightRe.ReplaceAllString(text, "<mark>$1</mark>")
103}
104
105// Extract summary (first N words)
106func (tp *TextProcessor) ExtractSummary(text string, maxWords int) string {
107    // Remove extra whitespace
108    text = regexp.MustCompile(`\s+`).ReplaceAllString(text, " ")
109    text = strings.TrimSpace(text)
110
111    // Split into words
112    words := strings.Fields(text)
113
114    if len(words) <= maxWords {
115        return text
116    }
117
118    return strings.Join(words[:maxWords], " ") + "..."
119}
120
121func main() {
122    processor := NewTextProcessor()
123
124    // Example text
125    text := "Check out https://example.com for more info!\n" +
126        "Contact support@example.com or visit our site.\n" +
127        "Use code: `npm install package` to get started.\n" +
128        "Follow @johndoe and use #golang for questions.\n" +
129        "<script>alert('xss')</script>\n" +
130        "Visit javascript:void(0) for nothing."
131
132    fmt.Println("=== Original Text ===")
133    fmt.Println(text)
134
135    fmt.Println("\n=== Extracted Entities ===")
136    entities := processor.ExtractEntities(text)
137    for entityType, values := range entities {
138        if len(values) > 0 {
139            fmt.Printf("%s: %v\n", entityType, values)
140        }
141    }
142
143    fmt.Println("\n=== Sanitized Text ===")
144    sanitized := processor.Sanitize(text)
145    fmt.Println(sanitized)
146
147    fmt.Println("\n=== Formatted HTML ===")
148    formatted := processor.FormatHTML(sanitized)
149    fmt.Println(formatted)
150
151    fmt.Println("\n=== With Highlighting (search: 'golang') ===")
152    highlighted := processor.Highlight(formatted, "golang")
153    fmt.Println(highlighted)
154
155    fmt.Println("\n=== Summary (15 words) ===")
156    summary := processor.ExtractSummary(sanitized, 15)
157    fmt.Println(summary)
158}
159// run
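
Before reusing this processor on a hot path, note that Sanitize and ExtractSummary above compile their patterns on every call. A sketch of hoisting those patterns to package level (behavior unchanged; the variable names are illustrative and assume the same package as the TextProcessor code above):

// Compiled once at startup instead of on every Sanitize/ExtractSummary call.
var (
    scriptTagRe    = regexp.MustCompile(`(?i)<script[^>]*>.*?</script>`)
    eventHandlerRe = regexp.MustCompile(`(?i)\son\w+\s*=\s*["'][^"']*["']`)
    jsURLRe        = regexp.MustCompile(`(?i)javascript:`)
    whitespaceRe   = regexp.MustCompile(`\s+`)
)

func (tp *TextProcessor) Sanitize(text string) string {
    text = scriptTagRe.ReplaceAllString(text, "")
    text = eventHandlerRe.ReplaceAllString(text, "")
    text = jsURLRe.ReplaceAllString(text, "")
    return strings.TrimSpace(text)
}

func (tp *TextProcessor) ExtractSummary(text string, maxWords int) string {
    text = strings.TrimSpace(whitespaceRe.ReplaceAllString(text, " "))
    words := strings.Fields(text)
    if len(words) <= maxWords {
        return text
    }
    return strings.Join(words[:maxWords], " ") + "..."
}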

Practice Exercises

Each exercise below states its learning objectives and real-world context so you can connect the patterns to production use.

Exercise 1: Advanced Log Parser with Performance Optimization

Learning Objectives: Master complex pattern matching, implement multi-format parsing strategies, and handle structured data extraction with regular expressions while optimizing for high-volume log processing.

Real-World Context: Log parsing is fundamental to observability and monitoring systems. Tools like Splunk, ELK Stack, and Fluentd process millions of log entries daily to extract meaningful insights. This exercise teaches you patterns used in production log analysis for troubleshooting, security monitoring, and business intelligence.

Difficulty: Advanced | Time Estimate: 60 minutes

Write a log parser that extracts structured information from log lines in multiple formats, handles edge cases and malformed entries, and stays fast enough for high-volume processing.

Solution
  1package main
  2
  3import (
  4    "fmt"
  5    "regexp"
  6    "time"
  7)
  8
  9type LogEntry struct {
 10    Timestamp time.Time
 11    Level     string
 12    Service   string
 13    Message   string
 14    RequestID string
 15    UserID    string
 16}
 17
 18type LogParser struct {
 19    patterns []*regexp.Regexp
 20}
 21
 22func NewLogParser() *LogParser {
 23    patterns := []string{
 24        // Format 1: 2024-03-15T10:30:45Z [INFO] service=api msg="Request received" request_id=abc123 user_id=456
 25        `(?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z) \[(?P<level>\w+)\] service=(?P<service>\w+) msg="(?P<message>[^"]*)"(?: request_id=(?P<request_id>\w+))?(?: user_id=(?P<user_id>\w+))?`,
 26
 27        // Format 2: 2024-03-15 10:30:45 INFO [api] Request received (rid:abc123, uid:456)
 28        `(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<level>\w+) \[(?P<service>\w+)\] (?P<message>.*?)(?: \(rid:(?P<request_id>\w+)(?:, uid:(?P<user_id>\w+))?\))?$`,
 29    }
 30
 31    compiled := make([]*regexp.Regexp, len(patterns))
 32    for i, pattern := range patterns {
 33        compiled[i] = regexp.MustCompile(pattern)
 34    }
 35
 36    return &LogParser{patterns: compiled}
 37}
 38
 39func (lp *LogParser) Parse(line string) (*LogEntry, error) {
 40    for _, re := range lp.patterns {
 41        matches := re.FindStringSubmatch(line)
 42        if matches == nil {
 43            continue
 44        }
 45
 46        names := re.SubexpNames()
 47        entry := &LogEntry{}
 48
 49        for i, name := range names {
 50            if i == 0 || name == "" {
 51                continue
 52            }
 53
 54            value := matches[i]
 55
 56            switch name {
 57            case "timestamp":
 58                // Try different time formats
 59                formats := []string{
 60                    "2006-01-02T15:04:05Z",
 61                    "2006-01-02 15:04:05",
 62                }
 63                for _, format := range formats {
 64                    if t, err := time.Parse(format, value); err == nil {
 65                        entry.Timestamp = t
 66                        break
 67                    }
 68                }
 69            case "level":
 70                entry.Level = value
 71            case "service":
 72                entry.Service = value
 73            case "message":
 74                entry.Message = value
 75            case "request_id":
 76                entry.RequestID = value
 77            case "user_id":
 78                entry.UserID = value
 79            }
 80        }
 81
 82        return entry, nil
 83    }
 84
 85    return nil, fmt.Errorf("no pattern matched")
 86}
 87
 88func main() {
 89    parser := NewLogParser()
 90
 91    logs := []string{
 92        `2024-03-15T10:30:45Z [INFO] service=api msg="Request received" request_id=abc123 user_id=456`,
 93        `2024-03-15T10:30:46Z [ERROR] service=db msg="Connection failed"`,
 94        `2024-03-15 10:30:47 INFO [api] Request received (rid:xyz789, uid:123)`,
 95        `2024-03-15 10:30:48 WARN [cache] Cache miss`,
 96    }
 97
 98    for _, log := range logs {
 99        entry, err := parser.Parse(log)
100        if err != nil {
101            fmt.Printf("Failed to parse: %s\n", log)
102            continue
103        }
104
105        fmt.Printf("Parsed: %+v\n", entry)
106    }
107}
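
The solution above parses one line at a time. For the high-volume angle of this exercise, it helps to know that a compiled *regexp.Regexp is safe for concurrent use, so a single LogParser can be shared by many goroutines. A minimal fan-out sketch, assuming the types above plus a "sync" import (parseConcurrently and the worker count are illustrative, and entries come back in arbitrary order):

// Sketch: share one LogParser across workers; compiled regexps are read-only
// after construction, so no locking is needed.
func parseConcurrently(parser *LogParser, lines []string, workers int) []*LogEntry {
    in := make(chan string)
    out := make(chan *LogEntry)

    var wg sync.WaitGroup
    for i := 0; i < workers; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for line := range in {
                if entry, err := parser.Parse(line); err == nil {
                    out <- entry
                }
            }
        }()
    }

    go func() {
        for _, line := range lines {
            in <- line
        }
        close(in)
        wg.Wait()
        close(out)
    }()

    var entries []*LogEntry
    for entry := range out {
        entries = append(entries, entry)
    }
    return entries
}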

Exercise 2: URL Router with Pattern Matching

Learning Objectives: Build routing systems, implement parameter extraction, and understand pattern matching in web frameworks.

Real-World Context: URL routing is the foundation of all web frameworks, from Express.js to Ruby on Rails to Go's Gin and Chi. Understanding how routing works under the hood helps you build more efficient APIs and debug routing issues. This exercise reveals patterns used in production web servers handling millions of requests.

Difficulty: Intermediate | Time Estimate: 45 minutes

Create a simple URL router that uses regex for path pattern matching and supports parameter extraction, designed so middleware can be layered on and performance holds up under high traffic.

Solution
  1package main
  2
  3import (
  4    "fmt"
  5    "regexp"
  6)
  7
  8type Route struct {
  9    Pattern *regexp.Regexp
 10    Names   []string
 11    Handler func(map[string]string)
 12}
 13
 14type Router struct {
 15    routes []Route
 16}
 17
 18func NewRouter() *Router {
 19    return &Router{routes: make([]Route, 0)}
 20}
 21
 22func (r *Router) AddRoute(pattern string, handler func(map[string]string)) error {
 23    // Convert pattern to regex
 24    // /users/:id -> /users/(?P<id>[^/]+)
 25    // /posts/:id/comments/:cid -> /posts/(?P<id>[^/]+)/comments/(?P<cid>[^/]+)
 26
 27    regexPattern := regexp.MustCompile(`:(\w+)`).ReplaceAllString(pattern, `(?P<$1>[^/]+)`)
 28    regexPattern = "^" + regexPattern + "$"
 29
 30    re, err := regexp.Compile(regexPattern)
 31    if err != nil {
 32        return err
 33    }
 34
 35    route := Route{
 36        Pattern: re,
 37        Names:   re.SubexpNames(),
 38        Handler: handler,
 39    }
 40
 41    r.routes = append(r.routes, route)
 42    return nil
 43}
 44
 45func (r *Router) Match(path string) bool {
 46    for _, route := range r.routes {
 47        matches := route.Pattern.FindStringSubmatch(path)
 48        if matches == nil {
 49            continue
 50        }
 51
 52        // Extract parameters
 53        params := make(map[string]string)
 54        for i, name := range route.Names {
 55            if i > 0 && name != "" {
 56                params[name] = matches[i]
 57            }
 58        }
 59
 60        // Call handler
 61        route.Handler(params)
 62        return true
 63    }
 64
 65    return false
 66}
 67
 68func main() {
 69    router := NewRouter()
 70
 71    // Define routes
 72    router.AddRoute("/users/:id", func(params map[string]string) {
 73        fmt.Printf("User handler: ID=%s\n", params["id"])
 74    })
 75
 76    router.AddRoute("/posts/:id/comments/:cid", func(params map[string]string) {
 77        fmt.Printf("Comment handler: Post ID=%s, Comment ID=%s\n",
 78            params["id"], params["cid"])
 79    })
 80
 81    router.AddRoute("/api/v:version/:resource", func(params map[string]string) {
 82        fmt.Printf("API handler: Version=%s, Resource=%s\n",
 83            params["version"], params["resource"])
 84    })
 85
 86    // Test routes
 87    paths := []string{
 88        "/users/123",
 89        "/posts/456/comments/789",
 90        "/api/v2/products",
 91        "/not/found",
 92    }
 93
 94    for _, path := range paths {
 95        fmt.Printf("\nMatching: %s\n", path)
 96        if !router.Match(path) {
 97            fmt.Println("  No route matched")
 98        }
 99    }
100}

Exercise 3: Email Validator with Custom Rules

Learning Objectives: Implement advanced validation patterns, handle business rules with regex, and build robust input validation systems.

Real-World Context: Email validation is critical for user registration systems, marketing campaigns, and communication platforms. From preventing spam to ensuring deliverability, proper email validation saves businesses millions in lost revenue and prevents security issues. This exercise teaches production-level validation patterns used by services like Mailchimp and SendGrid.

Difficulty: Advanced | Time Estimate: 55 minutes

Build an email validator that enforces custom rules beyond basic format validation, covering domain checks, disposable email detection, and business-specific policies such as corporate-only addresses.

Requirements:

  • Valid email format
  • Blacklist certain domains
  • Require corporate email addresses
  • Check for disposable email providers
  • Validate TLD length and format

Solution with Explanation
  1package main
  2
  3import (
  4	"fmt"
  5	"regexp"
  6	"strings"
  7)
  8
  9// EmailValidator validates email addresses with custom rules
 10type EmailValidator struct {
 11	emailPattern       *regexp.Regexp
 12	blacklistedDomains map[string]bool
 13	allowedDomains     map[string]bool
 14	disposableDomains  map[string]bool
 15	requireCorporate   bool
 16}
 17
 18type ValidationResult struct {
 19	Valid  bool
 20	Errors []string
 21}
 22
 23func NewEmailValidator(requireCorporate bool) *EmailValidator {
 24	return &EmailValidator{
 25		// Practical pattern covering most RFC 5322 address forms
 26		emailPattern: regexp.MustCompile(`^[a-zA-Z0-9.!#$%&'*+/=?^_` + "`" + `{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$`),
 27		blacklistedDomains: map[string]bool{
 28			"spam.com":       true,
 29			"mailinator.com": true,
 30			"tempmail.com":   true,
 31		},
 32		allowedDomains: map[string]bool{
 33			"company.com": true,
 34			"corp.com":    true,
 35		},
 36		disposableDomains: map[string]bool{
 37			"guerrillamail.com": true,
 38			"10minutemail.com":  true,
 39			"throwaway.email":   true,
 40		},
 41		requireCorporate: requireCorporate,
 42	}
 43}
 44
 45func (v *EmailValidator) Validate(email string) ValidationResult {
 46	result := ValidationResult{Valid: true, Errors: []string{}}
 47
 48	email = strings.TrimSpace(strings.ToLower(email))
 49
 50	// Rule 1: Basic format validation
 51	if !v.emailPattern.MatchString(email) {
 52		result.Valid = false
 53		result.Errors = append(result.Errors, "Invalid email format")
 54		return result // No point checking further if format is wrong
 55	}
 56
 57	// Extract domain
 58	parts := strings.Split(email, "@")
 59	if len(parts) != 2 {
 60		result.Valid = false
 61		result.Errors = append(result.Errors, "Invalid email structure")
 62		return result
 63	}
 64
 65	domain := parts[1]
 66
 67	// Rule 2: Check blacklisted domains
 68	if v.blacklistedDomains[domain] {
 69		result.Valid = false
 70		result.Errors = append(result.Errors, fmt.Sprintf("Domain '%s' is blacklisted", domain))
 71	}
 72
 73	// Rule 3: Check disposable email providers
 74	if v.disposableDomains[domain] {
 75		result.Valid = false
 76		result.Errors = append(result.Errors, "Disposable email addresses are not allowed")
 77	}
 78
 79	// Rule 4: Corporate email requirement
 80	if v.requireCorporate && !v.allowedDomains[domain] {
 81		result.Valid = false
 82		result.Errors = append(result.Errors, "Only corporate email addresses are allowed")
 83	}
 84
 85	// Rule 5: Validate TLD
 86	tldPattern := regexp.MustCompile(`\.([a-z]{2,})$`)
 87	matches := tldPattern.FindStringSubmatch(domain)
 88	if len(matches) < 2 {
 89		result.Valid = false
 90		result.Errors = append(result.Errors, "Invalid top-level domain")
 91	} else {
 92		tld := matches[1]
 93		if len(tld) < 2 || len(tld) > 6 {
 94			result.Valid = false
 95			result.Errors = append(result.Errors, "TLD must be between 2 and 6 characters")
 96		}
 97	}
 98
 99	// Rule 6: Check for common typos in popular domains
100	commonDomains := map[string]string{
101		"gmial.com":   "gmail.com",
102		"gmai.com":    "gmail.com",
103		"yahooo.com":  "yahoo.com",
104		"hotmial.com": "hotmail.com",
105	}
106
107	if correctDomain, found := commonDomains[domain]; found {
108		result.Valid = false
109		result.Errors = append(result.Errors,
110			fmt.Sprintf("Did you mean '%s' instead of '%s'?", correctDomain, domain))
111	}
112
113	// Rule 7: Check local part length
114	localPart := parts[0]
115	if len(localPart) > 64 {
116		result.Valid = false
117		result.Errors = append(result.Errors, "Local part exceeds 64 characters")
118	}
119
120	// Rule 8: Check for consecutive dots
121	if strings.Contains(email, "..") {
122		result.Valid = false
123		result.Errors = append(result.Errors, "Consecutive dots are not allowed")
124	}
125
126	return result
127}
128
129func main() {
130	// Test without corporate requirement
131	validator := NewEmailValidator(false)
132
133	testEmails := []string{
134		"user@company.com",
135		"test@gmail.com",
136		"invalid@spam.com",
137		"user@10minutemail.com",
138		"bad..email@domain.com",
139		"toolonglocalpartaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaart@domain.com",
140		"user@gmial.com",
141		"valid.user+tag@example.co.uk",
142		"not-an-email",
143		"user@domain.c",
144	}
145
146	fmt.Println("=== Email Validation (No Corporate Requirement) ===")
147	for _, email := range testEmails {
148		result := validator.Validate(email)
149		fmt.Printf("\nEmail: %s\n", email)
150		if result.Valid {
151			fmt.Println("  Status: ✓ Valid")
152		} else {
153			fmt.Println("  Status: ✗ Invalid")
154			for _, err := range result.Errors {
155				fmt.Printf("    - %s\n", err)
156			}
157		}
158	}
159
160	// Test with corporate requirement
161	fmt.Println("\n=== Email Validation (Corporate Requirement) ===")
162	corporateValidator := NewEmailValidator(true)
163
164	corporateEmails := []string{
165		"employee@company.com",
166		"user@gmail.com",
167		"contractor@corp.com",
168	}
169
170	for _, email := range corporateEmails {
171		result := corporateValidator.Validate(email)
172		fmt.Printf("\nEmail: %s\n", email)
173		if result.Valid {
174			fmt.Println("  Status: ✓ Valid")
175		} else {
176			fmt.Println("  Status: ✗ Invalid")
177			for _, err := range result.Errors {
178				fmt.Printf("    - %s\n", err)
179			}
180		}
181	}
182}

Explanation:

This email validator demonstrates several regex techniques and validation patterns:

  1. RFC-Compliant Pattern: Uses a comprehensive regex that handles most valid email formats including special characters, hyphens in domains, and multi-part TLDs.

  2. Domain Extraction: Splits email to separately validate the local part and domain.

  3. TLD Validation: Uses regex to extract and validate top-level domain length.

  4. Multi-Rule Validation: Applies multiple validation rules in sequence, collecting all errors rather than failing on the first one.

  5. Practical Checks:

    • Blacklist/whitelist domain checking
    • Disposable email detection
    • Common typo detection
    • Consecutive dot prevention
    • Length constraints

  6. User-Friendly Error Messages: Provides specific error messages for each validation failure, including suggestions for common typos.

This pattern is useful for production email validation where you need more than basic format checking, such as preventing spam signups, enforcing corporate email policies, or improving user experience by catching common mistakes.
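
If you keep this validator in a real package, a small table-driven test pins the rules down and makes it safe to adjust the patterns later. A sketch, assuming the solution above lives in the same package (the test cases are illustrative):

package main

import "testing"

func TestEmailValidator(t *testing.T) {
    v := NewEmailValidator(false)

    cases := []struct {
        email string
        valid bool
    }{
        {"user@company.com", true},
        {"user@spam.com", false},         // blacklisted domain
        {"user@10minutemail.com", false}, // disposable provider
        {"bad..email@domain.com", false}, // consecutive dots
        {"not-an-email", false},          // fails the format check
    }

    for _, tc := range cases {
        got := v.Validate(tc.email)
        if got.Valid != tc.valid {
            t.Errorf("Validate(%q).Valid = %v, want %v (errors: %v)",
                tc.email, got.Valid, tc.valid, got.Errors)
        }
    }
}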

Exercise 4: Phone Number Validator

Learning Objectives: Handle international phone number formats, implement flexible validation for multiple countries, and manage formatting variations.

Real-World Context: Phone number validation is essential for user verification, SMS notifications, and communication systems. From e-commerce checkouts to two-factor authentication, proper phone validation prevents failed deliveries and improves user experience. This exercise teaches patterns used by Twilio, AWS SNS, and other messaging platforms.

Difficulty: Intermediate | Time Estimate: 40 minutes

Build a phone number validator that supports multiple international formats (US, UK, Germany, Japan) with proper country code handling and format normalization.

Requirements:

  • Support multiple country formats
  • Validate country codes
  • Normalize phone numbers to E.164 format
  • Detect and extract country/area codes

Solution
  1package main
  2
  3import (
  4    "fmt"
  5    "regexp"
  6    "strings"
  7)
  8
  9type PhoneFormat struct {
 10    Country string
 11    Pattern *regexp.Regexp
 12    Example string
 13}
 14
 15type PhoneValidator struct {
 16    formats []PhoneFormat
 17}
 18
 19func NewPhoneValidator() *PhoneValidator {
 20    formats := []PhoneFormat{
 21        {
 22            Country: "US",
 23            Pattern: regexp.MustCompile(`^\+?1?[-.\s]?\(?([2-9]\d{2})\)?[-.\s]?(\d{3})[-.\s]?(\d{4})$`),
 24            Example: "+1 (555) 123-4567",
 25        },
 26        {
 27            Country: "UK",
 28            Pattern: regexp.MustCompile(`^\+?44[-.\s]?(\d{4})[-.\s]?(\d{6})$`),
 29            Example: "+44 7700 900123",
 30        },
 31        {
 32            Country: "Germany",
 33            Pattern: regexp.MustCompile(`^\+?49[-.\s]?(\d{3})[-.\s]?(\d{7,8})$`),
 34            Example: "+49 30 12345678",
 35        },
 36        {
 37            Country: "Japan",
 38            Pattern: regexp.MustCompile(`^\+?81[-.\s]?(\d{1,4})[-.\s]?(\d{1,4})[-.\s]?(\d{4})$`),
 39            Example: "+81 3-1234-5678",
 40        },
 41    }
 42
 43    return &PhoneValidator{formats: formats}
 44}
 45
 46func (pv *PhoneValidator) Validate(phone string) (bool, string, string) {
 47    // Remove common separators
 48    cleaned := strings.Map(func(r rune) rune {
 49        if r == ' ' || r == '-' || r == '.' || r == '(' || r == ')' {
 50            return -1
 51        }
 52        return r
 53    }, phone)
 54
 55    for _, format := range pv.formats {
 56        if format.Pattern.MatchString(cleaned) {
 57            // Extract E.164 format
 58            e164 := pv.toE164(phone, format)
 59            return true, format.Country, e164
 60        }
 61    }
 62
 63    return false, "", ""
 64}
 65
 66func (pv *PhoneValidator) toE164(phone string, format PhoneFormat) string {
 67    // Extract digits only
 68    digitsOnly := regexp.MustCompile(`\d+`).FindAllString(phone, -1)
 69    digits := strings.Join(digitsOnly, "")
 70
 71    // Add country code if missing
 72    if !strings.HasPrefix(digits, "+") {
 73        switch format.Country {
 74        case "US":
 75            if len(digits) == 10 {
 76                digits = "1" + digits
 77            }
 78        case "UK":
 79            if !strings.HasPrefix(digits, "44") {
 80                digits = "44" + digits
 81            }
 82        case "Germany":
 83            if !strings.HasPrefix(digits, "49") {
 84                digits = "49" + digits
 85            }
 86        case "Japan":
 87            if !strings.HasPrefix(digits, "81") {
 88                digits = "81" + digits
 89            }
 90        }
 91    }
 92
 93    return "+" + digits
 94}
 95
 96func main() {
 97    validator := NewPhoneValidator()
 98
 99    testPhones := []string{
100        "+1 (555) 123-4567",
101        "555-123-4567",
102        "+44 7700 900123",
103        "+49 30 12345678",
104        "+81 3-1234-5678",
105        "invalid-phone",
106        "123",
107    }
108
109    for _, phone := range testPhones {
110        valid, country, e164 := validator.Validate(phone)
111        fmt.Printf("Phone: %-20s Valid: %-5v", phone, valid)
112        if valid {
113            fmt.Printf("Country: %-10s E.164: %s", country, e164)
114        }
115        fmt.Println()
116    }
117}

Exercise 5: Data Extraction from Mixed-Format Documents

Learning Objectives: Build complex extraction patterns, handle multiple data formats in single documents, implement robust error handling for malformed data.

Real-World Context: Real-world documents often contain mixed formats - invoices with dates, amounts, and reference numbers; contracts with parties, dates, and clauses; or resumes with contact info, dates, and skills. This exercise teaches patterns used by document processing systems like DocuSign, legal tech platforms, and HR systems.

Difficulty: Advanced | Time Estimate: 65 minutes

Create a document parser that extracts structured data from text containing multiple types of information (dates, amounts, emails, phone numbers, reference codes) while tolerating formatting inconsistencies and validating what it extracts.

Requirements:

  • Extract multiple entity types from single text
  • Handle various date formats
  • Parse currency amounts with symbols
  • Extract and validate reference codes
  • Build structured output from unstructured text

Solution
  1package main
  2
  3import (
  4    "fmt"
  5    "regexp"
  6    "strings"
  7    "time"
  8)
  9
 10type DocumentData struct {
 11    Dates      []string
 12    Amounts    []string
 13    Emails     []string
 14    Phones     []string
 15    References []string
 16    Names      []string
 17}
 18
 19type DocumentParser struct {
 20    dateRe      *regexp.Regexp
 21    amountRe    *regexp.Regexp
 22    emailRe     *regexp.Regexp
 23    phoneRe     *regexp.Regexp
 24    refRe       *regexp.Regexp
 25    nameRe      *regexp.Regexp
 26}
 27
 28func NewDocumentParser() *DocumentParser {
 29    return &DocumentParser{
 30        // Date patterns: MM/DD/YYYY, YYYY-MM-DD, Month DD, YYYY
 31        dateRe: regexp.MustCompile(`\b(\d{1,2}/\d{1,2}/\d{4}|\d{4}-\d{2}-\d{2}|(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]* \d{1,2},? \d{4})\b`),
 32
 33        // Amount patterns: $1,234.56, USD 1234.56, €456.78 (decimal-comma formats are not handled)
 34        amountRe: regexp.MustCompile(`(?:USD|EUR|GBP|\$|€|£)\s*[\d,]+\.?\d*`),
 35
 36        // Email pattern
 37        emailRe: regexp.MustCompile(`\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b`),
 38
 39        // Phone patterns
 40        phoneRe: regexp.MustCompile(`\+?\d{1,3}[-.\s]?\(?\d{1,4}\)?[-.\s]?\d{1,4}[-.\s]?\d{1,9}`),
 41
 42        // Reference codes: INV-12345, REF#98765, ORD-2024-001
 43        refRe: regexp.MustCompile(`\b(?:INV|REF|ORD|PO|ID)[-#]?\d{3,}[-]?\d*\b`),
 44
 45        // Name patterns: Mr./Ms./Mrs./Dr. FirstName LastName
 46        nameRe: regexp.MustCompile(`\b(?:Mr\.|Ms\.|Mrs\.|Dr\.) [A-Z][a-z]+ [A-Z][a-z]+\b`),
 47    }
 48}
 49
 50func (dp *DocumentParser) Parse(text string) DocumentData {
 51    return DocumentData{
 52        Dates:      dp.extractDates(text),
 53        Amounts:    dp.amountRe.FindAllString(text, -1),
 54        Emails:     dp.emailRe.FindAllString(text, -1),
 55        Phones:     dp.extractPhones(text),
 56        References: dp.refRe.FindAllString(text, -1),
 57        Names:      dp.nameRe.FindAllString(text, -1),
 58    }
 59}
 60
 61func (dp *DocumentParser) extractDates(text string) []string {
 62    matches := dp.dateRe.FindAllString(text, -1)
 63    validated := make([]string, 0, len(matches))
 64
 65    for _, match := range matches {
 66        // Try to parse to validate it's a real date
 67        formats := []string{
 68            "01/02/2006",
 69            "2006-01-02",
 70            "Jan 2, 2006",
 71            "January 2, 2006",
 72        }
 73
 74        for _, format := range formats {
 75            if _, err := time.Parse(format, match); err == nil {
 76                validated = append(validated, match)
 77                break
 78            }
 79        }
 80    }
 81
 82    return validated
 83}
 84
 85func (dp *DocumentParser) extractPhones(text string) []string {
 86    matches := dp.phoneRe.FindAllString(text, -1)
 87    filtered := make([]string, 0, len(matches))
 88
 89    for _, match := range matches {
 90        // Filter out things that look like phone numbers but aren't
 91        // (e.g., dates, amounts)
 92        digits := regexp.MustCompile(`\d`).FindAllString(match, -1)
 93        if len(digits) >= 7 && len(digits) <= 15 {
 94            filtered = append(filtered, match)
 95        }
 96    }
 97
 98    return filtered
 99}
100
101func (dp *DocumentParser) FormatOutput(data DocumentData) string {
102    var sb strings.Builder
103
104    sb.WriteString("=== Extracted Document Data ===\n\n")
105
106    if len(data.Dates) > 0 {
107        sb.WriteString("Dates:\n")
108        for _, date := range data.Dates {
109            sb.WriteString(fmt.Sprintf("  - %s\n", date))
110        }
111        sb.WriteString("\n")
112    }
113
114    if len(data.Amounts) > 0 {
115        sb.WriteString("Amounts:\n")
116        for _, amount := range data.Amounts {
117            sb.WriteString(fmt.Sprintf("  - %s\n", amount))
118        }
119        sb.WriteString("\n")
120    }
121
122    if len(data.Emails) > 0 {
123        sb.WriteString("Emails:\n")
124        for _, email := range data.Emails {
125            sb.WriteString(fmt.Sprintf("  - %s\n", email))
126        }
127        sb.WriteString("\n")
128    }
129
130    if len(data.Phones) > 0 {
131        sb.WriteString("Phones:\n")
132        for _, phone := range data.Phones {
133            sb.WriteString(fmt.Sprintf("  - %s\n", phone))
134        }
135        sb.WriteString("\n")
136    }
137
138    if len(data.References) > 0 {
139        sb.WriteString("Reference Codes:\n")
140        for _, ref := range data.References {
141            sb.WriteString(fmt.Sprintf("  - %s\n", ref))
142        }
143        sb.WriteString("\n")
144    }
145
146    if len(data.Names) > 0 {
147        sb.WriteString("Names:\n")
148        for _, name := range data.Names {
149            sb.WriteString(fmt.Sprintf("  - %s\n", name))
150        }
151        sb.WriteString("\n")
152    }
153
154    return sb.String()
155}
156
157func main() {
158    parser := NewDocumentParser()
159
160    // Sample document text
161    document := `
162INVOICE
163
164Invoice Number: INV-2024-12345
165Date: March 15, 2024
166Due Date: 04/15/2024
167
168Bill To:
169Dr. Jane Smith
170Email: jane.smith@example.com
171Phone: +1 (555) 123-4567
172
173Ship To:
174Mr. John Doe
175Email: john.doe@company.com
176Phone: +44 20 7123 4567
177
178Items:
179- Product A: $1,234.56
180- Product B: USD 987.65
181- Service C: €456.78
182
183Total Amount: $2,679.99
184
185Please reference order number ORD-2024-001 in your payment.
186For questions, contact support@example.com or call 1-800-555-0199.
187
188Payment due by 2024-04-15.
189Thank you for your business!
190`
191
192    data := parser.Parse(document)
193    output := parser.FormatOutput(data)
194
195    fmt.Println(output)
196
197    // Additional analysis
198    fmt.Println("=== Analysis ===")
199    fmt.Printf("Total entities extracted: %d\n",
200        len(data.Dates)+len(data.Amounts)+len(data.Emails)+
201            len(data.Phones)+len(data.References)+len(data.Names))
202}
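
One caveat with the solution above: phoneRe has no anchoring, so digit runs inside reference codes (for example the "2024-12345" inside INV-2024-12345) and ISO dates can pass the digit-count filter and show up as phone numbers. One possible refinement reuses the parser's compiled patterns to drop such candidates (extractPhonesStrict is an illustrative name, not part of the exercise statement):

// Sketch: a stricter phone extractor that skips candidates overlapping a
// reference code and candidates that are really dates.
func (dp *DocumentParser) extractPhonesStrict(text string) []string {
    refSpans := dp.refRe.FindAllStringIndex(text, -1)
    overlapsRef := func(start, end int) bool {
        for _, span := range refSpans {
            if start < span[1] && end > span[0] {
                return true
            }
        }
        return false
    }

    var phones []string
    for _, loc := range dp.phoneRe.FindAllStringIndex(text, -1) {
        candidate := text[loc[0]:loc[1]]
        if overlapsRef(loc[0], loc[1]) || dp.dateRe.MatchString(candidate) {
            continue // part of a reference code, or actually a date
        }
        digitCount := 0
        for _, r := range candidate {
            if r >= '0' && r <= '9' {
                digitCount++
            }
        }
        if digitCount >= 7 && digitCount <= 15 {
            phones = append(phones, candidate)
        }
    }
    return phones
}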

Summary

💡 Key Takeaways:

  • Compile Once, Use Many Times - Pattern compilation is expensive, do it at startup or cache it
  • Anchors Are Essential - Use ^ and $ to match complete strings, not substrings, for security (see the sketch after this list)
  • Simple Beats Complex - A 95% solution that runs fast beats a 100% solution that crawls
  • RE2 Limitations Matter - No lookahead/lookbehind, but guaranteed linear time prevents ReDoS
  • Security First - Always validate inputs and prevent ReDoS attacks with safe patterns
  • Named Groups Aid Maintenance - Use (?P<name>...) for complex patterns to improve readability
  • Test Edge Cases - Regex bugs often surface with unusual input like empty strings, Unicode, or very long text
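
A compact illustration of the anchoring and named-group takeaways (userIDRe and the sample inputs are illustrative, not taken from the earlier examples):

package main

import (
    "fmt"
    "regexp"
)

// Compiled once at package init; anchored so the entire input must match.
var userIDRe = regexp.MustCompile(`^(?P<user>[a-z0-9_]{3,16})$`)

func main() {
    // An unanchored pattern happily matches a substring of hostile input.
    loose := regexp.MustCompile(`[a-z0-9_]{3,16}`)
    fmt.Println(loose.MatchString("evil'; DROP TABLE--alice"))    // true
    fmt.Println(userIDRe.MatchString("evil'; DROP TABLE--alice")) // false

    // Named groups keep extraction readable.
    if m := userIDRe.FindStringSubmatch("alice_01"); m != nil {
        fmt.Println("user:", m[userIDRe.SubexpIndex("user")])
    }
}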

⚠️ Production Considerations:

  • Cache Compiled Patterns - Never compile regex in hot paths or request handlers
  • Validate User Patterns - Never compile regex patterns from untrusted input without validation (a sketch of basic guard rails follows this list)
  • Profile Performance - Regex can be a bottleneck in high-throughput systems, measure before optimizing
  • Document Complex Patterns - Future you will thank present you for explaining what (?:(?:[^"\\]|\\.)*) means
  • Security Audits - Review regex patterns during security audits for potential DoS vectors
  • Monitor Performance - Track regex execution time in production to catch performance regressions
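
For the "validate user patterns" point, here is a minimal sketch of guard rails to put in front of regexp.Compile when the pattern comes from a user (compileUserPattern and the length limit are illustrative; RE2 keeps matching linear-time, but huge or deeply nested patterns still cost memory and compile time):

package main

import (
    "fmt"
    "regexp"
)

// Sketch: reject obviously abusive patterns, and never panic on user input.
func compileUserPattern(pattern string) (*regexp.Regexp, error) {
    const maxPatternLen = 200 // illustrative limit
    if pattern == "" || len(pattern) > maxPatternLen {
        return nil, fmt.Errorf("pattern must be 1-%d characters", maxPatternLen)
    }
    // Compile, not MustCompile: a bad user pattern is an error, not a crash.
    re, err := regexp.Compile(pattern)
    if err != nil {
        return nil, fmt.Errorf("invalid pattern: %w", err)
    }
    return re, nil
}

func main() {
    for _, p := range []string{`^\d{4}$`, `([`, ""} {
        if _, err := compileUserPattern(p); err != nil {
            fmt.Printf("rejected %q: %v\n", p, err)
            continue
        }
        fmt.Printf("accepted %q\n", p)
    }
}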

Real-world Wisdom: Regular expressions are like power tools - incredibly useful when used appropriately, but dangerous when misapplied. The famous quote "Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems" exists for a reason. Use regex for pattern matching and validation, but reach for proper parsers when you're dealing with structured data like JSON, XML, or programming languages.

When to use regex:

  • Validating user input (emails, phone numbers, usernames)
  • Text searching and simple extraction
  • Log file analysis and filtering
  • Simple data transformation and replacement
  • URL routing and pattern matching
  • Quick prototypes and one-off scripts

When NOT to use regex:

  • Parsing HTML/XML (use proper parsers like golang.org/x/net/html)
  • Complex programming language parsing (use go/parser or similar)
  • When performance is critical and simpler alternatives exist (strings package - see the short comparison after this list)
  • For highly nested or recursive structures (regex isn't designed for this)
  • When the pattern becomes unreadable and unmaintainable
  • Configuration file parsing (use structured formats like JSON, YAML, TOML)
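
A tiny comparison for the "simpler alternatives" case above: when you only need a fixed substring, the strings package is clearer and cheaper, and regex earns its place only once you need an actual pattern (the log line here is made up):

package main

import (
    "fmt"
    "regexp"
    "strings"
)

func main() {
    line := `level=ERROR msg="disk full"`

    // Fixed substring: strings.Contains is simpler and faster.
    fmt.Println(strings.Contains(line, "level=ERROR")) // true

    // A real pattern (alternation): this is where regexp is worth it.
    re := regexp.MustCompile(`level=(ERROR|FATAL)`)
    fmt.Println(re.MatchString(line)) // true
}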

Next Steps:

  • Advanced Patterns: Study advanced techniques like conditional patterns and subroutines (if your regex engine supports them)
  • Performance Optimization: Learn to measure and optimize regex performance with benchmarks (a minimal benchmark sketch follows this list)
  • Security Patterns: Master secure validation and ReDoS prevention techniques
  • Alternative Libraries: Explore specialized libraries for complex parsing tasks
  • Testing Strategies: Build comprehensive test suites for regex patterns with edge cases
  • Documentation Tools: Learn to use tools like regex101.com for pattern documentation and sharing
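
For the performance item above, Go's built-in benchmarking is usually all you need to see what compiling in a hot path costs. A minimal sketch (the package name is illustrative; run with go test -bench .):

package regexbench

import (
    "regexp"
    "testing"
)

var dateRe = regexp.MustCompile(`^\d{4}-\d{2}-\d{2}$`)

// Matching with a pattern compiled once, outside the loop.
func BenchmarkPrecompiled(b *testing.B) {
    for i := 0; i < b.N; i++ {
        dateRe.MatchString("2024-03-15")
    }
}

// Recompiling on every iteration: what "compiling in the hot path" costs.
func BenchmarkCompileEachTime(b *testing.B) {
    for i := 0; i < b.N; i++ {
        regexp.MustCompile(`^\d{4}-\d{2}-\d{2}$`).MatchString("2024-03-15")
    }
}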

Mastering Go's regexp package transforms you from a developer who struggles with text processing to one who builds robust, performant pattern matching systems that handle real-world complexity safely and efficiently. Remember: the best regex is often the one you didn't write - if a simpler solution exists using the strings package or structured parsing, prefer that.