Network Programming

Think of network programming as the digital equivalent of the postal system. Just as mail needs addresses, routes, and delivery methods, network applications need protocols, sockets, and data transmission strategies. Go's net package provides the building blocks for everything from simple client-server applications to complex distributed systems.

In today's connected world, every application is a network application. Whether you're building a web service, mobile app backend, or IoT device, understanding network programming is essential. Go makes network programming accessible with its simple yet powerful abstractions that handle the complexity while giving you control when you need it.

💡 Key Takeaway: Network programming in Go is like building with LEGOs—you have simple, well-defined pieces that can be combined to create complex, robust systems.

Introduction to Network Programming

Understanding network programming requires grasping several foundational concepts. At its core, network communication follows a layered model where each layer provides specific services to the layers above it. While you don't need to memorize every detail of the OSI model, understanding that TCP/IP operates at different layers helps you choose the right abstraction for your needs.

The Network Stack: When you write net.Dial("tcp", "example.com:80"), Go handles DNS resolution, TCP connection establishment, and socket management—all the complex details that would take hundreds of lines to implement manually. This is the power of Go's networking abstractions: they hide complexity without sacrificing control.

Why Go Excels at Network Programming: Go was designed at Google to build networked systems. Its concurrency primitives (goroutines and channels) make handling thousands of simultaneous connections natural. The standard library is production-ready, battle-tested in services handling billions of requests daily.

Before diving into code, let's understand the building blocks. Think of network programming as learning a new language—once you understand the grammar and vocabulary, you can express complex ideas fluently.

Go's networking capabilities are built on a few core packages:

  • net: Core networking primitives
  • net/http: HTTP client and server
  • net/url: URL parsing and manipulation
  • net/textproto: Text-based protocol utilities

Basic Network Concepts

The OSI and TCP/IP Models

Understanding Network Layers: Network communication is organized into layers, each providing services to the layer above. The OSI model defines seven layers, but the simpler TCP/IP model (which better reflects reality) has four: Link, Internet, Transport, and Application.

Application Layer: This is where your Go code operates. HTTP, FTP, SMTP, DNS—these are application-layer protocols. Go's net/http and net packages provide application-layer functionality.

Transport Layer: TCP and UDP operate here, providing end-to-end communication between applications. TCP adds reliability, ordering, and flow control. UDP is minimal—just port numbers and checksums.

Internet Layer: IP handles routing packets across networks. IPv4 uses 32-bit addresses; IPv6 uses 128-bit addresses. Go abstracts IPv4/IPv6 differences—most code works with both transparently.

Link Layer: Ethernet, WiFi, and other technologies operate here. Go's net package abstracts link-layer details—you usually don't need to worry about MAC addresses or Ethernet frames.

Address Resolution and Port Numbers

Port Numbers: Ports identify specific applications on a host. Well-known ports (0-1023) are reserved for standard services: 80 for HTTP, 443 for HTTPS, 22 for SSH. Registered ports (1024-49151) are assigned by IANA. Dynamic ports (49152-65535) are available for temporary use.

Socket Pairs: A TCP connection is uniquely identified by four values: source IP, source port, destination IP, destination port. This tuple allows multiple connections between the same hosts—each gets a different source port.

Address Resolution: DNS isn't the only name resolution mechanism. Hosts files (/etc/hosts) provide static mappings. mDNS (multicast DNS) enables local network discovery without DNS servers. Go's resolver uses the system's configuration, checking hosts files before DNS.

Network Byte Order

Endianness: Different CPUs store multi-byte integers differently. Network protocols use big-endian (most significant byte first), called network byte order. Go's encoding/binary package handles conversions. The binary.BigEndian type represents network byte order.

Why This Matters: When implementing binary protocols, you must use network byte order for interoperability. A 16-bit port number must be sent with the high byte first, regardless of your CPU's native format.

Let's start with the fundamentals. Network addresses are like street addresses—they tell us where to send data, and different address types serve different purposes. Understanding these basics will make everything else clearer.

package main

import (
    "fmt"
    "net"
    "time"
)

// run
func main() {
    // Network address parsing
    addr, err := net.ResolveTCPAddr("tcp", "localhost:8080")
    if err != nil {
        fmt.Println("Error:", err)
        return
    }
    fmt.Printf("Network: %s, Address: %s\n", addr.Network(), addr.String())

    // IP address parsing
    ip := net.ParseIP("192.168.1.1")
    if ip != nil {
        fmt.Printf("IP: %s, Is IPv4: %t\n", ip, ip.To4() != nil)
    }

    // Network interface information
    interfaces, _ := net.Interfaces()
    fmt.Println("\nNetwork Interfaces:")
    for _, iface := range interfaces {
        fmt.Printf("  %s (%s)\n", iface.Name, iface.Flags)
    }

    // Timeout operations
    timeout := 5 * time.Second
    fmt.Printf("\nDefault timeout: %s\n", timeout)
}

Network Addresses Deep Dive: Every network communication needs an address—a way to identify the destination. In Go, addresses are more than just strings; they're typed structures that validate format and provide useful methods. The net.TCPAddr, net.UDPAddr, and net.UnixAddr types give you compile-time safety and runtime flexibility.

When you parse an address like localhost:8080, Go resolves the hostname, validates the port number, and creates a structure that can be used with various network operations. This abstraction means you can write code that works with IPv4, IPv6, and Unix domain sockets without changing your core logic.

Network Types in Go

⚠️ Important: Understanding network types is crucial for building efficient applications. Using the wrong network type can lead to performance issues or security vulnerabilities.

Go supports multiple network types, each designed for specific use cases:

package main

import (
    "fmt"
    "net"
)

// run
func main() {
    // Different network types
    networks := []string{
        "tcp",   // TCP/IPv4 and IPv6
        "tcp4",  // TCP/IPv4 only
        "tcp6",  // TCP/IPv6 only
        "udp",   // UDP/IPv4 and IPv6
        "udp4",  // UDP/IPv4 only
        "udp6",  // UDP/IPv6 only
        "ip",    // Raw IP packets
        "unix",  // Unix domain sockets
    }

    fmt.Println("Supported Network Types:")
    for _, network := range networks {
        fmt.Printf("  - %s\n", network)
    }

    // IP address classification
    testIPs := []string{
        "127.0.0.1",
        "::1",
        "192.168.1.1",
        "8.8.8.8",
        "2001:4860:4860::8888",
    }

    fmt.Println("\nIP Address Classification:")
    for _, ipStr := range testIPs {
        ip := net.ParseIP(ipStr)
        if ip == nil {
            continue
        }
        fmt.Printf("  %s: ", ipStr)
        if ip.IsLoopback() {
            fmt.Print("Loopback ")
        }
        if ip.IsPrivate() {
            fmt.Print("Private ")
        }
        if ip.To4() != nil {
            fmt.Print("IPv4")
        } else {
            fmt.Print("IPv6")
        }
        fmt.Println()
    }
}

Network Error Handling

Network operations can fail in numerous ways, and robust applications must handle these failures gracefully. Understanding common network errors helps you build resilient applications.

package main

import (
    "fmt"
    "net"
)

// run
func main() {
    // Demonstrate different network error types
    demonstrateNetworkErrors()
}

func demonstrateNetworkErrors() {
    // Connection refused: dial a port where nothing is listening
    conn, err := net.Dial("tcp", "localhost:1")
    if err != nil {
        if opErr, ok := err.(*net.OpError); ok {
            fmt.Printf("Operation: %s\n", opErr.Op)
            fmt.Printf("Network: %s\n", opErr.Net)
            fmt.Printf("Error: %v\n", opErr.Err)

            // Check specific error conditions
            if opErr.Timeout() {
                fmt.Println("This was a timeout error")
            }
            if opErr.Temporary() { // deprecated since Go 1.18, but still compiles
                fmt.Println("This is a temporary error, retry may succeed")
            }
        }
    }
    if conn != nil {
        conn.Close()
    }

    // DNS errors
    _, err = net.LookupHost("this-domain-does-not-exist-12345.com")
    if err != nil {
        if dnsErr, ok := err.(*net.DNSError); ok {
            fmt.Printf("\nDNS Error: %s\n", dnsErr.Name)
            fmt.Printf("Temporary: %t, Timeout: %t\n", dnsErr.IsTemporary, dnsErr.IsTimeout)
        }
    }
}

Common Network Errors:

  • Connection Refused: Target port isn't listening
  • Network Unreachable: Routing problem or network down
  • Host Unreachable: Target host not responding
  • Timeout: Operation took too long
  • Connection Reset: Remote end closed abruptly
  • DNS Failure: Hostname resolution failed

Network Interface Information

Understanding your system's network interfaces is crucial for debugging and for applications that need to listen on specific interfaces.

package main

import (
    "fmt"
    "net"
)

// run
func main() {
    // Get all network interfaces
    interfaces, err := net.Interfaces()
    if err != nil {
        fmt.Println("Error:", err)
        return
    }

    fmt.Println("Network Interfaces:")
    for _, iface := range interfaces {
        fmt.Printf("\nInterface: %s\n", iface.Name)
        fmt.Printf("  MTU: %d\n", iface.MTU)
        fmt.Printf("  Hardware Address: %s\n", iface.HardwareAddr)
        fmt.Printf("  Flags: %s\n", iface.Flags)

        // Get addresses for this interface
        addrs, err := iface.Addrs()
        if err != nil {
            continue
        }

        for _, addr := range addrs {
            fmt.Printf("  Address: %s\n", addr)
        }

        // Check interface status
        if iface.Flags&net.FlagUp != 0 {
            fmt.Println("  Status: UP")
        }
        if iface.Flags&net.FlagLoopback != 0 {
            fmt.Println("  Type: Loopback")
        }
    }
}

TCP Server and Client

TCP (Transmission Control Protocol) is the workhorse of the internet. When you load a webpage, send an email, or transfer a file, TCP is usually involved. It provides reliable, ordered, and error-checked delivery of data between applications.

Why TCP Matters: Unlike UDP, TCP guarantees that your data arrives intact and in order. It handles packet loss, retransmission, flow control, and congestion control automatically. This makes TCP perfect for applications where data integrity is crucial: databases, file transfers, HTTP, SSH, and most network protocols.

The Three-Way Handshake: Every TCP connection begins with a three-way handshake (SYN, SYN-ACK, ACK). Go's net.Dial handles this automatically, but understanding it helps you debug connection issues and set appropriate timeouts.

Now let's move from theory to practice. TCP is like registered mail: it guarantees delivery, maintains order, and ensures data integrity—exactly the properties most applications need.

Think of a TCP connection as a phone conversation—you establish a connection, exchange messages back and forth, and then hang up. The connection ensures that messages arrive in the order they were sent and that none are lost along the way.

Basic TCP Server

TCP Connection Lifecycle: Understanding the lifecycle of a TCP connection helps you write better network code. A connection goes through several states: CLOSED → SYN_SENT → ESTABLISHED → FIN_WAIT → TIME_WAIT → CLOSED. Go abstracts this complexity, but knowing it helps with debugging.

Socket Basics: At the OS level, sockets are file descriptors. When you call net.Listen, Go creates a socket, binds it to an address, and listens for connections. Each accepted connection gets its own socket. This multiplexing is handled by the OS kernel, making Go servers highly efficient.

Concurrent Client Handling: The power of Go's networking comes from goroutines. Each client can have its own goroutine without the overhead of threads. This enables servers to handle thousands or even millions of concurrent connections—something that would be impractical with traditional thread-per-client models.

Let's start with a simple echo server that demonstrates the fundamental TCP concepts. This server listens for connections, receives messages, and echoes them back in uppercase.

package main

import (
    "bufio"
    "fmt"
    "net"
    "strings"
    "time"
)

// run
func main() {
    // Start server in background
    go startEchoServer()

    // Give server time to start
    time.Sleep(100 * time.Millisecond)

    // Test the server
    testClient()
}

func startEchoServer() {
    listener, err := net.Listen("tcp", "localhost:8080")
    if err != nil {
        fmt.Println("Server error:", err)
        return
    }
    defer listener.Close()

    fmt.Println("Echo server listening on localhost:8080")

    // Accept one connection for demo
    conn, err := listener.Accept()
    if err != nil {
        fmt.Println("Accept error:", err)
        return
    }

    handleConnection(conn)
}

func handleConnection(conn net.Conn) {
    defer conn.Close()

    fmt.Printf("Client connected: %s\n", conn.RemoteAddr())

    scanner := bufio.NewScanner(conn)
    for scanner.Scan() {
        msg := scanner.Text()
        fmt.Printf("Received: %s\n", msg)

        // Echo back
        response := strings.ToUpper(msg) + "\n"
        conn.Write([]byte(response))

        if msg == "quit" {
            break
        }
    }

    fmt.Println("Client disconnected")
}

func testClient() {
    conn, err := net.Dial("tcp", "localhost:8080")
    if err != nil {
        fmt.Println("Client error:", err)
        return
    }
    defer conn.Close()

    // Create the reader once; building a new bufio.Reader per message
    // could silently discard data already buffered from the connection
    reader := bufio.NewReader(conn)
    messages := []string{"hello", "world", "quit"}

    for _, msg := range messages {
        fmt.Printf("Sending: %s\n", msg)
        fmt.Fprintf(conn, "%s\n", msg)

        response, _ := reader.ReadString('\n')
        fmt.Printf("Response: %s", response)
    }
}

TCP State Machine and Connection Management

Understanding TCP's internal state machine helps you write more robust network code and debug connection issues effectively.

The Eleven States: TCP connections transition through states: CLOSED, LISTEN, SYN-SENT, SYN-RECEIVED, ESTABLISHED, FIN-WAIT-1, FIN-WAIT-2, CLOSE-WAIT, CLOSING, LAST-ACK, TIME-WAIT. Each state has specific meanings and transitions.

ESTABLISHED State: This is the normal operating state where data can be transferred bidirectionally. Most of a connection's lifetime is spent here. Both sides can send and receive data freely.

TIME-WAIT State: After closing a connection, it enters TIME-WAIT for twice the Maximum Segment Lifetime (2*MSL, typically one to four minutes in total). This prevents old duplicate packets from interfering with new connections using the same port. High-traffic servers can exhaust ports due to TIME-WAIT connections—this is why connection pooling matters.

Half-Close: TCP supports half-close—one direction closed while the other remains open. Go's net.TCPConn has CloseRead() and CloseWrite() for this. Half-close is useful when you've finished sending but want to receive acknowledgments or remaining data.

TCP Options: Beyond basic headers, TCP supports options for features like: window scaling (for high-bandwidth connections), timestamps (for RTT measurement), selective acknowledgment (SACK) for efficient retransmission, and maximum segment size (MSS) negotiation.

Network Security Best Practices

Network security is critical for production applications. Even if your application doesn't handle sensitive data directly, it's part of a larger system that might.

Input Validation: Never trust network input. Validate sizes, formats, and content. A malicious client could send gigabyte messages to exhaust memory, or crafted data to exploit parsing bugs.

Rate Limiting: Protect against DOS attacks by limiting requests per IP. Use token buckets or sliding windows. Go's golang.org/x/time/rate package provides excellent primitives.

TLS Everywhere: Use TLS (HTTPS) for all external communication. Let's Encrypt provides free certificates. Go's crypto/tls package makes TLS easy—often just changing "tcp" to "tls" in dial/listen calls.

Certificate Validation: Always validate TLS certificates. Don't skip verification in production (a common debugging hack that becomes a security hole). Pin certificates for critical services.

Timeout All Operations: Attackers can slowloris—open connections and send data very slowly to exhaust resources. Timeouts protect against this. Set read/write deadlines on all connections.

Resource Limits: Limit concurrent connections, request sizes, request rates. Monitor resource usage. Implement graceful degradation when limits are approached.

Network Testing Strategies

Unit Testing with net.Pipe: For testing protocol implementations without actual network, use net.Pipe(). It creates connected in-memory connections perfect for testing.

Loopback Testing: Tests can listen on localhost and connect to themselves. This tests real network stack but avoids external dependencies.

Table-Driven Tests: Network code has many edge cases—test them systematically with table-driven tests covering: normal operation, timeouts, connection loss, malformed data, large messages, concurrent operations.

Race Detection: Network code is inherently concurrent. Always run tests with race detector (go test -race). Catches subtle concurrency bugs that might only appear under load.

Load Testing: Functional tests aren't enough—test under load to find bottlenecks, memory leaks, and race conditions that only appear with many concurrent connections.

Network Monitoring and Observability

Metrics to Track: Request rate (requests/second), error rate (errors/second), latency (p50, p90, p99), connection count (active connections), throughput (bytes/second).

Structured Logging: Log connection establishment, errors, slow requests. Use structured logging (JSON) for easy parsing. Include request IDs for tracing.

Distributed Tracing: For microservices, implement tracing (OpenTelemetry, Zipkin, Jaeger). Traces show request flow through services, identifying bottlenecks.

Health Checks: Expose health check endpoints. Kubernetes and load balancers use these to route traffic only to healthy instances.

Connection Metrics: Track connection pool statistics—active connections, wait time, creation rate. These reveal bottlenecks and sizing issues.

TCP Flow Control and Congestion Control

Flow Control: Prevents fast senders from overwhelming slow receivers. TCP uses a sliding window—the receiver advertises how much buffer space it has, and the sender won't exceed this. Go handles this automatically, but understanding it explains why sending can block.

Congestion Control: Prevents senders from overwhelming the network. TCP uses algorithms like Reno, Cubic, and BBR to detect congestion (via packet loss or RTT increases) and reduce send rate accordingly. This is why network throughput varies—TCP constantly adapts to conditions.

Nagle's Algorithm: Combines small writes into larger packets to reduce overhead. Can add latency for interactive applications. Disable with SetNoDelay(true) when you need immediate transmission. SSH and Telnet disable Nagle for responsive interaction.

TCP Fast Open: Extension allowing data in the initial SYN packet, reducing handshake latency by one RTT. Requires OS and remote support. Go's standard library doesn't expose TFO directly, but the OS may use it transparently.

Concurrent TCP Server

A real-world TCP server needs to handle multiple clients simultaneously. Think of this as a receptionist handling multiple phone calls—you need a system to manage multiple conversations without mixing them up. In Go, goroutines make this surprisingly elegant.

A production TCP server handles multiple connections concurrently:

package main

import (
    "bufio"
    "context"
    "fmt"
    "net"
    "sync"
    "sync/atomic"
    "time"
)

// run
func main() {
    server := NewTCPServer("localhost:9090")

    ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
    defer cancel()

    // Run server
    go func() {
        if err := server.Start(ctx); err != nil {
            fmt.Println("Server error:", err)
        }
    }()

    // Give server time to start
    time.Sleep(100 * time.Millisecond)

    // Simulate multiple clients
    var wg sync.WaitGroup
    for i := 0; i < 3; i++ {
        wg.Add(1)
        go func(id int) {
            defer wg.Done()
            simulateClient(id)
        }(i)
    }

    wg.Wait()
    time.Sleep(100 * time.Millisecond)

    stats := server.Stats()
    fmt.Printf("\nServer Statistics:\n")
    fmt.Printf("  Total Connections: %d\n", stats.TotalConnections)
    fmt.Printf("  Active Connections: %d\n", stats.ActiveConnections)
    fmt.Printf("  Messages Processed: %d\n", stats.MessagesProcessed)
}

type TCPServer struct {
    addr              string
    listener          net.Listener
    wg                sync.WaitGroup
    totalConnections  atomic.Int64
    activeConnections atomic.Int64
    messagesProcessed atomic.Int64
}

type ServerStats struct {
    TotalConnections  int64
    ActiveConnections int64
    MessagesProcessed int64
}

func NewTCPServer(addr string) *TCPServer {
    return &TCPServer{addr: addr}
}

func (s *TCPServer) Start(ctx context.Context) error {
    var err error
    s.listener, err = net.Listen("tcp", s.addr)
    if err != nil {
        return err
    }
    defer s.listener.Close()

    fmt.Printf("TCP Server started on %s\n", s.addr)

    // Close the listener when the context is cancelled to unblock Accept
    go func() {
        <-ctx.Done()
        s.listener.Close()
    }()

    for {
        conn, err := s.listener.Accept()
        if err != nil {
            // Stop accepting if shutdown was requested; otherwise log and retry
            select {
            case <-ctx.Done():
            default:
                fmt.Println("Accept error:", err)
                continue
            }
            break
        }

        s.totalConnections.Add(1)
        s.activeConnections.Add(1)

        s.wg.Add(1)
        go s.handleConnection(conn)
    }

    // Wait for all connections to finish
    s.wg.Wait()
    return nil
}

func (s *TCPServer) handleConnection(conn net.Conn) {
    defer s.wg.Done()
    defer conn.Close()
    defer s.activeConnections.Add(-1)

    // Set read timeout
    conn.SetReadDeadline(time.Now().Add(5 * time.Second))

    scanner := bufio.NewScanner(conn)
    for scanner.Scan() {
        s.messagesProcessed.Add(1)
        msg := scanner.Text()

        // Process message
        response := fmt.Sprintf("Processed: %s\n", msg)
        conn.Write([]byte(response))

        // Reset deadline
        conn.SetReadDeadline(time.Now().Add(5 * time.Second))
    }
}

func (s *TCPServer) Stats() ServerStats {
    return ServerStats{
        TotalConnections:  s.totalConnections.Load(),
        ActiveConnections: s.activeConnections.Load(),
        MessagesProcessed: s.messagesProcessed.Load(),
    }
}

func simulateClient(id int) {
    conn, err := net.Dial("tcp", "localhost:9090")
    if err != nil {
        fmt.Printf("Client %d error: %v\n", id, err)
        return
    }
    defer conn.Close()

    msg := fmt.Sprintf("Message from client %d", id)
    fmt.Fprintf(conn, "%s\n", msg)

    response, _ := bufio.NewReader(conn).ReadString('\n')
    fmt.Printf("Client %d received: %s", id, response)
}

TCP Client with Retry Logic

Network connections are unreliable—servers go down, networks fail, and timeouts occur. A production client needs to be resilient, like a determined delivery driver who tries multiple routes and times to deliver a package.

⚠️ Important: Network failures are the rule, not the exception. Always implement retry logic with exponential backoff to handle temporary failures gracefully.

Production clients need robust error handling and retry mechanisms:

package main

import (
    "fmt"
    "net"
    "time"
)

// run
func main() {
    client := NewTCPClient(RetryConfig{
        MaxRetries:  3,
        InitialWait: 100 * time.Millisecond,
        MaxWait:     2 * time.Second,
        Multiplier:  2.0,
    })

    // Try to connect to a server
    conn, err := client.Connect("localhost:8888")
    if err != nil {
        fmt.Printf("Failed to connect: %v\n", err)
    } else {
        fmt.Println("Connected successfully")
        conn.Close()
    }
}

type RetryConfig struct {
    MaxRetries  int
    InitialWait time.Duration
    MaxWait     time.Duration
    Multiplier  float64
}

type TCPClient struct {
    config RetryConfig
}

func NewTCPClient(config RetryConfig) *TCPClient {
    return &TCPClient{config: config}
}

func (c *TCPClient) Connect(addr string) (net.Conn, error) {
    var conn net.Conn
    var err error

    wait := c.config.InitialWait

    for attempt := 0; attempt <= c.config.MaxRetries; attempt++ {
        if attempt > 0 {
            fmt.Printf("Retry attempt %d/%d (waiting %s)...\n",
                attempt, c.config.MaxRetries, wait)
            time.Sleep(wait)

            // Exponential backoff
            wait = time.Duration(float64(wait) * c.config.Multiplier)
            if wait > c.config.MaxWait {
                wait = c.config.MaxWait
            }
        }

        fmt.Printf("Connecting to %s...\n", addr)
        conn, err = net.DialTimeout("tcp", addr, 2*time.Second)
        if err == nil {
            return conn, nil
        }

        fmt.Printf("Connection failed: %v\n", err)
    }

    return nil, fmt.Errorf("failed after %d attempts: %w", c.config.MaxRetries+1, err)
}

// ConnectWithTimeout shows the net.Dialer alternative for one-shot timeouts.
func ConnectWithTimeout(addr string, timeout time.Duration) (net.Conn, error) {
    var d net.Dialer
    d.Timeout = timeout

    return d.Dial("tcp", addr)
}

Binary Protocol TCP Server

While text protocols are human-readable, binary protocols are more efficient. Think of this as the difference between writing a letter in English versus using a compact code—binary protocols trade human readability for smaller messages and faster parsing.

Real-world Example

Database protocols like PostgreSQL and MySQL use binary wire protocols for efficiency. Redis, the popular in-memory database, uses RESP—a simple length-prefixed protocol that's fast and easy to implement while remaining mostly human-readable.

Example of a TCP server using a binary protocol:

package main

import (
    "encoding/binary"
    "fmt"
    "io"
    "net"
    "time"
)

// run
func main() {
    // Start binary protocol server
    go startBinaryServer()

    time.Sleep(100 * time.Millisecond)

    // Test client
    testBinaryClient()
}

// Message format:
// [4 bytes: length][1 byte: type][N bytes: payload]
type Message struct {
    Type    byte
    Payload []byte
}

const (
    MsgTypeEcho    byte = 1
    MsgTypeReverse byte = 2
    MsgTypeQuit    byte = 3
)

// maxPayload guards against a malicious length prefix exhausting memory
const maxPayload = 1 << 20 // 1 MiB

func startBinaryServer() {
    listener, err := net.Listen("tcp", "localhost:7070")
    if err != nil {
        fmt.Println("Server error:", err)
        return
    }
    defer listener.Close()

    fmt.Println("Binary protocol server listening on localhost:7070")

    conn, err := listener.Accept()
    if err != nil {
        return
    }

    handleBinaryConnection(conn)
}

func handleBinaryConnection(conn net.Conn) {
    defer conn.Close()

    for {
        msg, err := readMessage(conn)
        if err != nil {
            if err != io.EOF {
                fmt.Println("Read error:", err)
            }
            return
        }

        fmt.Printf("Received message type %d: %s\n", msg.Type, string(msg.Payload))

        var response Message
        switch msg.Type {
        case MsgTypeEcho:
            response = Message{Type: MsgTypeEcho, Payload: msg.Payload}
        case MsgTypeReverse:
            reversed := reverseBytes(msg.Payload)
            response = Message{Type: MsgTypeReverse, Payload: reversed}
        case MsgTypeQuit:
            fmt.Println("Quit message received")
            return
        default:
            fmt.Printf("Unknown message type: %d\n", msg.Type)
            continue
        }

        if err := writeMessage(conn, response); err != nil {
            fmt.Println("Write error:", err)
            return
        }
    }
}

func readMessage(conn net.Conn) (Message, error) {
    // Read length
    var length uint32
    if err := binary.Read(conn, binary.BigEndian, &length); err != nil {
        return Message{}, err
    }
    if length > maxPayload {
        return Message{}, fmt.Errorf("payload too large: %d bytes", length)
    }

    // Read type
    var msgType byte
    if err := binary.Read(conn, binary.BigEndian, &msgType); err != nil {
        return Message{}, err
    }

    // Read payload
    payload := make([]byte, length)
    if _, err := io.ReadFull(conn, payload); err != nil {
        return Message{}, err
    }

    return Message{Type: msgType, Payload: payload}, nil
}

func writeMessage(conn net.Conn, msg Message) error {
    // Write length
    length := uint32(len(msg.Payload))
    if err := binary.Write(conn, binary.BigEndian, length); err != nil {
        return err
    }

    // Write type
    if err := binary.Write(conn, binary.BigEndian, msg.Type); err != nil {
        return err
    }

    // Write payload
    _, err := conn.Write(msg.Payload)
    return err
}

func reverseBytes(b []byte) []byte {
    result := make([]byte, len(b))
    for i := 0; i < len(b); i++ {
        result[i] = b[len(b)-1-i]
    }
    return result
}

func testBinaryClient() {
    conn, err := net.Dial("tcp", "localhost:7070")
    if err != nil {
        fmt.Println("Client error:", err)
        return
    }
    defer conn.Close()

    // Send echo message
    writeMessage(conn, Message{Type: MsgTypeEcho, Payload: []byte("Hello")})
    response, _ := readMessage(conn)
    fmt.Printf("Echo response: %s\n", string(response.Payload))

    // Send reverse message
    writeMessage(conn, Message{Type: MsgTypeReverse, Payload: []byte("World")})
    response, _ = readMessage(conn)
    fmt.Printf("Reverse response: %s\n", string(response.Payload))

    // Send quit
    writeMessage(conn, Message{Type: MsgTypeQuit, Payload: nil})
}

UDP Communication

When Speed Trumps Reliability: UDP (User Datagram Protocol) is connectionless—like sending a postcard instead of registered mail. You send data and hope it arrives, but you get no confirmation and no guarantees about order or delivery. This sounds risky, but for certain applications, it's exactly what you need.

Use Cases for UDP: Real-time video streaming can tolerate occasional frame loss but can't tolerate delays from retransmission. Online gaming needs low latency more than perfect reliability—a dropped position update isn't worth the delay of retransmission. DNS queries are simple request-response pairs where UDP's overhead savings matter.

UDP's Hidden Power: Without connection state, a single UDP server can efficiently handle millions of clients. Without retransmission logic, UDP has minimal latency. For broadcast and multicast scenarios, UDP is often the only practical choice.

If TCP is like a phone conversation, UDP is a stream of postcards: each datagram travels independently, with no guarantees about delivery, ordering, or duplicate suppression. That makes it ideal when low latency matters more than reliability.

When to Use UDP vs TCP

  • UDP: Gaming, video streaming, voice calls, DNS lookups
  • TCP: Web browsing, email, file transfers

Basic UDP Server and Client

UDP Performance Characteristics: UDP has lower latency than TCP because there's no connection setup and no acknowledgment overhead. On a local network, UDP can achieve sub-millisecond latency, while TCP's three-way handshake costs an extra round trip before any data flows.

UDP Packet Size Considerations: UDP packets larger than the MTU (typically 1500 bytes for Ethernet) get fragmented at the IP layer. Fragmentation can cause problems: if any fragment is lost, the entire packet is lost. For maximum reliability, keep UDP packets under 1472 bytes (1500 - 20 IP header - 8 UDP header).

MTU Discovery: The Path MTU (PMTU) can vary across networks. While TCP handles this automatically, UDP applications may need to implement their own MTU discovery or use conservative packet sizes. For internet applications, 512 bytes is always safe, 1024 bytes is usually safe, and 1472 bytes works on most Ethernet paths.
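
Rather than rely on IP fragmentation, an application can split large payloads into MTU-safe datagrams itself. Here's a minimal sketch; the chunk helper is illustrative, and the 1472-byte constant assumes a standard 1500-byte Ethernet MTU:

```go
package main

import "fmt"

// maxSafePayload is the conventional Ethernet limit:
// a 1500-byte MTU minus the 20-byte IP header and 8-byte UDP header.
const maxSafePayload = 1472

// chunk splits data into slices of at most size bytes, so each
// slice fits in a single unfragmented datagram.
func chunk(data []byte, size int) [][]byte {
	var parts [][]byte
	for len(data) > size {
		parts = append(parts, data[:size])
		data = data[size:]
	}
	return append(parts, data)
}

func main() {
	payload := make([]byte, 4000)
	parts := chunk(payload, maxSafePayload)
	fmt.Printf("%d bytes split into %d datagrams\n", len(payload), len(parts))
}
```

Each chunk would then be sent as its own WriteToUDP call; a real protocol would also tag chunks with sequence numbers so the receiver can reassemble them.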

Let's explore UDP with a simple echo server. Notice how much simpler the code is compared to TCP—there's no need to maintain connections or handle the complexity of reliable delivery.

 1package main
 2
 3import (
 4    "fmt"
 5    "net"
 6    "time"
 7)
 8
 9// run
10func main() {
11    // Start UDP server
12    go startUDPServer()
13
14    time.Sleep(100 * time.Millisecond)
15
16    // Test UDP client
17    testUDPClient()
18}
19
20func startUDPServer() {
21    addr, err := net.ResolveUDPAddr("udp", "localhost:6060")
22    if err != nil {
23        fmt.Println("Resolve error:", err)
24        return
25    }
26
27    conn, err := net.ListenUDP("udp", addr)
28    if err != nil {
29        fmt.Println("Listen error:", err)
30        return
31    }
32    defer conn.Close()
33
34    fmt.Println("UDP server listening on localhost:6060")
35
36    buffer := make([]byte, 1024)
37
38    // Handle a few packets for demo
39    for i := 0; i < 5; i++ {
40        n, clientAddr, err := conn.ReadFromUDP(buffer)
41        if err != nil {
42            fmt.Println("Read error:", err)
43            continue
44        }
45
46        msg := string(buffer[:n])
47        fmt.Printf("Received from %s: %s\n", clientAddr, msg)
48
49        // Echo back
50        response := fmt.Sprintf("Echo: %s", msg)
51        conn.WriteToUDP([]byte(response), clientAddr)
52    }
53}
54
55func testUDPClient() {
56    addr, err := net.ResolveUDPAddr("udp", "localhost:6060")
57    if err != nil {
58        fmt.Println("Resolve error:", err)
59        return
60    }
61
62    conn, err := net.DialUDP("udp", nil, addr)
63    if err != nil {
64        fmt.Println("Dial error:", err)
65        return
66    }
67    defer conn.Close()
68
69    messages := []string{"Hello", "UDP", "World"}
70
71    for _, msg := range messages {
72        fmt.Printf("Sending: %s\n", msg)
73        conn.Write([]byte(msg))
74
75        // Read response
76        buffer := make([]byte, 1024)
77        conn.SetReadDeadline(time.Now().Add(1 * time.Second))
78        n, err := conn.Read(buffer)
79        if err != nil {
80            fmt.Println("Read error:", err)
81            continue
82        }
83
84        fmt.Printf("Response: %s\n", string(buffer[:n]))
85    }
86}

UDP Multicast

Multicast is like a radio broadcast—one sender transmits, and multiple receivers can tune in to listen. This is incredibly efficient for applications like stock quote feeds, video streaming, or service discovery where you need to send the same data to many recipients.

Real-world Example

On networks that support it (typically managed LANs), conferencing and IPTV systems use multicast for one-to-many streams: the sender transmits the data once, and every subscribed host on the network receives it, instead of one unicast stream per viewer. Over the public internet, multicast is rarely routable, so applications fall back to unicast or relay servers.

UDP supports multicast for sending data to multiple receivers:

 1package main
 2
 3import (
 4    "fmt"
 5    "net"
 6    "time"
 7)
 8
 9// run
10func main() {
11    // Start multicast listeners
12    go startMulticastListener("Listener-1")
13    go startMulticastListener("Listener-2")
14
15    time.Sleep(200 * time.Millisecond)
16
17    // Send multicast messages
18    sendMulticast()
19}
20
21const (
22    multicastAddr = "224.0.0.1:9999"
23)
24
25func startMulticastListener(name string) {
26    addr, err := net.ResolveUDPAddr("udp", multicastAddr)
27    if err != nil {
28        fmt.Printf("%s: Resolve error: %v\n", name, err)
29        return
30    }
31
32    conn, err := net.ListenMulticastUDP("udp", nil, addr)
33    if err != nil {
34        fmt.Printf("%s: Listen error: %v\n", name, err)
35        return
36    }
37    defer conn.Close()
38
39    fmt.Printf("%s: Listening on %s\n", name, multicastAddr)
40
41    buffer := make([]byte, 1024)
42    conn.SetReadDeadline(time.Now().Add(2 * time.Second))
43
44    for {
45        n, src, err := conn.ReadFromUDP(buffer)
46        if err != nil {
47            break
48        }
49
50        msg := string(buffer[:n])
51        fmt.Printf("%s: Received from %s: %s\n", name, src, msg)
52    }
53}
54
55func sendMulticast() {
56    addr, err := net.ResolveUDPAddr("udp", multicastAddr)
57    if err != nil {
58        fmt.Println("Resolve error:", err)
59        return
60    }
61
62    conn, err := net.DialUDP("udp", nil, addr)
63    if err != nil {
64        fmt.Println("Dial error:", err)
65        return
66    }
67    defer conn.Close()
68
69    messages := []string{"Multicast Message 1", "Multicast Message 2"}
70
71    for _, msg := range messages {
72        fmt.Printf("Broadcasting: %s\n", msg)
73        conn.Write([]byte(msg))
74        time.Sleep(100 * time.Millisecond)
75    }
76}

Reliable UDP with Acknowledgments

Sometimes you need the speed of UDP with the reliability of TCP. This is like certified mail—you send a postcard but require a return receipt to confirm delivery. By adding acknowledgments, we can build reliability on top of UDP's lightweight foundation.

⚠️ Important: Building reliable protocols is complex! Before implementing your own, consider whether TCP meets your needs. Only implement custom reliability when you have very specific requirements.

Implementing reliability on top of UDP:

  1package main
  2
  3import (
  4    "encoding/binary"
  5    "fmt"
  6    "net"
  7    "sync"
  8    "time"
  9)
 10
 11// run
 12func main() {
 13    // Start reliable UDP server
 14    go startReliableUDPServer()
 15
 16    time.Sleep(100 * time.Millisecond)
 17
 18    // Test reliable client
 19    testReliableUDPClient()
 20}
 21
 22type ReliablePacket struct {
 23    SeqNum  uint32
 24    Ack     bool
 25    Payload []byte
 26}
 27
 28func encodePacket(pkt ReliablePacket) []byte {
 29    buf := make([]byte, 5+len(pkt.Payload))
 30    binary.BigEndian.PutUint32(buf[0:4], pkt.SeqNum)
 31    if pkt.Ack {
 32        buf[4] = 1
 33    } else {
 34        buf[4] = 0
 35    }
 36    copy(buf[5:], pkt.Payload)
 37    return buf
 38}
 39
 40func decodePacket(data []byte) ReliablePacket {
 41    if len(data) < 5 {
 42        return ReliablePacket{}
 43    }
 44
 45    return ReliablePacket{
 46        SeqNum:  binary.BigEndian.Uint32(data[0:4]),
 47        Ack:     data[4] == 1,
 48        Payload: data[5:],
 49    }
 50}
 51
 52func startReliableUDPServer() {
 53    addr, _ := net.ResolveUDPAddr("udp", "localhost:5050")
 54    conn, err := net.ListenUDP("udp", addr)
 55    if err != nil {
 56        fmt.Println("Server error:", err)
 57        return
 58    }
 59    defer conn.Close()
 60
 61    fmt.Println("Reliable UDP server listening on localhost:5050")
 62
 63    buffer := make([]byte, 1024)
 64
 65    for i := 0; i < 3; i++ {
 66        n, clientAddr, err := conn.ReadFromUDP(buffer)
 67        if err != nil {
 68            continue
 69        }
 70
 71        pkt := decodePacket(buffer[:n])
 72        if pkt.Ack {
 73            continue // Skip ACK packets
 74        }
 75
 76        fmt.Printf("Received packet #%d: %s\n", pkt.SeqNum, string(pkt.Payload))
 77
 78        // Send ACK
 79        ack := ReliablePacket{SeqNum: pkt.SeqNum, Ack: true}
 80        conn.WriteToUDP(encodePacket(ack), clientAddr)
 81    }
 82}
 83
 84func testReliableUDPClient() {
 85    addr, _ := net.ResolveUDPAddr("udp", "localhost:5050")
 86    conn, err := net.DialUDP("udp", nil, addr)
 87    if err != nil {
 88        fmt.Println("Client error:", err)
 89        return
 90    }
 91    defer conn.Close()
 92
 93    sender := &ReliableUDPSender{
 94        conn:       conn,
 95        timeout:    500 * time.Millisecond,
 96        maxRetries: 3,
 97    }
 98
 99    messages := []string{"Packet 1", "Packet 2", "Packet 3"}
100
101    for i, msg := range messages {
102        if err := sender.Send(uint32(i+1), []byte(msg)); err != nil {
103            fmt.Printf("Failed to send: %v\n", err)
104        } else {
105            fmt.Printf("Successfully sent and acknowledged: %s\n", msg)
106        }
107    }
108}
109
110type ReliableUDPSender struct {
111    conn       *net.UDPConn
112    timeout    time.Duration
113    maxRetries int
114    mu         sync.Mutex
115}
116
117func (s *ReliableUDPSender) Send(seqNum uint32, payload []byte) error {
118    s.mu.Lock()
119    defer s.mu.Unlock()
120
121    pkt := ReliablePacket{SeqNum: seqNum, Ack: false, Payload: payload}
122    data := encodePacket(pkt)
123
124    for attempt := 0; attempt < s.maxRetries; attempt++ {
125        // Send packet
126        _, err := s.conn.Write(data)
127        if err != nil {
128            return err
129        }
130
131        // Wait for ACK
132        s.conn.SetReadDeadline(time.Now().Add(s.timeout))
133        buffer := make([]byte, 1024)
134        n, err := s.conn.Read(buffer)
135
136        if err == nil {
137            ack := decodePacket(buffer[:n])
138            if ack.Ack && ack.SeqNum == seqNum {
139                return nil // Success
140            }
141        }
142
143        fmt.Printf("Retry attempt %d for packet #%d\n", attempt+1, seqNum)
144    }
145
146    return fmt.Errorf("failed to receive ACK after %d attempts", s.maxRetries)
147}

UDP Packet Loss Handling

UDP's unreliability means packets can be lost, duplicated, or reordered. Production UDP applications need strategies to handle these issues.

Common Strategies:

  • Sequence Numbers: Add sequence numbers to detect loss and reordering
  • Acknowledgments: Receivers confirm receipt, sender retransmits if no ACK
  • Timeout and Retry: Retransmit after timeout
  • Redundancy: Send critical data multiple times
  • Forward Error Correction: Add redundant data for reconstruction

When to Use Each Strategy: Low-latency applications (gaming) prefer redundancy over retransmission. Critical data transfer uses ACKs and retransmission. Streaming media uses forward error correction to avoid retransmission latency.
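
As a sketch of the sequence-number strategy, the receiver below tracks the next expected number and flags any gap. The reportGaps helper is illustrative; a real protocol would also buffer out-of-order packets and request retransmission:

```go
package main

import "fmt"

// reportGaps scans packets in arrival order and returns sequence
// numbers that never showed up before a higher number arrived.
func reportGaps(arrivals []uint32) []uint32 {
	var missing []uint32
	next := uint32(1) // next sequence number we expect
	for _, seq := range arrivals {
		for next < seq {
			missing = append(missing, next) // gap: lost or reordered
			next++
		}
		if seq >= next {
			next = seq + 1
		}
	}
	return missing
}

func main() {
	// Simulated arrival order where packet 3 was dropped in transit.
	missing := reportGaps([]uint32{1, 2, 4, 5, 6})
	fmt.Println("missing packets:", missing)
}
```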

UDP Socket Options

UDP sockets support various options for controlling behavior. Understanding these options helps you tune performance and reliability.

Buffer Sizes: OS buffers hold received packets. Too small and packets drop during burst traffic; too large and you waste memory. Default is usually 64KB-256KB.

Broadcast and Multicast: UDP supports sending to multiple recipients simultaneously—impossible with TCP. Broadcast reaches all hosts on local network; multicast reaches hosts that join specific groups.

IPv6 Considerations: IPv6 eliminated broadcast entirely; multicast takes its place. IPv6 multicast addresses encode a scope: link-local reaches the local network segment, site-local reaches the site, organization-local reaches the organization, and global reaches the entire internet.

DNS Lookups and Custom Resolvers

The Internet's Phone Book: DNS (Domain Name System) translates human-readable domain names into IP addresses. When you type golang.org into your browser, DNS resolution happens behind the scenes, converting that name into an IP address like 142.250.191.78.

Beyond Simple Lookups: DNS provides more than just A/AAAA records (IPv4/IPv6 addresses). MX records direct email, TXT records store arbitrary data (often used for verification and configuration), SRV records enable service discovery, and CNAME records create aliases.

Performance Implications: DNS lookups can be slow—often 20-100ms for an uncached query. For high-performance applications, DNS caching is essential. Many network failures are actually DNS failures: timeouts, NXDOMAIN errors, or incorrect responses.

Go's net package provides comprehensive DNS resolution capabilities that go far beyond simple hostname lookups.

Basic DNS Lookups

DNS Resolution Process: When you call net.LookupIP, Go follows a complex process: check /etc/hosts, query local DNS resolver, resolver may cache or forward to authoritative servers, response travels back through the chain. This can involve multiple servers and take 20-200ms.

DNS Caching Strategies: Go's default resolver uses the system's DNS settings, including its cache. For high-performance applications, implement application-level caching with TTL respect. Balance cache freshness against lookup cost—typical TTLs range from 60 seconds to 24 hours.

DNS Security Considerations: DNS is vulnerable to spoofing and cache poisoning. DNSSEC provides cryptographic verification but isn't universally deployed. For critical applications, validate DNS responses against expected values and use HTTPS to establish secure connections even if DNS is compromised.

Let's start with the fundamentals. DNS can do much more than just look up IP addresses—it can find mail servers, text records, and even help with service discovery.

 1package main
 2
 3import (
 4    "fmt"
 5    "net"
 6    "time"
 7)
 8
 9// run
10func main() {
11    // Simple hostname lookup
12    ips, err := net.LookupIP("golang.org")
13    if err != nil {
14        fmt.Println("Lookup error:", err)
15        return
16    }
17
18    fmt.Println("golang.org resolves to:")
19    for _, ip := range ips {
20        fmt.Printf("  %s\n", ip)
21    }
22
23    // Reverse DNS lookup
24    names, err := net.LookupAddr("8.8.8.8")
25    if err != nil {
26        fmt.Println("Reverse lookup error:", err)
27    } else {
28        fmt.Println("\n8.8.8.8 reverse lookup:")
29        for _, name := range names {
30            fmt.Printf("  %s\n", name)
31        }
32    }
33
34    // MX records
35    mxRecords, err := net.LookupMX("gmail.com")
36    if err != nil {
37        fmt.Println("MX lookup error:", err)
38    } else {
39        fmt.Println("\nMX records for gmail.com:")
40        for _, mx := range mxRecords {
 41            fmt.Printf("  %s (preference %d)\n", mx.Host, mx.Pref)
42        }
43    }
44
45    // TXT records
46    txtRecords, err := net.LookupTXT("google.com")
47    if err != nil {
48        fmt.Println("TXT lookup error:", err)
49    } else {
50        fmt.Println("\nTXT records for google.com:")
51        for _, txt := range txtRecords {
52            fmt.Printf("  %s\n", txt)
53        }
54    }
55}

DNS Lookup with Timeout

Network operations can hang indefinitely if you're not careful. Think of this as setting a deadline for a phone call—if the other party doesn't answer within a reasonable time, you hang up and try again or move on. DNS timeouts are crucial for building responsive applications.

 1package main
 2
 3import (
 4    "context"
 5    "fmt"
 6    "net"
 7    "time"
 8)
 9
10// run
11func main() {
12    // Create resolver with timeout
13    resolver := &net.Resolver{
14        PreferGo: true,
15    Dial: func(ctx context.Context, network, address string) (net.Conn, error) {
16            d := net.Dialer{
17                Timeout: time.Second * 2,
18            }
19            return d.DialContext(ctx, network, address)
20        },
21    }
22
23    // Lookup with context timeout
24    ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
25    defer cancel()
26
27    start := time.Now()
28    ips, err := resolver.LookupIP(ctx, "ip", "golang.org")
29    elapsed := time.Since(start)
30
31    if err != nil {
32        fmt.Printf("Lookup error: %v\n", err)
33        return
34    }
35
36    fmt.Printf("Lookup completed in %s\n", elapsed)
37    fmt.Println("Results:")
38    for _, ip := range ips {
39        fmt.Printf("  %s\n", ip)
40    }
41}

Custom DNS Resolver

DNS lookups can be slow, especially when you're repeatedly looking up the same domains. This is like keeping a phone book on your desk instead of looking up numbers online every time. A caching resolver dramatically improves performance by storing recent lookups.

Real-world Example

Content Delivery Networks like Cloudflare heavily optimize DNS caching. When you visit a popular website, the DNS response might be cached at multiple levels, making subsequent visits nearly instant.

Implementing a custom DNS resolver with caching:

  1package main
  2
  3import (
  4    "context"
  5    "fmt"
  6    "net"
  7    "sync"
  8    "time"
  9)
 10
 11// run
 12func main() {
 13    resolver := NewCachingResolver(5 * time.Minute)
 14
 15    // First lookup
 16    start := time.Now()
 17    ips1, err := resolver.LookupIP(context.Background(), "golang.org")
 18    elapsed1 := time.Since(start)
 19
 20    if err != nil {
 21        fmt.Println("Error:", err)
 22        return
 23    }
 24
 25    fmt.Printf("First lookup took %s\n", elapsed1)
 26    fmt.Println("IPs:", ips1)
 27
 28    // Second lookup
 29    start = time.Now()
 30    ips2, err := resolver.LookupIP(context.Background(), "golang.org")
 31    elapsed2 := time.Since(start)
 32
 33    fmt.Printf("\nSecond lookup took %s\n", elapsed2)
 34    fmt.Println("IPs:", ips2)
 35
 36    // Show cache stats
 37    fmt.Printf("\nCache Stats: %d entries\n", resolver.CacheSize())
 38}
 39
 40type CachingResolver struct {
 41    cache      map[string]*cacheEntry
 42    cacheMu    sync.RWMutex
 43    ttl        time.Duration
 44    resolver   *net.Resolver
 45}
 46
 47type cacheEntry struct {
 48    ips       []net.IP
 49    timestamp time.Time
 50}
 51
 52func NewCachingResolver(ttl time.Duration) *CachingResolver {
 53    return &CachingResolver{
 54        cache: make(map[string]*cacheEntry),
 55        ttl:   ttl,
 56        resolver: &net.Resolver{
 57            PreferGo: true,
 58        },
 59    }
 60}
 61
 62func (r *CachingResolver) LookupIP(ctx context.Context, host string) ([]net.IP, error) {
 63    // Check cache first
 64    r.cacheMu.RLock()
 65    if entry, exists := r.cache[host]; exists {
 66        if time.Since(entry.timestamp) < r.ttl {
 67            r.cacheMu.RUnlock()
 68            return entry.ips, nil
 69        }
 70    }
 71    r.cacheMu.RUnlock()
 72
 73    // Cache miss or expired - perform lookup
 74    ips, err := r.resolver.LookupIP(ctx, "ip", host)
 75    if err != nil {
 76        return nil, err
 77    }
 78
 79    // Update cache
 80    r.cacheMu.Lock()
 81    r.cache[host] = &cacheEntry{
 82        ips:       ips,
 83        timestamp: time.Now(),
 84    }
 85    r.cacheMu.Unlock()
 86
 87    return ips, nil
 88}
 89
 90func (r *CachingResolver) CacheSize() int {
 91    r.cacheMu.RLock()
 92    defer r.cacheMu.RUnlock()
 93    return len(r.cache)
 94}
 95
 96func (r *CachingResolver) ClearCache() {
 97    r.cacheMu.Lock()
 98    defer r.cacheMu.Unlock()
 99    r.cache = make(map[string]*cacheEntry)
100}

DNS-Based Service Discovery

In microservices architectures, services need to find each other dynamically. DNS SRV records are like automated service directories—they tell you not just where a service is, but also which instances are healthy and how to reach them.

Real-world Example

Kubernetes uses DNS for service discovery internally. When you have a service named "user-service" running on 3 pods, Kubernetes creates DNS records that automatically load balance across healthy pods. Other services can simply connect to "user-service.default.svc.cluster.local" without worrying about individual pod IPs.

Using DNS SRV records for service discovery:

 1package main
 2
 3import (
 4    "fmt"
 5    "net"
 6)
 7
 8// run
 9func main() {
10    // SRV lookup format: _service._proto.name
11    // Example: _xmpp-server._tcp.gmail.com
12
13    cname, addrs, err := net.LookupSRV("xmpp-server", "tcp", "gmail.com")
14    if err != nil {
15        fmt.Println("SRV lookup error:", err)
16        return
17    }
18
19    fmt.Printf("Canonical name: %s\n\n", cname)
20    fmt.Println("Service records:")
21
22    for _, addr := range addrs {
23        fmt.Printf("  Target: %s\n", addr.Target)
24        fmt.Printf("  Port: %d\n", addr.Port)
25        fmt.Printf("  Priority: %d\n", addr.Priority)
26        fmt.Printf("  Weight: %d\n\n", addr.Weight)
27    }
28
29    // Simulate service discovery
30    if len(addrs) > 0 {
31        serviceAddr := fmt.Sprintf("%s:%d", addrs[0].Target, addrs[0].Port)
32        fmt.Printf("Connecting to service at: %s\n", serviceAddr)
33    }
34}

Connection Pooling and Timeouts

The Hidden Cost of Connections: Creating a new TCP connection involves DNS lookup, TCP handshake, and possibly TLS negotiation. This can take 100-300ms—an eternity for a modern application. Connection pooling reuses existing connections, reducing latency from 100ms to 1ms.

Timeout Strategies: Timeouts are your first line of defense against hanging applications. Without timeouts, a single slow service can cascade into complete system failure. Go provides multiple timeout mechanisms: dial timeouts, read/write deadlines, and context cancellation.

Connection Health: Not all idle connections are healthy. Network equipment might silently drop connections, remote servers might crash, and middleboxes might interfere. Production connection pools need health checks, automatic reconnection, and graceful degradation.

Now let's dive into production patterns that separate hobby projects from robust, scalable applications. Connection management is like managing a fleet of delivery trucks—you want to reuse them efficiently, maintain them properly, and replace them when they break down.

Efficient connection management is crucial for production applications.

Connection Pool Implementation

Pool Sizing: How many connections should you pool? Too few and you'll bottleneck; too many and you waste resources. A good starting point is 2-10 connections per target host, scaled based on your request rate and latency. Monitor pool utilization and adjust accordingly.

Connection Validation: Pooled connections can become stale—the remote end might close them, or network issues might make them unusable. Implement validation: try to use the connection, catch errors, and create a fresh connection if validation fails.

Pool Monitoring: Production pools need metrics: active connections, idle connections, wait time, creation rate, error rate. These metrics reveal bottlenecks (too small), waste (too large), and connection problems (high error rate).

Creating new connections is expensive—think of it like hiring and training new employees versus reusing experienced ones. A connection pool maintains a set of ready-to-use connections, dramatically improving performance for applications that make frequent network requests.

  1package main
  2
  3import (
  4
  5    "errors"
  6    "fmt"
  7    "net"
  8    "sync"
  9    "time"
 10)
 11
 12// run
 13func main() {
 14    // Start a test server
 15    go startTestServer()
 16    time.Sleep(100 * time.Millisecond)
 17
 18    // Create connection pool
 19    pool := NewConnPool("localhost:8888", 3, 10*time.Second)
 20
 21    // Use connections
 22    var wg sync.WaitGroup
 23    for i := 0; i < 5; i++ {
 24        wg.Add(1)
 25        go func(id int) {
 26            defer wg.Done()
 27
 28            conn, err := pool.Get()
 29            if err != nil {
 30                fmt.Printf("Worker %d: Failed to get connection: %v\n", id, err)
 31                return
 32            }
 33
 34            fmt.Printf("Worker %d: Got connection\n", id)
 35
 36            // Use connection
 37            time.Sleep(100 * time.Millisecond)
 38
 39            // Return to pool
 40            pool.Put(conn)
 41            fmt.Printf("Worker %d: Returned connection\n", id)
 42        }(i)
 43    }
 44
 45    wg.Wait()
 46
 47    // Show stats
 48    fmt.Printf("\nPool stats: %d active, %d idle\n",
 49        pool.ActiveCount(), pool.IdleCount())
 50
 51    pool.Close()
 52}
 53
 54type ConnPool struct {
 55    addr         string
 56    maxSize      int
 57    idleTimeout  time.Duration
 58    connections  chan *PooledConn
 59    active       int
 60    mu           sync.Mutex
 61    closed       bool
 62}
 63
 64type PooledConn struct {
 65    net.Conn
 66    pool      *ConnPool
 67    createdAt time.Time
 68    lastUsed  time.Time
 69}
 70
 71func NewConnPool(addr string, maxSize int, idleTimeout time.Duration) *ConnPool {
 72    pool := &ConnPool{
 73        addr:        addr,
 74        maxSize:     maxSize,
 75        idleTimeout: idleTimeout,
 76        connections: make(chan *PooledConn, maxSize),
 77    }
 78
 79    // Start connection reaper
 80    go pool.reaper()
 81
 82    return pool
 83}
 84
 85func (p *ConnPool) Get() (*PooledConn, error) {
 86    p.mu.Lock()
 87    if p.closed {
 88        p.mu.Unlock()
 89        return nil, errors.New("pool is closed")
 90    }
 91    p.mu.Unlock()
 92
 93    // Try to get from pool
 94    select {
 95    case conn := <-p.connections:
 96        // Check if connection is still valid
 97        if time.Since(conn.lastUsed) > p.idleTimeout {
 98            conn.Close(); p.mu.Lock(); p.active--; p.mu.Unlock() // stale conn no longer counts
 99            return p.createConnection()
100        }
101        conn.lastUsed = time.Now()
102        return conn, nil
103    default:
104        return p.createConnection()
105    }
106}
107
108func (p *ConnPool) Put(conn *PooledConn) {
109    if conn == nil {
110        return
111    }
112
113    conn.lastUsed = time.Now()
114
115    select {
116    case p.connections <- conn:
117        // Successfully returned to pool
118    default:
119        // Pool is full, close connection
120        conn.Conn.Close()
121        p.mu.Lock()
122        p.active--
123        p.mu.Unlock()
124    }
125}
126
127func (p *ConnPool) createConnection() (*PooledConn, error) {
128    p.mu.Lock()
129    if p.active >= p.maxSize {
130        p.mu.Unlock()
131        return nil, errors.New("pool is at maximum capacity")
132    }
133    p.active++
134    p.mu.Unlock()
135
136    conn, err := net.DialTimeout("tcp", p.addr, 5*time.Second)
137    if err != nil {
138        p.mu.Lock()
139        p.active--
140        p.mu.Unlock()
141        return nil, err
142    }
143
144    now := time.Now()
145    return &PooledConn{
146        Conn:      conn,
147        pool:      p,
148        createdAt: now,
149        lastUsed:  now,
150    }, nil
151}
152
153func (p *ConnPool) reaper() {
154    ticker := time.NewTicker(30 * time.Second)
155    defer ticker.Stop()
156
157    for range ticker.C {
158        p.mu.Lock()
159        if p.closed {
160            p.mu.Unlock()
161            return
162        }
163        p.mu.Unlock()
164
165        // Check idle connections; stop scanning once a fresh one is found
166        idle: for {
167            select {
168            case conn := <-p.connections:
169                if time.Since(conn.lastUsed) > p.idleTimeout {
170                    conn.Close()
171                    p.mu.Lock()
172                    p.active--
173                    p.mu.Unlock()
174                } else {
175                    // Still fresh: put it back
176                    p.connections <- conn
177                    break idle
178                }
179            default:
180                break idle
181            }
182        }
183    }
184}
185
186func (p *ConnPool) ActiveCount() int {
187    p.mu.Lock()
188    defer p.mu.Unlock()
189    return p.active
190}
191
192func (p *ConnPool) IdleCount() int {
193    return len(p.connections)
194}
195
196func (p *ConnPool) Close() {
197    p.mu.Lock()
198    p.closed = true
199    p.mu.Unlock()
200
201    close(p.connections)
202    for conn := range p.connections {
203        conn.Close()
204    }
205}
206
207func startTestServer() {
208    listener, _ := net.Listen("tcp", "localhost:8888")
209    defer listener.Close()
210
211    for {
212        conn, err := listener.Accept()
213        if err != nil {
214            return
215        }
216        go func(c net.Conn) {
217            time.Sleep(500 * time.Millisecond)
218            c.Close()
219        }(conn)
220    }
221}

Advanced Connection Management Example

Here's a comprehensive example demonstrating production-ready connection management with all the best practices we've discussed:

package main

import (
	"context"
	"fmt"
	"net"
	"sync"
	"sync/atomic"
	"time"
)

// run
func main() {
	// Create production-grade connection manager
	manager := NewConnectionManager(ConnectionConfig{
		MaxConnections:    100,
		MaxIdleTime:       5 * time.Minute,
		HealthCheckPeriod: 30 * time.Second,
		ConnectTimeout:    10 * time.Second,
	})

	// Simulate using connections
	ctx := context.Background()
	for i := 0; i < 5; i++ {
		conn, err := manager.Get(ctx, "localhost:8080")
		if err != nil {
			fmt.Printf("Failed to get connection: %v\n", err)
			continue
		}

		// Use connection
		fmt.Printf("Got connection: %v\n", conn.RemoteAddr())

		// Return to pool
		manager.Put(conn)
	}

	// Show statistics
	stats := manager.Stats()
	fmt.Printf("\nConnection Stats:\n")
	fmt.Printf("  Total: %d\n", stats.Total)
	fmt.Printf("  Active: %d\n", stats.Active)
	fmt.Printf("  Idle: %d\n", stats.Idle)

	manager.Close()
}

// ConnectionConfig holds configuration for connection management
type ConnectionConfig struct {
	MaxConnections    int           // Maximum total connections
	MaxIdleTime       time.Duration // How long idle connections are kept
	HealthCheckPeriod time.Duration // How often to check connection health
	ConnectTimeout    time.Duration // Timeout for creating new connections
}

// ConnectionManager manages a pool of network connections
type ConnectionManager struct {
	config    ConnectionConfig
	mu        sync.RWMutex
	pools     map[string]*connPool // Pool per destination
	totalConn atomic.Int64         // Total connections across all pools
	closed    atomic.Bool          // Whether manager is closed
}

// connPool manages connections to a single destination
type connPool struct {
	addr   string
	conns  chan *pooledConn
	active atomic.Int32
}

// pooledConn wraps a connection with metadata
type pooledConn struct {
	net.Conn
	pool      *connPool
	createdAt time.Time
	lastUsed  time.Time
	useCount  int
}

// ConnectionStats provides statistics about the connection manager
type ConnectionStats struct {
	Total  int64 // Total connections managed
	Active int64 // Currently active connections
	Idle   int64 // Idle connections in pools
}

func NewConnectionManager(config ConnectionConfig) *ConnectionManager {
	cm := &ConnectionManager{
		config: config,
		pools:  make(map[string]*connPool),
	}

	// Start background health checker
	go cm.healthChecker()

	return cm
}

// Get retrieves a connection from the pool, creating one if necessary
func (cm *ConnectionManager) Get(ctx context.Context, addr string) (*pooledConn, error) {
	if cm.closed.Load() {
		return nil, fmt.Errorf("connection manager is closed")
	}

	// Get or create pool for this destination
	pool := cm.getOrCreatePool(addr)

	// Try to get existing connection
	select {
	case conn := <-pool.conns:
		// Validate connection is still good
		if time.Since(conn.lastUsed) < cm.config.MaxIdleTime {
			conn.lastUsed = time.Now()
			conn.useCount++
			pool.active.Add(1)
			return conn, nil
		}
		// Connection too old; close it and fall through to dial
		conn.Close()
		cm.totalConn.Add(-1)
	case <-ctx.Done():
		return nil, ctx.Err()
	default:
		// No idle connection available
	}

	// Create new connection
	return cm.createConnection(ctx, addr, pool)
}

// Put returns a connection to the pool
func (cm *ConnectionManager) Put(conn *pooledConn) {
	if conn == nil {
		return
	}

	conn.pool.active.Add(-1)

	// Try to return to pool
	select {
	case conn.pool.conns <- conn:
		// Successfully returned to pool
	default:
		// Pool is full, close connection
		conn.Close()
		cm.totalConn.Add(-1)
	}
}

// getOrCreatePool gets an existing pool or creates a new one
func (cm *ConnectionManager) getOrCreatePool(addr string) *connPool {
	cm.mu.RLock()
	pool, exists := cm.pools[addr]
	cm.mu.RUnlock()

	if exists {
		return pool
	}

	// Need to create pool
	cm.mu.Lock()
	defer cm.mu.Unlock()

	// Check again in case another goroutine created it
	if pool, exists := cm.pools[addr]; exists {
		return pool
	}

	// Create new pool; cap the idle buffer at a fraction of the limit
	pool = &connPool{
		addr:  addr,
		conns: make(chan *pooledConn, cm.config.MaxConnections/10),
	}
	cm.pools[addr] = pool

	return pool
}

// createConnection creates a new connection
func (cm *ConnectionManager) createConnection(ctx context.Context, addr string, pool *connPool) (*pooledConn, error) {
	// Check connection limit
	if cm.totalConn.Load() >= int64(cm.config.MaxConnections) {
		return nil, fmt.Errorf("connection limit reached")
	}

	// Create connection with timeout
	dialer := &net.Dialer{
		Timeout: cm.config.ConnectTimeout,
	}

	conn, err := dialer.DialContext(ctx, "tcp", addr)
	if err != nil {
		return nil, err
	}

	// Wrap in pooled connection
	pc := &pooledConn{
		Conn:      conn,
		pool:      pool,
		createdAt: time.Now(),
		lastUsed:  time.Now(),
		useCount:  1,
	}

	cm.totalConn.Add(1)
	pool.active.Add(1)

	return pc, nil
}

// healthChecker periodically checks connection health
func (cm *ConnectionManager) healthChecker() {
	ticker := time.NewTicker(cm.config.HealthCheckPeriod)
	defer ticker.Stop()

	for range ticker.C {
		if cm.closed.Load() {
			return
		}

		cm.checkAllPools()
	}
}

// checkAllPools checks health of all connections in all pools
func (cm *ConnectionManager) checkAllPools() {
	cm.mu.RLock()
	pools := make([]*connPool, 0, len(cm.pools))
	for _, pool := range cm.pools {
		pools = append(pools, pool)
	}
	cm.mu.RUnlock()

	for _, pool := range pools {
		cm.checkPool(pool)
	}
}

// checkPool checks health of connections in a single pool
func (cm *ConnectionManager) checkPool(pool *connPool) {
	// Drain pool temporarily
	conns := make([]*pooledConn, 0)

drain:
	for {
		select {
		case conn := <-pool.conns:
			conns = append(conns, conn)
		default:
			break drain
		}
	}

	// Check each connection and return healthy ones
	for _, conn := range conns {
		if time.Since(conn.lastUsed) > cm.config.MaxIdleTime {
			// Connection too old
			conn.Close()
			cm.totalConn.Add(-1)
			continue
		}

		// Return healthy connection
		select {
		case pool.conns <- conn:
		default:
			// Pool full (shouldn't happen but be safe)
			conn.Close()
			cm.totalConn.Add(-1)
		}
	}
}

// Stats returns connection statistics
func (cm *ConnectionManager) Stats() ConnectionStats {
	cm.mu.RLock()
	defer cm.mu.RUnlock()

	var idle int64
	for _, pool := range cm.pools {
		idle += int64(len(pool.conns))
	}

	total := cm.totalConn.Load()
	return ConnectionStats{
		Total:  total,
		Active: total - idle,
		Idle:   idle,
	}
}

// Close closes the connection manager and all connections.
// Callers must stop using the manager first: a Put after Close
// would send on a closed channel and panic.
func (cm *ConnectionManager) Close() error {
	if !cm.closed.CompareAndSwap(false, true) {
		return fmt.Errorf("already closed")
	}

	cm.mu.Lock()
	defer cm.mu.Unlock()

	// Close all pools, draining any idle connections
	for _, pool := range cm.pools {
		close(pool.conns)
		for conn := range pool.conns {
			conn.Close()
		}
	}

	cm.pools = make(map[string]*connPool)
	cm.totalConn.Store(0)

	return nil
}

This example demonstrates:

  • Connection Pooling: Reuse connections efficiently
  • Health Checking: Background checking and cleanup
  • Resource Limits: Enforce maximum connections
  • Context Support: Cancellation and timeouts
  • Thread Safety: Safe concurrent access
  • Statistics: Monitor pool usage

Pool Sizing: How many connections should you pool? Too few and you'll bottleneck; too many and you waste resources. A good starting point is 2-10 connections per target host, scaled based on your request rate and latency. Monitor pool utilization and adjust accordingly.
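
A rough way to turn request rate and latency into a pool size is Little's law: average in-flight requests ≈ request rate × average latency. The sketch below shows the arithmetic; the `poolSize` helper, the 2x headroom factor, and the floor of 2 are assumptions for illustration, not a standard formula.

```go
package main

import "fmt"

// poolSize estimates how many pooled connections a host needs using
// Little's law: concurrency ≈ request rate × average latency.
// The 2x headroom factor is a starting point, not a universal rule.
func poolSize(requestsPerSec, avgLatencySec float64) int {
	concurrent := requestsPerSec * avgLatencySec
	size := int(concurrent*2 + 0.5) // 2x headroom for bursts
	if size < 2 {
		size = 2 // small floor so occasional bursts don't always dial
	}
	return size
}

func main() {
	// 200 req/s at 50ms average latency → ~10 in flight → pool of ~20
	fmt.Println(poolSize(200, 0.05)) // prints 20
}
```

Treat the result as a starting point and adjust from the utilization metrics discussed above.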

Connection Validation: Pooled connections can become stale—the remote end might close them, or network issues might make them unusable. Implement validation: try to use the connection, catch errors, and create a fresh connection if validation fails.

Pool Monitoring: Production pools need metrics: active connections, idle connections, wait time, creation rate, error rate. These metrics reveal bottlenecks (too small), waste (too large), and connection problems (high error rate).

Creating new connections is expensive—think of it like hiring and training new employees versus reusing experienced ones. A connection pool maintains a set of ready-to-use connections, dramatically improving performance for applications that make frequent network requests.

package main

import (
	"errors"
	"fmt"
	"net"
	"sync"
	"time"
)

// run
func main() {
	// Start a test server
	go startTestServer()
	time.Sleep(100 * time.Millisecond)

	// Create connection pool
	pool := NewConnPool("localhost:8888", 3, 10*time.Second)

	// Use connections
	var wg sync.WaitGroup
	for i := 0; i < 5; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()

			conn, err := pool.Get()
			if err != nil {
				fmt.Printf("Worker %d: Failed to get connection: %v\n", id, err)
				return
			}

			fmt.Printf("Worker %d: Got connection\n", id)

			// Use connection
			time.Sleep(100 * time.Millisecond)

			// Return to pool
			pool.Put(conn)
			fmt.Printf("Worker %d: Returned connection\n", id)
		}(i)
	}

	wg.Wait()

	// Show stats
	fmt.Printf("\nPool stats: %d active, %d idle\n",
		pool.ActiveCount(), pool.IdleCount())

	pool.Close()
}

type ConnPool struct {
	addr        string
	maxSize     int
	idleTimeout time.Duration
	connections chan *PooledConn
	active      int
	mu          sync.Mutex
	closed      bool
}

type PooledConn struct {
	net.Conn
	pool      *ConnPool
	createdAt time.Time
	lastUsed  time.Time
}

func NewConnPool(addr string, maxSize int, idleTimeout time.Duration) *ConnPool {
	pool := &ConnPool{
		addr:        addr,
		maxSize:     maxSize,
		idleTimeout: idleTimeout,
		connections: make(chan *PooledConn, maxSize),
	}

	// Start connection reaper
	go pool.reaper()

	return pool
}

func (p *ConnPool) Get() (*PooledConn, error) {
	p.mu.Lock()
	if p.closed {
		p.mu.Unlock()
		return nil, errors.New("pool is closed")
	}
	p.mu.Unlock()

	// Try to get from pool
	select {
	case conn := <-p.connections:
		// Check if connection is still valid
		if time.Since(conn.lastUsed) > p.idleTimeout {
			conn.Close()
			p.mu.Lock()
			p.active--
			p.mu.Unlock()
			return p.createConnection()
		}
		conn.lastUsed = time.Now()
		return conn, nil
	default:
		return p.createConnection()
	}
}

func (p *ConnPool) Put(conn *PooledConn) {
	if conn == nil {
		return
	}

	conn.lastUsed = time.Now()

	select {
	case p.connections <- conn:
		// Successfully returned to pool
	default:
		// Pool is full, close connection
		conn.Conn.Close()
		p.mu.Lock()
		p.active--
		p.mu.Unlock()
	}
}

func (p *ConnPool) createConnection() (*PooledConn, error) {
	p.mu.Lock()
	if p.active >= p.maxSize {
		p.mu.Unlock()
		return nil, errors.New("pool is at maximum capacity")
	}
	p.active++
	p.mu.Unlock()

	conn, err := net.DialTimeout("tcp", p.addr, 5*time.Second)
	if err != nil {
		p.mu.Lock()
		p.active--
		p.mu.Unlock()
		return nil, err
	}

	now := time.Now()
	return &PooledConn{
		Conn:      conn,
		pool:      p,
		createdAt: now,
		lastUsed:  now,
	}, nil
}

func (p *ConnPool) reaper() {
	ticker := time.NewTicker(30 * time.Second)
	defer ticker.Stop()

	for range ticker.C {
		p.mu.Lock()
		closed := p.closed
		p.mu.Unlock()
		if closed {
			return
		}

		// Check idle connections. The channel is FIFO, so once we hit
		// a fresh connection the remaining ones were returned even
		// more recently; stop scanning for this cycle.
	scan:
		for {
			select {
			case conn := <-p.connections:
				if time.Since(conn.lastUsed) > p.idleTimeout {
					conn.Close()
					p.mu.Lock()
					p.active--
					p.mu.Unlock()
				} else {
					// Put it back and stop scanning this cycle
					p.connections <- conn
					break scan
				}
			default:
				break scan
			}
		}
	}
}

func (p *ConnPool) ActiveCount() int {
	p.mu.Lock()
	defer p.mu.Unlock()
	return p.active
}

func (p *ConnPool) IdleCount() int {
	return len(p.connections)
}

func (p *ConnPool) Close() {
	p.mu.Lock()
	p.closed = true
	p.mu.Unlock()

	// Drain and close remaining idle connections. Callers must stop
	// calling Put first; sending on the closed channel panics.
	close(p.connections)
	for conn := range p.connections {
		conn.Close()
	}
}

func startTestServer() {
	listener, _ := net.Listen("tcp", "localhost:8888")
	defer listener.Close()

	for {
		conn, err := listener.Accept()
		if err != nil {
			return
		}
		go func(c net.Conn) {
			time.Sleep(500 * time.Millisecond)
			c.Close()
		}(conn)
	}
}

Advanced Timeout Patterns

⚠️ Critical: Always use timeouts in production! Without them, your application can hang indefinitely waiting for network responses, leading to cascading failures.

Timeouts are like setting deadlines for tasks—they prevent your application from waiting forever for something that might never happen. Let's explore different timeout strategies for various scenarios.

package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// run
func main() {
	// Demonstrate different timeout patterns
	demonstrateDialTimeout()
	demonstrateReadWriteTimeout()
	demonstrateContextTimeout()
}

func demonstrateDialTimeout() {
	fmt.Println("=== Dial Timeout ===")

	start := time.Now()
	conn, err := net.DialTimeout("tcp", "192.0.2.1:80", 2*time.Second)
	elapsed := time.Since(start)

	if err != nil {
		fmt.Printf("Dial failed after %s: %v\n", elapsed, err)
	} else {
		fmt.Println("Connected")
		conn.Close()
	}
}

func demonstrateReadWriteTimeout() {
	fmt.Println("\n=== Read/Write Timeout ===")

	// Simulated connection
	server, client := net.Pipe()
	defer server.Close()
	defer client.Close()

	// Set read deadline
	client.SetReadDeadline(time.Now().Add(1 * time.Second))

	go func() {
		time.Sleep(2 * time.Second)
		server.Write([]byte("Too late!"))
	}()

	buffer := make([]byte, 1024)
	start := time.Now()
	_, err := client.Read(buffer)
	elapsed := time.Since(start)

	if err != nil {
		fmt.Printf("Read timeout after %s: %v\n", elapsed, err)
	}
}

func demonstrateContextTimeout() {
	fmt.Println("\n=== Context Timeout ===")

	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()

	start := time.Now()
	conn, err := dialWithContext(ctx, "tcp", "192.0.2.1:80")
	elapsed := time.Since(start)

	if err != nil {
		fmt.Printf("Context dial failed after %s: %v\n", elapsed, err)
	} else {
		fmt.Println("Connected")
		conn.Close()
	}
}

func dialWithContext(ctx context.Context, network, addr string) (net.Conn, error) {
	var d net.Dialer
	return d.DialContext(ctx, network, addr)
}

Keep-Alive Configuration

TCP keep-alive is like periodically saying "Are you still there?" during a long phone conversation. It prevents silent connections from being dropped by intermediate routers and firewalls, which often close idle connections to save resources.

Common Pitfalls with Keep-Alive

  • Too frequent: Creates unnecessary network traffic
  • Too infrequent: Connections might be dropped by network equipment
  • Platform differences: Default behavior varies across operating systems

Real-world Example

Database connections use keep-alive extensively. A connection pool might maintain connections for hours, and keep-alive packets ensure these connections remain usable even when there's no active traffic.

package main

import (
	"fmt"
	"io"
	"net"
	"time"
)

// run
func main() {
	// Demonstrate TCP keep-alive configuration

	listener, err := net.Listen("tcp", "localhost:7777")
	if err != nil {
		fmt.Println("Listen error:", err)
		return
	}
	defer listener.Close()

	fmt.Println("Server listening with keep-alive configuration")

	go func() {
		conn, err := listener.Accept()
		if err != nil {
			return
		}
		defer conn.Close()

		// Configure keep-alive
		if tcpConn, ok := conn.(*net.TCPConn); ok {
			// Enable keep-alive
			tcpConn.SetKeepAlive(true)

			// Set keep-alive period
			tcpConn.SetKeepAlivePeriod(30 * time.Second)

			fmt.Println("Server: Keep-alive enabled")
		}

		// Keep connection open
		io.Copy(io.Discard, conn)
	}()

	// Client connection
	time.Sleep(100 * time.Millisecond)
	conn, err := net.Dial("tcp", "localhost:7777")
	if err != nil {
		fmt.Println("Dial error:", err)
		return
	}
	defer conn.Close()

	if tcpConn, ok := conn.(*net.TCPConn); ok {
		tcpConn.SetKeepAlive(true)
		tcpConn.SetKeepAlivePeriod(30 * time.Second)
		fmt.Println("Client: Keep-alive enabled")
	}

	time.Sleep(100 * time.Millisecond)
	fmt.Println("Connection established with keep-alive")
}

Custom Protocol Implementation

Why Custom Protocols: While HTTP covers many use cases, sometimes you need something specialized. Real-time gaming needs minimal overhead. IoT devices need energy-efficient protocols. Financial systems need precise message ordering and guaranteed delivery semantics.

Protocol Design Principles: Good protocols are versioned (for compatibility), framed (to detect message boundaries), efficient (minimal overhead), and debuggable (human-readable when possible). Binary protocols are fast but hard to debug; text protocols are slower but easier to troubleshoot.

Learning from the Masters: Study existing protocols: Redis uses a simple text protocol (RESP), WebSocket uses framed binary messages, HTTP/2 uses binary framing with multiplexing. Each design choice reflects specific requirements and trade-offs.

Sometimes existing protocols don't quite fit your needs. Creating a custom protocol is like designing your own language—you decide the vocabulary, grammar, and conversation rules.

Let's implement a complete custom protocol similar to HTTP but simpler.

Simple Request-Response Protocol

We'll create a simple text-based protocol that demonstrates the key concepts of protocol design. This protocol will have methods, paths, headers, and bodies—just like HTTP, but simplified for learning purposes.

package main

import (
	"bufio"
	"fmt"
	"io"
	"net"
	"strconv"
	"strings"
	"time"
)

// run
func main() {
	// Start protocol server
	go startProtocolServer()

	time.Sleep(100 * time.Millisecond)

	// Test client
	testProtocolClient()
}

// Protocol format:
// Request:  METHOD path\nHeader: value\n\n[body]
// Response: STATUS message\nHeader: value\n\n[body]

type Request struct {
	Method  string
	Path    string
	Headers map[string]string
	Body    []byte
}

type Response struct {
	Status  int
	Message string
	Headers map[string]string
	Body    []byte
}

func startProtocolServer() {
	listener, err := net.Listen("tcp", "localhost:4040")
	if err != nil {
		fmt.Println("Server error:", err)
		return
	}
	defer listener.Close()

	fmt.Println("Protocol server listening on localhost:4040")

	for i := 0; i < 3; i++ {
		conn, err := listener.Accept()
		if err != nil {
			continue
		}

		handleProtocolConnection(conn)
	}
}

func handleProtocolConnection(conn net.Conn) {
	defer conn.Close()

	req, err := readRequest(conn)
	if err != nil {
		fmt.Println("Read request error:", err)
		return
	}

	fmt.Printf("Received: %s %s\n", req.Method, req.Path)

	// Process request
	var resp Response

	switch req.Method {
	case "GET":
		resp = Response{
			Status:  200,
			Message: "OK",
			Headers: map[string]string{"Content-Type": "text/plain"},
			Body:    []byte("Hello from server!"),
		}
	case "POST":
		resp = Response{
			Status:  201,
			Message: "Created",
			Headers: map[string]string{"Content-Type": "text/plain"},
			Body:    []byte(fmt.Sprintf("Received %d bytes", len(req.Body))),
		}
	default:
		resp = Response{
			Status:  400,
			Message: "Bad Request",
			Headers: map[string]string{},
			Body:    []byte("Unknown method"),
		}
	}

	writeResponse(conn, resp)
}

func readRequest(conn net.Conn) (Request, error) {
	reader := bufio.NewReader(conn)

	// Read request line
	line, err := reader.ReadString('\n')
	if err != nil {
		return Request{}, err
	}

	parts := strings.Fields(line)
	if len(parts) != 2 {
		return Request{}, fmt.Errorf("invalid request line")
	}

	req := Request{
		Method:  parts[0],
		Path:    parts[1],
		Headers: make(map[string]string),
	}

	// Read headers
	for {
		line, err := reader.ReadString('\n')
		if err != nil {
			return Request{}, err
		}

		line = strings.TrimSpace(line)
		if line == "" {
			break // End of headers
		}

		parts := strings.SplitN(line, ":", 2)
		if len(parts) == 2 {
			req.Headers[strings.TrimSpace(parts[0])] = strings.TrimSpace(parts[1])
		}
	}

	// Read body if Content-Length is specified
	if lengthStr, ok := req.Headers["Content-Length"]; ok {
		length, err := strconv.Atoi(lengthStr)
		if err == nil && length > 0 {
			req.Body = make([]byte, length)
			if _, err := io.ReadFull(reader, req.Body); err != nil {
				return Request{}, err
			}
		}
	}

	return req, nil
}

func writeResponse(conn net.Conn, resp Response) error {
	writer := bufio.NewWriter(conn)

	// Write status line
	fmt.Fprintf(writer, "%d %s\n", resp.Status, resp.Message)

	// Write headers
	for key, value := range resp.Headers {
		fmt.Fprintf(writer, "%s: %s\n", key, value)
	}

	// Write Content-Length
	fmt.Fprintf(writer, "Content-Length: %d\n", len(resp.Body))

	// Empty line
	writer.WriteString("\n")

	// Write body
	writer.Write(resp.Body)

	return writer.Flush()
}

func testProtocolClient() {
	// Test GET request
	conn, err := net.Dial("tcp", "localhost:4040")
	if err != nil {
		fmt.Println("Client error:", err)
		return
	}

	req := Request{
		Method:  "GET",
		Path:    "/test",
		Headers: map[string]string{"User-Agent": "TestClient/1.0"},
	}

	writeRequest(conn, req)
	resp, _ := readResponse(conn)
	fmt.Printf("Response: %d %s\n", resp.Status, resp.Message)
	fmt.Printf("Body: %s\n", string(resp.Body))
	conn.Close()

	// Test POST request
	conn, _ = net.Dial("tcp", "localhost:4040")
	req = Request{
		Method:  "POST",
		Path:    "/data",
		Headers: map[string]string{"Content-Type": "text/plain"},
		Body:    []byte("Test data"),
	}

	writeRequest(conn, req)
	resp, _ = readResponse(conn)
	fmt.Printf("\nResponse: %d %s\n", resp.Status, resp.Message)
	fmt.Printf("Body: %s\n", string(resp.Body))
	conn.Close()
}

func writeRequest(conn net.Conn, req Request) error {
	writer := bufio.NewWriter(conn)

	fmt.Fprintf(writer, "%s %s\n", req.Method, req.Path)

	for key, value := range req.Headers {
		fmt.Fprintf(writer, "%s: %s\n", key, value)
	}

	if len(req.Body) > 0 {
		fmt.Fprintf(writer, "Content-Length: %d\n", len(req.Body))
	}

	writer.WriteString("\n")

	if len(req.Body) > 0 {
		writer.Write(req.Body)
	}

	return writer.Flush()
}

func readResponse(conn net.Conn) (Response, error) {
	reader := bufio.NewReader(conn)

	// Read status line
	line, err := reader.ReadString('\n')
	if err != nil {
		return Response{}, err
	}

	parts := strings.SplitN(strings.TrimSpace(line), " ", 2)
	if len(parts) != 2 {
		return Response{}, fmt.Errorf("invalid status line")
	}

	status, _ := strconv.Atoi(parts[0])

	resp := Response{
		Status:  status,
		Message: parts[1],
		Headers: make(map[string]string),
	}

	// Read headers
	for {
		line, err := reader.ReadString('\n')
		if err != nil {
			return Response{}, err
		}

		line = strings.TrimSpace(line)
		if line == "" {
			break
		}

		parts := strings.SplitN(line, ":", 2)
		if len(parts) == 2 {
			resp.Headers[strings.TrimSpace(parts[0])] = strings.TrimSpace(parts[1])
		}
	}

	// Read body
	if lengthStr, ok := resp.Headers["Content-Length"]; ok {
		length, err := strconv.Atoi(lengthStr)
		if err == nil && length > 0 {
			resp.Body = make([]byte, length)
			if _, err := io.ReadFull(reader, resp.Body); err != nil {
				return Response{}, err
			}
		}
	}

	return resp, nil
}

Protocol with Streaming Support

Some applications need to send continuous streams of data rather than discrete request/response pairs. Think of this as the difference between sending a letter and having a continuous phone conversation. Streaming protocols are perfect for real-time data, file transfers, and live updates.

Real-world Example

WebSocket connections use streaming protocols extensively. When you're in a Google Doc, changes stream in real-time to all collaborators. Each change is a small message in a continuous stream, rather than a series of separate HTTP requests.

package main

import (
	"bufio"
	"fmt"
	"io"
	"net"
	"time"
)

// run
func main() {
	// Start streaming server
	go startStreamingServer()

	time.Sleep(100 * time.Millisecond)

	// Test streaming client
	testStreamingClient()
}

// Stream protocol: Each message is length-prefixed
// [4 bytes: length][N bytes: data]

func startStreamingServer() {
	listener, err := net.Listen("tcp", "localhost:3030")
	if err != nil {
		fmt.Println("Server error:", err)
		return
	}
	defer listener.Close()

	fmt.Println("Streaming server listening on localhost:3030")

	conn, err := listener.Accept()
	if err != nil {
		return
	}
	defer conn.Close()

	// Stream data to client
	messages := []string{
		"First message",
		"Second message",
		"Third message",
		"Final message",
	}

	for _, msg := range messages {
		if err := writeMessage(conn, []byte(msg)); err != nil {
			fmt.Println("Write error:", err)
			return
		}
		fmt.Printf("Sent: %s\n", msg)
		time.Sleep(100 * time.Millisecond)
	}
}

func writeMessage(w io.Writer, data []byte) error {
	// Write length prefix
	length := uint32(len(data))
	lengthBytes := []byte{
		byte(length >> 24),
		byte(length >> 16),
		byte(length >> 8),
		byte(length),
	}

	if _, err := w.Write(lengthBytes); err != nil {
		return err
	}

	// Write data
	_, err := w.Write(data)
	return err
}

// maxMessageSize guards against a corrupt or malicious length prefix
// causing a huge allocation.
const maxMessageSize = 1 << 20 // 1 MiB

func readMessage(r io.Reader) ([]byte, error) {
	// Read length prefix
	lengthBytes := make([]byte, 4)
	if _, err := io.ReadFull(r, lengthBytes); err != nil {
		return nil, err
	}

	length := uint32(lengthBytes[0])<<24 |
		uint32(lengthBytes[1])<<16 |
		uint32(lengthBytes[2])<<8 |
		uint32(lengthBytes[3])

	if length > maxMessageSize {
		return nil, fmt.Errorf("message too large: %d bytes", length)
	}

	// Read data
	data := make([]byte, length)
	_, err := io.ReadFull(r, data)
	return data, err
}

func testStreamingClient() {
	conn, err := net.Dial("tcp", "localhost:3030")
	if err != nil {
		fmt.Println("Client error:", err)
		return
	}
	defer conn.Close()

	fmt.Println("Client connected, receiving stream...")

	for {
		msg, err := readMessage(conn)
		if err != nil {
			if err != io.EOF {
				fmt.Println("Read error:", err)
			}
			break
		}

		fmt.Printf("Received: %s\n", string(msg))
	}

	fmt.Println("Stream ended")
}

Production Patterns

Battle-Tested Patterns: Building production network services requires more than just functional code—you need patterns that handle failures gracefully, scale efficiently, and maintain reliability under load.

Graceful Degradation: When a dependency fails, good services degrade gracefully rather than failing completely. Circuit breakers prevent cascading failures, rate limiters protect against overload, and timeouts prevent hanging operations.

Observability: You can't fix what you can't see. Production services need metrics (requests per second, error rates, latency percentiles), logs (structured, correlated, searchable), and traces (to understand request flows through distributed systems).

Building a network application that works is one thing; building one that's reliable, maintainable, and production-ready is another. It's like running a restaurant: cooking good food isn't enough. You also have to handle rushes, deal with emergencies, and keep customers happy even when things go wrong.

Graceful Shutdown

When you need to update your application or shut it down for maintenance, you don't want to abruptly disconnect users. Graceful shutdown is like politely asking customers to finish their meals before closing—you stop accepting new connections but let existing ones complete their work.

package main

import (
    "context"
    "fmt"
    "net"
    "os"
    "os/signal"
    "sync"
    "syscall"
    "time"
)

// run
func main() {
    server := NewGracefulServer("localhost:2020")

    // Setup signal handling (in production you would block on <-sigChan)
    sigChan := make(chan os.Signal, 1)
    signal.Notify(sigChan, syscall.SIGINT, syscall.SIGTERM)

    // Start server
    go func() {
        if err := server.Start(); err != nil {
            fmt.Println("Server error:", err)
        }
    }()

    // Simulate some clients
    time.Sleep(100 * time.Millisecond)
    for i := 0; i < 3; i++ {
        go simulateLongRunningClient(i)
    }

    // For this demo, trigger shutdown after a delay instead of waiting for a signal
    time.Sleep(500 * time.Millisecond)
    fmt.Println("\nInitiating graceful shutdown...")

    // Shutdown
    ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
    defer cancel()

    if err := server.Shutdown(ctx); err != nil {
        fmt.Println("Shutdown error:", err)
    }

    fmt.Println("Server stopped gracefully")
}

type GracefulServer struct {
    addr     string
    listener net.Listener
    wg       sync.WaitGroup
    shutdown chan struct{}
    mu       sync.Mutex
}

func NewGracefulServer(addr string) *GracefulServer {
    return &GracefulServer{
        addr:     addr,
        shutdown: make(chan struct{}),
    }
}

func (s *GracefulServer) Start() error {
    var err error
    s.listener, err = net.Listen("tcp", s.addr)
    if err != nil {
        return err
    }

    fmt.Printf("Server listening on %s\n", s.addr)

    for {
        conn, err := s.listener.Accept()
        if err != nil {
            select {
            case <-s.shutdown:
                return nil
            default:
                fmt.Println("Accept error:", err)
                continue
            }
        }

        s.wg.Add(1)
        go s.handleConnection(conn)
    }
}

func (s *GracefulServer) handleConnection(conn net.Conn) {
    defer s.wg.Done()
    defer conn.Close()

    fmt.Printf("Client connected: %s\n", conn.RemoteAddr())

    // Simulate long-running operation
    ticker := time.NewTicker(100 * time.Millisecond)
    defer ticker.Stop()

    for i := 0; i < 10; i++ {
        select {
        case <-s.shutdown:
            fmt.Printf("Client %s: Shutdown requested, cleaning up...\n", conn.RemoteAddr())
            return
        case <-ticker.C:
            conn.Write([]byte(fmt.Sprintf("Message %d\n", i)))
        }
    }

    fmt.Printf("Client %s: Completed normally\n", conn.RemoteAddr())
}

func (s *GracefulServer) Shutdown(ctx context.Context) error {
    s.mu.Lock()
    defer s.mu.Unlock()

    // Signal shutdown
    close(s.shutdown)

    // Stop accepting new connections
    if s.listener != nil {
        s.listener.Close()
    }

    // Wait for existing connections with timeout
    done := make(chan struct{})
    go func() {
        s.wg.Wait()
        close(done)
    }()

    select {
    case <-done:
        fmt.Println("All connections closed gracefully")
        return nil
    case <-ctx.Done():
        return fmt.Errorf("shutdown timeout: %w", ctx.Err())
    }
}

func simulateLongRunningClient(id int) {
    conn, err := net.Dial("tcp", "localhost:2020")
    if err != nil {
        return
    }
    defer conn.Close()

    // Read messages
    buffer := make([]byte, 1024)
    for {
        n, err := conn.Read(buffer)
        if err != nil {
            return
        }
        fmt.Printf("Client %d: %s", id, string(buffer[:n]))
    }
}

Circuit Breaker Pattern

What happens when a downstream service becomes unresponsive? Without protection, your application might spend all its time waiting for a service that will never reply. The circuit breaker pattern is like a safety valve that automatically opens when pressure gets too high—it temporarily stops trying to contact failing services, giving them time to recover.

Real-world Example

Netflix famously pioneered the circuit breaker pattern. When their recommendation service starts failing, the circuit breaker opens, and Netflix shows cached recommendations instead. The user experience degrades gracefully instead of breaking completely.

package main

import (
    "errors"
    "fmt"
    "sync"
    "time"
)

// run
func main() {
    cb := NewCircuitBreaker(3, 2*time.Second)

    // Simulate requests
    for i := 0; i < 10; i++ {
        err := cb.Call(func() error {
            // Simulate a failing service
            if i < 5 {
                return errors.New("service unavailable")
            }
            return nil
        })

        if err != nil {
            fmt.Printf("Request %d: %v (state: %v)\n", i, err, cb.State())
        } else {
            fmt.Printf("Request %d: Success (state: %v)\n", i, cb.State())
        }

        time.Sleep(300 * time.Millisecond)
    }
}

type CircuitState int

const (
    StateClosed CircuitState = iota
    StateOpen
    StateHalfOpen
)

func (s CircuitState) String() string {
    switch s {
    case StateClosed:
        return "Closed"
    case StateOpen:
        return "Open"
    case StateHalfOpen:
        return "Half-Open"
    default:
        return "Unknown"
    }
}

type CircuitBreaker struct {
    maxFailures  int
    resetTimeout time.Duration

    mu           sync.Mutex
    state        CircuitState
    failures     int
    lastFailTime time.Time
}

func NewCircuitBreaker(maxFailures int, resetTimeout time.Duration) *CircuitBreaker {
    return &CircuitBreaker{
        maxFailures:  maxFailures,
        resetTimeout: resetTimeout,
        state:        StateClosed,
    }
}

func (cb *CircuitBreaker) Call(fn func() error) error {
    cb.mu.Lock()

    // Check if we should transition from Open to Half-Open
    if cb.state == StateOpen {
        if time.Since(cb.lastFailTime) > cb.resetTimeout {
            cb.state = StateHalfOpen
            cb.failures = 0
            fmt.Println("Circuit breaker transitioning to Half-Open")
        } else {
            cb.mu.Unlock()
            return errors.New("circuit breaker is open")
        }
    }

    cb.mu.Unlock()

    // Execute the function
    err := fn()

    cb.mu.Lock()
    defer cb.mu.Unlock()

    if err != nil {
        cb.failures++
        cb.lastFailTime = time.Now()

        if cb.failures >= cb.maxFailures {
            cb.state = StateOpen
            fmt.Println("Circuit breaker opened due to failures")
        }

        return err
    }

    // Success - reset if we were in Half-Open
    if cb.state == StateHalfOpen {
        cb.state = StateClosed
        cb.failures = 0
        fmt.Println("Circuit breaker closed after successful Half-Open request")
    }

    return nil
}

func (cb *CircuitBreaker) State() CircuitState {
    cb.mu.Lock()
    defer cb.mu.Unlock()
    return cb.state
}

Rate Limiting

Without rate limiting, a single misbehaving client or malicious attacker can overwhelm your service. Rate limiting is like having a bouncer at a nightclub—it ensures everyone gets a fair turn and prevents abuse by limiting how many requests each client can make.

Common Rate Limiting Strategies

  • Token bucket: Allows bursts but maintains average rate
  • Fixed window: Resets count every minute/hour
  • Sliding window: More accurate but resource-intensive
  • Distributed rate limiting: For multiple servers sharing limits

package main

import (
    "fmt"
    "sync"
    "time"
)

// run
func main() {
    limiter := NewRateLimiter(3, time.Second)

    // Simulate requests
    var wg sync.WaitGroup
    for i := 0; i < 10; i++ {
        wg.Add(1)
        go func(id int) {
            defer wg.Done()

            if limiter.Allow() {
                fmt.Printf("Request %d: Allowed\n", id)
            } else {
                fmt.Printf("Request %d: Rate limited\n", id)
            }
        }(i)

        time.Sleep(100 * time.Millisecond)
    }

    wg.Wait()
}

type RateLimiter struct {
    rate     int
    interval time.Duration
    tokens   int
    lastRef  time.Time
    mu       sync.Mutex
}

func NewRateLimiter(rate int, interval time.Duration) *RateLimiter {
    return &RateLimiter{
        rate:     rate,
        interval: interval,
        tokens:   rate,
        lastRef:  time.Now(),
    }
}

func (rl *RateLimiter) Allow() bool {
    rl.mu.Lock()
    defer rl.mu.Unlock()

    now := time.Now()
    elapsed := now.Sub(rl.lastRef)

    // Refill tokens based on elapsed time
    if elapsed >= rl.interval {
        rl.tokens = rl.rate
        rl.lastRef = now
    }

    if rl.tokens > 0 {
        rl.tokens--
        return true
    }

    return false
}
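
Note that the limiter above refills the whole bucket once per interval, which behaves more like a fixed window. A continuously refilling token bucket smooths bursts across the interval; this is the same idea that golang.org/x/time/rate implements in production form. A sketch, for comparison:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// TokenBucket refills continuously at `refill` tokens per second,
// up to `capacity`, instead of resetting once per interval.
type TokenBucket struct {
	mu       sync.Mutex
	capacity float64
	tokens   float64
	refill   float64 // tokens per second
	last     time.Time
}

func NewTokenBucket(capacity int, perSecond float64) *TokenBucket {
	return &TokenBucket{
		capacity: float64(capacity),
		tokens:   float64(capacity),
		refill:   perSecond,
		last:     time.Now(),
	}
}

func (tb *TokenBucket) Allow() bool {
	tb.mu.Lock()
	defer tb.mu.Unlock()

	// Credit tokens for the time elapsed since the last call.
	now := time.Now()
	tb.tokens += now.Sub(tb.last).Seconds() * tb.refill
	if tb.tokens > tb.capacity {
		tb.tokens = tb.capacity
	}
	tb.last = now

	if tb.tokens >= 1 {
		tb.tokens--
		return true
	}
	return false
}

func main() {
	tb := NewTokenBucket(3, 10) // burst of 3, refills 10 tokens/sec
	for i := 0; i < 5; i++ {
		fmt.Printf("request %d allowed=%v\n", i, tb.Allow())
	}
}
```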

Connection Health Checking

Connections can go bad even when they appear to be working—think of this as a phone line that's still connected but the quality is so poor you can't understand each other. Health checking ensures you're only using connections that are actually working properly.

Common Health Check Pitfalls

  • Too aggressive: Health checks themselves can overwhelm the service
  • Not aggressive enough: Bad connections linger and cause failures
  • Wrong metrics: Checking connection exists vs. checking it works

Real-world Example

Database connection pools constantly health check connections. A connection might have been dropped by a firewall hours ago, but from your application's perspective, it's still "open". Health checks catch these zombie connections before they cause errors.

package main

import (
    "fmt"
    "net"
    "sync"
    "time"
)

// run
func main() {
    manager := NewConnectionManager("localhost:1010", 5*time.Second)

    // Start test server
    go startHealthCheckServer()
    time.Sleep(100 * time.Millisecond)

    // Get connection
    conn, err := manager.GetConnection()
    if err != nil {
        fmt.Println("Error:", err)
        return
    }
    defer manager.ReleaseConnection(conn)

    fmt.Println("Got healthy connection")

    // Use connection
    conn.Write([]byte("test\n"))
}

type ConnectionManager struct {
    addr        string
    healthCheck time.Duration
    conn        net.Conn
    lastCheck   time.Time
    mu          sync.Mutex
}

func NewConnectionManager(addr string, healthCheck time.Duration) *ConnectionManager {
    return &ConnectionManager{
        addr:        addr,
        healthCheck: healthCheck,
    }
}

func (cm *ConnectionManager) GetConnection() (net.Conn, error) {
    cm.mu.Lock()
    defer cm.mu.Unlock()

    // Check if we need to verify health
    if cm.conn != nil && time.Since(cm.lastCheck) < cm.healthCheck {
        return cm.conn, nil
    }

    // Health check or create new connection
    if cm.conn != nil {
        if !cm.isHealthy(cm.conn) {
            fmt.Println("Connection unhealthy, creating new one")
            cm.conn.Close()
            cm.conn = nil
        }
    }

    if cm.conn == nil {
        var err error
        cm.conn, err = net.Dial("tcp", cm.addr)
        if err != nil {
            return nil, err
        }
        fmt.Println("Created new connection")
    }

    cm.lastCheck = time.Now()
    return cm.conn, nil
}

func (cm *ConnectionManager) ReleaseConnection(conn net.Conn) {
    // In a real implementation, you might return to a pool
}

func (cm *ConnectionManager) isHealthy(conn net.Conn) bool {
    // Simple health check: set a short deadline and try to read one byte.
    // Note that Read consumes data; a real implementation would use a
    // proper health check protocol (e.g. a ping/pong message).
    conn.SetReadDeadline(time.Now().Add(100 * time.Millisecond))
    one := make([]byte, 1)
    _, err := conn.Read(one)

    // Reset deadline
    conn.SetReadDeadline(time.Time{})

    if err == nil {
        return true // data arrived; the connection is alive
    }
    if netErr, ok := err.(net.Error); ok && netErr.Timeout() {
        return true // no data, but the connection is still open
    }
    return false // EOF or another error: the peer is gone
}

func startHealthCheckServer() {
    listener, err := net.Listen("tcp", "localhost:1010")
    if err != nil {
        return
    }
    defer listener.Close()

    for {
        conn, err := listener.Accept()
        if err != nil {
            return
        }

        go func(c net.Conn) {
            defer c.Close()
            buffer := make([]byte, 1024)
            for {
                _, err := c.Read(buffer)
                if err != nil {
                    return
                }
            }
        }(conn)
    }
}

Practice Exercises

Exercise 1: Chat Server

Difficulty: Intermediate | Time: 35-45 minutes

Learning Objectives:

  • Master concurrent TCP server design with goroutine management
  • Implement real-time message broadcasting and client coordination
  • Learn graceful connection handling and cleanup patterns

Real-World Context: Chat servers are the foundation of real-time communication systems like Slack, Discord, and WhatsApp. They demonstrate the core patterns of concurrent connection management, message broadcasting, and client state synchronization that are essential for any real-time application.

Build a concurrent chat server where multiple clients can connect and broadcast messages to all connected clients. Your server should handle multiple simultaneous TCP connections, implement message broadcasting to all connected clients, manage client disconnections gracefully, provide username registration, and support chat commands like /list and /quit. This exercise demonstrates the fundamental patterns used in production real-time communication systems where managing concurrent connections and state synchronization is critical for building scalable chat applications.

Solution

package main

import (
    "bufio"
    "fmt"
    "net"
    "strings"
    "sync"
    "time"
)

type ChatServer struct {
    clients    map[string]net.Conn
    broadcast  chan Message
    register   chan Client
    unregister chan string
    mu         sync.RWMutex
}

type Client struct {
    username string
    conn     net.Conn
}

type Message struct {
    sender  string
    content string
}

func NewChatServer() *ChatServer {
    return &ChatServer{
        clients:    make(map[string]net.Conn),
        broadcast:  make(chan Message, 100),
        register:   make(chan Client),
        unregister: make(chan string),
    }
}

func (s *ChatServer) Start(addr string) error {
    listener, err := net.Listen("tcp", addr)
    if err != nil {
        return err
    }
    defer listener.Close()

    fmt.Printf("Chat server listening on %s\n", addr)

    // Start broadcast handler
    go s.handleBroadcasts()

    for {
        conn, err := listener.Accept()
        if err != nil {
            continue
        }

        go s.handleClient(conn)
    }
}

func (s *ChatServer) handleClient(conn net.Conn) {
    defer conn.Close()

    // Get username
    conn.Write([]byte("Enter username: "))
    reader := bufio.NewReader(conn)
    username, err := reader.ReadString('\n')
    if err != nil {
        return
    }
    username = strings.TrimSpace(username)

    // Register client
    s.register <- Client{username: username, conn: conn}
    defer func() {
        s.unregister <- username
    }()

    // Notify others
    s.broadcast <- Message{
        sender:  "system",
        content: fmt.Sprintf("%s joined the chat", username),
    }

    // Handle messages
    for {
        line, err := reader.ReadString('\n')
        if err != nil {
            break
        }

        line = strings.TrimSpace(line)

        if strings.HasPrefix(line, "/") {
            s.handleCommand(username, line, conn)
            continue
        }

        s.broadcast <- Message{
            sender:  username,
            content: line,
        }
    }

    s.broadcast <- Message{
        sender:  "system",
        content: fmt.Sprintf("%s left the chat", username),
    }
}

func (s *ChatServer) handleCommand(username, cmd string, conn net.Conn) {
    parts := strings.Fields(cmd)
    if len(parts) == 0 {
        return
    }

    switch parts[0] {
    case "/list":
        s.mu.RLock()
        users := make([]string, 0, len(s.clients))
        for user := range s.clients {
            users = append(users, user)
        }
        s.mu.RUnlock()

        conn.Write([]byte(fmt.Sprintf("Users: %s\n", strings.Join(users, ", "))))

    case "/quit":
        conn.Write([]byte("Goodbye!\n"))
        conn.Close()

    default:
        conn.Write([]byte("Unknown command\n"))
    }
}

func (s *ChatServer) handleBroadcasts() {
    for {
        select {
        case client := <-s.register:
            s.mu.Lock()
            s.clients[client.username] = client.conn
            s.mu.Unlock()

        case username := <-s.unregister:
            s.mu.Lock()
            delete(s.clients, username)
            s.mu.Unlock()

        case msg := <-s.broadcast:
            s.mu.RLock()
            for username, conn := range s.clients {
                if username != msg.sender {
                    fmt.Fprintf(conn, "[%s] %s\n", msg.sender, msg.content)
                }
            }
            s.mu.RUnlock()
        }
    }
}

func main() {
    server := NewChatServer()

    // For testing, run in background
    go func() {
        if err := server.Start("localhost:6000"); err != nil {
            fmt.Println("Server error:", err)
        }
    }()

    // Simulate clients
    time.Sleep(100 * time.Millisecond)

    go simulateChatClient("Alice", []string{"Hello everyone!", "/list"})
    time.Sleep(100 * time.Millisecond)

    go simulateChatClient("Bob", []string{"Hi Alice!", "/list"})

    time.Sleep(2 * time.Second)
}

func simulateChatClient(username string, messages []string) {
    conn, err := net.Dial("tcp", "localhost:6000")
    if err != nil {
        fmt.Printf("%s: Dial error: %v\n", username, err)
        return
    }
    defer conn.Close()

    reader := bufio.NewReader(conn)

    // Read prompt
    reader.ReadString('\n')

    // Send username
    fmt.Fprintf(conn, "%s\n", username)

    // Read and display messages in background
    go func() {
        for {
            line, err := reader.ReadString('\n')
            if err != nil {
                return
            }
            fmt.Printf("%s received: %s", username, line)
        }
    }()

    // Send messages
    for _, msg := range messages {
        time.Sleep(200 * time.Millisecond)
        fmt.Fprintf(conn, "%s\n", msg)
    }

    time.Sleep(500 * time.Millisecond)
}

Exercise 2: Load Balancer

Difficulty: Advanced | Time: 40-50 minutes

Learning Objectives:

  • Master TCP proxy patterns and bidirectional data forwarding
  • Implement health checking and failure detection mechanisms
  • Learn round-robin load balancing algorithms and connection management

Real-World Context: Load balancers are essential components in modern infrastructure, distributing traffic across multiple servers to ensure high availability and scalability. Companies like Netflix, Google, and AWS rely on sophisticated load balancers to handle billions of requests daily.

Implement a simple TCP load balancer that distributes connections across multiple backend servers using a round-robin algorithm. Your load balancer should accept client connections, maintain a list of healthy backend servers, use round-robin selection for load distribution, forward data bidirectionally between clients and backends, handle backend failures gracefully, and implement health checking to detect and remove failed servers. This exercise demonstrates the core patterns used in production load balancing systems, where traffic distribution, health monitoring, and failure handling are critical for building resilient and scalable applications.

Solution

package main

import (
    "fmt"
    "io"
    "net"
    "sync"
    "sync/atomic"
    "time"
)

type LoadBalancer struct {
    backends    []*Backend
    current     atomic.Uint32
    healthCheck time.Duration
    mu          sync.RWMutex
}

type Backend struct {
    addr    string
    healthy atomic.Bool
}

func NewLoadBalancer(backends []string, healthCheck time.Duration) *LoadBalancer {
    lb := &LoadBalancer{
        backends:    make([]*Backend, len(backends)),
        healthCheck: healthCheck,
    }

    for i, addr := range backends {
        backend := &Backend{addr: addr}
        backend.healthy.Store(true)
        lb.backends[i] = backend
    }

    // Start health checks
    go lb.startHealthChecks()

    return lb
}

func (lb *LoadBalancer) Start(addr string) error {
    listener, err := net.Listen("tcp", addr)
    if err != nil {
        return err
    }
    defer listener.Close()

    fmt.Printf("Load balancer listening on %s\n", addr)
    fmt.Printf("Backends: %v\n", lb.getBackendAddrs())

    for {
        conn, err := listener.Accept()
        if err != nil {
            continue
        }

        go lb.handleConnection(conn)
    }
}

func (lb *LoadBalancer) handleConnection(clientConn net.Conn) {
    defer clientConn.Close()

    // Get next healthy backend
    backend := lb.nextBackend()
    if backend == nil {
        fmt.Println("No healthy backends available")
        return
    }

    // Connect to backend
    backendConn, err := net.DialTimeout("tcp", backend.addr, 5*time.Second)
    if err != nil {
        fmt.Printf("Failed to connect to backend %s: %v\n", backend.addr, err)
        backend.healthy.Store(false)
        return
    }
    defer backendConn.Close()

    fmt.Printf("Proxying connection to %s\n", backend.addr)

    // Bidirectional copy
    var wg sync.WaitGroup
    wg.Add(2)

    // Client -> Backend
    go func() {
        defer wg.Done()
        io.Copy(backendConn, clientConn)
        backendConn.(*net.TCPConn).CloseWrite()
    }()

    // Backend -> Client
    go func() {
        defer wg.Done()
        io.Copy(clientConn, backendConn)
        clientConn.(*net.TCPConn).CloseWrite()
    }()

    wg.Wait()
}

func (lb *LoadBalancer) nextBackend() *Backend {
    lb.mu.RLock()
    defer lb.mu.RUnlock()

    n := len(lb.backends)
    if n == 0 {
        return nil
    }

    // Try to find healthy backend
    for i := 0; i < n; i++ {
        idx := lb.current.Add(1) % uint32(n)
        backend := lb.backends[idx]

        if backend.healthy.Load() {
            return backend
        }
    }

    return nil
}

func (lb *LoadBalancer) startHealthChecks() {
    ticker := time.NewTicker(lb.healthCheck)
    defer ticker.Stop()

    for range ticker.C {
        lb.mu.RLock()
        for _, backend := range lb.backends {
            go lb.checkBackend(backend)
        }
        lb.mu.RUnlock()
    }
}

func (lb *LoadBalancer) checkBackend(backend *Backend) {
    conn, err := net.DialTimeout("tcp", backend.addr, 2*time.Second)
    if err != nil {
        if backend.healthy.Load() {
            fmt.Printf("Backend %s is down\n", backend.addr)
        }
        backend.healthy.Store(false)
        return
    }
    conn.Close()

    if !backend.healthy.Load() {
        fmt.Printf("Backend %s is back up\n", backend.addr)
    }
    backend.healthy.Store(true)
}

func (lb *LoadBalancer) getBackendAddrs() []string {
    lb.mu.RLock()
    defer lb.mu.RUnlock()

    addrs := make([]string, len(lb.backends))
    for i, b := range lb.backends {
        addrs[i] = b.addr
    }
    return addrs
}

func main() {
    // Start backend servers
    go startBackendServer("localhost:8001", "Backend-1")
    go startBackendServer("localhost:8002", "Backend-2")
    go startBackendServer("localhost:8003", "Backend-3")

    time.Sleep(100 * time.Millisecond)

    // Start load balancer
    lb := NewLoadBalancer(
        []string{"localhost:8001", "localhost:8002", "localhost:8003"},
        3*time.Second,
    )

    go func() {
        if err := lb.Start("localhost:9000"); err != nil {
            fmt.Println("Load balancer error:", err)
        }
    }()

    time.Sleep(100 * time.Millisecond)

    // Test with clients
    for i := 0; i < 6; i++ {
        go testClient(i)
        time.Sleep(100 * time.Millisecond)
    }

    time.Sleep(2 * time.Second)
}

func startBackendServer(addr, name string) {
    listener, err := net.Listen("tcp", addr)
    if err != nil {
        fmt.Printf("%s error: %v\n", name, err)
        return
    }
    defer listener.Close()

    fmt.Printf("%s listening on %s\n", name, addr)

    for {
        conn, err := listener.Accept()
        if err != nil {
            continue
        }

        go func(c net.Conn) {
            defer c.Close()

            // Echo with server name
            response := fmt.Sprintf("Response from %s\n", name)
            c.Write([]byte(response))
        }(conn)
    }
}

func testClient(id int) {
    conn, err := net.Dial("tcp", "localhost:9000")
    if err != nil {
        fmt.Printf("Client %d error: %v\n", id, err)
        return
    }
    defer conn.Close()

    // Read response
    buffer := make([]byte, 1024)
    n, err := conn.Read(buffer)
    if err != nil {
        fmt.Printf("Client %d read error: %v\n", id, err)
        return
    }

    fmt.Printf("Client %d: %s", id, string(buffer[:n]))
}

Exercise 3: HTTP Proxy Server

Difficulty: Advanced | Time: 30-40 minutes

Learning Objectives:

  • Build HTTP proxy servers with request forwarding and response handling
  • Master HTTP header manipulation and proxy protocol implementation
  • Learn request/response logging and performance monitoring patterns

Real-World Context: HTTP proxies are crucial components in web infrastructure, used for caching, filtering, load balancing, and security monitoring. Corporate networks, CDNs, and API gateways all rely on HTTP proxy technology to manage and optimize web traffic.

Build an HTTP proxy server that forwards requests to backend servers with request/response logging. Your proxy should handle HTTP request forwarding, preserve and manipulate headers appropriately, add proxy-specific headers like X-Forwarded-For, log detailed request/response information including timing and status codes, and demonstrate the patterns used in production proxy servers. This exercise shows how to build the fundamental components of API gateways and reverse proxies that are essential for modern web application architecture.

Solution with Explanation

package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"net/url"
	"time"
)

// HTTPProxy forwards HTTP requests to backend servers
type HTTPProxy struct {
	targetURL *url.URL
}

func NewHTTPProxy(targetURL string) (*HTTPProxy, error) {
	u, err := url.Parse(targetURL)
	if err != nil {
		return nil, err
	}

	return &HTTPProxy{targetURL: u}, nil
}

func (p *HTTPProxy) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	start := time.Now()

	// Log incoming request
	log.Printf("[PROXY] %s %s from %s", r.Method, r.URL.Path, r.RemoteAddr)

	// Create proxy request
	proxyURL := *r.URL
	proxyURL.Scheme = p.targetURL.Scheme
	proxyURL.Host = p.targetURL.Host

	proxyReq, err := http.NewRequest(r.Method, proxyURL.String(), r.Body)
	if err != nil {
		http.Error(w, "Proxy request failed", http.StatusInternalServerError)
		log.Printf("[ERROR] Failed to create proxy request: %v", err)
		return
	}

	// Copy headers
	for key, values := range r.Header {
		for _, value := range values {
			proxyReq.Header.Add(key, value)
		}
	}

	// Set proxy headers
	proxyReq.Header.Set("X-Forwarded-For", r.RemoteAddr)
	proxyReq.Header.Set("X-Forwarded-Proto", "http")

	// Send request
	client := &http.Client{
		Timeout: 30 * time.Second,
	}

	resp, err := client.Do(proxyReq)
	if err != nil {
		http.Error(w, "Backend unavailable", http.StatusBadGateway)
		log.Printf("[ERROR] Backend request failed: %v", err)
		return
	}
	defer resp.Body.Close()

	// Copy response headers
	for key, values := range resp.Header {
		for _, value := range values {
			w.Header().Add(key, value)
		}
	}

	// Copy status code
	w.WriteHeader(resp.StatusCode)

	// Copy response body
	written, err := io.Copy(w, resp.Body)
	if err != nil {
		log.Printf("[ERROR] Failed to copy response: %v", err)
		return
	}

	duration := time.Since(start)
	log.Printf("[PROXY] %s %s -> %d (%d bytes, %v)",
		r.Method, r.URL.Path, resp.StatusCode, written, duration)
}

func main() {
	// For demo, we'll create a simple backend server first
	go func() {
		http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
			fmt.Fprintf(w, "Backend response for %s\n", r.URL.Path)
			fmt.Fprintf(w, "Method: %s\n", r.Method)
			fmt.Fprintf(w, "X-Forwarded-For: %s\n", r.Header.Get("X-Forwarded-For"))
		})
		log.Println("Backend server starting on :8081")
		http.ListenAndServe(":8081", nil)
	}()

	time.Sleep(100 * time.Millisecond)

	// Create proxy
	proxy, err := NewHTTPProxy("http://localhost:8081")
	if err != nil {
		panic(err)
	}

	// Start proxy server
	fmt.Println("HTTP Proxy starting on :8080")
	fmt.Println("Proxying to http://localhost:8081")
	fmt.Println("\nTest with: curl http://localhost:8080/test")

	go func() {
		if err := http.ListenAndServe(":8080", proxy); err != nil {
			log.Println("Proxy server error:", err)
		}
	}()

	// For demo purposes, simulate a request
	time.Sleep(100 * time.Millisecond)
	resp, err := http.Get("http://localhost:8080/test")
	if err != nil {
		fmt.Printf("Test request failed: %v\n", err)
	} else {
		body, _ := io.ReadAll(resp.Body)
		resp.Body.Close()
		fmt.Printf("\nProxy Response:\n%s\n", string(body))
	}
}
// run

Explanation:

This HTTP proxy demonstrates:

  • Request Forwarding: Forwards HTTP requests to backend server
  • Header Preservation: Copies request and response headers
  • X-Forwarded-For: Adds proxy headers for client tracking
  • Error Handling: Handles backend failures gracefully
  • Logging: Comprehensive request/response logging
  • Timeout Management: Sets reasonable timeouts for backend requests

Production enhancements:

  • Add load balancing across multiple backends
  • Implement connection pooling
  • Add caching for static resources
  • Implement rate limiting
  • Add authentication/authorization
  • Support WebSocket proxying
  • Add health checks for backends
  • Implement circuit breakers

Exercise 4: UDP File Transfer

Difficulty: Advanced | Time: 45-55 minutes

Learning Objectives:

  • Master UDP communication with reliability mechanisms
  • Implement packet sequencing and acknowledgment systems
  • Learn to build reliable file transfer protocols on top of UDP

Real-World Context: While TCP handles reliability automatically, understanding how to build reliability on top of UDP is crucial for applications needing custom control over reliability, speed, and packet handling. Video streaming, gaming, and IoT systems often use custom UDP protocols for better performance.

Build a UDP-based file transfer system with reliability mechanisms. Your system should implement packet sequencing to ensure files are reconstructed correctly, add acknowledgment and retransmission logic for reliable delivery, handle packet loss and corruption gracefully, demonstrate flow control to prevent overwhelming the receiver, and show how to build reliability on top of UDP's connectionless protocol. This exercise demonstrates the fundamental principles behind protocols like QUIC and custom streaming solutions where you need TCP-like reliability with UDP's performance characteristics.

Solution with Explanation
  1package main
  2
  3import (
  5	"encoding/binary"
  6	"fmt"
  7	"io"
  8	"net"
  9	"os"
 10	"sync"
 11	"time"
 12)
 13
 14const (
 15	MaxPacketSize = 1024
 16	AckTimeout    = 2 * time.Second
 17	MaxRetries    = 5
 18)
 19
 20// Packet types
 21const (
 22	PacketTypeData byte = 1
 23	PacketTypeAck  byte = 2
 24	PacketTypeFin  byte = 3
 25)
 26
 27// Packet structure: [Type(1)][SeqNum(4)][Payload(N)]
 28type Packet struct {
 29	Type    byte
 30	SeqNum  uint32
 31	Payload []byte
 32}
 33
 34type ReliableFileTransfer struct {
 35	conn       *net.UDPConn
 36	packetBuf  sync.Map // seqNum -> packet
 37	windowSize int
 38	nextSeq    uint32
 39	lastAck    uint32
 40	fileSize   int64
 41	received   int64
 42}
 43
 44func NewReliableFileTransfer(conn *net.UDPConn) *ReliableFileTransfer {
 45	return &ReliableFileTransfer{
 46		conn:       conn,
 47		windowSize: 10,
 48		nextSeq:    1,
 49		lastAck:    0,
 50	}
 51}
 52
 53	func (rft *ReliableFileTransfer) SendFile(filename string, remoteAddr *net.UDPAddr) error {
 54	file, err := os.Open(filename)
 55	if err != nil {
 56		return fmt.Errorf("failed to open file: %w", err)
 57	}
 58	defer file.Close()
 59
 60	// Get file size
 61	stat, _ := file.Stat()
 62	rft.fileSize = stat.Size()
 63
 64	fmt.Printf("Sending file: %s (%d bytes)\n", filename, rft.fileSize)
 65
 66	// Send file in packets
 67	buffer := make([]byte, MaxPacketSize-5) // 5 bytes for header
 68	seqNum := uint32(1)
 69
 70	for {
 71		n, err := file.Read(buffer)
 72		if err == io.EOF {
 73			break
 74		}
 75		if err != nil {
 76			return fmt.Errorf("file read error: %w", err)
 77		}
 78
 79		// Create data packet
 80		packet := Packet{
 81			Type:    PacketTypeData,
 82			SeqNum:  seqNum,
 83			Payload: buffer[:n],
 84		}
 85
 86		// Send with retry logic
 87		if err := rft.sendPacketWithRetry(packet, remoteAddr); err != nil {
 88			return fmt.Errorf("failed to send packet %d: %w", seqNum, err)
 89		}
 90
 91	fmt.Printf("Sent packet %d (%d bytes)\n", seqNum, n)
 92		seqNum++
 93		time.Sleep(10 * time.Millisecond) // Small delay to avoid overwhelming receiver
 94	}
 95
 96	// Send FIN packet
 97	finPacket := Packet{
 98		Type:   PacketTypeFin,
 99		SeqNum: seqNum,
100	}
101	if err := rft.sendPacketWithRetry(finPacket, remoteAddr); err != nil {
102		return fmt.Errorf("failed to send FIN packet: %w", err)
103	}
104
105	fmt.Printf("File transfer completed\n")
106	return nil
107}
108
109	func (rft *ReliableFileTransfer) sendPacketWithRetry(packet Packet, addr *net.UDPAddr) error {
110	data := rft.encodePacket(packet)
111
112	for attempt := 0; attempt < MaxRetries; attempt++ {
113		// Send packet
114		if _, err := rft.conn.WriteToUDP(data, addr); err != nil {
115			return err
116		}
117
118		// Wait for ACK
119		if rft.waitForAck(packet.SeqNum) {
120			return nil
121		}
122
123		fmt.Printf("Retry packet %d (attempt %d)\n", packet.SeqNum, attempt+1)
124		time.Sleep(AckTimeout)
125	}
126
127	return fmt.Errorf("max retries exceeded for packet %d", packet.SeqNum)
128}
129
130	func (rft *ReliableFileTransfer) waitForAck(seqNum uint32) bool {
131	buffer := make([]byte, 1024)
132	rft.conn.SetReadDeadline(time.Now().Add(AckTimeout))
133
134	for {
135		n, _, err := rft.conn.ReadFromUDP(buffer)
136		if err != nil {
137			return false
138		}
139
140		packet := rft.decodePacket(buffer[:n])
141		if packet.Type == PacketTypeAck && packet.SeqNum == seqNum {
142			return true
143		}
144	}
145}
146
147	func (rft *ReliableFileTransfer) ReceiveFile(outputFilename string) error {
148	file, err := os.Create(outputFilename)
149	if err != nil {
150		return fmt.Errorf("failed to create output file: %w", err)
151	}
152	defer file.Close()
153
154	expectedSeq := uint32(1)
155	receivedPackets := make(map[uint32][]byte)
156
157	fmt.Printf("Receiving file to: %s\n", outputFilename)
158
159	for {
160		buffer := make([]byte, MaxPacketSize)
161		rft.conn.SetReadDeadline(time.Now().Add(5 * time.Second))
162
163		n, addr, err := rft.conn.ReadFromUDP(buffer)
164		if err != nil {
165			if netErr, ok := err.(net.Error); ok && netErr.Timeout() {
166				// Check if we've received all packets
167				if len(receivedPackets) == 0 {
168					return nil // No more packets expected
169				}
170				continue
171			}
172			return err
173		}
174
175		packet := rft.decodePacket(buffer[:n])
176
177		switch packet.Type {
178		case PacketTypeData:
179			if packet.SeqNum == expectedSeq {
180				// Write to file in order
181				file.Write(packet.Payload)
182				fmt.Printf("Received packet %d (%d bytes)\n", packet.SeqNum, len(packet.Payload))
183				expectedSeq++
184
185				// Send ACK
186				ackPacket := Packet{Type: PacketTypeAck, SeqNum: packet.SeqNum}
187				ackData := rft.encodePacket(ackPacket)
188				rft.conn.WriteToUDP(ackData, addr)
189
190				// Check for out-of-order packets
191				for i := expectedSeq; i < expectedSeq+10; i++ {
192					if data, exists := receivedPackets[i]; exists {
193						file.Write(data)
194						fmt.Printf("Wrote out-of-order packet %d\n", i)
195						delete(receivedPackets, i)
196						expectedSeq++
197					} else {
198						break
199					}
200				}
201			} else if packet.SeqNum > expectedSeq {
202				// Out-of-order packet, store it
203				receivedPackets[packet.SeqNum] = packet.Payload
204				fmt.Printf("Stored out-of-order packet %d\n", packet.SeqNum)
205
206				// Send ACK
207				ackPacket := Packet{Type: PacketTypeAck, SeqNum: packet.SeqNum}
208				ackData := rft.encodePacket(ackPacket)
209				rft.conn.WriteToUDP(ackData, addr)
210			} else {
211				// Duplicate packet, just ACK it
212				ackPacket := Packet{Type: PacketTypeAck, SeqNum: packet.SeqNum}
213				ackData := rft.encodePacket(ackPacket)
214				rft.conn.WriteToUDP(ackData, addr)
215			}
216
217		case PacketTypeFin:
218			fmt.Printf("Received FIN packet\n")
219			// Send ACK for FIN
220			ackPacket := Packet{Type: PacketTypeAck, SeqNum: packet.SeqNum}
221			ackData := rft.encodePacket(ackPacket)
222			rft.conn.WriteToUDP(ackData, addr)
223			return nil
224		}
225	}
226}
227
228	func (rft *ReliableFileTransfer) encodePacket(packet Packet) []byte {
229	data := make([]byte, 5+len(packet.Payload))
230	data[0] = packet.Type
231	binary.BigEndian.PutUint32(data[1:5], packet.SeqNum)
232	copy(data[5:], packet.Payload)
233	return data
234}
235
236	func (rft *ReliableFileTransfer) decodePacket(data []byte) Packet {
237	if len(data) < 5 {
238		return Packet{}
239	}
240
241	return Packet{
242		Type:    data[0],
243		SeqNum:  binary.BigEndian.Uint32(data[1:5]),
244		Payload: data[5:],
245	}
246}
247
248func main() {
249	// Create test file
250	testData := []byte("This is a test file for UDP transfer.\n" +
251		"It contains multiple lines of text to demonstrate packet sequencing.\n" +
252		"Each line should arrive in order despite UDP's connectionless nature.\n" +
253		"Packet loss and reordering should be handled gracefully.\n")
254
255	filename := "test_transfer.txt"
256	if err := os.WriteFile(filename, testData, 0644); err != nil {
257		panic(err)
258	}
259	defer os.Remove(filename)
260
261	// Start receiver
262	receiverAddr, _ := net.ResolveUDPAddr("udp", "localhost:8888")
263	receiverConn, err := net.ListenUDP("udp", receiverAddr)
264	if err != nil {
265		panic(err)
266	}
267	defer receiverConn.Close()
268
269	go func() {
270		receiver := NewReliableFileTransfer(receiverConn)
271		if err := receiver.ReceiveFile("received_file.txt"); err != nil {
272			fmt.Printf("Receive error: %v\n", err)
273		} else {
274			fmt.Printf("File received successfully\n")
275		}
276	}()
277
278	time.Sleep(100 * time.Millisecond)
279
280	// Start sender
281	senderAddr, _ := net.ResolveUDPAddr("udp", "localhost:9999")
282	senderConn, err := net.ListenUDP("udp", senderAddr)
283	if err != nil {
284		panic(err)
285	}
286	defer senderConn.Close()
287
288	sender := NewReliableFileTransfer(senderConn)
289	if err := sender.SendFile(filename, receiverAddr); err != nil {
290		fmt.Printf("Send error: %v\n", err)
291	}
292
293	time.Sleep(1 * time.Second)
294
295	// Verify received file
296	if receivedData, err := os.ReadFile("received_file.txt"); err == nil {
297		fmt.Printf("\nOriginal file size: %d bytes\n", len(testData))
298		fmt.Printf("Received file size: %d bytes\n", len(receivedData))
299		if string(testData) == string(receivedData) {
300			fmt.Printf("✓ File transfer successful\n")
301		} else {
302			fmt.Printf("✗ File transfer failed - data mismatch\n")
303		}
304		os.Remove("received_file.txt")
305	}
306}
307// run

Explanation:

This UDP file transfer system demonstrates:

  • Packet Sequencing: Ensures packets are reassembled in correct order
  • Reliability Layer: Implements ACK/retry mechanisms on top of UDP
  • Flow Control: Handles out-of-order packets and manages transmission windows
  • Error Recovery: Detects and handles packet loss with automatic retransmission
  • Connection Management: Properly handles file transfer completion and cleanup

Production use cases:

  • Custom streaming protocols for video/audio
  • Gaming network protocols needing low latency
  • IoT sensor data collection with reliability guarantees
  • Financial data transmission with custom reliability requirements

Exercise 5: Port Scanner

Difficulty: Intermediate | Time: 25-35 minutes

Learning Objectives:

  • Master TCP connection scanning and port checking techniques
  • Implement concurrent network probing with goroutine pools
  • Learn network service detection and timeout management

Real-World Context: Port scanning is essential for network security auditing, service discovery, and system administration. Security professionals use port scanners to identify open services and potential vulnerabilities, while DevOps engineers use them for service discovery and network troubleshooting.

Build a concurrent port scanner that checks for open ports on target hosts. Your scanner should scan multiple ports concurrently using goroutine pools, handle TCP connection timeouts gracefully, detect common services by attempting to identify service banners, provide configurable scanning options, and demonstrate efficient network probing patterns. This exercise shows how to build network reconnaissance tools while understanding the fundamentals of TCP service discovery and network security assessment.

Solution with Explanation
  1package main
  2
  3import (
  4	"bufio"
  5	"fmt"
  6	"net"
  8	"strings"
  9	"sync"
 10	"time"
 11)
 12
 13type PortScanner struct {
 14	MaxConcurrent int
 15	Timeout       time.Duration
 16	Delay         time.Duration
 17}
 18
 19type ScanResult struct {
 20	Port   int
 21	Open   bool
 22	Banner string
 23	Error  error
 24}
 25
 26type ServiceSignature struct {
 27	Name    string
 28	Banner  string
 29	Probe   string
 30	Timeout time.Duration
 31}
 32
 33// Common service signatures for detection
 34var serviceSignatures = []ServiceSignature{
 35	{"HTTP", "HTTP", "GET / HTTP/1.0\r\n\r\n", 2 * time.Second},
 36	{"SSH", "SSH-", "", 1 * time.Second},
 37	{"FTP", "FTP", "", 1 * time.Second},
 38	{"SMTP", "SMTP", "", 2 * time.Second},
 39	{"POP3", "POP3", "", 1 * time.Second},
 40	{"IMAP", "IMAP", "", 1 * time.Second},
 41	{"Telnet", "", "", 1 * time.Second},
 42}
 43
 44func NewPortScanner(maxConcurrent int, timeout time.Duration) *PortScanner {
 45	return &PortScanner{
 46		MaxConcurrent: maxConcurrent,
 47		Timeout:       timeout,
 48		Delay:         10 * time.Millisecond, // Small delay between connections
 49	}
 50}
 51
 52	func (ps *PortScanner) ScanHost(host string, ports []int) <-chan ScanResult {
 53	results := make(chan ScanResult, len(ports))
 54
 55	// Create semaphore for concurrency control
 56	semaphore := make(chan struct{}, ps.MaxConcurrent)
 57	var wg sync.WaitGroup
 58
 59	for _, port := range ports {
 60		wg.Add(1)
 61		go func(p int) {
 62			defer wg.Done()
 63			semaphore <- struct{}{} // Acquire
 64			defer func() { <-semaphore }() // Release
 65
 66			result := ps.scanPort(host, p)
 67			results <- result
 68
 69			// Small delay to avoid overwhelming the target
 70			time.Sleep(ps.Delay)
 71		}(port)
 72	}
 73
 74	// Close results channel when all scans complete
 75	go func() {
 76		wg.Wait()
 77		close(results)
 78	}()
 79
 80	return results
 81}
 82
 83	func (ps *PortScanner) scanPort(host string, port int) ScanResult {
 84	result := ScanResult{Port: port}
 85
 86	// Connect with timeout
 87	address := fmt.Sprintf("%s:%d", host, port)
 88	conn, err := net.DialTimeout("tcp", address, ps.Timeout)
 89	if err != nil {
 90		result.Open = false
 91		result.Error = err
 92		return result
 93	}
 94	defer conn.Close()
 95
 96	result.Open = true
 97
 98	// Try to detect service and get banner
 99	service, banner := ps.detectService(conn)
100	result.Banner = banner
101
102	if service != "" {
103		result.Banner = fmt.Sprintf("[%s] %s", service, banner)
104	}
105
106	return result
107}
108
109	func (ps *PortScanner) detectService(conn net.Conn) (string, string) {
110		// Set read timeout for banner detection
111		conn.SetReadDeadline(time.Now().Add(2 * time.Second))
112
113		// Try to read the initial banner; bufio.Reader retries after a timeout
114		reader := bufio.NewReader(conn)
115		if line, err := reader.ReadString('\n'); err == nil {
116			banner := strings.TrimSpace(line)
117			return identifyService(banner), banner
118		}
119
120		// Try active probing for services that don't send banners
121		for _, sig := range serviceSignatures {
122			if sig.Probe == "" {
123				continue
124			}
125			conn.SetWriteDeadline(time.Now().Add(sig.Timeout))
126			if _, err := fmt.Fprint(conn, sig.Probe); err != nil {
127				continue
128			}
129			conn.SetReadDeadline(time.Now().Add(sig.Timeout))
130			if line, err := reader.ReadString('\n'); err == nil {
131				banner := strings.TrimSpace(line)
132				if sig.Banner != "" && strings.Contains(banner, sig.Banner) {
133					return sig.Name, banner
134				}
135			}
136		}
137
138		return "", ""
139	}
140
141func identifyService(banner string) string {
142	banner = strings.ToUpper(banner)
143
144	serviceMap := map[string]string{
145		"SSH": "SSH",
146		"FTP": "FTP",
147		"HTTP": "HTTP",
148		"SMTP": "SMTP",
149		"POP3": "POP3",
150		"IMAP": "IMAP",
151	}
152
153	for service, identifier := range serviceMap {
154		if strings.Contains(banner, identifier) {
155			return service
156		}
157	}
158
159	return ""
160}
161
162	func (ps *PortScanner) ScanRange(host string, startPort, endPort int) <-chan ScanResult {
163	ports := make([]int, endPort-startPort+1)
164	for i := range ports {
165		ports[i] = startPort + i
166	}
167	return ps.ScanHost(host, ports)
168}
169
170	func (ps *PortScanner) ScanCommonPorts(host string) <-chan ScanResult {
171	// Common ports to scan
172	commonPorts := []int{
173		21,   // FTP
174		22,   // SSH
175		23,   // Telnet
176		25,   // SMTP
177		53,   // DNS
178		80,   // HTTP
179		110,  // POP3
180		143,  // IMAP
181		443,  // HTTPS
182		993,  // IMAPS
183		995,  // POP3S
184		1433, // MSSQL
185		1521, // Oracle
186		3306, // MySQL
187		3389, // RDP
188		5432, // PostgreSQL
189		5900, // VNC
190		6379, // Redis
191		8080, // HTTP Alt
192		8443, // HTTPS Alt
193	}
194
195	return ps.ScanHost(host, commonPorts)
196}
197
198func printResults(results <-chan ScanResult, target string) {
199	fmt.Printf("Scan results for %s:\n", target)
200	fmt.Println(strings.Repeat("=", 50))
201
202	openCount := 0
203	for result := range results {
204		if result.Open {
205			openCount++
206			status := "OPEN"
207			if result.Banner != "" {
208				fmt.Printf("Port %5d: %-8s %s\n", result.Port, status, result.Banner)
209			} else {
210				fmt.Printf("Port %5d: %s\n", result.Port, status)
211			}
212		} else {
213			// Optionally show closed ports with verbose mode
214			// fmt.Printf("Port %5d: CLOSED (%v)\n", result.Port, result.Error)
215		}
216	}
217
218	fmt.Println(strings.Repeat("=", 50))
219	fmt.Printf("Scan completed. %d open ports found.\n", openCount)
220}
221
222func main() {
223	// Create scanner
224	scanner := NewPortScanner(50, 2*time.Second)
225
226	// Target to scan
227	target := "localhost"
228
229	fmt.Printf("Starting port scan of %s...\n\n", target)
230
231	// Scan common ports first
232	fmt.Println("=== Scanning Common Ports ===")
233	results := scanner.ScanCommonPorts(target)
234	printResults(results, target)
235
236	// Scan a range of ports
237	fmt.Println("\n=== Scanning Port Range 8000-8010 ===")
238	rangeResults := scanner.ScanRange(target, 8000, 8010)
239	printResults(rangeResults, target)
240
241	// Demonstrate custom port list
242	fmt.Println("\n=== Scanning Custom Port List ===")
243	customPorts := []int{3000, 3001, 5000, 5001, 7000, 7001, 9000, 9001}
244	customResults := scanner.ScanHost(target, customPorts)
245	printResults(customResults, target)
246}
247// run

Explanation:

This port scanner demonstrates:

  • Concurrent Scanning: Uses goroutine pools for efficient parallel port checking
  • Timeout Management: Properly handles connection timeouts to avoid hanging
  • Service Detection: Attempts to identify services through banners and active probing
  • Rate Limiting: Includes delays to avoid overwhelming target systems
  • Flexible Scanning: Supports port ranges, common ports, and custom port lists

Production considerations:

  • Add stealth scanning techniques
  • Implement IPv6 support
  • Add more comprehensive service signatures database
  • Include UDP port scanning capabilities
  • Add output formats
  • Implement scan result persistence and historical comparison

Summary

Network programming in Go is a journey from simple client-server communication to building robust, production-ready distributed systems. The patterns we've explored are the building blocks that power everything from web services to microservices architectures.

Key Takeaways

  1. TCP Programming

    • Building servers and clients with net.Listen and net.Dial
    • Handling concurrent connections with goroutines
    • Binary protocol implementation
    • Connection management and timeouts
  2. UDP Programming

    • Connectionless communication with net.ListenUDP
    • Multicast support for one-to-many communication
    • Implementing reliability on top of UDP
    • Packet-based communication patterns
  3. DNS Operations

    • Hostname and IP address resolution
    • MX, TXT, and SRV record lookups
    • Custom resolvers with caching
    • Service discovery patterns
  4. Connection Management

    • Connection pooling for resource efficiency
    • Timeout strategies
    • Keep-alive configuration
    • Health checking
  5. Custom Protocols

    • Text-based request-response protocols
    • Binary protocols with length prefixing
    • Streaming data protocols
    • Protocol versioning
  6. Production Patterns

    • Graceful shutdown with context
    • Circuit breaker for fault tolerance
    • Rate limiting to prevent overload
    • Connection health monitoring

Best Practices

  1. Always Use Timeouts: Never perform network operations without timeouts
  2. Handle Errors Gracefully: Network operations can fail in many ways
  3. Use Context: Propagate cancellation and timeouts through context
  4. Pool Connections: Reuse connections for better performance
  5. Implement Retry Logic: Use exponential backoff for transient failures
  6. Monitor Health: Regularly check connection and service health
  7. Graceful Shutdown: Always drain connections before shutting down
  8. Buffer Management: Be careful with buffer sizes to avoid memory issues
  9. Concurrency Safety: Protect shared state with proper synchronization
  10. Test Edge Cases: Network failures, timeouts, and partial reads/writes

Common Pitfalls

  • Forgetting to close connections
  • Not setting timeouts
  • Ignoring partial reads/writes
  • Not handling network errors properly
  • Blocking operations in handlers
  • Not limiting concurrent connections
  • Exposing internal implementation in protocols

💡 Final Key Takeaway: Network programming is fundamentally about dealing with failure. Networks are unreliable, connections break, and services fail. The most robust network applications are those that expect and handle these failures gracefully, providing a good user experience even when things go wrong.