Think of network programming as the digital equivalent of the postal system. Just as mail needs addresses, routes, and delivery methods, network applications need protocols, sockets, and data transmission strategies. Go's net package gives you everything you need to build anything from simple client-server applications to complex distributed systems.
In today's connected world, every application is a network application. Whether you're building a web service, mobile app backend, or IoT device, understanding network programming is essential. Go makes network programming accessible with its simple yet powerful abstractions that handle the complexity while giving you control when you need it.
💡 Key Takeaway: Network programming in Go is like building with LEGOs—you have simple, well-defined pieces that can be combined to create complex, robust systems.
Introduction to Network Programming
Understanding network programming requires grasping several foundational concepts. At its core, network communication follows a layered model where each layer provides specific services to the layers above it. While you don't need to memorize every detail of the OSI model, understanding that TCP/IP operates at different layers helps you choose the right abstraction for your needs.
The Network Stack: When you write net.Dial("tcp", "example.com:80"), Go handles DNS resolution, TCP connection establishment, and socket management—all the complex details that would take hundreds of lines to implement manually. This is the power of Go's networking abstractions: they hide complexity without sacrificing control.
Why Go Excels at Network Programming: Go was designed at Google to build networked systems. Its concurrency primitives (goroutines and channels) make handling thousands of simultaneous connections natural. The standard library is production-ready, battle-tested in services handling billions of requests daily.
Before diving into code, let's understand the building blocks. Think of network programming as learning a new language—once you understand the grammar and vocabulary, you can express complex ideas fluently.
Go's networking capabilities are built on a few core packages:
- net: Core networking primitives
- net/http: HTTP client and server
- net/url: URL parsing and manipulation
- net/textproto: Text-based protocol utilities
Basic Network Concepts
The OSI and TCP/IP Models
Understanding Network Layers: Network communication is organized into layers, each providing services to the layer above. The OSI model defines seven layers, but the simpler TCP/IP model (which better reflects reality) has four: Link, Internet, Transport, and Application.
Application Layer: This is where your Go code operates. HTTP, FTP, SMTP, DNS—these are application-layer protocols. Go's net/http and net packages provide application-layer functionality.
Transport Layer: TCP and UDP operate here, providing end-to-end communication between applications. TCP adds reliability, ordering, and flow control. UDP is minimal—just port numbers and checksums.
Internet Layer: IP handles routing packets across networks. IPv4 uses 32-bit addresses; IPv6 uses 128-bit addresses. Go abstracts IPv4/IPv6 differences—most code works with both transparently.
Link Layer: Ethernet, WiFi, and other technologies operate here. Go's net package abstracts link-layer details—you usually don't need to worry about MAC addresses or Ethernet frames.
Address Resolution and Port Numbers
Port Numbers: Ports identify specific applications on a host. Well-known ports (0-1023) are reserved for standard services: 80 for HTTP, 443 for HTTPS, 22 for SSH. Registered ports (1024-49151) are assigned by IANA. Dynamic ports (49152-65535) are available for temporary use.
Socket Pairs: A TCP connection is uniquely identified by four values: source IP, source port, destination IP, destination port. This tuple allows multiple connections between the same hosts—each gets a different source port.
Address Resolution: DNS isn't the only name resolution mechanism. Hosts files (/etc/hosts) provide static mappings. mDNS (multicast DNS) enables local network discovery without DNS servers. Go's resolver uses the system's configuration, checking hosts files before DNS.
Network Byte Order
Endianness: Different CPUs store multi-byte integers differently. Network protocols use big-endian (most significant byte first), called network byte order. Go's encoding/binary package handles conversions: the binary.BigEndian value implements network byte order.
Why This Matters: When implementing binary protocols, you must use network byte order for interoperability. A 16-bit port number must be sent with the high byte first, regardless of your CPU's native format.
Let's start with the fundamentals. Network addresses are like street addresses—they tell us where to send data, and different address types serve different purposes. Understanding these basics will make everything else clearer.
```go
package main

import (
	"fmt"
	"net"
	"time"
)

// run
func main() {
	// Network address parsing
	addr, err := net.ResolveTCPAddr("tcp", "localhost:8080")
	if err != nil {
		fmt.Println("Error:", err)
		return
	}
	fmt.Printf("Network: %s, Address: %s\n", addr.Network(), addr.String())

	// IP address parsing
	ip := net.ParseIP("192.168.1.1")
	if ip != nil {
		fmt.Printf("IP: %s, Is IPv4: %t\n", ip, ip.To4() != nil)
	}

	// Network interface information
	interfaces, _ := net.Interfaces()
	fmt.Println("\nNetwork Interfaces:")
	for _, iface := range interfaces {
		fmt.Printf("  %s (%s)\n", iface.Name, iface.Flags)
	}

	// Timeout operations
	timeout := 5 * time.Second
	fmt.Printf("\nDefault timeout: %s\n", timeout)
}
```
Network Addresses Deep Dive: Every network communication needs an address—a way to identify the destination. In Go, addresses are more than just strings; they're typed structures that validate format and provide useful methods. The net.TCPAddr, net.UDPAddr, and net.UnixAddr types give you compile-time safety and runtime flexibility.
When you parse an address like localhost:8080, Go resolves the hostname, validates the port number, and creates a structure that can be used with various network operations. This abstraction means you can write code that works with IPv4, IPv6, and Unix domain sockets without changing your core logic.
Network Types in Go
⚠️ Important: Understanding network types is crucial for building efficient applications. Using the wrong network type can lead to performance issues or security vulnerabilities.
Go supports multiple network types, each designed for specific use cases:
```go
package main

import (
	"fmt"
	"net"
)

// run
func main() {
	// Different network types
	networks := []string{
		"tcp",  // TCP/IPv4 and IPv6
		"tcp4", // TCP/IPv4 only
		"tcp6", // TCP/IPv6 only
		"udp",  // UDP/IPv4 and IPv6
		"udp4", // UDP/IPv4 only
		"udp6", // UDP/IPv6 only
		"ip",   // Raw IP packets
		"unix", // Unix domain sockets
	}

	fmt.Println("Supported Network Types:")
	for _, network := range networks {
		fmt.Printf("  - %s\n", network)
	}

	// IP address classification
	testIPs := []string{
		"127.0.0.1",
		"::1",
		"192.168.1.1",
		"8.8.8.8",
		"2001:4860:4860::8888",
	}

	fmt.Println("\nIP Address Classification:")
	for _, ipStr := range testIPs {
		ip := net.ParseIP(ipStr)
		if ip == nil {
			continue
		}
		fmt.Printf("  %s: ", ipStr)
		if ip.IsLoopback() {
			fmt.Print("Loopback ")
		}
		if ip.IsPrivate() {
			fmt.Print("Private ")
		}
		if ip.To4() != nil {
			fmt.Print("IPv4")
		} else {
			fmt.Print("IPv6")
		}
		fmt.Println()
	}
}
```
Network Error Handling
Network operations can fail in numerous ways, and robust applications must handle these failures gracefully. Understanding common network errors helps you build resilient applications.
```go
package main

import (
	"fmt"
	"net"
)

// run
func main() {
	// Demonstrate different network error types
	demonstrateNetworkErrors()
}

func demonstrateNetworkErrors() {
	// Connection refused: port 1 almost certainly has no listener
	conn, err := net.Dial("tcp", "localhost:1")
	if err != nil {
		if opErr, ok := err.(*net.OpError); ok {
			fmt.Printf("Operation: %s\n", opErr.Op)
			fmt.Printf("Network: %s\n", opErr.Net)
			fmt.Printf("Error: %v\n", opErr.Err)

			// Check specific error types
			if opErr.Timeout() {
				fmt.Println("This was a timeout error")
			}
			if opErr.Temporary() {
				fmt.Println("This is a temporary error, retry may succeed")
			}
		}
	}
	if conn != nil {
		conn.Close()
	}

	// DNS errors
	_, err = net.LookupHost("this-domain-does-not-exist-12345.com")
	if err != nil {
		if dnsErr, ok := err.(*net.DNSError); ok {
			fmt.Printf("\nDNS Error: %s\n", dnsErr.Name)
			fmt.Printf("Temporary: %t, Timeout: %t\n", dnsErr.IsTemporary, dnsErr.IsTimeout)
		}
	}
}
```
Common Network Errors:
- Connection Refused: Target port isn't listening
- Network Unreachable: Routing problem or network down
- Host Unreachable: Target host not responding
- Timeout: Operation took too long
- Connection Reset: Remote end closed abruptly
- DNS Failure: Hostname resolution failed
Network Interface Information
Understanding your system's network interfaces is crucial for debugging and for applications that need to listen on specific interfaces.
```go
package main

import (
	"fmt"
	"net"
)

// run
func main() {
	// Get all network interfaces
	interfaces, err := net.Interfaces()
	if err != nil {
		fmt.Println("Error:", err)
		return
	}

	fmt.Println("Network Interfaces:")
	for _, iface := range interfaces {
		fmt.Printf("\nInterface: %s\n", iface.Name)
		fmt.Printf("  MTU: %d\n", iface.MTU)
		fmt.Printf("  Hardware Address: %s\n", iface.HardwareAddr)
		fmt.Printf("  Flags: %s\n", iface.Flags)

		// Get addresses for this interface
		addrs, err := iface.Addrs()
		if err != nil {
			continue
		}

		for _, addr := range addrs {
			fmt.Printf("  Address: %s\n", addr)
		}

		// Check interface status
		if iface.Flags&net.FlagUp != 0 {
			fmt.Println("  Status: UP")
		}
		if iface.Flags&net.FlagLoopback != 0 {
			fmt.Println("  Type: Loopback")
		}
	}
}
```
TCP Server and Client
TCP (Transmission Control Protocol) is the workhorse of the internet. When you load a webpage, send an email, or transfer a file, TCP is usually involved. It provides reliable, ordered, and error-checked delivery of data between applications.
Why TCP Matters: Unlike UDP, TCP guarantees that your data arrives intact and in order. It handles packet loss, retransmission, flow control, and congestion control automatically. This makes TCP perfect for applications where data integrity is crucial: databases, file transfers, HTTP, SSH, and most network protocols.
The Three-Way Handshake: Every TCP connection begins with a three-way handshake (SYN, SYN-ACK, ACK). Go's net.Dial handles this automatically, but understanding it helps you debug connection issues and set appropriate timeouts.
Now let's move from theory to practice. TCP is like registered mail—it guarantees delivery, maintains order, and ensures data integrity. This reliability makes TCP perfect for most applications where data must arrive correctly and in order.
Think of a TCP connection as a phone conversation—you establish a connection, exchange messages back and forth, and then hang up. The connection ensures that messages arrive in the order they were sent and that none are lost along the way.
Basic TCP Server
TCP Connection Lifecycle: Understanding the lifecycle of a TCP connection helps you write better network code. A connection goes through several states: CLOSED → SYN_SENT → ESTABLISHED → FIN_WAIT → TIME_WAIT → CLOSED. Go abstracts this complexity, but knowing it helps with debugging.
Socket Basics: At the OS level, sockets are file descriptors. When you call net.Listen, Go creates a socket, binds it to an address, and listens for connections. Each accepted connection gets its own socket. This multiplexing is handled by the OS kernel, making Go servers highly efficient.
Concurrent Client Handling: The power of Go's networking comes from goroutines. Each client can have its own goroutine without the overhead of threads. This enables servers to handle thousands or even millions of concurrent connections—something that would be impractical with traditional thread-per-client models.
Let's start with a simple echo server that demonstrates the fundamental TCP concepts. This server listens for connections, receives messages, and echoes them back in uppercase.
```go
package main

import (
	"bufio"
	"fmt"
	"net"
	"strings"
	"time"
)

// run
func main() {
	// Start server in background
	go startEchoServer()

	// Give server time to start
	time.Sleep(100 * time.Millisecond)

	// Test the server
	testClient()
}

func startEchoServer() {
	listener, err := net.Listen("tcp", "localhost:8080")
	if err != nil {
		fmt.Println("Server error:", err)
		return
	}
	defer listener.Close()

	fmt.Println("Echo server listening on localhost:8080")

	// Accept one connection for demo
	conn, err := listener.Accept()
	if err != nil {
		fmt.Println("Accept error:", err)
		return
	}

	handleConnection(conn)
}

func handleConnection(conn net.Conn) {
	defer conn.Close()

	fmt.Printf("Client connected: %s\n", conn.RemoteAddr())

	scanner := bufio.NewScanner(conn)
	for scanner.Scan() {
		msg := scanner.Text()
		fmt.Printf("Received: %s\n", msg)

		// Echo back
		response := strings.ToUpper(msg) + "\n"
		conn.Write([]byte(response))

		if msg == "quit" {
			break
		}
	}

	fmt.Println("Client disconnected")
}

func testClient() {
	conn, err := net.Dial("tcp", "localhost:8080")
	if err != nil {
		fmt.Println("Client error:", err)
		return
	}
	defer conn.Close()

	// One reader for the whole connection: creating a fresh bufio.Reader
	// per message can silently drop data it has already buffered
	reader := bufio.NewReader(conn)

	messages := []string{"hello", "world", "quit"}

	for _, msg := range messages {
		fmt.Printf("Sending: %s\n", msg)
		fmt.Fprintf(conn, "%s\n", msg)

		response, _ := reader.ReadString('\n')
		fmt.Printf("Response: %s", response)
	}
}
```
TCP State Machine and Connection Management
Understanding TCP's internal state machine helps you write more robust network code and debug connection issues effectively.
The Eleven States: TCP connections transition through states: CLOSED, LISTEN, SYN-SENT, SYN-RECEIVED, ESTABLISHED, FIN-WAIT-1, FIN-WAIT-2, CLOSE-WAIT, CLOSING, LAST-ACK, TIME-WAIT. Each state has specific meanings and transitions.
ESTABLISHED State: This is the normal operating state where data can be transferred bidirectionally. Most of a connection's lifetime is spent here. Both sides can send and receive data freely.
TIME-WAIT State: After actively closing a connection, it lingers in TIME-WAIT for twice the Maximum Segment Lifetime (2*MSL, typically one to four minutes in total). This prevents old duplicate packets from interfering with new connections using the same port. High-traffic servers can exhaust ports due to TIME-WAIT connections—this is why connection pooling matters.
Half-Close: TCP supports half-close—one direction closed while the other remains open. Go's net.TCPConn has CloseRead() and CloseWrite() for this. Half-close is useful when you've finished sending but want to receive acknowledgments or remaining data.
TCP Options: Beyond basic headers, TCP supports options for features like: window scaling (for high-bandwidth connections), timestamps (for RTT measurement), selective acknowledgment (SACK) for efficient retransmission, and maximum segment size (MSS) negotiation.
TCP Flow Control and Congestion Control
Flow Control: Prevents fast senders from overwhelming slow receivers. TCP uses a sliding window—the receiver advertises how much buffer space it has, and the sender won't exceed this. Go handles this automatically, but understanding it explains why sending can block.
Congestion Control: Prevents senders from overwhelming the network. TCP uses algorithms like Reno, Cubic, and BBR to detect congestion (via packet loss or RTT increases) and reduce send rate accordingly. This is why network throughput varies—TCP constantly adapts to conditions.
Nagle's Algorithm: Combines small writes into larger packets to reduce overhead. Can add latency for interactive applications. Disable with SetNoDelay(true) when you need immediate transmission. SSH and Telnet disable Nagle for responsive interaction.
TCP Fast Open: Extension allowing data in the initial SYN packet, reducing handshake latency by one RTT. Requires OS and remote support. Go's standard library doesn't expose TFO directly, but the OS may use it transparently.
Network Security Best Practices
Network security is critical for production applications. Even if your application doesn't handle sensitive data directly, it's part of a larger system that might.
Input Validation: Never trust network input. Validate sizes, formats, and content. A malicious client could send gigabyte messages to exhaust memory, or crafted data to exploit parsing bugs.
Rate Limiting: Protect against DoS attacks by limiting requests per IP. Use token buckets or sliding windows. Go's golang.org/x/time/rate package provides excellent primitives.
TLS Everywhere: Use TLS (HTTPS) for all external communication. Let's Encrypt provides free certificates. Go's crypto/tls package makes TLS straightforward—often just swapping net.Dial for tls.Dial and net.Listen for tls.Listen.
Certificate Validation: Always validate TLS certificates. Don't skip verification in production (a common debugging hack that becomes a security hole). Pin certificates for critical services.
Timeout All Operations: Attackers can mount slowloris-style attacks—opening connections and sending data very slowly to exhaust resources. Timeouts protect against this. Set read/write deadlines on all connections.
Resource Limits: Limit concurrent connections, request sizes, and request rates. Monitor resource usage. Implement graceful degradation when limits are approached.
Network Testing Strategies
Unit Testing with net.Pipe: For testing protocol implementations without an actual network, use net.Pipe(). It creates connected in-memory connections perfect for testing.
Loopback Testing: Tests can listen on localhost and connect to themselves. This exercises the real network stack but avoids external dependencies.
Table-Driven Tests: Network code has many edge cases—test them systematically with table-driven tests covering normal operation, timeouts, connection loss, malformed data, large messages, and concurrent operations.
Race Detection: Network code is inherently concurrent. Always run tests with the race detector (go test -race). It catches subtle concurrency bugs that might only appear under load.
Load Testing: Functional tests aren't enough—test under load to find bottlenecks, memory leaks, and race conditions that only appear with many concurrent connections.
Network Monitoring and Observability
Metrics to Track: Request rate (requests/second), error rate (errors/second), latency (p50, p90, p99), connection count (active connections), throughput (bytes/second).
Structured Logging: Log connection establishment, errors, and slow requests. Use structured logging (JSON) for easy parsing. Include request IDs for tracing.
Distributed Tracing: For microservices, implement tracing (OpenTelemetry, Zipkin, Jaeger). Traces show request flow through services, identifying bottlenecks.
Health Checks: Expose health check endpoints. Kubernetes and load balancers use these to route traffic only to healthy instances.
Connection Metrics: Track connection pool statistics—active connections, wait time, creation rate. These reveal bottlenecks and sizing issues.
Concurrent TCP Server
A real-world TCP server needs to handle multiple clients simultaneously. Think of this as a receptionist handling multiple phone calls—you need a system to manage multiple conversations without mixing them up. In Go, goroutines make this surprisingly elegant.
A production TCP server handles multiple connections concurrently:
```go
package main

import (
	"bufio"
	"context"
	"fmt"
	"net"
	"sync"
	"sync/atomic"
	"time"
)

// run
func main() {
	server := NewTCPServer("localhost:9090")

	ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
	defer cancel()

	// Run server
	go func() {
		if err := server.Start(ctx); err != nil {
			fmt.Println("Server error:", err)
		}
	}()

	// Give server time to start
	time.Sleep(100 * time.Millisecond)

	// Simulate multiple clients
	var wg sync.WaitGroup
	for i := 0; i < 3; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			simulateClient(id)
		}(i)
	}

	wg.Wait()
	time.Sleep(100 * time.Millisecond)

	stats := server.Stats()
	fmt.Printf("\nServer Statistics:\n")
	fmt.Printf("  Total Connections: %d\n", stats.TotalConnections)
	fmt.Printf("  Active Connections: %d\n", stats.ActiveConnections)
	fmt.Printf("  Messages Processed: %d\n", stats.MessagesProcessed)
}

type TCPServer struct {
	addr              string
	listener          net.Listener
	wg                sync.WaitGroup
	totalConnections  atomic.Int64
	activeConnections atomic.Int64
	messagesProcessed atomic.Int64
}

type ServerStats struct {
	TotalConnections  int64
	ActiveConnections int64
	MessagesProcessed int64
}

func NewTCPServer(addr string) *TCPServer {
	return &TCPServer{addr: addr}
}

func (s *TCPServer) Start(ctx context.Context) error {
	var err error
	s.listener, err = net.Listen("tcp", s.addr)
	if err != nil {
		return err
	}
	defer s.listener.Close()

	fmt.Printf("TCP Server started on %s\n", s.addr)

	// Close the listener when the context is cancelled to unblock Accept
	go func() {
		<-ctx.Done()
		s.listener.Close()
	}()

	for {
		conn, err := s.listener.Accept()
		if err != nil {
			// Stop if shutdown was requested; otherwise log and keep accepting
			select {
			case <-ctx.Done():
			default:
				fmt.Println("Accept error:", err)
				continue
			}
			break
		}

		s.totalConnections.Add(1)
		s.activeConnections.Add(1)

		s.wg.Add(1)
		go s.handleConnection(conn)
	}

	// Wait for all connections to finish
	s.wg.Wait()
	return nil
}

func (s *TCPServer) handleConnection(conn net.Conn) {
	defer s.wg.Done()
	defer conn.Close()
	defer s.activeConnections.Add(-1)

	// Set read timeout
	conn.SetReadDeadline(time.Now().Add(5 * time.Second))

	scanner := bufio.NewScanner(conn)
	for scanner.Scan() {
		s.messagesProcessed.Add(1)
		msg := scanner.Text()

		// Process message
		response := fmt.Sprintf("Processed: %s\n", msg)
		conn.Write([]byte(response))

		// Reset deadline
		conn.SetReadDeadline(time.Now().Add(5 * time.Second))
	}
}

func (s *TCPServer) Stats() ServerStats {
	return ServerStats{
		TotalConnections:  s.totalConnections.Load(),
		ActiveConnections: s.activeConnections.Load(),
		MessagesProcessed: s.messagesProcessed.Load(),
	}
}

func simulateClient(id int) {
	conn, err := net.Dial("tcp", "localhost:9090")
	if err != nil {
		fmt.Printf("Client %d error: %v\n", id, err)
		return
	}
	defer conn.Close()

	msg := fmt.Sprintf("Message from client %d", id)
	fmt.Fprintf(conn, "%s\n", msg)

	response, _ := bufio.NewReader(conn).ReadString('\n')
	fmt.Printf("Client %d received: %s", id, response)
}
```
TCP Client with Retry Logic
Network connections are unreliable—servers go down, networks fail, and timeouts occur. A production client needs to be resilient, like a determined delivery driver who tries multiple routes and times to deliver a package.
⚠️ Important: Network failures are the rule, not the exception. Always implement retry logic with exponential backoff to handle temporary failures gracefully.
Production clients need robust error handling and retry mechanisms:
```go
package main

import (
	"fmt"
	"net"
	"time"
)

// run
func main() {
	client := NewTCPClient(RetryConfig{
		MaxRetries:  3,
		InitialWait: 100 * time.Millisecond,
		MaxWait:     2 * time.Second,
		Multiplier:  2.0,
	})

	// Try to connect to a server
	conn, err := client.Connect("localhost:8888")
	if err != nil {
		fmt.Printf("Failed to connect: %v\n", err)
	} else {
		fmt.Println("Connected successfully")
		conn.Close()
	}
}

type RetryConfig struct {
	MaxRetries  int
	InitialWait time.Duration
	MaxWait     time.Duration
	Multiplier  float64
}

type TCPClient struct {
	config RetryConfig
}

func NewTCPClient(config RetryConfig) *TCPClient {
	return &TCPClient{config: config}
}

func (c *TCPClient) Connect(addr string) (net.Conn, error) {
	var conn net.Conn
	var err error

	wait := c.config.InitialWait

	for attempt := 0; attempt <= c.config.MaxRetries; attempt++ {
		if attempt > 0 {
			fmt.Printf("Retry attempt %d/%d (waiting %s)...\n",
				attempt, c.config.MaxRetries, wait)
			time.Sleep(wait)

			// Exponential backoff, capped at MaxWait
			wait = time.Duration(float64(wait) * c.config.Multiplier)
			if wait > c.config.MaxWait {
				wait = c.config.MaxWait
			}
		}

		fmt.Printf("Connecting to %s...\n", addr)
		conn, err = net.DialTimeout("tcp", addr, 2*time.Second)
		if err == nil {
			return conn, nil
		}

		fmt.Printf("Connection failed: %v\n", err)
	}

	return nil, fmt.Errorf("failed after %d attempts: %w", c.config.MaxRetries+1, err)
}

func (c *TCPClient) ConnectWithTimeout(addr string, timeout time.Duration) (net.Conn, error) {
	var d net.Dialer
	d.Timeout = timeout

	return d.Dial("tcp", addr)
}
```
Binary Protocol TCP Server
While text protocols are human-readable, binary protocols are more efficient. Think of it as the difference between writing a letter in plain English and using a compact code—binary protocols trade readability for a denser, faster-to-parse representation.
Real-world Example
Database protocols like PostgreSQL and MySQL use binary protocols for efficiency. Redis, the popular in-memory database, uses RESP, a simple length-prefixed wire protocol that's both fast and easy to implement.
Example of a TCP server using a binary protocol:
```go
package main

import (
	"encoding/binary"
	"fmt"
	"io"
	"net"
	"time"
)

// run
func main() {
	// Start binary protocol server
	go startBinaryServer()

	time.Sleep(100 * time.Millisecond)

	// Test client
	testBinaryClient()
}

// Message format:
// [4 bytes: length][1 byte: type][N bytes: payload]
type Message struct {
	Type    byte
	Payload []byte
}

const (
	MsgTypeEcho    byte = 1
	MsgTypeReverse byte = 2
	MsgTypeQuit    byte = 3
)

func startBinaryServer() {
	listener, err := net.Listen("tcp", "localhost:7070")
	if err != nil {
		fmt.Println("Server error:", err)
		return
	}
	defer listener.Close()

	fmt.Println("Binary protocol server listening on localhost:7070")

	conn, err := listener.Accept()
	if err != nil {
		return
	}

	handleBinaryConnection(conn)
}

func handleBinaryConnection(conn net.Conn) {
	defer conn.Close()

	for {
		msg, err := readMessage(conn)
		if err != nil {
			if err != io.EOF {
				fmt.Println("Read error:", err)
			}
			return
		}

		fmt.Printf("Received message type %d: %s\n", msg.Type, string(msg.Payload))

		var response Message
		switch msg.Type {
		case MsgTypeEcho:
			response = Message{Type: MsgTypeEcho, Payload: msg.Payload}
		case MsgTypeReverse:
			reversed := reverseBytes(msg.Payload)
			response = Message{Type: MsgTypeReverse, Payload: reversed}
		case MsgTypeQuit:
			fmt.Println("Quit message received")
			return
		default:
			fmt.Printf("Unknown message type: %d\n", msg.Type)
			continue
		}

		if err := writeMessage(conn, response); err != nil {
			fmt.Println("Write error:", err)
			return
		}
	}
}

func readMessage(conn net.Conn) (Message, error) {
	// Read length
	var length uint32
	if err := binary.Read(conn, binary.BigEndian, &length); err != nil {
		return Message{}, err
	}

	// Never trust a length from the network: cap it before allocating
	if length > 1<<20 {
		return Message{}, fmt.Errorf("message too large: %d bytes", length)
	}

	// Read type
	var msgType byte
	if err := binary.Read(conn, binary.BigEndian, &msgType); err != nil {
		return Message{}, err
	}

	// Read payload
	payload := make([]byte, length)
	if _, err := io.ReadFull(conn, payload); err != nil {
		return Message{}, err
	}

	return Message{Type: msgType, Payload: payload}, nil
}

func writeMessage(conn net.Conn, msg Message) error {
	// Write length
	length := uint32(len(msg.Payload))
	if err := binary.Write(conn, binary.BigEndian, length); err != nil {
		return err
	}

	// Write type
	if err := binary.Write(conn, binary.BigEndian, msg.Type); err != nil {
		return err
	}

	// Write payload
	_, err := conn.Write(msg.Payload)
	return err
}

func reverseBytes(b []byte) []byte {
	result := make([]byte, len(b))
	for i := 0; i < len(b); i++ {
		result[i] = b[len(b)-1-i]
	}
	return result
}

func testBinaryClient() {
	conn, err := net.Dial("tcp", "localhost:7070")
	if err != nil {
		fmt.Println("Client error:", err)
		return
	}
	defer conn.Close()

	// Send echo message
	writeMessage(conn, Message{Type: MsgTypeEcho, Payload: []byte("Hello")})
	response, _ := readMessage(conn)
	fmt.Printf("Echo response: %s\n", string(response.Payload))

	// Send reverse message
	writeMessage(conn, Message{Type: MsgTypeReverse, Payload: []byte("World")})
	response, _ = readMessage(conn)
	fmt.Printf("Reverse response: %s\n", string(response.Payload))

	// Send quit
	writeMessage(conn, Message{Type: MsgTypeQuit, Payload: nil})
}
```
UDP Communication
When Speed Trumps Reliability: UDP (User Datagram Protocol) is connectionless—like sending a postcard instead of registered mail. You send data and hope it arrives, but you get no confirmation and no guarantees about order or delivery. This sounds risky, but for certain applications, it's exactly what you need.
Use Cases for UDP: Real-time video streaming can tolerate occasional frame loss but can't tolerate delays from retransmission. Online gaming needs low latency more than perfect reliability—a dropped position update isn't worth the delay of retransmission. DNS queries are simple request-response pairs where UDP's overhead savings matter.
UDP's Hidden Power: Without connection state, a single UDP server can efficiently handle millions of clients. Without retransmission logic, UDP has minimal latency. For broadcast and multicast scenarios, UDP is often the only practical choice.
If TCP is like a phone conversation, UDP is like sending postcards—fast, lightweight, but with no guarantee of delivery or order. UDP is connectionless and provides no guarantees about delivery, ordering, or duplicate protection. It's ideal for applications where low latency is more important than reliability.
When to Use UDP vs TCP
- UDP: Gaming, video streaming, voice calls, DNS lookups
- TCP: Web browsing, email, file transfers
Basic UDP Server and Client
UDP Performance Characteristics: UDP has lower latency than TCP because there's no connection setup and no acknowledgment overhead. The first byte of UDP data can be on the wire immediately, while TCP pays a full round trip for the handshake before any application data flows.
UDP Packet Size Considerations: UDP packets larger than the MTU (typically 1500 bytes for Ethernet) get fragmented at the IP layer. Fragmentation can cause problems: if any fragment is lost, the entire packet is lost. For maximum reliability, keep UDP packets under 1472 bytes (1500 - 20 IP header - 8 UDP header).
MTU Discovery: The Path MTU (PMTU) can vary across networks. While TCP handles this automatically, UDP applications may need to implement their own MTU discovery or use conservative packet sizes. For internet applications, 512 bytes is safe; 1024 bytes is usually safe; 1472 bytes is optimal for most networks.
Let's explore UDP with a simple echo server. Notice how much simpler the code is compared to TCP—there's no need to maintain connections or handle the complexity of reliable delivery.
```go
package main

import (
	"fmt"
	"net"
	"time"
)

// run
func main() {
	// Start UDP server
	go startUDPServer()

	time.Sleep(100 * time.Millisecond)

	// Test UDP client
	testUDPClient()
}

func startUDPServer() {
	addr, err := net.ResolveUDPAddr("udp", "localhost:6060")
	if err != nil {
		fmt.Println("Resolve error:", err)
		return
	}

	conn, err := net.ListenUDP("udp", addr)
	if err != nil {
		fmt.Println("Listen error:", err)
		return
	}
	defer conn.Close()

	fmt.Println("UDP server listening on localhost:6060")

	buffer := make([]byte, 1024)

	// Handle a few packets for demo
	for i := 0; i < 5; i++ {
		n, clientAddr, err := conn.ReadFromUDP(buffer)
		if err != nil {
			fmt.Println("Read error:", err)
			continue
		}

		msg := string(buffer[:n])
		fmt.Printf("Received from %s: %s\n", clientAddr, msg)

		// Echo back
		response := fmt.Sprintf("Echo: %s", msg)
		conn.WriteToUDP([]byte(response), clientAddr)
	}
}

func testUDPClient() {
	addr, err := net.ResolveUDPAddr("udp", "localhost:6060")
	if err != nil {
		fmt.Println("Resolve error:", err)
		return
	}

	conn, err := net.DialUDP("udp", nil, addr)
	if err != nil {
		fmt.Println("Dial error:", err)
		return
	}
	defer conn.Close()

	messages := []string{"Hello", "UDP", "World"}

	for _, msg := range messages {
		fmt.Printf("Sending: %s\n", msg)
		conn.Write([]byte(msg))

		// Read response
		buffer := make([]byte, 1024)
		conn.SetReadDeadline(time.Now().Add(1 * time.Second))
		n, err := conn.Read(buffer)
		if err != nil {
			fmt.Println("Read error:", err)
			continue
		}

		fmt.Printf("Response: %s\n", string(buffer[:n]))
	}
}
```
UDP Multicast
Multicast is like a radio broadcast—one sender transmits, and multiple receivers can tune in to listen. This is incredibly efficient for applications like stock quote feeds, video streaming, or service discovery where you need to send the same data to many recipients.
Real-world Example
Video conferencing apps often use multicast for screen sharing. The host's screen data is sent once, and all participants receive it simultaneously, reducing bandwidth usage compared to sending individual streams to each participant.
UDP supports multicast for sending data to multiple receivers:
package main

import (
	"fmt"
	"net"
	"time"
)

// run
func main() {
	// Start multicast listeners
	go startMulticastListener("Listener-1")
	go startMulticastListener("Listener-2")

	time.Sleep(200 * time.Millisecond)

	// Send multicast messages
	sendMulticast()
}

const (
	multicastAddr = "224.0.0.1:9999"
)

func startMulticastListener(name string) {
	addr, err := net.ResolveUDPAddr("udp", multicastAddr)
	if err != nil {
		fmt.Printf("%s: Resolve error: %v\n", name, err)
		return
	}

	conn, err := net.ListenMulticastUDP("udp", nil, addr)
	if err != nil {
		fmt.Printf("%s: Listen error: %v\n", name, err)
		return
	}
	defer conn.Close()

	fmt.Printf("%s: Listening on %s\n", name, multicastAddr)

	buffer := make([]byte, 1024)
	conn.SetReadDeadline(time.Now().Add(2 * time.Second))

	for {
		n, src, err := conn.ReadFromUDP(buffer)
		if err != nil {
			break
		}

		msg := string(buffer[:n])
		fmt.Printf("%s: Received from %s: %s\n", name, src, msg)
	}
}

func sendMulticast() {
	addr, err := net.ResolveUDPAddr("udp", multicastAddr)
	if err != nil {
		fmt.Println("Resolve error:", err)
		return
	}

	conn, err := net.DialUDP("udp", nil, addr)
	if err != nil {
		fmt.Println("Dial error:", err)
		return
	}
	defer conn.Close()

	messages := []string{"Multicast Message 1", "Multicast Message 2"}

	for _, msg := range messages {
		fmt.Printf("Broadcasting: %s\n", msg)
		conn.Write([]byte(msg))
		time.Sleep(100 * time.Millisecond)
	}
}
Reliable UDP with Acknowledgments
Sometimes you need the speed of UDP with the reliability of TCP. This is like certified mail—you send a postcard but require a return receipt to confirm delivery. By adding acknowledgments, we can build reliability on top of UDP's lightweight foundation.
⚠️ Important: Building reliable protocols is complex! Before implementing your own, consider whether TCP meets your needs. Only implement custom reliability when you have very specific requirements.
Implementing reliability on top of UDP:
package main

import (
	"encoding/binary"
	"fmt"
	"net"
	"sync"
	"time"
)

// run
func main() {
	// Start reliable UDP server
	go startReliableUDPServer()

	time.Sleep(100 * time.Millisecond)

	// Test reliable client
	testReliableUDPClient()
}

type ReliablePacket struct {
	SeqNum  uint32
	Ack     bool
	Payload []byte
}

func encodePacket(pkt ReliablePacket) []byte {
	buf := make([]byte, 5+len(pkt.Payload))
	binary.BigEndian.PutUint32(buf[0:4], pkt.SeqNum)
	if pkt.Ack {
		buf[4] = 1
	} else {
		buf[4] = 0
	}
	copy(buf[5:], pkt.Payload)
	return buf
}

func decodePacket(data []byte) ReliablePacket {
	if len(data) < 5 {
		return ReliablePacket{}
	}

	return ReliablePacket{
		SeqNum:  binary.BigEndian.Uint32(data[0:4]),
		Ack:     data[4] == 1,
		Payload: data[5:],
	}
}

func startReliableUDPServer() {
	addr, _ := net.ResolveUDPAddr("udp", "localhost:5050")
	conn, err := net.ListenUDP("udp", addr)
	if err != nil {
		fmt.Println("Server error:", err)
		return
	}
	defer conn.Close()

	fmt.Println("Reliable UDP server listening on localhost:5050")

	buffer := make([]byte, 1024)

	for i := 0; i < 3; i++ {
		n, clientAddr, err := conn.ReadFromUDP(buffer)
		if err != nil {
			continue
		}

		pkt := decodePacket(buffer[:n])
		if pkt.Ack {
			continue // Skip ACK packets
		}

		fmt.Printf("Received packet #%d: %s\n", pkt.SeqNum, string(pkt.Payload))

		// Send ACK
		ack := ReliablePacket{SeqNum: pkt.SeqNum, Ack: true}
		conn.WriteToUDP(encodePacket(ack), clientAddr)
	}
}

func testReliableUDPClient() {
	addr, _ := net.ResolveUDPAddr("udp", "localhost:5050")
	conn, err := net.DialUDP("udp", nil, addr)
	if err != nil {
		fmt.Println("Client error:", err)
		return
	}
	defer conn.Close()

	sender := &ReliableUDPSender{
		conn:       conn,
		timeout:    500 * time.Millisecond,
		maxRetries: 3,
	}

	messages := []string{"Packet 1", "Packet 2", "Packet 3"}

	for i, msg := range messages {
		if err := sender.Send(uint32(i+1), []byte(msg)); err != nil {
			fmt.Printf("Failed to send: %v\n", err)
		} else {
			fmt.Printf("Successfully sent and acknowledged: %s\n", msg)
		}
	}
}

type ReliableUDPSender struct {
	conn       *net.UDPConn
	timeout    time.Duration
	maxRetries int
	mu         sync.Mutex
}

func (s *ReliableUDPSender) Send(seqNum uint32, payload []byte) error {
	s.mu.Lock()
	defer s.mu.Unlock()

	pkt := ReliablePacket{SeqNum: seqNum, Ack: false, Payload: payload}
	data := encodePacket(pkt)

	for attempt := 0; attempt < s.maxRetries; attempt++ {
		// Send packet
		_, err := s.conn.Write(data)
		if err != nil {
			return err
		}

		// Wait for ACK
		s.conn.SetReadDeadline(time.Now().Add(s.timeout))
		buffer := make([]byte, 1024)
		n, err := s.conn.Read(buffer)

		if err == nil {
			ack := decodePacket(buffer[:n])
			if ack.Ack && ack.SeqNum == seqNum {
				return nil // Success
			}
		}

		fmt.Printf("Retry attempt %d for packet #%d\n", attempt+1, seqNum)
	}

	return fmt.Errorf("failed to receive ACK after %d attempts", s.maxRetries)
}
UDP Packet Loss Handling
UDP's unreliability means packets can be lost, duplicated, or reordered. Production UDP applications need strategies to handle these issues.
Common Strategies:
- Sequence Numbers: Add sequence numbers to detect loss and reordering
- Acknowledgments: Receivers confirm receipt, sender retransmits if no ACK
- Timeout and Retry: Retransmit after timeout
- Redundancy: Send critical data multiple times
- Forward Error Correction: Add redundant data for reconstruction
When to Use Each Strategy: Low-latency applications (gaming) prefer redundancy over retransmission. Critical data transfer uses ACKs and retransmission. Streaming media uses forward error correction to avoid retransmission latency.
UDP Socket Options
UDP sockets support various options for controlling behavior. Understanding these options helps you tune performance and reliability.
Buffer Sizes: OS buffers hold received packets. Too small and packets drop during burst traffic; too large and you waste memory. Default is usually 64KB-256KB.
Broadcast and Multicast: UDP supports sending to multiple recipients simultaneously—impossible with TCP. Broadcast reaches all hosts on local network; multicast reaches hosts that join specific groups.
IPv6 Considerations: IPv6 eliminates broadcast entirely (use multicast instead). Multicast addresses have specific scopes: link-local multicast reaches the local network, site-local reaches the organization, and global multicast reaches the entire internet.
DNS Lookups and Custom Resolvers
The Internet's Phone Book: DNS (Domain Name System) translates human-readable domain names into IP addresses. When you type golang.org into your browser, DNS resolution happens behind the scenes, converting that name into an IP address like 142.250.191.78.
Beyond Simple Lookups: DNS provides more than just A/AAAA records (IPv4/IPv6 addresses). MX records direct email, TXT records store arbitrary data (often used for verification and configuration), SRV records enable service discovery, and CNAME records create aliases.
Performance Implications: DNS lookups can be slow—often 20-100ms for an uncached query. For high-performance applications, DNS caching is essential. Many network failures are actually DNS failures: timeouts, NXDOMAIN errors, or incorrect responses.
Without DNS, we'd have to remember raw IP addresses like 142.250.191.78 instead of names like google.com.
Go's net package provides comprehensive DNS resolution capabilities that go far beyond simple hostname lookups.
Basic DNS Lookups
DNS Resolution Process: When you call net.LookupIP, Go follows a complex process: check /etc/hosts, query local DNS resolver, resolver may cache or forward to authoritative servers, response travels back through the chain. This can involve multiple servers and take 20-200ms.
DNS Caching Strategies: Go's default resolver uses the system's DNS settings, including its cache. For high-performance applications, implement application-level caching with TTL respect. Balance cache freshness against lookup cost—typical TTLs range from 60 seconds to 24 hours.
DNS Security Considerations: DNS is vulnerable to spoofing and cache poisoning. DNSSEC provides cryptographic verification but isn't universally deployed. For critical applications, validate DNS responses against expected values and use HTTPS to establish secure connections even if DNS is compromised.
Let's start with the fundamentals. DNS can do much more than just look up IP addresses—it can find mail servers, text records, and even help with service discovery.
package main

import (
	"fmt"
	"net"
)

// run
func main() {
	// Simple hostname lookup
	ips, err := net.LookupIP("golang.org")
	if err != nil {
		fmt.Println("Lookup error:", err)
		return
	}

	fmt.Println("golang.org resolves to:")
	for _, ip := range ips {
		fmt.Printf("  %s\n", ip)
	}

	// Reverse DNS lookup
	names, err := net.LookupAddr("8.8.8.8")
	if err != nil {
		fmt.Println("Reverse lookup error:", err)
	} else {
		fmt.Println("\n8.8.8.8 reverse lookup:")
		for _, name := range names {
			fmt.Printf("  %s\n", name)
		}
	}

	// MX records
	mxRecords, err := net.LookupMX("gmail.com")
	if err != nil {
		fmt.Println("MX lookup error:", err)
	} else {
		fmt.Println("\nMX records for gmail.com:")
		for _, mx := range mxRecords {
			fmt.Printf("  %s (preference %d)\n", mx.Host, mx.Pref)
		}
	}

	// TXT records
	txtRecords, err := net.LookupTXT("google.com")
	if err != nil {
		fmt.Println("TXT lookup error:", err)
	} else {
		fmt.Println("\nTXT records for google.com:")
		for _, txt := range txtRecords {
			fmt.Printf("  %s\n", txt)
		}
	}
}
DNS Lookup with Timeout
Network operations can hang indefinitely if you're not careful. Think of this as setting a deadline for a phone call—if the other party doesn't answer within a reasonable time, you hang up and try again or move on. DNS timeouts are crucial for building responsive applications.
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// run
func main() {
	// Create resolver with timeout
	resolver := &net.Resolver{
		PreferGo: true,
		Dial: func(ctx context.Context, network, address string) (net.Conn, error) {
			d := net.Dialer{
				Timeout: time.Second * 2,
			}
			return d.DialContext(ctx, network, address)
		},
	}

	// Lookup with context timeout
	ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
	defer cancel()

	start := time.Now()
	ips, err := resolver.LookupIP(ctx, "ip", "golang.org")
	elapsed := time.Since(start)

	if err != nil {
		fmt.Printf("Lookup error: %v\n", err)
		return
	}

	fmt.Printf("Lookup completed in %s\n", elapsed)
	fmt.Println("Results:")
	for _, ip := range ips {
		fmt.Printf("  %s\n", ip)
	}
}
Custom DNS Resolver
DNS lookups can be slow, especially when you're repeatedly looking up the same domains. This is like keeping a phone book on your desk instead of looking up numbers online every time. A caching resolver dramatically improves performance by storing recent lookups.
Real-world Example
Content Delivery Networks like Cloudflare heavily optimize DNS caching. When you visit a popular website, the DNS response might be cached at multiple levels, making subsequent visits nearly instant.
Implementing a custom DNS resolver with caching:
package main

import (
	"context"
	"fmt"
	"net"
	"sync"
	"time"
)

// run
func main() {
	resolver := NewCachingResolver(5 * time.Minute)

	// First lookup (cache miss - hits the network)
	start := time.Now()
	ips1, err := resolver.LookupIP(context.Background(), "golang.org")
	elapsed1 := time.Since(start)

	if err != nil {
		fmt.Println("Error:", err)
		return
	}

	fmt.Printf("First lookup took %s\n", elapsed1)
	fmt.Println("IPs:", ips1)

	// Second lookup (served from cache)
	start = time.Now()
	ips2, err := resolver.LookupIP(context.Background(), "golang.org")
	elapsed2 := time.Since(start)

	fmt.Printf("\nSecond lookup took %s\n", elapsed2)
	fmt.Println("IPs:", ips2)

	// Show cache stats
	fmt.Printf("\nCache Stats: %d entries\n", resolver.CacheSize())
}

type CachingResolver struct {
	cache    map[string]*cacheEntry
	cacheMu  sync.RWMutex
	ttl      time.Duration
	resolver *net.Resolver
}

type cacheEntry struct {
	ips       []net.IP
	timestamp time.Time
}

func NewCachingResolver(ttl time.Duration) *CachingResolver {
	return &CachingResolver{
		cache: make(map[string]*cacheEntry),
		ttl:   ttl,
		resolver: &net.Resolver{
			PreferGo: true,
		},
	}
}

func (r *CachingResolver) LookupIP(ctx context.Context, host string) ([]net.IP, error) {
	// Check cache first
	r.cacheMu.RLock()
	if entry, exists := r.cache[host]; exists {
		if time.Since(entry.timestamp) < r.ttl {
			r.cacheMu.RUnlock()
			return entry.ips, nil
		}
	}
	r.cacheMu.RUnlock()

	// Cache miss or expired - perform lookup
	ips, err := r.resolver.LookupIP(ctx, "ip", host)
	if err != nil {
		return nil, err
	}

	// Update cache
	r.cacheMu.Lock()
	r.cache[host] = &cacheEntry{
		ips:       ips,
		timestamp: time.Now(),
	}
	r.cacheMu.Unlock()

	return ips, nil
}

func (r *CachingResolver) CacheSize() int {
	r.cacheMu.RLock()
	defer r.cacheMu.RUnlock()
	return len(r.cache)
}

func (r *CachingResolver) ClearCache() {
	r.cacheMu.Lock()
	defer r.cacheMu.Unlock()
	r.cache = make(map[string]*cacheEntry)
}
DNS-Based Service Discovery
In microservices architectures, services need to find each other dynamically. DNS SRV records are like automated service directories—they tell you not just where a service is, but also which instances are healthy and how to reach them.
Real-world Example
Kubernetes uses DNS for service discovery internally. When you have a service named "user-service" running on 3 pods, Kubernetes creates DNS records that automatically load balance across healthy pods. Other services can simply connect to "user-service.default.svc.cluster.local" without worrying about individual pod IPs.
Using DNS SRV records for service discovery:
package main

import (
	"fmt"
	"net"
)

// run
func main() {
	// SRV lookup format: _service._proto.name
	// Example: _xmpp-server._tcp.gmail.com

	cname, addrs, err := net.LookupSRV("xmpp-server", "tcp", "gmail.com")
	if err != nil {
		fmt.Println("SRV lookup error:", err)
		return
	}

	fmt.Printf("Canonical name: %s\n\n", cname)
	fmt.Println("Service records:")

	for _, addr := range addrs {
		fmt.Printf("  Target:   %s\n", addr.Target)
		fmt.Printf("  Port:     %d\n", addr.Port)
		fmt.Printf("  Priority: %d\n", addr.Priority)
		fmt.Printf("  Weight:   %d\n\n", addr.Weight)
	}

	// Simulate service discovery
	if len(addrs) > 0 {
		serviceAddr := fmt.Sprintf("%s:%d", addrs[0].Target, addrs[0].Port)
		fmt.Printf("Connecting to service at: %s\n", serviceAddr)
	}
}
Connection Pooling and Timeouts
The Hidden Cost of Connections: Creating a new TCP connection involves DNS lookup, TCP handshake, and possibly TLS negotiation. This can take 100-300ms—an eternity for a modern application. Connection pooling reuses existing connections, reducing latency from 100ms to 1ms.
Timeout Strategies: Timeouts are your first line of defense against hanging applications. Without timeouts, a single slow service can cascade into complete system failure. Go provides multiple timeout mechanisms: dial timeouts, read/write deadlines, and context cancellation.
Connection Health: Not all idle connections are healthy. Network equipment might silently drop connections, remote servers might crash, and middleboxes might interfere. Production connection pools need health checks, automatic reconnection, and graceful degradation.
Now let's dive into production patterns that separate hobby projects from robust, scalable applications. Connection management is like managing a fleet of delivery trucks—you want to reuse them efficiently, maintain them properly, and replace them when they break down.
Efficient connection management is crucial for production applications.
Connection Pool Implementation
Pool Sizing: How many connections should you pool? Too few and you'll bottleneck; too many and you waste resources. A good starting point is 2-10 connections per target host, scaled based on your request rate and latency. Monitor pool utilization and adjust accordingly.
Connection Validation: Pooled connections can become stale—the remote end might close them, or network issues might make them unusable. Implement validation: try to use the connection, catch errors, and create a fresh connection if validation fails.
Pool Monitoring: Production pools need metrics: active connections, idle connections, wait time, creation rate, error rate. These metrics reveal bottlenecks (too small), waste (too large), and connection problems (high error rate).
Creating new connections is expensive—think of it like hiring and training new employees versus reusing experienced ones. A connection pool maintains a set of ready-to-use connections, dramatically improving performance for applications that make frequent network requests.
package main

import (
	"errors"
	"fmt"
	"net"
	"sync"
	"time"
)

// run
func main() {
	// Start a test server
	go startTestServer()
	time.Sleep(100 * time.Millisecond)

	// Create connection pool
	pool := NewConnPool("localhost:8888", 3, 10*time.Second)

	// Use connections
	var wg sync.WaitGroup
	for i := 0; i < 5; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()

			conn, err := pool.Get()
			if err != nil {
				fmt.Printf("Worker %d: Failed to get connection: %v\n", id, err)
				return
			}

			fmt.Printf("Worker %d: Got connection\n", id)

			// Use connection
			time.Sleep(100 * time.Millisecond)

			// Return to pool
			pool.Put(conn)
			fmt.Printf("Worker %d: Returned connection\n", id)
		}(i)
	}

	wg.Wait()

	// Show stats
	fmt.Printf("\nPool stats: %d active, %d idle\n",
		pool.ActiveCount(), pool.IdleCount())

	pool.Close()
}

type ConnPool struct {
	addr        string
	maxSize     int
	idleTimeout time.Duration
	connections chan *PooledConn
	active      int
	mu          sync.Mutex
	closed      bool
}

type PooledConn struct {
	net.Conn
	pool      *ConnPool
	createdAt time.Time
	lastUsed  time.Time
}

func NewConnPool(addr string, maxSize int, idleTimeout time.Duration) *ConnPool {
	pool := &ConnPool{
		addr:        addr,
		maxSize:     maxSize,
		idleTimeout: idleTimeout,
		connections: make(chan *PooledConn, maxSize),
	}

	// Start connection reaper
	go pool.reaper()

	return pool
}

func (p *ConnPool) Get() (*PooledConn, error) {
	p.mu.Lock()
	if p.closed {
		p.mu.Unlock()
		return nil, errors.New("pool is closed")
	}
	p.mu.Unlock()

	// Try to get from pool
	select {
	case conn := <-p.connections:
		// Check if connection is still valid
		if time.Since(conn.lastUsed) > p.idleTimeout {
			conn.Conn.Close()
			p.mu.Lock()
			p.active--
			p.mu.Unlock()
			return p.createConnection()
		}
		conn.lastUsed = time.Now()
		return conn, nil
	default:
		return p.createConnection()
	}
}

func (p *ConnPool) Put(conn *PooledConn) {
	if conn == nil {
		return
	}

	conn.lastUsed = time.Now()

	select {
	case p.connections <- conn:
		// Successfully returned to pool
	default:
		// Pool is full, close connection
		conn.Conn.Close()
		p.mu.Lock()
		p.active--
		p.mu.Unlock()
	}
}

func (p *ConnPool) createConnection() (*PooledConn, error) {
	p.mu.Lock()
	if p.active >= p.maxSize {
		p.mu.Unlock()
		return nil, errors.New("pool is at maximum capacity")
	}
	p.active++
	p.mu.Unlock()

	conn, err := net.DialTimeout("tcp", p.addr, 5*time.Second)
	if err != nil {
		p.mu.Lock()
		p.active--
		p.mu.Unlock()
		return nil, err
	}

	now := time.Now()
	return &PooledConn{
		Conn:      conn,
		pool:      p,
		createdAt: now,
		lastUsed:  now,
	}, nil
}

func (p *ConnPool) reaper() {
	ticker := time.NewTicker(30 * time.Second)
	defer ticker.Stop()

	for range ticker.C {
		p.mu.Lock()
		if p.closed {
			p.mu.Unlock()
			return
		}
		p.mu.Unlock()

		// Check idle connections
	drain:
		for {
			select {
			case conn := <-p.connections:
				if time.Since(conn.lastUsed) > p.idleTimeout {
					conn.Conn.Close()
					p.mu.Lock()
					p.active--
					p.mu.Unlock()
				} else {
					// Still fresh: put it back and stop scanning this round
					p.connections <- conn
					break drain
				}
			default:
				break drain
			}
		}
	}
}

func (p *ConnPool) ActiveCount() int {
	p.mu.Lock()
	defer p.mu.Unlock()
	return p.active
}

func (p *ConnPool) IdleCount() int {
	return len(p.connections)
}

func (p *ConnPool) Close() {
	p.mu.Lock()
	p.closed = true
	p.mu.Unlock()

	close(p.connections)
	for conn := range p.connections {
		conn.Conn.Close()
	}
}

func startTestServer() {
	listener, err := net.Listen("tcp", "localhost:8888")
	if err != nil {
		fmt.Println("Test server error:", err)
		return
	}
	defer listener.Close()

	for {
		conn, err := listener.Accept()
		if err != nil {
			return
		}
		go func(c net.Conn) {
			time.Sleep(500 * time.Millisecond)
			c.Close()
		}(conn)
	}
}
Advanced Connection Management Example
Here's a comprehensive example demonstrating production-ready connection management with all the best practices we've discussed:
package main

import (
	"context"
	"fmt"
	"net"
	"sync"
	"sync/atomic"
	"time"
)

// run
func main() {
	// Start a local test server so the demo has something to connect to
	ln, err := net.Listen("tcp", "localhost:8080")
	if err != nil {
		fmt.Println("Test server error:", err)
		return
	}
	defer ln.Close()
	go func() {
		for {
			c, err := ln.Accept()
			if err != nil {
				return
			}
			go func(c net.Conn) {
				time.Sleep(time.Second)
				c.Close()
			}(c)
		}
	}()

	// Create production-grade connection manager
	manager := NewConnectionManager(ConnectionConfig{
		MaxConnections:    100,
		MaxIdleTime:       5 * time.Minute,
		HealthCheckPeriod: 30 * time.Second,
		ConnectTimeout:    10 * time.Second,
	})

	// Simulate using connections
	ctx := context.Background()
	for i := 0; i < 5; i++ {
		conn, err := manager.Get(ctx, "localhost:8080")
		if err != nil {
			fmt.Printf("Failed to get connection: %v\n", err)
			continue
		}

		// Use connection
		fmt.Printf("Got connection: %v\n", conn.RemoteAddr())

		// Return to pool
		manager.Put(conn)
	}

	// Show statistics
	stats := manager.Stats()
	fmt.Printf("\nConnection Stats:\n")
	fmt.Printf("  Total:  %d\n", stats.Total)
	fmt.Printf("  Active: %d\n", stats.Active)
	fmt.Printf("  Idle:   %d\n", stats.Idle)

	manager.Close()
}

// ConnectionConfig holds configuration for connection management
type ConnectionConfig struct {
	MaxConnections    int           // Maximum total connections
	MaxIdleTime       time.Duration // How long idle connections are kept
	HealthCheckPeriod time.Duration // How often to check connection health
	ConnectTimeout    time.Duration // Timeout for creating new connections
}

// ConnectionManager manages a pool of network connections
type ConnectionManager struct {
	config    ConnectionConfig
	mu        sync.RWMutex
	pools     map[string]*connPool // Pool per destination
	totalConn atomic.Int64         // Total connections across all pools
	closed    atomic.Bool          // Whether manager is closed
}

// connPool manages connections to a single destination
type connPool struct {
	addr   string
	conns  chan *pooledConn
	active atomic.Int32
	mu     sync.Mutex
}

// pooledConn wraps a connection with metadata
type pooledConn struct {
	net.Conn
	pool      *connPool
	createdAt time.Time
	lastUsed  time.Time
	useCount  int
}

// ConnectionStats provides statistics about the connection manager
type ConnectionStats struct {
	Total  int64 // Total connections managed
	Active int64 // Currently active connections
	Idle   int64 // Idle connections in pools
}

func NewConnectionManager(config ConnectionConfig) *ConnectionManager {
	cm := &ConnectionManager{
		config: config,
		pools:  make(map[string]*connPool),
	}

	// Start background health checker
	go cm.healthChecker()

	return cm
}

// Get retrieves a connection from the pool, creating one if necessary
func (cm *ConnectionManager) Get(ctx context.Context, addr string) (*pooledConn, error) {
	if cm.closed.Load() {
		return nil, fmt.Errorf("connection manager is closed")
	}

	// Get or create pool for this destination
	pool := cm.getOrCreatePool(addr)

	// Try to get existing connection
	select {
	case conn := <-pool.conns:
		// Validate connection is still good
		if time.Since(conn.lastUsed) < cm.config.MaxIdleTime {
			conn.lastUsed = time.Now()
			conn.useCount++
			pool.active.Add(1)
			return conn, nil
		}
		// Connection too old, close it
		conn.Close()
		cm.totalConn.Add(-1)
	case <-ctx.Done():
		return nil, ctx.Err()
	default:
		// No idle connection available
	}

	// Create new connection
	return cm.createConnection(ctx, addr, pool)
}

// Put returns a connection to the pool
func (cm *ConnectionManager) Put(conn *pooledConn) {
	if conn == nil {
		return
	}

	conn.pool.active.Add(-1)

	// Try to return to pool
	select {
	case conn.pool.conns <- conn:
		// Successfully returned to pool
	default:
		// Pool is full, close connection
		conn.Close()
		cm.totalConn.Add(-1)
	}
}

// getOrCreatePool gets existing pool or creates new one
func (cm *ConnectionManager) getOrCreatePool(addr string) *connPool {
	cm.mu.RLock()
	pool, exists := cm.pools[addr]
	cm.mu.RUnlock()

	if exists {
		return pool
	}

	// Need to create pool
	cm.mu.Lock()
	defer cm.mu.Unlock()

	// Check again in case another goroutine created it
	if pool, exists := cm.pools[addr]; exists {
		return pool
	}

	// Create new pool
	pool = &connPool{
		addr:  addr,
		conns: make(chan *pooledConn, cm.config.MaxConnections/10),
	}
	cm.pools[addr] = pool

	return pool
}

// createConnection creates a new connection
func (cm *ConnectionManager) createConnection(ctx context.Context, addr string, pool *connPool) (*pooledConn, error) {
	// Check connection limit
	if cm.totalConn.Load() >= int64(cm.config.MaxConnections) {
		return nil, fmt.Errorf("connection limit reached")
	}

	// Create connection with timeout
	dialer := &net.Dialer{
		Timeout: cm.config.ConnectTimeout,
	}

	conn, err := dialer.DialContext(ctx, "tcp", addr)
	if err != nil {
		return nil, err
	}

	// Wrap in pooled connection
	pc := &pooledConn{
		Conn:      conn,
		pool:      pool,
		createdAt: time.Now(),
		lastUsed:  time.Now(),
		useCount:  1,
	}

	cm.totalConn.Add(1)
	pool.active.Add(1)

	return pc, nil
}

// healthChecker periodically checks connection health
func (cm *ConnectionManager) healthChecker() {
	ticker := time.NewTicker(cm.config.HealthCheckPeriod)
	defer ticker.Stop()

	for range ticker.C {
		if cm.closed.Load() {
			return
		}

		cm.checkAllPools()
	}
}

// checkAllPools checks health of all connections in all pools
func (cm *ConnectionManager) checkAllPools() {
	cm.mu.RLock()
	pools := make([]*connPool, 0, len(cm.pools))
	for _, pool := range cm.pools {
		pools = append(pools, pool)
	}
	cm.mu.RUnlock()

	for _, pool := range pools {
		cm.checkPool(pool)
	}
}

// checkPool checks health of connections in a single pool
func (cm *ConnectionManager) checkPool(pool *connPool) {
	// Drain pool temporarily
	conns := make([]*pooledConn, 0)

drain:
	for {
		select {
		case conn := <-pool.conns:
			conns = append(conns, conn)
		default:
			break drain
		}
	}

	// Check each connection and return healthy ones
	for _, conn := range conns {
		if time.Since(conn.lastUsed) > cm.config.MaxIdleTime {
			// Connection too old
			conn.Close()
			cm.totalConn.Add(-1)
			continue
		}

		// Return healthy connection
		select {
		case pool.conns <- conn:
		default:
			// Pool full (shouldn't happen but be safe)
			conn.Close()
			cm.totalConn.Add(-1)
		}
	}
}

// Stats returns connection statistics
func (cm *ConnectionManager) Stats() ConnectionStats {
	cm.mu.RLock()
	defer cm.mu.RUnlock()

	var idle int64
	for _, pool := range cm.pools {
		idle += int64(len(pool.conns))
	}

	total := cm.totalConn.Load()
	return ConnectionStats{
		Total:  total,
		Active: total - idle,
		Idle:   idle,
	}
}

// Close closes the connection manager and all connections
func (cm *ConnectionManager) Close() error {
	if !cm.closed.CompareAndSwap(false, true) {
		return fmt.Errorf("already closed")
	}

	cm.mu.Lock()
	defer cm.mu.Unlock()

	// Close all pools
	for _, pool := range cm.pools {
		close(pool.conns)
		for conn := range pool.conns {
			conn.Close()
		}
	}

	cm.pools = make(map[string]*connPool)
	cm.totalConn.Store(0)

	return nil
}
This example demonstrates:
- Connection Pooling: Reuse connections efficiently
- Health Checking: Background checking and cleanup
- Resource Limits: Enforce maximum connections
- Context Support: Cancellation and timeouts
- Thread Safety: Safe concurrent access
- Statistics: Monitor pool usage
Pool Sizing: How many connections should you pool? Too few and you'll bottleneck; too many and you waste resources. A good starting point is 2-10 connections per target host, scaled based on your request rate and latency. Monitor pool utilization and adjust accordingly.
Connection Validation: Pooled connections can become stale—the remote end might close them, or network issues might make them unusable. Implement validation: try to use the connection, catch errors, and create a fresh connection if validation fails.
Pool Monitoring: Production pools need metrics: active connections, idle connections, wait time, creation rate, error rate. These metrics reveal bottlenecks (too small), waste (too large), and connection problems (high error rate).
Creating new connections is expensive—think of it like hiring and training new employees versus reusing experienced ones. A connection pool maintains a set of ready-to-use connections, dramatically improving performance for applications that make frequent network requests.
1package main
2
3import (
4
5 "errors"
6 "fmt"
7 "net"
8 "sync"
9 "time"
10)
11
12// run
13func main() {
14 // Start a test server
15 go startTestServer()
16 time.Sleep(100 * time.Millisecond)
17
18 // Create connection pool
19 pool := NewConnPool("localhost:8888", 3, 10*time.Second)
20
21 // Use connections
22 var wg sync.WaitGroup
23 for i := 0; i < 5; i++ {
24 wg.Add(1)
25 go func(id int) {
26 defer wg.Done()
27
28 conn, err := pool.Get()
29 if err != nil {
30 fmt.Printf("Worker %d: Failed to get connection: %v\n", id, err)
31 return
32 }
33
34 fmt.Printf("Worker %d: Got connection\n", id)
35
36 // Use connection
37 time.Sleep(100 * time.Millisecond)
38
39 // Return to pool
40 pool.Put(conn)
41 fmt.Printf("Worker %d: Returned connection\n", id)
42 }(i)
43 }
44
45 wg.Wait()
46
47 // Show stats
48 fmt.Printf("\nPool stats: %d active, %d idle\n",
49 pool.ActiveCount(), pool.IdleCount())
50
51 pool.Close()
52}
53
54type ConnPool struct {
55 addr string
56 maxSize int
57 idleTimeout time.Duration
58 connections chan *PooledConn
59 active int
60 mu sync.Mutex
61 closed bool
62}
63
64type PooledConn struct {
65 net.Conn
66 pool *ConnPool
67 createdAt time.Time
68 lastUsed time.Time
69}
70
71func NewConnPool(addr string, maxSize int, idleTimeout time.Duration) *ConnPool {
72 pool := &ConnPool{
73 addr: addr,
74 maxSize: maxSize,
75 idleTimeout: idleTimeout,
76 connections: make(chan *PooledConn, maxSize),
77 }
78
79 // Start connection reaper
80 go pool.reaper()
81
82 return pool
83}
84
85 func (p *ConnPool) Get() (*PooledConn, error) {
86 p.mu.Lock()
87 if p.closed {
88 p.mu.Unlock()
89 return nil, errors.New("pool is closed")
90 }
91 p.mu.Unlock()
92
93 // Try to get from pool
94 select {
95 case conn := <-p.connections:
96 // Check if connection is still valid
97 if time.Since(conn.lastUsed) > p.idleTimeout {
98 conn.Close(); p.mu.Lock(); p.active--; p.mu.Unlock() // drop stale conn from the count
99 return p.createConnection()
100 }
101 conn.lastUsed = time.Now()
102 return conn, nil
103 default:
104 return p.createConnection()
105 }
106}
107
108 func (p *ConnPool) Put(conn *PooledConn) {
109 if conn == nil {
110 return
111 }
112
113 conn.lastUsed = time.Now()
114
115 select {
116 case p.connections <- conn:
117 // Successfully returned to pool
118 default:
119 // Pool is full, close connection
120 conn.Conn.Close()
121 p.mu.Lock()
122 p.active--
123 p.mu.Unlock()
124 }
125}
126
127 func (p *ConnPool) createConnection() (*PooledConn, error) {
128 p.mu.Lock()
129 if p.active >= p.maxSize {
130 p.mu.Unlock()
131 return nil, errors.New("pool is at maximum capacity")
132 }
133 p.active++
134 p.mu.Unlock()
135
136 conn, err := net.DialTimeout("tcp", p.addr, 5*time.Second)
137 if err != nil {
138 p.mu.Lock()
139 p.active--
140 p.mu.Unlock()
141 return nil, err
142 }
143
144 now := time.Now()
145 return &PooledConn{
146 Conn: conn,
147 pool: p,
148 createdAt: now,
149 lastUsed: now,
150 }, nil
151}
152
153 func (p *ConnPool) reaper() {
154 ticker := time.NewTicker(30 * time.Second)
155 defer ticker.Stop()
156
157 for range ticker.C {
158 p.mu.Lock()
159 if p.closed {
160 p.mu.Unlock()
161 return
162 }
163 p.mu.Unlock()
164
165 // Check idle connections, stopping at the first fresh one
166 for done := false; !done; {
167 select {
168 case conn := <-p.connections:
169 if time.Since(conn.lastUsed) > p.idleTimeout {
170 conn.Close()
171 p.mu.Lock()
172 p.active--
173 p.mu.Unlock()
174 } else {
175 // Put it back and end this sweep
176 p.connections <- conn
177 done = true
178 }
179 default:
180 done = true
181 }
182 }
183 }
184}
185
186 func (p *ConnPool) ActiveCount() int {
187 p.mu.Lock()
188 defer p.mu.Unlock()
189 return p.active
190}
191
192 func (p *ConnPool) IdleCount() int {
193 return len(p.connections)
194}
195
196 func (p *ConnPool) Close() {
197 p.mu.Lock()
198 p.closed = true
199 p.mu.Unlock()
200
201 close(p.connections)
202 for conn := range p.connections {
203 conn.Close()
204 }
205}
206
207func startTestServer() {
208 listener, _ := net.Listen("tcp", "localhost:8888")
209 defer listener.Close()
210
211 for {
212 conn, err := listener.Accept()
213 if err != nil {
214 return
215 }
216 go func(c net.Conn) {
217 time.Sleep(500 * time.Millisecond)
218 c.Close()
219 }(conn)
220 }
221}
Advanced Timeout Patterns
⚠️ Critical: Always use timeouts in production! Without them, your application can hang indefinitely waiting for network responses, leading to cascading failures.
Timeouts are like setting deadlines for tasks—they prevent your application from waiting forever for something that might never happen. Let's explore different timeout strategies for various scenarios.
1package main
2
3import (
4 "context"
5 "fmt"
6 "io"
7 "net"
8 "time"
9)
10
11// run
12func main() {
13 // Demonstrate different timeout patterns
14 demonstrateDialTimeout()
15 demonstrateReadWriteTimeout()
16 demonstrateContextTimeout()
17}
18
19func demonstrateDialTimeout() {
20 fmt.Println("=== Dial Timeout ===")
21
22 start := time.Now()
23 conn, err := net.DialTimeout("tcp", "192.0.2.1:80", 2*time.Second)
24 elapsed := time.Since(start)
25
26 if err != nil {
27 fmt.Printf("Dial failed after %s: %v\n", elapsed, err)
28 } else {
29 fmt.Println("Connected")
30 conn.Close()
31 }
32}
33
34func demonstrateReadWriteTimeout() {
35 fmt.Println("\n=== Read/Write Timeout ===")
36
37 // Simulated connection
38 server, client := net.Pipe()
39 defer server.Close()
40 defer client.Close()
41
42 // Set read deadline
43 client.SetReadDeadline(time.Now().Add(1 * time.Second))
44
45 go func() {
46 time.Sleep(2 * time.Second)
47 server.Write([]byte("Too late!"))
48 }()
49
50 buffer := make([]byte, 1024)
51 start := time.Now()
52 _, err := client.Read(buffer)
53 elapsed := time.Since(start)
54
55 if err != nil {
56 fmt.Printf("Read timeout after %s: %v\n", elapsed, err)
57 }
58}
59
60func demonstrateContextTimeout() {
61 fmt.Println("\n=== Context Timeout ===")
62
63 ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
64 defer cancel()
65
66 start := time.Now()
67 conn, err := dialWithContext(ctx, "tcp", "192.0.2.1:80")
68 elapsed := time.Since(start)
69
70 if err != nil {
71 fmt.Printf("Context dial failed after %s: %v\n", elapsed, err)
72 } else {
73 fmt.Println("Connected")
74 conn.Close()
75 }
76}
77
78 func dialWithContext(ctx context.Context, network, addr string) (net.Conn, error) {
79 var d net.Dialer
80 return d.DialContext(ctx, network, addr)
81}
Keep-Alive Configuration
TCP keep-alive is like periodically saying "Are you still there?" during a long phone conversation. It prevents silent connections from being dropped by intermediate routers and firewalls, which often close idle connections to save resources.
Common Pitfalls with Keep-Alive
- Too frequent: Creates unnecessary network traffic
- Too infrequent: Connections might be dropped by network equipment
- Platform differences: Default behavior varies across operating systems
Real-world Example
Database connections use keep-alive extensively. A connection pool might maintain connections for hours, and keep-alive packets ensure these connections remain usable even when there's no active traffic.
1package main
2
3import (
4 "fmt"
5 "io"
6 "net"
7 "time"
8)
9// run
10func main() {
11 // Demonstrate TCP keep-alive configuration
12
13 listener, err := net.Listen("tcp", "localhost:7777")
14 if err != nil {
15 fmt.Println("Listen error:", err)
16 return
17 }
18 defer listener.Close()
19
20 fmt.Println("Server listening with keep-alive configuration")
21
22 go func() {
23 conn, err := listener.Accept()
24 if err != nil {
25 return
26 }
27 defer conn.Close()
28
29 // Configure keep-alive
30 if tcpConn, ok := conn.(*net.TCPConn); ok {
31 // Enable keep-alive
32 tcpConn.SetKeepAlive(true)
33
34 // Set keep-alive period
35 tcpConn.SetKeepAlivePeriod(30 * time.Second)
36
37 fmt.Println("Server: Keep-alive enabled")
38 }
39
40 // Keep connection open
41 io.Copy(io.Discard, conn)
42 }()
43
44 // Client connection
45 time.Sleep(100 * time.Millisecond)
46 conn, err := net.Dial("tcp", "localhost:7777")
47 if err != nil {
48 fmt.Println("Dial error:", err)
49 return
50 }
51 defer conn.Close()
52
53 if tcpConn, ok := conn.(*net.TCPConn); ok {
54 tcpConn.SetKeepAlive(true)
55 tcpConn.SetKeepAlivePeriod(30 * time.Second)
56 fmt.Println("Client: Keep-alive enabled")
57 }
58
59 time.Sleep(100 * time.Millisecond)
60 fmt.Println("Connection established with keep-alive")
61}
Custom Protocol Implementation
Why Custom Protocols: While HTTP covers many use cases, sometimes you need something specialized. Real-time gaming needs minimal overhead. IoT devices need energy-efficient protocols. Financial systems need precise message ordering and guaranteed delivery semantics.
Protocol Design Principles: Good protocols are versioned (for compatibility), framed (to detect message boundaries), efficient (minimal overhead), and debuggable (human-readable when possible). Binary protocols are fast but hard to debug; text protocols are slower but easier to troubleshoot.
Learning from the Masters: Study existing protocols: Redis uses a simple text protocol (RESP), WebSocket uses framed binary messages, HTTP/2 uses binary framing with multiplexing. Each design choice reflects specific requirements and trade-offs.
Sometimes existing protocols don't quite fit your needs. Creating a custom protocol is like designing your own language—you decide the vocabulary, grammar, and conversation rules.
Let's implement a complete custom protocol similar to HTTP but simpler.
Simple Request-Response Protocol
We'll create a simple text-based protocol that demonstrates the key concepts of protocol design. This protocol will have methods, paths, headers, and bodies—just like HTTP, but simplified for learning purposes.
1package main
2
3import (
4 "bufio"
5 "fmt"
6 "io"
7 "net"
8 "strconv"
9 "strings"
10 "time"
11)
12
13// run
14func main() {
15 // Start protocol server
16 go startProtocolServer()
17
18 time.Sleep(100 * time.Millisecond)
19
20 // Test client
21 testProtocolClient()
22}
23
24// Protocol format:
25// Request: METHOD path\nHeader: value\n\n[body]
26// Response: STATUS message\nHeader: value\n\n[body]
27
28type Request struct {
29 Method string
30 Path string
31 Headers map[string]string
32 Body []byte
33}
34
35type Response struct {
36 Status int
37 Message string
38 Headers map[string]string
39 Body []byte
40}
41
42func startProtocolServer() {
43 listener, err := net.Listen("tcp", "localhost:4040")
44 if err != nil {
45 fmt.Println("Server error:", err)
46 return
47 }
48 defer listener.Close()
49
50 fmt.Println("Protocol server listening on localhost:4040")
51
52 for i := 0; i < 3; i++ {
53 conn, err := listener.Accept()
54 if err != nil {
55 continue
56 }
57
58 handleProtocolConnection(conn)
59 }
60}
61
62func handleProtocolConnection(conn net.Conn) {
63 defer conn.Close()
64
65 req, err := readRequest(conn)
66 if err != nil {
67 fmt.Println("Read request error:", err)
68 return
69 }
70
71 fmt.Printf("Received: %s %s\n", req.Method, req.Path)
72
73 // Process request
74 var resp Response
75
76 switch req.Method {
77 case "GET":
78 resp = Response{
79 Status: 200,
80 Message: "OK",
81 Headers: map[string]string{"Content-Type": "text/plain"},
82 Body: []byte("Hello from server!"),
83 }
84 case "POST":
85 resp = Response{
86 Status: 201,
87 Message: "Created",
88 Headers: map[string]string{"Content-Type": "text/plain"},
89 Body: []byte(fmt.Sprintf("Received %d bytes", len(req.Body))),
90 }
91 default:
92 resp = Response{
93 Status: 400,
94 Message: "Bad Request",
95 Headers: map[string]string{},
96 Body: []byte("Unknown method"),
97 }
98 }
99
100 writeResponse(conn, resp)
101}
102
103 func readRequest(conn net.Conn) (Request, error) {
104 reader := bufio.NewReader(conn)
105
106 // Read request line
107 line, err := reader.ReadString('\n')
108 if err != nil {
109 return Request{}, err
110 }
111
112 parts := strings.Fields(line)
113 if len(parts) != 2 {
114 return Request{}, fmt.Errorf("invalid request line")
115 }
116
117 req := Request{
118 Method: parts[0],
119 Path: parts[1],
120 Headers: make(map[string]string),
121 }
122
123 // Read headers
124 for {
125 line, err := reader.ReadString('\n')
126 if err != nil {
127 return Request{}, err
128 }
129
130 line = strings.TrimSpace(line)
131 if line == "" {
132 break // End of headers
133 }
134
135 parts := strings.SplitN(line, ":", 2)
136 if len(parts) == 2 {
137 req.Headers[strings.TrimSpace(parts[0])] = strings.TrimSpace(parts[1])
138 }
139 }
140
141 // Read body if Content-Length is specified
142 if lengthStr, ok := req.Headers["Content-Length"]; ok {
143 length, err := strconv.Atoi(lengthStr)
144 if err == nil && length > 0 {
145 req.Body = make([]byte, length)
146 io.ReadFull(reader, req.Body)
147 }
148 }
149
150 return req, nil
151}
152
153func writeResponse(conn net.Conn, resp Response) error {
154 writer := bufio.NewWriter(conn)
155
156 // Write status line
157 fmt.Fprintf(writer, "%d %s\n", resp.Status, resp.Message)
158
159 // Write headers
160 for key, value := range resp.Headers {
161 fmt.Fprintf(writer, "%s: %s\n", key, value)
162 }
163
164 // Write Content-Length
165 fmt.Fprintf(writer, "Content-Length: %d\n", len(resp.Body))
166
167 // Empty line
168 writer.WriteString("\n")
169
170 // Write body
171 writer.Write(resp.Body)
172
173 return writer.Flush()
174}
175
176func testProtocolClient() {
177 // Test GET request
178 conn, err := net.Dial("tcp", "localhost:4040")
179 if err != nil {
180 fmt.Println("Client error:", err)
181 return
182 }
183
184 req := Request{
185 Method: "GET",
186 Path: "/test",
187 Headers: map[string]string{"User-Agent": "TestClient/1.0"},
188 }
189
190 writeRequest(conn, req)
191 resp, _ := readResponse(conn)
192 fmt.Printf("Response: %d %s\n", resp.Status, resp.Message)
193 fmt.Printf("Body: %s\n", string(resp.Body))
194 conn.Close()
195
196 // Test POST request
197 conn, _ = net.Dial("tcp", "localhost:4040")
198 req = Request{
199 Method: "POST",
200 Path: "/data",
201 Headers: map[string]string{"Content-Type": "text/plain"},
202 Body: []byte("Test data"),
203 }
204
205 writeRequest(conn, req)
206 resp, _ = readResponse(conn)
207 fmt.Printf("\nResponse: %d %s\n", resp.Status, resp.Message)
208 fmt.Printf("Body: %s\n", string(resp.Body))
209 conn.Close()
210}
211
212func writeRequest(conn net.Conn, req Request) error {
213 writer := bufio.NewWriter(conn)
214
215 fmt.Fprintf(writer, "%s %s\n", req.Method, req.Path)
216
217 for key, value := range req.Headers {
218 fmt.Fprintf(writer, "%s: %s\n", key, value)
219 }
220
221 if len(req.Body) > 0 {
222 fmt.Fprintf(writer, "Content-Length: %d\n", len(req.Body))
223 }
224
225 writer.WriteString("\n")
226
227 if len(req.Body) > 0 {
228 writer.Write(req.Body)
229 }
230
231 return writer.Flush()
232}
233
234 func readResponse(conn net.Conn) (Response, error) {
235 reader := bufio.NewReader(conn)
236
237 // Read status line
238 line, err := reader.ReadString('\n')
239 if err != nil {
240 return Response{}, err
241 }
242
243 parts := strings.SplitN(strings.TrimSpace(line), " ", 2)
244 if len(parts) != 2 {
245 return Response{}, fmt.Errorf("invalid status line")
246 }
247
248 status, _ := strconv.Atoi(parts[0])
249
250 resp := Response{
251 Status: status,
252 Message: parts[1],
253 Headers: make(map[string]string),
254 }
255
256 // Read headers
257 for {
258 line, err := reader.ReadString('\n')
259 if err != nil {
260 return Response{}, err
261 }
262
263 line = strings.TrimSpace(line)
264 if line == "" {
265 break
266 }
267
268 parts := strings.SplitN(line, ":", 2)
269 if len(parts) == 2 {
270 resp.Headers[strings.TrimSpace(parts[0])] = strings.TrimSpace(parts[1])
271 }
272 }
273
274 // Read body
275 if lengthStr, ok := resp.Headers["Content-Length"]; ok {
276 length, err := strconv.Atoi(lengthStr)
277 if err == nil && length > 0 {
278 resp.Body = make([]byte, length)
279 io.ReadFull(reader, resp.Body)
280 }
281 }
282
283 return resp, nil
284}
Protocol with Streaming Support
Some applications need to send continuous streams of data rather than discrete request/response pairs. Think of this as the difference between sending a letter versus having a continuous phone conversation. Streaming protocols are perfect for real-time data, file transfers, and live updates.
Real-world Example
WebSocket connections use streaming protocols extensively. When you're in a Google Doc, changes stream in real-time to all collaborators. Each change is a small message in a continuous stream, rather than a series of separate HTTP requests.
1package main
2
3import (
4
5 "fmt"
6 "io"
7 "net"
8 "time"
9)
10
11// run
12func main() {
13 // Start streaming server
14 go startStreamingServer()
15
16 time.Sleep(100 * time.Millisecond)
17
18 // Test streaming client
19 testStreamingClient()
20}
21
22// Stream protocol: Each message is length-prefixed
23// [4 bytes: length][N bytes: data]
24
25func startStreamingServer() {
26 listener, err := net.Listen("tcp", "localhost:3030")
27 if err != nil {
28 fmt.Println("Server error:", err)
29 return
30 }
31 defer listener.Close()
32
33 fmt.Println("Streaming server listening on localhost:3030")
34
35 conn, err := listener.Accept()
36 if err != nil {
37 return
38 }
39 defer conn.Close()
40
41 // Stream data to client
42 messages := []string{
43 "First message",
44 "Second message",
45 "Third message",
46 "Final message",
47 }
48
49 for _, msg := range messages {
50 if err := writeMessage(conn, []byte(msg)); err != nil {
51 fmt.Println("Write error:", err)
52 return
53 }
54 fmt.Printf("Sent: %s\n", msg)
55 time.Sleep(100 * time.Millisecond)
56 }
57}
58
59func writeMessage(w io.Writer, data []byte) error {
60 // Write length prefix
61 length := uint32(len(data))
62 lengthBytes := []byte{
63 byte(length >> 24),
64 byte(length >> 16),
65 byte(length >> 8),
66 byte(length),
67 }
68
69 if _, err := w.Write(lengthBytes); err != nil {
70 return err
71 }
72
73 // Write data
74 _, err := w.Write(data)
75 return err
76}
77
78 func readMessage(r io.Reader) ([]byte, error) {
79 // Read length prefix
80 lengthBytes := make([]byte, 4)
81 if _, err := io.ReadFull(r, lengthBytes); err != nil {
82 return nil, err
83 }
84
85 length := uint32(lengthBytes[0])<<24 |
86 uint32(lengthBytes[1])<<16 |
87 uint32(lengthBytes[2])<<8 |
88 uint32(lengthBytes[3])
89
90 // Read data
91 data := make([]byte, length)
92 _, err := io.ReadFull(r, data)
93 return data, err
94}
95
96func testStreamingClient() {
97 conn, err := net.Dial("tcp", "localhost:3030")
98 if err != nil {
99 fmt.Println("Client error:", err)
100 return
101 }
102 defer conn.Close()
103
104 fmt.Println("Client connected, receiving stream...")
105
106 for {
107 msg, err := readMessage(conn)
108 if err != nil {
109 if err != io.EOF {
110 fmt.Println("Read error:", err)
111 }
112 break
113 }
114
115 fmt.Printf("Received: %s\n", string(msg))
116 }
117
118 fmt.Println("Stream ended")
119}
Production Patterns
Battle-Tested Patterns: Building production network services requires more than just functional code—you need patterns that handle failures gracefully, scale efficiently, and maintain reliability under load.
Graceful Degradation: When a dependency fails, good services degrade gracefully rather than failing completely. Circuit breakers prevent cascading failures, rate limiters protect against overload, and timeouts prevent hanging operations.
Observability: You can't fix what you can't see. Production services need metrics (requests per second, error rates, latency percentiles), logs (structured, correlated, searchable), and traces (to understand request flows through distributed systems).
Building a network application that works is one thing; building one that's reliable, maintainable, and production-ready is another. These patterns are like best practices for running a restaurant—it's not just about cooking good food, but also about handling rushes, dealing with emergencies, and keeping customers happy even when things go wrong.
Graceful Shutdown
When you need to update your application or shut it down for maintenance, you don't want to abruptly disconnect users. Graceful shutdown is like politely asking customers to finish their meals before closing—you stop accepting new connections but let existing ones complete their work.
1package main
2
3import (
4 "context"
5 "fmt"
6 "net"
7 "os"
8 "os/signal"
9 "sync"
10 "syscall"
11 "time"
12)
13
14// run
15func main() {
16 server := NewGracefulServer("localhost:2020")
17
18 // Setup signal handling
19 sigChan := make(chan os.Signal, 1)
20 signal.Notify(sigChan, syscall.SIGINT, syscall.SIGTERM)
21
22 // Start server
23 go func() {
24 if err := server.Start(); err != nil {
25 fmt.Println("Server error:", err)
26 }
27 }()
28
29 // Simulate some clients
30 time.Sleep(100 * time.Millisecond)
31 for i := 0; i < 3; i++ {
32 go simulateLongRunningClient(i)
33 }
34
35 // A real server would block on <-sigChan; here we trigger shutdown after a delay
36 time.Sleep(500 * time.Millisecond)
37 fmt.Println("\nInitiating graceful shutdown...")
38
39 // Shutdown
40 ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
41 defer cancel()
42
43 if err := server.Shutdown(ctx); err != nil {
44 fmt.Println("Shutdown error:", err)
45 }
46
47 fmt.Println("Server stopped gracefully")
48}
49
50type GracefulServer struct {
51 addr string
52 listener net.Listener
53 wg sync.WaitGroup
54 shutdown chan struct{}
55 mu sync.Mutex
56}
57
58func NewGracefulServer(addr string) *GracefulServer {
59 return &GracefulServer{
60 addr: addr,
61 shutdown: make(chan struct{}),
62 }
63}
64
65 func (s *GracefulServer) Start() error {
66 var err error
67 s.listener, err = net.Listen("tcp", s.addr)
68 if err != nil {
69 return err
70 }
71
72 fmt.Printf("Server listening on %s\n", s.addr)
73
74 for {
75 conn, err := s.listener.Accept()
76 if err != nil {
77 select {
78 case <-s.shutdown:
79 return nil
80 default:
81 fmt.Println("Accept error:", err)
82 continue
83 }
84 }
85
86 s.wg.Add(1)
87 go s.handleConnection(conn)
88 }
89}
90
91 func (s *GracefulServer) handleConnection(conn net.Conn) {
92 defer s.wg.Done()
93 defer conn.Close()
94
95 fmt.Printf("Client connected: %s\n", conn.RemoteAddr())
96
97 // Simulate long-running operation
98 ticker := time.NewTicker(100 * time.Millisecond)
99 defer ticker.Stop()
100
101 for i := 0; i < 10; i++ {
102 select {
103 case <-s.shutdown:
104 fmt.Printf("Client %s: Shutdown requested, cleaning up...\n", conn.RemoteAddr())
105 return
106 case <-ticker.C:
107 conn.Write([]byte(fmt.Sprintf("Message %d\n", i)))
108 }
109 }
110
111 fmt.Printf("Client %s: Completed normally\n", conn.RemoteAddr())
112}
113
114 func (s *GracefulServer) Shutdown(ctx context.Context) error {
115 s.mu.Lock()
116 defer s.mu.Unlock()
117
118 // Signal shutdown
119 close(s.shutdown)
120
121 // Stop accepting new connections
122 if s.listener != nil {
123 s.listener.Close()
124 }
125
126 // Wait for existing connections with timeout
127 done := make(chan struct{})
128 go func() {
129 s.wg.Wait()
130 close(done)
131 }()
132
133 select {
134 case <-done:
135 fmt.Println("All connections closed gracefully")
136 return nil
137 case <-ctx.Done():
138 return fmt.Errorf("shutdown timeout: %w", ctx.Err())
139 }
140}
141
142func simulateLongRunningClient(id int) {
143 conn, err := net.Dial("tcp", "localhost:2020")
144 if err != nil {
145 return
146 }
147 defer conn.Close()
148
149 // Read messages
150 buffer := make([]byte, 1024)
151 for {
152 n, err := conn.Read(buffer)
153 if err != nil {
154 return
155 }
156 fmt.Printf("Client %d: %s", id, string(buffer[:n]))
157 }
158}
Circuit Breaker Pattern
What happens when a downstream service becomes unresponsive? Without protection, your application might spend all its time waiting for a service that will never reply. The circuit breaker pattern is like a safety valve that automatically opens when pressure gets too high—it temporarily stops trying to contact failing services, giving them time to recover.
Real-world Example
Netflix famously pioneered the circuit breaker pattern. When their recommendation service starts failing, the circuit breaker opens, and Netflix shows cached recommendations instead. The user experience degrades gracefully instead of breaking completely.
1package main
2
3import (
4 "errors"
5 "fmt"
6 "sync"
7 "time"
8)
9
10// run
11func main() {
12 cb := NewCircuitBreaker(3, 2*time.Second)
13
14 // Simulate requests
15 for i := 0; i < 10; i++ {
16 err := cb.Call(func() error {
17 // Simulate a failing service
18 if i < 5 {
19 return errors.New("service unavailable")
20 }
21 return nil
22 })
23
24 if err != nil {
25 fmt.Printf("Request %d: %v (state: %s)\n", i, err, cb.State())
26 } else {
27 fmt.Printf("Request %d: Success (state: %s)\n", i, cb.State())
28 }
29
30 time.Sleep(300 * time.Millisecond)
31 }
32}
33
34type CircuitState int
35
36const (
37 StateClosed CircuitState = iota
38 StateOpen
39 StateHalfOpen
40)
41
42 func (s CircuitState) String() string {
43 switch s {
44 case StateClosed:
45 return "Closed"
46 case StateOpen:
47 return "Open"
48 case StateHalfOpen:
49 return "Half-Open"
50 default:
51 return "Unknown"
52 }
53}
54
55type CircuitBreaker struct {
56 maxFailures int
57 resetTimeout time.Duration
58
59 mu sync.Mutex
60 state CircuitState
61 failures int
62 lastFailTime time.Time
63}
64
65func NewCircuitBreaker(maxFailures int, resetTimeout time.Duration) *CircuitBreaker {
66 return &CircuitBreaker{
67 maxFailures: maxFailures,
68 resetTimeout: resetTimeout,
69 state: StateClosed,
70 }
71}
72
73 func (cb *CircuitBreaker) Call(fn func() error) error {
74 cb.mu.Lock()
75
76 // Check if we should transition from Open to Half-Open
77 if cb.state == StateOpen {
78 if time.Since(cb.lastFailTime) > cb.resetTimeout {
79 cb.state = StateHalfOpen
80 cb.failures = 0
81 fmt.Println("Circuit breaker transitioning to Half-Open")
82 } else {
83 cb.mu.Unlock()
84 return errors.New("circuit breaker is open")
85 }
86 }
87
88 cb.mu.Unlock()
89
90 // Execute the function
91 err := fn()
92
93 cb.mu.Lock()
94 defer cb.mu.Unlock()
95
96 if err != nil {
97 cb.failures++
98 cb.lastFailTime = time.Now()
99
100 if cb.failures >= cb.maxFailures {
101 cb.state = StateOpen
102 fmt.Println("Circuit breaker opened due to failures")
103 }
104
105 return err
106 }
107
108 // Success - reset if we were in Half-Open
109 if cb.state == StateHalfOpen {
110 cb.state = StateClosed
111 cb.failures = 0
112 fmt.Println("Circuit breaker closed after successful Half-Open request")
113 }
114
115 return nil
116}
117
118 func (cb *CircuitBreaker) State() CircuitState {
119 cb.mu.Lock()
120 defer cb.mu.Unlock()
121 return cb.state
122}
Rate Limiting
Without rate limiting, a single misbehaving client or malicious attacker can overwhelm your service. Rate limiting is like having a bouncer at a nightclub—it ensures everyone gets a fair turn and prevents abuse by limiting how many requests each client can make.
Common Rate Limiting Strategies
- Token bucket: Allows bursts but maintains average rate
- Fixed window: Resets count every minute/hour
- Sliding window: More accurate but resource-intensive
- Distributed rate limiting: For multiple servers sharing limits
1package main
2
3import (
4 "fmt"
5 "sync"
6 "time"
7)
8
9// run
10func main() {
11 limiter := NewRateLimiter(3, time.Second)
12
13 // Simulate requests
14 var wg sync.WaitGroup
15 for i := 0; i < 10; i++ {
16 wg.Add(1)
17 go func(id int) {
18 defer wg.Done()
19
20 if limiter.Allow() {
21 fmt.Printf("Request %d: Allowed\n", id)
22 } else {
23 fmt.Printf("Request %d: Rate limited\n", id)
24 }
25 }(i)
26
27 time.Sleep(100 * time.Millisecond)
28 }
29
30 wg.Wait()
31}
32
33type RateLimiter struct {
34 rate int
35 interval time.Duration
36 tokens int
37 lastRef time.Time
38 mu sync.Mutex
39}
40
41func NewRateLimiter(rate int, interval time.Duration) *RateLimiter {
42 return &RateLimiter{
43 rate: rate,
44 interval: interval,
45 tokens: rate,
46 lastRef: time.Now(),
47 }
48}
49
50 func (rl *RateLimiter) Allow() bool {
51 rl.mu.Lock()
52 defer rl.mu.Unlock()
53
54 now := time.Now()
55 elapsed := now.Sub(rl.lastRef)
56
57 // Refill the full allotment once per interval (fixed-window refill)
58 if elapsed >= rl.interval {
59 rl.tokens = rl.rate
60 rl.lastRef = now
61 }
62
63 if rl.tokens > 0 {
64 rl.tokens--
65 return true
66 }
67
68 return false
69}
Connection Health Checking
Connections can go bad even when they appear to be working—think of this as a phone line that's still connected but the quality is so poor you can't understand each other. Health checking ensures you're only using connections that are actually working properly.
Common Health Check Pitfalls
- Too aggressive: Health checks themselves can overwhelm the service
- Not aggressive enough: Bad connections linger and cause failures
- Wrong metrics: Checking connection exists vs. checking it works
Real-world Example
Database connection pools constantly health check connections. A connection might have been dropped by a firewall hours ago, but from your application's perspective, it's still "open". Health checks catch these zombie connections before they cause errors.
1package main
2
3import (
4 "fmt"
5 "net"
6 "sync"
7 "time"
8)
9
10// run
11func main() {
12 manager := NewConnectionManager("localhost:1010", 5*time.Second)
13
14 // Start test server
15 go startHealthCheckServer()
16 time.Sleep(100 * time.Millisecond)
17
18 // Get connection
19 conn, err := manager.GetConnection()
20 if err != nil {
21 fmt.Println("Error:", err)
22 return
23 }
24 defer manager.ReleaseConnection(conn)
25
26 fmt.Println("Got healthy connection")
27
28 // Use connection
29 conn.Write([]byte("test\n"))
30}
31
32type ConnectionManager struct {
33 addr string
34 healthCheck time.Duration
35 conn net.Conn
36 lastCheck time.Time
37 mu sync.Mutex
38}
39
40func NewConnectionManager(addr string, healthCheck time.Duration) *ConnectionManager {
41 return &ConnectionManager{
42 addr: addr,
43 healthCheck: healthCheck,
44 }
45}
46
47 func (cm *ConnectionManager) GetConnection() (net.Conn, error) {
48 cm.mu.Lock()
49 defer cm.mu.Unlock()
50
51 // Check if we need to verify health
52 if cm.conn != nil && time.Since(cm.lastCheck) < cm.healthCheck {
53 return cm.conn, nil
54 }
55
56 // Health check or create new connection
57 if cm.conn != nil {
58 if !cm.isHealthy(cm.conn) {
59 fmt.Println("Connection unhealthy, creating new one")
60 cm.conn.Close()
61 cm.conn = nil
62 }
63 }
64
65 if cm.conn == nil {
66 var err error
67 cm.conn, err = net.Dial("tcp", cm.addr)
68 if err != nil {
69 return nil, err
70 }
71 fmt.Println("Created new connection")
72 }
73
74 cm.lastCheck = time.Now()
75 return cm.conn, nil
76}
77
78 func (cm *ConnectionManager) ReleaseConnection(conn net.Conn) {
79 // In a real implementation, you might return to a pool
80}
81
82 func (cm *ConnectionManager) isHealthy(conn net.Conn) bool {
83 // Simple health check: set a short deadline and try to read.
84 // In a real implementation, you'd use a proper health check
85 // protocol (e.g. a ping message) rather than consuming a byte.
86 conn.SetReadDeadline(time.Now().Add(100 * time.Millisecond))
87 one := make([]byte, 1)
88 _, err := conn.Read(one)
89 conn.SetReadDeadline(time.Time{}) // reset deadline
90
91 // A timeout means the connection is open but idle (healthy);
92 // any other error (EOF, reset) means it is dead
93 if netErr, ok := err.(net.Error); ok && netErr.Timeout() {
94 return true
95 }
96 return err == nil
97}
96
97func startHealthCheckServer() {
98 listener, _ := net.Listen("tcp", "localhost:1010")
99 defer listener.Close()
100
101 for {
102 conn, err := listener.Accept()
103 if err != nil {
104 return
105 }
106
107 go func(c net.Conn) {
108 defer c.Close()
109 buffer := make([]byte, 1024)
110 for {
111 _, err := c.Read(buffer)
112 if err != nil {
113 return
114 }
115 }
116 }(conn)
117 }
118}
Practice Exercises
Exercise 1: Chat Server
Difficulty: Intermediate | Time: 35-45 minutes
Learning Objectives:
- Master concurrent TCP server design with goroutine management
- Implement real-time message broadcasting and client coordination
- Learn graceful connection handling and cleanup patterns
Real-World Context: Chat servers are the foundation of real-time communication systems like Slack, Discord, and WhatsApp. They demonstrate the core patterns of concurrent connection management, message broadcasting, and client state synchronization that are essential for any real-time application.
Build a concurrent chat server where multiple clients can connect and broadcast messages to all connected clients. Your server should handle multiple simultaneous TCP connections, implement message broadcasting to all connected clients, manage client disconnections gracefully, provide username registration, and support chat commands like /list and /quit. This exercise demonstrates the fundamental patterns used in production real-time communication systems where managing concurrent connections and state synchronization is critical for building scalable chat applications.
Solution
1package main
2
3import (
4 "bufio"
5 "fmt"
6 "net"
7 "strings"
8 "sync"
9 "time"
10)
11
12type ChatServer struct {
13 clients map[string]net.Conn
14 broadcast chan Message
15 register chan Client
16 unregister chan string
17 mu sync.RWMutex
18}
19
20type Client struct {
21 username string
22 conn net.Conn
23}
24
25type Message struct {
26 sender string
27 content string
28}
29
30func NewChatServer() *ChatServer {
31 return &ChatServer{
32 clients: make(map[string]net.Conn),
33 broadcast: make(chan Message, 100),
34 register: make(chan Client),
35 unregister: make(chan string),
36 }
37}
38
39func (s *ChatServer) Start(addr string) error {
40 listener, err := net.Listen("tcp", addr)
41 if err != nil {
42 return err
43 }
44 defer listener.Close()
45
46 fmt.Printf("Chat server listening on %s\n", addr)
47
48 // Start broadcast handler
49 go s.handleBroadcasts()
50
51 for {
52 conn, err := listener.Accept()
53 if err != nil {
54 continue
55 }
56
57 go s.handleClient(conn)
58 }
59}
60
61func (s *ChatServer) handleClient(conn net.Conn) {
62 defer conn.Close()
63
64 // Get username
65 conn.Write([]byte("Enter username: "))
66 reader := bufio.NewReader(conn)
67 username, err := reader.ReadString('\n')
68 if err != nil {
69 return
70 }
71 username = strings.TrimSpace(username)
72
73 // Register client
74 s.register <- Client{username: username, conn: conn}
75 defer func() {
76 s.unregister <- username
77 }()
78
79 // Notify others
80 s.broadcast <- Message{
81 sender: "system",
82 content: fmt.Sprintf("%s joined the chat", username),
83 }
84
85 // Handle messages
86 for {
87 line, err := reader.ReadString('\n')
88 if err != nil {
89 break
90 }
91
92 line = strings.TrimSpace(line)
93
94 if strings.HasPrefix(line, "/") {
95 s.handleCommand(username, line, conn)
96 continue
97 }
98
99 s.broadcast <- Message{
100 sender: username,
101 content: line,
102 }
103 }
104
105 s.broadcast <- Message{
106 sender: "system",
107 content: fmt.Sprintf("%s left the chat", username),
108 }
109}
110
111func (s *ChatServer) handleCommand(username, cmd string, conn net.Conn) {
112 parts := strings.Fields(cmd)
113 if len(parts) == 0 {
114 return
115 }
116
117 switch parts[0] {
118 case "/list":
119 s.mu.RLock()
120 users := make([]string, 0, len(s.clients))
121 for user := range s.clients {
122 users = append(users, user)
123 }
124 s.mu.RUnlock()
125
126 conn.Write([]byte(fmt.Sprintf("Users: %s\n", strings.Join(users, ", "))))
127
128 case "/quit":
129 conn.Write([]byte("Goodbye!\n"))
130 conn.Close()
131
132 default:
133 conn.Write([]byte("Unknown command\n"))
134 }
135}
136
137func (s *ChatServer) handleBroadcasts() {
138 for {
139 select {
140 case client := <-s.register:
141 s.mu.Lock()
142 s.clients[client.username] = client.conn
143 s.mu.Unlock()
144
145 case username := <-s.unregister:
146 s.mu.Lock()
147 delete(s.clients, username)
148 s.mu.Unlock()
149
150 case msg := <-s.broadcast:
151 s.mu.RLock()
152 for username, conn := range s.clients {
153 if username != msg.sender {
154 fmt.Fprintf(conn, "[%s] %s\n", msg.sender, msg.content)
155 }
156 }
157 s.mu.RUnlock()
158 }
159 }
160}
161
162func main() {
163 server := NewChatServer()
164
165 // For testing, run in background
166 go func() {
167 if err := server.Start("localhost:6000"); err != nil {
168 fmt.Println("Server error:", err)
169 }
170 }()
171
172 // Simulate clients
173 time.Sleep(100 * time.Millisecond)
174
175 go simulateChatClient("Alice", []string{"Hello everyone!", "/list"})
176 time.Sleep(100 * time.Millisecond)
177
178 go simulateChatClient("Bob", []string{"Hi Alice!", "/list"})
179 time.Sleep(500 * time.Millisecond)
180
181 time.Sleep(2 * time.Second)
182}
183
184func simulateChatClient(username string, messages []string) {
185 conn, err := net.Dial("tcp", "localhost:6000")
186 if err != nil {
187 fmt.Printf("%s: Dial error: %v\n", username, err)
188 return
189 }
190 defer conn.Close()
191
192 reader := bufio.NewReader(conn)
193
194 // Read prompt
195 reader.ReadString('\n')
196
197 // Send username
198 fmt.Fprintf(conn, "%s\n", username)
199
200 // Read and display messages in background
201 go func() {
202 for {
203 line, err := reader.ReadString('\n')
204 if err != nil {
205 return
206 }
207 fmt.Printf("%s received: %s", username, line)
208 }
209 }()
210
211 // Send messages
212 for _, msg := range messages {
213 time.Sleep(200 * time.Millisecond)
214 fmt.Fprintf(conn, "%s\n", msg)
215 }
216
217 time.Sleep(500 * time.Millisecond)
218}
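The solution mixes two synchronization styles: channels for registration and a mutex around the clients map. An alternative worth knowing is to let a single hub goroutine own the map outright, so no mutex is needed at all. A minimal sketch of that design (the `event` and `runHub` names are illustrative, not from the solution):

```go
package main

import "fmt"

// event is a join, leave, or chat message flowing into the hub.
type event struct {
	kind string // "join", "leave", or "msg"
	name string
	text string
}

// runHub owns the client set exclusively. Because only this goroutine ever
// touches the map, no mutex is required; all coordination is via channels.
func runHub(events <-chan event, out chan<- string, done chan<- struct{}) {
	clients := make(map[string]bool)
	for ev := range events {
		switch ev.kind {
		case "join":
			clients[ev.name] = true
		case "leave":
			delete(clients, ev.name)
		case "msg":
			for name := range clients {
				if name != ev.name {
					out <- fmt.Sprintf("to %s: [%s] %s", name, ev.name, ev.text)
				}
			}
		}
	}
	close(done)
}

func main() {
	events := make(chan event)
	out := make(chan string, 16) // buffered so the hub never blocks in this demo
	done := make(chan struct{})
	go runHub(events, out, done)

	events <- event{kind: "join", name: "alice"}
	events <- event{kind: "join", name: "bob"}
	events <- event{kind: "msg", name: "alice", text: "hi"}
	close(events)
	<-done
	close(out)

	for line := range out {
		fmt.Println(line) // to bob: [alice] hi
	}
}
```

In the real server, each client's `handleClient` goroutine would send `event` values into the hub and read its own outbound channel; slow clients can then be dropped instead of blocking the broadcast.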
Exercise 2: Load Balancer
Difficulty: Advanced | Time: 40-50 minutes
Learning Objectives:
- Master TCP proxy patterns and bidirectional data forwarding
- Implement health checking and failure detection mechanisms
- Learn round-robin load balancing algorithms and connection management
Real-World Context: Load balancers are essential components in modern infrastructure, distributing traffic across multiple servers to ensure high availability and scalability. Companies like Netflix, Google, and AWS rely on sophisticated load balancers to handle billions of requests daily.
Implement a simple TCP load balancer that distributes connections across multiple backend servers using a round-robin algorithm. Your load balancer should accept client connections, maintain a list of healthy backend servers, use round-robin selection for load distribution, forward data bidirectionally between clients and backends, handle backend failures gracefully, and implement health checking to detect and remove failed servers. This exercise demonstrates the core patterns used in production load balancing systems, where traffic distribution, health monitoring, and failure handling are critical for building resilient and scalable applications.
Solution
1package main
2
3import (
4 "fmt"
5 "io"
6 "net"
7 "sync"
8 "sync/atomic"
9 "time"
10)
11
12type LoadBalancer struct {
13 backends []*Backend
14 current atomic.Uint32
15 healthCheck time.Duration
16 mu sync.RWMutex
17}
18
19type Backend struct {
20 addr string
21 healthy atomic.Bool
22}
23
24func NewLoadBalancer(backends []string, healthCheck time.Duration) *LoadBalancer {
25 lb := &LoadBalancer{
26 backends: make([]*Backend, len(backends)),
27 healthCheck: healthCheck,
28 }
29
30 for i, addr := range backends {
31 backend := &Backend{addr: addr}
32 backend.healthy.Store(true)
33 lb.backends[i] = backend
34 }
35
36 // Start health checks
37 go lb.startHealthChecks()
38
39 return lb
40}
41
42func (lb *LoadBalancer) Start(addr string) error {
43 listener, err := net.Listen("tcp", addr)
44 if err != nil {
45 return err
46 }
47 defer listener.Close()
48
49 fmt.Printf("Load balancer listening on %s\n", addr)
50 fmt.Printf("Backends: %v\n", lb.getBackendAddrs())
51
52 for {
53 conn, err := listener.Accept()
54 if err != nil {
55 continue
56 }
57
58 go lb.handleConnection(conn)
59 }
60}
61
62func (lb *LoadBalancer) handleConnection(clientConn net.Conn) {
63 defer clientConn.Close()
64
65 // Get next healthy backend
66 backend := lb.nextBackend()
67 if backend == nil {
68 fmt.Println("No healthy backends available")
69 return
70 }
71
72 // Connect to backend
73 backendConn, err := net.DialTimeout("tcp", backend.addr, 5*time.Second)
74 if err != nil {
75 fmt.Printf("Failed to connect to backend %s: %v\n", backend.addr, err)
76 backend.healthy.Store(false)
77 return
78 }
79 defer backendConn.Close()
80
81 fmt.Printf("Proxying connection to %s\n", backend.addr)
82
83 // Bidirectional copy
84 var wg sync.WaitGroup
85 wg.Add(2)
86
87 // Client -> Backend
88 go func() {
89 defer wg.Done()
90 io.Copy(backendConn, clientConn)
91 backendConn.(*net.TCPConn).CloseWrite()
92 }()
93
94 // Backend -> Client
95 go func() {
96 defer wg.Done()
97 io.Copy(clientConn, backendConn)
98 clientConn.(*net.TCPConn).CloseWrite()
99 }()
100
101 wg.Wait()
102}
103
104func (lb *LoadBalancer) nextBackend() *Backend {
105 lb.mu.RLock()
106 defer lb.mu.RUnlock()
107
108 n := len(lb.backends)
109 if n == 0 {
110 return nil
111 }
112
113 // Try to find healthy backend
114 for i := 0; i < n; i++ {
115 idx := lb.current.Add(1) % uint32(n)
116 backend := lb.backends[idx]
117
118 if backend.healthy.Load() {
119 return backend
120 }
121 }
122
123 return nil
124}
125
126func (lb *LoadBalancer) startHealthChecks() {
127 ticker := time.NewTicker(lb.healthCheck)
128 defer ticker.Stop()
129
130 for range ticker.C {
131 lb.mu.RLock()
132 for _, backend := range lb.backends {
133 go lb.checkBackend(backend)
134 }
135 lb.mu.RUnlock()
136 }
137}
138
139func (lb *LoadBalancer) checkBackend(backend *Backend) {
140 conn, err := net.DialTimeout("tcp", backend.addr, 2*time.Second)
141 if err != nil {
142 if backend.healthy.Load() {
143 fmt.Printf("Backend %s is down\n", backend.addr)
144 }
145 backend.healthy.Store(false)
146 return
147 }
148 conn.Close()
149
150 if !backend.healthy.Load() {
151 fmt.Printf("Backend %s is back up\n", backend.addr)
152 }
153 backend.healthy.Store(true)
154}
155
156func (lb *LoadBalancer) getBackendAddrs() []string {
157 lb.mu.RLock()
158 defer lb.mu.RUnlock()
159
160 addrs := make([]string, len(lb.backends))
161 for i, b := range lb.backends {
162 addrs[i] = b.addr
163 }
164 return addrs
165}
166
167func main() {
168 // Start backend servers
169 go startBackendServer("localhost:8001", "Backend-1")
170 go startBackendServer("localhost:8002", "Backend-2")
171 go startBackendServer("localhost:8003", "Backend-3")
172
173 time.Sleep(100 * time.Millisecond)
174
175 // Start load balancer
176 lb := NewLoadBalancer(
177 []string{"localhost:8001", "localhost:8002", "localhost:8003"},
178 3*time.Second,
179 )
180
181 go func() {
182 if err := lb.Start("localhost:9000"); err != nil {
183 fmt.Println("Load balancer error:", err)
184 }
185 }()
186
187 time.Sleep(100 * time.Millisecond)
188
189 // Test with clients
190 for i := 0; i < 6; i++ {
191 go testClient(i)
192 time.Sleep(100 * time.Millisecond)
193 }
194
195 time.Sleep(2 * time.Second)
196}
197
198func startBackendServer(addr, name string) {
199 listener, err := net.Listen("tcp", addr)
200 if err != nil {
201 fmt.Printf("%s error: %v\n", name, err)
202 return
203 }
204 defer listener.Close()
205
206 fmt.Printf("%s listening on %s\n", name, addr)
207
208 for {
209 conn, err := listener.Accept()
210 if err != nil {
211 continue
212 }
213
214 go func(c net.Conn) {
215 defer c.Close()
216
217 // Echo with server name
218 response := fmt.Sprintf("Response from %s\n", name)
219 c.Write([]byte(response))
220 }(conn)
221 }
222}
223
224func testClient(id int) {
225 conn, err := net.Dial("tcp", "localhost:9000")
226 if err != nil {
227 fmt.Printf("Client %d error: %v\n", id, err)
228 return
229 }
230 defer conn.Close()
231
232 // Read response
233 buffer := make([]byte, 1024)
234 n, err := conn.Read(buffer)
235 if err != nil {
236 fmt.Printf("Client %d read error: %v\n", id, err)
237 return
238 }
239
240 fmt.Printf("Client %d: %s", id, string(buffer[:n]))
241}
Exercise 3: HTTP Proxy Server
Difficulty: Advanced | Time: 30-40 minutes
Learning Objectives:
- Build HTTP proxy servers with request forwarding and response handling
- Master HTTP header manipulation and proxy protocol implementation
- Learn request/response logging and performance monitoring patterns
Real-World Context: HTTP proxies are crucial components in web infrastructure, used for caching, filtering, load balancing, and security monitoring. Corporate networks, CDNs, and API gateways all rely on HTTP proxy technology to manage and optimize web traffic.
Build an HTTP proxy server that forwards requests to backend servers with request/response logging. Your proxy should handle HTTP request forwarding, preserve and manipulate headers appropriately, add proxy-specific headers like X-Forwarded-For, log detailed request/response information including timing and status codes, and demonstrate the patterns used in production proxy servers. This exercise shows how to build the fundamental components of API gateways and reverse proxies that are essential for modern web application architecture.
Solution with Explanation
1package main
2
3import (
4 "fmt"
5 "io"
6 "log"
7 "net/http"
8 "net/url"
9 "time"
10)
12
13// HTTPProxy forwards HTTP requests to backend servers
14type HTTPProxy struct {
15 targetURL *url.URL
16}
17
18func NewHTTPProxy(targetURL string) (*HTTPProxy, error) {
19 u, err := url.Parse(targetURL)
20 if err != nil {
21 return nil, err
22 }
23
24 return &HTTPProxy{targetURL: u}, nil
25}
26
27func (p *HTTPProxy) ServeHTTP(w http.ResponseWriter, r *http.Request) {
28 start := time.Now()
29
30 // Log incoming request
31 log.Printf("[PROXY] %s %s from %s", r.Method, r.URL.Path, r.RemoteAddr)
32
33 // Create proxy request
34 proxyURL := *r.URL
35 proxyURL.Scheme = p.targetURL.Scheme
36 proxyURL.Host = p.targetURL.Host
37
38 proxyReq, err := http.NewRequest(r.Method, proxyURL.String(), r.Body)
39 if err != nil {
40 http.Error(w, "Proxy request failed", http.StatusInternalServerError)
41 log.Printf("[ERROR] Failed to create proxy request: %v", err)
42 return
43 }
44
45 // Copy headers
46 for key, values := range r.Header {
47 for _, value := range values {
48 proxyReq.Header.Add(key, value)
49 }
50 }
51
52 // Set proxy headers
53 proxyReq.Header.Set("X-Forwarded-For", r.RemoteAddr)
54 proxyReq.Header.Set("X-Forwarded-Proto", "http")
55
56 // Send request
57 client := &http.Client{
58 Timeout: 30 * time.Second,
59 }
60
61 resp, err := client.Do(proxyReq)
62 if err != nil {
63 http.Error(w, "Backend unavailable", http.StatusBadGateway)
64 log.Printf("[ERROR] Backend request failed: %v", err)
65 return
66 }
67 defer resp.Body.Close()
68
69 // Copy response headers
70 for key, values := range resp.Header {
71 for _, value := range values {
72 w.Header().Add(key, value)
73 }
74 }
75
76 // Copy status code
77 w.WriteHeader(resp.StatusCode)
78
79 // Copy response body
80 written, err := io.Copy(w, resp.Body)
81 if err != nil {
82 log.Printf("[ERROR] Failed to copy response: %v", err)
83 return
84 }
85
86 duration := time.Since(start)
87 log.Printf("[PROXY] %s %s -> %d (%d bytes in %v)",
88 r.Method, r.URL.Path, resp.StatusCode, written, duration)
89}
90
91func main() {
92 // For demo, we'll create a simple backend server first
93 go func() {
94 http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
95 fmt.Fprintf(w, "Backend response for %s\n", r.URL.Path)
96 fmt.Fprintf(w, "Method: %s\n", r.Method)
97 fmt.Fprintf(w, "X-Forwarded-For: %s\n", r.Header.Get("X-Forwarded-For"))
98 })
99 log.Println("Backend server starting on :8081")
100 http.ListenAndServe(":8081", nil)
101 }()
102
103 time.Sleep(100 * time.Millisecond)
104
105 // Create proxy
106 proxy, err := NewHTTPProxy("http://localhost:8081")
107 if err != nil {
108 panic(err)
109 }
110
111 // Start proxy server in the background
112 go http.ListenAndServe(":8080", proxy)
113 fmt.Println("HTTP Proxy starting on :8080")
114 fmt.Println("Proxying to http://localhost:8081 (try: curl http://localhost:8080/test)")
115
116 // For demo purposes, simulate a request
117 time.Sleep(100 * time.Millisecond)
118 resp, err := http.Get("http://localhost:8080/test")
119 if err != nil {
120 fmt.Printf("Test request failed: %v\n", err)
121 } else {
122 body, _ := io.ReadAll(resp.Body)
123 resp.Body.Close()
124 fmt.Printf("\nProxy Response:\n%s\n", string(body))
125 }
126}
127// run
Explanation:
This HTTP proxy demonstrates:
- Request Forwarding: Forwards HTTP requests to backend server
- Header Preservation: Copies request and response headers
- X-Forwarded-For: Adds proxy headers for client tracking
- Error Handling: Handles backend failures gracefully
- Logging: Comprehensive request/response logging
- Timeout Management: Sets reasonable timeouts for backend requests
Production enhancements:
- Add load balancing across multiple backends
- Implement connection pooling
- Add caching for static resources
- Implement rate limiting
- Add authentication/authorization
- Support WebSocket proxying
- Add health checks for backends
- Implement circuit breakers
Exercise 4: UDP File Transfer
Difficulty: Advanced | Time: 45-55 minutes
Learning Objectives:
- Master UDP communication with reliability mechanisms
- Implement packet sequencing and acknowledgment systems
- Learn to build reliable file transfer protocols on top of UDP
Real-World Context: While TCP handles reliability automatically, understanding how to build reliability on top of UDP is crucial for applications needing custom control over reliability, speed, and packet handling. Video streaming, gaming, and IoT systems often use custom UDP protocols for better performance.
Build a UDP-based file transfer system with reliability mechanisms. Your system should implement packet sequencing to ensure files are reconstructed correctly, add acknowledgment and retransmission logic for reliable delivery, handle packet loss and corruption gracefully, demonstrate flow control to prevent overwhelming the receiver, and show how to build reliability on top of UDP's connectionless protocol. This exercise demonstrates the fundamental principles behind protocols like QUIC and custom streaming solutions where you need TCP-like reliability with UDP's performance characteristics.
Solution with Explanation
1package main
2
3import (
4 "encoding/binary"
5 "fmt"
6 "io"
7 "net"
8 "os"
9 "sync"
10 "time"
11)
13
14const (
15 MaxPacketSize = 1024
16 AckTimeout = 2 * time.Second
17 MaxRetries = 5
18)
19
20// Packet types
21const (
22 PacketTypeData byte = 1
23 PacketTypeAck byte = 2
24 PacketTypeFin byte = 3
25)
26
27// Packet structure: [Type(1)][SeqNum(4)][Payload(N)]
28type Packet struct {
29 Type byte
30 SeqNum uint32
31 Payload []byte
32}
33
34type ReliableFileTransfer struct {
35 conn *net.UDPConn
36 packetBuf sync.Map // seqNum -> packet
37 windowSize int
38 nextSeq uint32
39 lastAck uint32
40 fileSize int64
41 received int64
42}
43
44func NewReliableFileTransfer(conn *net.UDPConn) *ReliableFileTransfer {
45 return &ReliableFileTransfer{
46 conn: conn,
47 windowSize: 10,
48 nextSeq: 1,
49 lastAck: 0,
50 }
51}
52
53func (rft *ReliableFileTransfer) SendFile(filename string, remoteAddr *net.UDPAddr) error {
54 file, err := os.Open(filename)
55 if err != nil {
56 return fmt.Errorf("failed to open file: %w", err)
57 }
58 defer file.Close()
59
60 // Get file size
61 stat, _ := file.Stat()
62 rft.fileSize = stat.Size()
63
64 fmt.Printf("Sending file: %s (%d bytes)\n", filename, rft.fileSize)
65
66 // Send file in packets
67 buffer := make([]byte, MaxPacketSize-5) // 5 bytes for header
68 seqNum := uint32(1)
69
70 for {
71 n, err := file.Read(buffer)
72 if err == io.EOF {
73 break
74 }
75 if err != nil {
76 return fmt.Errorf("file read error: %w", err)
77 }
78
79 // Create data packet
80 packet := Packet{
81 Type: PacketTypeData,
82 SeqNum: seqNum,
83 Payload: buffer[:n],
84 }
85
86 // Send with retry logic
87 if err := rft.sendPacketWithRetry(packet, remoteAddr); err != nil {
88 return fmt.Errorf("failed to send packet %d: %w", seqNum, err)
89 }
90
91 fmt.Printf("Sent packet %d (%d bytes)\n", seqNum, n)
92 seqNum++
93 time.Sleep(10 * time.Millisecond) // Small delay to avoid overwhelming receiver
94 }
95
96 // Send FIN packet
97 finPacket := Packet{
98 Type: PacketTypeFin,
99 SeqNum: seqNum,
100 }
101 if err := rft.sendPacketWithRetry(finPacket, remoteAddr); err != nil {
102 return fmt.Errorf("failed to send FIN packet: %w", err)
103 }
104
105 fmt.Printf("File transfer completed\n")
106 return nil
107}
108
109func (rft *ReliableFileTransfer) sendPacketWithRetry(packet Packet, addr *net.UDPAddr) error {
110 data := rft.encodePacket(packet)
111
112 for attempt := 0; attempt < MaxRetries; attempt++ {
113 // Send packet
114 if _, err := rft.conn.WriteToUDP(data, addr); err != nil {
115 return err
116 }
117
118 // Wait for ACK
119 if rft.waitForAck(packet.SeqNum) {
120 return nil
121 }
122
123 fmt.Printf("Retry packet %d (attempt %d)\n", packet.SeqNum, attempt+1)
124 time.Sleep(AckTimeout)
125 }
126
127 return fmt.Errorf("max retries exceeded for packet %d", packet.SeqNum)
128}
129
130func (rft *ReliableFileTransfer) waitForAck(seqNum uint32) bool {
131 buffer := make([]byte, 1024)
132 rft.conn.SetReadDeadline(time.Now().Add(AckTimeout))
133
134 for {
135 n, _, err := rft.conn.ReadFromUDP(buffer)
136 if err != nil {
137 return false
138 }
139
140 packet := rft.decodePacket(buffer[:n])
141 if packet.Type == PacketTypeAck && packet.SeqNum == seqNum {
142 return true
143 }
144 }
145}
146
147func (rft *ReliableFileTransfer) ReceiveFile(outputFilename string) error {
148 file, err := os.Create(outputFilename)
149 if err != nil {
150 return fmt.Errorf("failed to create output file: %w", err)
151 }
152 defer file.Close()
153
154 expectedSeq := uint32(1)
155 receivedPackets := make(map[uint32][]byte)
156
157 fmt.Printf("Receiving file to: %s\n", outputFilename)
158
159 for {
160 buffer := make([]byte, MaxPacketSize)
161 rft.conn.SetReadDeadline(time.Now().Add(5 * time.Second))
162
163 n, addr, err := rft.conn.ReadFromUDP(buffer)
164 if err != nil {
165 if netErr, ok := err.(net.Error); ok && netErr.Timeout() {
166 // Check if we've received all packets
167 if len(receivedPackets) == 0 {
168 return nil // No more packets expected
169 }
170 continue
171 }
172 return err
173 }
174
175 packet := rft.decodePacket(buffer[:n])
176
177 switch packet.Type {
178 case PacketTypeData:
179 if packet.SeqNum == expectedSeq {
180 // Write to file in order
181 file.Write(packet.Payload)
182 fmt.Printf("Received packet %d (%d bytes)\n", packet.SeqNum, len(packet.Payload))
183 expectedSeq++
184
185 // Send ACK
186 ackPacket := Packet{Type: PacketTypeAck, SeqNum: packet.SeqNum}
187 ackData := rft.encodePacket(ackPacket)
188 rft.conn.WriteToUDP(ackData, addr)
189
190 // Check for out-of-order packets
191 for i := expectedSeq; i < expectedSeq+10; i++ {
192 if data, exists := receivedPackets[i]; exists {
193 file.Write(data)
194 fmt.Printf("Wrote out-of-order packet %d\n", i)
195 delete(receivedPackets, i)
196 expectedSeq++
197 } else {
198 break
199 }
200 }
201 } else if packet.SeqNum > expectedSeq {
202 // Out-of-order packet, store it
203 receivedPackets[packet.SeqNum] = packet.Payload
204 fmt.Printf("Stored out-of-order packet %d\n", packet.SeqNum)
205
206 // Send ACK
207 ackPacket := Packet{Type: PacketTypeAck, SeqNum: packet.SeqNum}
208 ackData := rft.encodePacket(ackPacket)
209 rft.conn.WriteToUDP(ackData, addr)
210 } else {
211 // Duplicate packet, just ACK it
212 ackPacket := Packet{Type: PacketTypeAck, SeqNum: packet.SeqNum}
213 ackData := rft.encodePacket(ackPacket)
214 rft.conn.WriteToUDP(ackData, addr)
215 }
216
217 case PacketTypeFin:
218 fmt.Printf("Received FIN packet\n")
219 // Send ACK for FIN
220 ackPacket := Packet{Type: PacketTypeAck, SeqNum: packet.SeqNum}
221 ackData := rft.encodePacket(ackPacket)
222 rft.conn.WriteToUDP(ackData, addr)
223 return nil
224 }
225 }
226}
227
228func (rft *ReliableFileTransfer) encodePacket(packet Packet) []byte {
229 data := make([]byte, 5+len(packet.Payload))
230 data[0] = packet.Type
231 binary.BigEndian.PutUint32(data[1:5], packet.SeqNum)
232 copy(data[5:], packet.Payload)
233 return data
234}
235
236func (rft *ReliableFileTransfer) decodePacket(data []byte) Packet {
237 if len(data) < 5 {
238 return Packet{}
239 }
240
241 return Packet{
242 Type: data[0],
243 SeqNum: binary.BigEndian.Uint32(data[1:5]),
244 Payload: data[5:],
245 }
246}
247
248func main() {
249 // Create test file
250 testData := []byte("This is a test file for UDP transfer.\n" +
251 "It contains multiple lines of text to demonstrate packet sequencing.\n" +
252 "Each line should arrive in order despite UDP's connectionless nature.\n" +
253 "Packet loss and reordering should be handled gracefully.\n")
254
255 filename := "test_transfer.txt"
256 if err := os.WriteFile(filename, testData, 0644); err != nil {
257 panic(err)
258 }
259 defer os.Remove(filename)
260
261 // Start receiver
262 receiverAddr, _ := net.ResolveUDPAddr("udp", "localhost:8888")
263 receiverConn, err := net.ListenUDP("udp", receiverAddr)
264 if err != nil {
265 panic(err)
266 }
267 defer receiverConn.Close()
268
269 go func() {
270 receiver := NewReliableFileTransfer(receiverConn)
271 if err := receiver.ReceiveFile("received_file.txt"); err != nil {
272 fmt.Printf("Receive error: %v\n", err)
273 } else {
274 fmt.Printf("File received successfully\n")
275 }
276 }()
277
278 time.Sleep(100 * time.Millisecond)
279
280 // Start sender
281 senderAddr, _ := net.ResolveUDPAddr("udp", "localhost:9999")
282 senderConn, err := net.ListenUDP("udp", senderAddr) // unconnected socket, so WriteToUDP works
283 if err != nil {
284 panic(err)
285 }
286 defer senderConn.Close()
287
288 sender := NewReliableFileTransfer(senderConn)
289 if err := sender.SendFile(filename, receiverAddr); err != nil {
290 fmt.Printf("Send error: %v\n", err)
291 }
292
293 time.Sleep(1 * time.Second)
294
295 // Verify received file
296 if receivedData, err := os.ReadFile("received_file.txt"); err == nil {
297 fmt.Printf("\nOriginal file size: %d bytes\n", len(testData))
298 fmt.Printf("Received file size: %d bytes\n", len(receivedData))
299 if string(testData) == string(receivedData) {
300 fmt.Printf("✓ File transfer successful\n")
301 } else {
302 fmt.Printf("✗ File transfer failed - data mismatch\n")
303 }
304 os.Remove("received_file.txt")
305 }
306}
307// run
Explanation:
This UDP file transfer system demonstrates:
- Packet Sequencing: Ensures packets are reassembled in correct order
- Reliability Layer: Implements ACK/retry mechanisms on top of UDP
- Flow Control: Handles out-of-order packets and manages transmission windows
- Error Recovery: Detects and handles packet loss with automatic retransmission
- Connection Management: Properly handles file transfer completion and cleanup
Production use cases:
- Custom streaming protocols for video/audio
- Gaming network protocols needing low latency
- IoT sensor data collection with reliability guarantees
- Financial data transmission with custom reliability requirements
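Because `encodePacket` and `decodePacket` are pure functions over the `[Type(1)][SeqNum(4)][Payload(N)]` layout, they can be verified in isolation before any network code runs. A small round-trip sketch (this variant returns an error for short packets instead of a zero `Packet`, a deviation from the exercise code):

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// Packet mirrors the exercise's wire format: a 1-byte type, a 4-byte
// big-endian sequence number, then the payload.
type Packet struct {
	Type    byte
	SeqNum  uint32
	Payload []byte
}

func encode(p Packet) []byte {
	buf := make([]byte, 5+len(p.Payload))
	buf[0] = p.Type
	binary.BigEndian.PutUint32(buf[1:5], p.SeqNum)
	copy(buf[5:], p.Payload)
	return buf
}

// decode reports an explicit error for truncated packets, which is easier
// to act on than silently returning a zero Packet.
func decode(data []byte) (Packet, error) {
	if len(data) < 5 {
		return Packet{}, fmt.Errorf("short packet: %d bytes", len(data))
	}
	return Packet{
		Type:    data[0],
		SeqNum:  binary.BigEndian.Uint32(data[1:5]),
		Payload: data[5:],
	}, nil
}

func main() {
	in := Packet{Type: 1, SeqNum: 42, Payload: []byte("hello")}
	out, err := decode(encode(in))
	if err != nil || out.SeqNum != 42 || !bytes.Equal(out.Payload, in.Payload) {
		panic("round trip failed")
	}
	fmt.Println("round trip ok, seq =", out.SeqNum)
}
```

Keeping the codec testable like this pays off when you extend the header, for example with a checksum or window-advertisement field.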
Exercise 5: Port Scanner
Difficulty: Intermediate | Time: 25-35 minutes
Learning Objectives:
- Master TCP connection scanning and port checking techniques
- Implement concurrent network probing with goroutine pools
- Learn network service detection and timeout management
Real-World Context: Port scanning is essential for network security auditing, service discovery, and system administration. Security professionals use port scanners to identify open services and potential vulnerabilities, while DevOps engineers use them for service discovery and network troubleshooting.
Build a concurrent port scanner that checks for open ports on target hosts. Your scanner should scan multiple ports concurrently using goroutine pools, handle TCP connection timeouts gracefully, detect common services by attempting to identify service banners, provide configurable scanning options, and demonstrate efficient network probing patterns. This exercise shows how to build network reconnaissance tools while understanding the fundamentals of TCP service discovery and network security assessment.
Solution with Explanation
1package main
2
3import (
4 "bufio"
5 "fmt"
6 "net"
7 "strings"
8 "sync"
9 "time"
10)
12
13type PortScanner struct {
14 MaxConcurrent int
15 Timeout time.Duration
16 Delay time.Duration
17}
18
19type ScanResult struct {
20 Port int
21 Open bool
22 Banner string
23 Error error
24}
25
26type ServiceSignature struct {
27 Name string
28 Banner string
29 Probe string
30 Timeout time.Duration
31}
32
33// Common service signatures for detection
34var serviceSignatures = []ServiceSignature{
35 {"HTTP", "HTTP", "GET / HTTP/1.0\r\n\r\n", 2 * time.Second},
36 {"SSH", "SSH-", "", 1 * time.Second},
37 {"FTP", "FTP", "", 1 * time.Second},
38 {"SMTP", "SMTP", "", 2 * time.Second},
39 {"POP3", "POP3", "", 1 * time.Second},
40 {"IMAP", "IMAP", "", 1 * time.Second},
41 {"Telnet", "", "", 1 * time.Second},
42}
43
44func NewPortScanner(maxConcurrent int, timeout time.Duration) *PortScanner {
45 return &PortScanner{
46 MaxConcurrent: maxConcurrent,
47 Timeout: timeout,
48 Delay: 10 * time.Millisecond, // Small delay between connections
49 }
50}
51
52func (ps *PortScanner) ScanHost(host string, ports []int) <-chan ScanResult {
53 results := make(chan ScanResult, len(ports))
54
55 // Create semaphore for concurrency control
56 semaphore := make(chan struct{}, ps.MaxConcurrent)
57 var wg sync.WaitGroup
58
59 for _, port := range ports {
60 wg.Add(1)
61 go func(p int) {
62 defer wg.Done()
63 semaphore <- struct{}{} // Acquire
64 defer func() { <-semaphore }() // Release
65
66 result := ps.scanPort(host, p)
67 results <- result
68
69 // Small delay to avoid overwhelming the target
70 time.Sleep(ps.Delay)
71 }(port)
72 }
73
74 // Close results channel when all scans complete
75 go func() {
76 wg.Wait()
77 close(results)
78 }()
79
80 return results
81}
82
83func (ps *PortScanner) scanPort(host string, port int) ScanResult {
84 result := ScanResult{Port: port}
85
86 // Connect with timeout
87 address := fmt.Sprintf("%s:%d", host, port)
88 conn, err := net.DialTimeout("tcp", address, ps.Timeout)
89 if err != nil {
90 result.Open = false
91 result.Error = err
92 return result
93 }
94 defer conn.Close()
95
96 result.Open = true
97
98 // Try to detect service and get banner
99 service, banner := ps.detectService(conn)
100 result.Banner = banner
101
102 if service != "" {
103 result.Banner = fmt.Sprintf("[%s] %s", service, banner)
104 }
105
106 return result
107}
108
109func (ps *PortScanner) detectService(conn net.Conn) (string, string) {
110 // Set read timeout for banner detection
111 conn.SetReadDeadline(time.Now().Add(2 * time.Second))
112
113 // Try to read initial banner
114 scanner := bufio.NewScanner(conn)
115 if scanner.Scan() {
116 banner := scanner.Text()
117 return identifyService(banner), banner
118 }
119
120 // Try active probing for services that don't send banners
121 for _, sig := range serviceSignatures {
122 if sig.Probe != "" {
123 conn.SetWriteDeadline(time.Now().Add(sig.Timeout))
124 if _, err := fmt.Fprint(conn, sig.Probe); err != nil {
125 continue
126 }
127 }
128
129 conn.SetReadDeadline(time.Now().Add(sig.Timeout))
130 if s := bufio.NewScanner(conn); s.Scan() { // fresh scanner: the old one sticks after a timeout
131 banner := s.Text()
132 if strings.Contains(banner, sig.Banner) {
133 return sig.Name, banner
134 }
135 }
136 }
137
138 return "", ""
139}
140
141func identifyService(banner string) string {
142 banner = strings.ToUpper(banner)
143
144 serviceMap := map[string]string{
145 "SSH": "SSH",
146 "FTP": "FTP",
147 "HTTP": "HTTP",
148 "SMTP": "SMTP",
149 "POP3": "POP3",
150 "IMAP": "IMAP",
151 }
152
153 for service, identifier := range serviceMap {
154 if strings.Contains(banner, identifier) {
155 return service
156 }
157 }
158
159 return ""
160}
161
162func (ps *PortScanner) ScanRange(host string, startPort, endPort int) <-chan ScanResult {
163 ports := make([]int, endPort-startPort+1)
164 for i := range ports {
165 ports[i] = startPort + i
166 }
167 return ps.ScanHost(host, ports)
168}
169
170func (ps *PortScanner) ScanCommonPorts(host string) <-chan ScanResult {
171 // Common ports to scan
172 commonPorts := []int{
173 21, // FTP
174 22, // SSH
175 23, // Telnet
176 25, // SMTP
177 53, // DNS
178 80, // HTTP
179 110, // POP3
180 143, // IMAP
181 443, // HTTPS
182 993, // IMAPS
183 995, // POP3S
184 1433, // MSSQL
185 1521, // Oracle
186 3306, // MySQL
187 3389, // RDP
188 5432, // PostgreSQL
189 5900, // VNC
190 6379, // Redis
191 8080, // HTTP Alt
192 8443, // HTTPS Alt
193 }
194
195 return ps.ScanHost(host, commonPorts)
196}
197
198func printResults(results <-chan ScanResult, target string) {
199 fmt.Printf("Scan results for %s:\n", target)
200 fmt.Println(strings.Repeat("=", 50))
201
202 openCount := 0
203 for result := range results {
204 if result.Open {
205 openCount++
206 status := "OPEN"
207 if result.Banner != "" {
208 fmt.Printf("Port %5d: %-8s %s\n", result.Port, status, result.Banner)
209 } else {
210 fmt.Printf("Port %5d: %s\n", result.Port, status)
211 }
212 } else {
213 // Optionally show closed ports with verbose mode
214 // fmt.Printf("Port %5d: CLOSED\n", result.Port, result.Error)
215 }
216 }
217
218 fmt.Println(strings.Repeat("=", 50))
219 fmt.Printf("Scan completed. %d open ports found.\n", openCount)
220}
221
222func main() {
223 // Create scanner
224 scanner := NewPortScanner(50, 2*time.Second)
225
226 // Target to scan
227 target := "localhost"
228
229 fmt.Printf("Starting port scan of %s...\n\n", target)
230
231 // Scan common ports first
232 fmt.Println("=== Scanning Common Ports ===")
233 results := scanner.ScanCommonPorts(target)
234 printResults(results, target)
235
236 // Scan a range of ports
237 fmt.Println("\n=== Scanning Port Range 8000-8010 ===")
238 rangeResults := scanner.ScanRange(target, 8000, 8010)
239 printResults(rangeResults, target)
240
241 // Demonstrate custom port list
242 fmt.Println("\n=== Scanning Custom Port List ===")
243 customPorts := []int{3000, 3001, 5000, 5001, 7000, 7001, 9000, 9001}
244 customResults := scanner.ScanHost(target, customPorts)
245 printResults(customResults, target)
246}
Explanation:
This port scanner demonstrates:
- Concurrent Scanning: Uses goroutine pools for efficient parallel port checking
- Timeout Management: Properly handles connection timeouts to avoid hanging
- Service Detection: Attempts to identify services through banners and active probing
- Rate Limiting: Includes delays to avoid overwhelming target systems
- Flexible Scanning: Supports port ranges, common ports, and custom port lists
Production considerations:
- Add stealth scanning techniques
- Implement IPv6 support
- Add more comprehensive service signatures database
- Include UDP port scanning capabilities
- Support multiple output formats (e.g., JSON, CSV)
- Implement scan result persistence and historical comparison
Further Reading
- Go net package documentation
- Effective Go - Concurrency
- Network Programming with Go
- TCP/IP Illustrated
- Go Concurrency Patterns
- Unix Network Programming
Summary
Network programming in Go is a journey from simple client-server communication to building robust, production-ready distributed systems. The patterns we've explored are the building blocks that power everything from web services to microservices architectures.
Key Takeaways
- TCP Programming
  - Building servers and clients with net.Listen and net.Dial
  - Handling concurrent connections with goroutines
  - Binary protocol implementation
  - Connection management and timeouts
- UDP Programming
  - Connectionless communication with net.ListenUDP
  - Multicast support for one-to-many communication
  - Implementing reliability on top of UDP
  - Packet-based communication patterns
- DNS Operations
  - Hostname and IP address resolution
  - MX, TXT, and SRV record lookups
  - Custom resolvers with caching
  - Service discovery patterns
- Connection Management
  - Connection pooling for resource efficiency
  - Timeout strategies
  - Keep-alive configuration
  - Health checking
- Custom Protocols
  - Text-based request-response protocols
  - Binary protocols with length prefixing
  - Streaming data protocols
  - Protocol versioning
- Production Patterns
  - Graceful shutdown with context
  - Circuit breaker for fault tolerance
  - Rate limiting to prevent overload
  - Connection health monitoring
Best Practices
- Always Use Timeouts: Never perform network operations without timeouts
- Handle Errors Gracefully: Network operations can fail in many ways
- Use Context: Propagate cancellation and timeouts through context
- Pool Connections: Reuse connections for better performance
- Implement Retry Logic: Use exponential backoff for transient failures
- Monitor Health: Regularly check connection and service health
- Graceful Shutdown: Always drain connections before shutting down
- Buffer Management: Be careful with buffer sizes to avoid memory issues
- Concurrency Safety: Protect shared state with proper synchronization
- Test Edge Cases: Network failures, timeouts, and partial reads/writes
Common Pitfalls
- Forgetting to close connections
- Not setting timeouts
- Ignoring partial reads/writes
- Not handling network errors properly
- Blocking operations in handlers
- Not limiting concurrent connections
- Exposing internal implementation in protocols
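For the "not limiting concurrent connections" pitfall, a buffered channel makes a simple counting semaphore. This sketch (handleAll is an illustrative name) simulates 20 connections arriving at once while allowing at most 3 to be handled concurrently:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// handleAll simulates handling conns connections while allowing at
// most maxConcurrent handlers to run simultaneously; it returns the
// peak concurrency actually observed.
func handleAll(conns, maxConcurrent int) int64 {
	sem := make(chan struct{}, maxConcurrent) // counting semaphore
	var inFlight, peak int64
	var wg sync.WaitGroup

	for i := 0; i < conns; i++ {
		wg.Add(1)
		sem <- struct{}{} // blocks while maxConcurrent handlers are active
		go func() {
			defer wg.Done()
			defer func() { <-sem }() // release the slot

			// Track the high-water mark of concurrent handlers.
			n := atomic.AddInt64(&inFlight, 1)
			for {
				p := atomic.LoadInt64(&peak)
				if n <= p || atomic.CompareAndSwapInt64(&peak, p, n) {
					break
				}
			}
			time.Sleep(5 * time.Millisecond) // simulated handler work
			atomic.AddInt64(&inFlight, -1)
		}()
	}
	wg.Wait()
	return atomic.LoadInt64(&peak)
}

func main() {
	fmt.Println("peak concurrent handlers:", handleAll(20, 3))
}
```

Because the accept loop itself blocks on the semaphore, the limit also provides backpressure: excess connections wait instead of exhausting file descriptors or memory.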
💡 Final Key Takeaway: Network programming is fundamentally about dealing with failure. Networks are unreliable, connections break, and services fail. The most robust network applications are those that expect and handle these failures gracefully, providing a good user experience even when things go wrong.