# Instrumentation Guide - Observability Platform

Complete guide for instrumenting your applications with metrics, logs, and traces.

## Table of Contents

1. [Metrics Instrumentation](#metrics-instrumentation)
2. [Log Instrumentation](#log-instrumentation)
3. [Trace Instrumentation](#trace-instrumentation)
4. [Best Practices](#best-practices)
5. [Examples](#examples)
6. [Troubleshooting Instrumentation](#troubleshooting-instrumentation)

---

## Metrics Instrumentation

### Using Prometheus Client Library

```go
import "github.com/prometheus/client_golang/prometheus"

// Define a request counter and a latency histogram
var (
    httpRequestsTotal = prometheus.NewCounterVec(
        prometheus.CounterOpts{
            Name: "http_requests_total",
            Help: "Total HTTP requests",
        },
        []string{"method", "status"},
    )

    httpDurationSeconds = prometheus.NewHistogramVec(
        prometheus.HistogramOpts{
            Name: "http_duration_seconds",
            Help: "HTTP request duration",
        },
        []string{"method"},
    )
)

// Register metrics
func init() {
    prometheus.MustRegister(httpRequestsTotal, httpDurationSeconds)
}
```

### Recording Metrics

```go
import (
    "net/http"
    "strconv"
    "time"
)

func handleRequest(w http.ResponseWriter, r *http.Request) {
    start := time.Now()
    status := http.StatusOK

    // Do work and set status accordingly...
    w.WriteHeader(status)

    // Record metrics: one counter increment and one latency observation per request
    httpRequestsTotal.WithLabelValues(r.Method, strconv.Itoa(status)).Inc()
    httpDurationSeconds.WithLabelValues(r.Method).Observe(time.Since(start).Seconds())
}
```
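
The handler above hardcodes `status`, but in a real handler the status code is often decided deeper in the call chain. One common way to capture it is to wrap `http.ResponseWriter`. The following is a minimal sketch using only the standard library; `statusRecorder`, `observeStatus`, `lastStatus`, and the `record` callback are hypothetical names, and `record` stands in for a call such as `httpRequestsTotal.WithLabelValues(...).Inc()`.

```go
package main

import (
    "fmt"
    "net/http"
    "net/http/httptest"
)

// statusRecorder wraps http.ResponseWriter and remembers the status code
// written by the handler (hypothetical helper, not a library type).
type statusRecorder struct {
    http.ResponseWriter
    status int
}

func (r *statusRecorder) WriteHeader(code int) {
    r.status = code
    r.ResponseWriter.WriteHeader(code)
}

// observeStatus runs next and passes the final status code to record.
// In the guide's setup, record would increment the request counter.
func observeStatus(next http.Handler, record func(method string, status int)) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        // Default to 200: handlers that never call WriteHeader send 200 implicitly.
        rec := &statusRecorder{ResponseWriter: w, status: http.StatusOK}
        next.ServeHTTP(rec, r)
        record(r.Method, rec.status)
    })
}

// Two example handlers: one sets a status explicitly, one relies on the default.
func notFoundHandler(w http.ResponseWriter, r *http.Request) {
    w.WriteHeader(http.StatusNotFound)
}

func implicitOKHandler(w http.ResponseWriter, r *http.Request) {
    fmt.Fprint(w, "ok")
}

// lastStatus sends one in-memory request through the wrapper and returns
// the status code that was recorded.
func lastStatus(h http.HandlerFunc) int {
    var got int
    wrapped := observeStatus(h, func(_ string, status int) { got = status })
    wrapped.ServeHTTP(httptest.NewRecorder(), httptest.NewRequest(http.MethodGet, "/", nil))
    return got
}

func main() {
    fmt.Println(lastStatus(notFoundHandler), lastStatus(implicitOKHandler))
}
```

Recording after `next.ServeHTTP` returns means the counter reflects the status the client actually received, including error paths that return early.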

### Gauge Metrics

```go
var (
    activeConnections = prometheus.NewGauge(
        prometheus.GaugeOpts{
            Name: "active_connections",
            Help: "Number of active connections",
        },
    )
)

// Increment when connection opens
activeConnections.Inc()

// Decrement when connection closes
activeConnections.Dec()

// Set explicit value
activeConnections.Set(float64(count))
```

### Exposing Metrics

```go
import (
    "log"
    "net/http"

    "github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
    router := http.NewServeMux()
    router.Handle("/metrics", promhttp.Handler())
    // ListenAndServe only returns on failure, so surface the error
    log.Fatal(http.ListenAndServe(":8080", router))
}
```

---

## Log Instrumentation

### Ingesting Logs

```go
import (
    "bytes"
    "encoding/json"
    "net/http"
)

type LogEntry struct {
    Level   string                 `json:"level"`
    Service string                 `json:"service"`
    Message string                 `json:"message"`
    Fields  map[string]interface{} `json:"fields"`
    TraceID string                 `json:"trace_id,omitempty"`
}

func logMessage(level, message string, fields map[string]interface{}) error {
    entry := LogEntry{
        Level:   level,
        Service: "my-service",
        Message: message,
        Fields:  fields,
    }

    body, err := json.Marshal(entry)
    if err != nil {
        return err
    }
    resp, err := http.Post(
        "http://localhost:8080/api/logs/ingest",
        "application/json",
        bytes.NewReader(body),
    )
    if err != nil {
        return err
    }
    defer resp.Body.Close()
    return nil
}
```
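
`logMessage` above uses the default HTTP client, which has no timeout, and it does not inspect the response status. The sketch below shows one possible variant with both. `postLog`, `fixedStatus`, and `ingestWithStatus` are hypothetical names, and treating any 2xx response as success is an assumption, since the ingest endpoint's exact status code is not specified here. A stub transport replaces the network so the example runs on its own.

```go
package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "io"
    "net/http"
    "strings"
    "time"
)

// LogEntry mirrors the struct defined in this guide.
type LogEntry struct {
    Level   string                 `json:"level"`
    Service string                 `json:"service"`
    Message string                 `json:"message"`
    Fields  map[string]interface{} `json:"fields"`
    TraceID string                 `json:"trace_id,omitempty"`
}

// postLog sends one entry and reports marshal, transport, and status errors.
func postLog(client *http.Client, url string, entry LogEntry) error {
    body, err := json.Marshal(entry)
    if err != nil {
        return fmt.Errorf("marshal log entry: %w", err)
    }
    resp, err := client.Post(url, "application/json", bytes.NewReader(body))
    if err != nil {
        return fmt.Errorf("post log entry: %w", err)
    }
    defer resp.Body.Close()
    // Assumption: any 2xx status means the entry was accepted.
    if resp.StatusCode < 200 || resp.StatusCode >= 300 {
        return fmt.Errorf("log ingest returned status %d", resp.StatusCode)
    }
    return nil
}

// fixedStatus is a stub RoundTripper that answers every request with the
// same status code, so the example needs no running server.
type fixedStatus int

func (s fixedStatus) RoundTrip(req *http.Request) (*http.Response, error) {
    return &http.Response{
        StatusCode: int(s),
        Header:     make(http.Header),
        Body:       io.NopCloser(strings.NewReader("")),
        Request:    req,
    }, nil
}

// ingestWithStatus posts a sample entry against the stub transport.
func ingestWithStatus(status int) error {
    client := &http.Client{Timeout: 5 * time.Second, Transport: fixedStatus(status)}
    entry := LogEntry{Level: "info", Service: "my-service", Message: "hello"}
    return postLog(client, "http://localhost:8080/api/logs/ingest", entry)
}

func main() {
    fmt.Println(ingestWithStatus(http.StatusAccepted))
    fmt.Println(ingestWithStatus(http.StatusInternalServerError))
}
```

In a real service the client would be created once and reused, without the stub transport.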

### Structured Logging Patterns

```go
// Error logs
logMessage("error", "Database connection failed", map[string]interface{}{
    "error":      "connection timeout",
    "host":       "db.example.com",
    "duration_ms": 5000,
    "retry_count": 3,
})

// Info logs
logMessage("info", "Request processed successfully", map[string]interface{}{
    "method":      "POST",
    "path":        "/api/orders",
    "status":      201,
    "duration_ms": 145,
})

// Warn logs
logMessage("warn", "High response time detected", map[string]interface{}{
    "duration_ms": 2500,
    "threshold":   1000,
    "endpoint":    "/api/search",
})

// Debug logs
logMessage("debug", "Cache hit", map[string]interface{}{
    "cache_key": "user:123",
    "ttl_remaining": 3600,
})
```

### Logger Wrapper

```go
type Logger struct {
    serviceName string
    traceID     string
}

func NewLogger(serviceName, traceID string) *Logger {
    return &Logger{serviceName: serviceName, traceID: traceID}
}

func (l *Logger) Error(message string, fields map[string]interface{}) {
    l.log("error", message, fields)
}

func (l *Logger) Warn(message string, fields map[string]interface{}) {
    l.log("warn", message, fields)
}

func (l *Logger) Info(message string, fields map[string]interface{}) {
    l.log("info", message, fields)
}

func (l *Logger) Debug(message string, fields map[string]interface{}) {
    l.log("debug", message, fields)
}

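func (l *Logger) log(level, message string, fields map[string]interface{}) {
    // Copy the fields so the caller's map is never mutated
    merged := make(map[string]interface{}, len(fields)+1)
    for k, v := range fields {
        merged[k] = v
    }
    if l.traceID != "" {
        merged["trace_id"] = l.traceID
    }
    // Note: logMessage hardcodes the service name, so l.serviceName is not sent.
    // Logging is best-effort here; the returned error is deliberately ignored.
    _ = logMessage(level, message, merged)
}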
```

---

## Trace Instrumentation

### OpenTelemetry Setup

```go
import (
    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/trace"
)

var tracer trace.Tracer

func init() {
    tracer = otel.Tracer("my-service")
}
```

### Creating Spans

```go
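import (
    "context"

    "go.opentelemetry.io/otel/attribute"
    "go.opentelemetry.io/otel/codes"
)

func processOrder(ctx context.Context, orderID string) error {
    ctx, span := tracer.Start(ctx, "processOrder")
    defer span.End()

    // Set attributes
    span.SetAttributes(
        attribute.String("order.id", orderID),
    )

    // Call other services, passing ctx so their spans become children of this one
    if err := validateOrder(ctx, orderID); err != nil {
        // RecordError adds an error event; SetStatus marks the span as failed
        span.RecordError(err)
        span.SetStatus(codes.Error, err.Error())
        return err
    }

    if err := processPayment(ctx, orderID); err != nil {
        span.RecordError(err)
        span.SetStatus(codes.Error, err.Error())
        return err
    }

    return nil
}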
```

### Child Spans

```go
func validateOrder(ctx context.Context, orderID string) error {
    _, span := tracer.Start(ctx, "validateOrder")
    defer span.End()

    // Validation logic
    return nil
}

func processPayment(ctx context.Context, orderID string) error {
    _, span := tracer.Start(ctx, "processPayment")
    defer span.End()

    // Payment processing
    return nil
}
```

### HTTP Client Instrumentation

```go
import (
    "net/http"

    "go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
)

// Create traced HTTP client
client := &http.Client{
    Transport: otelhttp.NewTransport(http.DefaultTransport),
}

// Use client normally - traces are automatic
resp, err := client.Get("http://api.example.com/data")
```

### HTTP Server Instrumentation

```go
import (
    "net/http"

    "go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
)

// Wrap handler
handler := otelhttp.NewHandler(
    http.HandlerFunc(handleRequest),
    "handleRequest",
)

http.Handle("/api/orders", handler)
```

---

## Best Practices

### 1. Metric Naming

```go
// Good - Clear, hierarchical
http_requests_total
http_request_duration_seconds
database_query_duration_seconds
cache_hits_total

// Avoid - Too vague
requests_total
duration_ms
queries
hits
```
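
Naming conventions like these can be checked mechanically, for example in a unit test. The following is a minimal sketch and not part of the Prometheus client library: `checkMetricName` is a hypothetical helper, and the three checks it performs (valid character set, `_total` suffix for counters, no millisecond suffix) are assumptions drawn from the examples in this section.

```go
package main

import (
    "fmt"
    "regexp"
    "strings"
)

// validName matches the traditional Prometheus metric name character set.
var validName = regexp.MustCompile(`^[a-zA-Z_:][a-zA-Z0-9_:]*$`)

// checkMetricName is a hypothetical lint helper. It returns the problems
// found in name; an empty result means none of the checks fired.
func checkMetricName(name string, isCounter bool) []string {
    var problems []string
    if !validName.MatchString(name) {
        problems = append(problems, "invalid characters")
    }
    if isCounter && !strings.HasSuffix(name, "_total") {
        problems = append(problems, "counter should end in _total")
    }
    if strings.HasSuffix(name, "_ms") || strings.HasSuffix(name, "_milliseconds") {
        problems = append(problems, "use base unit seconds instead of milliseconds")
    }
    return problems
}

func main() {
    fmt.Println(checkMetricName("http_requests_total", true))
    fmt.Println(checkMetricName("duration_ms", false))
    fmt.Println(checkMetricName("hits", true))
}
```

A check like this cannot tell whether a name is too vague (for example `requests_total` without a subsystem prefix); that still needs review by a person.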

### 2. Label Design

```go
// Good - Reasonable cardinality
httpRequestsTotal.WithLabelValues(method, status)
// Potential values: 5 methods × 5 statuses = 25 combinations

// Avoid - High cardinality (explosion)
httpRequestsTotal.WithLabelValues(method, userID)
// Potential values: 5 methods × millions of users = huge cardinality
```

### 3. Logging Levels

```
DEBUG   - Detailed debugging information
INFO    - General informational messages
WARN    - Warning messages (recoverable issues)
ERROR   - Error messages (serious problems)
```
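
Because the levels are ordered by severity, a service can drop entries below a configured minimum before sending them. The sketch below illustrates one way to do that; `levelRank` and `shouldLog` are hypothetical names, and falling back to `info` for an unrecognized minimum level is an assumption of this example.

```go
package main

import (
    "fmt"
    "strings"
)

// levelRank orders the four levels from least to most severe.
var levelRank = map[string]int{"debug": 0, "info": 1, "warn": 2, "error": 3}

// shouldLog reports whether an entry at level passes the minLevel threshold.
func shouldLog(level, minLevel string) bool {
    rank, ok := levelRank[strings.ToLower(level)]
    if !ok {
        // Unknown levels are emitted rather than silently dropped.
        return true
    }
    minRank, ok := levelRank[strings.ToLower(minLevel)]
    if !ok {
        // Assumption: an unrecognized threshold falls back to "info".
        minRank = levelRank["info"]
    }
    return rank >= minRank
}

func main() {
    for _, level := range []string{"DEBUG", "INFO", "WARN", "ERROR"} {
        fmt.Println(level, shouldLog(level, "warn"))
    }
}
```

A check like `shouldLog(level, minLevel)` would typically sit at the top of a logging helper so that filtered entries never cause an HTTP request.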

### 4. Trace Context Propagation

```go
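// Read the span context carried by the request context
// (populated when the handler is wrapped, e.g. with otelhttp.NewHandler)
ctx := r.Context()
spanCtx := trace.SpanContextFromContext(ctx)

// Pass context to dependent calls; Start returns a new context and the span
ctx, childSpan := tracer.Start(ctx, "childOperation")
defer childSpan.End()

// Log with trace ID
logger.Info("Processing request", map[string]interface{}{
    "trace_id": spanCtx.TraceID().String(),
})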
```

### 5. Cardinality Management

```go
// Good - Bounded cardinality
status := http.StatusOK  // Limited values
method := r.Method        // Limited values

// Avoid - Unbounded cardinality
userID := r.Header.Get("X-User-ID")  // Could be millions
requestPath := r.URL.Path            // Unbounded when paths contain IDs
```
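
If a path label is still wanted, one option is to collapse variable segments into a placeholder before using the value as a label. The following is a minimal sketch: `normalizePath` and `isNumeric` are hypothetical helpers, and treating every all-digit segment as an ID is an assumption that will not fit every URL scheme (UUIDs and slugs, for example, would need their own rules).

```go
package main

import (
    "fmt"
    "strings"
)

// normalizePath replaces all-digit path segments with ":id" so that the
// resulting label has a bounded set of values.
func normalizePath(path string) string {
    segments := strings.Split(path, "/")
    for i, seg := range segments {
        if isNumeric(seg) {
            segments[i] = ":id"
        }
    }
    return strings.Join(segments, "/")
}

// isNumeric reports whether s is non-empty and consists only of ASCII digits.
func isNumeric(s string) bool {
    if s == "" {
        return false
    }
    for _, r := range s {
        if r < '0' || r > '9' {
            return false
        }
    }
    return true
}

func main() {
    fmt.Println(normalizePath("/api/orders/12345"))
    fmt.Println(normalizePath("/api/search"))
}
```

When the router exposes the matched route template, using that template as the label value is usually more robust than rewriting the raw path.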

---

## Examples

### Complete Service Example

```go
package main

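import (
    "context"
    "fmt"
    "net/http"
    "time"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promhttp"
    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/attribute"
)

// logMessage is the helper defined under "Ingesting Logs".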

var (
    tracer = otel.Tracer("example-service")
    
    httpDuration = prometheus.NewHistogramVec(
        prometheus.HistogramOpts{
            Name: "http_duration_seconds",
            Help: "HTTP request duration",
        },
        []string{"method", "path"},
    )
)

func init() {
    prometheus.MustRegister(httpDuration)
}

func handleOrder(w http.ResponseWriter, r *http.Request) {
    ctx, span := tracer.Start(r.Context(), "handleOrder")
    defer span.End()

    start := time.Now()
    orderID := r.URL.Query().Get("id")
    
    span.SetAttributes(
        attribute.String("order.id", orderID),
    )

    // Validate order
    if err := validateOrder(ctx, orderID); err != nil {
        logMessage("error", "Order validation failed", map[string]interface{}{
            "order_id": orderID,
            "error":    err.Error(),
        })
        w.WriteHeader(http.StatusBadRequest)
        return
    }

    // Process payment
    if err := processPayment(ctx, orderID); err != nil {
        logMessage("error", "Payment processing failed", map[string]interface{}{
            "order_id": orderID,
            "error":    err.Error(),
        })
        w.WriteHeader(http.StatusPaymentRequired)
        return
    }

    // Success
    httpDuration.WithLabelValues("POST", "/orders").Observe(time.Since(start).Seconds())
    logMessage("info", "Order processed successfully", map[string]interface{}{
        "order_id":    orderID,
        "duration_ms": time.Since(start).Milliseconds(),
    })
    w.WriteHeader(http.StatusOK)
    fmt.Fprint(w, "Order processed")
}

func validateOrder(ctx context.Context, orderID string) error {
    _, span := tracer.Start(ctx, "validateOrder")
    defer span.End()
    // Validation logic
    return nil
}

func processPayment(ctx context.Context, orderID string) error {
    _, span := tracer.Start(ctx, "processPayment")
    defer span.End()
    // Payment logic
    return nil
}

func main() {
    http.HandleFunc("/orders", handleOrder)
    http.Handle("/metrics", promhttp.Handler())
    
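    fmt.Println("Server starting on :8080")
    // ListenAndServe only returns on failure, so report the error
    if err := http.ListenAndServe(":8080", nil); err != nil {
        fmt.Println("server error:", err)
    }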
}
```

### Querying Data

```bash
# Query logs
curl -X POST http://localhost:8080/api/logs/query \
  -H "Content-Type: application/json" \
  -d '{
    "service": "my-service",
    "level": "error",
    "start_time": "2024-01-01T00:00:00Z",
    "end_time": "2024-01-02T00:00:00Z",
    "limit": 100
  }' | jq

# Query metrics
curl -X POST http://localhost:8080/api/metrics/query \
  -H "Content-Type: application/json" \
  -d '{
    "metric": "http_duration_seconds",
    "labels": {"method": "POST"},
    "start_time": "2024-01-01T00:00:00Z",
    "end_time": "2024-01-02T00:00:00Z",
    "step": "60s"
  }' | jq

# Get trace
curl http://localhost:8080/api/traces/abc123def456 | jq

# Get service map
curl http://localhost:8080/api/servicemap | jq
```

---

## Troubleshooting Instrumentation

### Metrics Not Appearing

```go
// Ensure metrics are registered
prometheus.MustRegister(myMetric)

// Ensure /metrics endpoint is exposed
router.Handle("/metrics", promhttp.Handler())

// Verify metric is being recorded
myMetric.Inc()  // For counters
myMetric.Observe(value)  // For histograms
```

### Traces Not Showing

```go
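// Verify tracer is initialized. otel.Tracer returns a no-op tracer until
// a TracerProvider has been registered with otel.SetTracerProvider.
tracer := otel.Tracer("service-name")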

// Ensure context is propagated
ctx, span := tracer.Start(ctx, "operation")
defer span.End()

// Set required attributes
span.SetAttributes(attribute.String("key", "value"))
```

### Logs Not Ingested

```go
// Verify the endpoint is correct, for example:
//   http://observability-platform:8080/api/logs/ingest

// Ensure the payload marshals to valid JSON
logEntry := LogEntry{
    Level:   "error",
    Service: "service-name",
    Message: "error message",
}

// Check network connectivity from a shell:
//   curl -i http://localhost:8080/health
```

---

**For more information, refer to the main README.md or DEPLOYMENT.md**
