WBOY

Sep 05, 2024 pm 10:32 PM

Implementing an Order Processing System: Part 5 - Distributed Tracing and Logging

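1. Introduction and Goals

Welcome to the fifth installment of our series on implementing a sophisticated order processing system! In previous posts, we covered everything from setting up the basic architecture to implementing advanced workflows and comprehensive monitoring. Today, we dive into distributed tracing and logging, two crucial components for maintaining observability in a microservices architecture.

Recap of Previous Posts

In Part 1, we set up the project structure and implemented a basic CRUD API.
In Part 2, we focused on expanding our use of Temporal for complex workflows.
In Part 3, we delved into advanced database operations, including optimization and sharding.
In Part 4, we covered comprehensive monitoring and alerting with Prometheus and Grafana.

Importance of Distributed Tracing and Logging in a Microservices Architecture

In a microservices architecture, a single user request often spans multiple services. This distributed nature makes it hard to follow a request’s flow and to diagnose problems when they arise. Distributed tracing and centralized logging address these challenges by providing:

End-to-end visibility of request flow across services
Detailed insight into the performance of individual components
The ability to correlate events across different services
A centralized view of system behavior and health

Overview of OpenTelemetry and the ELK Stack

To implement distributed tracing and logging, we’ll use two powerful toolsets:

OpenTelemetry : An observability framework for cloud-native software that provides a single set of APIs, libraries, agents, and collector services for capturing distributed traces and metrics from your application.
ELK Stack : A collection of three products from Elastic (Elasticsearch, Logstash, and Kibana) that together provide a robust platform for log ingestion, storage, and visualization.

Goals for This Part of the Series

By the end of this post, you’ll be able to:

Implement distributed tracing across your microservices using OpenTelemetry
Set up centralized logging using the ELK stack
Correlate logs, traces, and metrics for a unified view of system behavior
Implement effective log aggregation and analysis strategies
Apply best practices for logging in a microservices architecture

Let’s dive in!

2. Theoretical Background and Concepts

Before we start the implementation, let’s review some key concepts that are important for our distributed tracing and logging setup.

Introduction to Distributed Tracing

Distributed tracing is a method of tracking a request as it flows through the various services in a distributed system. It provides a way to understand the full lifecycle of a request, including:

The path the request takes through the system
The services and resources it interacts with
The time spent in each service

A trace typically consists of one or more spans. A span represents a unit of work or operation. It tracks a specific operation that a request performs, recording when the operation started and ended, along with other data.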
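
To make the relationship between a trace and its spans concrete, here is a minimal sketch that uses only the standard library. The Span type, its field names, and the childrenOf helper are illustrative assumptions for this post; they are not OpenTelemetry’s actual data model, which carries more information.

```go
package main

import (
	"fmt"
	"time"
)

// Span is a hypothetical, simplified model of one unit of work in a trace.
type Span struct {
	TraceID  string // shared by every span that belongs to the same request
	SpanID   string
	ParentID string // empty for the root span
	Name     string
	Start    time.Time
	End      time.Time
}

// Duration reports how long the operation took.
func (s Span) Duration() time.Duration { return s.End.Sub(s.Start) }

// childrenOf returns the spans whose parent is the given span ID.
func childrenOf(spans []Span, parentID string) []Span {
	var out []Span
	for _, s := range spans {
		if s.ParentID == parentID {
			out = append(out, s)
		}
	}
	return out
}

func main() {
	t0 := time.Date(2024, 9, 5, 12, 0, 0, 0, time.UTC)
	// One trace: a root span with two child operations.
	trace := []Span{
		{TraceID: "t1", SpanID: "a", Name: "CreateOrder", Start: t0, End: t0.Add(120 * time.Millisecond)},
		{TraceID: "t1", SpanID: "b", ParentID: "a", Name: "validateOrder", Start: t0, End: t0.Add(5 * time.Millisecond)},
		{TraceID: "t1", SpanID: "c", ParentID: "a", Name: "processPayment", Start: t0.Add(5 * time.Millisecond), End: t0.Add(100 * time.Millisecond)},
	}
	for _, s := range childrenOf(trace, "a") {
		fmt.Printf("%s took %v\n", s.Name, s.Duration())
	}
}
```

A tracing backend performs essentially this grouping, by trace ID and parent ID, to draw the familiar waterfall view of a request.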

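Understanding the OpenTelemetry Project and Its Components

OpenTelemetry is an observability framework for cloud-native software. It provides a single set of APIs, libraries, agents, and collector services for capturing distributed traces and metrics from your application. Its key components are:

API : Provides the core data types and operations for tracing and metrics.
SDK : Implements the API and provides ways to configure and customize its behavior.
Instrumentation Libraries : Provide automatic instrumentation for popular frameworks and libraries.
Collector : Receives, processes, and exports telemetry data.

Overview of Logging Best Practices in Distributed Systems

Effective logging in distributed systems requires careful consideration:

Structured Logging : Use a consistent, structured format (such as JSON) for log entries to make parsing and analysis easier.
Correlation IDs : Include a unique identifier in log entries to trace a request across services.
Contextual Information : Include relevant context (user ID, order ID, and so on) in log entries.
Log Levels : Use appropriate log levels (DEBUG, INFO, WARN, ERROR) consistently across services.
Centralized Logging : Aggregate logs from all services in one central place for easier analysis.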
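
As a rough illustration of the first three practices, the sketch below renders one log entry as a single JSON line that carries a correlation ID and request context. It uses only the standard library; the LogEntry type and its field names are assumptions made for this example, so adapt them to whatever convention your services agree on.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// LogEntry is a hypothetical structured log record.
type LogEntry struct {
	Timestamp string `json:"timestamp"`
	Level     string `json:"level"`
	Service   string `json:"service"`
	TraceID   string `json:"trace_id"` // correlation ID shared across services
	OrderID   int64  `json:"order_id,omitempty"`
	Message   string `json:"message"`
}

// encodeEntry renders one entry as a single JSON line.
func encodeEntry(e LogEntry) (string, error) {
	b, err := json.Marshal(e)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	line, err := encodeEntry(LogEntry{
		Timestamp: time.Now().UTC().Format(time.RFC3339),
		Level:     "INFO",
		Service:   "order-processing-service",
		TraceID:   "4bf92f3577b34da6a3ce929d0e0e4736",
		OrderID:   42,
		Message:   "Starting order creation",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(line)
}
```

Because every service emits the same keys, a log pipeline can index them without per-service parsing rules.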

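Introduction to the ELK (Elasticsearch, Logstash, Kibana) Stack

The ELK stack is a popular choice for log management:

Elasticsearch : A distributed, RESTful search and analytics engine capable of handling large volumes of data.
Logstash : A server-side data processing pipeline that ingests data from multiple sources, transforms it, and sends it to Elasticsearch.
Kibana : A visualization layer that works on top of Elasticsearch, providing a user interface for searching, viewing, and interacting with the data.

Log Aggregation and Analysis Concepts

Log aggregation involves collecting log data from various sources and storing it in a centralized location. This allows for:

Easier searching and analysis of logs across multiple services
Correlation of events across different components of the system
Long-term storage and archiving of log data

Log analysis involves extracting meaningful insights from log data. This can include:

Identifying patterns and trends
Detecting anomalies and errors
Monitoring system health and performance
Supporting root cause analysis during incident response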
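
To give a feel for what log analysis means in practice, here is a small standard-library sketch that computes the error rate per service from aggregated JSON log lines. It assumes each line has "service" and "level" fields; those names, and the errorRates helper itself, are assumptions for illustration. In this series the same question is answered by queries in Elasticsearch and Kibana rather than by application code.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// entry holds the only two fields this analysis needs.
type entry struct {
	Service string `json:"service"`
	Level   string `json:"level"`
}

// errorRates returns, per service, the fraction of entries logged at error
// level. Lines that are not valid JSON or have no service are skipped.
func errorRates(logs string) map[string]float64 {
	total := map[string]int{}
	errs := map[string]int{}
	sc := bufio.NewScanner(strings.NewReader(logs))
	for sc.Scan() {
		var e entry
		if json.Unmarshal(sc.Bytes(), &e) != nil || e.Service == "" {
			continue
		}
		total[e.Service]++
		if strings.EqualFold(e.Level, "error") {
			errs[e.Service]++
		}
	}
	rates := make(map[string]float64, len(total))
	for svc, n := range total {
		rates[svc] = float64(errs[svc]) / float64(n)
	}
	return rates
}

func main() {
	logs := `{"service":"orders","level":"info","message":"Starting order creation"}
{"service":"orders","level":"error","message":"Payment processing failed"}
{"service":"payments","level":"info","message":"Charge accepted"}`
	for svc, rate := range errorRates(logs) {
		fmt.Printf("%s: %.0f%% errors\n", svc, rate*100)
	}
}
```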

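With these concepts in mind, let’s move on to implementing distributed tracing in our order processing system.

3. Implementing Distributed Tracing with OpenTelemetry

Let’s start by implementing distributed tracing in our order processing system using OpenTelemetry.

Setting Up OpenTelemetry in Our Go Services

First, we need to add OpenTelemetry to our Go services. Add the following dependencies to your go.mod file: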

require (
    go.opentelemetry.io/otel v1.7.0
    go.opentelemetry.io/otel/exporters/jaeger v1.7.0
    go.opentelemetry.io/otel/sdk v1.7.0
    go.opentelemetry.io/otel/trace v1.7.0
)

Next, let’s set up the tracer provider in our main function:

package main

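import (
    "context"
    "log"

    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/attribute"
    "go.opentelemetry.io/otel/exporters/jaeger"
    "go.opentelemetry.io/otel/propagation"
    "go.opentelemetry.io/otel/sdk/resource"
    tracesdk "go.opentelemetry.io/otel/sdk/trace"
    semconv "go.opentelemetry.io/otel/semconv/v1.4.0"
)

// initTracer configures a global tracer provider that exports to Jaeger and
// returns a function that flushes and shuts the provider down.
func initTracer() func() {
    exporter, err := jaeger.New(jaeger.WithCollectorEndpoint(jaeger.WithEndpoint("http://jaeger:14268/api/traces")))
    if err != nil {
        log.Fatal(err)
    }
    tp := tracesdk.NewTracerProvider(
        tracesdk.WithBatcher(exporter),
        tracesdk.WithResource(resource.NewWithAttributes(
            semconv.SchemaURL,
            semconv.ServiceNameKey.String("order-processing-service"),
            attribute.String("environment", "production"),
        )),
    )
    otel.SetTracerProvider(tp)
    // Register a W3C Trace Context propagator; without one, instrumentation
    // such as otelhttp has nothing to inject into outgoing requests.
    otel.SetTextMapPropagator(propagation.NewCompositeTextMapPropagator(
        propagation.TraceContext{},
        propagation.Baggage{},
    ))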
    return func() {
        if err := tp.Shutdown(context.Background()); err != nil {
            log.Printf("Error shutting down tracer provider: %v", err)
        }
    }
}

func main() {
    cleanup := initTracer()
    defer cleanup()

    // Rest of your main function...
}

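This sets up a tracer provider that exports traces to Jaeger, a popular distributed tracing backend.

Instrumenting Our Order Processing Workflow with Traces

Now, let’s add tracing to our order processing workflow. We’ll start with the CreateOrder function: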

import (
    "context"

    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/attribute"
    "go.opentelemetry.io/otel/codes"
)

func CreateOrder(ctx context.Context, order Order) error {
    tr := otel.Tracer("order-processing")
    ctx, span := tr.Start(ctx, "CreateOrder")
    defer span.End()

    span.SetAttributes(attribute.Int64("order.id", order.ID))
    span.SetAttributes(attribute.Float64("order.total", order.Total))

    // Validate order
    if err := validateOrder(ctx, order); err != nil {
        span.RecordError(err)
        span.SetStatus(codes.Error, "Order validation failed")
        return err
    }

    // Process payment
    if err := processPayment(ctx, order); err != nil {
        span.RecordError(err)
        span.SetStatus(codes.Error, "Payment processing failed")
        return err
    }

    // Update inventory
    if err := updateInventory(ctx, order); err != nil {
        span.RecordError(err)
        span.SetStatus(codes.Error, "Inventory update failed")
        return err
    }

    span.SetStatus(codes.Ok, "Order created successfully")
    return nil
}

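This creates a new span for the CreateOrder function and adds relevant attributes. Because ctx is passed on to validateOrder, processPayment, and updateInventory, any spans those functions start become children of this span.

Propagating Context Across Service Boundaries

When making calls to other services, we need to propagate the trace context. Here’s an example of how to do this with an HTTP client: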

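import (
    "context"
    "net/http"

    "go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
)

func callExternalService(ctx context.Context, url string) error {
    // otelhttp.NewTransport wraps the transport so each request gets a client
    // span and the trace context is injected into its headers.
    client := http.Client{Transport: otelhttp.NewTransport(http.DefaultTransport)}
    req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
    if err != nil {
        return err
    }
    resp, err := client.Do(req)
    if err != nil {
        return err
    }
    // Close the body so the connection can be reused.
    defer resp.Body.Close()
    return nil
}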

This uses the otelhttp package to propagate the trace context in HTTP headers automatically, using the propagator registered with otel.SetTextMapPropagator.
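
To show what actually travels between services, here is a minimal standard-library sketch of the W3C Trace Context traceparent header, the format used when the TraceContext propagator is registered. The formatTraceparent and parseTraceparent helpers and the inventory URL are hypothetical names for this example; in real code, leave this work to the OpenTelemetry propagator.

```go
package main

import (
	"fmt"
	"net/http"
	"strconv"
	"strings"
)

// formatTraceparent builds a W3C Trace Context header value: version "00",
// a 32-hex-digit trace ID, a 16-hex-digit parent span ID, and a flags byte
// whose lowest bit marks the trace as sampled.
func formatTraceparent(traceID, spanID string, sampled bool) string {
	flags := "00"
	if sampled {
		flags = "01"
	}
	return "00-" + traceID + "-" + spanID + "-" + flags
}

// parseTraceparent splits a header value back into its parts. It checks field
// lengths only; a production parser would also validate the hex digits.
func parseTraceparent(value string) (traceID, spanID string, sampled, ok bool) {
	parts := strings.Split(value, "-")
	if len(parts) != 4 || len(parts[0]) != 2 || len(parts[1]) != 32 ||
		len(parts[2]) != 16 || len(parts[3]) != 2 {
		return "", "", false, false
	}
	flags, err := strconv.ParseUint(parts[3], 16, 8)
	if err != nil {
		return "", "", false, false
	}
	return parts[1], parts[2], flags&1 == 1, true
}

func main() {
	// The calling service injects the header into the outgoing request.
	req, err := http.NewRequest(http.MethodGet, "http://inventory/reserve", nil)
	if err != nil {
		panic(err)
	}
	req.Header.Set("traceparent",
		formatTraceparent("4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7", true))

	// The receiving service extracts it and continues the same trace.
	traceID, parentID, sampled, ok := parseTraceparent(req.Header.Get("traceparent"))
	fmt.Println(traceID, parentID, sampled, ok)
}
```

The receiving service starts its own span with the extracted trace ID and uses the extracted span ID as the parent, which is how spans from different processes end up in one trace.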

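Handling Asynchronous Operations and Background Jobs

For asynchronous operations, we need to make sure the trace context is passed along correctly. Here’s an example using a worker pool: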

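func processOrderAsync(ctx context.Context, order Order) {
    tr := otel.Tracer("order-processing")
    _, span := tr.Start(ctx, "processOrderAsync")

    // workerPool is assumed to be a chan func() drained by worker goroutines.
    workerPool <- func() {
        // End the span when the work finishes, not when processOrderAsync
        // returns; status and errors set on an already-ended span are dropped.
        defer span.End()

        // Use a fresh context so the job is not cancelled with the request,
        // but keep the span so child spans attach to the same trace.
        processCtx := trace.ContextWithSpan(context.Background(), span)
        if err := processOrder(processCtx, order); err != nil {
            span.RecordError(err)
            span.SetStatus(codes.Error, "Async order processing failed")
        } else {
            span.SetStatus(codes.Ok, "Async order processing succeeded")
        }
    }
}

This creates a new span for the asynchronous operation and hands it to the worker function, which ends it once processing completes.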

Integrating OpenTelemetry with Temporal Workflows

To integrate OpenTelemetry with Temporal workflows, we can use the tracing interceptor from the Temporal Go SDK’s go.temporal.io/sdk/contrib/opentelemetry package:

import (
    "go.temporal.io/sdk/client"
    "go.temporal.io/sdk/contrib/opentelemetry"
    "go.temporal.io/sdk/interceptor"
    "go.temporal.io/sdk/worker"
)

func initTemporalClient() (client.Client, error) {
    // The interceptor uses the global tracer provider unless TracerOptions says otherwise.
    tracingInterceptor, err := opentelemetry.NewTracingInterceptor(opentelemetry.TracerOptions{})
    if err != nil {
        return nil, err
    }
    return client.Dial(client.Options{
        HostPort:     "temporal:7233",
        Interceptors: []interceptor.ClientInterceptor{tracingInterceptor},
    })
}

func initTemporalWorker(c client.Client, taskQueue string) worker.Worker {
    // Workers created from the client reuse its tracing interceptor, so
    // workflow and activity executions are traced as well.
    return worker.New(c, taskQueue, worker.Options{})
}

This sets up the Temporal client and worker with OpenTelemetry instrumentation.

Exporting Traces to a Backend (e.g., Jaeger)

We’ve already configured Jaeger as our tracing backend in the initTracer function. To visualize our traces, we need to add Jaeger to our docker-compose.yml:

services:
  # ... other services ...

  jaeger:
    image: jaegertracing/all-in-one:1.35
    ports:
      - "16686:16686"
      - "14268:14268"
    environment:
      - COLLECTOR_OTLP_ENABLED=true

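Now you can access the Jaeger UI at http://localhost:16686 to view and analyze your traces.

In the next section, we’ll set up centralized logging using the ELK stack to complement our distributed tracing setup.

4. Setting Up Centralized Logging with the ELK Stack

Now that we have distributed tracing in place, let’s set up centralized logging using the ELK (Elasticsearch, Logstash, Kibana) stack.

Installing and Configuring Elasticsearch

First, let’s add Elasticsearch to our docker-compose.yml: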

services:
  # ... other services ...

  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.14.0
    environment:
      - discovery.type=single-node
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ports:
      - "9200:9200"
    volumes:
      - elasticsearch_data:/usr/share/elasticsearch/data

volumes:
  elasticsearch_data:
    driver: local

This sets up a single-node Elasticsearch instance for development purposes.

Setting up Logstash for Log Ingestion and Processing

Next, let’s add Logstash to our docker-compose.yml:

services:
  # ... other services ...

  logstash:
    image: docker.elastic.co/logstash/logstash:7.14.0
    volumes:
      - ./logstash/pipeline:/usr/share/logstash/pipeline
    ports:
      - "5000:5000/tcp"
      - "5000:5000/udp"
      - "9600:9600"
    depends_on:
      - elasticsearch

Create a Logstash pipeline configuration file at ./logstash/pipeline/logstash.conf:

input {
  tcp {
    port => 5000
    codec => json
  }
}

filter {
  if [trace_id] {
    mutate {
      add_field => { "[@metadata][trace_id]" => "%{trace_id}" }
    }
  }
}

output {
  elasticsearch {
    hosts => ["elasticsearch:9200"]
    index => "order-processing-logs-%{+YYYY.MM.dd}"
  }
}

This configuration sets up Logstash to receive JSON logs over TCP, process them, and forward them to Elasticsearch.

Configuring Kibana for Log Visualization

Now, let’s add Kibana to our docker-compose.yml:

services:
  # ... other services ...

  kibana:
    image: docker.elastic.co/kibana/kibana:7.14.0
    ports:
      - "5601:5601"
    environment:
      ELASTICSEARCH_URL: http://elasticsearch:9200
      ELASTICSEARCH_HOSTS: '["http://elasticsearch:9200"]'
    depends_on:
      - elasticsearch

You can access the Kibana UI at http://localhost:5601 once it’s up and running.

Implementing Structured Logging in our Go Services

To produce structured logs, we’ll use the logrus library. First, add it to your module:

go get github.com/sirupsen/logrus

Now, let’s set up a logger in our main function:

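import (
    "github.com/olivere/elastic/v7"
    "github.com/sirupsen/logrus"
    "gopkg.in/sohlich/elogrus.v7"
)

func initLogger() *logrus.Logger {
    log := logrus.New()
    log.SetFormatter(&logrus.JSONFormatter{})

    // elogrus needs an Elasticsearch client. Sniffing is disabled because node
    // addresses advertised inside Docker are usually not reachable as-is.
    client, err := elastic.NewClient(
        elastic.SetURL("http://elasticsearch:9200"),
        elastic.SetSniff(false),
    )
    if err != nil {
        log.Fatalf("Failed to create Elasticsearch client: %v", err)
    }

    // Ship entries at warning level and above to the order-processing-logs index.
    // The second argument is the host name recorded with each entry.
    hook, err := elogrus.NewElasticHook(client, "order-processing-service", logrus.WarnLevel, "order-processing-logs")
    if err != nil {
        log.Fatalf("Failed to create Elasticsearch hook: %v", err)
    }
    log.AddHook(hook)

    return log
}

func main() {
    log := initLogger()
    log.Info("Logger initialized")

    // Rest of your main function...
}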

This sets up a JSON formatter for our logs and adds an Elasticsearch hook that ships entries at warning level and above directly to Elasticsearch, bypassing Logstash.

Sending Logs from our Services to the ELK Stack

Now, let’s update our CreateOrder function to use structured logging:

func CreateOrder(ctx context.Context, order Order) error {
    tr := otel.Tracer("order-processing")
    ctx, span := tr.Start(ctx, "CreateOrder")
    defer span.End()

    logger := logrus.WithFields(logrus.Fields{
        "order_id": order.ID,
        "trace_id": span.SpanContext().TraceID().String(),
    })

    logger.Info("Starting order creation")

    // Validate order
    if err := validateOrder(ctx, order); err != nil {
        logger.WithError(err).Error("Order validation failed")
        span.RecordError(err)
        span.SetStatus(codes.Error, "Order validation failed")
        return err
    }

    // Process payment
    if err := processPayment(ctx, order); err != nil {
        logger.WithError(err).Error("Payment processing failed")
        span.RecordError(err)
        span.SetStatus(codes.Error, "Payment processing failed")
        return err
    }

    // Update inventory
    if err := updateInventory(ctx, order); err != nil {
        logger.WithError(err).Error("Inventory update failed")
        span.RecordError(err)
        span.SetStatus(codes.Error, "Inventory update failed")
        return err
    }

    logger.Info("Order created successfully")
    span.SetStatus(codes.Ok, "Order created successfully")
    return nil
}

This code logs each step of the order creation process, including any errors that occur. It also includes the trace ID in each log entry, which will be crucial for correlating logs with traces.

5. Correlating Logs, Traces, and Metrics

Now that we have both distributed tracing and centralized logging set up, let’s explore how to correlate this information for a unified view of system behavior.

Implementing Correlation IDs Across Logs and Traces

We’ve already included the trace ID in our log entries. To make this correlation even more powerful, we can add a custom field to our spans that includes the log index:

span.SetAttributes(attribute.String("log.index", "order-processing-logs-"+time.Now().Format("2006.01.02")))

This allows us to easily jump from a span in Jaeger to the corresponding logs in Kibana.

Adding Trace IDs to Log Entries

We’ve already added trace IDs to our log entries in the previous section. This allows us to search for all log entries related to a particular trace in Kibana.

Linking Metrics to Traces Using Exemplars

To link our Prometheus metrics to traces, we can use exemplars. Here’s an example of how to do this:

import (
    "context"
    "time"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promauto"
)

var (
    orderProcessingDuration = promauto.NewHistogramVec(
        prometheus.HistogramOpts{
            Name: "order_processing_duration_seconds",
            Help: "Duration of order processing in seconds",
            Buckets: prometheus.DefBuckets,
        },
        []string{"status"},
    )
)

func CreateOrder(ctx context.Context, order Order) error {
    // ... existing code ...

    start := time.Now()
    // ... process order ...
    duration := time.Since(start)

    // Histogram observers also implement prometheus.ExemplarObserver, which
    // attaches the trace ID to this observation as an exemplar.
    observer := orderProcessingDuration.WithLabelValues("success")
    observer.(prometheus.ExemplarObserver).ObserveWithExemplar(duration.Seconds(), prometheus.Labels{
        "trace_id": span.SpanContext().TraceID().String(),
    })

    // ... rest of the function ...
}

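This adds the trace ID as an exemplar to our order processing duration metric. Exemplars are only exposed when the metrics endpoint serves the OpenMetrics format, for example by using promhttp.HandlerFor with EnableOpenMetrics set to true.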

Creating a Unified View of System Behavior

With logs, traces, and metrics all correlated, we can create a unified view of our system’s behavior:

In Grafana, create a dashboard that includes both Prometheus metrics and Elasticsearch logs.
Use the trace ID to link from a metric to the corresponding trace in Jaeger.
From Jaeger, use the log index attribute to link to the corresponding logs in Kibana.

This allows you to seamlessly navigate between metrics, traces, and logs, providing a comprehensive view of your system’s behavior and making it easier to debug issues.

6. Log Aggregation and Analysis

With our logs centralized in Elasticsearch, let’s explore some strategies for effective log aggregation and analysis.

Designing Effective Log Aggregation Strategies

Use Consistent Log Formats : Ensure all services use the same log format (in our case, JSON) with consistent field names.
Include Relevant Context : Always include relevant context in logs, such as order ID, user ID, and trace ID.
Use Log Levels Appropriately : Use DEBUG for detailed information, INFO for general information, WARN for potential issues, and ERROR for actual errors.
Aggregate Logs by Service : Use different Elasticsearch indices or index patterns for different services to allow for easier analysis.

Implementing Log Sampling for High-Volume Services

For high-volume services, logging every event can be prohibitively expensive. Implement log sampling to reduce the volume while still maintaining visibility:

func shouldLog() bool {
    return rand.Float32() < 0.1 // Log 10% of events
}

func CreateOrder(ctx context.Context, order Order) error {
    // ... existing code ...

    if shouldLog() {
        logger.Info("Order created successfully")
    }

    // ... rest of the function ...
}
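
One drawback of purely random sampling is that a single request can end up with some of its log entries kept and others dropped. As a hedged alternative, the sketch below derives the decision from the trace ID, so every entry belonging to one trace is kept or dropped together, and it never samples away warnings or errors. The function names and level strings are assumptions for this example, and only the standard library is used.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shouldLogTrace reports whether entries for traceID should be kept at the
// given rate (0 to 1). Hashing the ID makes the decision deterministic, so
// the same trace always gets the same answer in every service.
func shouldLogTrace(traceID string, rate float64) bool {
	if rate >= 1 {
		return true
	}
	if rate <= 0 {
		return false
	}
	h := fnv.New32a()
	h.Write([]byte(traceID))
	// Map the 32-bit hash onto [0, 1) and compare it with the rate.
	return float64(h.Sum32())/float64(1<<32) < rate
}

// shouldLog keeps every warning and error and samples everything else.
func shouldLog(level, traceID string, rate float64) bool {
	if level == "WARN" || level == "ERROR" {
		return true
	}
	return shouldLogTrace(traceID, rate)
}

func main() {
	traceID := "4bf92f3577b34da6a3ce929d0e0e4736"
	fmt.Println("info kept: ", shouldLog("INFO", traceID, 0.1))
	fmt.Println("error kept:", shouldLog("ERROR", traceID, 0.1))
}
```

If the same rate is used for trace sampling, the traces you keep are more likely to have their full set of logs available.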

Creating Kibana Dashboards for Log Analysis

In Kibana, create dashboards that provide insights into your system’s behavior. Some useful visualizations might include:

Number of orders created over time
Distribution of order processing times
Error rate by service
Most common error types

Implementing Alerting Based on Log Patterns

Use Kibana’s alerting features to set up alerts based on log patterns. For example:

Alert when the error rate exceeds a certain threshold
Alert on specific error messages that indicate critical issues
Alert when order processing time exceeds a certain duration

Using Machine Learning for Anomaly Detection in Logs

Elasticsearch provides machine learning capabilities that can be used for anomaly detection in logs. You can set up machine learning jobs in Kibana to detect:

Unusual spikes in error rates
Abnormal patterns in order creation
Unexpected changes in log volume

These machine learning insights can help you identify issues before they become critical problems.

In the next sections, we’ll cover best practices for logging in a microservices architecture and explore some advanced OpenTelemetry techniques.

7. Best Practices for Logging in a Microservices Architecture

When implementing logging in a microservices architecture, there are several best practices to keep in mind to ensure your logs are useful, manageable, and secure.

Standardizing Log Formats Across Services

Consistency in log formats across all your services is crucial for effective log analysis. In our Go services, we can create a custom logger that enforces a standard format:

import (
    "github.com/sirupsen/logrus"
)

type StandardLogger struct {
    *logrus.Logger
    ServiceName string
}

func NewStandardLogger(serviceName string) *StandardLogger {
    logger := logrus.New()
    logger.SetFormatter(&logrus.JSONFormatter{
        FieldMap: logrus.FieldMap{
            logrus.FieldKeyTime: "timestamp",
            logrus.FieldKeyLevel: "severity",
            logrus.FieldKeyMsg: "message",
        },
    })
    return &StandardLogger{
        Logger: logger,
        ServiceName: serviceName,
    }
}

func (l *StandardLogger) WithFields(fields logrus.Fields) *logrus.Entry {
    return l.Logger.WithFields(logrus.Fields{
        "service": l.ServiceName,
    }).WithFields(fields)
}

This logger ensures that all log entries include a “service” field and use consistent field names.

Implementing Contextual Logging

Contextual logging involves including relevant context with each log entry. In a microservices architecture, this often means including a request ID or trace ID that can be used to correlate logs across services:

func CreateOrder(ctx context.Context, logger *StandardLogger, order Order) error {
    tr := otel.Tracer("order-processing")
    ctx, span := tr.Start(ctx, "CreateOrder")
    defer span.End()

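    // Use a new name: the parameter logger cannot be redeclared with :=,
    // and WithFields returns a *logrus.Entry rather than a *StandardLogger.
    log := logger.WithFields(logrus.Fields{
        "order_id": order.ID,
        "trace_id": span.SpanContext().TraceID().String(),
    })

    log.Info("Starting order creation")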

    // ... rest of the function ...
}

Handling Sensitive Information in Logs

It’s crucial to ensure that sensitive information, such as personal data or credentials, is not logged. You can create a custom log hook to redact sensitive information:

type SensitiveDataHook struct{}

func (h *SensitiveDataHook) Levels() []logrus.Level {
    return logrus.AllLevels
}

func (h *SensitiveDataHook) Fire(entry *logrus.Entry) error {
    if entry.Data["credit_card"] != nil {
        entry.Data["credit_card"] = "REDACTED"
    }
    return nil
}

// In your main function:
logger.AddHook(&SensitiveDataHook{})
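
The hook above only covers a known field name. Sensitive values can also slip into free-text messages, so a pattern-based pass is a useful second line of defense. The following standard-library sketch masks anything that looks like a payment card number. The cardPattern expression and the redactCardNumbers helper are assumptions for illustration; the pattern is deliberately broad and will also mask other long digit runs, so tune it for your data.

```go
package main

import (
	"fmt"
	"regexp"
)

// cardPattern matches 13 to 16 digits, optionally separated by single spaces
// or dashes. It does not verify checksums, so expect some false positives.
var cardPattern = regexp.MustCompile(`\b(?:\d[ -]?){12,15}\d\b`)

// redactCardNumbers replaces anything that looks like a card number.
func redactCardNumbers(s string) string {
	return cardPattern.ReplaceAllString(s, "[REDACTED]")
}

func main() {
	fmt.Println(redactCardNumbers("card 4111 1111 1111 1111 declined"))
	fmt.Println(redactCardNumbers("order 123 created"))
}
```

A function like this could be called from a hook’s Fire method on entry.Message before the entry is written.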

Managing Log Retention and Rotation

In a production environment, you need to manage log retention and rotation to control storage costs and comply with data retention policies. While Elasticsearch can handle this to some extent, you might also want to implement log rotation at the application level:

import (
    "github.com/sirupsen/logrus"
    "gopkg.in/natefinch/lumberjack.v2"
)

func initLogger() *logrus.Logger {
    logger := logrus.New()
    logger.SetOutput(&lumberjack.Logger{
        Filename: "/var/log/myapp.log",
        MaxSize: 100, // megabytes
        MaxBackups: 3,
        MaxAge: 28, //days
        Compress: true,
    })
    return logger
}

Implementing Audit Logging for Compliance Requirements

For certain operations, you may need to maintain an audit trail for compliance reasons. You can create a separate audit logger for this purpose:

type AuditLogger struct {
    logger *logrus.Logger
}

func NewAuditLogger() *AuditLogger {
    logger := logrus.New()
    logger.SetFormatter(&logrus.JSONFormatter{})
    // Set up a separate output for audit logs
    // This could be a different file, database, or even a separate Elasticsearch index
    return &AuditLogger{logger: logger}
}

func (a *AuditLogger) LogAuditEvent(ctx context.Context, event string, details map[string]interface{}) {
    span := trace.SpanFromContext(ctx)
    a.logger.WithFields(logrus.Fields{
        "event": event,
        "trace_id": span.SpanContext().TraceID().String(),
        "details": details,
    }).Info("Audit event")
}

// Usage:
auditLogger.LogAuditEvent(ctx, "OrderCreated", map[string]interface{}{
    "order_id": order.ID,
    "user_id": order.UserID,
})

8. Advanced OpenTelemetry Techniques

Now that we have a solid foundation for distributed tracing, let’s explore some advanced techniques to get even more value from OpenTelemetry.

Implementing Custom Span Attributes and Events

Custom span attributes and events can provide additional context to your traces:

func ProcessPayment(ctx context.Context, order Order) error {
    _, span := otel.Tracer("payment-service").Start(ctx, "ProcessPayment")
    defer span.End()

    span.SetAttributes(
        attribute.String("payment.method", order.PaymentMethod),
        attribute.Float64("payment.amount", order.Total),
    )

    // Process payment...

    if paymentSuccessful {
        span.AddEvent("PaymentProcessed", trace.WithAttributes(
            attribute.String("transaction_id", transactionID),
        ))
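    } else {
        span.AddEvent("PaymentFailed", trace.WithAttributes(
            attribute.String("error", "Insufficient funds"),
        ))
        // Mark the span as failed and surface the failure to the caller
        // (requires the errors and go.opentelemetry.io/otel/codes imports).
        span.SetStatus(codes.Error, "Payment failed")
        return errors.New("payment failed: insufficient funds")
    }

    return nil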
}

Using OpenTelemetry’s Baggage for Cross-Cutting Concerns

Baggage allows you to propagate key-value pairs across service boundaries:

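import (
    "context"

    "go.opentelemetry.io/otel/baggage"
)

// AddUserInfoToBaggage returns a context whose baggage carries the user ID.
// If the ID is not a valid baggage value, the context is returned unchanged.
func AddUserInfoToBaggage(ctx context.Context, userID string) context.Context {
    member, err := baggage.NewMember("user_id", userID)
    if err != nil {
        return ctx
    }
    // SetMember preserves any members already present in the context.
    b, err := baggage.FromContext(ctx).SetMember(member)
    if err != nil {
        return ctx
    }
    return baggage.ContextWithBaggage(ctx, b)
}

// GetUserIDFromBaggage returns the user ID, or "" if none was propagated.
// FromContext returns a Baggage value, so there is no nil check to make.
func GetUserIDFromBaggage(ctx context.Context) string {
    return baggage.FromContext(ctx).Member("user_id").Value()
}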

Implementing Sampling Strategies for High-Volume Tracing

For high-volume services, tracing every request can be expensive. Implement a sampling strategy to reduce the volume while still maintaining visibility:

import (
    "go.opentelemetry.io/otel/sdk/trace"
)

// Samplers live in the SDK trace package itself.
sampler := trace.ParentBased(
    trace.TraceIDRatioBased(0.1), // Sample 10% of traces
)

tp := trace.NewTracerProvider(
    trace.WithSampler(sampler),
    // ... other options ...
)

Creating Custom OpenTelemetry Exporters

While we’ve been using Jaeger as our tracing backend, you might want to create a custom exporter for a different backend or for special processing:

type CustomExporter struct{}

func (e *CustomExporter) ExportSpans(ctx context.Context, spans []trace.ReadOnlySpan) error {
    for _, span := range spans {
        // Process or send the span data as needed
        fmt.Printf("Exporting span: %s\n", span.Name())
    }
    return nil
}

func (e *CustomExporter) Shutdown(ctx context.Context) error {
    // Cleanup logic here
    return nil
}

// Use the custom exporter:
exporter := &CustomExporter{}
tp := trace.NewTracerProvider(
    trace.WithBatcher(exporter),
    // ... other options ...
)

Integrating OpenTelemetry with Existing Monitoring Tools

OpenTelemetry can be integrated with many existing monitoring tools. For example, to send traces to both Jaeger and Zipkin:

jaegerExporter, _ := jaeger.New(jaeger.WithCollectorEndpoint(jaeger.WithEndpoint("http://jaeger:14268/api/traces")))
zipkinExporter, _ := zipkin.New("http://zipkin:9411/api/v2/spans")

tp := trace.NewTracerProvider(
    trace.WithBatcher(jaegerExporter),
    trace.WithBatcher(zipkinExporter),
    // ... other options ...
)

These advanced techniques will help you get the most out of OpenTelemetry in your order processing system.

In the next sections, we’ll cover performance considerations, testing and validation strategies, and discuss some challenges and considerations when implementing distributed tracing and logging at scale.

9. Performance Considerations

When implementing distributed tracing and logging, it’s crucial to consider the performance impact on your system. Let’s explore some strategies to optimize performance.

Optimizing Logging Performance in High-Throughput Systems

Use Asynchronous Logging : Implement a buffered, asynchronous logger to minimize the impact on request processing:

type AsyncLogger struct {
    ch chan *logrus.Entry
}

func NewAsyncLogger(bufferSize int) *AsyncLogger {
    logger := &AsyncLogger{
        ch: make(chan *logrus.Entry, bufferSize),
    }
    go logger.run()
    return logger
}

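func (l *AsyncLogger) run() {
    for entry := range l.ch {
        // Entry.Bytes returns the formatted entry and an error.
        b, err := entry.Bytes()
        if err != nil {
            continue // Skip entries that cannot be formatted
        }
        entry.Logger.Out.Write(b)
    }
}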

func (l *AsyncLogger) Log(entry *logrus.Entry) {
    select {
    case l.ch <- entry:
    default:
        // Buffer full, log dropped
    }
}

Log Sampling : For very high-throughput systems, consider sampling your logs:

func (l *AsyncLogger) SampledLog(entry *logrus.Entry, sampleRate float32) {
    if rand.Float32() < sampleRate {
        l.Log(entry)
    }
}

Managing the Performance Impact of Distributed Tracing

Use Sampling : Implement a sampling strategy to reduce the volume of traces:

sampler := trace.ParentBased(
    trace.TraceIDRatioBased(0.1), // Sample 10% of traces
)

tp := trace.NewTracerProvider(
    trace.WithSampler(sampler),
    // ... other options ...
)

Optimize Span Creation : Only create spans for significant operations to reduce overhead:

func ProcessOrder(ctx context.Context, order Order) error {
    ctx, span := tracer.Start(ctx, "ProcessOrder")
    defer span.End()

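    // Don't create a span for this quick operation
    if err := validateOrder(ctx, order); err != nil {
        return err
    }

    // Create a span for this potentially slow operation. Keep the child context
    // separate so later steps are not parented under the payment span.
    paymentCtx, paymentSpan := tracer.Start(ctx, "ProcessPayment")
    err := processPayment(paymentCtx, order)
    paymentSpan.End()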

    if err != nil {
        return err
    }

    // ... rest of the function
}

Implementing Buffering and Batching for Trace and Log Export

Use the OpenTelemetry SDK’s built-in batching exporter to reduce the number of network calls:

exporter, err := jaeger.New(jaeger.WithCollectorEndpoint(jaeger.WithEndpoint("http://jaeger:14268/api/traces")))
if err != nil {
    log.Fatalf("Failed to create Jaeger exporter: %v", err)
}

tp := trace.NewTracerProvider(
    trace.WithBatcher(exporter,
        trace.WithMaxExportBatchSize(100),
        trace.WithBatchTimeout(5 * time.Second),
    ),
    // ... other options ...
)

Scaling the ELK Stack for Large-Scale Systems

Use Index Lifecycle Management : Configure Elasticsearch to automatically manage index lifecycle:

PUT _ilm/policy/logs_policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "50GB",
            "max_age": "1d"
          }
        }
      },
      "delete": {
        "min_age": "30d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}

Implement Elasticsearch Clustering : For large-scale systems, set up Elasticsearch in a multi-node cluster for better performance and reliability.

Implementing Caching Strategies for Frequently Accessed Logs and Traces

Use a caching layer like Redis to store frequently accessed logs and traces:

import (
    "context"
    "encoding/json"
    "time"

    "github.com/go-redis/redis/v8"
)

func getCachedTrace(ctx context.Context, traceID string) (*Trace, error) {
    key := "trace:" + traceID
    val, err := redisClient.Get(ctx, key).Bytes()
    if err == redis.Nil {
        // Trace not in cache, fetch from storage and cache it
        t, err := fetchTraceFromStorage(traceID)
        if err != nil {
            return nil, err
        }
        // Serialize to JSON before caching; a failed cache write is not
        // fatal for the read path, so its error is ignored here.
        if data, err := json.Marshal(t); err == nil {
            redisClient.Set(ctx, key, data, 1*time.Hour)
        }
        return t, nil
    } else if err != nil {
        return nil, err
    }
    var t Trace
    if err := json.Unmarshal(val, &t); err != nil {
        return nil, err
    }
    return &t, nil
}

10. Testing and Validation

Proper testing and validation are crucial to ensure the reliability of your distributed tracing and logging implementation.

Unit Testing Trace Instrumentation

Use the OpenTelemetry testing package to unit test your trace instrumentation:

import (
    "context"
    "testing"

    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/sdk/trace"
    "go.opentelemetry.io/otel/sdk/trace/tracetest"
)

func TestProcessOrder(t *testing.T) {
    sr := tracetest.NewSpanRecorder()
    tp := trace.NewTracerProvider(trace.WithSpanProcessor(sr))
    otel.SetTracerProvider(tp)

    ctx := context.Background()
    err := ProcessOrder(ctx, Order{ID: 123})
    if err != nil {
        t.Errorf("ProcessOrder failed: %v", err)
    }

    // Ended returns spans in the order they finished, so the child
    // ProcessPayment span comes before its parent ProcessOrder.
    spans := sr.Ended()
    if len(spans) != 2 {
        t.Fatalf("Expected 2 spans, got %d", len(spans))
    }
    if spans[0].Name() != "ProcessPayment" {
        t.Errorf("Expected span named 'ProcessPayment', got '%s'", spans[0].Name())
    }
    if spans[1].Name() != "ProcessOrder" {
        t.Errorf("Expected span named 'ProcessOrder', got '%s'", spans[1].Name())
    }
}

Integration Testing for the Complete Tracing Pipeline

Set up integration tests that cover your entire tracing pipeline:

func TestTracingPipeline(t *testing.T) {
    // Start a test Jaeger instance
    jaeger := startTestJaeger()
    defer jaeger.Stop()

    // Initialize your application with tracing
    app := initializeApp()

    // Perform some operations that should generate traces
    resp, err := app.CreateOrder(Order{ID: 123})
    if err != nil {
        t.Fatalf("Failed to create order: %v", err)
    }

    // Wait for traces to be exported
    time.Sleep(5 * time.Second)

    // Query Jaeger for the trace
    traces, err := jaeger.QueryTraces(resp.TraceID)
    if err != nil {
        t.Fatalf("Failed to query traces: %v", err)
    }

    // Validate the trace
    validateTrace(t, traces[0])
}

Validating Log Parsing and Processing Rules

Test your Logstash configuration to ensure it correctly parses and processes logs:

input {
  generator {
    message => '{"timestamp":"2023-06-01T10:00:00Z","severity":"INFO","message":"Order created","order_id":"123","trace_id":"abc123"}'
    count => 1
  }
}

filter {
  json {
    source => "message"
  }
}

output {
  stdout { codec => rubydebug }
}

Run this configuration with logstash -f test_config.conf and verify that the printed event exposes severity, order_id, trace_id, and the other JSON keys as top-level fields.
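
The same contract can also be checked on the Go side, before a log line ever reaches Logstash. The sketch below assumes that the required keys are the ones in the sample message above (order_id is treated as optional); validateLogLine and requiredFields are hypothetical names chosen for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// requiredFields mirrors keys from the sample generator message (an assumption).
var requiredFields = []string{"timestamp", "severity", "message", "trace_id"}

// validateLogLine checks that a log line is a JSON object, that every
// required field is a non-empty string, and that the timestamp is RFC 3339.
func validateLogLine(line string) error {
	var entry map[string]interface{}
	if err := json.Unmarshal([]byte(line), &entry); err != nil {
		return fmt.Errorf("not a JSON object: %w", err)
	}
	for _, key := range requiredFields {
		s, ok := entry[key].(string)
		if !ok || s == "" {
			return fmt.Errorf("missing or non-string field %q", key)
		}
	}
	// Safe type assertion: the loop above already verified it is a string.
	if _, err := time.Parse(time.RFC3339, entry["timestamp"].(string)); err != nil {
		return fmt.Errorf("bad timestamp: %w", err)
	}
	return nil
}

func main() {
	good := `{"timestamp":"2023-06-01T10:00:00Z","severity":"INFO","message":"Order created","order_id":"123","trace_id":"abc123"}`
	bad := `{"severity":"INFO","message":"no timestamp or trace"}`
	fmt.Println("good:", validateLogLine(good))
	fmt.Println("bad: ", validateLogLine(bad))
}
```

A check like this fits naturally into a unit test for your logger, so that a renamed field fails in CI rather than silently breaking a Kibana dashboard.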

Load Testing and Observing Tracing Overhead

Perform load tests to understand the performance impact of tracing:

func BenchmarkWithTracing(b *testing.B) {
    // Initialize tracing
    tp := initTracer()
    defer tp.Shutdown(context.Background())

    // Create the tracer once so the loop measures only span overhead
    tracer := tp.Tracer("benchmark")

    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        ctx, span := tracer.Start(context.Background(), "operation")
        performOperation(ctx)
        span.End()
    }
}

func BenchmarkWithoutTracing(b *testing.B) {
    for i := 0; i < b.N; i++ {
        performOperation(context.Background())
    }
}

Compare the results to understand the overhead introduced by tracing.
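
To turn the two ns/op figures reported by go test -bench into a single number, you can express the difference as a percentage of the untraced baseline. This is a small arithmetic sketch; overheadPercent is a hypothetical helper, and the figures in main are made-up placeholders rather than measurements.

```go
package main

import (
	"errors"
	"fmt"
)

// overheadPercent returns how much slower the traced run is, as a
// percentage of the untraced baseline (both values in ns/op).
func overheadPercent(tracedNsPerOp, baselineNsPerOp float64) (float64, error) {
	if baselineNsPerOp <= 0 {
		return 0, errors.New("baseline must be positive")
	}
	return (tracedNsPerOp - baselineNsPerOp) * 100 / baselineNsPerOp, nil
}

func main() {
	// Placeholder figures, not real benchmark results.
	pct, err := overheadPercent(1150, 1000)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("tracing overhead: %.1f%%\n", pct)
}
```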

Implementing Trace and Log Monitoring for Quality Assurance

Set up monitoring for your tracing and logging systems:

Monitor trace export errors
Track log ingestion rates
Alert on sudden changes in trace or log volume
Monitor Elasticsearch, Logstash, and Kibana health
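
As an illustration of alerting on sudden volume changes, the sketch below compares the latest per-minute count against the mean of recent samples and flags deviations beyond a relative threshold. The function name, the threshold, and the sample data are all assumptions for illustration; in production you would more likely express this rule in your alerting system than in application code.

```go
package main

import "fmt"

// volumeAlert reports whether current deviates from the mean of history by
// more than the given fraction (0.5 means more than 50% above or below).
func volumeAlert(history []float64, current, threshold float64) bool {
	if len(history) == 0 {
		return false // no baseline yet
	}
	var sum float64
	for _, v := range history {
		sum += v
	}
	mean := sum / float64(len(history))
	if mean == 0 {
		return current > 0
	}
	diff := current - mean
	if diff < 0 {
		diff = -diff
	}
	return diff/mean > threshold
}

func main() {
	perMinute := []float64{1000, 1100, 950, 1050} // recent log lines per minute
	fmt.Println("steady traffic alerts:", volumeAlert(perMinute, 1020, 0.5))
	fmt.Println("sudden drop alerts:   ", volumeAlert(perMinute, 120, 0.5))
}
```

A sudden drop is often more interesting than a spike, because it can mean that a service has stopped logging or that the export pipeline is broken.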

11. Challenges and Considerations

As you implement and scale your distributed tracing and logging system, keep these challenges and considerations in mind:

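Managing Data Retention and Storage Costs

Implement data retention policies that balance compliance requirements with storage costs
Use tiered storage solutions, moving older data to cheaper storage options
Regularly review and optimize your data retention strategy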

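Ensuring Data Privacy and Compliance in Logs and Traces

Implement robust data masking for sensitive information
Ensure compliance with regulations such as GDPR, including the right to be forgotten
Regularly audit your logs and traces to ensure that no sensitive data is being collected inadvertently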
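
A minimal sketch of data masking, assuming that email addresses and card-like digit runs are the sensitive values you care about. The two regular expressions are deliberately simple illustrations and will not catch every format; maskSensitive and the placeholder strings are hypothetical names.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Simplified email pattern, for illustration only.
	emailRe = regexp.MustCompile(`[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}`)
	// 16 to 19 digits, optionally separated by spaces or dashes; the last
	// four digits are captured so they can be kept for support purposes.
	cardRe = regexp.MustCompile(`\b(?:\d[ -]?){12,15}(\d{4})\b`)
)

// maskSensitive replaces email addresses and card-like numbers in a log
// message with placeholders before the message is written.
func maskSensitive(msg string) string {
	msg = emailRe.ReplaceAllString(msg, "[REDACTED_EMAIL]")
	return cardRe.ReplaceAllString(msg, "[REDACTED_CARD_${1}]")
}

func main() {
	fmt.Println(maskSensitive("user jane@example.com paid with 4111 1111 1111 1111"))
	fmt.Println(maskSensitive("order 123 created"))
}
```

Masking at the point where the log entry is built is usually safer than masking in Logstash, because the raw value then never leaves the service.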

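Handling Versioning and Backward Compatibility in Trace Data

Use semantic versioning for your trace data format
Implement backward-compatible changes when possible
When breaking changes are necessary, version your trace data and maintain support for multiple versions during a transition period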
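
Under semantic versioning, only a change in the major number signals a breaking change, so a consumer can decide whether it can read a payload by looking at the major version alone. The sketch below assumes that trace data carries a version string such as "2.1.0" in a field of your choosing; canRead and supportedMajors are hypothetical names.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// supportedMajors lists the major versions this consumer accepts. Keeping
// two entries models a transition period (the values are an assumption).
var supportedMajors = map[int]bool{1: true, 2: true}

// majorVersion extracts MAJOR from a "MAJOR.MINOR.PATCH" string,
// tolerating an optional leading "v".
func majorVersion(v string) (int, error) {
	parts := strings.SplitN(strings.TrimPrefix(v, "v"), ".", 2)
	return strconv.Atoi(parts[0])
}

// canRead reports whether trace data with the given version can be
// processed by this consumer.
func canRead(version string) bool {
	major, err := majorVersion(version)
	if err != nil {
		return false
	}
	return supportedMajors[major]
}

func main() {
	for _, v := range []string{"1.4.0", "v2.0.1", "3.0.0", "unknown"} {
		fmt.Printf("%-8s readable: %v\n", v, canRead(v))
	}
}
```

Once the transition period ends, removing the old major from the map is the only change the consumer needs.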

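Dealing with Clock Skew in Distributed Trace Timestamps

Use a time synchronization protocol such as NTP across all your services
Consider using logical clocks in addition to wall-clock time
Implement tolerance for small amounts of clock skew in your trace analysis tools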
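
To make the logical-clock suggestion concrete, here is a minimal Lamport clock. It orders causally related events without relying on wall-clock time: a receiver always ends up with a larger value than the sender's timestamp. How the counter travels between services (for example as a request header) is left open, and the service names in main are placeholders.

```go
package main

import (
	"fmt"
	"sync"
)

// LamportClock is a logical clock that is safe for concurrent use.
type LamportClock struct {
	mu   sync.Mutex
	time uint64
}

// Tick advances the clock for a local event or just before sending a message.
func (c *LamportClock) Tick() uint64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.time++
	return c.time
}

// Observe merges a timestamp received from another service: the local
// clock moves to max(local, remote) + 1.
func (c *LamportClock) Observe(remote uint64) uint64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	if remote > c.time {
		c.time = remote
	}
	c.time++
	return c.time
}

func main() {
	var orders, payments LamportClock

	orders.Tick()                      // local event in the order service
	sent := orders.Tick()              // timestamp attached to the outgoing request
	received := payments.Observe(sent) // payment service receives the request

	fmt.Println("sent:", sent, "received:", received, "ordered:", sent < received)
}
```

The logical timestamp complements the wall-clock timestamps on spans rather than replacing them: it tells you which event came first, not how long anything took.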

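Implementing Access Controls and Security for the ELK Stack

Use strong authentication for Elasticsearch, Logstash, and Kibana
Implement role-based access control (RBAC) for different user types
Encrypt data in transit and at rest
Regularly update and patch all components of your ELK stack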

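12. Next Steps and Preview of Part 6

In this post, we covered comprehensive distributed tracing and logging for our order processing system. We implemented tracing with OpenTelemetry, set up centralized logging with the ELK stack, correlated logs and traces, and explored advanced techniques and considerations.

In the next and final part of the series, we'll focus on production readiness and scalability. We'll cover:

Implementing authentication and authorization
Handling configuration management
Implementing rate limiting and throttling
Optimizing for high concurrency
Implementing caching strategies
Preparing for horizontal scaling
Conducting performance testing and optimization

Stay tuned as we put the finishing touches on our sophisticated order processing system and make sure it is ready for production use at scale!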

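Need Help?

Are you facing challenging problems, or do you need an external perspective on a new idea or project? I can help! Whether you want to build a technology proof of concept before making a larger investment or need guidance on difficult issues, I'm here to assist.

Services Offered:

Problem-Solving: Tackling complex issues with innovative solutions.
Consultation: Providing expert advice and fresh perspectives on your projects.
Proof of Concept: Developing preliminary models to test and validate your ideas.

If you're interested in working with me, please reach out by email at hungaikevin@gmail.com.

Let's turn your challenges into opportunities!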
