欢迎来到我们关于实施复杂订单处理系统系列的第五部分!在我们之前的文章中,我们涵盖了从设置基本架构到实施高级工作流程和全面监控的所有内容。今天,我们将深入探讨分布式跟踪和日志记录的世界,这是维护微服务架构中可观察性的两个关键组件。
在微服务架构中,单个用户请求通常跨越多个服务。这种分布式特性使得理解请求流并在出现问题时诊断问题变得困难。分布式跟踪和集中式日志记录通过提供以下功能来解决这些挑战:
为了实现分布式跟踪和日志记录,我们将使用两个强大的工具集:
OpenTelemetry:云原生软件的可观察性框架,提供一组 API、库、代理和收集器服务,用于从应用程序捕获分布式跟踪和指标。
ELK Stack:来自 Elastic 的三个开源产品(Elasticsearch、Logstash 和 Kibana)的集合,它们共同提供了一个用于日志摄取、存储和可视化的强大平台。
读完本文,您将能够:
让我们开始吧!
在开始实施之前,让我们回顾一些对于我们的分布式跟踪和日志记录设置至关重要的关键概念。
分布式跟踪是一种跟踪请求流经分布式系统中的各种服务的方法。它提供了一种了解请求的完整生命周期的方法,包括:
一条迹线通常由一个或多个跨度组成。跨度代表一个工作或操作单元。它跟踪请求进行的特定操作,记录操作何时开始和结束,以及其他数据。
OpenTelemetry 是云原生软件的可观察性框架。它提供一组 API、库、代理和收集器服务,用于从应用程序捕获分布式跟踪和指标。关键组件包括:
分布式系统中的有效日志记录需要仔细考虑:
ELK 堆栈是日志管理的流行选择:
日志聚合涉及从各种来源收集日志数据并将其存储在集中位置。这允许:
日志分析涉及从日志数据中提取有意义的见解,其中可以包括:
记住这些概念,让我们继续在我们的订单处理系统中实现分布式跟踪。
让我们首先使用 OpenTelemetry 在我们的订单处理系统中实现分布式跟踪。
首先,我们需要将 OpenTelemetry 添加到我们的 Go 服务中。将以下依赖项添加到您的 go.mod 文件中:
require ( go.opentelemetry.io/otel v1.7.0 go.opentelemetry.io/otel/exporters/jaeger v1.7.0 go.opentelemetry.io/otel/sdk v1.7.0 go.opentelemetry.io/otel/trace v1.7.0 )
接下来,让我们在主函数中设置一个跟踪器提供程序:
package main import ( "log" "go.opentelemetry.io/otel" "go.opentelemetry.io/otel/attribute" "go.opentelemetry.io/otel/exporters/jaeger" "go.opentelemetry.io/otel/sdk/resource" tracesdk "go.opentelemetry.io/otel/sdk/trace" semconv "go.opentelemetry.io/otel/semconv/v1.4.0" ) func initTracer() func() { exporter, err := jaeger.New(jaeger.WithCollectorEndpoint(jaeger.WithEndpoint("http://jaeger:14268/api/traces"))) if err != nil { log.Fatal(err) } tp := tracesdk.NewTracerProvider( tracesdk.WithBatcher(exporter), tracesdk.WithResource(resource.NewWithAttributes( semconv.SchemaURL, semconv.ServiceNameKey.String("order-processing-service"), attribute.String("environment", "production"), )), ) otel.SetTracerProvider(tp) return func() { if err := tp.Shutdown(context.Background()); err != nil { log.Printf("Error shutting down tracer provider: %v", err) } } } func main() { cleanup := initTracer() defer cleanup() // Rest of your main function... }
这会设置一个跟踪器提供程序,将跟踪导出到 Jaeger(一种流行的分布式跟踪后端)。
现在,让我们将跟踪添加到订单处理工作流程中。我们将从 CreateOrder 函数开始:
import ( "context" "go.opentelemetry.io/otel" "go.opentelemetry.io/otel/attribute" "go.opentelemetry.io/otel/trace" ) func CreateOrder(ctx context.Context, order Order) error { tr := otel.Tracer("order-processing") ctx, span := tr.Start(ctx, "CreateOrder") defer span.End() span.SetAttributes(attribute.Int64("order.id", order.ID)) span.SetAttributes(attribute.Float64("order.total", order.Total)) // Validate order if err := validateOrder(ctx, order); err != nil { span.RecordError(err) span.SetStatus(codes.Error, "Order validation failed") return err } // Process payment if err := processPayment(ctx, order); err != nil { span.RecordError(err) span.SetStatus(codes.Error, "Payment processing failed") return err } // Update inventory if err := updateInventory(ctx, order); err != nil { span.RecordError(err) span.SetStatus(codes.Error, "Inventory update failed") return err } span.SetStatus(codes.Ok, "Order created successfully") return nil }
这会为 CreateOrder 函数创建一个新的跨度并添加相关属性。它还为流程中的每个主要步骤创建子跨度。
当调用其他服务时,我们需要传播跟踪上下文。以下是如何使用 HTTP 客户端执行此操作的示例:
import ( "net/http" "go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp" ) func callExternalService(ctx context.Context, url string) error { client := http.Client{Transport: otelhttp.NewTransport(http.DefaultTransport)} req, err := http.NewRequestWithContext(ctx, "GET", url, nil) if err != nil { return err } _, err = client.Do(req) return err }
这使用 otelhttp 包自动传播 HTTP 标头中的跟踪上下文。
对于异步操作,我们需要确保正确传递跟踪上下文。这是使用工作池的示例:
func processOrderAsync(ctx context.Context, order Order) { tr := otel.Tracer("order-processing") ctx, span := tr.Start(ctx, "processOrderAsync") defer span.End() workerPool <- func() { processCtx := trace.ContextWithSpan(context.Background(), span) if err := processOrder(processCtx, order); err != nil { span.RecordError(err) span.SetStatus(codes.Error, "Async order processing failed") } else { span.SetStatus(codes.Ok, "Async order processing succeeded") } } }
这会为异步操作创建一个新的范围并将其传递给工作函数。
要将 OpenTelemetry 与 Temporal 工作流程集成,我们可以使用 go.opentelemetry.io/contrib/instrumentation/go.temporal.io/temporal/oteltemporalgrpc 包:
import ( "go.temporal.io/sdk/client" "go.temporal.io/sdk/worker" "go.opentelemetry.io/contrib/instrumentation/go.temporal.io/temporal/oteltemporalgrpc" ) func initTemporalClient() (client.Client, error) { return client.NewClient(client.Options{ HostPort: "temporal:7233", ConnectionOptions: client.ConnectionOptions{ DialOptions: []grpc.DialOption{ grpc.WithUnaryInterceptor(oteltemporalgrpc.UnaryClientInterceptor()), grpc.WithStreamInterceptor(oteltemporalgrpc.StreamClientInterceptor()), }, }, }) } func initTemporalWorker(c client.Client, taskQueue string) worker.Worker { w := worker.New(c, taskQueue, worker.Options{ WorkerInterceptors: []worker.WorkerInterceptor{ oteltemporalgrpc.WorkerInterceptor(), }, }) return w }
这将使用 OpenTelemetry 工具设置 Temporal 客户端和工作人员。
我们已经在 initTracer 函数中将 Jaeger 设置为跟踪后端。为了可视化我们的痕迹,我们需要将 Jaeger 添加到我们的 docker-compose.yml 中:
services: # ... other services ... jaeger: image: jaegertracing/all-in-one:1.35 ports: - "16686:16686" - "14268:14268" environment: - COLLECTOR_OTLP_ENABLED=true
现在您可以通过 http://localhost:16686 访问 Jaeger UI 来查看和分析您的痕迹。
在下一节中,我们将使用 ELK 堆栈设置集中式日志记录,以补充我们的分布式跟踪设置。
现在我们已经有了分布式跟踪,让我们使用 ELK(Elasticsearch、Logstash、Kibana)堆栈设置集中式日志记录。
首先,让我们将 Elasticsearch 添加到我们的 docker-compose.yml 中:
services: # ... other services ... elasticsearch: image: docker.elastic.co/elasticsearch/elasticsearch:7.14.0 environment: - discovery.type=single-node - "ES_JAVA_OPTS=-Xms512m -Xmx512m" ports: - "9200:9200" volumes: - elasticsearch_data:/usr/share/elasticsearch/data volumes: elasticsearch_data: driver: local
这会设置一个单节点 Elasticsearch 实例以用于开发目的。
Next, let’s add Logstash to our docker-compose.yml:
services: # ... other services ... logstash: image: docker.elastic.co/logstash/logstash:7.14.0 volumes: - ./logstash/pipeline:/usr/share/logstash/pipeline ports: - "5000:5000/tcp" - "5000:5000/udp" - "9600:9600" depends_on: - elasticsearch
Create a Logstash pipeline configuration file at ./logstash/pipeline/logstash.conf:
input { tcp { port => 5000 codec => json } } filter { if [trace_id] { mutate { add_field => { "[@metadata][trace_id]" => "%{trace_id}" } } } } output { elasticsearch { hosts => ["elasticsearch:9200"] index => "order-processing-logs-%{+YYYY.MM.dd}" } }
This configuration sets up Logstash to receive JSON logs over TCP, process them, and forward them to Elasticsearch.
Now, let’s add Kibana to our docker-compose.yml:
services: # ... other services ... kibana: image: docker.elastic.co/kibana/kibana:7.14.0 ports: - "5601:5601" environment: ELASTICSEARCH_URL: http://elasticsearch:9200 ELASTICSEARCH_HOSTS: '["http://elasticsearch:9200"]' depends_on: - elasticsearch
You can access the Kibana UI at http://localhost:5601 once it’s up and running.
To send structured logs to Logstash, we’ll use the logrus library. First, add it to your go.mod:
go get github.com/sirupsen/logrus
Now, let’s set up a logger in our main function:
import ( "github.com/sirupsen/logrus" "gopkg.in/sohlich/elogrus.v7" ) func initLogger() *logrus.Logger { log := logrus.New() log.SetFormatter(&logrus.JSONFormatter{}) hook, err := elogrus.NewElasticHook("elasticsearch:9200", "warning", "order-processing-logs") if err != nil { log.Fatalf("Failed to create Elasticsearch hook: %v", err) } log.AddHook(hook) return log } func main() { log := initLogger() // Rest of your main function... }
This sets up a JSON formatter for our logs and adds an Elasticsearch hook to send logs directly to Elasticsearch.
Now, let’s update our CreateOrder function to use structured logging:
func CreateOrder(ctx context.Context, order Order) error { tr := otel.Tracer("order-processing") ctx, span := tr.Start(ctx, "CreateOrder") defer span.End() logger := logrus.WithFields(logrus.Fields{ "order_id": order.ID, "trace_id": span.SpanContext().TraceID().String(), }) logger.Info("Starting order creation") // Validate order if err := validateOrder(ctx, order); err != nil { logger.WithError(err).Error("Order validation failed") span.RecordError(err) span.SetStatus(codes.Error, "Order validation failed") return err } // Process payment if err := processPayment(ctx, order); err != nil { logger.WithError(err).Error("Payment processing failed") span.RecordError(err) span.SetStatus(codes.Error, "Payment processing failed") return err } // Update inventory if err := updateInventory(ctx, order); err != nil { logger.WithError(err).Error("Inventory update failed") span.RecordError(err) span.SetStatus(codes.Error, "Inventory update failed") return err } logger.Info("Order created successfully") span.SetStatus(codes.Ok, "Order created successfully") return nil }
This code logs each step of the order creation process, including any errors that occur. It also includes the trace ID in each log entry, which will be crucial for correlating logs with traces.
Now that we have both distributed tracing and centralized logging set up, let’s explore how to correlate this information for a unified view of system behavior.
We’ve already included the trace ID in our log entries. To make this correlation even more powerful, we can add a custom field to our spans that includes the log index:
span.SetAttributes(attribute.String("log.index", "order-processing-logs-"+time.Now().Format("2006.01.02")))
This allows us to easily jump from a span in Jaeger to the corresponding logs in Kibana.
We’ve already added trace IDs to our log entries in the previous section. This allows us to search for all log entries related to a particular trace in Kibana.
To link our Prometheus metrics to traces, we can use exemplars. Here’s an example of how to do this:
import ( "github.com/prometheus/client_golang/prometheus" "github.com/prometheus/client_golang/prometheus/promauto" "go.opentelemetry.io/otel/trace" ) var ( orderProcessingDuration = promauto.NewHistogramVec( prometheus.HistogramOpts{ Name: "order_processing_duration_seconds", Help: "Duration of order processing in seconds", Buckets: prometheus.DefBuckets, }, []string{"status"}, ) ) func CreateOrder(ctx context.Context, order Order) error { // ... existing code ... start := time.Now() // ... process order ... duration := time.Since(start) orderProcessingDuration.WithLabelValues("success").Observe(duration.Seconds(), prometheus.Labels{ "trace_id": span.SpanContext().TraceID().String(), }) // ... rest of the function ... }
This adds the trace ID as an exemplar to our order processing duration metric.
With logs, traces, and metrics all correlated, we can create a unified view of our system’s behavior:
This allows you to seamlessly navigate between metrics, traces, and logs, providing a comprehensive view of your system’s behavior and making it easier to debug issues.
With our logs centralized in Elasticsearch, let’s explore some strategies for effective log aggregation and analysis.
For high-volume services, logging every event can be prohibitively expensive. Implement log sampling to reduce the volume while still maintaining visibility:
func shouldLog() bool { return rand.Float32() < 0.1 // Log 10% of events } func CreateOrder(ctx context.Context, order Order) error { // ... existing code ... if shouldLog() { logger.Info("Order created successfully") } // ... rest of the function ... }
In Kibana, create dashboards that provide insights into your system’s behavior. Some useful visualizations might include:
Use Kibana’s alerting features to set up alerts based on log patterns. For example:
Elasticsearch provides machine learning capabilities that can be used for anomaly detection in logs. You can set up machine learning jobs in Kibana to detect:
These machine learning insights can help you identify issues before they become critical problems.
In the next sections, we’ll cover best practices for logging in a microservices architecture and explore some advanced OpenTelemetry techniques.
When implementing logging in a microservices architecture, there are several best practices to keep in mind to ensure your logs are useful, manageable, and secure.
Consistency in log formats across all your services is crucial for effective log analysis. In our Go services, we can create a custom logger that enforces a standard format:
import ( "github.com/sirupsen/logrus" ) type StandardLogger struct { *logrus.Logger ServiceName string } func NewStandardLogger(serviceName string) *StandardLogger { logger := logrus.New() logger.SetFormatter(&logrus.JSONFormatter{ FieldMap: logrus.FieldMap{ logrus.FieldKeyTime: "timestamp", logrus.FieldKeyLevel: "severity", logrus.FieldKeyMsg: "message", }, }) return &StandardLogger{ Logger: logger, ServiceName: serviceName, } } func (l *StandardLogger) WithFields(fields logrus.Fields) *logrus.Entry { return l.Logger.WithFields(logrus.Fields{ "service": l.ServiceName, }).WithFields(fields) }
This logger ensures that all log entries include a “service” field and use consistent field names.
Contextual logging involves including relevant context with each log entry. In a microservices architecture, this often means including a request ID or trace ID that can be used to correlate logs across services:
func CreateOrder(ctx context.Context, logger *StandardLogger, order Order) error { tr := otel.Tracer("order-processing") ctx, span := tr.Start(ctx, "CreateOrder") defer span.End() logger := logger.WithFields(logrus.Fields{ "order_id": order.ID, "trace_id": span.SpanContext().TraceID().String(), }) logger.Info("Starting order creation") // ... rest of the function ... }
It’s crucial to ensure that sensitive information, such as personal data or credentials, is not logged. You can create a custom log hook to redact sensitive information:
type SensitiveDataHook struct{} func (h *SensitiveDataHook) Levels() []logrus.Level { return logrus.AllLevels } func (h *SensitiveDataHook) Fire(entry *logrus.Entry) error { if entry.Data["credit_card"] != nil { entry.Data["credit_card"] = "REDACTED" } return nil } // In your main function: logger.AddHook(&SensitiveDataHook{})
In a production environment, you need to manage log retention and rotation to control storage costs and comply with data retention policies. While Elasticsearch can handle this to some extent, you might also want to implement log rotation at the application level:
import ( "gopkg.in/natefinch/lumberjack.v2" ) func initLogger() *logrus.Logger { logger := logrus.New() logger.SetOutput(&lumberjack.Logger{ Filename: "/var/log/myapp.log", MaxSize: 100, // megabytes MaxBackups: 3, MaxAge: 28, //days Compress: true, }) return logger }
For certain operations, you may need to maintain an audit trail for compliance reasons. You can create a separate audit logger for this purpose:
type AuditLogger struct { logger *logrus.Logger } func NewAuditLogger() *AuditLogger { logger := logrus.New() logger.SetFormatter(&logrus.JSONFormatter{}) // Set up a separate output for audit logs // This could be a different file, database, or even a separate Elasticsearch index return &AuditLogger{logger: logger} } func (a *AuditLogger) LogAuditEvent(ctx context.Context, event string, details map[string]interface{}) { span := trace.SpanFromContext(ctx) a.logger.WithFields(logrus.Fields{ "event": event, "trace_id": span.SpanContext().TraceID().String(), "details": details, }).Info("Audit event") } // Usage: auditLogger.LogAuditEvent(ctx, "OrderCreated", map[string]interface{}{ "order_id": order.ID, "user_id": order.UserID, })
Now that we have a solid foundation for distributed tracing, let’s explore some advanced techniques to get even more value from OpenTelemetry.
Custom span attributes and events can provide additional context to your traces:
func ProcessPayment(ctx context.Context, order Order) error { _, span := otel.Tracer("payment-service").Start(ctx, "ProcessPayment") defer span.End() span.SetAttributes( attribute.String("payment.method", order.PaymentMethod), attribute.Float64("payment.amount", order.Total), ) // Process payment... if paymentSuccessful { span.AddEvent("PaymentProcessed", trace.WithAttributes( attribute.String("transaction_id", transactionID), )) } else { span.AddEvent("PaymentFailed", trace.WithAttributes( attribute.String("error", "Insufficient funds"), )) } return nil }
Baggage allows you to propagate key-value pairs across service boundaries:
import ( "go.opentelemetry.io/otel/baggage" ) func AddUserInfoToBaggage(ctx context.Context, userID string) context.Context { b, _ := baggage.Parse(fmt.Sprintf("user_id=%s", userID)) return baggage.ContextWithBaggage(ctx, b) } func GetUserIDFromBaggage(ctx context.Context) string { if b := baggage.FromContext(ctx); b != nil { if v := b.Member("user_id"); v.Key() != "" { return v.Value() } } return "" }
For high-volume services, tracing every request can be expensive. Implement a sampling strategy to reduce the volume while still maintaining visibility:
import ( "go.opentelemetry.io/otel/sdk/trace" "go.opentelemetry.io/otel/sdk/trace/sampling" ) sampler := sampling.ParentBased( sampling.TraceIDRatioBased(0.1), // Sample 10% of traces ) tp := trace.NewTracerProvider( trace.WithSampler(sampler), // ... other options ... )
While we’ve been using Jaeger as our tracing backend, you might want to create a custom exporter for a different backend or for special processing:
type CustomExporter struct{} func (e *CustomExporter) ExportSpans(ctx context.Context, spans []trace.ReadOnlySpan) error { for _, span := range spans { // Process or send the span data as needed fmt.Printf("Exporting span: %s\n", span.Name()) } return nil } func (e *CustomExporter) Shutdown(ctx context.Context) error { // Cleanup logic here return nil } // Use the custom exporter: exporter := &CustomExporter{} tp := trace.NewTracerProvider( trace.WithBatcher(exporter), // ... other options ... )
OpenTelemetry can be integrated with many existing monitoring tools. For example, to send traces to both Jaeger and Zipkin:
jaegerExporter, _ := jaeger.New(jaeger.WithCollectorEndpoint(jaeger.WithEndpoint("http://jaeger:14268/api/traces"))) zipkinExporter, _ := zipkin.New("http://zipkin:9411/api/v2/spans") tp := trace.NewTracerProvider( trace.WithBatcher(jaegerExporter), trace.WithBatcher(zipkinExporter), // ... other options ... )
These advanced techniques will help you get the most out of OpenTelemetry in your order processing system.
In the next sections, we’ll cover performance considerations, testing and validation strategies, and discuss some challenges and considerations when implementing distributed tracing and logging at scale.
When implementing distributed tracing and logging, it’s crucial to consider the performance impact on your system. Let’s explore some strategies to optimize performance.
type AsyncLogger struct { ch chan *logrus.Entry } func NewAsyncLogger(bufferSize int) *AsyncLogger { logger := &AsyncLogger{ ch: make(chan *logrus.Entry, bufferSize), } go logger.run() return logger } func (l *AsyncLogger) run() { for entry := range l.ch { entry.Logger.Out.Write(entry.Bytes()) } } func (l *AsyncLogger) Log(entry *logrus.Entry) { select { case l.ch <- entry: default: // Buffer full, log dropped } }
func (l *AsyncLogger) SampledLog(entry *logrus.Entry, sampleRate float32) { if rand.Float32() < sampleRate { l.Log(entry) } }
sampler := trace.ParentBased( trace.TraceIDRatioBased(0.1), // Sample 10% of traces ) tp := trace.NewTracerProvider( trace.WithSampler(sampler), // ... other options ... )
func ProcessOrder(ctx context.Context, order Order) error { ctx, span := tracer.Start(ctx, "ProcessOrder") defer span.End() // Don't create a span for this quick operation validateOrder(order) // Create a span for this potentially slow operation ctx, paymentSpan := tracer.Start(ctx, "ProcessPayment") err := processPayment(ctx, order) paymentSpan.End() if err != nil { return err } // ... rest of the function }
Use the OpenTelemetry SDK’s built-in batching exporter to reduce the number of network calls:
exporter, err := jaeger.New(jaeger.WithCollectorEndpoint(jaeger.WithEndpoint("http://jaeger:14268/api/traces"))) if err != nil { log.Fatalf("Failed to create Jaeger exporter: %v", err) } tp := trace.NewTracerProvider( trace.WithBatcher(exporter, trace.WithMaxExportBatchSize(100), trace.WithBatchTimeout(5 * time.Second), ), // ... other options ... )
PUT _ilm/policy/logs_policy { "policy": { "phases": { "hot": { "actions": { "rollover": { "max_size": "50GB", "max_age": "1d" } } }, "delete": { "min_age": "30d", "actions": { "delete": {} } } } } }
Use a caching layer like Redis to store frequently accessed logs and traces:
import ( "github.com/go-redis/redis/v8" ) func getCachedTrace(traceID string) (*Trace, error) { val, err := redisClient.Get(ctx, "trace:"+traceID).Bytes() if err == redis.Nil { // Trace not in cache, fetch from storage and cache it trace, err := fetchTraceFromStorage(traceID) if err != nil { return nil, err } redisClient.Set(ctx, "trace:"+traceID, trace, 1*time.Hour) return trace, nil } else if err != nil { return nil, err } var trace Trace json.Unmarshal(val, &trace) return &trace, nil }
Proper testing and validation are crucial to ensure the reliability of your distributed tracing and logging implementation.
Use the OpenTelemetry testing package to unit test your trace instrumentation:
import ( "testing" "go.opentelemetry.io/otel/sdk/trace/tracetest" ) func TestProcessOrder(t *testing.T) { sr := tracetest.NewSpanRecorder() tp := trace.NewTracerProvider(trace.WithSpanProcessor(sr)) otel.SetTracerProvider(tp) ctx := context.Background() err := ProcessOrder(ctx, Order{ID: "123"}) if err != nil { t.Errorf("ProcessOrder failed: %v", err) } spans := sr.Ended() if len(spans) != 2 { t.Errorf("Expected 2 spans, got %d", len(spans)) } if spans[0].Name() != "ProcessOrder" { t.Errorf("Expected span named 'ProcessOrder', got '%s'", spans[0].Name()) } if spans[1].Name() != "ProcessPayment" { t.Errorf("Expected span named 'ProcessPayment', got '%s'", spans[1].Name()) } }
Set up integration tests that cover your entire tracing pipeline:
func TestTracingPipeline(t *testing.T) { // Start a test Jaeger instance jaeger := startTestJaeger() defer jaeger.Stop() // Initialize your application with tracing app := initializeApp() // Perform some operations that should generate traces resp, err := app.CreateOrder(Order{ID: "123"}) if err != nil { t.Fatalf("Failed to create order: %v", err) } // Wait for traces to be exported time.Sleep(5 * time.Second) // Query Jaeger for the trace traces, err := jaeger.QueryTraces(resp.TraceID) if err != nil { t.Fatalf("Failed to query traces: %v", err) } // Validate the trace validateTrace(t, traces[0]) }
Test your Logstash configuration to ensure it correctly parses and processes logs:
input { generator { message => '{"timestamp":"2023-06-01T10:00:00Z","severity":"INFO","message":"Order created","order_id":"123","trace_id":"abc123"}' count => 1 } } filter { json { source => "message" } } output { stdout { codec => rubydebug } }
Run this configuration with logstash -f test_config.conf and verify the output.
Perform load tests to understand the performance impact of tracing:
func BenchmarkWithTracing(b *testing.B) { // Initialize tracing tp := initTracer() defer tp.Shutdown(context.Background()) b.ResetTimer() for i := 0; i < b.N; i++ { ctx, span := tp.Tracer("benchmark").Start(context.Background(), "operation") performOperation(ctx) span.End() } } func BenchmarkWithoutTracing(b *testing.B) { for i := 0; i < b.N; i++ { performOperation(context.Background()) } }
Compare the results to understand the overhead introduced by tracing.
Set up monitoring for your tracing and logging systems:
As you implement and scale your distributed tracing and logging system, keep these challenges and considerations in mind:
在这篇文章中,我们介绍了订单处理系统的全面分布式跟踪和日志记录。我们使用 OpenTelemetry 实现了跟踪,使用 ELK 堆栈设置集中式日志记录、关联日志和跟踪,并探索了高级技术和注意事项。
在我们系列的下一部分也是最后一部分,我们将重点关注生产就绪性和可扩展性。我们将介绍:
请继续关注我们对复杂的订单处理系统进行最后的修饰,确保其准备好大规模生产使用!
您是否面临着具有挑战性的问题,或者需要外部视角来看待新想法或项目?我可以帮忙!无论您是想在进行更大投资之前建立技术概念验证,还是需要解决困难问题的指导,我都会为您提供帮助。
如果您有兴趣与我合作,请通过电子邮件与我联系:hungaikevin@gmail.com。
让我们将您的挑战转化为机遇!
以上是实现订单处理系统:部分分布式跟踪和日志记录的详细内容。更多信息请关注PHP中文网其他相关文章!