Unify spans of different services using OpenTelemetry

WBOY
Release: 2024-02-13 23:00:11


This article looks at how OpenTelemetry can tie the spans of different services together into a single trace. In modern distributed systems, an application is often composed of several microservices, each emitting its own logs, metrics, and traces. OpenTelemetry provides a common way to collect and correlate this telemetry, so that developers can understand and debug the behavior of the whole system, both in local development and in production.

Question content

I just started using OpenTelemetry and created two (micro)services to try it out: standard and geomap.

The end user sends a request to the standard service, which in turn sends a request to geomap to obtain some information and then returns the results to the end user. I use gRPC for all communication.

I have instrumented my functions as follows:

For standard:

type standardService struct {
    pb.UnimplementedStandardServiceServer
}

func (s *standardService) GetStandard(ctx context.Context, in *pb.GetStandardRequest) (*pb.GetStandardResponse, error) {

    // The client is created with the incoming context, before the span is started.
    conn, _ := createClient(ctx, geomapSvcAddr)
    defer conn.Close()

    newCtx, span1 := otel.Tracer(name).Start(ctx, "GetStandard")
    defer span1.End()

    countryInfo, err := pb.NewGeoMapServiceClient(conn).GetCountry(newCtx,
        &pb.GetCountryRequest{
            Name: in.Name,
        })

    //...

    return &pb.GetStandardResponse{
        Standard: standard,
    }, nil

}

func createClient(ctx context.Context, svcAddr string) (*grpc.ClientConn, error) {
    return grpc.DialContext(ctx, svcAddr,
        grpc.WithTransportCredentials(insecure.NewCredentials()),
        grpc.WithUnaryInterceptor(otelgrpc.UnaryClientInterceptor()),
    )
}

For geomap:

type geomapService struct {
    pb.UnimplementedGeoMapServiceServer
}

func (s *geomapService) GetCountry(ctx context.Context, in *pb.GetCountryRequest) (*pb.GetCountryResponse, error) {

    _, span := otel.Tracer(name).Start(ctx, "GetCountry")
    defer span.End()

    // One attribute and two events are recorded on this span.
    span.SetAttributes(attribute.String("country", in.Name))

    span.AddEvent("Retrieving country info")

    //...

    span.AddEvent("Country info retrieved")

    return &pb.GetCountryResponse{
        Country: &country,
    }, nil

}

Both services are configured to send their spans to a Jaeger backend and share almost the same main function (the minor differences are noted in the comments):

const (
    name        = "mapedia"
    service     = "geomap" //or standard
    environment = "production"
    id          = 1
)

func tracerProvider(url string) (*tracesdk.TracerProvider, error) {
    // Create the Jaeger exporter
    exp, err := jaeger.New(jaeger.WithCollectorEndpoint(jaeger.WithEndpoint(url)))
    if err != nil {
        return nil, err
    }
    tp := tracesdk.NewTracerProvider(
        // Always be sure to batch in production.
        tracesdk.WithBatcher(exp),
        // Record information about this application in a Resource.
        tracesdk.WithResource(resource.NewWithAttributes(
            semconv.SchemaURL,
            semconv.ServiceName(service),
            attribute.String("environment", environment),
            attribute.Int64("ID", id),
        )),
    )
    return tp, nil
}

func main() {

    tp, err := tracerProvider("http://localhost:14268/api/traces")
    if err != nil {
        log.Fatal(err)
    }

    defer func() {
        if err := tp.Shutdown(context.Background()); err != nil {
            log.Fatal(err)
        }
    }()
    otel.SetTracerProvider(tp)

    listener, err := net.Listen("tcp", ":"+port)
    if err != nil {
        panic(err)
    }

    s := grpc.NewServer(
        grpc.UnaryInterceptor(otelgrpc.UnaryServerInterceptor()),
    )
    reflection.Register(s)
    pb.RegisterGeoMapServiceServer(s, &geomapService{}) // or pb.RegisterStandardServiceServer(s, &standardService{})
    if err := s.Serve(listener); err != nil {
        log.Fatalf("Failed to serve: %v", err)
    }
}

When I look at the trace generated by the end user's request to the standard service, I can see that it calls the geomap service, as expected.

However, I don't see any of the attributes or events that I added to the child span (I added one attribute and two events when instrumenting geomap's GetCountry function).

I did notice that these attributes are available in another, separate trace (listed under the "geomap" service in Jaeger) whose span IDs are completely unrelated to the child span in the standard service.

What I expect is a single trace in which all attributes and events related to geomap appear in the child span within the standard span. How do I get the expected result from here?

Solution

The span context (which contains the trace ID and span ID, as described in "service instrumentation & term") should be propagated from the parent span to the child span, so that both are part of the same trace.
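To make the propagated identifiers concrete, here is a minimal sketch of how a span context can be written to and read from a W3C Trace Context `traceparent` header value. The `SpanContext` type and both function names are hypothetical stand-ins written for this illustration, not the OpenTelemetry API, and the parser skips some checks that the specification requires (for example, rejecting all-zero IDs).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// SpanContext is a hypothetical, simplified stand-in for the identifiers
// that travel from the parent span to the child span.
type SpanContext struct {
	TraceID string // 32 lowercase hex characters
	SpanID  string // 16 lowercase hex characters
	Sampled bool
}

// FormatTraceparent renders sc as "version-traceid-parentid-flags".
func FormatTraceparent(sc SpanContext) string {
	flags := "00"
	if sc.Sampled {
		flags = "01"
	}
	return fmt.Sprintf("00-%s-%s-%s", sc.TraceID, sc.SpanID, flags)
}

// isHex reports whether s consists of exactly n lowercase hex characters.
func isHex(s string, n int) bool {
	if len(s) != n {
		return false
	}
	for _, r := range s {
		if !strings.ContainsRune("0123456789abcdef", r) {
			return false
		}
	}
	return true
}

// ParseTraceparent is the receiving side: it recovers the span context
// that the caller wrote into the header.
func ParseTraceparent(h string) (SpanContext, error) {
	parts := strings.Split(h, "-")
	if len(parts) != 4 || parts[0] != "00" {
		return SpanContext{}, fmt.Errorf("unsupported traceparent %q", h)
	}
	if !isHex(parts[1], 32) || !isHex(parts[2], 16) || !isHex(parts[3], 2) {
		return SpanContext{}, fmt.Errorf("malformed traceparent %q", h)
	}
	flags, err := strconv.ParseUint(parts[3], 16, 8)
	if err != nil {
		return SpanContext{}, err
	}
	// Bit 0 of the flags byte is the "sampled" flag.
	return SpanContext{TraceID: parts[1], SpanID: parts[2], Sampled: flags&1 == 1}, nil
}

func main() {
	parent := SpanContext{
		TraceID: "4bf92f3577b34da6a3ce929d0e0e4736",
		SpanID:  "00f067aa0ba902b7",
		Sampled: true,
	}
	header := FormatTraceparent(parent)
	fmt.Println(header)

	child, err := ParseTraceparent(header)
	if err != nil {
		panic(err)
	}
	// The callee continues the caller's trace only if the trace IDs match.
	fmt.Println(child.TraceID == parent.TraceID)
}
```

If the header never reaches the callee, there is nothing to parse, and the callee has no way to join the caller's trace.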

With OpenTelemetry, this is usually done automatically by instrumenting the code with the instrumentation libraries provided for various packages, including gRPC.
However, propagation doesn't seem to be working properly in your case.

In your code, you start a new span in the GetStandard function and then use the resulting context (newCtx) when making the GetCountry request. This is correct, because the new context contains the span context of the parent span (GetStandard).
But the problem may be related to your createClient function:

func createClient(ctx context.Context, svcAddr string) (*grpc.ClientConn, error) {
    return grpc.DialContext(ctx, svcAddr,
        grpc.WithTransportCredentials(insecure.NewCredentials()),
        grpc.WithUnaryInterceptor(otelgrpc.UnaryClientInterceptor()),
    )
}

You are correctly using otelgrpc.UnaryClientInterceptor here, which should inject the span context into each outgoing call, but it is not clear when this function is called. If it is called before the span in GetStandard is started, as in your code, the context used to create the client does not contain the span context from GetStandard.

For testing, make sure that the client is created after the span has been started in GetStandard, and that the same context is used throughout the request. (The interceptor reads the span from the context passed to each call, so the call context is the one that matters most.)

You can do this by starting the span first and passing newCtx both to createClient and to the GetCountry call, as shown in this modified version of the GetStandard function:

func (s *standardService) GetStandard(ctx context.Context, in *pb.GetStandardRequest) (*pb.GetStandardResponse, error) {
    // Start the span first so that newCtx carries its span context.
    newCtx, span1 := otel.Tracer(name).Start(ctx, "GetStandard")
    defer span1.End()

    conn, _ := createClient(newCtx, geomapSvcAddr)
    defer conn.Close()

    countryInfo, err := pb.NewGeoMapServiceClient(conn).GetCountry(newCtx,
        &pb.GetCountryRequest{
            Name: in.Name,
        })

    //...

    return &pb.GetStandardResponse{
        Standard: standard,
    }, nil
}

The context used to create the client and to make the GetCountry request now includes the span context from GetStandard, so the two spans should appear as part of the same trace in Jaeger.
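The reason the order matters can be sketched with the standard library alone. A context is immutable, so starting a span returns a new, derived context, and only that derived context carries the span. In the sketch below, `startSpan`, `currentSpan`, and `demo` are hypothetical helpers that imitate this behavior with `context.WithValue`; they are not OpenTelemetry functions.

```go
package main

import (
	"context"
	"fmt"
)

// spanKey is an unexported key type, so no other package can collide with it.
type spanKey struct{}

// startSpan is a hypothetical stand-in for a tracer's Start method: it
// returns a derived context that carries the span name.
func startSpan(ctx context.Context, name string) context.Context {
	return context.WithValue(ctx, spanKey{}, name)
}

// currentSpan reports the span stored in ctx, if any.
func currentSpan(ctx context.Context) (string, bool) {
	name, ok := ctx.Value(spanKey{}).(string)
	return name, ok
}

// demo starts a span and checks which of the two contexts can see it.
func demo() (inOld, inNew bool, name string) {
	ctx := context.Background()
	newCtx := startSpan(ctx, "GetStandard")

	_, inOld = currentSpan(ctx)       // the original context is unchanged
	name, inNew = currentSpan(newCtx) // only the derived context has the span
	return inOld, inNew, name
}

func main() {
	inOld, inNew, name := demo()
	fmt.Println(inOld, inNew, name) // prints "false true GetStandard"
}
```

Anything that should be attributed to the GetStandard span therefore has to receive newCtx rather than the original ctx.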

(As always, check the errors returned by functions such as createClient and GetCountry; this is omitted here for brevity.)

Also:

  • Check your propagator: make sure both services use the same context propagator, ideally the W3C Trace Context propagator. In OpenTelemetry Go, the global propagator is a no-op until one is set, so each service has to register it.

    You can set the propagator explicitly as follows:

    otel.SetTextMapPropagator(propagation.TraceContext{})

    Add the above line to both services at the beginning of the main function.

  • Make sure the metadata is passed: the gRPC interceptors should automatically inject the trace context into the request's metadata on the client and extract it on the server, but double-check that this is working.

    After starting the span in the GetCountry function, you can log the trace ID and span ID:

    ctx, span := otel.Tracer(name).Start(ctx, "GetCountry")
    sc := trace.SpanContextFromContext(ctx)
    log.Printf("Trace ID: %s, Span ID: %s", sc.TraceID(), sc.SpanID())
    defer span.End()

    Do the same in the GetStandard function:

    newCtx, span1 := otel.Tracer(name).Start(ctx, "GetStandard")
    sc := trace.SpanContextFromContext(newCtx)
    log.Printf("Trace ID: %s, Span ID: %s", sc.TraceID(), sc.SpanID())
    defer span1.End()

    If the context is propagated correctly, the trace IDs logged by the two services should match.
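The two checks above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the `Propagator` interface, the `noop` and `traceContext` types, and `serverTraceID` are hypothetical simplifications written for this example, the map stands in for gRPC metadata, and the carrier value holds just the trace ID rather than a full `traceparent` header.

```go
package main

import "fmt"

// Propagator is a hypothetical, simplified analogue of a text-map
// propagator: Inject runs on the client, Extract on the server.
type Propagator interface {
	Inject(traceID string, carrier map[string]string)
	Extract(carrier map[string]string) (traceID string, ok bool)
}

// noop writes and reads nothing, like a propagator that was never set.
type noop struct{}

func (noop) Inject(string, map[string]string) {}

func (noop) Extract(map[string]string) (string, bool) { return "", false }

// traceContext stores the trace ID under the "traceparent" key.
// Simplification: a real header also carries a version, span ID, and flags.
type traceContext struct{}

func (traceContext) Inject(id string, c map[string]string) { c["traceparent"] = id }

func (traceContext) Extract(c map[string]string) (string, bool) {
	id, ok := c["traceparent"]
	return id, ok
}

// serverTraceID simulates one call: the client injects its trace ID into
// the carrier, the server extracts it, and the server falls back to a
// fresh trace when nothing was propagated.
func serverTraceID(p Propagator, clientTraceID string) string {
	carrier := map[string]string{} // stands in for gRPC metadata
	p.Inject(clientTraceID, carrier)
	if id, ok := p.Extract(carrier); ok {
		return id
	}
	return "new-trace" // hypothetical placeholder for a freshly generated ID
}

func main() {
	const client = "4bf92f3577b34da6a3ce929d0e0e4736"
	fmt.Println(serverTraceID(noop{}, client) == client)         // prints "false"
	fmt.Println(serverTraceID(traceContext{}, client) == client) // prints "true"
}
```

The first case matches the symptom in the question: the server still records a span, but under a separate trace, which is why comparing the logged trace IDs is a quick way to tell whether propagation is working.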


source:stackoverflow.com