I just started using OpenTelemetry and created two (micro)services to try it out: standard and geomap.
The end user sends a request to the standard service, which in turn sends a request to geomap to obtain some information and then returns the result to the end user. I use gRPC for all communication.
I have instrumented my functions as follows:
For standard:
type standardService struct {
	pb.UnimplementedStandardServiceServer
}

func (s *standardService) GetStandard(ctx context.Context, in *pb.GetStandardRequest) (*pb.GetStandardResponse, error) {
	conn, _ := createClient(ctx, geomapSvcAddr)
	defer conn.Close()

	newCtx, span1 := otel.Tracer(name).Start(ctx, "GetStandard")
	defer span1.End()

	countryInfo, err := pb.NewGeoMapServiceClient(conn).GetCountry(newCtx, &pb.GetCountryRequest{
		Name: in.Name,
	})
	//...

	return &pb.GetStandardResponse{
		Standard: standard,
	}, nil
}

func createClient(ctx context.Context, svcAddr string) (*grpc.ClientConn, error) {
	return grpc.DialContext(ctx, svcAddr,
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithUnaryInterceptor(otelgrpc.UnaryClientInterceptor()),
	)
}
For geomap:
type geomapService struct {
	pb.UnimplementedGeoMapServiceServer
}

func (s *geomapService) GetCountry(ctx context.Context, in *pb.GetCountryRequest) (*pb.GetCountryResponse, error) {
	_, span := otel.Tracer(name).Start(ctx, "GetCountry")
	defer span.End()

	span.SetAttributes(attribute.String("country", in.Name))
	span.AddEvent("retrieving country info")
	//...
	span.AddEvent("country info retrieved")

	return &pb.GetCountryResponse{
		Country: &country,
	}, nil
}
Both services are configured to send their spans to the Jaeger backend and share almost the same main function (minor differences are noted in the comments):
const (
	name        = "mapedia"
	service     = "geomap" // or standard
	environment = "production"
	id          = 1
)

func tracerProvider(url string) (*tracesdk.TracerProvider, error) {
	// Create the Jaeger exporter
	exp, err := jaeger.New(jaeger.WithCollectorEndpoint(jaeger.WithEndpoint(url)))
	if err != nil {
		return nil, err
	}
	tp := tracesdk.NewTracerProvider(
		// Always be sure to batch in production.
		tracesdk.WithBatcher(exp),
		// Record information about this application in a Resource.
		tracesdk.WithResource(resource.NewWithAttributes(
			semconv.SchemaURL,
			semconv.ServiceName(service),
			attribute.String("environment", environment),
			attribute.Int64("ID", id),
		)),
	)
	return tp, nil
}

func main() {
	tp, err := tracerProvider("http://localhost:14268/api/traces")
	if err != nil {
		log.Fatal(err)
	}
	defer func() {
		if err := tp.Shutdown(context.Background()); err != nil {
			log.Fatal(err)
		}
	}()
	otel.SetTracerProvider(tp)

	listener, err := net.Listen("tcp", ":"+port)
	if err != nil {
		panic(err)
	}

	s := grpc.NewServer(
		grpc.UnaryInterceptor(otelgrpc.UnaryServerInterceptor()),
	)
	reflection.Register(s)
	pb.RegisterGeoMapServiceServer(s, &geomapService{}) // or pb.RegisterStandardServiceServer(s, &standardService{})
	if err := s.Serve(listener); err != nil {
		log.Fatalf("Failed to serve: %v", err)
	}
}
When I look at the trace generated by the end user's request to the standard service, I can see that, as expected, it calls the geomap service.
However, I don't see any of the attributes or events that I added to the child span (I added one attribute and two events when instrumenting geomap's GetCountry function).
I did notice that these attributes are available in another, separate trace (listed under the "geomap" service in Jaeger) whose span IDs are completely unrelated to the child spans in the standard service.
What I expect is a single trace in which all attributes and events related to geomap appear in a child span under the standard span. How do I get the expected result from here?
The span context (containing the trace ID and span ID, as described in "service instrumentation & term") should be propagated from the parent span to the child span so that they are part of the same trace.
With OpenTelemetry, this is usually done automatically by instrumenting the code with the plugins provided for various libraries, including gRPC.
However, propagation doesn't seem to be working properly in your case.
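To make "propagation" concrete, here is a minimal, standard-library-only sketch of what a propagator conceptually does: the caller serializes its span context into a header on the outgoing request, and the callee parses it back so it can reuse the trace ID. The `spanContext` type and the `inject` and `extract` helpers are hypothetical names invented for this illustration; they are not the OpenTelemetry API, and only the `traceparent` layout (version, trace ID, span ID, flags) follows the W3C Trace Context format.

```go
package main

import (
	"fmt"
	"strings"
)

// spanContext is a hypothetical, stripped-down stand-in for the identifiers
// a real span context carries.
type spanContext struct {
	TraceID string // 32 lowercase hex characters
	SpanID  string // 16 lowercase hex characters
	Sampled bool
}

// inject writes the span context into the carrier (think: gRPC metadata)
// using the W3C "traceparent" layout: version-traceid-spanid-flags.
func inject(sc spanContext, carrier map[string]string) {
	flags := "00"
	if sc.Sampled {
		flags = "01"
	}
	carrier["traceparent"] = fmt.Sprintf("00-%s-%s-%s", sc.TraceID, sc.SpanID, flags)
}

// extract reads the span context back on the receiving side. It reports
// false when the header is missing or malformed, in which case the receiver
// has no parent and would start a brand-new trace.
func extract(carrier map[string]string) (spanContext, bool) {
	parts := strings.Split(carrier["traceparent"], "-")
	if len(parts) != 4 || len(parts[1]) != 32 || len(parts[2]) != 16 {
		return spanContext{}, false
	}
	return spanContext{TraceID: parts[1], SpanID: parts[2], Sampled: parts[3] == "01"}, true
}

func main() {
	parent := spanContext{
		TraceID: "0af7651916cd43dd8448eb211c80319c",
		SpanID:  "b7ad6b7169203331",
		Sampled: true,
	}

	// Caller side (standard): attach the context to the outgoing request.
	carrier := map[string]string{}
	inject(parent, carrier)

	// Callee side (geomap): recover it and reuse the trace ID.
	remote, ok := extract(carrier)
	fmt.Println(ok, remote.TraceID == parent.TraceID)

	// With nothing injected, the callee cannot join the caller's trace.
	_, ok = extract(map[string]string{})
	fmt.Println(ok)
}
```

The second case mirrors the symptom in the question: when no trace context reaches geomap, its span becomes the root of a separate trace.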
In your code, you start a new span in the GetStandard function and then use that context (newCtx) when making the GetCountry request. This is correct, because the new context should contain the span context of the parent span (GetStandard).
But the problem may be related to your createClient function:
func createClient(ctx context.Context, svcAddr string) (*grpc.ClientConn, error) {
	return grpc.DialContext(ctx, svcAddr,
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithUnaryInterceptor(otelgrpc.UnaryClientInterceptor()),
	)
}
You are correctly using otelgrpc.UnaryClientInterceptor here, which should ensure that the context is propagated correctly, but it is not clear when this function is called. If it is called before the span is started in GetStandard, the context used to create the client will not contain the span context from GetStandard.
For testing, try to ensure that the client is created after the span is started in GetStandard, and that the same context is used throughout the request.
You can do this by passing newCtx to both createClient and GetCountry, as shown in this modified version of the GetStandard function:
func (s *standardService) GetStandard(ctx context.Context, in *pb.GetStandardRequest) (*pb.GetStandardResponse, error) {
	newCtx, span1 := otel.Tracer(name).Start(ctx, "GetStandard")
	defer span1.End()

	conn, _ := createClient(newCtx, geomapSvcAddr)
	defer conn.Close()

	countryInfo, err := pb.NewGeoMapServiceClient(conn).GetCountry(newCtx, &pb.GetCountryRequest{
		Name: in.Name,
	})
	//...

	return &pb.GetStandardResponse{
		Standard: standard,
	}, nil
}
The context used to create the client and make the GetCountry request will now include the span context from GetStandard, and they should appear as part of the same trace in Jaeger.
(As always, check the errors returned by functions such as createClient and GetCountry; those checks are not shown here for brevity.)
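Also check your propagator: make sure both services use the same context propagator, preferably the W3C TraceContext propagator. Note that in the OpenTelemetry Go SDK the global propagator is a no-op until you set one, and the main function shown in the question never sets it, so the otelgrpc interceptors have nothing to inject or extract.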
You can set the propagator explicitly as follows:
otel.SetTextMapPropagator(propagation.TraceContext{})
Add the above line to both services at the beginning of the main function.
Make sure the metadata is passed: the gRPC interceptor should automatically inject/extract the tracing context from the request's metadata, but double-check to make sure it's working properly.
After starting the span in the GetCountry function, you can log the trace ID and span ID:
ctx, span := otel.Tracer(name).Start(ctx, "GetCountry")
sc := trace.SpanContextFromContext(ctx)
log.Printf("Trace ID: %s, Span ID: %s", sc.TraceID(), sc.SpanID())
defer span.End()
And do the same in the GetStandard function:
newCtx, span1 := otel.Tracer(name).Start(ctx, "GetStandard")
sc := trace.SpanContextFromContext(newCtx)
log.Printf("Trace ID: %s, Span ID: %s", sc.TraceID(), sc.SpanID())
defer span1.End()
If the context is propagated correctly, the trace IDs in both services should match.