Tracing
Summary
Pomerium has comprehensive support for OpenTelemetry (OTel) tracing, allowing detailed introspection into requests and authorization flows. You can use tracing to debug errors and latency issues in your applications.
Configuration
- Core
- Ingress Controller
- Enterprise
- Zero
Environment Variables
The recommended way to configure tracing is to use the standard OpenTelemetry environment variables. Pomerium supports the following:
Name | Description | Type | Default |
---|---|---|---|
OTEL_TRACES_EXPORTER | Trace exporter to be used. Valid values are "otlp" or "none". | Enum | "none" |
OTEL_EXPORTER_OTLP_ENDPOINT or OTEL_EXPORTER_OTLP_TRACES_ENDPOINT | An OTLP/gRPC or OTLP/HTTP base endpoint URL with optional port. Example: http://localhost:4318. See OTLP Endpoint Configuration for more details. | String | |
OTEL_EXPORTER_OTLP_PROTOCOL or OTEL_EXPORTER_OTLP_TRACES_PROTOCOL | Valid values are "grpc" or "http/protobuf". If unset, Pomerium attempts to determine the protocol from the endpoint port number (the standard ports are 4317 for gRPC and 4318 for HTTP); otherwise it defaults to "http/protobuf". See OTLP Exporter Configuration for more details. | String | |
OTEL_EXPORTER_OTLP_HEADERS or OTEL_EXPORTER_OTLP_TRACES_HEADERS | Comma-separated key=value pairs containing HTTP headers to add when exporting trace data. Example: Authentication=Bearer xxx,Foo=Bar. See OTLP Endpoint Configuration for more details. | String | |
OTEL_EXPORTER_OTLP_TIMEOUT or OTEL_EXPORTER_OTLP_TRACES_TIMEOUT | Export request timeout duration in milliseconds. See OTLP Endpoint Configuration for more details. | Duration | 10000 (ms) |
OTEL_TRACES_SAMPLER_ARG | Sampling probability, a number in the [0..1] range. Examples: 1.0 (sample all traces), 0.25 (sample 25% of traces). | Float | 1.0 |
OTEL_RESOURCE_ATTRIBUTES | Comma-separated key=value pairs of additional resource attributes. Example: key1=value1,key2=value2 | String | |
OTEL_BSP_SCHEDULE_DELAY | Interval in milliseconds at which pending spans (if any) are exported. Minimum 100 (ms). | Duration | 5000 (ms) |
OTEL_BSP_MAX_EXPORT_BATCH_SIZE | Number of pending spans needed to automatically trigger an export. Note that spans are still exported on a fixed interval, even if this limit is not reached. Minimum 1. | Integer | 512 |
OTEL_LOG_LEVEL | OTel SDK internal log level. | Enum | "info" |
OTEL_ATTRIBUTE_VALUE_LENGTH_LIMIT | Maximum allowed attribute value size. Increase this to prevent long URLs in span attributes from being truncated. | Integer | 256 |
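As a minimal sketch, the variables in the table above might be set in a shell before starting Pomerium. The endpoint, sampling rate, and resource attribute values below are placeholders chosen for illustration, not recommended settings.

```shell
# Enable the OTLP exporter (the default is "none").
export OTEL_TRACES_EXPORTER=otlp
# Placeholder endpoint: a local collector on the standard gRPC port.
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
# Optional here, since port 4317 already implies gRPC.
export OTEL_EXPORTER_OTLP_PROTOCOL=grpc
# Sample 25% of traces instead of the default 1.0 (all traces).
export OTEL_TRACES_SAMPLER_ARG=0.25
# Illustrative resource attributes (keys and values are placeholders).
export OTEL_RESOURCE_ATTRIBUTES="key1=value1,key2=value2"
# Then start Pomerium as usual, for example:
#   pomerium --config path/to/your/config.yaml
```

Setting the protocol explicitly is redundant with port 4317, but it keeps the configuration unambiguous if the endpoint later moves to a non-standard port.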
Config file
Tracing can also be configured using the Pomerium config file if desired. The field names in the config file are identical to the environment variable names, but in lowercase.
Config Key | Equivalent Environment Variable | Type |
---|---|---|
otel_traces_exporter | OTEL_TRACES_EXPORTER | string |
otel_exporter_otlp_endpoint | OTEL_EXPORTER_OTLP_ENDPOINT | string |
otel_exporter_otlp_traces_endpoint | OTEL_EXPORTER_OTLP_TRACES_ENDPOINT | string |
otel_exporter_otlp_protocol | OTEL_EXPORTER_OTLP_PROTOCOL | string |
otel_exporter_otlp_traces_protocol | OTEL_EXPORTER_OTLP_TRACES_PROTOCOL | string |
otel_exporter_otlp_headers | OTEL_EXPORTER_OTLP_HEADERS | []string |
otel_exporter_otlp_traces_headers | OTEL_EXPORTER_OTLP_TRACES_HEADERS | []string |
otel_exporter_otlp_timeout | OTEL_EXPORTER_OTLP_TIMEOUT | int64 |
otel_exporter_otlp_traces_timeout | OTEL_EXPORTER_OTLP_TRACES_TIMEOUT | int64 |
otel_traces_sampler_arg | OTEL_TRACES_SAMPLER_ARG | float64 |
otel_resource_attributes | OTEL_RESOURCE_ATTRIBUTES | []string |
otel_bsp_schedule_delay | OTEL_BSP_SCHEDULE_DELAY | int64 |
otel_bsp_max_export_batch_size | OTEL_BSP_MAX_EXPORT_BATCH_SIZE | int32 |
otel_log_level | OTEL_LOG_LEVEL | string |
otel_attribute_value_length_limit | OTEL_ATTRIBUTE_VALUE_LENGTH_LIMIT | int32 |
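For illustration, the following shell sketch writes a few of these keys to a scratch YAML file. The file path and all values are assumptions made for the example; in practice the keys would be merged into your existing Pomerium config file. Only string and float keys are shown, because their YAML form follows directly from the types in the table.

```shell
# Scratch location for the example (hypothetical path, not a Pomerium default).
CONFIG_FILE="${TMPDIR:-/tmp}/pomerium-tracing-example.yaml"

# Write only the tracing-related keys; values mirror the environment
# variable examples (OTLP exporter, local gRPC endpoint, 25% sampling).
cat > "$CONFIG_FILE" <<'EOF'
otel_traces_exporter: otlp
otel_exporter_otlp_endpoint: http://localhost:4317
otel_exporter_otlp_protocol: grpc
otel_traces_sampler_arg: 0.25
EOF
```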
OpenTelemetry tracing options can be configured in the Pomerium CRD, for example:
apiVersion: ingress.pomerium.io/v1
kind: Pomerium
metadata:
  name: example
spec:
  ...
  otel:
    endpoint: http://jaeger-collector:4317
    protocol: grpc
    headers:
      key: value
    timeout: 10s
    sampling: 1.0
    resourceAttributes:
      key: value
    bspScheduleDelay: 5s
    bspMaxExportBatchSize: 512
    logLevel: info
See the CRD Definition for additional details.
- In the Enterprise Console, navigate to Settings > Tracing.
- In the "Tracing Provider" dropdown, select "OTLP".
- Enter your desired sample rate and OTLP endpoint.
- Optionally, enter a protocol ("grpc" or "http/protobuf"). If the endpoint uses port 4317 or 4318, the protocol is selected automatically. Port 4317 is the standard for OTLP gRPC, and 4318 for OTLP HTTP.

- In the Zero dashboard, navigate to Settings > Tracing.
- In the "Tracing Provider" dropdown, select "OTLP".
- Enter your desired sample rate and OTLP endpoint.
- Optionally, enter a protocol ("grpc" or "http/protobuf"). If the endpoint uses port 4317 or 4318, the protocol is selected automatically. Port 4317 is the standard for OTLP gRPC, and 4318 for OTLP HTTP.
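The port-based protocol selection described above can be sketched in shell as follows. This only illustrates the documented rule and is not Pomerium's implementation: the function name guess_otlp_protocol is hypothetical, and the fallback to "http/protobuf" for other ports follows the behavior described for the OTEL_EXPORTER_OTLP_PROTOCOL environment variable.

```shell
# Hypothetical helper: guess the OTLP protocol from an endpoint URL's port.
guess_otlp_protocol() {
  gop_hostport="${1#*://}"            # drop the scheme, if any
  gop_hostport="${gop_hostport%%/*}"  # drop any path component
  gop_port="${gop_hostport##*:}"      # text after the last colon
  case "$gop_port" in
    4317) echo "grpc" ;;              # standard OTLP gRPC port
    4318) echo "http/protobuf" ;;     # standard OTLP HTTP port
    *)    echo "http/protobuf" ;;     # documented default otherwise
  esac
}

guess_otlp_protocol "http://jaeger-collector:4317"   # prints: grpc
```

If your collector listens on a non-standard port, setting the protocol explicitly avoids relying on the fallback.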
Examples
Using Jaeger to visualize trace data
Jaeger is a popular open-source tracing platform. It can be used to collect trace data and visualize it in the browser.
- Run Jaeger in all-in-one mode with Docker:
$ docker run -d --name jaeger \
  -p 16686:16686 \
  -p 4317:4317 \
  -p 4318:4318 \
  jaegertracing/jaeger:latest
- Run Pomerium with OpenTelemetry environment variables set:
$ OTEL_TRACES_EXPORTER=otlp OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317 pomerium --config path/to/your/config.yaml
- Navigate to a Pomerium route defined in the config file.
- Open your browser to http://localhost:16686 to view traces in the Jaeger UI.
Tracing errors
A typo in the OAuth2 issuer URL configuration is a common mistake that can lead to unexpected errors. A user attempting to navigate to a Pomerium route that requires authentication might see an error page instead of being redirected to the Identity Provider. In the Jaeger UI, traces that contain errors are highlighted and easy to find:
Clicking on this trace will show us the original unauthenticated request (GET https://verify.localhost.pomerium.io/) and that it was redirected to sign in. When attempting to initiate the auth flow, an error was encountered, which was recorded in the trace:
Clicking on the span that recorded the error will show the error message: the issuer URL is missing a trailing slash.
Tracing upstream applications
If upstream applications also have OpenTelemetry support, traces will propagate through Pomerium to those applications and the combined trace data will be visible.
Example: Grafana
Grafana is a good example of an upstream application that has tracing support and is easily integrated with Pomerium.
The Securing Grafana with Pomerium guide can help you get started with a new Grafana deployment.
To enable OpenTelemetry traces in Grafana, set the environment variable GF_TRACING_OPENTELEMETRY_OTLP_ADDRESS to the same ip:port (without scheme) as the OTLP endpoint configured in Pomerium.
Alternatively, this can be set in the Grafana config file:
# grafana.ini
[tracing.opentelemetry.otlp]
address = x.x.x.x:4317
Note that, at the time of writing, Grafana only supports exporting OTLP traces over gRPC. Most tracing backends, such as Jaeger and the OTel Collector, support both protocols.
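As a sketch of the environment-variable approach, the Grafana address can be derived from the Pomerium endpoint by stripping the scheme and any path. The variable names pomerium_endpoint and otlp_address are hypothetical, and the endpoint value is the placeholder used in the Jaeger example; because Grafana exports over gRPC, it should point at the backend's gRPC port.

```shell
# Placeholder: the OTLP endpoint configured in Pomerium.
pomerium_endpoint="http://localhost:4317"

# Remove the scheme, then any trailing path, leaving only host:port.
otlp_address="${pomerium_endpoint#*://}"
otlp_address="${otlp_address%%/*}"

# Grafana reads this variable at startup.
export GF_TRACING_OPENTELEMETRY_OTLP_ADDRESS="$otlp_address"
echo "$GF_TRACING_OPENTELEMETRY_OTLP_ADDRESS"   # prints: localhost:4317
```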
With tracing enabled in both Pomerium and Grafana, navigate to your Grafana route. After a few seconds, the combined traces should be visible in Jaeger:
The bottom trace (which occurred first) is the initial unauthenticated request to Pomerium. The top trace is the authenticated request, after the user signed in and was redirected. This trace includes spans exported by Grafana itself, which we can see in detail:
Grafana exports very detailed traces, which can be helpful in debugging complex issues. The combined trace data makes it easy to visualize the request flow between Pomerium and Grafana, or any other upstream application.
Visualizing the Pomerium auth flow
Pomerium can trace a request's entire journey through the authentication process, across multiple individual redirects between Pomerium services and the Identity Provider.
For example, this trace shows an unauthenticated request (GET https://verify.localhost.pomerium.io/) that triggered a sequence of redirects to perform the auth flow:
The trace above ends with a final redirect to repeat the original request, but this time the user is authenticated:
This trace ends with the proxied request to the upstream server.