Observe - Canton Network Docs

This section describes observability features of PQS, which are designed to help you monitor health and performance of the application.

Approach to observability

PQS uses the OpenTelemetry APIs to provide its observability features. All three signal types (traces, metrics, and logs) can be exported to a range of backends through the configuration defined by OpenTelemetry protocols and guidelines. This keeps PQS flexible: users can choose the backend that fits their needs and existing infrastructure. For PQS to emit observability data, an OpenTelemetry Java Agent must be attached to the JVM running PQS. OpenTelemetry’s documentation page on Java Agent Configuration¹ has the information needed to get started. A frequently requested shortcut is to export only metrics, through a Prometheus exposition endpoint embedded in the PQS process. The following snippet sets this up:

$ export OTEL_SERVICE_NAME=pqs
$ export OTEL_TRACES_EXPORTER=none
$ export OTEL_LOGS_EXPORTER=none
$ export OTEL_METRICS_EXPORTER=prometheus
$ export OTEL_EXPORTER_PROMETHEUS_PORT=9090
$ export JDK_JAVA_OPTIONS="-javaagent:path/to/opentelemetry-javaagent.jar"
$ ./scribe.jar pipeline ledger postgres-document ...

PQS Docker images already come pre-configured this way, but users are free to override these values as they see fit for their environments.

Logging

Log level

Set the log level with --logger-level. Possible values are All, Fatal, Error, Warning, Info (default), Debug, and Trace:

--logger-level=Debug

Per-logger log level

Use --logger-mappings to adjust the log level for individual loggers. For example, to remove Netty network traffic from a more detailed overall log:

--logger-mappings-io.netty=Warning \
--logger-mappings-io.grpc.netty=Trace
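
When the logging flags are generated by deployment tooling, it can help to validate the level names before PQS starts. The following Python sketch assembles the arguments from a dictionary. The helper name logger_args is hypothetical and not part of PQS; only the flag shapes and the level names are taken from the text above.

```python
# Hypothetical helper (not part of PQS): builds logging flags for a PQS command line.
# The level names are the ones listed for --logger-level above.
VALID_LEVELS = {"All", "Fatal", "Error", "Warning", "Info", "Debug", "Trace"}


def logger_args(level, mappings):
    """Return CLI arguments for an overall level plus per-logger overrides."""
    for name in [level, *mappings.values()]:
        if name not in VALID_LEVELS:
            raise ValueError(f"unknown log level: {name}")
    args = [f"--logger-level={level}"]
    # One --logger-mappings-<logger>=<level> flag per overridden logger.
    args += [f"--logger-mappings-{logger}={lvl}" for logger, lvl in mappings.items()]
    return args


print(" ".join(logger_args("Debug", {"io.netty": "Warning"})))
```

Rejecting an unknown level early is a design choice of this sketch; how PQS itself reacts to an invalid value is not covered here.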

Log pattern

Use --logger-pattern to select one of the predefined patterns, such as Plain (default), Standard (the standard format used in DA applications), or Structured, or to set your own. See Log Format Configuration² for more details. To use a custom format, provide its string representation, such as:

--logger-pattern="%highlight{%fixed{1}{%level}} [%fiberId] %name:%line %highlight{%message} %highlight{%cause} %kvs"

Log format for console output

Use --logger-format to set the log format. Possible values are Plain (default) or Json. These formats can be used for the pipeline command.

Log format for file output

Use --logger-format to set the log format. Possible values are Plain (default), Json, PlainAsync and JsonAsync. They can be used for the interactive commands, such as prune. For PlainAsync and JsonAsync, log entries are written to the destination file asynchronously.

Destination file for file output

Use --logger-destination to set the path to the destination file (default: output.log) for interactive commands, such as prune.

Log format and log pattern combinations

Plain / Plain

00:23.737 I [zio-fiber-0] com.digitalasset.scribe.pipeline.pipeline.Impl:34 Starting pipeline on behalf of 'Alice_1::12209982174bbaf1e6283234ab828bcab9b73fbe313315b181134bcae9566d3bbf1b'  application=scribe
00:24.658 I [zio-fiber-0] com.digitalasset.scribe.pipeline.pipeline.Impl:61 Last checkpoint is absent. Seeding from ACS before processing transactions with starting offset '00000000000000000b'  application=scribe
00:25.043 I [zio-fiber-895] com.digitalasset.zio.daml.ledgerapi.package:201 Contract filter inclusive of 1 templates and 0 interfaces  application=scribe
00:25.724 I [zio-fiber-0] com.digitalasset.scribe.pipeline.pipeline.Impl:85 Continuing from offset '00000000000000000b' and index '0' until offset '00000000000000000b'  application=scribe

Plain / Standard

component=scribe instance_uuid=5f707d27-8188-4a44-904e-2f98ee9f4177 timestamp=2024-01-16T23:42:38.902+0000 level=INFO correlation_id=tbd description=Starting pipeline on behalf of 'Alice_1::1220c6d22d46d59c8454bd245e5a3bc238e5024d37bfd843dbad6885674f3a9673c5'  scribe=application=scribe
component=scribe instance_uuid=5f707d27-8188-4a44-904e-2f98ee9f4177 timestamp=2024-01-16T23:42:39.734+0000 level=INFO correlation_id=tbd description=Last checkpoint is absent. Seeding from ACS before processing transactions with starting offset '00000000000000000b'  scribe=application=scribe
component=scribe instance_uuid=5f707d27-8188-4a44-904e-2f98ee9f4177 timestamp=2024-01-16T23:42:39.982+0000 level=INFO correlation_id=tbd description=Contract filter inclusive of 1 templates and 0 interfaces  scribe=application=scribe
component=scribe instance_uuid=5f707d27-8188-4a44-904e-2f98ee9f4177 timestamp=2024-01-16T23:42:40.476+0000 level=INFO correlation_id=tbd description=Continuing from offset '00000000000000000b' and index '0' until offset '00000000000000000b'  scribe=application=scribe

Plain / Custom

--logger-pattern=%timestamp{yyyy-MM-dd'T'HH:mm:ss} %level %name:%line %highlight{%message} %highlight{%cause} %kvs

2024-01-16T23:55:52 INFO com.digitalasset.scribe.pipeline.pipeline.Impl:34 Starting pipeline on behalf of 'Alice_1::1220444f494b31c0a40c2f393edac3f5900325028c6f810a203a0334cd830ec230c8'  application=scribe
2024-01-16T23:55:53 INFO com.digitalasset.scribe.pipeline.pipeline.Impl:61 Last checkpoint is absent. Seeding from ACS before processing transactions with starting offset '00000000000000000b'  application=scribe
2024-01-16T23:55:53 INFO com.digitalasset.zio.daml.ledgerapi.package:201 Contract filter inclusive of 1 templates and 0 interfaces  application=scribe
2024-01-16T23:55:53 INFO com.digitalasset.scribe.pipeline.pipeline.Impl:85 Continuing from offset '00000000000000000b' and index '0' until offset '00000000000000000b'  application=scribe

Json / Standard

{"component":"scribe","instance_uuid":"03c263a0-6e3d-416e-b7f2-0e56b9e34841","timestamp":"2024-01-17T00:04:12.537+0000","level":"INFO","correlation_id":"tbd","description":"Starting pipeline on behalf of 'Alice_1::1220f03ed424480ab4487d88230fc033f3910f4cb4492fea68535a5760744b53dabe'","scribe":{"application":"scribe"}}
{"component":"scribe","instance_uuid":"03c263a0-6e3d-416e-b7f2-0e56b9e34841","timestamp":"2024-01-17T00:04:13.551+0000","level":"INFO","correlation_id":"tbd","description":"Last checkpoint is absent. Seeding from ACS before processing transactions with starting offset '00000000000000000b'","scribe":{"application":"scribe"}}
{"component":"scribe","instance_uuid":"03c263a0-6e3d-416e-b7f2-0e56b9e34841","timestamp":"2024-01-17T00:04:13.935+0000","level":"INFO","correlation_id":"tbd","description":"Contract filter inclusive of 1 templates and 0 interfaces","scribe":{"application":"scribe"}}
{"component":"scribe","instance_uuid":"03c263a0-6e3d-416e-b7f2-0e56b9e34841","timestamp":"2024-01-17T00:04:14.659+0000","level":"INFO","correlation_id":"tbd","description":"Continuing from offset '00000000000000000b' and index '0' until offset '00000000000000000b'","scribe":{"application":"scribe"}}

Json / Structured

{"timestamp":"2024-01-17T00:08:25+0000","level":"INFO","thread":"zio-fiber-0","location":"com.digitalasset.scribe.pipeline.pipeline.Impl:34","message":"Starting pipeline on behalf of 'Alice_1::122077c6b00e952ff694e2b25b6f5eb9582f815dfe793e2da668b119481a1dd5acdc'","application":"scribe"}
{"timestamp":"2024-01-17T00:08:26+0000","level":"INFO","thread":"zio-fiber-0","location":"com.digitalasset.scribe.pipeline.pipeline.Impl:61","message":"Last checkpoint is absent. Seeding from ACS before processing transactions with starting offset '00000000000000000b'","application":"scribe"}
{"timestamp":"2024-01-17T00:08:26+0000","level":"INFO","thread":"zio-fiber-882","location":"com.digitalasset.zio.daml.ledgerapi.package:201","message":"Contract filter inclusive of 1 templates and 0 interfaces","application":"scribe"}
{"timestamp":"2024-01-17T00:08:26+0000","level":"INFO","thread":"zio-fiber-0","location":"com.digitalasset.scribe.pipeline.pipeline.Impl:85","message":"Continuing from offset '00000000000000000b' and index '0' until offset '00000000000000000b'","application":"scribe"}
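
Because Json output is one object per line, it can be filtered with any JSON parser. The following Python sketch parses such output and keeps the entries from a given logger. The field names (location, thread, message) are those of the Json / Structured sample above and differ for other patterns; the function names are illustrative only.

```python
import json

# Two entries copied from the Json / Structured sample above.
SAMPLE = """\
{"timestamp":"2024-01-17T00:08:26+0000","level":"INFO","thread":"zio-fiber-0","location":"com.digitalasset.scribe.pipeline.pipeline.Impl:61","message":"Last checkpoint is absent. Seeding from ACS before processing transactions with starting offset '00000000000000000b'","application":"scribe"}
{"timestamp":"2024-01-17T00:08:26+0000","level":"INFO","thread":"zio-fiber-882","location":"com.digitalasset.zio.daml.ledgerapi.package:201","message":"Contract filter inclusive of 1 templates and 0 interfaces","application":"scribe"}
"""


def parse_log_lines(text):
    """Parse newline-delimited JSON log output, skipping blank or non-JSON lines."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            parsed = json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. a stray plain-text line from another process
        if isinstance(parsed, dict):
            entries.append(parsed)
    return entries


def from_logger(entries, prefix):
    """Keep entries whose "location" (logger name and line) starts with the prefix."""
    return [e for e in entries if e.get("location", "").startswith(prefix)]


for entry in from_logger(parse_log_lines(SAMPLE), "com.digitalasset.scribe"):
    print(entry["timestamp"], entry["level"], entry["message"])
```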

Json / Custom

--logger-pattern=%label{timestamp}{%timestamp{yyyy-MM-dd'T'HH:mm:ss}} %label{level}{%level} %label{location}{%name:%line} %label{description}{%message} %label{cause}{%cause} %label{scribe}{%kvs}

{"timestamp":"2024-01-17T00:16:31","level":"INFO","location":"com.digitalasset.scribe.pipeline.pipeline.Impl:34","description":"Starting pipeline on behalf of 'Alice_1::1220ee13431ac437d454ea59d622cfc76599e0846a3caf166b4306d47b1bf83944a6'","scribe":{"application":"scribe"}}
{"timestamp":"2024-01-17T00:16:33","level":"INFO","location":"com.digitalasset.scribe.pipeline.pipeline.Impl:61","description":"Last checkpoint is absent. Seeding from ACS before processing transactions with starting offset '00000000000000000b'","scribe":{"application":"scribe"}}
{"timestamp":"2024-01-17T00:16:34","level":"INFO","location":"com.digitalasset.zio.daml.ledgerapi.package:201","description":"Contract filter inclusive of 1 templates and 0 interfaces","scribe":{"application":"scribe"}}
{"timestamp":"2024-01-17T00:16:35","level":"INFO","location":"com.digitalasset.scribe.pipeline.pipeline.Impl:85","description":"Continuing from offset '00000000000000000b' and index '0' until offset '00000000000000000b'","scribe":{"application":"scribe"}}

Note that with the Json format, each attribute-value pair of a custom pattern must be described with %label{your_label}{format}.
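
The nested braces in such a pattern are easy to mistype, so the value can also be generated. The following Python sketch joins label and format pairs into a pattern string; the function name json_logger_pattern is hypothetical, and the pairs are the ones from the Json / Custom example above.

```python
def json_logger_pattern(fields):
    """Join (label, format) pairs into a --logger-pattern value for Json output."""
    # Each pair becomes %label{<label>}{<format>}; pairs are separated by a space.
    return " ".join(f"%label{{{label}}}{{{fmt}}}" for label, fmt in fields)


pattern = json_logger_pattern([
    ("timestamp", "%timestamp{yyyy-MM-dd'T'HH:mm:ss}"),
    ("level", "%level"),
    ("location", "%name:%line"),
    ("description", "%message"),
    ("cause", "%cause"),
    ("scribe", "%kvs"),
])
print(f"--logger-pattern={pattern}")
```

The printed value reproduces the --logger-pattern argument of the Json / Custom example.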

Application metrics

Assuming PQS exposes metrics as described above, the following metrics are available at http://localhost:9090/metrics. Each metric is accompanied by # HELP and # TYPE comments, which describe its meaning and its type, respectively. Some metric types expose their constituent parts as separate time series; for example, a histogram tracks max, count, sum, and the ranged buckets separately. Where it makes sense, metrics carry labels that provide additional context, such as the type of operation or the template/choice involved. The list below is conceptual; refer to the Prometheus output for the actual metric names:

Type	Name	Description
gauge	`watermark_ix`	Current watermark index (transaction ordinal number for consistent reads)
counter	`pipeline_events_total`	Processed ledger events
histogram	`jdbc_conn_use`	Latency of database connections usage
histogram	`jdbc_conn_isvalid`	Latency of database connection validation
histogram	`jdbc_conn_commit`	Latency of database connection commit
histogram	`total_tx_handling_latency`	Total latency of transaction handling in PQS (observed in LAPI to committed in DB)
gauge	`tx_lag_from_ledger_wallclock`	Lag from ledger (wall-clock delta (in ms) from command completion to receipt by pipeline)
histogram	`pipeline_convert_acs_event`	Latency of converting ACS events
histogram	`pipeline_convert_transaction`	Latency of converting transactions
histogram	`pipeline_prepare_batch_latency`	Latency of preparing batches of statements
histogram	`pipeline_execute_batch_latency`	Latency of executing batches of statements
histogram	`pipeline_progress_watermark_latency`	Latency of watermark progression
histogram	`pipeline_wp_acs_events_size`	Number of in-flight units of work in `pipeline_wp_acs_events` wait point
histogram	`pipeline_wp_acs_statements_size`	Number of in-flight units of work in `pipeline_wp_acs_statements` wait point
histogram	`pipeline_wp_acs_batched_statements_size`	Number of in-flight units of work in `pipeline_wp_acs_batched_statements` wait point
histogram	`pipeline_wp_acs_prepared_statements_size`	Number of in-flight units of work in `pipeline_wp_acs_prepared_statements` wait point
histogram	`pipeline_wp_events_size`	Number of in-flight units of work in `pipeline_wp_events` wait point
histogram	`pipeline_wp_statements_size`	Number of in-flight units of work in `pipeline_wp_statements` wait point
histogram	`pipeline_wp_batched_statements_size`	Number of in-flight units of work in `pipeline_wp_batched_statements` wait point
histogram	`pipeline_wp_prepared_statements_size`	Number of in-flight units of work in `pipeline_wp_prepared_statements` wait point
histogram	`pipeline_wp_watermarks_size`	Number of in-flight units of work in `pipeline_wp_watermarks` wait point
counter	`pipeline_wp_acs_events_total`	Number of units of work processed in `pipeline_wp_acs_events` wait point
counter	`pipeline_wp_acs_statements_total`	Number of units of work processed in `pipeline_wp_acs_statements` wait point
counter	`pipeline_wp_acs_batched_statements_total`	Number of units of work processed in `pipeline_wp_acs_batched_statements` wait point
counter	`pipeline_wp_acs_prepared_statements_total`	Number of units of work processed in `pipeline_wp_acs_prepared_statements` wait point
counter	`pipeline_wp_events_total`	Number of units of work processed in `pipeline_wp_events` wait point
counter	`pipeline_wp_statements_total`	Number of units of work processed in `pipeline_wp_statements` wait point
counter	`pipeline_wp_batched_statements_total`	Number of units of work processed in `pipeline_wp_batched_statements` wait point
counter	`pipeline_wp_prepared_statements_total`	Number of units of work processed in `pipeline_wp_prepared_statements` wait point
counter	`pipeline_wp_watermarks_total`	Number of units of work processed in `pipeline_wp_watermarks` wait point
counter	`app_restarts_total`	Tracks number of times recoverable failures forced the pipeline to restart
gauge	`grpc_up`	Indicator whether gRPC channel is up and operational
gauge	`jdbc_conn_pool_up`	Indicator whether JDBC connection pool is up and operational
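
To show how these series can be consumed outside of Prometheus, the following Python sketch parses a small exposition sample, sums a counter across its labels, and derives a mean from a histogram's sum and count. The sample text is illustrative only: the names follow the conceptual table above, while the label, the values, and the _sum/_count suffixes are assumptions to be checked against your own /metrics output.

```python
import re

# Illustrative scrape output only: names follow the conceptual table above, the
# label and values are made up; check your own /metrics output for the real ones.
SAMPLE = """\
# HELP watermark_ix Current watermark index
# TYPE watermark_ix gauge
watermark_ix 384.0
# TYPE pipeline_events_total counter
pipeline_events_total{type="create"} 9.0
pipeline_events_total{type="exercise"} 3.0
# TYPE jdbc_conn_commit histogram
jdbc_conn_commit_count 4.0
jdbc_conn_commit_sum 0.02
"""

LINE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return a list of (name, labels, value) samples from Prometheus text format."""
    samples = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE.match(line)
        if match:
            name, raw_labels, value = match.groups()
            samples.append((name, dict(LABEL.findall(raw_labels or "")), float(value)))
    return samples


def total(samples, name):
    """Sum a metric across all of its label combinations."""
    return sum(value for n, _, value in samples if n == name)


samples = parse_metrics(SAMPLE)
events = total(samples, "pipeline_events_total")
mean_commit = total(samples, "jdbc_conn_commit_sum") / total(samples, "jdbc_conn_commit_count")
print(f"events={events} mean_commit={mean_commit}")
```

For production use, a maintained Prometheus client or the query language of the metrics backend is likely a better fit than a hand-written parser.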

Grafana dashboard

Based on the metrics described above, it is possible to build a comprehensive dashboard to monitor PQS. A vendor-supplied Grafana dashboard for PQS can be downloaded from the artifacts repository (see pqs-download). You may want to use it as a starting point for your own.

grafana/v9.4.0/dashboard.json
grafana/v10.4.0/dashboard.json
grafana/v11.0.0/dashboard.json

Health check

The health of the PQS process can be monitored through the /livez health check endpoint. The endpoint is served on the configured network interface (--health-address) and TCP port (--health-port); the default is 127.0.0.1:8080.

$ curl http://localhost:8080/livez
{"status":"ok"}
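
The same check can be scripted, for example in a readiness probe. The following Python sketch uses only the standard library; the function names are illustrative, and treating anything other than HTTP 200 with a status of ok as unhealthy is an assumption, since the shape of an unhealthy response is not described here.

```python
import json
import urllib.error
import urllib.request


def is_live(body):
    """Return True when a /livez response body reports {"status": "ok"}."""
    try:
        payload = json.loads(body)
    except (ValueError, TypeError):
        return False
    return isinstance(payload, dict) and payload.get("status") == "ok"


def check_livez(base_url="http://localhost:8080", timeout=2.0):
    """Query the health endpoint; any connection or HTTP error counts as unhealthy."""
    try:
        with urllib.request.urlopen(f"{base_url}/livez", timeout=timeout) as response:
            return response.status == 200 and is_live(response.read())
    except (urllib.error.URLError, OSError):
        return False
```

Calling check_livez() against a PQS instance that answers as shown above returns True; pass a different base_url if --health-address or --health-port were changed.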

Tracing of pipeline execution

PQS instruments the most critical parts of its operations with tracing to provide insights into the execution flow and performance. Traces can be exported to various OpenTelemetry backends by providing appropriate configuration, for example:

$ export OTEL_TRACES_EXPORTER=otlp
$ export OTEL_EXPORTER_OTLP_PROTOCOL=grpc
$ export OTEL_EXPORTER_OTLP_ENDPOINT="http://otel-collector:4317"
$ export JDK_JAVA_OPTIONS="-javaagent:path/to/opentelemetry-javaagent.jar"
$ ./scribe.jar pipeline ledger postgres-document ...

The following root spans are emitted by PQS:

Span name	Description
`process metadata and schema`	interactions that happen when PQS starts up and ensures its datastore is ready for operations
`initialization routine`	interactions that happen when PQS establishes its offset range boundaries (including seeding from ACS if requested) on startup
`consume com.daml.ledger.api.v1.TransactionService/GetTransactions` `consume com.daml.ledger.api.v1.TransactionService/GetTransactionTrees`	[Daml SDK v2.x] timeline of processing a ledger transaction from delivery over gRPC to its persistence to datastore
`consume com.daml.ledger.api.v2.UpdateService/GetUpdates` `consume com.daml.ledger.api.v2.UpdateService/GetUpdateTrees`	[Daml SDK v3.x] timeline of processing a ledger transaction from delivery over gRPC to its persistence to datastore
`execute datastore transaction`	interactions when a batch of transactions is persisted to the datastore
`advance datastore watermark`	interactions when the latest consecutive watermark is persisted to the datastore

All spans are enriched with contextual information through OpenTelemetry’s attributes and events where appropriate. It is worth becoming familiar with this contextual data, since it carries details such as offsets, transaction IDs, and row counts. Because execution is asynchronous and parallel, PQS relies heavily on span links³ to express causal relationships between independent traces. Modern trace visualisation tools use this information to present and navigate the traces involved. Below is an example of causal trace data that covers a transaction from its receipt from the Ledger API to the point where it becomes visible through PQS’ SQL API in Postgres.

Span #110
Trace ID       : 042ce1ffa24b34b38472933ac8209d54
Parent ID      :
ID             : d5c0071e1d9bbf76
Name           : consume com.daml.ledger.api.v1.TransactionService/GetTransactionTrees
Kind           : Consumer
Start time     : 2024-11-06 03:16:43.004004 +0000 UTC
End time       : 2024-11-06 03:16:43.004193 +0000 UTC
Status code    : Unset
Status message :
Attributes:
     -> messaging.operation.name: Str(consume)
     -> messaging.batch.message_count: Int(1)
     -> messaging.destination.name: Str(com.daml.ledger.api.v1.TransactionService/GetTransactionTrees)
     -> messaging.system: Str(canton)
     -> messaging.operation.type: Str(process)

Span #123
Trace ID       : 042ce1ffa24b34b38472933ac8209d54
Parent ID      : d5c0071e1d9bbf76
ID             : 9d60e1f4c42dce76
Name           : export transaction tree
Kind           : Internal
Start time     : 2024-11-06 03:16:43.004134 +0000 UTC
End time       : 2024-11-06 03:16:43.024574 +0000 UTC
Status code    : Unset
Status message :
Attributes:
     -> daml.effective_at: Str(2024-11-06T03:16:42.827847Z)
     -> daml.command_id: Str(3563113460)
     -> daml.events_count: Int(3)
     -> daml.workflow_id: Empty()
     -> daml.transaction_id: Str(122056219af2a73f913e1c2f0ce4422c156bc9cfdb5e5d49baaee0053bf3787f4a97)
     -> daml.offset: Str(000000000000000261)
Events:
SpanEvent #0
     -> Name: canonicalizing transaction tree
     -> Timestamp: 2024-11-06 03:16:43.004809542 +0000 UTC
SpanEvent #1
     -> Name: canonicalized transaction tree
     -> Timestamp: 2024-11-06 03:16:43.005138375 +0000 UTC
SpanEvent #2
     -> Name: converting canonical transaction to domain model
     -> Timestamp: 2024-11-06 03:16:43.005690625 +0000 UTC
SpanEvent #3
     -> Name: converted canonical transaction to domain model
     -> Timestamp: 2024-11-06 03:16:43.006170917 +0000 UTC
SpanEvent #4
     -> Name: released transaction model into batch
     -> Timestamp: 2024-11-06 03:16:43.015018459 +0000 UTC
SpanEvent #5
     -> Name: prepared SQL statements for transaction model
     -> Timestamp: 2024-11-06 03:16:43.015437 +0000 UTC
SpanEvent #6
     -> Name: flushed transaction model SQL to datastore
     -> Timestamp: 2024-11-06 03:16:43.019356042 +0000 UTC
SpanEvent #7
     -> Name: advanced datastore watermark
     -> Timestamp: 2024-11-06 03:16:43.024570334 +0000 UTC
     -> Attributes::
          -> index: Int(384)
          -> offset: Str(000000000000000261)
Links:
SpanLink #0
     -> Trace ID: 839da768a12333920b709410fb73911a
     -> ID: 276627b6e10f62c5
     -> TraceState:
     -> Attributes::
          -> target: Str(↥ ledger submission)
SpanLink #1
     -> Trace ID: 76c58361d46c08761c37ef5821e8fb78
     -> ID: 6051f05f10af0399
     -> TraceState:
     -> Attributes::
          -> target: Str(↧ persist to datastore)
SpanLink #2
     -> Trace ID: 71e67e2420deeef36ef3efacea6399dc
     -> ID: 161b5911e7a0ec18
     -> TraceState:
     -> Attributes::
          -> target: Str(↧ advance watermark)

Span #115
Trace ID       : 76c58361d46c08761c37ef5821e8fb78
Parent ID      :
ID             : 81f5f42361aa93ee
Name           : execute datastore transaction
Kind           : Internal
Start time     : 2024-11-06 03:16:43.015931 +0000 UTC
End time       : 2024-11-06 03:16:43.020991 +0000 UTC
Status code    : Unset
Status message :

Span #111
Trace ID       : 76c58361d46c08761c37ef5821e8fb78
Parent ID      : 81f5f42361aa93ee
ID             : 4bf8484e99999c64
Name           : acquire connection
Kind           : Internal
Start time     : 2024-11-06 03:16:43.016475 +0000 UTC
End time       : 2024-11-06 03:16:43.016688 +0000 UTC
Status code    : Unset
Status message :

Span #113
Trace ID       : 76c58361d46c08761c37ef5821e8fb78
Parent ID      : 81f5f42361aa93ee
ID             : 6051f05f10af0399
Name           : execute batch
Kind           : Internal
Start time     : 2024-11-06 03:16:43.016828 +0000 UTC
End time       : 2024-11-06 03:16:43.019494 +0000 UTC
Status code    : Unset
Status message :
Attributes:
     -> scribe.batch.models_count: Int(37)
Links:
SpanLink #0
     -> Trace ID: 33736b299a690b885c2314b9b17bde05
     -> ID: aba3d1dd6024ff71
     -> TraceState:
     -> Attributes::
          -> offset: Str(00000000000000025c)
          -> target: Str(↥ incoming transaction)
SpanLink #1
     -> Trace ID: 17f3edce9565defd379bf3ab8243f86d
     -> ID: 076afe5b4aac1212
     -> TraceState:
     -> Attributes::
          -> offset: Str(00000000000000025d)
          -> target: Str(↥ incoming transaction)
SpanLink #2
     -> Trace ID: 646ae61de95731c7726a6caee2d69ee9
     -> ID: bca9f5c28de74c90
     -> TraceState:
     -> Attributes::
          -> offset: Str(00000000000000025e)
          -> target: Str(↥ incoming transaction)
SpanLink #3
     -> Trace ID: 9ebd4d4f288b8b338f4192c0d7ea1b8c
     -> ID: a1d145fa9d76d5b3
     -> TraceState:
     -> Attributes::
          -> offset: Str(00000000000000025f)
          -> target: Str(↥ incoming transaction)
SpanLink #4
     -> Trace ID: e0716f968b5019a450da04317ea8f776
     -> ID: a75658ce89441bee
     -> TraceState:
     -> Attributes::
          -> offset: Str(000000000000000260)
          -> target: Str(↥ incoming transaction)
SpanLink #5
     -> Trace ID: 042ce1ffa24b34b38472933ac8209d54
     -> ID: 9d60e1f4c42dce76
     -> TraceState:
     -> Attributes::
          -> offset: Str(000000000000000261)
          -> target: Str(↥ incoming transaction)

Span #112
Trace ID       : 76c58361d46c08761c37ef5821e8fb78
Parent ID      : 6051f05f10af0399
ID             : 00419239933510fa
Name           : execute SQL
Kind           : Internal
Start time     : 2024-11-06 03:16:43.016855 +0000 UTC
End time       : 2024-11-06 03:16:43.019162 +0000 UTC
Status code    : Unset
Status message :
Attributes:
     -> scribe.__contracts.rows_count: Int(9)
     -> scribe.__exercises.rows_count: Int(3)
     -> scribe.__events.rows_count: Int(12)
     -> scribe.__archives.rows_count: Int(1)
     -> scribe.__transactions.rows_count: Int(6)

Span #114
Trace ID       : 76c58361d46c08761c37ef5821e8fb78
Parent ID      : 81f5f42361aa93ee
ID             : 9872ff55adc9e370
Name           : commit transaction
Kind           : Internal
Start time     : 2024-11-06 03:16:43.019916 +0000 UTC
End time       : 2024-11-06 03:16:43.020742 +0000 UTC
Status code    : Unset
Status message :

Span #124
Trace ID       : 71e67e2420deeef36ef3efacea6399dc
Parent ID      :
ID             : 161b5911e7a0ec18
Name           : advance datastore watermark
Kind           : Internal
Start time     : 2024-11-06 03:16:43.021507 +0000 UTC
End time       : 2024-11-06 03:16:43.024872 +0000 UTC
Status code    : Unset
Status message :
Attributes:
     -> scribe.watermark.offset: Str(000000000000000261)
     -> scribe.watermark.ix: Int(384)
Links:
SpanLink #0
     -> Trace ID: 76c58361d46c08761c37ef5821e8fb78
     -> ID: 6051f05f10af0399
     -> TraceState:
     -> Attributes::
          -> target: Str(↥ persist to datastore)

Span #116
Trace ID       : 71e67e2420deeef36ef3efacea6399dc
Parent ID      : 161b5911e7a0ec18
ID             : 33ab3918ebfe138d
Name           : acquire connection
Kind           : Internal
Start time     : 2024-11-06 03:16:43.022009 +0000 UTC
End time       : 2024-11-06 03:16:43.022222 +0000 UTC
Status code    : Unset
Status message :

Span #6
Trace ID       : 71e67e2420deeef36ef3efacea6399dc
Parent ID      : 161b5911e7a0ec18
ID             : 1a66240dfd597654
Name           : UPDATE scribe.__watermark
Kind           : Client
Start time     : 2024-11-06 03:16:43.022737084 +0000 UTC
End time       : 2024-11-06 03:16:43.023134917 +0000 UTC
Status code    : Unset
Status message :
Attributes:
     -> db.operation: Str(UPDATE)
     -> db.sql.table: Str(__watermark)
     -> db.name: Str(scribe)
     -> db.connection_string: Str(postgresql://postgres-scribe:5432)
     -> server.address: Str(postgres-scribe)
     -> server.port: Int(5432)
     -> db.user: Str(pguser)
     -> db.statement: Str(update __watermark set "offset" = ?, ix = ?;)
     -> db.system: Str(postgresql)

Span #117
Trace ID       : 71e67e2420deeef36ef3efacea6399dc
Parent ID      : 161b5911e7a0ec18
ID             : f0b24fb074fe41f8
Name           : commit transaction
Kind           : Internal
Start time     : 2024-11-06 03:16:43.023629 +0000 UTC
End time       : 2024-11-06 03:16:43.024157 +0000 UTC
Status code    : Unset
Status message :
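
As a worked example of reading the dump above: SpanLink #1 and SpanLink #2 of the export transaction tree span lead to the execute datastore transaction and advance datastore watermark traces. The following Python sketch computes the elapsed time between the start of the consume span and the end of each linked root span, using timestamps copied from the dump. The constant and function names are illustrative.

```python
from datetime import datetime, timedelta

# Timestamps copied from the span dump above (UTC).
CONSUME_START = "2024-11-06 03:16:43.004004"       # consume span d5c0071e1d9bbf76, start
BATCH_COMMITTED = "2024-11-06 03:16:43.020991"     # execute datastore transaction, end
WATERMARK_ADVANCED = "2024-11-06 03:16:43.024872"  # advance datastore watermark, end


def micros_between(start, end):
    """Elapsed microseconds between two 'YYYY-MM-DD HH:MM:SS.ffffff' timestamps."""
    # %f accepts at most six fractional digits, so nanosecond timestamps from
    # the dump would need to be truncated before being passed in.
    fmt = "%Y-%m-%d %H:%M:%S.%f"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta // timedelta(microseconds=1)


print("batch committed after", micros_between(CONSUME_START, BATCH_COMMITTED), "us")
print("watermark advanced after", micros_between(CONSUME_START, WATERMARK_ADVANCED), "us")
```

For this transaction the batch is committed 16987 microseconds after receipt and the watermark span ends after 20868 microseconds, roughly 17 ms and 21 ms.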

Trace context propagation

PQS is an intermediary between a ledger instance and downstream applications that prefer to access data through SQL rather than by streaming it directly from the Ledger API. Although it forms a pipeline between two data storage systems (Canton and PostgreSQL), PQS stores the original ledger transaction’s trace context (see also open-tracing-ledger-api-client) for propagation, rather than its own. This lets downstream applications decide for themselves how to connect to the original submission’s trace: as a child span, or as a new trace connected through span links.

select "offset",
       (trace_context).trace_parent,
       (trace_context).trace_state
from transactions limit 1;

       offset       |                      trace_parent                       |   trace_state
--------------------+---------------------------------------------------------+-----------------
 0000000000000000bb | 00-f35923baa38cc520a1fc3aec6771380b-b4cf363cbf5efa6a-01 | foo=bar,baz=qux

Span #85
    Trace ID       : f35923baa38cc520a1fc3aec6771380b
    Parent ID      : d3300bedd4c64511
    ID             : b4cf363cbf5efa6a
    Name           : MessageDispatcher.handle
    Kind           : Internal
    Start time     : 2024-11-05 04:01:40.808 +0000 UTC
    End time       : 2024-11-05 04:01:40.822694083 +0000 UTC
    Status code    : Unset
    Status message :
Attributes:
     -> canton.class: Str(com.digitalasset.canton.participant.protocol.EnterpriseMessageDispatcher)
↑↑↑ span context propagated through transaction/tree stream in Ledger API

↓↓↓ following parent's links chain leads us to the root span of original submission
Span #19
    Trace ID       : f35923baa38cc520a1fc3aec6771380b
    Parent ID      :
    ID             : de3aed62b5fb43ce
    Name           : com.daml.ledger.api.v1.CommandService/SubmitAndWaitForTransaction
    Kind           : Server
    Start time     : 2024-11-05 04:01:40.569 +0000 UTC
    End time       : 2024-11-05 04:01:40.866904459 +0000 UTC
    Status code    : Unset
    Status message :
Attributes:
     -> rpc.method: Str(SubmitAndWaitForTransaction)
     -> daml.submitter: Str()
     -> rpc.service: Str(com.daml.ledger.api.v1.CommandService)
     -> net.peer.port: Int(38640)
     -> net.transport: Str(ip_tcp)
     -> daml.workflow_id: Str()
     -> daml.command_id: Str(3498760027)
     -> rpc.system: Str(grpc)
     -> net.peer.ip: Str(172.18.0.15)
     -> daml.application_id: Str(appid)
     -> rpc.grpc.status_code: Int(0)

Accessing data stored in PQS’ transactions.trace_context column allows any application to re-create the propagated trace context⁴ and use it with their runtime’s instrumentation library.
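
The trace_parent value shown above follows the W3C Trace Context traceparent layout (version, trace ID, parent span ID, flags). The following Python sketch splits the stored values into their fields using only the standard library, assuming the version 00 layout. The function names are illustrative; an OpenTelemetry SDK propagator would normally perform this step in a real application.

```python
import re

# W3C Trace Context "traceparent": version-traceid-parentid-flags, lowercase hex.
TRACEPARENT = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def parse_trace_parent(value):
    """Split a traceparent string into its fields; raise ValueError if malformed."""
    match = TRACEPARENT.match(value.strip())
    if match is None:
        raise ValueError(f"malformed traceparent: {value!r}")
    version, trace_id, span_id, flags = match.groups()
    return {
        "version": version,
        "trace_id": trace_id,
        "span_id": span_id,
        "sampled": bool(int(flags, 16) & 0x01),  # bit 0 is the "sampled" flag
    }


def parse_trace_state(value):
    """Turn 'k1=v1,k2=v2' into a dict of entries, preserving their order."""
    entries = {}
    for member in value.split(","):
        key, sep, val = member.strip().partition("=")
        if sep:
            entries[key] = val
    return entries


context = parse_trace_parent("00-f35923baa38cc520a1fc3aec6771380b-b4cf363cbf5efa6a-01")
state = parse_trace_state("foo=bar,baz=qux")
print(context["trace_id"], context["span_id"], context["sampled"], state)
```

The extracted trace ID and span ID match Span #85 (MessageDispatcher.handle) in the dump above, which is the span a downstream application would use as parent or link target.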

Diagnostics

PQS is capable of exporting diagnostic telemetry snapshots. This data export archive contains essential troubleshooting information such as:

application thread dumps (over a period of time)
application metrics (over a period of time)

To retrieve the archive, connect to the diagnostics socket, for example with netcat (this example assumes the socket is bound to port 9091):

$ nc localhost 9091 > health-dump.zip
$ unzip health-dump.zip
Archive:  health-dump.zip
  inflating: metrics.openmetrics
  inflating: threads-20250307-105606.zip

The table below lists the available configuration sources with priority decreasing from left to right:

System property	Environment variable	Default value	Description
`da.diagnostics.enabled`	`DA_DIAGNOSTICS_ENABLED`	`true`	Enables/disables diagnostics data collection and exposition
`da.diagnostics.host`	`DA_DIAGNOSTICS_HOST`	`127.0.0.1`	Hostname or IP address to use for binding the exposition socket
`da.diagnostics.port`	`DA_DIAGNOSTICS_PORT`	`0`	Port to use for binding the exposition socket (`0` = random port)
`da.diagnostics.dump.path`	`DA_DIAGNOSTICS_DUMP_PATH`	`<empty>`	Directory to write to on graceful shutdown (path needs to be an existing writable directory)
`da.diagnostics.metrics.interval`	`DA_DIAGNOSTICS_METRICS_INTERVAL`	`PT10S`	Metrics collection interval in ISO 8601 format
`da.diagnostics.metrics.buffer.size`	`DA_DIAGNOSTICS_METRICS_BUFFER_SIZE`	`60`	Quantity of samples to store for each monitored metric (rolling window)
`da.diagnostics.metrics.tags`	`DA_DIAGNOSTICS_METRICS_TAGS`	`<empty>`	Comma-separated list of additional labels to enrich each metric with during exposition (for example, `job=myapp,env=staging,deployed=20250101`)
`da.diagnostics.threads.interval`	`DA_DIAGNOSTICS_THREADS_INTERVAL`	`PT1M`	Thread dumps collection interval in ISO 8601 format
`da.diagnostics.threads.buffer.size`	`DA_DIAGNOSTICS_THREADS_BUFFER_SIZE`	`10`	Quantity of thread dumps to store (rolling window)
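
When tuning the interval and buffer size settings, it helps to know how much history a snapshot will hold. The following Python sketch estimates it as interval multiplied by buffer size, which is an inference from the rolling window descriptions above rather than a documented formula. The parser handles only the time-based subset of ISO 8601 durations, and the function names are illustrative.

```python
import re
from datetime import timedelta

# Handles only time-based ISO 8601 durations such as PT10S, PT1M, or PT1H30M.
DURATION = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+(?:\.\d+)?)S)?$")


def parse_duration(text):
    """Convert a time-based ISO 8601 duration into a timedelta."""
    match = DURATION.match(text)
    if match is None or not any(match.groups()):
        raise ValueError(f"unsupported duration: {text!r}")
    hours, minutes, seconds = match.groups()
    return timedelta(hours=int(hours or 0), minutes=int(minutes or 0), seconds=float(seconds or 0))


def retained_window(interval, buffer_size):
    """Approximate history covered by a rolling buffer sampled at a fixed interval."""
    return parse_duration(interval) * buffer_size


print(retained_window("PT10S", 60))  # metrics defaults
print(retained_window("PT1M", 10))   # thread dump defaults
```

Under this assumption, both default configurations retain about ten minutes of history.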
