# Supported Metrics
**Warning:** CTA Metrics is in its early stages and is considered experimental. Published metrics and their attributes are likely to change.
Metric and attribute names follow OpenTelemetry semantic conventions, with CTA-specific prefixes for internal domains (e.g., cta.taped, cta.scheduler).
**Info:** For instructions on configuring CTA to publish metrics, see Enabling Metrics.
## Metrics
| Metric Name | Type | Unit | Description | Attributes |
|---|---|---|---|---|
| `db.client.connection.count` | UpDownCounter | 1 | Number of connections currently in the state given by the `state` attribute. | `db.namespace`, `db.system.name`, `state` |
| `db.client.operation.duration` | Histogram | ms | Duration of database client operations. | `db.namespace`, `db.system.name`, (`error.type`) |
| `cta.frontend.request.duration` | Histogram | ms | Time the frontend takes to process a request. | `event.name`, (`error.type`) |
| `cta.frontend.active_requests` | UpDownCounter | 1 | Number of in-flight frontend requests. | `event.name` |
| `cta.scheduler.operation.duration` | Histogram | ms | Duration of a CTA scheduling operation. | `cta.scheduler.operation.name` |
| `cta.objectstore.lock.acquire.duration` | Histogram | ms | Time taken to acquire an objectstore lock. | `lock.type` |
| `cta.taped.transfer.file.count` | Counter | 1 | Number of files transferred using the I/O medium in the given I/O direction. | `cta.io.direction`, `cta.io.medium`, (`error.type`) |
| `cta.taped.transfer.file.size` | Counter | By | Bytes transferred using the I/O medium in the given I/O direction. | `cta.io.direction`, `cta.io.medium` |
| `cta.taped.transfer.active` | UpDownCounter | 1 | Number of threads actively transferring using the I/O medium in the given I/O direction. | `cta.io.direction`, `cta.io.medium` |
| `cta.taped.buffer.usage` | Gauge | By | Bytes of the cta-taped memory buffer currently in use. | |
| `cta.taped.buffer.limit` | Gauge | By | Total bytes available for the cta-taped memory buffer. | |
| `cta.taped.mount.duration` | Histogram | s | Time taken to mount a tape. | `cta.io.direction` |
| `cta.taped.mount.type` | UpDownCounter | 1 | Number of drive sessions with the given mount type. | `cta.taped.mount.type` |
| `cta.taped.drive.status` | UpDownCounter | 1 | Number of drives in a given state. | `cta.taped.drive.state` |
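The `cta.frontend.active_requests` UpDownCounter and the `cta.frontend.request.duration` Histogram describe the same requests from two angles. The counter tracks how many requests are in flight right now, and the histogram records how long each one took. The sketch below models that pairing in plain Python to make the semantics visible. It is an illustration under assumptions, not CTA's code and not the OpenTelemetry SDK. `UpDownCounter`, `Histogram`, and `track_request` are hypothetical names, and using the exception class name as the `error.type` value is likewise an assumption.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class UpDownCounter:
    """Hypothetical stand-in: a sum that may go up or down, kept per attribute set."""

    def __init__(self):
        self.values = defaultdict(int)

    def add(self, amount, attributes):
        # Each distinct attribute set is its own series, so freeze it into a hashable key.
        self.values[frozenset(attributes.items())] += amount


class Histogram:
    """Hypothetical stand-in: keeps raw observations per attribute set."""

    def __init__(self):
        self.observations = defaultdict(list)

    def record(self, value, attributes):
        self.observations[frozenset(attributes.items())].append(value)


active_requests = UpDownCounter()  # models cta.frontend.active_requests
request_duration = Histogram()     # models cta.frontend.request.duration (ms)


@contextmanager
def track_request(event_name):
    """Hypothetical helper: count a request as in flight and time it in milliseconds."""
    attrs = {"event.name": event_name}
    active_requests.add(1, attrs)  # request is now in flight
    start = time.monotonic()
    error_type = None
    try:
        yield
    except Exception as exc:
        error_type = type(exc).__name__  # illustrative error.type classification
        raise
    finally:
        elapsed_ms = (time.monotonic() - start) * 1000.0
        duration_attrs = dict(attrs)
        if error_type is not None:
            duration_attrs["error.type"] = error_type  # only present on failure
        request_duration.record(elapsed_ms, duration_attrs)
        active_requests.add(-1, attrs)  # request is no longer in flight


# Hypothetical usage: the event name is illustrative only.
with track_request("exampleEvent"):
    pass  # handle the request here
```

Decrementing in `finally` keeps the in-flight count from drifting upward when a request fails. Attaching `error.type` only to the duration mirrors the attribute columns in the table above.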
## Resource Attributes
| Attribute Name | Description |
|---|---|
| `service.namespace` | Logical namespace of the service emitting the metric. Equivalent to the CTA instance name. |
| `service.name` | Name of the service emitting the metric (e.g., `cta.taped`, `cta.frontend`). |
| `service.version` | Version of the service emitting the metric. |
| `service.instance.id` | Unique identifier of the specific service instance. Useful when multiple replicas run under the same namespace. |
| `process.title` | Title of the process within the service. For cta-taped, each drive runs in its own process, so this value identifies the per-drive process. |
| `host.name` | Host on which the service is running. |
| `cta.scheduler.namespace` | Logical name of the scheduler backend in use (e.g., disk, tape). |
| `tape.drive.name` | Name of the tape drive (only exposed by cta-taped). |
| `tape.library.logical.name` | Name of the tape drive's logical library (only exposed by cta-taped). |
## Metric Attributes
| Attribute Name | Description |
|---|---|
| `db.namespace` | Database namespace (schema or logical grouping). |
| `db.system.name` | Name of the database system (e.g., postgresql, oracle). |
| `cta.scheduler.operation.name` | Name of the CTA scheduling operation (e.g., enqueueArchive, cancelRepack). |
| `cta.frontend.requester.name` | Name of the frontend event requester (e.g., the user, subsystem, or service calling the API). |
| `cta.io.direction` | Direction of the transfer (read or write). |
| `cta.io.medium` | Medium used for I/O (disk or tape). |
| `cta.taped.thread_pool.name` | Name of the thread pool handling cta-taped operations. |
| `cta.taped.drive.state` | State the drive is in. |
| `cta.taped.mount.type` | Type of mount. |
| `lock.type` | Type of lock being acquired in the object store or on an internal resource (e.g., read, write). |
| `event.name` | Name of the event being tracked (e.g., a frontend or scheduler event). |
| `error.type` | Classification of an error that occurred (e.g., network, timeout, permission_denied). |
| `state` | Operational or lifecycle state represented by the metric (e.g., active, queued, failed). |
| `le` | Histogram bucket upper bound ("less than or equal"), expressed in the histogram's own unit (ms or s). This label appears on Prometheus-format histogram series. |
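Because `le` buckets are cumulative, each bucket counts every observation at or below its bound, and the final `+Inf` bucket equals the total count. The following Python sketch derives those counts from raw observations. The bounds and durations are made-up values, not the bucket boundaries CTA actually configures.

```python
import bisect
import math


def cumulative_buckets(observations, bounds):
    """Return {le: count} with Prometheus-style cumulative bucket semantics.

    `bounds` must be sorted ascending; a +Inf bucket is always appended.
    """
    per_bucket = [0] * (len(bounds) + 1)
    for value in observations:
        # bisect_left finds the first bound >= value, i.e. the smallest bucket with value <= le.
        per_bucket[bisect.bisect_left(bounds, value)] += 1
    result = {}
    running = 0
    for le, count in zip(list(bounds) + [math.inf], per_bucket):
        running += count  # each bucket includes all lower buckets
        result[le] = running
    return result


# Hypothetical durations (ms) and hypothetical bucket bounds.
print(cumulative_buckets([3.0, 7.5, 10.0, 42.0, 250.0], [5, 10, 50, 100]))
# → {5: 1, 10: 3, 50: 4, 100: 4, inf: 5}
```

An observation exactly equal to a bound (10.0 here) lands in that bound's bucket, which is what "less than or equal" means.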
## Note on Resource vs Metric Attributes
Resource attributes describe the entity that produced the telemetry (service/process/host). They are not automatically attached to every metric by all backends.
- In Prometheus, resource attributes are exposed as a separate time series named `target_info` (and related info series). They are not labels on each metric by default.
- If you need resource attributes as labels on every series, either:
  - enable resource-to-metric-label conversion in your OpenTelemetry Collector pipeline, or
  - join metrics with `target_info` in PromQL (e.g., `on(...) group_left(...)`) at query time.
- Prometheus label keys are sanitized to be valid identifiers (e.g., `service.instance.id` → `service_instance_id`).
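The sanitization and the query-time join can both be modelled in a few lines of Python. The sketch below first converts OpenTelemetry attribute keys into Prometheus label names using a simplified rule: every character outside `[a-zA-Z0-9_]` becomes `_`, and a leading digit gets a `_` prefix. Real exporters may differ in details such as collapsing repeated underscores. The sketch then emulates what a PromQL expression of the form `some_metric * on(job, instance) group_left(host_name) target_info` does: it copies selected labels from the matching `target_info` series onto each metric series. The names `sanitize_label` and `group_left_join`, the metric `some_metric`, the sample label values, and the use of `job`/`instance` as join keys are all illustrative assumptions.

```python
import re

# Anything that is not a valid Prometheus label-name character (simplified rule).
_INVALID = re.compile(r"[^a-zA-Z0-9_]")


def sanitize_label(key):
    """Hypothetical, simplified OpenTelemetry attribute key -> Prometheus label name."""
    name = _INVALID.sub("_", key)
    if name and name[0].isdigit():
        name = "_" + name  # label names may not start with a digit
    return name


def group_left_join(series, target_info, on, copy):
    """Hypothetical emulation of `series * on(<on>) group_left(<copy>) target_info`.

    `series` and `target_info` are lists of (labels: dict, value: float).
    target_info samples have value 1, so the multiplication keeps the metric value.
    """
    index = {}
    for labels, _ in target_info:
        # PromQL would reject duplicate matches on the "one" side; this sketch keeps the last.
        index[tuple(labels.get(k) for k in on)] = labels
    joined = []
    for labels, value in series:
        info = index.get(tuple(labels.get(k) for k in on))
        if info is None:
            continue  # no matching target_info series: dropped, as in PromQL
        out = dict(labels)
        for name in copy:
            if name in info:
                out[name] = info[name]
        joined.append((out, value))
    return joined


# Hypothetical resource attributes and series; all values are made up.
resource_attrs = {"host.name": "tapeserver01", "service.instance.id": "drive-a"}
target_info = [({"job": "ctans/cta-taped", "instance": "drive-a",
                 **{sanitize_label(k): v for k, v in resource_attrs.items()}}, 1.0)]
series = [({"__name__": "some_metric", "job": "ctans/cta-taped", "instance": "drive-a"}, 12.0)]
print(group_left_join(series, target_info, on=("job", "instance"), copy=("host_name",)))
```

Joining at query time leaves the stored series unchanged. The trade-off is that every query needing resource context must repeat the `on(...) group_left(...)` clause.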