More metrics documentation

ericallam · ericallam · commit 1898b5ffbc33 · 2026-02-20T13:42:33.000Z
diff --git a/docs/insights/metrics.mdx b/docs/insights/metrics.mdx
@@ -9,6 +9,12 @@ In the Trigger.dev dashboard we have built-in dashboards and you can create your
 
 Metrics dashboards are powered by [TRQL queries](/insights/query) with widgets that can be displayed as charts, tables, or single values. They automatically refresh to show the latest data.
 
+### Available metrics data
+
+Trigger.dev automatically collects system metrics (CPU, memory, disk, network) and Node.js runtime metrics (event loop, heap) for all deployed tasks, with no configuration needed. You can also record custom metrics using the `otel.metrics` API from the SDK.
+
+All of this data is available in the `metrics` table for use in dashboard widgets. See [Logging, tracing & metrics](/logging#metrics) for the full list of automatic metrics and how to create custom ones, or the [Query page](/insights/query#metrics-table-columns) for the `metrics` table schema.
+
 ![The built-in Metrics dashboard](/images/metrics-built-in.png)
 
 ### Visualization types
diff --git a/docs/insights/query.mdx b/docs/insights/query.mdx
@@ -6,7 +6,43 @@ description: "Query allows you to write custom queries against your data using T
 ### Available tables
 
 - `runs`: contains all task run data including status, timing, costs, and metadata
-- `metrics`: contains metrics data for your runs including CPU, memory, and your custom metrics.
+- `metrics`: contains metrics data for your runs including CPU, memory, and your custom metrics
+
+### `metrics` table columns
+
+| Column | Type | Description |
+| :--- | :--- | :--- |
+| `metric_name` | string | Metric identifier (e.g., `process.cpu.utilization`) |
+| `metric_type` | string | `gauge`, `sum`, or `histogram` |
+| `value` | number | The observed value |
+| `bucket_start` | datetime | Start time of the 10-second aggregation bucket |
+| `run_id` | string | Associated run ID |
+| `task_identifier` | string | Task slug |
+| `attempt_number` | number | Attempt number |
+| `machine_id` | string | Machine that produced the metric |
+| `machine_name` | string | Machine preset (e.g., `small-1x`) |
+| `worker_version` | string | Worker version |
+| `environment_type` | string | `PRODUCTION`, `STAGING`, `DEVELOPMENT`, `PREVIEW` |
+| `attributes` | json | Raw JSON attributes for custom data |
+
+See [Logging, tracing & metrics](/logging#automatic-system-and-runtime-metrics) for the full list of automatically collected metrics and how to create custom metrics.
+
+### `prettyFormat()`
+
+Use `prettyFormat()` to format metric values for display:
+
+```sql
+SELECT
+  timeBucket(),
+  prettyFormat(avg(value), 'bytes') AS avg_memory
+FROM metrics
+WHERE metric_name = 'process.memory.usage'
+GROUP BY timeBucket
+ORDER BY timeBucket
+LIMIT 1000
+```
+
+Available format types: `bytes`, `percent`, `duration`, `durationSeconds`, `quantity`, `costInDollars`.
 
 ## Using the Query dashboard
 
diff --git a/docs/logging.mdx b/docs/logging.mdx
@@ -1,6 +1,6 @@
 ---
-title: "Logging and tracing"
-description: "How to use the built-in logging and tracing system."
+title: "Logging, tracing & metrics"
+description: "How to use the built-in logging, tracing, and metrics system."
 ---
 
 ![The run log](/images/run-log.png)
@@ -77,3 +77,126 @@ export const customTrace = task({
   },
 });
 ```
+
+## Metrics
+
+Trigger.dev collects system and runtime metrics automatically for deployed tasks, and provides an API for recording custom metrics using OpenTelemetry.
+
+You can view metrics in the [Metrics dashboards](/insights/metrics), query them with [TRQL](/insights/query), and export them to external services via [telemetry exporters](/config/config-file#telemetry).
+
+### Custom metrics API
+
+Import `otel` from `@trigger.dev/sdk` and use the standard OpenTelemetry Metrics API to create custom instruments.
+
+Create instruments **at module level** (outside the task `run` function) so they are reused across runs:
+
+```ts /trigger/metrics.ts
+import { task, logger, otel } from "@trigger.dev/sdk";
+
+// Create a meter — instruments are created once at module level
+const meter = otel.metrics.getMeter("my-app");
+
+const itemsProcessed = meter.createCounter("items.processed", {
+  description: "Total number of items processed",
+  unit: "items",
+});
+
+const itemDuration = meter.createHistogram("item.duration", {
+  description: "Time spent processing each item",
+  unit: "ms",
+});
+
+const queueDepth = meter.createUpDownCounter("queue.depth", {
+  description: "Current queue depth",
+  unit: "items",
+});
+
+export const processQueue = task({
+  id: "process-queue",
+  run: async (payload: { items: string[] }) => {
+    queueDepth.add(payload.items.length);
+
+    for (const item of payload.items) {
+      const start = performance.now();
+
+      // ... process item ...
+
+      const elapsed = performance.now() - start;
+
+      itemsProcessed.add(1, { "item.type": "order" });
+      itemDuration.record(elapsed, { "item.type": "order" });
+      queueDepth.add(-1);
+    }
+
+    logger.info("Queue processed", { count: payload.items.length });
+  },
+});
+```
+
+#### Available instrument types
+
+| Instrument | Method | Use case |
+| :--- | :--- | :--- |
+| Counter | `meter.createCounter()` | Monotonically increasing values (items processed, requests sent) |
+| Histogram | `meter.createHistogram()` | Distributions of values (durations, sizes) |
+| UpDownCounter | `meter.createUpDownCounter()` | Values that go up and down (queue depth, active connections) |
+
+All instruments accept optional attributes when recording values. Attributes let you break down metrics by dimension (e.g., by item type, status, or region).
+
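As an illustration of recording values with attributes, here is a minimal sketch of a timing helper. `HistogramLike` and `recordDuration` are hypothetical names introduced for this example and are not part of the SDK; the interface only assumes a `record(value, attributes)` method like the one called on `itemDuration` above. The sketch uses `Date.now()` to stay dependency-free, whereas the task example above uses `performance.now()`.

```typescript
// Hypothetical structural type: only the part of a histogram instrument
// that this sketch needs, so it runs without any SDK installed.
interface HistogramLike {
  record(value: number, attributes?: Record<string, string>): void;
}

// Hypothetical helper: runs `fn`, then records the elapsed milliseconds on
// `histogram` with the given attributes, whether `fn` resolves or throws.
async function recordDuration<T>(
  histogram: HistogramLike,
  attributes: Record<string, string>,
  fn: () => Promise<T>
): Promise<T> {
  const start = Date.now();
  try {
    return await fn();
  } finally {
    // The `finally` block ensures failed work is still measured.
    histogram.record(Date.now() - start, attributes);
  }
}
```

Inside a task this might be called as `await recordDuration(itemDuration, { "item.type": "order" }, () => processItem(item))`, where `processItem` is a placeholder for your own code.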
+### Automatic system and runtime metrics
+
+Trigger.dev automatically collects the following metrics for deployed tasks. No configuration is needed.
+
+| Metric name | Type | Unit | Description |
+| :--- | :--- | :--- | :--- |
+| `process.cpu.utilization` | gauge | ratio | Process CPU usage (0-1) |
+| `process.cpu.time` | counter | seconds | CPU time consumed |
+| `process.memory.usage` | gauge | bytes | Process memory usage |
+| `system.memory.usage` | gauge | bytes | System memory usage |
+| `system.memory.utilization` | gauge | ratio | System memory utilization (0-1) |
+| `system.network.io` | counter | bytes | Network I/O |
+| `system.disk.io` | counter | bytes | Disk I/O (Linux only) |
+| `system.disk.operations` | counter | operations | Disk operations (Linux only) |
+| `system.filesystem.usage` | gauge | bytes | Filesystem usage (Linux only) |
+| `system.filesystem.utilization` | gauge | ratio | Filesystem utilization (Linux only) |
+| `nodejs.event_loop.utilization` | gauge | ratio | Event loop utilization (0-1) |
+| `nodejs.event_loop.delay.p95` | gauge | seconds | 95th-percentile event loop delay |
+| `nodejs.event_loop.delay.max` | gauge | seconds | Maximum event loop delay |
+| `nodejs.heap.used` | gauge | bytes | V8 heap used |
+| `nodejs.heap.total` | gauge | bytes | V8 heap total |
+
+<Note>
+To reduce noise, `system.*` metrics are not collected in dev mode (`trigger dev`). Only `process.*`, `nodejs.*`, and custom metrics are available during development.
+</Note>
+
+### Context attributes
+
+All metrics (both automatic and custom) are tagged with run context so you can filter and group them:
+
+- `run_id` — the run that produced the metric
+- `task_identifier` — the task slug
+- `attempt_number` — the attempt number
+- `machine_name` — the machine preset (e.g., `small-1x`)
+- `worker_version` — the deployed worker version
+- `environment_type` — `PRODUCTION`, `STAGING`, `DEVELOPMENT`, or `PREVIEW`
+
+### Querying metrics
+
+Use [TRQL](/insights/query) to query metrics data. For example, to see average CPU utilization over time:
+
+```sql
+SELECT
+  timeBucket(),
+  avg(value) AS avg_cpu
+FROM metrics
+WHERE metric_name = 'process.cpu.utilization'
+GROUP BY timeBucket
+ORDER BY timeBucket
+LIMIT 1000
+```
+
+See the [Query page](/insights/query#metrics-table-columns) for the full `metrics` table schema.
+
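To make the shape of the query above concrete, the following TypeScript sketch shows the kind of aggregation it expresses: rows are grouped by time bucket and the values in each bucket are averaged. `MetricRow` and `averageByBucket` are hypothetical names for illustration only. The sketch assumes one output bucket per distinct `bucket_start` value and does not describe how TRQL executes queries or how `timeBucket()` chooses its interval.

```typescript
// Hypothetical row shape: only the two `metrics` columns this sketch uses.
interface MetricRow {
  bucket_start: string; // ISO 8601 timestamp of the bucket start
  value: number;
}

// Groups rows by bucket and averages the values in each bucket. Results are
// returned in ascending bucket order (ISO timestamps in the same format
// sort correctly as plain strings).
function averageByBucket(rows: MetricRow[]): Array<{ bucket: string; avg: number }> {
  const totals = new Map<string, { sum: number; count: number }>();
  for (const row of rows) {
    const entry = totals.get(row.bucket_start) ?? { sum: 0, count: 0 };
    entry.sum += row.value;
    entry.count += 1;
    totals.set(row.bucket_start, entry);
  }
  return Array.from(totals.entries())
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
    .map(([bucket, { sum, count }]) => ({ bucket, avg: sum / count }));
}
```

This can be handy for sanity-checking exported data locally, but for dashboards the TRQL query itself is the intended route.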
+### Exporting metrics
+
+You can send metrics to external observability services (Axiom, Honeycomb, Datadog, etc.) by configuring [telemetry exporters](/config/config-file#telemetry) in your `trigger.config.ts`.