Application Metrics with Prometheus and Node.js

Learn how to instrument your Node.js applications with Prometheus metrics for better observability and performance monitoring in production.
While I was debugging a performance issue in production the other day, I realized something important: I had no idea what was actually happening inside my application. I was flying blind. The logs showed requests were slow, but why? Which endpoints? What percentiles? I was once guilty of thinking "if it runs, ship it" — but production taught me otherwise.
This experience pushed me to finally understand application metrics, and Prometheus became my go-to solution. Let me share what I learned about instrumenting Node.js applications the right way.
Why Application Metrics Matter in Node.js
Here's the thing most developers don't realize until it's too late: logs tell you what happened, but metrics tell you what's happening right now. When I finally decided to add proper metrics to my applications, I discovered issues I never knew existed.
You might have a memory leak that slowly builds up over days. Or an endpoint that's fast 99% of the time but occasionally takes 30 seconds. These problems hide in the averages. Traditional monitoring catches the obvious crashes, but metrics catch the subtle degradations that lose customers.
Little did I know that Node.js's event loop behavior makes metrics even more critical. A single blocking operation can cascade through your entire application, and without metrics, you'll never see it coming.
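To see what that cascade looks like, here's a tiny stdlib-only sketch (nothing Prometheus-specific yet) of the kind of synchronous work that blocks the event loop. While the busy-wait runs, no timers fire and no other requests get served:

```javascript
// A blocked event loop cannot run timers, I/O callbacks, or serve other
// requests. This measures how long a synchronous busy-wait stalls the thread.
function busyWait(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // synchronous CPU work: nothing else on this thread can run
  }
}

function measureBlockingTime(fn) {
  const start = Date.now();
  fn();
  return Date.now() - start;
}

const blockedFor = measureBlockingTime(() => busyWait(50));
console.log(`event loop was blocked for ~${blockedFor} ms`);
```

Fire a few concurrent requests at a server doing this and every response stalls behind the blocking one — which is exactly the degradation that event loop lag metrics surface.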

Understanding Prometheus Metric Types
Before you start instrumenting everything, you need to understand what you're measuring. Prometheus offers four metric types, and choosing wrong will give you useless data.
Counters only go up. They're perfect for tracking requests, errors, or any event that accumulates. I use these for counting API calls, database queries, and failed authentication attempts.
Gauges go up and down. Think of them as snapshots: current memory usage, active connections, queue length. When I was tracking WebSocket connections, gauges showed me exactly when users were connecting and disconnecting.
Histograms measure distributions. This is where things get fascinating! They track request durations, response sizes, or any measurement where you care about percentiles. I cannot stress this enough: histograms are your friend for understanding latency.
Summaries are similar to histograms but calculate percentiles client-side. I rarely use these because histograms are more flexible for aggregation across instances.
The mistake I made early on? Using gauges for everything. I tried to track request duration with a gauge by updating it each time. Terrible idea! You lose all distribution information and only see the last value.
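To see why, here's a tiny simulation (plain JavaScript, no prom-client, with made-up durations) contrasting the two approaches: the gauge keeps only the last observation, while histogram-style cumulative bucket counts preserve the whole distribution:

```javascript
// Five request durations in seconds, with one slow outlier in the middle.
const durations = [0.12, 0.25, 4.7, 0.31, 0.08];

// "Gauge" approach: each observation overwrites the previous one.
let gaugeValue;
for (const d of durations) {
  gaugeValue = d; // only the most recent request survives
}

// Histogram approach: count observations into cumulative buckets.
const buckets = [0.1, 0.5, 1, 5]; // upper bounds, like prom-client's `buckets` option
const bucketCounts = buckets.map(() => 0);
for (const d of durations) {
  buckets.forEach((le, i) => {
    if (d <= le) bucketCounts[i] += 1;
  });
}

console.log('gauge remembers only:', gaugeValue); // 0.08
console.log('cumulative bucket counts:', bucketCounts); // [1, 4, 4, 5]
```

The gauge ends up reporting 0.08 seconds and completely hides the 4.7-second outlier, while the bucket counts keep it visible in the tail.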
Setting Up Prometheus Client in Node.js
Let's get practical. Here's how I structure metrics in a real Node.js application:
const express = require('express');
const client = require('prom-client');

const app = express();

// Create a Registry
const register = new client.Registry();

// Add default metrics (CPU, memory, event loop lag)
client.collectDefaultMetrics({ register });

// Custom counter for HTTP requests
const httpRequestTotal = new client.Counter({
  name: 'http_requests_total',
  help: 'Total number of HTTP requests',
  labelNames: ['method', 'route', 'status_code'],
  registers: [register]
});

// Custom histogram for request duration
const httpRequestDuration = new client.Histogram({
  name: 'http_request_duration_seconds',
  help: 'Duration of HTTP requests in seconds',
  labelNames: ['method', 'route', 'status_code'],
  buckets: [0.1, 0.3, 0.5, 0.7, 1, 3, 5, 7, 10],
  registers: [register]
});

// Gauge for active requests
const httpRequestsInProgress = new client.Gauge({
  name: 'http_requests_in_progress',
  help: 'Number of HTTP requests currently being processed',
  labelNames: ['method', 'route'],
  registers: [register]
});

// Metrics endpoint
app.get('/metrics', async (req, res) => {
  res.setHeader('Content-Type', register.contentType);
  const metrics = await register.metrics();
  res.send(metrics);
});

app.listen(3000, () => {
  console.log('Server running on port 3000');
});

This setup gives you the foundation. The collectDefaultMetrics() call is wonderful because it automatically tracks Node.js internals like heap usage and event loop lag. When I came across an event loop issue, these default metrics caught it before users noticed.
Instrumenting HTTP Requests with Custom Metrics
Now here's where most tutorials stop, but this is where the real value starts. You need middleware that tracks every request automatically:
function metricsMiddleware(req, res, next) {
  const start = Date.now();

  // req.route is only populated after the router matches, so fall back to
  // req.path here and reuse the same value for inc/dec so the labels match
  const route = req.route?.path || req.path;

  // Increment in-progress gauge
  httpRequestsInProgress.inc({ method: req.method, route });

  // Track when response finishes
  res.on('finish', () => {
    const duration = (Date.now() - start) / 1000;
    const statusCode = res.statusCode.toString();

    // By now the router has matched, so prefer the route pattern
    // (e.g. '/api/users/:id') to keep label cardinality low
    const matchedRoute = req.route?.path || req.path;

    // Record metrics
    httpRequestTotal.inc({
      method: req.method,
      route: matchedRoute,
      status_code: statusCode
    });

    httpRequestDuration.observe({
      method: req.method,
      route: matchedRoute,
      status_code: statusCode
    }, duration);

    // Decrement in-progress gauge
    httpRequestsInProgress.dec({ method: req.method, route });
  });

  next();
}

app.use(metricsMiddleware);

// Example routes
app.get('/api/users', async (req, res) => {
  const users = await db.getUsers();
  res.json(users);
});

app.post('/api/users', async (req, res) => {
  const user = await db.createUser(req.body);
  res.status(201).json(user);
});

This middleware tracks three critical things: total requests (counter), request duration (histogram), and concurrent requests (gauge). When I finally implemented this pattern, I discovered that my /api/search endpoint was taking 10x longer than everything else during peak hours.

Tracking Business Metrics: Beyond Infrastructure
Here's something I wish I'd learned earlier: infrastructure metrics are necessary but not sufficient. You also need business metrics. In other words, don't just track HTTP status codes — track what matters to your business.
For an e-commerce app, I track checkout completions, cart abandonments, and payment processing time. For a SaaS platform, I track feature usage, API quota consumption, and user session duration.
const checkoutCounter = new client.Counter({
  name: 'checkout_completions_total',
  help: 'Total number of completed checkouts',
  labelNames: ['payment_method', 'currency'],
  registers: [register]
});

const cartValue = new client.Histogram({
  name: 'cart_value_dollars',
  help: 'Distribution of cart values',
  buckets: [10, 25, 50, 100, 250, 500, 1000],
  registers: [register]
});

app.post('/api/checkout', async (req, res) => {
  const { paymentMethod, currency, amount } = req.body;
  try {
    await processPayment(req.body);
    checkoutCounter.inc({
      payment_method: paymentMethod,
      currency
    });
    cartValue.observe(amount);
    res.json({ success: true });
  } catch (error) {
    res.status(500).json({ error: 'Payment failed' });
  }
});

These metrics directly translate to revenue impact. When cart values dropped by 15%, we caught it within hours — not weeks later in a financial report.
Choosing the Right Buckets for Histograms
This is where I see developers make expensive mistakes. Histogram buckets define your granularity, and bad buckets mean useless data.
For HTTP request duration, I use: [0.1, 0.3, 0.5, 0.7, 1, 3, 5, 7, 10]. This gives me good resolution under 1 second (where most requests should be) and broader buckets for slower requests.
For cart values, I use: [10, 25, 50, 100, 250, 500, 1000]. This aligns with our pricing tiers and helps us understand purchasing patterns.
The key insight? Your buckets should match your SLAs and business thresholds. If your SLA promises 95% of requests under 500ms, make sure you have a bucket at 0.5.
One caveat: each bucket is its own time series, so you can add or adjust buckets later without losing the data you've already collected, and Prometheus estimates percentiles from whatever buckets exist at query time. But historical samples keep their original resolution: if your buckets were too coarse when the data was recorded, that precision is gone for good.
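For intuition, here's roughly how Prometheus estimates a percentile from cumulative bucket counts: find the bucket containing the target rank, then interpolate linearly between its bounds. This is a simplified sketch with illustrative counts (the real histogram_quantile() also handles the +Inf bucket and various edge cases):

```javascript
// Cumulative bucket counts, as a Prometheus histogram reports them.
// Of 100 requests: 60 finished within 0.1s, 90 within 0.5s, and so on.
const buckets = [
  { le: 0.1, count: 60 },
  { le: 0.5, count: 90 },
  { le: 1.0, count: 98 },
  { le: 10,  count: 100 },
];

// Locate the bucket containing the q-th rank, then interpolate linearly.
function quantileFromBuckets(q, buckets) {
  const total = buckets[buckets.length - 1].count;
  const rank = q * total;
  for (let i = 0; i < buckets.length; i++) {
    if (buckets[i].count >= rank) {
      const lower = i === 0 ? 0 : buckets[i - 1].le;
      const prevCount = i === 0 ? 0 : buckets[i - 1].count;
      const fraction = (rank - prevCount) / (buckets[i].count - prevCount);
      return lower + (buckets[i].le - lower) * fraction;
    }
  }
  return buckets[buckets.length - 1].le;
}

console.log('estimated p95:', quantileFromBuckets(0.95, buckets)); // 0.8125
```

Notice the p95 here is an interpolated guess inside the wide 0.5-to-1s bucket, which is exactly why your bucket bounds should sit at your SLA thresholds.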
Exposing Metrics Endpoint and Scrape Configuration
Your metrics are useless if Prometheus can't scrape them. The /metrics endpoint I showed earlier exposes metrics in Prometheus format, but you need to configure Prometheus to scrape it.
Create a prometheus.yml configuration:
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'nodejs-app'
    static_configs:
      - targets: ['localhost:3000']

This tells Prometheus to scrape your Node.js app every 15 seconds. In production, I use service discovery instead of static targets, but this works wonderfully for getting started.
One mistake I made: exposing /metrics publicly. That endpoint can leak internal information about your architecture. Always put it behind authentication or restrict it to your monitoring network.
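One lightweight option is a bearer-token check in front of the endpoint. This is a sketch, not a full auth solution: METRICS_TOKEN is a hypothetical environment variable, and in a real deployment you might prefer network-level restrictions or your existing auth layer.

```javascript
// Simple bearer-token guard for the /metrics endpoint.
// METRICS_TOKEN is a hypothetical env var set in your deployment config.
function metricsAuth(req, res, next) {
  const expected = `Bearer ${process.env.METRICS_TOKEN}`;
  // Reject if no token is configured or the Authorization header mismatches.
  if (!process.env.METRICS_TOKEN || req.headers['authorization'] !== expected) {
    res.status(401).send('Unauthorized');
    return;
  }
  next();
}

// Usage with the earlier setup:
// app.get('/metrics', metricsAuth, async (req, res) => { ... });
```

Prometheus supports sending this header via the authorization section of the scrape config, so the scraper still gets through while casual visitors don't.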
Production-Ready Metrics Strategy
After running Prometheus in production for several years, here's what I've learned works:
Start with default metrics and HTTP instrumentation. These give you immediate value. Then add business metrics that align with your KPIs. Don't go overboard — every metric has a cost.
Use labels strategically. Labels create new time series, and too many time series will crush your Prometheus server. I limit labels to low-cardinality dimensions like HTTP method, route, and status code. User IDs or request IDs? Never use those as labels.
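The arithmetic makes the danger concrete. Every unique label combination becomes its own time series, so cardinality multiplies (the figures below are illustrative):

```javascript
// Every unique label combination is a separate time series.
const methods = 5;       // GET, POST, PUT, PATCH, DELETE
const routes = 50;       // route patterns, not raw URLs
const statusCodes = 10;  // the status codes you actually emit

const safeSeries = methods * routes * statusCodes;
console.log(`method x route x status: ${safeSeries} series`); // 2500

// Now add a user_id label with 100,000 active users:
const withUserIds = safeSeries * 100000;
console.log(`...plus user_id: ${withUserIds} series`); // 250000000
```

A few thousand series per metric is routine; hundreds of millions is an outage for your monitoring stack.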
Set up alerting rules based on your metrics. A metric without an alert is just interesting information. When request duration p95 crosses 1 second, I want to know immediately.
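As an example, an alerting rule for that p95 threshold might look like the following (the group name, severity label, and annotation text are illustrative; the expression assumes the http_request_duration_seconds histogram from earlier):

```yaml
groups:
  - name: latency-alerts
    rules:
      - alert: HighRequestLatencyP95
        expr: histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le)) > 1
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "p95 request latency has been above 1s for 5 minutes"
```

The for: 5m clause keeps a single slow scrape from paging you; the alert only fires once the condition has held for a full five minutes.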
And that concludes this post! I hope you found it valuable, and look out for more in the future!