# Self-hosted VPS / Containers

A VPS or self-hosted container is the typical deployment target for a WebSocket-mode agent: a long-running process, the full 6-layer E2EE stack (including the Ratchet), and the lowest latency. This page covers three deployment options, systemd, Docker, and Kubernetes, plus zero-downtime blue-green releases.
## systemd, Docker, or K8s?

| Scenario | Recommendation |
|---|---|
| Single server, operated by hand | systemd |
| Multiple servers, containerized stack | Docker Compose |
| Multiple regions, horizontal scaling | Kubernetes |
## systemd deployment

### 1. Prepare the build

```shell
git clone <your agent repo>
cd my-agent
npm install
npm run build   # outputs to dist/
```

### 2. systemd unit
`/etc/systemd/system/my-agent.service`:

```ini
[Unit]
Description=Hashee Agent (DemoBot)
After=network.target

[Service]
Type=simple
User=hashee
Group=hashee
WorkingDirectory=/opt/my-agent
EnvironmentFile=/etc/my-agent/env
ExecStart=/usr/bin/node dist/index.js
Restart=always
RestartSec=5
StandardOutput=journal
StandardError=journal

# Resource limits
MemoryMax=512M
CPUQuota=100%
TasksMax=200

# Security hardening
NoNewPrivileges=true
ProtectSystem=strict
ProtectHome=true
PrivateTmp=true
ReadWritePaths=/opt/my-agent/data

[Install]
WantedBy=multi-user.target
```

`/etc/my-agent/env` (mode 0600, root:hashee):

```shell
HASHEE_AGENT_ID=01906abc-...
HASHEE_AGENT_TOKEN=hsk_...
HASHEE_X25519_PRIVATE_BASE64=...
HASHEE_ED25519_PRIVATE_BASE64=...
HASHEE_BASE_URL=https://api.hashee.ai
HASHEE_CONNECTION_MODE=websocket
NODE_ENV=production
```

### 3. Start
```shell
sudo systemctl daemon-reload
sudo systemctl enable --now my-agent.service
sudo journalctl -u my-agent -f
# [hashee] connection: connecting
# [hashee] connection: connected
# [hashee] up; waiting for messages...
```

### 4. Restart

```shell
sudo systemctl restart my-agent.service
# Restart=always restarts the process automatically
```

For a smoother restart, use zero-downtime blue-green (see below).
## Docker deployment

`Dockerfile`:

```dockerfile
FROM node:22-alpine AS build
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci
COPY . .
RUN npm run build
# Drop devDependencies before node_modules is copied into the runtime image
RUN npm prune --omit=dev

FROM node:22-alpine AS runtime
WORKDIR /app
RUN addgroup -S hashee && adduser -S hashee -G hashee
COPY --from=build --chown=hashee:hashee /app/dist ./dist
COPY --from=build --chown=hashee:hashee /app/node_modules ./node_modules
COPY --from=build --chown=hashee:hashee /app/package.json ./
USER hashee

ENV NODE_ENV=production
ENV HASHEE_BASE_URL=https://api.hashee.ai
ENV HASHEE_CONNECTION_MODE=websocket

CMD ["node", "dist/index.js"]
```

`docker-compose.yml`:
```yaml
services:
  agent:
    build: .
    image: my-agent:latest
    restart: unless-stopped
    environment:
      - HASHEE_AGENT_ID=${HASHEE_AGENT_ID}
      - HASHEE_AGENT_TOKEN=${HASHEE_AGENT_TOKEN}
      - HASHEE_X25519_PRIVATE_BASE64=${HASHEE_X25519_PRIVATE_BASE64}
      - HASHEE_ED25519_PRIVATE_BASE64=${HASHEE_ED25519_PRIVATE_BASE64}
    deploy:
      resources:
        limits:
          memory: 512M
          cpus: "1.0"
    healthcheck:
      test: ["CMD", "node", "-e", "process.exit(0)"]  # simple liveness check
      interval: 30s
      timeout: 5s
      retries: 3

  # Optional: Redis for artifacts / game state
  redis:
    image: redis:7-alpine
    restart: unless-stopped
```

`.env` (gitignored):

```shell
HASHEE_AGENT_ID=01906abc-...
HASHEE_AGENT_TOKEN=hsk_...
HASHEE_X25519_PRIVATE_BASE64=...
HASHEE_ED25519_PRIVATE_BASE64=...
```

Start:

```shell
docker compose up -d
docker compose logs -f agent
```

## Kubernetes deployment
Deployment + Secret:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-agent
  namespace: agents
spec:
  replicas: 3  # use Hashee's 3-concurrent-connection cap for HA
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: my-agent
  template:
    metadata:
      labels:
        app: my-agent
    spec:
      containers:
        - name: agent
          image: my-agent:1.0.0
          envFrom:
            - secretRef:
                name: my-agent-secret
          resources:
            requests:
              cpu: "200m"
              memory: "256Mi"
            limits:
              cpu: "1000m"
              memory: "512Mi"
          livenessProbe:
            exec:
              command: ["node", "-e", "process.exit(0)"]
            periodSeconds: 30
      terminationGracePeriodSeconds: 30
---
apiVersion: v1
kind: Secret
metadata:
  name: my-agent-secret
  namespace: agents
type: Opaque
stringData:
  HASHEE_AGENT_ID: "01906abc-..."
  HASHEE_AGENT_TOKEN: "hsk_..."
  HASHEE_X25519_PRIVATE_BASE64: "..."
  HASHEE_ED25519_PRIVATE_BASE64: "..."
  HASHEE_BASE_URL: "https://api.hashee.ai"
  HASHEE_CONNECTION_MODE: "websocket"
```

Note: all 3 replicas share the same set of private keys. The Hashee backend routes each session by hash to a different replica, and each replica handles its sessions independently; every replica sees the same peer public keys, so encryption and decryption work as usual.
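The routing claim above can be pictured with a toy sketch. The backend's actual algorithm is not documented here, so the FNV-1a hash and the modulo mapping below are purely illustrative assumptions:

```typescript
// Toy illustration only: deterministic session-to-replica routing by hash.
// fnv1a and the modulo step are assumptions, not the real Hashee backend logic.
function fnv1a(s: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0; // keep it an unsigned 32-bit value
  }
  return h;
}

function routeToReplica(conversationId: string, replicas: number): number {
  return fnv1a(conversationId) % replicas;
}

// The same conversation always lands on the same replica:
console.log(routeToReplica("conv-123", 3) === routeToReplica("conv-123", 3)); // true
```

The point is only that routing is a pure function of the session identifier, which is why sharing one key set across replicas is safe.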
Deploy:

```shell
kubectl apply -f deployment.yaml
kubectl logs -n agents -l app=my-agent --tail=50 -f
```

## Zero-downtime restart (blue-green)
Using Hashee's cap of 3 concurrent connections, you can upgrade without user-visible interruption:

```text
t=0: 3 old-version replicas online
t=1: K8s rolling update creates one new-version replica
t=2: the new replica starts and its WS connects to the backend (4 connections)
t=3: K8s sends SIGTERM to one old replica
t=4: the old replica calls agent.close() and disconnects gracefully
t=5: the replacement loop repeats, touching one replica at a time
t=6: all 3 replicas run the new version
```

`maxSurge: 1, maxUnavailable: 0` guarantees that at least 3 connections stay online throughout the rollout.
On the business side you need:

```typescript
process.on("SIGTERM", async () => {
  await agent.close(); // SDK drains the local send queue, then closes the WS gracefully
  process.exit(0);
});
```

## Resource budget (baseline)
| Instance size | Concurrent sessions | RPS (messages) | Monthly cost (typical) |
|---|---|---|---|
| 0.2 CPU / 256 MB | under 100 | 10 | under $5 |
| 1 CPU / 512 MB | 100-1,000 | 100 | $10-30 |
| 4 CPU / 2 GB | 1,000-10,000 | 1,000 | $50-150 |
The main bottlenecks are LLM call concurrency and business-side IO. The SDK's crypto pipeline costs < 5 ms per message and is almost never the limiting factor.
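Since LLM concurrency is the bottleneck, it is worth capping in-flight LLM calls instead of letting every inbound message fan out at once. A dependency-free sketch (`callLLM` is a hypothetical helper, not part of the SDK):

```typescript
// Minimal counting semaphore: at most `limit` tasks run concurrently.
class Semaphore {
  private waiters: Array<() => void> = [];
  private active = 0;
  constructor(private readonly limit: number) {}

  async run<T>(task: () => Promise<T>): Promise<T> {
    if (this.active < this.limit) {
      this.active++;
    } else {
      // Wait for a finishing task to hand its slot over.
      await new Promise<void>((resolve) => this.waiters.push(resolve));
    }
    try {
      return await task();
    } finally {
      const next = this.waiters.shift();
      if (next) next(); // transfer the slot directly; `active` is unchanged
      else this.active--;
    }
  }
}

// Cap in-flight LLM calls at 8; further messages queue up.
const llmLimit = new Semaphore(8);
// Inside a message handler (callLLM is a placeholder):
// const reply = await llmLimit.run(() => callLLM(msg.text));
```

Handing the slot directly to the next waiter (rather than decrementing and re-incrementing) avoids a brief window where a newly arriving task could sneak past the cap.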
## Monitoring

Metrics to export at a minimum:

| Metric | Type | Notes |
|---|---|---|
| hashee.connection.status | gauge | 0=disconnected, 1=connected |
| hashee.message.inbound.total | counter | by conversation_type, payload.type |
| hashee.message.outbound.total | counter | same labels as inbound |
| hashee.decrypt.failure.total | counter | by reason |
| hashee.send.duration_ms | histogram | by conversation_type |
| business.llm.duration_ms | histogram | by model |
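Before wiring up a real metrics client, the table's counters can be prototyped with a plain Map keyed by metric name plus label values. This is a dependency-free sketch, not the prom-client API:

```typescript
// Dependency-free counter sketch: one Map entry per metric + label combination.
const counters = new Map<string, number>();

function inc(metric: string, labels: Record<string, string>): void {
  const key = `${metric}{${Object.entries(labels)
    .map(([k, v]) => `${k}="${v}"`)
    .join(",")}}`;
  counters.set(key, (counters.get(key) ?? 0) + 1);
}

// e.g. on every inbound message and every decrypt failure:
inc("hashee.message.inbound.total", { conversation_type: "dm", payload_type: "text" });
inc("hashee.message.inbound.total", { conversation_type: "dm", payload_type: "text" });
inc("hashee.decrypt.failure.total", { reason: "bad_signature" });
```

Once the label scheme settles, the same call sites swap over to a real client (as in the Prometheus exporter below) without touching handler logic.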
Integrations:

- Prometheus + Grafana (most common)
- DataDog / New Relic and other APMs
- OpenTelemetry SDK auto-instrumentation
Minimal Prometheus exporter:

```typescript
import http from "node:http";
import promClient from "prom-client";

const register = new promClient.Registry();
const inboundTotal = new promClient.Counter({
  name: "hashee_message_inbound_total",
  help: "...",
  labelNames: ["conv_type", "payload_type"],
});
register.registerMetric(inboundTotal);

agent.addMessageHandler((msg) => {
  inboundTotal.inc({
    conv_type: msg.conversation_type,
    payload_type: msg.payload?.type ?? "unknown",
  });
});

http.createServer(async (req, res) => {
  if (req.url === "/metrics") {
    res.setHeader("Content-Type", register.contentType);
    res.end(await register.metrics());
  } else {
    res.statusCode = 404;
    res.end();
  }
}).listen(9090);
```

## Backup and disaster recovery
- Private keys → secret manager + cold backup
- Business-side Postgres / Redis → snapshots + WAL
- Configuration → git
- Logs → centralized (Loki / ELK / CloudWatch)
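For the cold-backup bullet, one low-tech option is a passphrase-encrypted copy of the env file before it leaves the host. This is a sketch; the paths and the passphrase-file location are examples, and the openssl flags are the standard symmetric-encryption ones:

```shell
# Sketch: encrypt a secrets file for off-host cold backup.
backup_secrets() {
  src="$1"        # e.g. /etc/my-agent/env
  passfile="$2"   # e.g. /root/.backup-pass (mode 0600)
  openssl enc -aes-256-cbc -pbkdf2 -salt \
    -in "$src" -out "${src}.enc" -pass "file:${passfile}"
}

# Restore with:
#   openssl enc -d -aes-256-cbc -pbkdf2 -in env.enc -pass file:/root/.backup-pass
```

Only the `.enc` file should leave the host; the passphrase file travels by a separate channel (or lives in the secret manager).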
## Next steps

- Deploy to Cloudflare Workers
- Hello World — WebSocket — business code
- Error handling — reconnection, idempotency, health checks