Self-hosted VPS / containers

A VPS or a self-hosted container is the typical deployment target for a WebSocket-mode agent: a long-running process, the full 6-layer E2EE stack (including the Ratchet), and the lowest latency. This page covers three deployment options (systemd, Docker, Kubernetes) plus zero-downtime blue-green releases.

systemd or Docker / K8s?

Scenario | Recommendation
Single server + self-managed ops | systemd
Multiple servers + containerized stack | Docker Compose
Multi-region + horizontal scaling | Kubernetes

systemd deployment

1. Prepare the build

git clone <your agent repo>
cd my-agent
npm install
npm run build # outputs to dist/

2. systemd unit

/etc/systemd/system/my-agent.service

[Unit]
Description=Hashee Agent (DemoBot)
After=network.target
[Service]
Type=simple
User=hashee
Group=hashee
WorkingDirectory=/opt/my-agent
EnvironmentFile=/etc/my-agent/env
ExecStart=/usr/bin/node dist/index.js
Restart=always
RestartSec=5
StandardOutput=journal
StandardError=journal
# Resource limits
MemoryMax=512M
CPUQuota=100%
TasksMax=200
# Security hardening
NoNewPrivileges=true
ProtectSystem=strict
ProtectHome=true
PrivateTmp=true
ReadWritePaths=/opt/my-agent/data
[Install]
WantedBy=multi-user.target

/etc/my-agent/env (mode 0600, owner root:hashee):

HASHEE_AGENT_ID=01906abc-...
HASHEE_AGENT_TOKEN=hsk_...
HASHEE_X25519_PRIVATE_BASE64=...
HASHEE_ED25519_PRIVATE_BASE64=...
HASHEE_BASE_URL=https://api.hashee.ai
HASHEE_CONNECTION_MODE=websocket
NODE_ENV=production
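A missing or empty variable tends to surface much later as an opaque crypto or auth error. A fail-fast check at the top of dist/index.js helps; this is a sketch, with the variable names taken from the env file above and `assertEnv` as our own helper, not an SDK API:

```javascript
// Fail fast at startup if a required variable is missing,
// instead of failing later with a confusing crypto/auth error.
const REQUIRED = [
  "HASHEE_AGENT_ID",
  "HASHEE_AGENT_TOKEN",
  "HASHEE_X25519_PRIVATE_BASE64",
  "HASHEE_ED25519_PRIVATE_BASE64",
];

function assertEnv(env = process.env) {
  const missing = REQUIRED.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required env vars: ${missing.join(", ")}`);
  }
}

// Call assertEnv() before constructing the SDK client.
```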

3. Start

sudo systemctl daemon-reload
sudo systemctl enable --now my-agent.service
sudo journalctl -u my-agent -f
# [hashee] connection: connecting
# [hashee] connection: connected
# [hashee] up; waiting for messages...

4. Restart

sudo systemctl restart my-agent.service
# with Restart=always, a crashed process also comes back automatically

For a smoother rollover, use the zero-downtime blue-green approach (see below).

Docker deployment

Dockerfile

FROM node:22-alpine AS build
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci                 # install dev deps too; the build step needs them
COPY . .
RUN npm run build
RUN npm prune --omit=dev   # strip dev deps before copying to the runtime stage

FROM node:22-alpine AS runtime
WORKDIR /app
RUN addgroup -S hashee && adduser -S hashee -G hashee
COPY --from=build --chown=hashee:hashee /app/dist ./dist
COPY --from=build --chown=hashee:hashee /app/node_modules ./node_modules
COPY --from=build --chown=hashee:hashee /app/package.json ./
USER hashee
ENV NODE_ENV=production
ENV HASHEE_BASE_URL=https://api.hashee.ai
ENV HASHEE_CONNECTION_MODE=websocket
CMD ["node", "dist/index.js"]

docker-compose.yml

services:
  agent:
    build: .
    image: my-agent:latest
    restart: unless-stopped
    environment:
      - HASHEE_AGENT_ID=${HASHEE_AGENT_ID}
      - HASHEE_AGENT_TOKEN=${HASHEE_AGENT_TOKEN}
      - HASHEE_X25519_PRIVATE_BASE64=${HASHEE_X25519_PRIVATE_BASE64}
      - HASHEE_ED25519_PRIVATE_BASE64=${HASHEE_ED25519_PRIVATE_BASE64}
    deploy:
      resources:
        limits:
          memory: 512M
          cpus: "1.0"
    healthcheck:
      test: ["CMD", "node", "-e", "process.exit(0)"] # minimal liveness: only proves the node binary starts
      interval: 30s
      timeout: 5s
      retries: 3

  # Optional: Redis for artifacts / game state
  redis:
    image: redis:7-alpine
    restart: unless-stopped

.env (gitignored):

HASHEE_AGENT_ID=01906abc-...
HASHEE_AGENT_TOKEN=hsk_...
HASHEE_X25519_PRIVATE_BASE64=...
HASHEE_ED25519_PRIVATE_BASE64=...

Start:

docker compose up -d
docker compose logs -f agent
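The compose healthcheck above only proves that the node binary can spawn; it says nothing about the WebSocket connection. A common pattern is a heartbeat file: the agent touches it while it considers itself healthy, and the healthcheck rejects a stale file. A sketch (`startHeartbeat` / `checkHeartbeat` are our own helpers, not SDK APIs; wire the start call into your startup path and stop the interval when you detect a dead connection):

```javascript
import { writeFileSync, statSync } from "node:fs";

const HEARTBEAT_FILE = "/tmp/agent-heartbeat";

// Agent side: touch the heartbeat file periodically while healthy.
function startHeartbeat(intervalMs = 10_000) {
  const touch = () => writeFileSync(HEARTBEAT_FILE, String(Date.now()));
  touch();
  return setInterval(touch, intervalMs).unref();
}

// Healthcheck side: a fresh file means the process is alive and looping.
function checkHeartbeat(maxAgeMs = 60_000) {
  const age = Date.now() - statSync(HEARTBEAT_FILE).mtimeMs;
  return age <= maxAgeMs;
}
```

With this in place, the compose healthcheck can become something like `test: ["CMD", "node", "dist/healthcheck.js"]`, where healthcheck.js calls `checkHeartbeat()` and exits non-zero on a stale file.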

Kubernetes deployment

Deployment + Secret

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-agent
  namespace: agents
spec:
  replicas: 3 # use Hashee's cap of 3 concurrent connections for HA
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: my-agent
  template:
    metadata:
      labels:
        app: my-agent
    spec:
      containers:
        - name: agent
          image: my-agent:1.0.0
          envFrom:
            - secretRef:
                name: my-agent-secret
          resources:
            requests:
              cpu: "200m"
              memory: "256Mi"
            limits:
              cpu: "1000m"
              memory: "512Mi"
          livenessProbe:
            exec:
              command: ["node", "-e", "process.exit(0)"]
            periodSeconds: 30
      terminationGracePeriodSeconds: 30
---
apiVersion: v1
kind: Secret
metadata:
  name: my-agent-secret
  namespace: agents
type: Opaque
stringData:
  HASHEE_AGENT_ID: "01906abc-..."
  HASHEE_AGENT_TOKEN: "hsk_..."
  HASHEE_X25519_PRIVATE_BASE64: "..."
  HASHEE_ED25519_PRIVATE_BASE64: "..."
  HASHEE_BASE_URL: "https://api.hashee.ai"
  HASHEE_CONNECTION_MODE: "websocket"

Note: all 3 replicas share the same private keys. The Hashee backend routes each session to a replica by hash, and each replica handles its own sessions independently; since every replica presents the same public keys to peers, encryption and decryption work normally.

Deploy:

kubectl apply -f deployment.yaml
kubectl logs -n agents -l app=my-agent --tail=50 -f

Zero-downtime restart (blue-green)

By exploiting Hashee's cap of 3 concurrent connections, you can upgrade without user-visible interruption:

t=0: 3 old-version replicas online
t=1: the K8s rolling update creates one new-version replica
t=2: the new replica comes up and its WS connects to the backend (4 connections)
t=3: K8s sends SIGTERM to one old replica
t=4: the old replica calls agent.close() and disconnects gracefully
t=5: the loop repeats, replacing one replica at a time
t=6: all 3 replicas run the new version

maxSurge: 1, maxUnavailable: 0 guarantees that at least 3 connections stay online throughout the rollout.

Your application side needs:

process.on("SIGTERM", async () => {
  // Deadline shorter than terminationGracePeriodSeconds, so close() cannot hang the pod
  const timer = setTimeout(() => process.exit(1), 25_000);
  await agent.close(); // SDK drains the local send queue, then closes the WS gracefully
  clearTimeout(timer);
  process.exit(0);
});

Resource budget (baseline)

Instance spec | Concurrent sessions | Message RPS | Monthly cost (typical)
0.2 CPU / 256MB | under 100 | 10 | under $5
1 CPU / 512MB | 100-1000 | 100 | $10-30
4 CPU / 2GB | 1000-10000 | 1000 | $50-150

The main bottlenecks are LLM call concurrency and application-side IO. The SDK's encryption pipeline costs < 5 ms per message and is almost never the bottleneck.
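Since LLM call concurrency is the main bottleneck, it is worth capping in-flight LLM calls so a message burst queues instead of exhausting memory or provider rate limits. A minimal promise-based limiter, as a sketch (`createLimiter` and `callLLM` are illustrative names, not SDK APIs):

```javascript
// Tiny semaphore: at most `max` tasks run at once; the rest wait in a FIFO queue.
function createLimiter(max) {
  let active = 0;
  const queue = [];
  const next = () => {
    if (active >= max || queue.length === 0) return;
    active++;
    const { task, resolve, reject } = queue.shift();
    task().then(resolve, reject).finally(() => {
      active--;
      next();
    });
  };
  return (task) =>
    new Promise((resolve, reject) => {
      queue.push({ task, resolve, reject });
      next();
    });
}

const limitLLM = createLimiter(8); // tune to your provider's rate limit

// Usage inside the message handler (callLLM is a placeholder):
// const reply = await limitLLM(() => callLLM(msg));
```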

Monitoring

Metrics you should export at a minimum:

Metric | Type | Notes
hashee.connection.status | gauge | 0=disconnected, 1=connected
hashee.message.inbound.total | counter | by conversation_type, payload.type
hashee.message.outbound.total | counter | same labels as above
hashee.decrypt.failure.total | counter | by reason
hashee.send.duration_ms | histogram | by conversation_type
business.llm.duration_ms | histogram | by model

Integrations:

  • Prometheus + Grafana (most common)
  • An APM such as DataDog or New Relic
  • OpenTelemetry SDK auto-instrumentation

A minimal Prometheus exporter:

import http from "node:http";
import promClient from "prom-client";

const register = new promClient.Registry();
const inboundTotal = new promClient.Counter({
  name: "hashee_message_inbound_total",
  help: "...",
  labelNames: ["conv_type", "payload_type"],
});
register.registerMetric(inboundTotal);

agent.addMessageHandler((msg) => {
  inboundTotal.inc({
    conv_type: msg.conversation_type,
    payload_type: msg.payload?.type ?? "unknown",
  });
});

http.createServer(async (req, res) => {
  if (req.url === "/metrics") {
    res.setHeader("Content-Type", register.contentType);
    res.end(await register.metrics());
  } else {
    res.statusCode = 404;
    res.end();
  }
}).listen(9090);

Backup and disaster recovery

  • Private keys → secret manager + cold backup
  • Application-side Postgres / Redis → snapshots + WAL
  • Configuration → git
  • Logs → centralized (Loki / ELK / CloudWatch)

Next steps