# DChain production deployment

Turn-key-ish stack: 3 validators + Caddy TLS edge + optional Prometheus/Grafana, behind auto-HTTPS.

## Prerequisites

- Docker + Compose v2
- A public IP and open ports `80`, `443`, `4001` (libp2p) on every host
- DNS `A`-record pointing `DOMAIN` at the host running Caddy
- Basic familiarity with editing env files

## Layout (single-host pilot)

```
                   ┌─ Caddy :443 ── TLS terminate ──┬─ node1:8080 ──┐
internet ────────→ │                                ├─ node2:8080   │ round-robin /api/*
                   └─ Caddy :4001 (passthrough)     └─ node3:8080   │ ip_hash /api/ws
                                                                    ...
Prometheus → node{1,2,3}:8080/metrics
Grafana   ← Prometheus data source
```

For a real multi-datacentre deployment, copy this whole directory onto each VPS, edit `docker-compose.yml` to keep only the node that runs there, and put Caddy on one dedicated edge host (or none — point clients at one node directly and accept the lower availability).

## First-boot procedure

1. **Generate keys** for each validator. Easiest way:

   ```bash
   # On any box with the repo checked out
   docker build -t dchain-node-slim -f deploy/prod/Dockerfile.slim .
   mkdir -p deploy/prod/keys
   for i in 1 2 3; do
     docker run --rm -v "$PWD/deploy/prod/keys:/out" dchain-node-slim \
       /usr/local/bin/client keygen --out /out/node$i.json
   done
   cat deploy/prod/keys/node*.json | jq -r .pub_key   # → copy into DCHAIN_VALIDATORS
   ```

2. **Configure env files**. Copy `node.env.example` to `node1.env`, `node2.env`, `node3.env`. Paste the three pubkeys from step 1 into `DCHAIN_VALIDATORS` in ALL THREE files. Set `DOMAIN` to your public host.

3. **Start the network**:

   ```bash
   DOMAIN=dchain.example.com docker compose up -d
   docker compose logs -f node1   # watch genesis + first blocks
   ```

   The first block is genesis (index 0), created only by `node1` because it has the `--genesis` flag. After you see blocks #1, #2, #3… committing, **edit `docker-compose.yml` and remove the `--genesis` flag from node1's command section**, then run `docker compose up -d node1` to re-create it without that flag.
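Step 2 is easy to get subtly wrong: consensus silently misbehaves if the three env files disagree on the validator set. A small sanity check (a sketch — the file and variable names follow the steps above, and `check_validators` is not part of the official tooling) can catch a bad paste before first boot:

```bash
# Sketch: verify DCHAIN_VALIDATORS is byte-identical across all given env files.
# Prints "OK" when every file carries the same line, "MISMATCH" otherwise.
check_validators() {
  local uniq
  uniq=$(grep -h '^DCHAIN_VALIDATORS=' "$@" | sort -u | wc -l)
  if [ "$uniq" -eq 1 ]; then echo OK; else echo MISMATCH; fi
}

# Usage (from deploy/prod):
#   check_validators node1.env node2.env node3.env
```

Run it before `docker compose up`; a `MISMATCH` here is far cheaper to debug than a split validator set at runtime.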
Leaving `--genesis` in is a no-op on a non-empty DB, but it adds noise to the logs.

4. **Verify HTTPS** and the HTTP-to-HTTPS redirect:

   ```bash
   curl -s https://$DOMAIN/api/netstats | jq
   curl -s https://$DOMAIN/api/well-known-contracts | jq
   ```

   Caddy should have issued a certificate automatically from Let's Encrypt.

5. **(Optional) observability**:

   ```bash
   GRAFANA_ADMIN_PW=$(openssl rand -hex 24) docker compose --profile monitor up -d
   # Grafana at http://:3000, user admin, password from env
   ```

   Add a "Prometheus" data source pointing at `http://prometheus:9090`, then import a dashboard that graphs:

   - `dchain_blocks_total` (rate)
   - `dchain_tx_submit_accepted_total` / `rejected_total`
   - `dchain_ws_connections`
   - `dchain_peer_count_live`
   - `rate(dchain_block_commit_seconds_sum[5m]) / rate(dchain_block_commit_seconds_count[5m])`

## Common tasks

### Add a 4th validator

The new node joins as an observer via `--join`, then an existing validator promotes it on-chain:

```bash
# On the new box
docker run -d --name node4 \
  -v chaindata:/data \
  -e DCHAIN_ANNOUNCE=/ip4//tcp/4001 \
  dchain-node-slim \
  --db=/data/chain --join=https://$DOMAIN --register-relay
```

Then from any existing validator:

```bash
docker compose exec node1 /usr/local/bin/client add-validator \
  --key /keys/node.json \
  --node http://localhost:8080 \
  --target
```

The new node starts signing as soon as it sees itself in the validator set on-chain — no restart needed.

### Upgrade without downtime

PBFT tolerates `f` faulty nodes out of `3f+1`. For 3 validators that means **zero** — taking any node offline halts consensus, so the goal is to keep each restart window as short as possible:

1. `docker compose pull && docker compose build` on all three hosts first, so restarts don't wait on image builds.
2. Then roll one at a time: `docker compose up -d --no-deps node1`, wait for `/api/netstats` to show it has caught up, then do node2, then node3.

With 4+ nodes you can afford genuine one-at-a-time hot rolls: consensus keeps running while each node restarts.
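The `f` out of `3f+1` rule is worth making concrete when planning cluster size — a quick arithmetic sketch of how many offline validators each size survives:

```bash
# PBFT quorum math: a cluster of n validators tolerates f = (n - 1) / 3
# faults (integer division), since safety requires n >= 3f + 1.
for n in 3 4 7 10; do
  echo "$n validators tolerate $(( (n - 1) / 3 )) offline node(s)"
done
# 3 validators tolerate 0 offline node(s)
# 4 validators tolerate 1 offline node(s)
# 7 validators tolerate 2 offline node(s)
# 10 validators tolerate 3 offline node(s)
```

This is why adding a 4th validator is the cheapest way to make rolling upgrades safe: it is the smallest cluster where `f = 1`.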
### Back up the chain

```bash
docker run --rm -v node1_data:/data -v "$PWD":/bak alpine \
  tar czf /bak/dchain-backup-$(date +%F).tar.gz -C /data .
```

Restore by swapping the file back into a fresh named volume before node startup.

### Remove a bad validator

Same as adding, but with `remove-validator`. It only works if a majority of CURRENT validators cosign the removal — intentional: it keeps one rogue validator from kicking the others out unilaterally (see ROADMAP P2.1).

## Security notes

- `/metrics` is firewalled to internal networks by Caddy. If you need external scraping, add proper auth (Caddy `basicauth` or mTLS).
- All public endpoints are rate-limited per-IP by the node itself — see `api_guards.go`. Adjust the limits before releasing to the open internet.
- Each node runs as non-root inside a read-only rootfs container with all capabilities dropped. If you need to exec into one: `docker compose exec --user root nodeN sh`.
- The Ed25519 key files mounted at `/keys/node.json` are your validator identities. Losing them means losing your ability to produce blocks; get them onto the host via your normal secret management (Vault, sealed-secrets, an encrypted tarball at deploy time). **Never commit them to git.**

## Troubleshooting

| Symptom | Check |
|---------|-------|
| Caddy keeps logging `failed to get certificate` | Is port 80 open? Is the DNS A-record pointing here? `docker compose logs caddy` |
| New node can't sync: `FATAL: genesis hash mismatch` | The `--db` volume has data from a different chain. `docker volume rm nodeN_data` and re-up |
| Chain stops producing blocks | `docker compose logs nodeN \| tail -100`; look for `SLOW AddBlock` or validator silence |
| `/api/ws` returns 429 | Client opened more than `WSMaxConnectionsPerIP` connections (default 10). Check `ws.go` for the per-IP cap |
| Disk usage growing | Background vlog GC runs every 5 min. Manual trigger: `docker compose exec nodeN /bin/sh -c 'kill -USR1 1'` (see `StartValueLogGC`) |
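A backup you have never restored is a hope, not a backup. The tar round-trip from "Back up the chain" above can be rehearsed on throwaway data first (a sketch — all paths here are temporary and hypothetical, no real chain state is touched):

```bash
# Sketch: rehearse the tar-based backup/restore round-trip on dummy data.
tmp=$(mktemp -d)
mkdir -p "$tmp/data" "$tmp/restore"
echo "pretend-chain-state" > "$tmp/data/MANIFEST"

tar czf "$tmp/backup.tar.gz" -C "$tmp/data" .    # back up, as in the real command
tar xzf "$tmp/backup.tar.gz" -C "$tmp/restore"   # restore into a fresh directory

cmp -s "$tmp/data/MANIFEST" "$tmp/restore/MANIFEST" && echo "round-trip OK"
rm -rf "$tmp"
```

The same `tar xzf … -C` shape is what you would run against a freshly created named volume when restoring for real.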