vsecoder 7e7393e4f8 chore: initial commit for v0.0.1
DChain single-node blockchain + React Native messenger client.

Core:
- PBFT consensus with multi-sig validator admission + equivocation slashing
- BadgerDB + schema migration scaffold (CurrentSchemaVersion=0)
- libp2p gossipsub (tx/v1, blocks/v1, relay/v1, version/v1)
- Native Go contracts (username_registry) alongside WASM (wazero)
- WebSocket gateway with topic-based fanout + Ed25519-nonce auth
- Relay mailbox with NaCl envelope encryption (X25519 + Ed25519)
- Prometheus /metrics, per-IP rate limit, body-size cap

Deployment:
- Single-node compose (deploy/single/) with Caddy TLS + optional Prometheus
- 3-node dev compose (docker-compose.yml) with mocked internet topology
- 3-validator prod compose (deploy/prod/) for federation
- Auto-update from Gitea via /api/update-check + systemd timer
- Build-time version injection (ldflags → node --version)
- UI / Swagger toggle flags (DCHAIN_DISABLE_UI, DCHAIN_DISABLE_SWAGGER)

Client (client-app/):
- Expo / React Native / NativeWind
- E2E NaCl encryption, typing indicator, contact requests
- Auto-discovery of canonical contracts, chain_id aware, WS reconnect on node switch

Documentation:
- README.md, CHANGELOG.md, CONTEXT.md
- deploy/single/README.md with 6 operator scenarios
- deploy/UPDATE_STRATEGY.md with 4-layer forward-compat design
- docs/contracts/*.md per contract
2026-04-17 14:16:44 +03:00

DChain production deployment

Turn-key-ish stack: 3 validators behind a Caddy TLS edge (automatic HTTPS), plus optional Prometheus/Grafana monitoring.

Prerequisites

  • Docker + Compose v2
  • A public IP and open ports 80, 443, 4001 (libp2p) on every host
  • DNS A-record pointing DOMAIN at the host running Caddy
  • Basic familiarity with editing env files

Layout (single-host pilot)

                   ┌─ Caddy :443 (TLS terminate) ──┬─ node1:8080 ─┐
 internet ────────→┤                               ├─ node2:8080  ├─ round-robin /api/*
                   └─ Caddy :4001 (passthrough)    └─ node3:8080 ─┘  ip_hash /api/ws

                   Prometheus → node{1,2,3}:8080/metrics
                   Grafana ← Prometheus data source
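
The Caddy half of that diagram can be sketched as a Caddyfile. This is a sketch only — the config shipped in deploy/prod/ is authoritative; the upstream names node1..node3 assume the compose service names:

```
{$DOMAIN} {
    # WebSocket traffic must stick to one node per client
    handle /api/ws* {
        reverse_proxy node1:8080 node2:8080 node3:8080 {
            lb_policy ip_hash
        }
    }
    # Everything else under /api/ is stateless and can round-robin
    handle /api/* {
        reverse_proxy node1:8080 node2:8080 node3:8080 {
            lb_policy round_robin
        }
    }
}
```

`handle` blocks are mutually exclusive and tried in the order written, so the more specific /api/ws* matcher must come first.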

For a real multi-datacentre deployment, copy this whole directory onto each VPS, edit docker-compose.yml so only the node that runs there remains, and put Caddy on one dedicated edge host. (Or skip the edge entirely: point clients at one node directly and accept the lower availability.)

First-boot procedure

  1. Generate keys for each validator. Easiest way:

    # On any box with the repo checked out
    docker build -t dchain-node-slim -f deploy/prod/Dockerfile.slim .
    mkdir -p deploy/prod/keys
    for i in 1 2 3; do
      docker run --rm -v "$PWD/deploy/prod/keys:/out" dchain-node-slim \
        /usr/local/bin/client keygen --out /out/node$i.json
    done
    cat deploy/prod/keys/node*.json | jq -r .pub_key  # → copy into DCHAIN_VALIDATORS
    
  2. Configure env files. Copy node.env.example to node1.env, node2.env, node3.env. Paste the three pubkeys from step 1 into DCHAIN_VALIDATORS in ALL THREE files. Set DOMAIN to your public host.
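
     As an illustration, node1.env might look like the following. Only DCHAIN_VALIDATORS and DOMAIN are confirmed by this guide — node.env.example is the authoritative list of variables, and the values below are placeholders:

```
# node1.env — placeholder values
DOMAIN=dchain.example.com
# Same comma-separated list in node1.env, node2.env, AND node3.env
DCHAIN_VALIDATORS=<pubkey1>,<pubkey2>,<pubkey3>
```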

  3. Start the network:

    DOMAIN=dchain.example.com docker compose up -d
    docker compose logs -f node1   # watch genesis + first blocks
    

    The first block is the genesis block (index 0), created only by node1 because it has the --genesis flag. Once you see blocks #1, #2, #3… committing, edit docker-compose.yml, remove the --genesis flag from node1's command section, and run docker compose up -d node1 to re-create the container without it. Leaving --genesis in place is a no-op on a non-empty DB, but it adds noise to the logs.

  4. Verify HTTPS and HTTP-to-HTTPS redirect:

    curl -s https://$DOMAIN/api/netstats | jq
    curl -s https://$DOMAIN/api/well-known-contracts | jq
    

    Caddy should have issued a cert automatically from Let's Encrypt.

  5. (Optional) observability:

    GRAFANA_ADMIN_PW=$(openssl rand -hex 24) docker compose --profile monitor up -d
    # Grafana at http://<host>:3000, user admin, password from env
    

    Add a "Prometheus" data source pointing at http://prometheus:9090, then import a dashboard that graphs:

    • dchain_blocks_total (rate)
    • dchain_tx_submit_accepted_total / rejected_total
    • dchain_ws_connections
    • dchain_peer_count_live
    • rate(dchain_block_commit_seconds_sum[5m]) / rate(dchain_block_commit_seconds_count[5m])
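
    If you scrape from your own Prometheus instead of the bundled one, a scrape config along these lines works — a sketch, assuming the compose service names are reachable from Prometheus:

```
scrape_configs:
  - job_name: dchain
    metrics_path: /metrics
    static_configs:
      - targets: ['node1:8080', 'node2:8080', 'node3:8080']
```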

Common tasks

Add a 4th validator

The new node joins as an observer via --join, then an existing validator promotes it on-chain:

# On the new box
docker run -d --name node4 \
  -v chaindata:/data \
  -p 4001:4001 \
  -e DCHAIN_ANNOUNCE=/ip4/<public-ip>/tcp/4001 \
  dchain-node-slim \
  --db=/data/chain --join=https://$DOMAIN --register-relay

Then from any existing validator:

docker compose exec node1 /usr/local/bin/client add-validator \
  --key /keys/node.json \
  --node http://localhost:8080 \
  --target <NEW_PUBKEY>

The new node starts signing as soon as it sees itself in the validator set on-chain — no restart needed.

Upgrade without downtime

PBFT tolerates f faulty nodes out of 3f+1. For 3 validators f = 0 — any offline node halts consensus — so a 3-node cluster cannot roll without a brief pause in block production. To keep that pause short:

  1. docker compose pull && docker compose build on all three hosts first, so each restart itself is fast.
  2. Restart one node at a time: docker compose up -d --no-deps node1, wait for /api/netstats to show it caught up, then do node2, then node3.

With 4+ validators (f ≥ 1) the same one-at-a-time roll is genuinely zero-downtime.
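
The one-at-a-time roll can be scripted. This is a sketch, not the project's tooling — it assumes the compose service names node1..node3, and it treats "answers /api/netstats again" as "back up" (polling chain height would be stricter):

```shell
#!/usr/bin/env sh
# Rolling restart helper (sketch). Run with "roll" as the first argument.

# Retry a command up to $1 times, one second apart; succeed on first success.
wait_until() {
  tries=$1; shift
  i=0
  while [ "$i" -lt "$tries" ]; do
    "$@" && return 0
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# True once the local node answers /api/netstats again.
node_up() {
  curl -fsS http://localhost:8080/api/netstats >/dev/null 2>&1
}

roll() {
  for n in node1 node2 node3; do
    docker compose up -d --no-deps "$n"
    wait_until 60 node_up || { echo "$n did not come back" >&2; exit 1; }
  done
}

# Only touch docker when explicitly asked, so the helpers can be sourced safely.
if [ "${1:-}" = "roll" ]; then roll; fi
```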

Back up the chain

docker run --rm -v node1_data:/data -v "$PWD":/bak alpine \
  tar czf /bak/dchain-backup-$(date +%F).tar.gz -C /data .

Restore by swapping the file back into a fresh named volume before node startup.
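
Concretely, assuming the backup above was taken from the node1_data volume (substitute your actual archive name for <date>):

```
docker volume create node1_data
docker run --rm -v node1_data:/data -v "$PWD":/bak alpine \
  tar xzf /bak/dchain-backup-<date>.tar.gz -C /data
docker compose up -d node1
```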

Remove a bad validator

Same as adding but with remove-validator. Only works if a majority of CURRENT validators cosign the removal — intentional, keeps one rogue validator from kicking others unilaterally (see ROADMAP P2.1).

Security notes

  • /metrics is firewalled to internal networks by Caddy. If you need external scraping, add proper auth (Caddy basicauth or mTLS).
  • All public endpoints are rate-limited per-IP via the node itself — see api_guards.go. Adjust limits before releasing to the open internet.
  • Each node runs as non-root inside a read-only rootfs container with all capabilities dropped. If you need to exec into one, docker compose exec --user root nodeN sh.
  • The Ed25519 key files mounted at /keys/node.json are your validator identities. Losing them means losing the ability to produce blocks; get them onto the host via your normal secret management (Vault, sealed-secrets, an encrypted tarball at deploy time). Never commit them to git.

Troubleshooting

  Symptom: Caddy keeps logging "failed to get certificate"
  Check:   Is port 80 open? Does the DNS A-record point here? docker compose logs caddy

  Symptom: New node can't sync: FATAL: genesis hash mismatch
  Check:   The --db volume holds data from a different chain. docker volume rm nodeN_data and re-up.

  Symptom: Chain stops producing blocks
  Check:   docker compose logs nodeN | tail -100; look for SLOW AddBlock or validator silence.

  Symptom: /api/ws returns 429
  Check:   The client opened more than WSMaxConnectionsPerIP connections (default 10); see the per-IP cap in ws.go.

  Symptom: Disk usage keeps growing
  Check:   Background vlog GC runs every 5 min. Manual trigger: docker compose exec nodeN /bin/sh -c 'kill -USR1 1' (see StartValueLogGC).