chore: initial commit for v0.0.1
DChain single-node blockchain + React Native messenger client.

Core:
- PBFT consensus with multi-sig validator admission + equivocation slashing
- BadgerDB + schema migration scaffold (CurrentSchemaVersion=0)
- libp2p gossipsub (tx/v1, blocks/v1, relay/v1, version/v1)
- Native Go contracts (username_registry) alongside WASM (wazero)
- WebSocket gateway with topic-based fanout + Ed25519-nonce auth
- Relay mailbox with NaCl envelope encryption (X25519 + Ed25519)
- Prometheus /metrics, per-IP rate limit, body-size cap

Deployment:
- Single-node compose (deploy/single/) with Caddy TLS + optional Prometheus
- 3-node dev compose (docker-compose.yml) with mocked internet topology
- 3-validator prod compose (deploy/prod/) for federation
- Auto-update from Gitea via /api/update-check + systemd timer
- Build-time version injection (ldflags → node --version)
- UI / Swagger toggle flags (DCHAIN_DISABLE_UI, DCHAIN_DISABLE_SWAGGER)

Client (client-app/):
- Expo / React Native / NativeWind
- E2E NaCl encryption, typing indicator, contact requests
- Auto-discovery of canonical contracts, chain_id aware, WS reconnect on node switch

Documentation:
- README.md, CHANGELOG.md, CONTEXT.md
- deploy/single/README.md with 6 operator scenarios
- deploy/UPDATE_STRATEGY.md with 4-layer forward-compat design
- docs/contracts/*.md per contract
deploy/UPDATE_STRATEGY.md (new file, 339 lines)
# DChain node — update & seamless-upgrade strategy

This document answers two questions:

1. **How a node operator updates from the git server** (pull → build → restart)
   with no downtime and no data loss.
2. **How we preserve seamless compatibility** between node versions, so we never
   have to break old clients, other people's nodes, or our own history.

Read it together with `deploy/single/README.md` (the operational runbook) and
`CHANGELOG.md` (what has already shipped).

---
## 1. The layers to keep separate

| Layer | What breaks compatibility | Who is affected | Where it is handled |
|---------------------|--------------------------------------------------------------------|-----------------------|----------------------|
| **Wire protocol** | gossipsub topic name, tx encoding, PBFT message format | The whole P2P network | §3. Versioned topics |
| **HTTP/WS API** | an endpoint changes its schema, a WS op disappears | Clients (mobile, web) | §4. API versioning |
| **Chain state** | a new EventType in a block, a new field in TxRecord | Joiners, validators | §5. Chain upgrade |
| **Storage layout** | a BadgerDB prefix is renamed, keys are reshuffled | The binary itself at startup | §6. DB migrations |
| **Docker image** | rebuild the image, change flags | Local only | §2. Rolling restart |

**The core principle:** every change rolls out over **at least two releases** —
first *"understand both formats, write the new one"*, then *"drop support for
the old one"*. The gap between them is the window in which operators update.

---
## 2. Rolling restart from the git server (single-node)

### 2.1. The `deploy/single/update.sh` script

The operator installs **a single cron/systemd timer** that invokes this script:

```bash
#!/usr/bin/env bash
# deploy/single/update.sh — pull-and-restart update for a single DChain node.
# Safe to run unattended: no-op if git HEAD didn't move.
set -euo pipefail

REPO_DIR="${REPO_DIR:-/opt/dchain}"
IMAGE_TAG="${IMAGE_TAG:-dchain-node-slim}"
CONTAINER="${CONTAINER:-dchain_node}"

cd "$REPO_DIR"
git fetch --quiet origin main
current=$(git rev-parse HEAD)
remote=$(git rev-parse origin/main)
[[ "$current" = "$remote" ]] && { echo "up to date: $current"; exit 0; }

echo "updating $current → $remote"
# Sync the working tree to the new commit BEFORE building, otherwise the
# image is rebuilt from the old checkout.
git merge --ff-only origin/main

# 1. rebuild (version metadata is injected via -ldflags, see Dockerfile.slim)
docker build --quiet -t "$IMAGE_TAG:$remote" -t "$IMAGE_TAG:latest" \
  --build-arg VERSION_TAG="$(git describe --tags --always)" \
  --build-arg VERSION_COMMIT="$remote" \
  --build-arg VERSION_DATE="$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  -f deploy/prod/Dockerfile.slim .

# 2. smoke-test the new image BEFORE killing the running one
docker run --rm --entrypoint /usr/local/bin/node "$IMAGE_TAG:$remote" --version \
  >/dev/null || { echo "new image fails smoke test"; exit 1; }

# 3. checkpoint the DB (cheap copy-on-write snapshot via badger)
curl -fs "http://127.0.0.1:8080/api/admin/checkpoint" \
  -H "Authorization: Bearer $DCHAIN_API_TOKEN" \
  || echo "checkpoint failed, continuing anyway"

# 4. stop-start with the SAME volume + env
git log -1 --pretty='update: %h %s' > .last-update
docker compose -f deploy/single/docker-compose.yml up -d --force-recreate node

# 5. wait for health
for i in {1..30}; do
  curl -fsS http://127.0.0.1:8080/api/netstats >/dev/null && { echo ok; exit 0; }
  sleep 2
done
echo "new container did not become healthy"
docker logs --tail 40 "$CONTAINER"
exit 1
```
### 2.2. systemd timer

```ini
# /etc/systemd/system/dchain-update.service
[Unit]
Description=DChain node pull-and-restart
[Service]
Type=oneshot
EnvironmentFile=/opt/dchain/deploy/single/node.env
ExecStart=/opt/dchain/deploy/single/update.sh

# /etc/systemd/system/dchain-update.timer
[Unit]
Description=Pull DChain updates hourly
[Timer]
OnCalendar=hourly
RandomizedDelaySec=15min
Persistent=true
[Install]
WantedBy=timers.target
```

`RandomizedDelaySec=15min` keeps a fleet of nodes on one network from
restarting at the same moment; otherwise the PBFT quorum could drop during the
update window.
### 2.3. Downtime for a single node

| Step | Duration | Can tx be submitted? |
|-------------------|-------|--------------------|
| docker build | 30-90 s | yes (the old container still runs) |
| docker compose up | 2-5 s | no (transition) |
| DB open + replay | 1-3 s | no |
| healthy | — | yes |

**Total: roughly 5-8 seconds of downtime per node.** The React Native client
already reconnects over WS automatically (see `client-app/lib/ws.ts` — retry
loop, max 30 s backoff).
### 2.4. Multi-node rolling (for the future cluster)

Once there are 3+ validators, the update script must update them **one at a
time**, with a pause between nodes longer than the health-check interval.
`deploy/prod/` ships a `docker-compose.yml` with three nodes — there the
equivalent looks like:

```bash
for n in node1 node2 node3; do
  docker compose up -d --force-recreate "$n"
  for i in {1..30}; do
    curl -fs "http://127.0.0.1:808${n: -1}/api/netstats" >/dev/null && break
    sleep 2
  done
done
```

As long as 2 of 3 validators stay up, the PBFT quorum holds and blocks keep
committing. The single node being updated will miss 1-2 blocks and catch up
via gossip gap-fill (already working, see `p2p/host.go` → GetBlocks).

---
## 3. Wire protocol: versioned topics

Current gossipsub topics:

```
dchain/tx/v1
dchain/blocks/v1
dchain/relay/v1
```

The `/v1` suffix is not a formality — it is **the rail the migration runs on**.
When an incompatible change lands (e.g. a new PBFT round format):

1. Release N: the node subscribes to BOTH `dchain/blocks/v1` and `dchain/blocks/v2`.
   It publishes to v2 and reads from both.
2. Release N+1 (once the operator sees in `/api/netstats` that 100% of peers
   are ≥ N): the node stops reading v1.
3. Release N+2: v1 is deleted from the code.

At least **30 days** must pass between N and N+2. In that window every
operator's auto-update fires at least once.
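The read/publish sets at each step can be pinned down in a small helper. A sketch only, not the node's actual pubsub code (`topicSets` and the stage numbering are invented here for illustration):

```go
package main

import "fmt"

// topicSets returns which block-topic versions a node reads from and
// publishes to at each stage of the three-step migration above:
//   stage 0 — before the migration (v1 only)
//   stage 1 — release N   (read v1+v2, publish v2)
//   stage 2 — release N+1 (read v2 only, publish v2)
func topicSets(stage int) (read, publish []string) {
	const v1, v2 = "dchain/blocks/v1", "dchain/blocks/v2"
	switch stage {
	case 0:
		return []string{v1}, []string{v1}
	case 1:
		return []string{v1, v2}, []string{v2}
	default:
		return []string{v2}, []string{v2}
	}
}

func main() {
	for s := 0; s <= 2; s++ {
		r, p := topicSets(s)
		fmt.Println("stage", s, "read:", r, "publish:", p)
	}
}
```

The point of encoding it this way: publishing always targets the newest topic, so release N+1 only has to shrink the read set.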
---

## 4. API versioning

### Already in place:

- `/api/*` — v1, the "explorer API", stable contract
- `/v2/chain/*` — a dedicated section for tonapi-style clients (tonapi compatibility)

### Rules going forward:

1. **Only add fields** to existing responses. A JSON client doesn't break on an
   unknown field — Go unmarshal ignores it, and so does TypeScript via an
   `unknown` cast. Never rename, never remove.
2. If a breaking change is needed — a new prefix. For example, if CreateChannelPayload
   changes format, `/v2/channels/*` appears. The old `/api/channels/*` stays as
   a read-only adapter on top of the new storage.
3. **Deprecation header:** once an old endpoint has been turned into an adapter, add
   `Warning: 299 - "use /v2/channels/* instead, this will be removed 2026-06-01"`.
4. **The client detects the version itself** via `/api/well-known-version`:
   ```json
   { "node_version": "0.5.0", "protocol_version": 3, "features": ["channels_v1", "fan_out"] }
   ```
   The client caches the response in `client-app/lib/api.ts` and knows what it
   may call. `/api/well-known-contracts` already exists as a precedent;
   `/api/well-known-version` is one function away.
### Client side — graceful degradation:

- WebSocket: if op `submit_tx` returns `{error: "unknown_op"}`, fall back to
  HTTP POST /api/tx.
- HTTP: fetches are wrapped in try/catch in `api.ts`; a 404 on a new endpoint →
  hide the feature in the UI (feature flag), don't crash.
- **Chain-ID check:** already exists (`client-app/lib/api.ts` → `networkInfo()`);
  if the node changed chain_id, the client clears its cache and resyncs.

---
## 5. Chain state upgrade

The most painful layer: if block N+1 contains an EventType the old node cannot
process, it will **reject** the whole block and drop out of consensus.

### 5.1. Strict forward-compatibility rules for EventType

```go
// ApplyTx in blockchain/chain.go
switch ev.Type {
case EventTransfer: ...
case EventRegisterRelay: ...
case EventCreateChannel: ...
// ...
default: // any EventType this binary doesn't know yet
	// ← do NOT return an error! Otherwise the validator crashes on its
	// own block.
	// Rule: unknown event type === no-op + warn. The tx is included in the
	// block, the fee is charged, and the result is that nothing changed.
	chain.log.Warn("unknown event type", "type", ev.Type, "tx", tx.ID)
	return nil
}
```

**To verify:** right now `ApplyTx` in `blockchain/chain.go` fails on an unknown
event. This is the priority fix for seamlessness — add it to the plan.
### 5.2. Feature activation flags

A new EventType lands in two stages:

1. **Release A:** the binary understands `EventChannelBan` but **refuses** to
   admit it to the mempool until it sees the record `feature:channel_ban:enabled`
   in chain state. That record is created by a single "activation tx" from the
   validators (multi-sig).
2. **Release B (30+ days later):** operators on auto-pull have received
   Release A. One validator submits the activation tx — it writes to state,
   all the others validate it, done.
3. From that point on `EventChannelBan` is legal. Old nodes (whoever didn't
   update) reject the activation tx → drop out of consensus. This is
   deliberate: they don't understand the new event anyway, and an explicit
   "please update" failure beats silent divergence.

A prototype already exists in `blockchain/types.go` — `chain.GovernanceContract`
can store feature flags. What's missing is a concrete helper,
`chain.FeatureEnabled(name)`.
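A sketch of that helper under an assumed key-value view of chain state; `StateGetter` and `mapState` are invented here for illustration, and the real chain state API may look different:

```go
package main

import "fmt"

// StateGetter is a stand-in for the chain-state lookup the real node has.
type StateGetter interface {
	Get(key string) ([]byte, bool)
}

// FeatureEnabled reports whether an activation tx has written the
// feature:<name>:enabled record into chain state.
func FeatureEnabled(state StateGetter, name string) bool {
	_, ok := state.Get("feature:" + name + ":enabled")
	return ok
}

// mapState is an in-memory StateGetter for the example.
type mapState map[string][]byte

func (m mapState) Get(k string) ([]byte, bool) { v, ok := m[k]; return v, ok }

func main() {
	s := mapState{"feature:channel_ban:enabled": {1}}
	fmt.Println(FeatureEnabled(s, "channel_ban")) // true
	fmt.Println(FeatureEnabled(s, "fan_out"))     // false
}
```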
### 5.3. Genesis hash pin

A new node started with `--join` fetches `/api/network-info`, reads
`genesis_hash`, and compares it with its own (empty, since it's a clean start).
If the network already has a different genesis — the node fails with
`FATAL: genesis hash mismatch`. This guards against an accidental fork caused
by a typo in `DCHAIN_JOIN`. It works today; don't touch it.

---
## 6. DB migrations (BadgerDB)

Rules for working with prefixes:

```go
const (
	prefixTx        = "tx:"
	prefixChannel   = "chan:"
	prefixSchemaVer = "schema:v" // ← meta key holding the current schema version
)
```

At startup:

```go
cur := chain.ReadSchemaVersion() // defaults to 0 if the key is absent
for cur < TargetSchemaVersion {
	switch cur {
	case 0:
		// migration 0→1: rename prefix "member:" → "chan_mem:"
		migrate_0_to_1(db)
	case 1:
		// migration 1→2: add x25519_pub column to IdentityInfo
		migrate_1_to_2(db)
	}
	cur++
	chain.WriteSchemaVersion(cur)
}
```
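The loop above, made executable against an in-memory store to show the restart behaviour. Everything here (`memDB`, the migration registry) is illustrative scaffolding, not the node's real BadgerDB code:

```go
package main

import "fmt"

// memDB stands in for BadgerDB; the "schema:v" entry plays the meta-key role.
type memDB map[string]int

// migrations is a registry indexed by the version each step migrates FROM.
var migrations = map[int]func(memDB){
	0: func(db memDB) { db["migrated_0_to_1"]++ },
	1: func(db memDB) { db["migrated_1_to_2"]++ },
}

const targetSchemaVersion = 2

// migrate runs pending migrations, bumping the stored version after each
// step, so a crash mid-way resumes from the last completed step on restart.
func migrate(db memDB) {
	for cur := db["schema:v"]; cur < targetSchemaVersion; cur = db["schema:v"] {
		migrations[cur](db)
		db["schema:v"] = cur + 1
	}
}

func main() {
	db := memDB{}
	migrate(db)
	migrate(db) // second run is a no-op: version is already at target
	fmt.Println(db["schema:v"], db["migrated_0_to_1"], db["migrated_1_to_2"])
	// → 2 1 1
}
```

Because the version is written after every step rather than once at the end, re-running the whole loop never re-applies a completed migration.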
**Migration properties:**

- Idempotent: if it dies halfway, the next startup finishes the job.
- One-way: a downgrade means restoring from backup. That's acceptable, and documented.
- Backup before migrating: `update.sh` from §2.1 calls `/api/admin/checkpoint`
  before the restart. (That endpoint still needs to be implemented — it
  doesn't exist yet.)
- The first migration to land is the mechanism itself, even while
  `TargetSchemaVersion = 0`, so the next breaking change can simply use it.

---
## 7. What to do now so that "nothing has to break later"

The minimal checklist, **sorted by priority**:

### P0 (before the next release):

- [ ] `ApplyTx`: unknown EventType → warn + no-op, NOT an error. (§5.1)
- [ ] `/api/well-known-version` endpoint (§4). Trivial, ~20 lines.
- [ ] Schema-version meta key in BadgerDB, even while `current = 0`. (§6)
- [ ] `deploy/single/update.sh` + systemd timer examples. (§2)

### P1 (before 1.0):

- [ ] `chain.FeatureEnabled(name)` helper + documented activation flow. (§5.2)
- [ ] `/api/admin/checkpoint` endpoint (behind a token guard): runs `db.Flatten` +
      creates a snapshot under `/data/snapshots/<timestamp>/`. (§2.1)
- [ ] Deprecation-header mechanism in the HTTP middleware. (§4)
- [ ] CI smoke test: "new binary on top of an old volume" — verifies that
      migrations don't corrupt data.

### P2 (nice-to-have):

- [ ] Multi-version e2e test in `cmd/loadtest`: two processes on different
      HEADs, confirm they stay in consensus.
- [ ] `go-blockchain/pkg/migrate/` as a separate package with a migration registry.

---
## 8. The short answer

> we need to think about syncing and updating the node from the git server, and
> about seamlessness, so that nothing has to be broken later

1. **Git sync:** `deploy/single/update.sh` + an hourly systemd timer,
   ~5-8 seconds of downtime on a single node.
2. **Seamlessness:** 4 layers, each with its own rule for extending without
   breaking — versioned topics, additive-only API, feature-flag activation for
   new EventTypes, a schema-versioned DB.
3. **The P0 tickets above** (four small ones) cover the worst case:
   unknown event as no-op, the version endpoint, the schema-version key in the
   DB, the update script. That is enough for the next 3-5 releases to ship
   without a breaking change.
deploy/prod/Dockerfile.slim (new file, 63 lines)
# Production image for dchain-node.
#
# Differs from the repo-root Dockerfile in two ways:
#   1. No testdata / contract WASMs baked in — a fresh node uses the native
#      username_registry (shipped in-binary) and starts with an empty keys
#      directory; identities and optional WASM contracts come in via
#      mounted volumes or docker-compose bind mounts.
#   2. Builds only `node` and `client` — no wallet/peerid helpers that
#      aren't needed in production.
#
# The resulting image is ~20 MB vs ~60 MB for the dev one, and has no
# pre-installed keys that an attacker could exploit to impersonate a
# testnet validator.

# ---- build stage ----
FROM golang:1.24-alpine AS builder
WORKDIR /app

COPY go.mod go.sum ./
RUN go mod download

COPY . .

# Build-time version metadata. All four args are injected via -ldflags -X
# into go-blockchain/node/version so `node --version` and
# /api/well-known-version report the real commit, not the "dev" default.
# Callers pass these with `docker build --build-arg VERSION_TAG=... …`;
# the deploy/single/update.sh script derives them from git automatically.
ARG VERSION_TAG=dev
ARG VERSION_COMMIT=none
ARG VERSION_DATE=unknown
ARG VERSION_DIRTY=false

RUN LDFLAGS="-s -w \
      -X go-blockchain/node/version.Tag=${VERSION_TAG} \
      -X go-blockchain/node/version.Commit=${VERSION_COMMIT} \
      -X go-blockchain/node/version.Date=${VERSION_DATE} \
      -X go-blockchain/node/version.Dirty=${VERSION_DIRTY}" && \
    CGO_ENABLED=0 GOOS=linux go build -trimpath -ldflags="$LDFLAGS" -o /bin/node ./cmd/node && \
    CGO_ENABLED=0 GOOS=linux go build -trimpath -ldflags="$LDFLAGS" -o /bin/client ./cmd/client

# ---- runtime stage ----
FROM alpine:3.19

RUN apk add --no-cache ca-certificates tzdata

# Run as an unprivileged user by default. Operators can override with --user root
# if they need to bind privileged ports (shouldn't be necessary behind Caddy).
RUN addgroup -S dchain && adduser -S -G dchain dchain

COPY --from=builder /bin/node /usr/local/bin/node
COPY --from=builder /bin/client /usr/local/bin/client

USER dchain

# Default data location; override in compose with a named volume.
VOLUME /data

# libp2p P2P port + HTTP (serves /api/*, /metrics, /api/ws).
EXPOSE 4001/tcp
EXPOSE 8080/tcp

ENTRYPOINT ["/usr/local/bin/node"]
deploy/prod/README.md (new file, 163 lines)
# DChain production deployment

A turn-key-ish stack: 3 validators + a Caddy TLS edge + optional
Prometheus/Grafana, behind auto-HTTPS.

## Prerequisites

- Docker + Compose v2
- A public IP and open ports `80`, `443`, `4001` (libp2p) on every host
- DNS `A`-record pointing `DOMAIN` at the host running Caddy
- Basic familiarity with editing env files

## Layout (single-host pilot)

```
                  ┌─ Caddy :443 ── TLS terminate ──┬─ node1:8080 ──┐
internet ────────→│                                ├─ node2:8080   │ round-robin /api/*
                  └─ Caddy :4001 (passthrough)     └─ node3:8080   │ ip_hash /api/ws
                                                        ...
                  Prometheus → node{1,2,3}:8080/metrics
                  Grafana ← Prometheus data source
```

For a real multi-datacentre deployment, copy this whole directory onto each
VPS, edit `docker-compose.yml` to keep only the node that runs there, and
put Caddy on one dedicated edge host (or none — point clients at one node
directly and accept the lower availability).
## First-boot procedure

1. **Generate keys** for each validator. Easiest way:

   ```bash
   # On any box with the repo checked out
   docker build -t dchain-node-slim -f deploy/prod/Dockerfile.slim .
   mkdir -p deploy/prod/keys
   for i in 1 2 3; do
     # The image's entrypoint is `node`, so switch it to `client` for keygen.
     docker run --rm -v "$PWD/deploy/prod/keys:/out" \
       --entrypoint /usr/local/bin/client dchain-node-slim \
       keygen --out /out/node$i.json
   done
   cat deploy/prod/keys/node*.json | jq -r .pub_key   # → copy into DCHAIN_VALIDATORS
   ```

2. **Configure env files**. Copy `node.env.example` to `node1.env`,
   `node2.env`, `node3.env`. Paste the three pubkeys from step 1 into
   `DCHAIN_VALIDATORS` in ALL THREE files. Set `DOMAIN` to your public host.

3. **Start the network**:

   ```bash
   DOMAIN=dchain.example.com docker compose up -d
   docker compose logs -f node1   # watch genesis + first blocks
   ```

   The first block is genesis (index 0), created only by `node1` because it
   has the `--genesis` flag. After you see blocks #1, #2, #3… committing,
   **edit `docker-compose.yml` and remove the `--genesis` flag from node1's
   command section**, then `docker compose up -d node1` to re-create it
   without that flag. Leaving `--genesis` in is a no-op on a non-empty DB,
   but it is noise in the logs.

4. **Verify HTTPS** and the HTTP-to-HTTPS redirect:

   ```bash
   curl -s https://$DOMAIN/api/netstats | jq
   curl -s https://$DOMAIN/api/well-known-contracts | jq
   ```

   Caddy should have issued a cert automatically from Let's Encrypt.

5. **(Optional) observability**:

   ```bash
   GRAFANA_ADMIN_PW=$(openssl rand -hex 24) docker compose --profile monitor up -d
   # Grafana at http://<host>:3000, user admin, password from env
   ```

   Add a "Prometheus" data source pointing at `http://prometheus:9090`,
   then import a dashboard that graphs:
   - `dchain_blocks_total` (rate)
   - `dchain_tx_submit_accepted_total` / `rejected_total`
   - `dchain_ws_connections`
   - `dchain_peer_count_live`
   - `rate(dchain_block_commit_seconds_sum[5m]) / rate(dchain_block_commit_seconds_count[5m])`
## Common tasks

### Add a 4th validator

The new node joins as an observer via `--join`, then an existing validator
promotes it on-chain:

```bash
# On the new box
docker run -d --name node4 \
  -v chaindata:/data \
  -e DCHAIN_ANNOUNCE=/ip4/<public-ip>/tcp/4001 \
  dchain-node-slim \
  --db=/data/chain --join=https://$DOMAIN --register-relay
```

Then from any existing validator:

```bash
docker compose exec node1 /usr/local/bin/client add-validator \
  --key /keys/node.json \
  --node http://localhost:8080 \
  --target <NEW_PUBKEY>
```

The new node starts signing as soon as it sees itself in the validator set
on-chain — no restart needed.
### Upgrade without downtime

PBFT tolerates `f` faulty nodes out of `3f+1`. For 3 validators that means
**zero** — any offline node halts consensus. So for 3-node clusters:

1. `docker compose pull && docker compose build` on all three hosts first.
2. Graceful one-at-a-time: `docker compose up -d --no-deps node1`, wait for
   `/api/netstats` to show it catching up, then do node2, then node3.

For 4+ nodes you can afford one-at-a-time hot rolls.
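The arithmetic behind that rule, as a quick sketch (the `n = 3f+1` bound is standard PBFT; the function name is just for this example):

```go
package main

import "fmt"

// pbftFaultTolerance returns f, the number of simultaneously faulty or
// offline validators a PBFT network of n nodes tolerates (n = 3f+1).
func pbftFaultTolerance(n int) int {
	return (n - 1) / 3
}

func main() {
	for _, n := range []int{3, 4, 7} {
		fmt.Printf("n=%d tolerates f=%d offline\n", n, pbftFaultTolerance(n))
	}
	// n=3 tolerates f=0 — that is why a 3-node cluster must pre-build first
	// n=4 tolerates f=1 — one-at-a-time hot rolls become safe
}
```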
### Back up the chain

```bash
docker run --rm -v node1_data:/data -v "$PWD":/bak alpine \
  tar czf /bak/dchain-backup-$(date +%F).tar.gz -C /data .
```

Restore by swapping the file back into a fresh named volume before node
startup.
### Remove a bad validator

Same as adding, but with `remove-validator`. It only works if a majority of
CURRENT validators cosign the removal — intentional; it keeps one rogue
validator from kicking others out unilaterally (see ROADMAP P2.1).

## Security notes

- `/metrics` is firewalled to internal networks by Caddy. If you need
  external scraping, add proper auth (Caddy `basicauth` or mTLS).
- All public endpoints are rate-limited per IP by the node itself — see
  `api_guards.go`. Adjust the limits before releasing to the open internet.
- Each node runs as non-root inside a read-only rootfs container with all
  capabilities dropped. If you need to exec into one:
  `docker compose exec --user root nodeN sh`.
- The Ed25519 key files mounted at `/keys/node.json` are your validator
  identities. Losing them means losing the ability to produce blocks; get
  them onto the host via your normal secret management (Vault, sealed
  secrets, an encrypted tarball at deploy time). **Never commit them to git.**

## Troubleshooting

| Symptom | Check |
|---------|-------|
| Caddy keeps logging `failed to get certificate` | Is port 80 open? DNS A-record pointing here? `docker compose logs caddy` |
| New node can't sync: `FATAL: genesis hash mismatch` | The `--db` volume has data from a different chain. `docker volume rm nodeN_data` and re-up |
| Chain stops producing blocks | `docker compose logs nodeN \| tail -100`; look for `SLOW AddBlock` or validator silence |
| `/api/ws` returns 429 | Client opened > `WSMaxConnectionsPerIP` (default 10). Check `ws.go` for the per-IP cap |
| Disk usage growing | Background vlog GC runs every 5 min. Manual: `docker compose exec nodeN /bin/sh -c 'kill -USR1 1'` (see `StartValueLogGC`) |
deploy/prod/caddy/Caddyfile (new file, 88 lines)
# Caddy configuration for DChain prod.
#
# What this does:
#   1. Auto-HTTPS via Let's Encrypt (requires the DOMAIN envvar and
#      a DNS A-record pointing at this host).
#   2. Round-robins HTTP /api/* across the three node backends. GETs are
#      idempotent so round-robin is safe; POST /api/tx is accepted by any
#      validator and gossiped to the rest — no stickiness needed.
#   3. Routes /api/ws (WebSocket upgrade) through with header
#      preservation. Uses IP hashing (lb_policy ip_hash) so one client
#      sticks to one node — avoids re-doing the auth handshake on every
#      subscribe.
#   4. Serves /metrics ONLY to localhost IPs so the Prometheus inside
#      the stack can scrape it; public scrapers are refused.
#
# To use:
#   - Set the environment var DOMAIN before `docker compose up`:
#       DOMAIN=dchain.example.com docker compose up -d
#   - DNS must resolve DOMAIN → this host's public IP.
#   - Port 80 must be reachable for the ACME HTTP-01 challenge.

{
	# Global options. `auto_https` is on by default — leave it alone.
	email {$ACME_EMAIL:admin@example.com}
	servers {
		# Enable HTTP/3 for mobile clients.
		protocols h1 h2 h3
	}
}

# ── Public endpoint ────────────────────────────────────────────────────────
{$DOMAIN:localhost} {
	# Compression for JSON / HTML responses.
	encode zstd gzip

	# ── WebSocket ──────────────────────────────────────────────────────
	# Client-IP stickiness so reconnects land on the same node. This keeps
	# per-subscription state local and avoids replaying every auth+subscribe
	# to a cold node.
	@ws path /api/ws
	handle @ws {
		reverse_proxy node1:8080 node2:8080 node3:8080 {
			lb_policy ip_hash
			# Health-check filters dead nodes out of the pool automatically.
			health_uri /api/netstats
			health_interval 15s
			# Upgrade headers are preserved by Caddy by default for WS; no
			# extra config needed.
		}
	}

	# ── REST API ──────────────────────────────────────────────────────
	handle /api/* {
		reverse_proxy node1:8080 node2:8080 node3:8080 {
			lb_policy least_conn
			health_uri /api/netstats
			health_interval 15s
			# Soft fail open: if no node is healthy, return a clear 503.
			fail_duration 30s
		}
	}

	# ── /metrics — internal only ──────────────────────────────────────
	# Refuse external scraping of Prometheus metrics. Inside the Docker
	# network Prometheus hits node1:8080/metrics directly, bypassing Caddy.
	@metricsPublic {
		path /metrics
		not remote_ip 127.0.0.1 ::1 172.16.0.0/12 192.168.0.0/16 10.0.0.0/8
	}
	handle @metricsPublic {
		respond "forbidden" 403
	}

	# ── Everything else → explorer HTML ───────────────────────────────
	handle {
		reverse_proxy node1:8080 {
			health_uri /api/netstats
			health_interval 15s
		}
	}

	# Server-side logging; write JSON for easy log aggregation.
	log {
		output stdout
		format json
		level INFO
	}
}
deploy/prod/docker-compose.yml (new file, 175 lines)
name: dchain-prod

# ══════════════════════════════════════════════════════════════════════════
# DChain production stack.
#
# Layout:
#   - 3 validator nodes, each with its own persistent volume and key file
#   - Caddy reverse proxy on the edge: auto-HTTPS from Let's Encrypt,
#     rewrites ws upgrades, round-robins /api/* across nodes
#   - Prometheus + Grafana for observability (optional, profile=monitor)
#
# Quick start (1-host single-server):
#   cp node.env.example node1.env            # edit domain / pubkeys
#   cp node.env.example node2.env
#   cp node.env.example node3.env
#   docker compose up -d                     # runs nodes + Caddy
#   docker compose --profile monitor up -d   # adds Prometheus + Grafana
#
# For multi-host (the realistic case), copy this file per VPS and remove
# the two nodes that aren't yours; Caddy can still live on one of them or
# on a dedicated edge box. Operators are expected to edit this file —
# it's a reference, not a magic turnkey.
#
# Key files:
#   ./keys/node{1,2,3}.json — Ed25519 identity, bake in via bind mount
#   ./caddy/Caddyfile       — auto-HTTPS config
#   ./node.env.example      — ENV template
#   ./prometheus.yml        — scrape config
# ══════════════════════════════════════════════════════════════════════════

networks:
  internet:
    name: dchain_internet
    driver: bridge

volumes:
  node1_data:
  node2_data:
  node3_data:
  caddy_data:
  caddy_config:
  prom_data:
  grafana_data:

x-node-base: &node-base
  build:
    context: ../..
    dockerfile: deploy/prod/Dockerfile.slim
  restart: unless-stopped
  networks: [internet]
  # Drop all Linux capabilities — the node binary needs none.
  cap_drop: [ALL]
  # Read-only root FS; only /data is writable (volume-mounted).
  read_only: true
  tmpfs: [/tmp]
  security_opt: [no-new-privileges:true]
  # Health check hits /api/netstats through the local HTTP server.
  healthcheck:
    test: ["CMD-SHELL", "wget -qO- http://127.0.0.1:8080/api/netstats >/dev/null || exit 1"]
    interval: 10s
    timeout: 3s
    retries: 6
    start_period: 15s

services:
  node1:
    <<: *node-base
    container_name: dchain_node1
    hostname: node1
    env_file: ./node1.env
    volumes:
      - node1_data:/data
      - ./keys/node1.json:/keys/node.json:ro
    command:
      - "--genesis"              # drop --genesis after first boot
      - "--db=/data/chain"
      - "--mailbox-db=/data/mailbox"
      - "--key=/keys/node.json"
      - "--relay-key=/data/relay.json"
      - "--listen=/ip4/0.0.0.0/tcp/4001"
      - "--stats-addr=:8080"
      - "--heartbeat=true"
      - "--register-relay"

  node2:
    <<: *node-base
    container_name: dchain_node2
    hostname: node2
    env_file: ./node2.env
    depends_on:
      node1: { condition: service_healthy }
    volumes:
      - node2_data:/data
      - ./keys/node2.json:/keys/node.json:ro
    command:
      - "--db=/data/chain"
      - "--mailbox-db=/data/mailbox"
      - "--key=/keys/node.json"
      - "--relay-key=/data/relay.json"
      - "--listen=/ip4/0.0.0.0/tcp/4001"
      - "--stats-addr=:8080"
      - "--join=http://node1:8080"   # bootstrap from node1
      - "--register-relay"

  node3:
    <<: *node-base
    container_name: dchain_node3
    hostname: node3
    env_file: ./node3.env
    depends_on:
      node1: { condition: service_healthy }
    volumes:
      - node3_data:/data
      - ./keys/node3.json:/keys/node.json:ro
    command:
      - "--db=/data/chain"
      - "--mailbox-db=/data/mailbox"
      - "--key=/keys/node.json"
      - "--relay-key=/data/relay.json"
      - "--listen=/ip4/0.0.0.0/tcp/4001"
      - "--stats-addr=:8080"
      - "--join=http://node1:8080"
      - "--register-relay"

  # ── Edge: Caddy with auto-HTTPS + WS upgrade + load-balancing ────────────
  caddy:
    image: caddy:2.8-alpine
    container_name: dchain_caddy
    restart: unless-stopped
    networks: [internet]
    ports:
      - "80:80"
      - "443:443"
      - "443:443/udp"   # HTTP/3 / QUIC
    volumes:
      - ./caddy/Caddyfile:/etc/caddy/Caddyfile:ro
      - caddy_data:/data
      - caddy_config:/config
    depends_on:
      node1: { condition: service_healthy }

  # ── Observability ────────────────────────────────────────────────────────
  # Start these only when needed: `docker compose --profile monitor up -d`

  prometheus:
    profiles: [monitor]
    image: prom/prometheus:v2.53.0
    container_name: dchain_prometheus
    restart: unless-stopped
    networks: [internet]
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
|
||||
- prom_data:/prometheus
|
||||
command:
|
||||
- "--config.file=/etc/prometheus/prometheus.yml"
|
||||
- "--storage.tsdb.retention.time=30d"
|
||||
# No external port — exposed only to Grafana via internal network.
|
||||
|
||||
grafana:
|
||||
profiles: [monitor]
|
||||
image: grafana/grafana:11.1.0
|
||||
container_name: dchain_grafana
|
||||
restart: unless-stopped
|
||||
networks: [internet]
|
||||
ports:
|
||||
- "3000:3000"
|
||||
depends_on: [prometheus]
|
||||
environment:
|
||||
GF_SECURITY_ADMIN_USER: admin
|
||||
GF_SECURITY_ADMIN_PASSWORD: ${GRAFANA_ADMIN_PW:-change-me}
|
||||
GF_USERS_ALLOW_SIGN_UP: "false"
|
||||
volumes:
|
||||
- grafana_data:/var/lib/grafana
|
||||
- ./grafana/datasources:/etc/grafana/provisioning/datasources:ro
|
||||
- ./grafana/dashboards:/etc/grafana/provisioning/dashboards:ro
|
||||

36
deploy/prod/node.env.example
Normal file
@@ -0,0 +1,36 @@
# DChain node environment — copy to node1.env / node2.env / node3.env and
# customise per-host. These values are read by the node binary via ENV
# fallback (flags still override).
#
# Required:
#   DCHAIN_VALIDATORS           Comma-separated Ed25519 pubkeys of the initial
#                               validator set. All three nodes must agree on
#                               this list at genesis; later additions happen
#                               on-chain via ADD_VALIDATOR.
#   DCHAIN_ANNOUNCE             Public libp2p multiaddr peers use to dial this
#                               node from the internet, e.g.
#                               /ip4/203.0.113.10/tcp/4001
#
# Optional:
#   DCHAIN_PEERS                Bootstrap peer multiaddrs. Auto-filled by
#                               --join if omitted.
#   DCHAIN_GOVERNANCE_CONTRACT  Deployed governance contract ID (hex).
#   DCHAIN_RELAY_FEE            µT per message when registering as a relay.
#   ACME_EMAIL                  Email for Let's Encrypt (TLS expiry reminders).
#   DOMAIN                      Public hostname — Caddy issues cert for this.
#
# Security:
#   Key files are bind-mounted at runtime; do NOT put private keys in this
#   file. Each node needs its own identity — generate with
#     docker compose run --rm node1 /usr/local/bin/client keygen --out /keys/node.json
#   and copy out with `docker cp`.

DCHAIN_VALIDATORS=PUT_FIRST_PUBKEY_HERE,PUT_SECOND_PUBKEY_HERE,PUT_THIRD_PUBKEY_HERE
DCHAIN_ANNOUNCE=/ip4/0.0.0.0/tcp/4001

# DCHAIN_PEERS=/ip4/203.0.113.10/tcp/4001/p2p/12D3Koo...
# DCHAIN_GOVERNANCE_CONTRACT=
# DCHAIN_RELAY_FEE=1000
# ACME_EMAIL=admin@example.com
# DOMAIN=dchain.example.com
# GRAFANA_ADMIN_PW=change-me-to-something-long
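
Assembling the `DCHAIN_VALIDATORS` line from the three generated key files can be scripted. This is a sketch, not part of the repo: it assumes each key JSON exposes its public key under a `"pub"` field, which depends on the actual `client keygen` output format.

```shell
# join_pubkeys FILE...: print the comma-separated "pub" fields of the given
# key files, in the shape DCHAIN_VALIDATORS expects. The "pub" JSON field
# name is an assumption; check your keygen output before relying on it.
join_pubkeys() {
  out=""
  for f in "$@"; do
    # Crude JSON extraction; swap in `jq -r .pub` if jq is available.
    pub=$(sed -n 's/.*"pub"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$f")
    out="${out:+$out,}$pub"
  done
  printf '%s\n' "$out"
}

# Usage (paste the result into each nodeN.env):
# echo "DCHAIN_VALIDATORS=$(join_pubkeys ./keys/node1.json ./keys/node2.json ./keys/node3.json)"
```

All three nodes must receive the identical list, or they will disagree at genesis.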

17
deploy/prod/prometheus.yml
Normal file
@@ -0,0 +1,17 @@
# Prometheus scrape config for DChain prod.
# Mounted read-only into the prometheus container.

global:
  scrape_interval: 15s
  scrape_timeout: 5s
  evaluation_interval: 30s
  external_labels:
    network: dchain-prod

scrape_configs:
  - job_name: dchain-node
    metrics_path: /metrics
    static_configs:
      - targets: [node1:8080, node2:8080, node3:8080]
        labels:
          group: validators

46
deploy/single/Caddyfile
Normal file
@@ -0,0 +1,46 @@
# Single-node Caddy: TLS terminate + WS upgrade + internal-only /metrics.
#
# No load balancing — one node backend. Keeps the file short and easy to
# audit. For a multi-node deployment see deploy/prod/caddy/Caddyfile.
{
    email {$ACME_EMAIL:admin@example.com}
    servers {
        protocols h1 h2 h3
    }
}

{$DOMAIN:localhost} {
    encode zstd gzip

    # WebSocket (single backend; no stickiness concerns).
    @ws path /api/ws
    handle @ws {
        reverse_proxy node:8080
    }

    # REST API.
    handle /api/* {
        reverse_proxy node:8080
    }

    # /metrics is for the operator's Prometheus only. Block external IPs.
    @metricsPublic {
        path /metrics
        not remote_ip 127.0.0.1 ::1 172.16.0.0/12 192.168.0.0/16 10.0.0.0/8
    }
    handle @metricsPublic {
        respond "forbidden" 403
    }

    # Anything else → explorer HTML from the node.
    handle {
        reverse_proxy node:8080
    }

    log {
        output stdout
        format json
        level INFO
    }
}

387
deploy/single/README.md
Normal file
@@ -0,0 +1,387 @@
# DChain single-node deployment

One node + optional Caddy TLS + optional Prometheus/Grafana.
Covers four main scenarios:

1. **personal node** — public, or private behind a token,
2. **first node of a new network** (genesis),
3. **joining an existing network** (relay / observer / validator),
4. **headless API node** for mobile clients — no HTML UI.

For a 3-validator cluster, see `../prod/`.

---

## Navigation

- [0. What gets deployed](#0-what-gets-deployed)
- [1. Quick start](#1-quick-start)
- [2. Configuration scenarios](#2-configuration-scenarios)
- [2.1. Public node with UI and open Swagger](#21-public-node-with-ui-and-open-swagger)
- [2.2. Headless API node (no UI, Swagger open)](#22-headless-api-node-no-ui-swagger-open)
- [2.3. Fully private (token on everything, UI off)](#23-fully-private-token-on-everything-ui-off)
- [2.4. API-only without Swagger](#24-api-only-without-swagger)
- [2.5. First node of a new network (--genesis)](#25-first-node-of-a-new-network---genesis)
- [2.6. Joining an existing network (--join)](#26-joining-an-existing-network---join)
- [3. HTTP surface](#3-http-surface)
- [4. Auto-update from Gitea](#4-auto-update-from-gitea)
- [5. Update / backup / restore](#5-update--backup--restore)
- [6. Troubleshooting](#6-troubleshooting)

---

## 0. What gets deployed

The base compose (`docker compose up -d`) starts:

| Service | What it is | Ports |
|---------|------------|-------|
| `node` | the DChain node itself (`dchain-node-slim` image) | `4001` (libp2p P2P, public), `8080` (HTTP/WS — only via Caddy) |
| `caddy` | TLS edge with auto-HTTPS (Let's Encrypt) | `80`, `443`, `443/udp` |

With `--profile monitor`, additionally:

| Service | What it is | Ports |
|---------|------------|-------|
| `prometheus` | metrics + TSDB (30-day retention) | internal only |
| `grafana` | dashboards | `3000` |

---

## 1. Quick start

```bash
# 1. Generate the node key (once; keep it safe).
docker build -t dchain-node-slim -f ../prod/Dockerfile.slim ../..
mkdir -p keys
docker run --rm --entrypoint /usr/local/bin/client \
  -v "$PWD/keys:/out" dchain-node-slim \
  keygen --out /out/node.json

# 2. Copy the env file and edit it.
cp node.env.example node.env
$EDITOR node.env   # at minimum: DCHAIN_ANNOUNCE, DOMAIN, DCHAIN_API_TOKEN

# 3. Bring it up.
docker compose up -d

# 4. (optional) Monitoring.
GRAFANA_ADMIN_PW=$(openssl rand -hex 16) \
  docker compose --profile monitor up -d
# → Grafana http://<host>:3000, datasource http://prometheus:9090

# 5. Check liveness.
curl -s https://$DOMAIN/api/netstats
curl -s https://$DOMAIN/api/well-known-version
```

> **Windows:** when running via Docker Desktop and Git Bash, prefix commands
> that use Unix paths like `/out` or `/keys` with `MSYS_NO_PATHCONV=1` —
> otherwise Git Bash converts them to Windows paths.

---
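
Right after `up -d` the node may need a few seconds to pass its health check, so the liveness `curl` can fail on the first try. A small polling helper (a sketch, not shipped with the repo; the endpoint name comes from the compose healthcheck) avoids blind retries:

```shell
# wait_for SECONDS CMD...: retry CMD once per second until it succeeds or
# the deadline passes. Returns 0 on success, 1 on timeout.
wait_for() {
  deadline=$(( $(date +%s) + $1 )); shift
  while ! "$@"; do
    [ "$(date +%s)" -ge "$deadline" ] && return 1
    sleep 1
  done
}

# Example: block until the node answers (curl -f fails on HTTP errors).
# wait_for 60 curl -sf "https://$DOMAIN/api/netstats" >/dev/null
```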

## 2. Configuration scenarios

All scenarios differ only in the contents of `node.env`. To recreate the
container: `docker compose up -d --force-recreate node`.

### 2.1. Public node with UI and open Swagger

**When it fits:** you want to show the Explorer to everyone (an address page
searchable by pubkey, block history, the validator list) and keep Swagger as
live API documentation.

```ini
# node.env
DCHAIN_ANNOUNCE=/ip4/203.0.113.10/tcp/4001
DOMAIN=dchain.example.com
ACME_EMAIL=you@example.com

# no token — public mode
# UI and Swagger light up by default (the flags below are left unset)
```

Result:

| URL | What's there |
|-----|--------------|
| `https://$DOMAIN/` | Block explorer (home) |
| `https://$DOMAIN/address?pub=…` | Balance + history by pubkey |
| `https://$DOMAIN/tx?id=…` | Transaction details |
| `https://$DOMAIN/validators` | Validator list |
| `https://$DOMAIN/tokens` | Registered tokens |
| `https://$DOMAIN/swagger` | **Swagger UI** — interactive OpenAPI spec |
| `https://$DOMAIN/swagger/openapi.json` | Raw OpenAPI JSON — for codegen |
| `https://$DOMAIN/api/*` | The entire JSON API surface |
| `https://$DOMAIN/metrics` | Prometheus exposition |

### 2.2. Headless API node (no UI, Swagger open)

**When it fits:** the node is a backend for a mobile app, the HTML explorer
is not needed, but you want to keep Swagger as documentation for developers.

```ini
# node.env
DCHAIN_ANNOUNCE=/ip4/203.0.113.20/tcp/4001
DOMAIN=api.dchain.example.com

# Disable the explorer's HTML pages, but NOT Swagger.
DCHAIN_DISABLE_UI=true
```

Effect:
- `GET /` → `404 page not found`
- `GET /address`, `/tx`, `/validators`, `/tokens`, `/contract` and everything
  under `/assets/explorer/*` → 404.
- `GET /swagger` → Swagger UI, works unchanged.
- `GET /api/*`, `GET /metrics`, `GET /api/ws` → work.

The node logs:
```
[NODE] explorer UI: disabled (--disable-ui)
[NODE] swagger: http://0.0.0.0:8080/swagger
```

### 2.3. Fully private (token on everything, UI off)

**When it fits:** a personal node backing the messenger; you are the only
user, and not even the stats should be visible to outsiders.

```ini
# node.env
DCHAIN_ANNOUNCE=/ip4/203.0.113.30/tcp/4001
DOMAIN=node.personal.example

DCHAIN_API_TOKEN=$(openssl rand -hex 32)  # copy into the client
DCHAIN_API_PRIVATE=true                   # gates the read endpoints too

# You don't need the UI — and anyone who did would get 401 without the token.
DCHAIN_DISABLE_UI=true
```

Effect:
- Any `/api/*` request without `Authorization: Bearer <token>` → `401`.
- `/swagger` is still served (it isn't token-aware; API calls made from
  Swagger UI will return 401 — that's expected).
- P2P port `4001` stays open — without it the node can't sync with the network.

Passing the token to the client:
```ts
// client-app/lib/api.ts — add to post()/get():
headers: { 'Authorization': 'Bearer ' + YOUR_TOKEN }

// for WebSocket — the token goes as a query parameter:
this.url = base.replace(/^http/, 'ws') + '/api/ws?token=' + YOUR_TOKEN;
```
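
The same derivation works from the shell, e.g. for smoke-testing the private node with `curl` (the hostname and token value below are placeholders):

```shell
# Derive the authenticated WS endpoint from the node's HTTP base URL,
# mirroring what the client does in client-app/lib/api.ts.
BASE="https://node.personal.example"
TOKEN="0123abcd"   # same value as DCHAIN_API_TOKEN

# http → ws, https → wss (the leading "http" becomes "ws").
WS_URL="$(printf '%s' "$BASE" | sed 's|^http|ws|')/api/ws?token=$TOKEN"
echo "$WS_URL"
# → wss://node.personal.example/api/ws?token=0123abcd

# REST call with the same token:
# curl -sf -H "Authorization: Bearer $TOKEN" "$BASE/api/netstats"
```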

### 2.4. API-only without Swagger

**When it fits:** a maximally hardened headless node. Even a description of
the API surface should not be exposed.

```ini
DCHAIN_ANNOUNCE=/ip4/203.0.113.40/tcp/4001
DOMAIN=rpc.dchain.example.com

DCHAIN_DISABLE_UI=true
DCHAIN_DISABLE_SWAGGER=true
```

Effect:
- `/` → 404, `/swagger` → 404, `/api/*` → works.
- In the logs:
  ```
  [NODE] explorer UI: disabled (--disable-ui)
  [NODE] swagger: disabled (--disable-swagger)
  ```
- The Swagger spec can still be generated locally: run `go run ./cmd/node`
  in dev mode → `http://localhost:8080/swagger/openapi.json` → save it.

### 2.5. First node of a new network (`--genesis`)

Any of the scenarios above, plus `DCHAIN_GENESIS=true` on the very first
start. The node creates block 0 with its own pubkey as the sole validator.
After the first successful start, remove this line from `node.env` (it's a
no-op, but it clutters the logs).

```ini
DCHAIN_GENESIS=true
DCHAIN_ANNOUNCE=/ip4/203.0.113.10/tcp/4001
DOMAIN=dchain.example.com
```

Verify:
```bash
curl -s https://$DOMAIN/api/netstats | jq .validator_count  # → 1
curl -s https://$DOMAIN/api/network-info | jq .genesis_hash # save this
```

### 2.6. Joining an existing network (`--join`)

Any of the scenarios above, plus `DCHAIN_JOIN` with a list of seed-node HTTP
URLs. The node pulls `chain_id`, the genesis hash, the validator set, and
peers automatically from `/api/network-info`. It starts as an **observer**
by default — it applies blocks and accepts txs, but does not vote.

```ini
DCHAIN_JOIN=https://seed1.dchain.example.com,https://seed2.dchain.example.com
DCHAIN_ANNOUNCE=/ip4/203.0.113.50/tcp/4001
DOMAIN=node2.example.com
```

To become a validator, an existing validator must submit a multi-signed
`ADD_VALIDATOR`. See `../prod/README.md` → "Add a 4th validator".

---

## 3. HTTP surface

What the node serves by default (every `/api/*` endpoint is always enabled,
even with `DCHAIN_DISABLE_UI=true`):

### Public health / discovery
| Endpoint | Purpose |
|----------|---------|
| `/api/netstats` | tip height, total tx count, supply, validator count |
| `/api/network-info` | one-shot bootstrap payload for a new client/node |
| `/api/well-known-version` | node_version, protocol_version, features[], build{tag, commit, date, dirty} |
| `/api/well-known-contracts` | canonical contract_id → name map |
| `/api/update-check` | compares own commit against the Gitea release (needs `DCHAIN_UPDATE_SOURCE_URL`) |
| `/api/validators` | active validator set |
| `/api/peers` | live libp2p peers + their versions (from the `dchain/version/v1` gossip topic) |

### Chain explorer JSON
| Endpoint | Purpose |
|----------|---------|
| `/api/blocks?limit=N` | last N blocks |
| `/api/block/{index}` | a single block |
| `/api/txs/recent?limit=N` | last N txs |
| `/api/tx/{id}` | a single transaction |
| `/api/address/{pubkey_or_DC-addr}` | balance + history |
| `/api/identity/{pubkey_or_DC-addr}` | ed25519 ↔ x25519 binding |
| `/api/relays` | registered relay nodes |
| `/api/contracts` / `/api/contracts/{id}` / `/api/contracts/{id}/state/{key}` | contracts |
| `/api/tokens` / `/api/tokens/{id}` / `/api/nfts` | tokens and NFTs |
| `/api/channels/{id}` / `/api/channels/{id}/members` | channels and members (for fan-out) |

### Submit / real-time
| Endpoint | Purpose |
|----------|---------|
| `POST /api/tx` | submit a signed tx (rate-limit + body cap; token-gated if set) |
| `GET /api/ws` | WebSocket (auth, topic subscribe, submit_tx, typing) |
| `GET /api/events` | SSE (one-way legacy stream) |

### HTML (disabled by `DCHAIN_DISABLE_UI=true`)
| Endpoint | Purpose |
|----------|---------|
| `/` | explorer home |
| `/address`, `/tx`, `/node`, `/relays`, `/validators`, `/contract`, `/tokens`, `/token` | pages |
| `/assets/explorer/*.{js,css}` | static assets |

### Swagger (disabled by `DCHAIN_DISABLE_SWAGGER=true`)
| Endpoint | Purpose |
|----------|---------|
| `/swagger` | Swagger UI (loads swagger-ui-dist from unpkg) |
| `/swagger/openapi.json` | raw OpenAPI 3.0 spec |

### Prometheus
| Endpoint | Purpose |
|----------|---------|
| `/metrics` | exposition, always enabled |

> **Protecting `/metrics`:** the endpoint has no built-in authorization. In a
> public deployment, close it off at the Caddy level — see the example in
> `Caddyfile`: restrict by IP or by a scrape-server token.
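
Clients can use `/api/well-known-version` for feature detection before relying on newer endpoints. A sketch against a canned payload (the payload shape follows the table above; the `relay_v2` feature name is a hypothetical placeholder, not a real feature flag):

```shell
# Feature-detect against /api/well-known-version. In production, fetch the
# payload live; here it is inlined so the logic is self-contained.
payload='{"node_version":"0.5.1","protocol_version":1,"features":["ws","relay_v1"],"build":{"tag":"v0.5.1","commit":"abc1234","dirty":false}}'
# Live: payload=$(curl -sf "https://$DOMAIN/api/well-known-version")

# Crude substring match; use `jq '.features | index("relay_v2")'` for rigor.
if printf '%s' "$payload" | grep -q '"relay_v2"'; then
  feature=yes
else
  feature=no
fi
echo "relay_v2 supported: $feature"
```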

---

## 4. Auto-update from Gitea

Once the project is hosted on Gitea:

```ini
# node.env
DCHAIN_UPDATE_SOURCE_URL=https://gitea.example.com/api/v1/repos/dchain/dchain/releases/latest
DCHAIN_UPDATE_SOURCE_TOKEN=   # optional, for private repos
UPDATE_ALLOW_MAJOR=false      # blocks v1.x → v2.y without explicit consent
```

Verify:
```bash
curl -s https://$DOMAIN/api/update-check | jq .
# {
#   "current": { "tag": "v0.5.0", "commit": "abc1234", ... },
#   "latest": { "tag": "v0.5.1", "url": "https://gitea...", ... },
#   "update_available": true,
#   "checked_at": "2026-04-17T10:41:03Z"
# }
```

systemd timer for quiet hourly updates:
```bash
sudo cp systemd/dchain-update.{service,timer} /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable --now dchain-update.timer
```

The `update.sh` script:
1. queries `/api/update-check` — exits if `update_available: false`;
2. runs `git fetch --tags` and checks out the new tag;
3. **semver guard**: blocks a major jump (vN.x → vN+1.y) unless
   `UPDATE_ALLOW_MAJOR=true`;
4. rebuilds the image with the injected version (`VERSION_TAG/COMMIT/DATE/DIRTY`);
5. smoke-tests `node --version`;
6. runs `docker compose up -d --force-recreate node`;
7. polls `/api/netstats` — up to 60 s, failing loudly if the node doesn't come up.
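
The semver guard in step 3 can be sketched in pure shell. This is an illustration of the check, not the actual `update.sh` source; it assumes tags of the form `vMAJOR.MINOR.PATCH`:

```shell
# major TAG: v1.2.3 → 1. Strips the leading "v" and everything after the
# first dot.
major() {
  printf '%s' "${1#v}" | cut -d. -f1
}

# allow_update CURRENT_TAG LATEST_TAG: succeed unless the major version
# jumps while UPDATE_ALLOW_MAJOR is not "true".
allow_update() {
  if [ "$(major "$1")" != "$(major "$2")" ] \
     && [ "${UPDATE_ALLOW_MAJOR:-false}" != "true" ]; then
    echo "refusing major upgrade $1 -> $2 (set UPDATE_ALLOW_MAJOR=true to override)" >&2
    return 1
  fi
}

allow_update v0.5.0 v0.5.1 && echo ok        # same major: proceeds
allow_update v1.9.0 v2.0.0 || echo blocked   # major bump: refused
```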

See `../UPDATE_STRATEGY.md` for details.

---

## 5. Update / backup / restore

```bash
# Manual update (downtime ~5-8 s):
docker compose pull
docker compose build
docker compose up -d --force-recreate node

# Check that the new version came up:
docker exec dchain_node /usr/local/bin/node --version
curl -s https://$DOMAIN/api/well-known-version | jq .build

# Backup chain state:
docker run --rm -v dchain-single_node_data:/data -v "$PWD":/bak alpine \
  tar czf /bak/dchain-$(date +%F).tar.gz -C /data .

# Restore:
docker compose stop node
docker run --rm -v dchain-single_node_data:/data -v "$PWD":/bak alpine \
  sh -c "rm -rf /data/* && tar xzf /bak/dchain-2026-04-10.tar.gz -C /data"
docker compose up -d node
```
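
Before rotating old archives out, it is worth confirming that a fresh backup is actually readable. A minimal sketch (plain `tar` on the host; not part of the repo's tooling):

```shell
# backup_ok FILE: succeed iff FILE is a non-empty, readable gzip'd tar.
backup_ok() {
  [ -s "$1" ] && tar tzf "$1" >/dev/null 2>&1
}

# Example policy: only delete yesterday's backup once today's verifies.
# backup_ok "dchain-$(date +%F).tar.gz" && rm -f "dchain-$(date -d yesterday +%F).tar.gz"
```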

---

## 6. Troubleshooting

| Symptom | Check |
|---------|-------|
| `failed to get certificate` in Caddy | Does the DOMAIN DNS A-record point at this host? Is port 80 open? |
| `/api/tx` returns 401 | Does the header token match `DCHAIN_API_TOKEN`? |
| Nodes can't see each other | Is port 4001 open? Is `DCHAIN_ANNOUNCE` the public IP? |
| Blocks not advancing (validator mode) | `docker compose logs node \| grep PBFT` — is a quorum forming? |
| `/` returns 404 | `DCHAIN_DISABLE_UI=true` is set — remove it, or use `/api/*` |
| `/swagger` returns 404 | `DCHAIN_DISABLE_SWAGGER=true` — remove it, or host `openapi.json` separately |
| `update-check` returns 503 | `DCHAIN_UPDATE_SOURCE_URL` is unset or empty |
| `update-check` returns 502 | Gitea is unreachable or the URL is wrong — check `curl $DCHAIN_UPDATE_SOURCE_URL` by hand |
| `FATAL: genesis hash mismatch` | The volume holds a chain with a different genesis. `docker volume rm dchain-single_node_data` → `up -d` (local data is lost) |
| Disk keeps growing | BadgerDB GC runs every 5 min; a chain with tens of thousands of blocks usually stays < 500 MB |
| `--version` prints `dev` | Image built without `--build-arg VERSION_*` — rebuild via `update.sh`, or `docker build --build-arg VERSION_TAG=...` manually |

129
deploy/single/docker-compose.yml
Normal file
@@ -0,0 +1,129 @@
name: dchain-single

# ══════════════════════════════════════════════════════════════════════════
# Single-node DChain deployment.
#
# One validator (or observer) + Caddy TLS edge + optional
# Prometheus/Grafana. Intended for:
#   - Personal nodes: operator runs their own, optionally private.
#   - Tail of a larger network: joins via --join, participates / observes.
#   - First node of a brand-new network: starts with --genesis.
#
# Quick start:
#   cp node.env.example node.env        # edit DOMAIN / API_TOKEN / JOIN
#   docker compose up -d                # node + Caddy
#   docker compose --profile monitor up -d
#
# For a multi-validator cluster see deploy/prod/ (3-of-3 PBFT setup).
# ══════════════════════════════════════════════════════════════════════════

networks:
  dchain:
    name: dchain_single
    driver: bridge

volumes:
  node_data:
  caddy_data:
  caddy_config:
  prom_data:
  grafana_data:

services:
  # ── The node ──────────────────────────────────────────────────────────
  # One process does everything: consensus (if validator), relay, HTTP,
  # WebSocket, metrics. Three knobs are worth knowing before first boot:
  #
  # 1. DCHAIN_GENESIS=true → creates block 0 with THIS node's key as sole
  #    validator. Use only once, on the very first node of a fresh chain.
  #    Drop the flag on subsequent restarts (no-op but noisy).
  # 2. DCHAIN_JOIN=http://...,http://... → fetch /api/network-info from
  #    the listed seeds, auto-populate --peers / --validators, sync chain.
  #    Use this when joining an existing network instead of --genesis.
  # 3. DCHAIN_API_TOKEN=... → if set, gates POST /api/tx (and WS submit).
  #    With DCHAIN_API_PRIVATE=true, gates reads too. Empty = public.
  node:
    build:
      context: ../..
      dockerfile: deploy/prod/Dockerfile.slim
    container_name: dchain_node
    restart: unless-stopped
    env_file: ./node.env
    networks: [dchain]
    volumes:
      - node_data:/data
      - ./keys/node.json:/keys/node.json:ro
    # 4001 → libp2p P2P (MUST be publicly routable for federation)
    # 8080 → HTTP + WebSocket, only exposed internally to Caddy by default
    ports:
      - "4001:4001"
    expose:
      - "8080"
    cap_drop: [ALL]
    read_only: true
    tmpfs: [/tmp]
    security_opt: [no-new-privileges:true]
    healthcheck:
      test: ["CMD-SHELL", "wget -qO- http://127.0.0.1:8080/api/netstats >/dev/null || exit 1"]
      interval: 10s
      timeout: 3s
      retries: 6
      start_period: 15s
    command:
      - "--db=/data/chain"
      - "--mailbox-db=/data/mailbox"
      - "--key=/keys/node.json"
      - "--relay-key=/data/relay.json"
      - "--listen=/ip4/0.0.0.0/tcp/4001"
      - "--stats-addr=:8080"
      # All other config comes via DCHAIN_* env vars from node.env.

  # ── TLS edge ──────────────────────────────────────────────────────────
  caddy:
    image: caddy:2.8-alpine
    container_name: dchain_caddy
    restart: unless-stopped
    networks: [dchain]
    ports:
      - "80:80"
      - "443:443"
      - "443:443/udp"
    volumes:
      - ./Caddyfile:/etc/caddy/Caddyfile:ro
      - caddy_data:/data
      - caddy_config:/config
    environment:
      DOMAIN: ${DOMAIN:-localhost}
      ACME_EMAIL: ${ACME_EMAIL:-admin@example.com}
    depends_on:
      node: { condition: service_healthy }

  # ── Observability (opt-in) ────────────────────────────────────────────
  prometheus:
    profiles: [monitor]
    image: prom/prometheus:v2.53.0
    container_name: dchain_prometheus
    restart: unless-stopped
    networks: [dchain]
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - prom_data:/prometheus
    command:
      - "--config.file=/etc/prometheus/prometheus.yml"
      - "--storage.tsdb.retention.time=30d"

  grafana:
    profiles: [monitor]
    image: grafana/grafana:11.1.0
    container_name: dchain_grafana
    restart: unless-stopped
    networks: [dchain]
    ports:
      - "3000:3000"
    depends_on: [prometheus]
    environment:
      GF_SECURITY_ADMIN_USER: admin
      GF_SECURITY_ADMIN_PASSWORD: ${GRAFANA_ADMIN_PW:-change-me}
      GF_USERS_ALLOW_SIGN_UP: "false"
    volumes:
      - grafana_data:/var/lib/grafana
119
deploy/single/node.env.example
Normal file
119
deploy/single/node.env.example
Normal file
@@ -0,0 +1,119 @@
|
||||
# ───────────────────────────────────────────────────────────────────────────
|
||||
# Single-node DChain deployment — operator configuration.
|
||||
#
|
||||
# Copy this file to `node.env` and fill in the blanks. All variables are
|
||||
# DCHAIN_*-prefixed; the node binary reads them as flag fallbacks
|
||||
# (CLI > env > hard-coded default).
|
||||
# ───────────────────────────────────────────────────────────────────────────
|
||||
|
||||
|
||||
# ══ 1. Mode: first node of a new chain, OR joiner to an existing one ══
|
||||
|
||||
# Uncomment for the VERY FIRST node of a brand-new network.
|
||||
# Creates block 0 with this node's key as the sole initial validator.
|
||||
# Drop this flag after the first successful boot (it's a no-op on a
|
||||
# non-empty DB but clutters logs).
|
||||
#DCHAIN_GENESIS=true
|
||||
|
||||
# Comma-separated HTTP URLs of seed nodes to bootstrap from. The node
|
||||
# fetches /api/network-info from each in order until one replies, then
|
||||
# auto-populates --peers / --validators and starts syncing.
|
||||
#
|
||||
# Leave empty ONLY if you're using --genesis above (first node) OR you're
|
||||
# running a standalone offline node for testing.
|
||||
#DCHAIN_JOIN=https://seed1.dchain.example.com,https://seed2.dchain.example.com
|
||||
|
||||
|
||||
# ══ 2. Access control ═══════════════════════════════════════════════════
|
||||
|
||||
# Shared secret required to submit transactions. Without this, ANY client
|
||||
# that can reach your node can submit txs through it (they still need a
|
||||
# valid signature, so they can't forge — but they could clutter YOUR
|
||||
# mempool with their traffic).
|
||||
#
|
||||
# Recommended:
|
||||
# DCHAIN_API_TOKEN=$(openssl rand -hex 32)
|
||||
#
|
||||
# Configure the same value in your mobile/desktop client's "Authorization:
|
||||
# Bearer ..." header. Leave commented-out for a fully public node.
|
||||
#DCHAIN_API_TOKEN=REPLACE_WITH_A_LONG_RANDOM_SECRET
|
||||
|
||||
# Go a step further: require the token on READ endpoints too. Only you
|
||||
# (and anyone you share the token with) can query /api/netstats, balances,
|
||||
# tx history, etc. Useful for a personal node where chat metadata is
|
||||
# sensitive. Requires DCHAIN_API_TOKEN above to be set.
|
||||
#DCHAIN_API_PRIVATE=true
|
||||
|
||||
|
||||
# ══ 3. Networking ══════════════════════════════════════════════════════
|
||||
|
||||
# Public libp2p multiaddr others will use to dial this node. Substitute
|
||||
# your VPS's public IP (or use a hostname resolved via DNS). Port 4001
|
||||
# must be open on your firewall.
|
||||
DCHAIN_ANNOUNCE=/ip4/CHANGE_ME_TO_YOUR_PUBLIC_IP/tcp/4001
|
||||
|
||||
# Public domain for HTTPS access. Must have a DNS A-record pointing at
|
||||
# this host BEFORE `docker compose up` — Caddy issues a cert via
|
||||
# Let's Encrypt on first start.
|
||||
DOMAIN=node.example.com
|
||||
ACME_EMAIL=admin@example.com
|
||||
|
||||
|
||||
# ══ 4. Role ═══════════════════════════════════════════════════════════
|
||||
|
||||
# Observer mode: this node applies blocks and serves HTTP/WS but never
|
||||
# proposes or votes. Use if you want an API-only node (e.g. running behind
|
||||
# a load balancer for clients, without caring about consensus). Skip if
|
||||
# this node is a validator.
|
||||
#DCHAIN_OBSERVER=true
|
||||
|
||||
# Submit a REGISTER_RELAY tx at startup so clients can use this node as a
|
||||
# relay for encrypted messages. Costs 1 tx fee (1000 µT by default).
|
||||
# Requires the node identity to have a minimum balance.
|
||||
#DCHAIN_REGISTER_RELAY=true
|
||||
#DCHAIN_RELAY_FEE=1000
|
||||
|
||||
# Governance contract ID — if your network uses on-chain gas-price /
|
||||
# parameter voting. Auto-discovered from --join seeds; only set manually
|
||||
# to pin a non-canonical deployment.
|
||||
#DCHAIN_GOVERNANCE_CONTRACT=
|
||||
|
||||
|
||||
# ══ 5. Validator-only ═════════════════════════════════════════════════
|
||||
|
||||
# Validator set (comma-separated pubkeys). On a joining node this gets
|
||||
# populated automatically from --join. On --genesis this is the initial
|
||||
# set (usually just this node's own pubkey).
|
||||
#DCHAIN_VALIDATORS=
|
||||
|
||||
|
||||
# ══ 6. Logging ════════════════════════════════════════════════════════
|
||||
|
||||
# `text` is human-readable; `json` is machine-parsable for Loki/ELK.
|
||||
DCHAIN_LOG_FORMAT=json
|
||||
|
||||
|
||||
# ══ 7. Auto-update (used by deploy/single/update.sh + systemd timer) ══

# Full URL of your Gitea release-API endpoint. Exposed as /api/update-check.
# Format: https://<gitea-host>/api/v1/repos/<owner>/<repo>/releases/latest
# When set, the update script prefers this over a blind git-fetch — less
# upstream traffic, and releases act as a gate (the operator publishes a
# release when a version is known-good).
#DCHAIN_UPDATE_SOURCE_URL=https://gitea.example.com/api/v1/repos/dchain/dchain/releases/latest

# Optional PAT (personal access token) for private repos. Not needed if the
# repo is public.
#DCHAIN_UPDATE_SOURCE_TOKEN=

# Semver guard: set to "true" to permit auto-update across major versions
# (v1.x → v2.y). Defaults to false — you get a loud error instead of a
# potentially breaking upgrade at 3am.
#UPDATE_ALLOW_MAJOR=false
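For reference, the update script parses release JSON with grep rather than jq. A standalone sketch of that technique, using an illustrative response shape (the `tag_name` field and sample values here are assumptions, not the node API's actual schema):

```shell
# Sample (hypothetical) releases/latest response, trimmed to a few fields.
resp='{"tag_name":"v0.0.2","draft":false,"prerelease":false}'

# grep the quoted field, then cut on quote characters: field 4 is the value.
tag=$(printf '%s' "$resp" | grep -o '"tag_name":"[^"]*"' | head -1 | cut -d'"' -f4)
printf '%s\n' "$tag"    # v0.0.2
```

The same pattern works for any flat string field, which is why update.sh gets away without a jq dependency.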

# ══ 8. Monitoring (only used if you run --profile monitor) ════════════

# Grafana admin password. Change this if you expose the dashboard
# publicly.
GRAFANA_ADMIN_PW=change-me-to-something-long
18
deploy/single/prometheus.yml
Normal file
@@ -0,0 +1,18 @@
global:
  scrape_interval: 15s
  scrape_timeout: 5s
  evaluation_interval: 30s
  external_labels:
    deployment: dchain-single

scrape_configs:
  - job_name: dchain-node
    metrics_path: /metrics
    # When --api-private is set, the node will reject the scrape.
    # Uncomment the authorization block below and set credentials to the
    # same value as DCHAIN_API_TOKEN in node.env.
    # authorization:
    #   type: Bearer
    #   credentials: SAME_AS_DCHAIN_API_TOKEN
    static_configs:
      - targets: [node:8080]
57
deploy/single/systemd/README.md
Normal file
@@ -0,0 +1,57 @@
# Systemd units for DChain auto-update

Two files, one-time setup.

## Install

Assumes the repo is checked out at `/opt/dchain`. Adjust `WorkingDirectory=`
and `EnvironmentFile=` in `dchain-update.service` if you put it elsewhere.

```bash
sudo cp dchain-update.{service,timer} /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable --now dchain-update.timer
```

## Verify

```bash
# When does the timer next fire?
systemctl list-timers dchain-update.timer

# What did the last run do?
journalctl -u dchain-update.service -n 100 --no-pager

# Run one update immediately, without waiting for the timer
sudo systemctl start dchain-update.service
```

## How it behaves

- Every hour (± up to 15 min jitter) the timer triggers the service.
- The service runs `update.sh` once, which:
  - fetches `origin/main`
  - if HEAD didn't move: exits 0, nothing touched
  - if HEAD moved: fast-forwards, rebuilds the image, smoke-tests the new
    binary, restarts the container, polls health
- Downtime per update is ~5-8 seconds (Badger reopen + HTTP listener warm-up).
- Failures write to the journal; add `OnFailure=` if you want Pushover/email.

## Disable auto-update

If you want to pin a version and review changes manually:

```bash
sudo systemctl disable --now dchain-update.timer
```

You can still invoke `update.sh` by hand when you've reviewed and
fast-forwarded your working tree.

## Why hourly + jitter

A whole federation restarting in the same 60-second window would drop PBFT
quorum below 2/3 for that window. With an hourly cadence and 15 minutes of
random jitter, each node's ~8-second restart lands at a different point in a
15-minute spread, so at any given instant the probability that two validators
are down simultaneously is about `N² × (8s / 900s)²`, which stays safely
below the quorum floor for any realistic N.
35
deploy/single/systemd/dchain-update.service
Normal file
@@ -0,0 +1,35 @@
# DChain single-node pull-and-restart service.
#
# Install:
#   sudo cp dchain-update.service dchain-update.timer /etc/systemd/system/
#   sudo systemctl daemon-reload
#   sudo systemctl enable --now dchain-update.timer
#
# View runs:
#   systemctl list-timers dchain-update.timer
#   journalctl -u dchain-update.service -n 200 --no-pager
#
# The timer (sibling file) fires the service; the service runs update.sh
# once per fire, which itself is a no-op when HEAD hasn't moved.

[Unit]
Description=DChain node: fetch latest, rebuild, rolling restart
Documentation=file:///opt/dchain/deploy/UPDATE_STRATEGY.md
# Don't try to update while Docker is still coming up after a host reboot.
After=docker.service network-online.target
Requires=docker.service

[Service]
Type=oneshot
# REPO_DIR + COMPOSE_FILE come from the update script's defaults; override
# here with Environment= if you moved the checkout to a non-default path.
WorkingDirectory=/opt/dchain
EnvironmentFile=-/opt/dchain/deploy/single/node.env
ExecStart=/opt/dchain/deploy/single/update.sh

# Lock down the unit — update.sh only needs git + docker + curl.
PrivateTmp=true
NoNewPrivileges=true
ProtectSystem=strict
ReadWritePaths=/opt/dchain /var/run/docker.sock
ProtectHome=true
24
deploy/single/systemd/dchain-update.timer
Normal file
@@ -0,0 +1,24 @@
# Timer for dchain-update.service — fires hourly with a random 15-minute jitter.
#
# Why the jitter: if every operator on the same network runs `OnCalendar=hourly`
# at :00:00, the whole federation restarts its nodes in the same minute and
# PBFT quorum drops below 2/3. With a random delay spread across 15 minutes,
# each node updates at a slightly different time, so at any instant the vast
# majority of validators remain live.
#
# Persistent=true means if the machine was asleep/off at fire time, the timer
# catches up on next boot instead of silently skipping.

[Unit]
Description=Run DChain node update hourly
Requires=dchain-update.service

[Timer]
OnBootSec=10min
OnUnitActiveSec=1h
RandomizedDelaySec=15min
Persistent=true
Unit=dchain-update.service

[Install]
WantedBy=timers.target
166
deploy/single/update.sh
Normal file
@@ -0,0 +1,166 @@
#!/usr/bin/env bash
# deploy/single/update.sh — pull-and-restart update for a DChain single node.
#
# Modes
# ─────
# 1. RELEASE mode (preferred): DCHAIN_UPDATE_SOURCE_URL is set, the node has
#    /api/update-check — we trust that endpoint to tell us what's latest,
#    then git-checkout the matching tag.
# 2. BRANCH mode (fallback): follow origin/main HEAD.
#
# Both modes share the same safety flow:
#   - Fast-forward check (no auto-rewriting local history).
#   - Smoke-test the new binary BEFORE killing the running container.
#   - Pre-restart best-effort health probe.
#   - Poll /api/netstats after restart, fail loud if unhealthy in 60s.
#
# Semver guard
# ────────────
# If UPDATE_ALLOW_MAJOR=false (default), the script refuses to cross a major
# version boundary (v1.x → v2.y). The operator must flip the env var or bump
# manually, avoiding surprise breaking changes from unattended restarts.
#
# Exit codes:
#   0 — up to date or successfully updated
#   1 — new container didn't become healthy
#   2 — smoke test of new image failed
#   3 — git state issue (fetch failed, not fast-forwardable, etc.)
#   4 — semver guard blocked the update

set -euo pipefail

REPO_DIR="${REPO_DIR:-/opt/dchain}"
COMPOSE_FILE="${COMPOSE_FILE:-$REPO_DIR/deploy/single/docker-compose.yml}"
IMAGE_NAME="${IMAGE_NAME:-dchain-node-slim}"
CONTAINER="${CONTAINER:-dchain_node}"
HEALTH_URL="${HEALTH_URL:-http://127.0.0.1:8080/api/netstats}"
UPDATE_CHECK_URL="${UPDATE_CHECK_URL:-http://127.0.0.1:8080/api/update-check}"
GIT_REMOTE="${GIT_REMOTE:-origin}"
GIT_BRANCH="${GIT_BRANCH:-main}"
UPDATE_ALLOW_MAJOR="${UPDATE_ALLOW_MAJOR:-false}"

log() { printf '%s %s\n' "$(date -Iseconds)" "$*"; }
die() { log "ERROR: $*"; exit "${2:-1}"; }

command -v docker >/dev/null || die "docker not on PATH" 3
command -v git    >/dev/null || die "git not on PATH" 3
command -v curl   >/dev/null || die "curl not on PATH" 3

cd "$REPO_DIR" || die "cannot cd to REPO_DIR=$REPO_DIR" 3

# ── Auth header for /api/update-check + /api/netstats on private nodes ────
# NB: expanding "${auth_args[@]}" while the array is empty needs bash ≥ 4.4
# under `set -u`; older bash treats the empty array as an unbound variable.
auth_args=()
[[ -n "${DCHAIN_API_TOKEN:-}" ]] && auth_args=(-H "Authorization: Bearer ${DCHAIN_API_TOKEN}")

# ── 1. Discover target version ────────────────────────────────────────────
target_tag=""
target_commit=""

if [[ -n "${DCHAIN_UPDATE_SOURCE_URL:-}" ]]; then
  log "querying release source via $UPDATE_CHECK_URL"
  check_json=$(curl -fsS -m 10 "${auth_args[@]}" "$UPDATE_CHECK_URL" 2>/dev/null || echo "")
  if [[ -n "$check_json" ]]; then
    # Minimal grep-based JSON extraction — avoids adding a jq dependency.
    # The shape is stable: we defined it in api_update_check.go.
    target_tag=$(printf '%s' "$check_json" | grep -o '"tag":"[^"]*"' | head -1 | sed 's/"tag":"\(.*\)"/\1/')
    update_available=$(printf '%s' "$check_json" | grep -o '"update_available":\(true\|false\)' | head -1 | cut -d: -f2)
    if [[ "$update_available" != "true" ]]; then
      log "up to date according to release source ($target_tag) — nothing to do"
      exit 0
    fi
    log "release source reports new tag: $target_tag"
  else
    log "release source query failed — falling back to git branch mode"
  fi
fi

# ── 2. Fetch git ──────────────────────────────────────────────────────────
log "fetching $GIT_REMOTE"
git fetch --quiet --tags "$GIT_REMOTE" "$GIT_BRANCH" || die "git fetch failed" 3

local_sha=$(git rev-parse HEAD)
if [[ -n "$target_tag" ]]; then
  # Release mode: target is the tag we just read from the API.
  if ! git rev-parse --verify "refs/tags/$target_tag" >/dev/null 2>&1; then
    die "release tag $target_tag unknown to local git — fetch may have missed it" 3
  fi
  target_commit=$(git rev-parse "refs/tags/$target_tag^{commit}")
else
  # Branch mode: target is remote branch HEAD.
  target_commit=$(git rev-parse "$GIT_REMOTE/$GIT_BRANCH")
  target_tag=$(git describe --tags --abbrev=0 "$target_commit" 2>/dev/null || echo "$target_commit")
fi

if [[ "$local_sha" == "$target_commit" ]]; then
  log "up to date at $local_sha — nothing to do"
  exit 0
fi
log "updating $local_sha → $target_commit ($target_tag)"

# ── 3. Semver guard ───────────────────────────────────────────────────────
# Extract major components from vX.Y.Z tags; refuse to cross the boundary
# unless the operator opts in. Treats non-semver tags (e.g. raw SHA in
# branch mode) as unversioned — the guard is a no-op there.
current_tag=$(cat "$REPO_DIR/.last-update" 2>/dev/null || echo "")
if [[ -z "$current_tag" ]]; then
  current_tag=$(git describe --tags --abbrev=0 "$local_sha" 2>/dev/null || echo "")
fi
current_major=$(printf '%s' "$current_tag" | sed -nE 's/^v([0-9]+)\..*/\1/p')
target_major=$(printf '%s' "$target_tag" | sed -nE 's/^v([0-9]+)\..*/\1/p')
if [[ -n "$current_major" && -n "$target_major" && "$current_major" != "$target_major" ]]; then
  if [[ "$UPDATE_ALLOW_MAJOR" != "true" ]]; then
    die "major version jump $current_tag → $target_tag blocked — set UPDATE_ALLOW_MAJOR=true to override" 4
  fi
  log "semver guard: accepting major jump $current_tag → $target_tag (UPDATE_ALLOW_MAJOR=true)"
fi

# ── 4. Fast-forward / checkout ────────────────────────────────────────────
if [[ -n "$target_tag" ]] && git rev-parse --verify "refs/tags/$target_tag" >/dev/null 2>&1; then
  # Release mode: check out the tag in detached HEAD.
  git checkout --quiet "$target_tag" || die "checkout $target_tag failed" 3
else
  git merge --ff-only "$GIT_REMOTE/$GIT_BRANCH" || die "cannot fast-forward — manual merge required" 3
fi

# ── 5. Build image with version metadata ──────────────────────────────────
version_tag="$target_tag"
version_commit="$target_commit"
version_date=$(date -u +%Y-%m-%dT%H:%M:%SZ)
version_dirty=$(git diff --quiet HEAD -- 2>/dev/null && echo false || echo true)

log "building image $IMAGE_NAME:$version_commit ($version_tag)"
docker build --quiet \
  --build-arg "VERSION_TAG=$version_tag" \
  --build-arg "VERSION_COMMIT=$version_commit" \
  --build-arg "VERSION_DATE=$version_date" \
  --build-arg "VERSION_DIRTY=$version_dirty" \
  -t "$IMAGE_NAME:$version_commit" \
  -t "$IMAGE_NAME:latest" \
  -f deploy/prod/Dockerfile.slim . \
  || die "docker build failed" 2

# ── 6. Smoke-test the new binary ──────────────────────────────────────────
log "smoke-testing new image"
smoke=$(docker run --rm --entrypoint /usr/local/bin/node "$IMAGE_NAME:$version_commit" --version 2>&1) \
  || die "new image --version failed: $smoke" 2
log "smoke ok: $smoke"

# ── 7. Recreate the container ─────────────────────────────────────────────
log "recreating container $CONTAINER"
docker compose -f "$COMPOSE_FILE" up -d --force-recreate node \
  || die "docker compose up failed" 1

# ── 8. Wait for health ────────────────────────────────────────────────────
log "waiting for health at $HEALTH_URL"
for i in $(seq 1 30); do
  if curl -fsS -m 3 "${auth_args[@]}" "$HEALTH_URL" >/dev/null 2>&1; then
    log "node healthy after $((i*2))s — update done ($version_tag @ ${version_commit:0:8})"
    printf '%s\n' "$version_tag" > .last-update
    exit 0
  fi
  sleep 2
done

log "new container did not become healthy in 60s — dumping logs"
docker logs --tail 80 "$CONTAINER" || true
exit 1
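The semver guard's tag parsing can be exercised in isolation. A minimal sketch using the same sed expression as the script; the tag values are illustrative:

```shell
# Pull the major component out of a vX.Y.Z tag, exactly as the guard does.
major() { printf '%s' "$1" | sed -nE 's/^v([0-9]+)\..*/\1/p'; }

m_old=$(major v1.4.2)
m_new=$(major v2.0.0)
m_sha=$(major 3f1c9ab)   # non-semver input: empty output, guard becomes a no-op

if [ -n "$m_old" ] && [ -n "$m_new" ] && [ "$m_old" != "$m_new" ]; then
    echo "major jump $m_old -> $m_new blocked unless UPDATE_ALLOW_MAJOR=true"
fi
```

Because a raw SHA yields an empty major, branch-mode updates that never saw a semver tag skip the guard entirely, which matches the comment in step 3.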