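A WuKongIM multi-node cluster provides high availability, disaster recovery, and load balancing. It is suited to large applications with strict data-safety requirements.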

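Cluster characteristics

Advantages
  • High availability and strong disaster recovery
  • Online scale-out
  • Real-time automatic replication across multiple replicas
  • Load balancing
Disadvantages
  • Somewhat more complex to deploy
  • Requires multiple machines
Cluster sizing rule: WuKongIM follows the 2n+1 rule, where n is the number of nodes that can fail while the cluster stays available.
  • To tolerate 1 failed machine: 3 machines are required (2×1+1=3)
  • To tolerate 2 failed machines: 5 machines are required (2×2+1=5)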
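
The arithmetic behind the 2n+1 rule can be sketched in shell. The function names are illustrative and not part of WuKongIM, and the inverse calculation assumes the rule reflects a simple majority quorum.

```shell
# nodes_required N: cluster size needed to tolerate N failed nodes (2n+1).
nodes_required() {
    echo $(( 2 * $1 + 1 ))
}

# failures_tolerated SIZE: how many nodes a cluster of SIZE nodes can lose.
# Integer division rounds down, so an even-sized cluster tolerates no more
# failures than the next smaller odd-sized one (assumption: majority quorum).
failures_tolerated() {
    echo $(( ($1 - 1) / 2 ))
}

nodes_required 2        # prints 5
failures_tolerated 4    # prints 1
```

Under that assumption, a fourth WuKongIM node adds capacity but not extra fault tolerance, which is why odd cluster sizes are the usual choice.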

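Requirements

  • Machines: 4 or more (this example uses 3 WuKongIM nodes plus 1 gateway)
  • Operating system: Linux (Ubuntu recommended)
  • Hardware: 2 cores / 4 GB RAM, or 4 cores / 8 GB RAM
  • Docker: 24.0.4 or later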
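
The Docker requirement above can be checked on each machine before installing anything. The following is a minimal sketch; the helper names are illustrative, and it assumes GNU `sort -V` is available, as it is on typical Linux hosts.

```shell
# version_ge A B: succeed when version A is greater than or equal to B.
# Relies on `sort -V` (version sort) from GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_docker MIN: compare the local Docker server version with MIN.
# Returns 2 if the version cannot be read, 1 if it is too old.
check_docker() {
    current="$(docker version --format '{{.Server.Version}}' 2>/dev/null)" || return 2
    if version_ge "$current" "$1"; then
        echo "Docker $current meets the minimum $1"
    else
        echo "Docker $current is older than the minimum $1"
        return 1
    fi
}

# Usage on any machine in the cluster:
# check_docker 24.0.4
```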
Example server configuration
Role                            Name            Private IP     Public IP
Load balancing and monitoring   gateway         10.206.0.2     119.45.33.109
WuKongIM node                   node1 (ID: 1)   10.206.0.10    146.56.249.208
WuKongIM node                   node2 (ID: 2)   10.206.0.12    129.211.171.99
WuKongIM node                   node3 (ID: 3)   10.206.0.5     119.45.175.82

Deployment steps

1. Install the load balancer and monitoring

On the gateway node, create the installation directory:
mkdir ~/gateway
cd ~/gateway
Create the docker-compose.yml file:
version: '3.7'
services:
  prometheus:  # monitoring service
    image: registry.cn-shanghai.aliyuncs.com/wukongim/prometheus:v2.53.1
    volumes:
      - "./prometheus.yml:/etc/prometheus/prometheus.yml"
    ports:
      - "9090:9090"
  nginx:  # load balancer
    image: registry.cn-shanghai.aliyuncs.com/wukongim/nginx:1.27.0
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf
    ports:
      - "15001:15001"
      - "15100:15100"
      - "15200:15200"
      - "15300:15300"
      - "15172:15172"
Create the nginx.conf file (replace the IP addresses with your own):
user  nginx;
worker_processes  auto;

error_log  /var/log/nginx/error.log notice;
pid        /var/run/nginx.pid;

events {
    use epoll;
    worker_connections  4096;
    multi_accept on;
    accept_mutex off;
}

http {
    include       /etc/nginx/mime.types;
    default_type  application/octet-stream;

    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for"';

    access_log  /var/log/nginx/access.log  main;
    sendfile        on;
    keepalive_timeout  65;

    # API load balancing
    upstream wukongimapi {
        server 10.206.0.10:5001;
        server 10.206.0.12:5001;
        server 10.206.0.5:5001;
    }

    # Demo load balancing
    upstream wukongimdemo {
        server 10.206.0.10:5172;
        server 10.206.0.12:5172;
        server 10.206.0.5:5172;
    }

    # Manager load balancing
    upstream wukongimanager {
        server 10.206.0.10:5300;
        server 10.206.0.12:5300;
        server 10.206.0.5:5300;
    }

    # WebSocket load balancing
    upstream wukongimws {
        server 10.206.0.10:5200;
        server 10.206.0.12:5200;
        server 10.206.0.5:5200;
    }

    # HTTP API proxy
    server {
        listen 15001;
        location / {
            proxy_pass http://wukongimapi;
            proxy_connect_timeout 20s;
            proxy_read_timeout 60s;
        }
    }

    # Demo UI
    server {
        listen 15172;
        location / {
            proxy_pass http://wukongimdemo;
            proxy_connect_timeout 20s;
            proxy_read_timeout 60s;
        }
        location /login {
            rewrite ^ /chatdemo?apiurl=http://119.45.33.109:15001;
            proxy_pass http://wukongimdemo;
            proxy_connect_timeout 20s;
            proxy_read_timeout 60s;
        }
    }

    # Manager UI
    server {
        listen 15300;
        location / {
            proxy_pass http://wukongimanager;
            proxy_connect_timeout 60s;
            proxy_read_timeout 60s;
        }
    }

    # WebSocket proxy
    server {
        listen 15200;
        location / {
            proxy_pass http://wukongimws;
            proxy_redirect off;
            proxy_http_version 1.1;
            proxy_read_timeout 180s;
            proxy_send_timeout 120s;
            proxy_connect_timeout 4s;
            proxy_set_header  X-Real-IP $remote_addr;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection "upgrade";
        }
    }
}

# TCP load balancing
stream {
    upstream wukongimtcp {
        server 10.206.0.10:5100;
        server 10.206.0.12:5100;
        server 10.206.0.5:5100;
    }
    server {
        listen 15100;
        proxy_connect_timeout 4s;
        proxy_timeout 120s;
        proxy_pass wukongimtcp;
    }
}
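
Every upstream block in the nginx.conf above lists the same three node IPs with a different port, so replacing the addresses by hand is repetitive. As a sketch, a small helper (hypothetical, not shipped with WuKongIM or nginx) can print each block from a list of IPs for pasting into the file:

```shell
# render_upstream NAME PORT IP...: print an nginx upstream block with one
# "server IP:PORT;" line per node IP.
render_upstream() {
    name="$1"; port="$2"; shift 2
    printf 'upstream %s {\n' "$name"
    for ip in "$@"; do
        printf '    server %s:%s;\n' "$ip" "$port"
    done
    printf '}\n'
}

# Example: regenerate the API upstream from the configuration above.
render_upstream wukongimapi 5001 10.206.0.10 10.206.0.12 10.206.0.5
```

The same call with `wukongimtcp 5100` produces the block used inside the `stream` section.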
Create the prometheus.yml file (replace the IP addresses with your own):
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: wukongim1-trace-metrics
    static_configs:
    - targets: ['10.206.0.10:5300']
      labels:
        id: "1"
  - job_name: wukongim2-trace-metrics
    static_configs:
    - targets: ['10.206.0.12:5300']
      labels:
        id: "2"
  - job_name: wukongim3-trace-metrics
    static_configs:
    - targets: ['10.206.0.5:5300']
      labels:
        id: "3"
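
The scrape jobs above follow one pattern per node: the job name and the `id` label carry the node ID, and the target is the node's private IP on port 5300. A minimal sketch of generating them from `id@ip` pairs (the same notation WK_CLUSTER_INITNODES uses) might look like this; the function name is an assumption for illustration.

```shell
# render_scrape_jobs PAIR...: print one Prometheus scrape job per "id@ip"
# pair, matching the job naming and labels used in prometheus.yml above.
render_scrape_jobs() {
    for pair in "$@"; do
        id="${pair%%@*}"   # text before the first "@"
        ip="${pair#*@}"    # text after the first "@"
        printf '  - job_name: wukongim%s-trace-metrics\n' "$id"
        printf '    static_configs:\n'
        printf "    - targets: ['%s:5300']\n" "$ip"
        printf '      labels:\n'
        printf '        id: "%s"\n' "$id"
    done
}

# Example: the three jobs from the file above.
render_scrape_jobs 1@10.206.0.10 2@10.206.0.12 3@10.206.0.5
```

The output is meant to be pasted under `scrape_configs:`; the indentation matches the file above.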

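2. Install the WuKongIM nodes

On every WuKongIM node, create the installation directory:
mkdir ~/wukongim
cd ~/wukongim
Node 1 configuration, saved as docker-compose.yml (replace the IP addresses with your own):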
version: '3.7'
services:
  wukongim:
    image: registry.cn-shanghai.aliyuncs.com/wukongim/wukongim:v2
    environment:
      - "WK_MODE=release"
      - "WK_CLUSTER_NODEID=1"
      - "WK_INTRANET_TCPADDR=10.206.0.10:5100"
      - "WK_CLUSTER_APIURL=http://10.206.0.10:5001"
      - "WK_CLUSTER_SERVERADDR=10.206.0.10:11110"
      - "WK_EXTERNAL_WSADDR=ws://119.45.33.109:15200"
      - "WK_EXTERNAL_TCPADDR=119.45.33.109:15100"
      - "WK_TRACE_PROMETHEUSAPIURL=http://10.206.0.2:9090"
      - "WK_CLUSTER_INITNODES=1@10.206.0.10 2@10.206.0.12 3@10.206.0.5"
    healthcheck:
      test: "wget -q -Y off -O /dev/null http://localhost:5001/health > /dev/null 2>&1"
      interval: 10s
      timeout: 10s
      retries: 3
    restart: always
    volumes:
      - ./wukongim_data:/root/wukongim
    ports:
      - 11110:11110  # cluster node-to-node communication port
      - 5001:5001    # internal API port
      - 5100:5100    # TCP port
      - 5200:5200    # WebSocket port
      - 5300:5300    # manager port
      - 5172:5172    # demo port
Node 2 configuration (replace the IP addresses with your own):
version: '3.7'
services:
  wukongim:
    image: registry.cn-shanghai.aliyuncs.com/wukongim/wukongim:v2
    environment:
      - "WK_MODE=release"
      - "WK_CLUSTER_NODEID=2"
      - "WK_CLUSTER_APIURL=http://10.206.0.12:5001"
      - "WK_CLUSTER_SERVERADDR=10.206.0.12:11110"
      - "WK_EXTERNAL_WSADDR=ws://119.45.33.109:15200"
      - "WK_EXTERNAL_TCPADDR=119.45.33.109:15100"
      - "WK_INTRANET_TCPADDR=10.206.0.12:5100"
      - "WK_TRACE_PROMETHEUSAPIURL=http://10.206.0.2:9090"
      - "WK_CLUSTER_INITNODES=1@10.206.0.10 2@10.206.0.12 3@10.206.0.5"
    healthcheck:
      test: "wget -q -Y off -O /dev/null http://localhost:5001/health > /dev/null 2>&1"
      interval: 10s
      timeout: 10s
      retries: 3
    restart: always
    volumes:
      - ./wukongim_data:/root/wukongim
    ports:
      - 11110:11110
      - 5001:5001
      - 5100:5100
      - 5200:5200
      - 5300:5300
      - 5172:5172
Node 3 configuration (replace the IP addresses with your own):
version: '3.7'
services:
  wukongim:
    image: registry.cn-shanghai.aliyuncs.com/wukongim/wukongim:v2
    environment:
      - "WK_MODE=release"
      - "WK_CLUSTER_NODEID=3"
      - "WK_CLUSTER_APIURL=http://10.206.0.5:5001"
      - "WK_CLUSTER_SERVERADDR=10.206.0.5:11110"
      - "WK_EXTERNAL_WSADDR=ws://119.45.33.109:15200"
      - "WK_EXTERNAL_TCPADDR=119.45.33.109:15100"
      - "WK_INTRANET_TCPADDR=10.206.0.5:5100"
      - "WK_TRACE_PROMETHEUSAPIURL=http://10.206.0.2:9090"
      - "WK_CLUSTER_INITNODES=1@10.206.0.10 2@10.206.0.12 3@10.206.0.5"
    healthcheck:
      test: "wget -q -Y off -O /dev/null http://localhost:5001/health > /dev/null 2>&1"
      interval: 10s
      timeout: 10s
      retries: 3
    restart: always
    volumes:
      - ./wukongim_data:/root/wukongim
    ports:
      - 11110:11110
      - 5001:5001
      - 5100:5100
      - 5200:5200
      - 5300:5300
      - 5172:5172
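
The three node files above differ only in the node ID and the node's private IP; the gateway addresses and the initial node list are identical everywhere. The sketch below prints the environment entries for one node so they can be compared against each docker-compose.yml. The function and shell variable names are illustrative; the WK_* names and values are taken from the files above.

```shell
# Values shared by every node in this example (from the files above).
GATEWAY_PUBLIC_IP="119.45.33.109"
GATEWAY_PRIVATE_IP="10.206.0.2"
INIT_NODES="1@10.206.0.10 2@10.206.0.12 3@10.206.0.5"

# render_node_env ID IP: print the environment entries for one node.
render_node_env() {
    cat <<EOF
WK_MODE=release
WK_CLUSTER_NODEID=$1
WK_INTRANET_TCPADDR=$2:5100
WK_CLUSTER_APIURL=http://$2:5001
WK_CLUSTER_SERVERADDR=$2:11110
WK_EXTERNAL_WSADDR=ws://$GATEWAY_PUBLIC_IP:15200
WK_EXTERNAL_TCPADDR=$GATEWAY_PUBLIC_IP:15100
WK_TRACE_PROMETHEUSAPIURL=http://$GATEWAY_PRIVATE_IP:9090
WK_CLUSTER_INITNODES=$INIT_NODES
EOF
}

# Example: the entries expected on node 2.
render_node_env 2 10.206.0.12
```

A copy-paste mistake such as a node advertising another node's WK_CLUSTER_SERVERADDR is easy to spot by diffing this output against the file on each machine.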

3. Start the services

Startup order
  1. Start the load balancer and monitoring first:
# on the gateway node
cd ~/gateway
docker-compose up -d
  2. Then start all WuKongIM nodes:
# on each WuKongIM node
cd ~/wukongim
docker-compose up -d
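
`docker-compose up -d` returns as soon as the containers are created, before WuKongIM is necessarily ready to serve requests. A generic retry helper, sketched below, can wait for the same /health endpoint that the compose healthcheck polls. The helper name and the retry counts are assumptions, and curl is assumed to be installed on the host.

```shell
# wait_until ATTEMPTS DELAY CMD...: run CMD until it succeeds, sleeping
# DELAY seconds between tries; fail after ATTEMPTS unsuccessful tries.
wait_until() {
    attempts="$1"; delay="$2"; shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        i=$(( i + 1 ))
        # Sleep only if another attempt follows.
        [ "$i" -le "$attempts" ] && sleep "$delay"
    done
    return 1
}

# On a WuKongIM node, wait up to about a minute for the local API:
# wait_until 30 2 curl -fsS -o /dev/null http://localhost:5001/health
```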

4. Verify the deployment

Check service status
# check container status
docker-compose ps

# follow the logs
docker-compose logs -f
Verify cluster status
# list cluster nodes
curl http://119.45.33.109:15001/cluster/nodes

# check health
curl http://119.45.33.109:15001/health
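
A health request sent through the gateway only shows that at least one node answered, because nginx distributes requests across the upstream servers. To confirm that each node is healthy, the API port can be queried on every private IP directly. This is a sketch with an illustrative function name, meant to run from a machine on the private network such as the gateway.

```shell
# check_nodes IP...: query /health on each node's API port (5001) directly,
# bypassing the load balancer. Returns non-zero if any node fails.
check_nodes() {
    failed=0
    for ip in "$@"; do
        if curl -fsS -o /dev/null --max-time 5 "http://$ip:5001/health"; then
            echo "$ip ok"
        else
            echo "$ip FAILED"
            failed=1
        fi
    done
    return "$failed"
}

# Using the example addresses:
# check_nodes 10.206.0.10 10.206.0.12 10.206.0.5
```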
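Access the services (through the gateway's public IP, on the ports defined in nginx.conf above)
  • Demo: http://119.45.33.109:15172/login
  • Manager: http://119.45.33.109:15300
  • HTTP API: http://119.45.33.109:15001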

Configuration reference

Key environment variables

Variable                Description                  Example value
WK_CLUSTER_NODEID       Node ID                      1, 2, 3
WK_CLUSTER_APIURL       Node API address             http://10.206.0.10:5001
WK_CLUSTER_SERVERADDR   Node communication address   10.206.0.10:11110
WK_CLUSTER_INITNODES    Initial node list            1@10.206.0.10 2@10.206.0.12 3@10.206.0.5
WK_EXTERNAL_WSADDR      External WebSocket address   ws://119.45.33.109:15200
WK_EXTERNAL_TCPADDR     External TCP address         119.45.33.109:15100

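Ports

Port    Description               Access
5001    HTTP API                  Private network
5100    TCP connections           Client connections
5200    WebSocket                 Client connections
5300    Manager UI                Web access
5172    Demo UI                   Web access
11110   Cluster communication     Node-to-node
15001   Load-balanced API         Public network
15100   Load-balanced TCP         Public network
15200   Load-balanced WebSocket   Public network
15300   Load-balanced manager     Public network
15172   Load-balanced demo        Public network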
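
The ports in the table above can be probed in one pass, which helps separate firewall or security-group problems from application problems. The sketch below uses bash's /dev/tcp pseudo-device, so it assumes bash and the coreutils `timeout` command are present; the function names are illustrative.

```shell
# port_open HOST PORT: succeed if a TCP connection can be opened within 3 s.
# Assumes bash (for /dev/tcp) and coreutils `timeout`.
port_open() {
    timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# probe_ports HOST PORT...: report the state of each port on HOST.
probe_ports() {
    host="$1"; shift
    for port in "$@"; do
        if port_open "$host" "$port"; then
            echo "$host:$port open"
        else
            echo "$host:$port closed or filtered"
        fi
    done
}

# Node ports, from another machine on the private network:
# probe_ports 10.206.0.10 5001 5100 5200 5300 5172 11110
# Gateway ports, from outside:
# probe_ports 119.45.33.109 15001 15100 15200 15300 15172
```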

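Troubleshooting

Common problems

A node cannot join the cluster
# check network connectivity
ping 10.206.0.10

# check whether the cluster port is open
telnet 10.206.0.10 11110

# view the node logs
docker-compose logs wukongim
The load balancer is unreachable
# validate the nginx configuration
docker-compose exec nginx nginx -t

# restart nginx
docker-compose restart nginx
Monitoring data looks wrong
# check the Prometheus scrape targets
curl http://119.45.33.109:9090/api/v1/targets

# restart the monitoring service
docker-compose restart prometheus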

Viewing logs

# view logs for all services
docker-compose logs

# view logs for a specific service
docker-compose logs wukongim
docker-compose logs nginx
docker-compose logs prometheus

# follow logs in real time
docker-compose logs -f wukongim
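
When the log volume is large, piping the output through a filter narrows it down. The sketch below keeps lines that mention error, panic, or fatal; these keywords are a guess at useful markers, not a documented WuKongIM log format, and the function name is illustrative.

```shell
# recent_errors: read log text on stdin and keep lines that mention
# error, panic, or fatal (case-insensitive). The keyword list is an
# assumption; adjust it to the messages you actually see.
recent_errors() {
    grep -iE 'error|panic|fatal' || true   # no match is not a failure
}

# Usage on a WuKongIM node:
# docker-compose logs --tail=500 wukongim | recent_errors
```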

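Scaling out

To add a new node to an existing cluster:
  1. Create the configuration file on the new node
  2. Assign a new node ID
  3. Update WK_CLUSTER_INITNODES to include the new node
  4. Start the service on the new node
  5. Update the load balancer configuration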
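
Steps 2 and 3 above can be sketched as a small helper that appends a new `id@ip` entry to the node list and rejects an ID that is already in use. The function name is illustrative, and node ID 4 with address 10.206.0.20 is a placeholder rather than a value from this deployment.

```shell
# add_init_node LIST ID IP: append "ID@IP" to a WK_CLUSTER_INITNODES value,
# refusing to add an ID that is already present.
add_init_node() {
    for entry in $1; do              # unquoted on purpose: split on spaces
        if [ "${entry%%@*}" = "$2" ]; then
            echo "node ID $2 already in list" >&2
            return 1
        fi
    done
    echo "$1 $2@$3"
}

# Hypothetical fourth node; ID 4 and 10.206.0.20 are placeholders.
add_init_node "1@10.206.0.10 2@10.206.0.12 3@10.206.0.5" 4 10.206.0.20
# prints: 1@10.206.0.10 2@10.206.0.12 3@10.206.0.5 4@10.206.0.20
```

For step 5, the new address also needs a `server` line in each nginx upstream block and a scrape job in prometheus.yml on the gateway.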
