The project needs to run Spark cluster tasks from DolphinScheduler (DS), and the deliverable is a single image, so Spark, Hadoop and the DS worker have to be combined into one image and configured accordingly.
1. Prepare the base images
Someone has already built a Spark + Hadoop image; see: https://zhuanlan.zhihu.com/p/421375012
Pull that image:
docker pull s1mplecc/spark-hadoop:3
Then pull the ds-worker image:
docker pull apache/dolphinscheduler-worker:latest
2. Copy the /opt/dolphinscheduler folder from the ds-worker container to the local machine
First run the ds-worker image (see the official DS docs for details: https://dolphinscheduler.apache.org/zh-cn/docs/3.2.1/guide/start/docker)
Then copy /opt/dolphinscheduler to the current directory (c515 is the container id):
docker cp c515:/opt/dolphinscheduler .
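If the worker container is not needed for anything else, the copy can be scripted without ever starting a container: `docker create` materializes a stopped container whose filesystem `docker cp` can read. A minimal sketch, using the image tag and path from the steps above (the helper name `extract_ds` is just for illustration):

```shell
#!/bin/sh
# Copy /opt/dolphinscheduler out of the worker image without running it.
extract_ds() {
  image="$1"; dest="$2"
  cid=$(docker create "$image")                  # stopped container, nothing starts
  docker cp "$cid":/opt/dolphinscheduler "$dest"
  docker rm "$cid" > /dev/null                   # throw the container away again
}

# usage (equivalent to the manual steps above):
# extract_ds apache/dolphinscheduler-worker:latest .
```

This avoids having to look up a running container's id with docker ps first.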
3. Write a Dockerfile and build the image
FROM s1mplecc/spark-hadoop:3

ENV DOCKER true
ENV TZ Asia/Shanghai
ENV DOLPHINSCHEDULER_HOME /opt/dolphinscheduler

RUN apt update ; \
    apt install -y sudo ; \
    rm -rf /var/lib/apt/lists/*

WORKDIR $DOLPHINSCHEDULER_HOME
ADD ./dolphinscheduler $DOLPHINSCHEDULER_HOME
EXPOSE 1235
Then run:
docker build -t dolphin-spark-hadoop:1 .
4. Write docker-compose.yml
First, take the yml files provided by s1mplecc/spark-hadoop and dolphinscheduler as references. They are shown below.
s1mplecc/spark-hadoop:
version: '2'
services:
  spark:
    image: s1mplecc/spark-hadoop:3
    hostname: master
    environment:
      - SPARK_MODE=master
      - SPARK_RPC_AUTHENTICATION_ENABLED=no
      - SPARK_RPC_ENCRYPTION_ENABLED=no
      - SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no
      - SPARK_SSL_ENABLED=no
    volumes:
      - ~/docker/spark/share:/opt/share
    ports:
      - '8080:8080'
      - '4040:4040'
      - '8088:8088'
      - '8042:8042'
      - '9870:9870'
      - '19888:19888'
  spark-worker-1:
    image: s1mplecc/spark-hadoop:3
    hostname: worker1
    environment:
      - SPARK_MODE=worker
      - SPARK_MASTER_URL=spark://master:7077
      - SPARK_WORKER_MEMORY=1G
      - SPARK_WORKER_CORES=1
      - SPARK_RPC_AUTHENTICATION_ENABLED=no
      - SPARK_RPC_ENCRYPTION_ENABLED=no
      - SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no
      - SPARK_SSL_ENABLED=no
    volumes:
      - ~/docker/spark/share:/opt/share
    ports:
      - '8081:8081'
  spark-worker-2:
    image: s1mplecc/spark-hadoop:3
    hostname: worker2
    environment:
      - SPARK_MODE=worker
      - SPARK_MASTER_URL=spark://master:7077
      - SPARK_WORKER_MEMORY=1G
      - SPARK_WORKER_CORES=1
      - SPARK_RPC_AUTHENTICATION_ENABLED=no
      - SPARK_RPC_ENCRYPTION_ENABLED=no
      - SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no
      - SPARK_SSL_ENABLED=no
    volumes:
      - ~/docker/spark/share:/opt/share
    ports:
      - '8082:8081'
And dolphinscheduler's:
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

version: "3.8"

services:
  dolphinscheduler-postgresql:
    image: bitnami/postgresql:11.11.0
    ports:
      - "5432:5432"
    profiles: ["all", "schema"]
    environment:
      POSTGRESQL_USERNAME: root
      POSTGRESQL_PASSWORD: root
      POSTGRESQL_DATABASE: dolphinscheduler
    volumes:
      - dolphinscheduler-postgresql:/bitnami/postgresql
    healthcheck:
      test: ["CMD", "bash", "-c", "cat < /dev/null > /dev/tcp/127.0.0.1/5432"]
      interval: 5s
      timeout: 60s
      retries: 120
    networks:
      - dolphinscheduler

  dolphinscheduler-zookeeper:
    image: bitnami/zookeeper:3.6.2
    profiles: ["all"]
    environment:
      ALLOW_ANONYMOUS_LOGIN: "yes"
      ZOO_4LW_COMMANDS_WHITELIST: srvr,ruok,wchs,cons
    volumes:
      - dolphinscheduler-zookeeper:/bitnami/zookeeper
    healthcheck:
      test: ["CMD", "bash", "-c", "cat < /dev/null > /dev/tcp/127.0.0.1/2181"]
      interval: 5s
      timeout: 60s
      retries: 120
    networks:
      - dolphinscheduler

  dolphinscheduler-schema-initializer:
    image: ${HUB}/dolphinscheduler-tools:${TAG}
    env_file: .env
    profiles: ["schema"]
    command: [ tools/bin/upgrade-schema.sh ]
    depends_on:
      dolphinscheduler-postgresql:
        condition: service_healthy
    volumes:
      - dolphinscheduler-logs:/opt/dolphinscheduler/logs
      - dolphinscheduler-shared-local:/opt/soft
      - dolphinscheduler-resource-local:/dolphinscheduler
    networks:
      - dolphinscheduler

  dolphinscheduler-api:
    image: ${HUB}/dolphinscheduler-api:${TAG}
    ports:
      - "12345:12345"
      - "25333:25333"
    profiles: ["all"]
    env_file: .env
    healthcheck:
      test: [ "CMD", "curl", "http://localhost:12345/dolphinscheduler/actuator/health" ]
      interval: 30s
      timeout: 5s
      retries: 3
    depends_on:
      dolphinscheduler-zookeeper:
        condition: service_healthy
    volumes:
      - dolphinscheduler-logs:/opt/dolphinscheduler/logs
      - dolphinscheduler-shared-local:/opt/soft
      - dolphinscheduler-resource-local:/dolphinscheduler
    networks:
      - dolphinscheduler

  dolphinscheduler-alert:
    image: ${HUB}/dolphinscheduler-alert-server:${TAG}
    profiles: ["all"]
    env_file: .env
    healthcheck:
      test: [ "CMD", "curl", "http://localhost:50053/actuator/health" ]
      interval: 30s
      timeout: 5s
      retries: 3
    volumes:
      - dolphinscheduler-logs:/opt/dolphinscheduler/logs
    networks:
      - dolphinscheduler

  dolphinscheduler-master:
    image: ${HUB}/dolphinscheduler-master:${TAG}
    profiles: ["all"]
    env_file: .env
    healthcheck:
      test: [ "CMD", "curl", "http://localhost:5679/actuator/health" ]
      interval: 30s
      timeout: 5s
      retries: 3
    depends_on:
      dolphinscheduler-zookeeper:
        condition: service_healthy
    volumes:
      - dolphinscheduler-logs:/opt/dolphinscheduler/logs
      - dolphinscheduler-shared-local:/opt/soft
    networks:
      - dolphinscheduler

  dolphinscheduler-worker:
    image: ${HUB}/dolphinscheduler-worker:${TAG}
    profiles: ["all"]
    env_file: .env
    healthcheck:
      test: [ "CMD", "curl", "http://localhost:1235/actuator/health" ]
      interval: 30s
      timeout: 5s
      retries: 3
    depends_on:
      dolphinscheduler-zookeeper:
        condition: service_healthy
    volumes:
      - dolphinscheduler-worker-data:/tmp/dolphinscheduler
      - dolphinscheduler-logs:/opt/dolphinscheduler/logs
      - dolphinscheduler-shared-local:/opt/soft
      - dolphinscheduler-resource-local:/dolphinscheduler
    networks:
      - dolphinscheduler

networks:
  dolphinscheduler:
    driver: bridge

volumes:
  dolphinscheduler-postgresql:
  dolphinscheduler-zookeeper:
  dolphinscheduler-worker-data:
  dolphinscheduler-logs:
  dolphinscheduler-shared-local:
  dolphinscheduler-resource-local:
What we need to do is replace the worker service in the DS deployment file with the new dolphin-spark-hadoop image built in step 3. The resulting yaml file looks like this:
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

version: "3.8"

services:
  dolphinscheduler-postgresql:
    image: bitnami/postgresql:11.11.0
    ports:
      - "5432:5432"
    profiles: ["all", "schema"]
    environment:
      POSTGRESQL_USERNAME: root
      POSTGRESQL_PASSWORD: root
      POSTGRESQL_DATABASE: dolphinscheduler
    volumes:
      - dolphinscheduler-postgresql:/bitnami/postgresql
    healthcheck:
      test: ["CMD", "bash", "-c", "cat < /dev/null > /dev/tcp/127.0.0.1/5432"]
      interval: 5s
      timeout: 60s
      retries: 120
    networks:
      - dolphinscheduler

  dolphinscheduler-zookeeper:
    image: bitnami/zookeeper:3.6.2
    profiles: ["all"]
    environment:
      ALLOW_ANONYMOUS_LOGIN: "yes"
      ZOO_4LW_COMMANDS_WHITELIST: srvr,ruok,wchs,cons
    volumes:
      - dolphinscheduler-zookeeper:/bitnami/zookeeper
    healthcheck:
      test: ["CMD", "bash", "-c", "cat < /dev/null > /dev/tcp/127.0.0.1/2181"]
      interval: 5s
      timeout: 60s
      retries: 120
    networks:
      - dolphinscheduler

  dolphinscheduler-schema-initializer:
    image: ${HUB}/dolphinscheduler-tools:${TAG}
    env_file: .env
    profiles: ["schema"]
    command: [ tools/bin/upgrade-schema.sh ]
    depends_on:
      dolphinscheduler-postgresql:
        condition: service_healthy
    volumes:
      - dolphinscheduler-logs:/opt/dolphinscheduler/logs
      - dolphinscheduler-shared-local:/opt/soft
      - dolphinscheduler-resource-local:/dolphinscheduler
    networks:
      - dolphinscheduler

  dolphinscheduler-api:
    image: ${HUB}/dolphinscheduler-api:${TAG}
    ports:
      - "12345:12345"
      - "25333:25333"
    profiles: ["all"]
    env_file: .env
    healthcheck:
      test: [ "CMD", "curl", "http://localhost:12345/dolphinscheduler/actuator/health" ]
      interval: 30s
      timeout: 5s
      retries: 3
    depends_on:
      dolphinscheduler-zookeeper:
        condition: service_healthy
    volumes:
      - dolphinscheduler-logs:/opt/dolphinscheduler/logs
      - dolphinscheduler-shared-local:/opt/soft
      - dolphinscheduler-resource-local:/dolphinscheduler
    networks:
      - dolphinscheduler

  dolphinscheduler-alert:
    image: ${HUB}/dolphinscheduler-alert-server:${TAG}
    profiles: ["all"]
    env_file: .env
    healthcheck:
      test: [ "CMD", "curl", "http://localhost:50053/actuator/health" ]
      interval: 30s
      timeout: 5s
      retries: 3
    volumes:
      - dolphinscheduler-logs:/opt/dolphinscheduler/logs
    networks:
      - dolphinscheduler

  dolphinscheduler-master:
    image: ${HUB}/dolphinscheduler-master:${TAG}
    profiles: ["all"]
    env_file: .env
    healthcheck:
      test: [ "CMD", "curl", "http://localhost:5679/actuator/health" ]
      interval: 30s
      timeout: 5s
      retries: 3
    depends_on:
      dolphinscheduler-zookeeper:
        condition: service_healthy
    volumes:
      - dolphinscheduler-logs:/opt/dolphinscheduler/logs
      - dolphinscheduler-shared-local:/opt/soft
    networks:
      - dolphinscheduler

  dolphinscheduler-worker-1:
    image: dolphin-spark-hadoop:1
    hostname: master
    profiles: ["all"]
    env_file: .env
    environment:
      - SPARK_MODE=master
      - SPARK_RPC_AUTHENTICATION_ENABLED=no
      - SPARK_RPC_ENCRYPTION_ENABLED=no
      - SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no
      - SPARK_SSL_ENABLED=no
    healthcheck:
      test: [ "CMD", "curl", "http://localhost:1235/actuator/health" ]
      interval: 30s
      timeout: 5s
      retries: 3
    depends_on:
      dolphinscheduler-zookeeper:
        condition: service_healthy
    volumes:
      - dolphinscheduler-worker-data:/tmp/dolphinscheduler
      - dolphinscheduler-logs:/opt/dolphinscheduler/logs
      - dolphinscheduler-shared-local:/opt/soft
      - dolphinscheduler-resource-local:/dolphinscheduler
    ports:
      - '8080:8080'
      - '4040:4040'
      - '8088:8088'
      - '8042:8042'
      - '9870:9870'
      - '19888:19888'
    networks:
      - dolphinscheduler

  dolphinscheduler-worker-2:
    image: dolphin-spark-hadoop:1
    hostname: worker1
    profiles: ["all"]
    env_file: .env
    environment:
      - SPARK_MODE=worker
      - SPARK_MASTER_URL=spark://master:7077
      - SPARK_WORKER_MEMORY=1G
      - SPARK_WORKER_CORES=1
      - SPARK_RPC_AUTHENTICATION_ENABLED=no
      - SPARK_RPC_ENCRYPTION_ENABLED=no
      - SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no
      - SPARK_SSL_ENABLED=no
    healthcheck:
      test: [ "CMD", "curl", "http://localhost:1235/actuator/health" ]
      interval: 30s
      timeout: 5s
      retries: 3
    depends_on:
      dolphinscheduler-zookeeper:
        condition: service_healthy
    volumes:
      - dolphinscheduler-worker-data:/tmp/dolphinscheduler
      - dolphinscheduler-logs:/opt/dolphinscheduler/logs
      - dolphinscheduler-shared-local:/opt/soft
      - dolphinscheduler-resource-local:/dolphinscheduler
    ports:
      - '8081:8081'
    networks:
      - dolphinscheduler

  dolphinscheduler-worker-3:
    image: dolphin-spark-hadoop:1
    hostname: worker2
    profiles: ["all"]
    env_file: .env
    environment:
      - SPARK_MODE=worker
      - SPARK_MASTER_URL=spark://master:7077
      - SPARK_WORKER_MEMORY=1G
      - SPARK_WORKER_CORES=1
      - SPARK_RPC_AUTHENTICATION_ENABLED=no
      - SPARK_RPC_ENCRYPTION_ENABLED=no
      - SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no
      - SPARK_SSL_ENABLED=no
    healthcheck:
      test: [ "CMD", "curl", "http://localhost:1235/actuator/health" ]
      interval: 30s
      timeout: 5s
      retries: 3
    depends_on:
      dolphinscheduler-zookeeper:
        condition: service_healthy
    volumes:
      - dolphinscheduler-worker-data:/tmp/dolphinscheduler
      - dolphinscheduler-logs:/opt/dolphinscheduler/logs
      - dolphinscheduler-shared-local:/opt/soft
      - dolphinscheduler-resource-local:/dolphinscheduler
    ports:
      - '8082:8081'
    networks:
      - dolphinscheduler

networks:
  dolphinscheduler:
    driver: bridge

volumes:
  dolphinscheduler-postgresql:
  dolphinscheduler-zookeeper:
  dolphinscheduler-worker-data:
  dolphinscheduler-logs:
  dolphinscheduler-shared-local:
  dolphinscheduler-resource-local:
Note: s1mplecc/spark-hadoop runs one master and two slaves, three spark+hadoop nodes in total, so three ds-worker services must be started to match. To run more worker nodes, the workers list in the Hadoop configuration must be adjusted, which means rebuilding the image or mounting a new workers file, and the number of ds-worker services must again correspond to the spark+hadoop nodes.
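For reference, the slave list the note refers to is Hadoop's workers file (etc/hadoop/workers under the Hadoop install dir inside the image). With the one-master-two-slave layout above it would plausibly contain just the two slave hostnames; this is an assumption about the base image, so check the actual file before editing it:

```
worker1
worker2
```

Adding a fourth node would mean appending its hostname here (and defining a matching compose service), which is why the image has to be rebuilt or the file mounted.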
5. Run docker-compose
Per the official DS docs:
# To initialize or upgrade the database schema, use the schema profile
$ docker-compose --profile schema up -d
# To start all dolphinscheduler services, use the all profile
$ docker-compose --profile all up -d
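The compose file references ${HUB} and ${TAG} and every DS service mounts an env_file named .env, so a .env must sit next to the yaml file. The full .env ships with the official DolphinScheduler docker-compose distribution (it also carries database and registry settings); at minimum the two image variables must be set, for example (the tag value is an assumption, match it to your DS version):

```
HUB=apache
TAG=3.2.1
```

Without these, docker-compose will fail to resolve the image names.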
Note: run these commands in the directory containing the yaml file created in step 4.
6. Run the Hadoop cluster startup script and the ds-worker startup script inside the containers
After entering the container (the image's WORKDIR is /opt/dolphinscheduler, per the Dockerfile in step 3, hence the relative path):
../start-hadoop.sh
Then run:
nohup /opt/dolphinscheduler/bin/start.sh > /dev/null 2>&1 &
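With three worker containers, this step can also be driven from the host instead of entering each container. A hedged sketch, assuming start-hadoop.sh sits in /opt in the base image (one level above the workdir, matching the ../start-hadoop.sh call above), that the Hadoop cluster scripts only need to run once on the node acting as Hadoop master (dolphinscheduler-worker-1, hostname "master" in the compose file), and that the DS start.sh must run in every worker:

```shell
#!/bin/sh
# Bring up Hadoop and the DS worker processes across the compose services.
boot_cluster() {
  # Hadoop cluster scripts: run once, from the Hadoop master node.
  docker compose exec dolphinscheduler-worker-1 /opt/start-hadoop.sh
  # DS worker bootstrap: run in every worker container.
  for svc in dolphinscheduler-worker-1 dolphinscheduler-worker-2 dolphinscheduler-worker-3; do
    docker compose exec "$svc" sh -c 'nohup /opt/dolphinscheduler/bin/start.sh > /dev/null 2>&1 &'
  done
}

# usage: boot_cluster
```

Run it from the same directory as the yaml file so docker compose picks up the right project.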
After a short wait, DS, Spark and Hadoop are all up, and Spark tasks can be run from DS. ^_^
From: https://www.cnblogs.com/unique--soul/p/18222856