
[Cloud Native] Apache Livy on k8s: Explanation and Hands-on Practice

Date: 2022-11-07 23:13:02
Tags: opt livy apache Values Livy Apache HOME spark k8s


I. Overview

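Livy is a service that provides a REST interface for interacting with a Spark cluster. Through its RESTful API or its RPC client library you can submit Spark jobs or snippets of Spark code and get the results back synchronously or asynchronously, and Livy also manages SparkContexts for you. By simplifying the interaction between Spark and application services, Livy makes it possible to use Spark from web and mobile applications.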
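As a rough illustration of that REST interaction, the shell sketch below drives Livy's interactive-session endpoints (POST /sessions, POST /sessions/{id}/statements, GET /sessions/{id}/statements/{id}) with curl. The LIVY_URL default and all function names are assumptions made for illustration, and the payload builders do no JSON escaping.

```shell
#!/usr/bin/env bash
# Sketch of Livy's interactive-session REST flow with curl.
# LIVY_URL and all function names here are illustrative assumptions.
LIVY_URL="${LIVY_URL:-http://localhost:8998}"

# JSON body for POST /sessions; kind is spark, pyspark, sparkr or sql.
session_payload() {
  printf '{"kind":"%s"}' "$1"
}

# JSON body for POST /sessions/{id}/statements.
# No JSON escaping is done, so the code must not contain double quotes or backslashes.
statement_payload() {
  printf '{"code":"%s"}' "$1"
}

# Start an interactive session (default kind: spark); the reply contains the session id.
create_session() {
  curl -s -X POST -H "Content-Type: application/json" \
    -d "$(session_payload "${1:-spark}")" "$LIVY_URL/sessions"
}

# Run code in session $1; the reply contains the statement id.
run_statement() {
  curl -s -X POST -H "Content-Type: application/json" \
    -d "$(statement_payload "$2")" "$LIVY_URL/sessions/$1/statements"
}

# Fetch statement $2 of session $1; its "output" is filled in once "state" is "available".
get_statement() {
  curl -s "$LIVY_URL/sessions/$1/statements/$2"
}
```

For example: create_session spark, then, once GET /sessions/0 reports the session as idle, run_statement 0 'sc.parallelize(1 to 100).sum()' followed by get_statement 0 0.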

Official site: https://livy.incubator.apache.org/
GitHub: https://github.com/apache/incubator-livy
For more background on Apache Livy, see also my article: Spark's open-source REST service: Apache Livy (Spark client)

II. Orchestrating the deployment

0) Prepare the deployment package

The pre-compiled Livy deployment package is provided here as well; download it if you need it:

Link: https://pan.baidu.com/s/1pPCbe0lUJ6ji8rvQYsVw9A?pwd=qn7i
Extraction code: qn7i

1) Build the image

Dockerfile

FROM myharbor.com/bigdata/centos:7.9.2009

RUN rm -f /etc/localtime && ln -sv /usr/share/zoneinfo/Asia/Shanghai /etc/localtime && echo "Asia/Shanghai" > /etc/timezone
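# "RUN export" only affects that single build step; ENV persists into later layers and the running container
ENV LANG=zh_CN.UTF-8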

### install tools
RUN yum install -y vim tar wget curl less telnet net-tools lsof

RUN groupadd --system --gid=9999 admin && useradd --system -m -d /home/admin --uid=9999 --gid=admin admin

RUN mkdir -p /opt/apache

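COPY apache-livy-0.8.0-incubating-SNAPSHOT-bin.zip /opt/apache/
# ADD only auto-extracts tar archives, not zip files, so unpack the zip explicitly
RUN yum install -y unzip && unzip -q /opt/apache/apache-livy-0.8.0-incubating-SNAPSHOT-bin.zip -d /opt/apache/ && rm -f /opt/apache/apache-livy-0.8.0-incubating-SNAPSHOT-bin.zip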
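# /opt/apache/livy matches the conf/log/pid paths used in livy-env.sh and in the Helm deployment's volumeMounts
ENV LIVY_HOME=/opt/apache/livy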
RUN ln -s /opt/apache/apache-livy-0.8.0-incubating-SNAPSHOT-bin $LIVY_HOME

ADD hadoop-3.3.2.tar.gz /opt/apache/
ENV HADOOP_HOME=/opt/apache/hadoop
RUN ln -s /opt/apache/hadoop-3.3.2 $HADOOP_HOME
ENV HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop

ADD spark-3.3.0-bin-hadoop3.tar.gz /opt/apache/
ENV SPARK_HOME=/opt/apache/spark
RUN ln -s /opt/apache/spark-3.3.0-bin-hadoop3 $SPARK_HOME

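# JAVA_HOME in livy-env.sh points to /opt/apache/jdk1.8.0_212; make sure a JDK 8 exists at that path (in the base image or added here)
ENV PATH=${LIVY_HOME}/bin:${HADOOP_HOME}/bin:${SPARK_HOME}/bin:$PATH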

RUN chown -R admin:admin /opt/apache

WORKDIR $LIVY_HOME

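# Start Livy in the background, then tail its log to keep the container in the foreground.
# The .out file name includes the name of the user that started Livy (root here); it may differ if the container runs as another user.
ENTRYPOINT ${LIVY_HOME}/bin/livy-server start;tail -f ${LIVY_HOME}/logs/livy-root-server.out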

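[Note] The core-site.xml, hdfs-site.xml and yarn-site.xml inside the hadoop package must contain your own Hadoop cluster's configuration, since Livy submits jobs to YARN (livy.spark.master = yarn) and the test job below is read from HDFS.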
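The stock Hadoop tarball already ships these three files, but essentially empty, so a missing copy step is easy to overlook. A hypothetical pre-build check (the function name and the fs.defaultFS heuristic are assumptions, not part of the original setup) could look like this:

```shell
#!/usr/bin/env bash
# Hypothetical pre-build check for the Hadoop client configuration baked into the image.
# Returns 0 if all three site files exist and core-site.xml defines fs.defaultFS.
check_hadoop_conf() {
  local conf_dir="$1" rc=0 f
  for f in core-site.xml hdfs-site.xml yarn-site.xml; do
    if [ ! -f "$conf_dir/$f" ]; then
      echo "missing: $conf_dir/$f" >&2
      rc=1
    fi
  done
  # An unconfigured core-site.xml has no fs.defaultFS entry.
  if [ -f "$conf_dir/core-site.xml" ] && ! grep -q '<name>fs.defaultFS</name>' "$conf_dir/core-site.xml"; then
    echo "fs.defaultFS not set in $conf_dir/core-site.xml" >&2
    rc=1
  fi
  return "$rc"
}
```

Run it against the unpacked tree before re-packing, e.g. check_hadoop_conf hadoop-3.3.2/etc/hadoop; it only catches an unconfigured setup and does not validate the values themselves.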

Build the image:

docker build -t myharbor.com/bigdata/livy:0.8.0 . --no-cache

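### Parameter notes
# -t: name (and tag) of the image
# . : build context; the Dockerfile in the current directory is used
# -f: path to a Dockerfile (not used here)
# --no-cache: build without using the layer cache

# Push to Harbor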
docker push myharbor.com/bigdata/livy:0.8.0

2) Create the livy chart scaffold

helm create livy

3) Edit the YAML manifests

  • livy/values.yaml
replicaCount: 1

image:
  repository: myharbor.com/bigdata/livy
  pullPolicy: IfNotPresent
  # Overrides the image tag whose default is the chart appVersion.
  tag: "0.8.0"

securityContext:
  runAsUser: 9999
  runAsGroup: 9999
  privileged: true

service:
  type: NodePort
  port: 8998
  nodePort: 31998
  • livy/templates/configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ include "livy.fullname" . }}
  labels:
    {{- include "livy.labels" . | nindent 4 }}
data:
  livy.conf: |-
    livy.spark.master = yarn
    livy.spark.deploy-mode = client
    livy.environment = production
    livy.impersonation.enabled = true
    livy.server.csrf_protection.enabled = false
    livy.server.port = {{ .Values.service.port }}
    livy.server.session.timeout = 3600000
    livy.server.recovery.mode = recovery
    livy.server.recovery.state-store = filesystem
    livy.server.recovery.state-store.url = /tmp/livy
    livy.repl.enable-hive-context = true
  livy-env.sh: |-
    export JAVA_HOME=/opt/apache/jdk1.8.0_212
    export HADOOP_HOME=/opt/apache/hadoop
    export HADOOP_CONF_DIR=/opt/apache/hadoop/etc/hadoop
    export SPARK_HOME=/opt/apache/spark
    export SPARK_CONF_DIR=/opt/apache/spark/conf
    export LIVY_LOG_DIR=/opt/apache/livy/logs
    export LIVY_PID_DIR=/opt/apache/livy/pid-dir
    export LIVY_SERVER_JAVA_OPTS="-Xmx512m"
  spark-blacklist.conf: |-
    spark.master
    spark.submit.deployMode

    # Disallow overriding the location of Spark cached jars.
    spark.yarn.jar
    spark.yarn.jars
    spark.yarn.archive

    # Don't allow users to override the RSC timeout.
    livy.rsc.server.idle-timeout
  • livy/templates/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "livy.fullname" . }}
  labels:
    {{- include "livy.labels" . | nindent 4 }}
spec:
  {{- if not .Values.autoscaling.enabled }}
  replicas: {{ .Values.replicaCount }}
  {{- end }}
  selector:
    matchLabels:
      {{- include "livy.selectorLabels" . | nindent 6 }}
  template:
    metadata:
      {{- with .Values.podAnnotations }}
      annotations:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      labels:
        {{- include "livy.selectorLabels" . | nindent 8 }}
    spec:
      {{- with .Values.imagePullSecrets }}
      imagePullSecrets:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      serviceAccountName: {{ include "livy.serviceAccountName" . }}
      securityContext:
        {{- toYaml .Values.podSecurityContext | nindent 8 }}
      containers:
        - name: {{ .Chart.Name }}
          securityContext:
            {{- toYaml .Values.securityContext | nindent 12 }}
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
          imagePullPolicy: {{ .Values.image.pullPolicy }}
          ports:
            - name: http
              containerPort: 8998
              protocol: TCP
          livenessProbe:
            httpGet:
              path: /
              port: http
          readinessProbe:
            httpGet:
              path: /
              port: http
          resources:
            {{- toYaml .Values.resources | nindent 12 }}
          volumeMounts:
            - name: {{ .Release.Name }}-livy-conf
              mountPath: /opt/apache/livy/conf/livy.conf
              subPath: livy.conf
            - name: {{ .Release.Name }}-livy-env
              mountPath: /opt/apache/livy/conf/livy-env.sh
              subPath: livy-env.sh
            - name: {{ .Release.Name }}-spark-blacklist-conf
              mountPath: /opt/apache/livy/conf/spark-blacklist.conf
              subPath: spark-blacklist.conf
      {{- with .Values.nodeSelector }}
      nodeSelector:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      {{- with .Values.affinity }}
      affinity:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      {{- with .Values.tolerations }}
      tolerations:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      volumes:
        - name: {{ .Release.Name }}-livy-conf
          configMap:
            name: {{ include "livy.fullname" . }}
        - name: {{ .Release.Name }}-livy-env
          configMap:
            name: {{ include "livy.fullname" . }}
        - name: {{ .Release.Name }}-spark-blacklist-conf
          configMap:
            name: {{ include "livy.fullname" . }}
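The nodePort: 31998 entry in values.yaml only takes effect if the Service template references it; the service.yaml generated by helm create typically sets type and port but not nodePort, in which case Kubernetes picks a random port. The author's own chart (see the Git repository at the end) may already handle this. One possible sketch, with the function name and chart directory as illustrative assumptions:

```shell
#!/usr/bin/env bash
# Sketch: overwrite the scaffolded service.yaml so that service.nodePort from values.yaml is honored.
# write_service_template and its chart-directory argument are hypothetical helpers.
write_service_template() {
  local chart_dir="${1:-livy}"
  mkdir -p "$chart_dir/templates"
  # Quoted heredoc: the Helm template braces are written literally.
  cat > "$chart_dir/templates/service.yaml" <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: {{ include "livy.fullname" . }}
  labels:
    {{- include "livy.labels" . | nindent 4 }}
spec:
  type: {{ .Values.service.type }}
  ports:
    - port: {{ .Values.service.port }}
      targetPort: http
      protocol: TCP
      name: http
      {{- if and (eq .Values.service.type "NodePort") .Values.service.nodePort }}
      nodePort: {{ .Values.service.nodePort }}
      {{- end }}
  selector:
    {{- include "livy.selectorLabels" . | nindent 4 }}
EOF
}
```

After running write_service_template livy, helm template livy ./livy should render nodePort: 31998 in the Service.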

4) Deploy

helm install livy ./livy -n livy --create-namespace


NOTES:
1. Get the application URL by running these commands:
  export NODE_PORT=$(kubectl get --namespace livy -o jsonpath="{.spec.ports[0].nodePort}" services livy)
  export NODE_IP=$(kubectl get nodes --namespace livy -o jsonpath="{.items[0].status.addresses[0].address}")
  echo http://$NODE_IP:$NODE_PORT


Check the result:

kubectl get pods,svc -n livy -owide


Web UI: http://192.168.182.110:31998/ui

5) Test and verify

curl -s -XPOST -d '{"file":"hdfs://myhdfs/tmp/spark-examples_2.12-3.3.0.jar","className":"org.apache.spark.examples.SparkPi","name":"SparkPi-test"}'  -H "Content-Type: application/json"  http://local-168-182-110:31998/batches|python -m json.tool
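The POST returns immediately with the batch id, so the job still has to be followed to completion. A minimal polling sketch against GET /batches/{id}/state, assuming the same host and port as the command above and using a sed shortcut instead of a real JSON parser (the function names are hypothetical):

```shell
#!/usr/bin/env bash
# Sketch: poll a Livy batch until it reaches a terminal state.
# The LIVY_URL default reuses the host/port from the curl command above; adjust as needed.
LIVY_URL="${LIVY_URL:-http://local-168-182-110:31998}"

# Pull the "state" value out of a one-line JSON reply such as {"id":0,"state":"running"}.
# This is a sed shortcut, not a JSON parser; prefer jq if it is available.
extract_state() {
  sed -n 's/.*"state" *: *"\([^"]*\)".*/\1/p'
}

# Poll every 5 s, at most $2 times (default 60); succeed only if the batch ends in "success".
wait_for_batch() {
  local id="$1" tries="${2:-60}" state=""
  while [ "$tries" -gt 0 ]; do
    state=$(curl -s "$LIVY_URL/batches/$id/state" | extract_state)
    echo "batch $id: ${state:-unknown}"
    case "$state" in
      success|dead|killed|error) break ;;
    esac
    tries=$((tries - 1))
    sleep 5
  done
  [ "$state" = "success" ]
}
```

For example, wait_for_batch 0 where 0 is the id returned by the POST; if the batch ends as dead, GET /batches/0/log returns its log lines.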


6) Uninstall

helm uninstall livy -n livy

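Git repository: https://gitee.com/hadoop-bigdata/livy-on-k8s

That wraps up this walkthrough of orchestrating and deploying Apache Livy on k8s. If you have any questions, feel free to leave me a comment; I will keep publishing articles on [Cloud Native + Big Data], so stay tuned~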

From: https://www.cnblogs.com/liugp/p/16867859.html
