Deploying Flink on Hadoop: Detailed Steps, and Integrating Apache Hudi
After a successful build, the bundle jar for each component can be found under its packaging directory.
Copy hudi-flink-bundle_2.11-0.10.0-SNAPSHOT.jar into the lib directory of Flink 1.13.1, and the Hudi data lake journey can begin.

1.2 Configure Flink on YARN mode
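For reference, the bundle jar is typically produced by a Maven build of the Hudi source tree (a minimal sketch; build flags and the exact output path depend on your Hudi checkout and version):

```shell
# Build Hudi from source, skipping tests (common flags; adjust as needed)
mvn clean package -DskipTests

# The Flink bundle jar then appears under the packaging directory, e.g.:
ls packaging/hudi-flink-bundle/target/hudi-flink-bundle_2.11-*.jar
```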
The flink-conf.yaml configuration file is as follows:
execution.target: yarn-per-job
#execution.target: local
execution.checkpointing.externalized-checkpoint-retention: RETAIN_ON_CANCELLATION
# Checkpointing interval (in milliseconds)
execution.checkpointing.interval: 30000
execution.checkpointing.mode: EXACTLY_ONCE
#execution.checkpointing.prefer-checkpoint-for-recovery: true
classloader.check-leaked-classloader: false
jobmanager.rpc.address: dbos-bigdata-test005
# The RPC port where the JobManager is reachable.
jobmanager.rpc.port: 6123
akka.framesize: 10485760b
jobmanager.memory.process.size: 1024m
taskmanager.memory.process.size: 1024m
taskmanager.numberOfTaskSlots: 1
# The parallelism used for programs that did not specify and other parallelism.
parallelism.default: 1
env.java.home: /usr/java/jdk1.8.0_181-cloudera
high-availability: zookeeper
high-availability.storageDir: hdfs:///flink/ha/
high-availability.zookeeper.quorum: dbos-bigdata-test003:2181,dbos-bigdata-test004:2181,dbos-bigdata-test005:2181
state.backend: filesystem
# Directory for checkpoints filesystem when using any of the default bundled
# state backends.
#
state.checkpoints.dir: hdfs://bigdata/flink-checkpoints
jobmanager.execution.failover-strategy: region
env.log.dir: /tmp/flink
high-availability.zookeeper.path.root: /flink
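With execution.target set to yarn-per-job, each job submitted from this client is launched as its own YARN application. A minimal submission sketch (the example jar is the one shipped with Flink; it requires HADOOP_CLASSPATH to be set as described in the environment-variable section):

```shell
# Submit the bundled streaming WordCount as a per-job YARN application
flink run -t yarn-per-job \
  $FLINK_HOME/examples/streaming/WordCount.jar
```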
Configure the Flink environment variables
vim /etc/profile
The environment variables are listed below; adjust the paths to match your own versions:
# set default jdk1.8 env
export JAVA_HOME=/usr/java/jdk1.8.0_181-cloudera
export JRE_HOME=/usr/java/jdk1.8.0_181-cloudera/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export HADOOP_CONF_DIR=/etc/hadoop/conf
export HADOOP_CLASSPATH=`hadoop classpath`
export HBASE_CONF_DIR=/etc/hbase/conf
export FLINK_HOME=/opt/flink
export HIVE_HOME=/opt/cloudera/parcels/CDH-6.3.0-1.cdh6.3.0.p0.1279813/lib/hive
export HIVE_CONF_DIR=/etc/hive/conf
export M2_HOME=/usr/local/maven/apache-maven-3.5.4
export CANAL_ADMIN_HOME=/data/canal/admin
export CANAL_SERVER_HOME=/data/canal/deployer
export PATH=${JAVA_HOME}/bin:${JRE_HOME}/bin:${FLINK_HOME}/bin:${M2_HOME}/bin:${HIVE_HOME}/bin:${CANAL_SERVER_HOME}/bin:${CANAL_ADMIN_HOME}/bin:$PATH
Check that Flink works correctly
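A quick way to verify the installation, assuming the environment variables above have been added to /etc/profile:

```shell
source /etc/profile
echo $FLINK_HOME    # should print /opt/flink
flink --version     # should print the Flink version, e.g. 1.13.1
```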
Put the compiled Hudi bundle jar and the Kafka connector jar into Flink's lib directory.
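The copy step can look like the following (the Hudi jar name is from this walkthrough; the Kafka connector jar name is an assumption based on Flink 1.13.1 with Scala 2.11, so match it to the artifact you actually downloaded):

```shell
# Hudi Flink bundle built earlier
cp hudi-flink-bundle_2.11-0.10.0-SNAPSHOT.jar $FLINK_HOME/lib/

# Kafka SQL connector for Flink 1.13.1 (name assumed; match your Flink/Scala version)
cp flink-sql-connector-kafka_2.11-1.13.1.jar $FLINK_HOME/lib/
```

Restart the Flink cluster (or re-submit jobs) afterwards so the new jars on the classpath are picked up.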