Deploying Flink on Hadoop: Detailed Steps, Integrating Apache Hudi
Set the Hive auxiliary JARs directory
Because the Hudi data will later be stored on OSS, a few extra JARs need to be placed in this directory as well (for the detailed OSS setup, see the OSS configuration document).
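A minimal sketch of that step; the auxlib path and every JAR version below are assumptions, so substitute the ones that match your cluster:
//Hypothetical auxiliary JAR directory; point hive.aux.jars.path in hive-site.xml at it
mkdir -p /usr/lib/hive/auxlib
//Hudi Hive integration bundle (version is a placeholder)
cp hudi-hadoop-mr-bundle-0.9.0.jar /usr/lib/hive/auxlib/
//Hadoop OSS connector and its dependencies, assumed here for storing Hudi data on OSS
cp hadoop-aliyun-3.0.0.jar aliyun-sdk-oss-3.4.1.jar jdom-1.1.jar /usr/lib/hive/auxlib/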
Restart Hive so the configuration takes effect.
2. Test demo: creating Kafka data
//Create the topic
kafka-topics --zookeeper dbos-bigdata-test003:2181,dbos-bigdata-test004:2181,dbos-bigdata-test005:2181/kafka --create --partitions 4 --replication-factor 3 --topic test
//Delete the topic
kafka-topics --zookeeper dbos-bigdata-test003:2181,dbos-bigdata-test004:2181,dbos-bigdata-test005:2181/kafka --delete --topic test
//Produce data
kafka-console-producer --broker-list dbos-bigdata-test003:9092,dbos-bigdata-test004:9092,dbos-bigdata-test005:9092 --topic test
//Paste this record directly as producer input
{"TINYINT0": 6, "smallint1": 223, "int2": 42999, "bigint3": 429450, "float4": 95.47324181659323, "double5": 340.5755392968011, "decimal6": 111.1111, "boolean7": true, "char8": "dddddd", "varchar9": "buy0", "string10": "buy1", "timestamp11": "2021-09-13 03:08:50.810"}
Start the Flink SQL client
[flink@dbos-bigdata-test005 hive]$ cd /opt/flink
[flink@dbos-bigdata-test005 flink]$ ll
total 496
drwxrwxr-x 2 flink flink 4096 May 25 20:36 bin
drwxrwxr-x 2 flink flink 4096 Nov 4 17:22 conf
drwxrwxr-x 7 flink flink 4096 May 25 20:36 examples
drwxrwxr-x 2 flink flink 4096 Nov 4 13:58 lib
-rw-r--r-- 1 flink flink 11357 Oct 29 2019 LICENSE
drwxrwxr-x 2 flink flink 4096 May 25 20:37 licenses
drwxr-xr-x 2 flink flink 4096 Jan 30 2021 log
-rw-rw-r-- 1 flink flink 455180 May 25 20:37 NOTICE
drwxrwxr-x 3 flink flink 4096 May 25 20:36 opt
drwxrwxr-x 10 flink flink 4096 May 25 20:36 plugins
-rw-r--r-- 1 flink flink 1309 Jan 30 2021 README.txt
[flink@dbos-bigdata-test005 flink]$ ./bin/sql-client.sh
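For the Hudi demo below, the Hudi Flink bundle has to be on the SQL client's classpath. If it is not already under lib/, the client can also be launched with the bundle passed explicitly; the jar name and version here are assumptions following the Hudi quickstart pattern:
//Alternative launch with the Hudi bundle added explicitly
./bin/sql-client.sh embedded -j /opt/flink/lib/hudi-flink-bundle_2.11-0.9.0.jar shell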
Run the Hudi demo statements
Hudi tables come in two types: COW (copy-on-write) and MOR (merge-on-read).
A COW table suits offline batch-update scenarios: for updated data, it first reads the old base file, merges in the updates, and writes out a new base file.
A MOR table suits real-time, high-frequency update scenarios: updates are written directly to log files and merged at read time. To limit read amplification, log files are periodically compacted into the base files.
A sketch of such a demo follows.
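Assuming the test topic created above, an assumed HDFS path /tmp/hudi/hudi_mor, and bigint3 as an assumed record key (the original demo statements are not shown here), a Kafka source plus a MOR Hudi sink could look like this in the SQL client:
-- Kafka source matching the demo JSON record produced earlier
CREATE TABLE kafka_source (
  TINYINT0 TINYINT,
  smallint1 SMALLINT,
  int2 INT,
  bigint3 BIGINT,
  float4 FLOAT,
  double5 DOUBLE,
  decimal6 DECIMAL(10, 4),
  boolean7 BOOLEAN,
  char8 CHAR(6),
  varchar9 VARCHAR(10),
  string10 STRING,
  timestamp11 TIMESTAMP(3)
) WITH (
  'connector' = 'kafka',
  'topic' = 'test',
  'properties.bootstrap.servers' = 'dbos-bigdata-test003:9092,dbos-bigdata-test004:9092,dbos-bigdata-test005:9092',
  'properties.group.id' = 'hudi_demo',
  'scan.startup.mode' = 'earliest-offset',
  'format' = 'json'
);
-- Hudi MOR sink; the path and the primary-key choice are assumptions
CREATE TABLE hudi_mor (
  TINYINT0 TINYINT,
  smallint1 SMALLINT,
  int2 INT,
  bigint3 BIGINT,
  float4 FLOAT,
  double5 DOUBLE,
  decimal6 DECIMAL(10, 4),
  boolean7 BOOLEAN,
  char8 CHAR(6),
  varchar9 VARCHAR(10),
  string10 STRING,
  timestamp11 TIMESTAMP(3),
  PRIMARY KEY (bigint3) NOT ENFORCED
) WITH (
  'connector' = 'hudi',
  'path' = 'hdfs:///tmp/hudi/hudi_mor',
  'table.type' = 'MERGE_ON_READ'
);
-- Continuously write the Kafka stream into the Hudi table
INSERT INTO hudi_mor SELECT * FROM kafka_source;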