
Deploying Flink on Hadoop: Detailed Steps for Integrating Apache Hudi


Set the Hive auxiliary JARs directory

[Screenshot 1: configuring the Hive auxiliary JARs directory]
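The setting behind the screenshot is most likely Hive's hive.aux.jars.path. The sketch below is an assumption, not copied from the article: the /opt/hive/auxlib directory and the Hudi bundle version are illustrative, and on a managed platform such as Cloudera Manager the equivalent is the "Hive Auxiliary JARs Directory" setting in the UI.

//A minimal sketch, assuming a locally staged auxiliary directory;
//path and jar version are illustrative
mkdir -p /opt/hive/auxlib
cp hudi-hadoop-mr-bundle-0.9.0.jar /opt/hive/auxlib/
//Then point Hive at the jar, e.g. in hive-site.xml:
//  <property>
//    <name>hive.aux.jars.path</name>
//    <value>file:///opt/hive/auxlib/hudi-hadoop-mr-bundle-0.9.0.jar</value>
//  </property>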

Because the Hudi data will later be stored on OSS, a few more packages need to go into this directory as well (for details on the OSS setup, see the OSS configuration document).

[Screenshot 2: OSS-related JARs added to the auxiliary directory]
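The OSS packages are presumably the standard Hadoop/Aliyun dependency set; the names and versions below are a guess based on that convention, not read from the screenshot.

//Likely OSS support jars (versions illustrative); copy them alongside the
//Hudi bundle so Hive can read and write oss:// paths
cp hadoop-aliyun-3.0.0.jar  /opt/hive/auxlib/
cp aliyun-sdk-oss-3.4.1.jar /opt/hive/auxlib/
cp jdom-1.1.jar             /opt/hive/auxlib/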

Restart Hive so the configuration takes effect

[Screenshot 3: restarting Hive]
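On a managed cluster the restart is done from the management UI; for a manually started Hive it might look like the sketch below, which is an assumption rather than a step from the article.

//Stop the running daemons (find the pids with jps), then start them again
//so hive.aux.jars.path is re-read; illustrative only
kill <metastore-pid> <hiveserver2-pid>
nohup hive --service metastore > /var/log/hive/metastore.out 2>&1 &
nohup hive --service hiveserver2 > /var/log/hive/hiveserver2.out 2>&1 &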

2. Test demo

Create Kafka test data

//Create the topic
kafka-topics --zookeeper dbos-bigdata-test003:2181,dbos-bigdata-test004:2181,dbos-bigdata-test005:2181/kafka --create --partitions 4 --replication-factor 3 --topic test

//Delete the topic (if needed)
kafka-topics --zookeeper dbos-bigdata-test003:2181,dbos-bigdata-test004:2181,dbos-bigdata-test005:2181/kafka --delete --topic test

//Produce data
kafka-console-producer --broker-list dbos-bigdata-test003:9092,dbos-bigdata-test004:9092,dbos-bigdata-test005:9092 --topic test

//Paste this record directly into the producer
{"TINYINT0": 6, "smallint1": 223, "int2": 42999, "bigint3": 429450, "float4": 95.47324181659323, "double5": 340.5755392968011, "decimal6": 111.1111, "boolean7": true, "char8": "dddddd", "varchar9": "buy0", "string10": "buy1", "timestamp11": "2021-09-13 03:08:50.810"}
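Optionally, a quick way to confirm the record actually landed in the topic (not part of the original article):

//Read the record back from the beginning of the topic
kafka-console-consumer --bootstrap-server dbos-bigdata-test003:9092 --topic test --from-beginning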

Start the Flink SQL client

[flink@dbos-bigdata-test005 hive]$ cd /opt/flink
[flink@dbos-bigdata-test005 flink]$ ll
total 496
drwxrwxr-x  2 flink flink   4096 May 25 20:36 bin
drwxrwxr-x  2 flink flink   4096 Nov  4 17:22 conf
drwxrwxr-x  7 flink flink   4096 May 25 20:36 examples
drwxrwxr-x  2 flink flink   4096 Nov  4 13:58 lib
-rw-r--r--  1 flink flink  11357 Oct 29  2019 LICENSE
drwxrwxr-x  2 flink flink   4096 May 25 20:37 licenses
drwxr-xr-x  2 flink flink   4096 Jan 30  2021 log
-rw-rw-r--  1 flink flink 455180 May 25 20:37 NOTICE
drwxrwxr-x  3 flink flink   4096 May 25 20:36 opt
drwxrwxr-x 10 flink flink   4096 May 25 20:36 plugins
-rw-r--r--  1 flink flink   1309 Jan 30  2021 README.txt
[flink@dbos-bigdata-test005 flink]$ ./bin/sql-client.sh
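One caveat worth noting: Flink locates its Hadoop dependencies through HADOOP_CLASSPATH, so if sql-client.sh fails with missing Hadoop classes, export it before launching. This is standard Flink-on-Hadoop practice; the article does not show this step.

//Run before launching the SQL client
export HADOOP_CLASSPATH=`hadoop classpath`
./bin/sql-client.sh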

[Screenshot 4: the Flink SQL client shell]

Execute the Hudi demo statements

Hudi tables come in two types: COW (copy-on-write) and MOR (merge-on-read). COW tables suit offline batch-update scenarios: for updated records, the old base file is read first, merged with the updates, and written out as a new base file. MOR tables suit real-time, high-frequency update scenarios: updates are written directly to log files and merged at read time. To limit read amplification, the log files are periodically compacted into the base files.

[Screenshot 5: the Hudi demo statements and their output]
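The demo statements themselves only appear in the screenshot. Below is a hedged reconstruction in Flink SQL, wired to the Kafka record produced above; the OSS bucket path, the choice of int2 as primary key, and timestamp11 as the precombine field are assumptions, not taken from the article.

-- Kafka source matching the JSON record produced above
CREATE TABLE kafka_source (
  TINYINT0    TINYINT,
  smallint1   SMALLINT,
  int2        INT,
  bigint3     BIGINT,
  float4      FLOAT,
  double5     DOUBLE,
  decimal6    DECIMAL(10, 4),
  boolean7    BOOLEAN,
  char8       CHAR(6),
  varchar9    VARCHAR(10),
  string10    STRING,
  timestamp11 TIMESTAMP(3)
) WITH (
  'connector' = 'kafka',
  'topic' = 'test',
  'properties.bootstrap.servers' = 'dbos-bigdata-test003:9092,dbos-bigdata-test004:9092,dbos-bigdata-test005:9092',
  'properties.group.id' = 'hudi-demo',
  'scan.startup.mode' = 'earliest-offset',
  'format' = 'json'
);

-- MOR Hudi sink on OSS; bucket path is hypothetical
CREATE TABLE hudi_mor (
  TINYINT0    TINYINT,
  smallint1   SMALLINT,
  int2        INT,
  bigint3     BIGINT,
  float4      FLOAT,
  double5     DOUBLE,
  decimal6    DECIMAL(10, 4),
  boolean7    BOOLEAN,
  char8       CHAR(6),
  varchar9    VARCHAR(10),
  string10    STRING,
  timestamp11 TIMESTAMP(3),
  PRIMARY KEY (int2) NOT ENFORCED
) WITH (
  'connector' = 'hudi',
  'path' = 'oss://your-bucket/warehouse/hudi_mor',
  'table.type' = 'MERGE_ON_READ',
  'write.precombine.field' = 'timestamp11'
);

-- Stream the Kafka records into the Hudi table
INSERT INTO hudi_mor SELECT * FROM kafka_source;

MERGE_ON_READ is chosen here because the pipeline is a continuous stream of updates, matching the MOR use case described above; for batch-style reloads, COPY_ON_WRITE would be the natural table.type instead.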
