
Hive SQL basics: advanced HiveSQL techniques, with examples


Straight to the good stuff: advanced HiveSQL techniques whose value speaks for itself. Master these 10 tips (this is the middle installment of the series) and your SQL will take a real leap forward!

1) Computing the distance between two latitude/longitude points

select t1.uid,
       t2.uid,
       t1.latitude,
       t1.longitude,
       t2.latitude,
       t2.longitude,
       -- distance in meters (haversine formula, Earth radius 6378137 m)
       6378137 * 2 * ASIN(SQRT(
             POWER(SIN((t1.latitude - t2.latitude) * ACOS(-1) / 360), 2)
           + COS(t1.latitude * ACOS(-1) / 180) * COS(t2.latitude * ACOS(-1) / 180)
             * POWER(SIN((t1.longitude - t2.longitude) * ACOS(-1) / 360), 2)
       )) as distance_
from (
    select uid, latitude, longitude from t1 where ...
) t1
left join (
    select uid, latitude, longitude from t2 where ...
) t2
on t1.uid = t2.uid
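As a quick sanity check, plugging fixed coordinates into the same formula should give a plausible great-circle distance. The coordinates below (roughly Beijing and Shanghai) are only for illustration; the result should come out at about 1,068 km:

-- sanity check with literal coordinates: Beijing (39.9042, 116.4074) vs. Shanghai (31.2304, 121.4737)
select 6378137 * 2 * ASIN(SQRT(
             POWER(SIN((39.9042 - 31.2304) * ACOS(-1) / 360), 2)
           + COS(39.9042 * ACOS(-1) / 180) * COS(31.2304 * ACOS(-1) / 180)
             * POWER(SIN((116.4074 - 121.4737) * ACOS(-1) / 360), 2)
       )) / 1000 as distance_km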

2) Comments containing emoji / emoji-only comments

-- contains emoji

size(extract_emoji(regexp_replace(text_comment_content, '@.*?\\)', ''))) > 0

-- emoji only

size(extract_emoji(regexp_replace(text_comment_content, '@.*?\\)', ''))) = length(text_comment_content)
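Note that extract_emoji is not a Hive built-in; it is presumably a UDF registered in the author's environment. Assuming such a UDF, a sketch of how the first predicate might be used in a full query (table and column names are placeholders):

-- Hypothetical usage: comment_table / comment_id / text_comment_content are illustrative names
select comment_id,
       text_comment_content
from comment_table
where size(extract_emoji(regexp_replace(text_comment_content, '@.*?\\)', ''))) > 0  -- keep comments with at least one emoji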

3) Converting a 13-digit bigint to a timestamp

-- 13-digit bigint -> yyyy-MM-dd HH:mm:ss.SSS

concat(from_unixtime(cast(substr(1234567890123, 0, 10) as bigint), 'yyyy-MM-dd HH:mm:ss'), '.', substr(1234567890123, 11, 13))

-- current system time as a 13-digit bigint (milliseconds)

select cast(cast(current_timestamp() as double) * 1000 as bigint)
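Putting the two snippets together, a sketch that formats the current millisecond timestamp back into a readable string (from_unixtime uses the session time zone, so the exact output depends on your cluster settings):

select concat(
           from_unixtime(cast(substr(cast(ts as string), 1, 10) as bigint), 'yyyy-MM-dd HH:mm:ss'),
           '.',
           substr(cast(ts as string), 11, 3)
       ) as ts_readable
from (
    select cast(cast(current_timestamp() as double) * 1000 as bigint) as ts  -- 13-digit millisecond timestamp
) t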

4) Consecutive login days, advanced version

-- Length of the user's most recent streak of consecutive days watching a streamer
-- (note: only streaks >= 2 days count; days on which the streamer did not go live do not break the streak)

select uid,
       bid,
       continue_play_days  -- consecutive watch days (most recent streak)
from (
    select uid, bid, dt, continue_play_days
    from (
        select uid,
               bid,
               dt,
               rn_diff,
               sum(1) over (partition by uid, bid, rn_diff) as continue_play_days,
               row_number() over (partition by uid, bid order by rn_diff) as rn
        from (
            select coalesce(t1.uid, -1) as uid,
                   t2.bid,
                   t2.dt,
                   row_number() over (partition by t1.uid, t2.bid order by t2.dt desc) as rn_play,
                   rn_author,
                   rn_author - row_number() over (partition by t1.uid, t2.bid order by t2.dt desc) as rn_diff
            from (
                select uid,  -- user ID
                       bid,  -- streamer ID
                       dt
                from t1      -- user watch log
                where dt ...
                group by uid, bid, dt
            ) t1
            full join (
                select bid,
                       dt,
                       row_number() over (partition by bid order by dt desc) as rn_author
                from (
                    select bid,  -- streamer ID
                           dt
                    from t2      -- streamer broadcast log
                    where dt ...
                    group by bid, dt
                ) u
            ) t2
            on t1.bid = t2.bid and t1.dt = t2.dt
        ) s
        where uid <> -1
    ) ss
    where rn = 1
) t
where continue_play_days >= 2
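The core trick is the "difference of two row_numbers" grouping: rn_author numbers the streamer's live days, the second row_number numbers the days the user actually watched, and their difference stays constant exactly while the user watches every live day in a row. A minimal, self-contained sketch of the same idea on made-up data (dates and values are illustrative):

-- Hypothetical data: the streamer went live on 4 days; the user watched 3 of them.
with live_days as (
    select '2024-01-01' as dt union all
    select '2024-01-02' as dt union all
    select '2024-01-03' as dt union all
    select '2024-01-05' as dt
),
watch_days as (
    select '2024-01-01' as dt union all
    select '2024-01-03' as dt union all
    select '2024-01-05' as dt
)
select w.dt,
       l.rn_author,
       row_number() over (order by w.dt desc) as rn_watch,
       l.rn_author - row_number() over (order by w.dt desc) as rn_diff  -- constant within one streak
from watch_days w
join (
    select dt, row_number() over (order by dt desc) as rn_author
    from live_days
) l
on w.dt = l.dt
-- '2024-01-05' and '2024-01-03' share rn_diff = 0: the most recent 2-day streak
-- ('2024-01-04' was not a live day, so it does not break the streak).
-- '2024-01-01' gets rn_diff = 1 because the user skipped the live day '2024-01-02'.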

5) lateral view outer: a pitfall that silently drops rows

When the value passed to explode is null, as in lateral view explode(null) tmp as id, the query returns no rows at all (note: the other columns are not returned either, so rows from the source table are silently lost, which is very easy to overlook);

With lateral view outer explode(null) tmp as id, the other columns are still returned and id is simply null, so no rows from the source table are lost.

SELECT * FROM t1 lateral view outer explode(get_json_object(`data`, '$.key')) t2 AS notice WHERE dt =
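A minimal repro of the difference, using a throwaway one-row subquery in place of a real table (names are illustrative):

-- Plain lateral view: when the exploded expression is null, the source row disappears entirely.
select t.id, tmp.v
from (select 1 as id, cast(null as string) as s) t
lateral view explode(split(t.s, ',')) tmp as v;
-- returns 0 rows

-- lateral view outer: the source row survives; v is simply null.
select t.id, tmp.v
from (select 1 as id, cast(null as string) as s) t
lateral view outer explode(split(t.s, ',')) tmp as v;
-- returns 1 row: id = 1, v = NULL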

6) Multi-dimensional analysis: grouping() = 1

When rolling up multiple dimensions with cube, handling the dimension columns with coalesce or nvl is risky: you cannot tell the aggregated "ALL" rows apart from rows whose dimension value is genuinely null. Use grouping() instead, along these lines:

if(grouping(xxx) = 1, 'ALL', xxx) as xxx
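A sketch of the pattern on a hypothetical sales table (table and column names are made up; grouping() is only available in newer Hive versions):

-- Hypothetical table: sales(province, city, amount); city itself may be NULL in the raw data.
select if(grouping(province) = 1, 'ALL', province) as province,
       if(grouping(city) = 1, 'ALL', city) as city,
       sum(amount) as amount
from sales
group by province, city
with cube
-- coalesce(city, 'ALL') could not distinguish the cube's "all cities" rows
-- from rows whose city is genuinely NULL; grouping(city) = 1 holds only for the former.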

(Screenshot from the official documentation omitted.)

7) Different engines, different results

to_json(named_struct()) returns inconsistent results on the MR and Spark engines when building a JSON string.

When the JSON built by to_json(named_struct('userID', userid)) has a key containing uppercase letters, Spark preserves the original case, while MR lowercases the key, so the two engines return different results.
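A small illustration of the claim (the column name is a placeholder, and the two outputs below simply restate the behaviour described above; verify against your own engine versions):

select to_json(named_struct('userID', userid)) as j
from (select 123 as userid) t;
-- Spark:      {"userID":123}
-- MapReduce:  {"userid":123}   (key lowercased, as described above)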

Have you run into any of these at work? Feel free to share and discuss.
