Basic Hive SQL operations (advanced HiveSQL techniques, with examples)
Straight to the good stuff: advanced HiveSQL techniques, whose importance speaks for itself. Master these 10 techniques (middle installment of the series) and your SQL will take a real step up.
1) Computing distance from latitude/longitude
select t1.uid, t2.uid,
       t1.latitude, t1.longitude, t2.latitude, t2.longitude,
       -- distance in meters (haversine formula; ACOS(-1) is pi, 6378137 m is the Earth's equatorial radius)
       6378137 * 2 * ASIN(SQRT(
           POWER(SIN((t1.latitude - t2.latitude) * ACOS(-1) / 360), 2)
           + COS(t1.latitude * ACOS(-1) / 180) * COS(t2.latitude * ACOS(-1) / 180)
             * POWER(SIN((t1.longitude - t2.longitude) * ACOS(-1) / 360), 2)
       )) as distance_
from (
    select uid, latitude, longitude
    from t1
    where ...
) t1
left join (
    select uid, latitude, longitude
    from t2
    where ...
) t2
on t1.uid = t2.uid
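As a sanity check, the same computation can be mirrored in Python: a small sketch of the haversine great-circle distance using the same 6378137 m Earth radius.

```python
import math

# Haversine great-circle distance in meters, matching the SQL expression:
# same Earth radius (6378137 m), latitudes drive the first and cosine terms,
# longitudes drive the last term.
def haversine_m(lat1, lon1, lat2, lon2, radius=6378137.0):
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(math.radians(lat1)) * math.cos(math.radians(lat2))
           * math.sin(dlmb / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(a))
```

One degree of longitude along the equator comes out to roughly 111.3 km, a quick way to eyeball whether the SQL version is wired up correctly.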
2) Comments containing emoji / emoji-only comments
-- contains emoji (extract_emoji is a custom UDF; the regexp_replace first strips @-mentions)
size(extract_emoji(regexp_replace(text_comment_content, '@.*?\\)', ''))) > 0
-- emoji only
size(extract_emoji(regexp_replace(text_comment_content, '@.*?\\)', ''))) = length(text_comment_content)
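The logic can be sketched in Python with hypothetical stand-ins: extract_emoji is a custom UDF in the original, approximated here by a few common emoji code-point ranges (a rough sketch, not an exhaustive emoji definition).

```python
import re

# Rough stand-in for the extract_emoji UDF: match characters in two common
# emoji code-point ranges (miscellaneous symbols + the main emoji blocks).
EMOJI_RE = re.compile('[\U00002600-\U000027BF\U0001F300-\U0001FAFF]')

def extract_emoji(text):
    return EMOJI_RE.findall(text)

def strip_mentions(text):
    # mirrors regexp_replace(text_comment_content, '@.*?\\)', '')
    return re.sub(r'@.*?\)', '', text)

comment = 'nice stream @user) 👍🔥'
cleaned = strip_mentions(comment)
has_emoji  = len(extract_emoji(cleaned)) > 0              # "contains emoji"
emoji_only = len(extract_emoji(cleaned)) == len(cleaned)  # "emoji only"
```

Note the emoji-only check compares the emoji count against the full string length, so any leftover text (including whitespace) makes it false, exactly as in the SQL version.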
3) Converting a 13-digit bigint to a timestamp
-- 13-digit bigint to yyyy-MM-dd HH:mm:ss.SSS
concat(from_unixtime(cast(substr(1234567890123, 0, 10) as bigint), 'yyyy-MM-dd HH:mm:ss'), '.', substr(1234567890123, 11, 13))
-- current system time as a 13-digit (millisecond) bigint timestamp
select bigint(double(current_timestamp()) * 1000)
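The same split can be verified in Python (UTC is assumed here; Hive's from_unixtime formats in the session time zone): the first 10 digits of the 13-digit bigint are seconds, the last 3 are milliseconds.

```python
from datetime import datetime, timezone

# Split a 13-digit millisecond timestamp into seconds + milliseconds,
# then format, mirroring the Hive concat/from_unixtime/substr expression.
millis = 1234567890123
secs, ms = divmod(millis, 1000)
formatted = (datetime.fromtimestamp(secs, tz=timezone.utc)
             .strftime('%Y-%m-%d %H:%M:%S') + '.' + f'{ms:03d}')
```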
4) Consecutive login days, advanced version
-- the user's most recent consecutive-watch streak for a streamer
-- (note: streak >= 2 days; days when the streamer did not go live do not break the streak)
select uid,
       bid,
       continue_play_days -- consecutive watch days (most recent streak)
from (
    select uid,
           bid,
           dt,
           continue_play_days
    from (
        select uid, bid, dt, rn_diff,
               sum(1) over (partition by uid, bid, rn_diff) as continue_play_days,
               row_number() over (partition by uid, bid order by rn_diff) as rn
        from (
            select coalesce(t1.uid, -1) as uid, t2.bid, t2.dt,
                   row_number() over (partition by t1.uid, t2.bid order by t2.dt desc) as rn_play,
                   t2.rn_author,
                   t2.rn_author - row_number() over (partition by t1.uid, t2.bid order by t2.dt desc) as rn_diff
            from (
                select uid, -- user ID
                       bid, -- streamer ID
                       dt
                from t1 -- user watch log
                where dt ...
                group by uid,
                         bid,
                         dt
            ) t1
            full join (
                select bid, dt,
                       row_number() over (partition by bid order by dt desc) as rn_author
                from (
                    select bid, -- streamer ID
                           dt
                    from t2 -- streamer live log
                    where dt ...
                    group by bid,
                             dt
                ) u
            ) t2
            on t1.bid = t2.bid and t1.dt = t2.dt
        ) s
        where uid <> -1
    ) ss
    where rn = 1
) t
where continue_play_days >= 2
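The core trick above, rn_author minus rn_play as a group key, can be modeled in Python on hypothetical data: rank the streamer's live days and the user's watch days separately, both in descending date order; within one unbroken streak the difference of the two ranks is constant.

```python
# Toy model of the rn_author - rn_play grouping trick (hypothetical data).
live_days  = ['2024-01-01', '2024-01-02', '2024-01-04', '2024-01-05', '2024-01-06']
watch_days = ['2024-01-02', '2024-01-04', '2024-01-05']  # one user, one streamer

# rn_author: rank of each live day, newest first (like the SQL window)
rn_author = {d: i for i, d in enumerate(sorted(live_days, reverse=True), start=1)}

groups = {}
for rn_play, d in enumerate(sorted(watch_days, reverse=True), start=1):
    diff = rn_author[d] - rn_play          # constant within one streak
    groups.setdefault(diff, []).append(d)

# the streak containing the most recent watch day
latest_streak = max(groups.values(), key=max)
continue_play_days = len(latest_streak)
```

Note that 2024-01-03 is absent from live_days, yet 01-02 and 01-04 land in the same streak: because the streamer was not live on 01-03, the user's consecutive status is not broken, which is exactly the "advanced" part of this version.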
5) lateral view outer: a pitfall that silently drops rows
When the value passed to explode is null, lateral view explode(null) tmp as id returns no rows at all (note: the other columns are not returned either, i.e. the original rows are lost, which is easy to overlook).
With lateral view outer explode(null) tmp as id, the other columns still come back with their data, id is returned as null, and no original rows are lost:
SELECT *
FROM t1
lateral view outer explode(get_json_object(`data`, '$.key')) t2 AS notice
WHERE dt = ...
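The two behaviors can be modeled with a small hypothetical helper: plain explode drops a row whose array is NULL or empty, while outer explode keeps it with a NULL element.

```python
# Toy model of lateral view explode vs. lateral view OUTER explode.
rows = [
    {'uid': 1, 'ids': [10, 11]},
    {'uid': 2, 'ids': None},    # explode(null) for this row
]

def lateral_view_explode(rows, col, outer=False):
    out = []
    for r in rows:
        vals = r[col]
        if not vals:            # NULL or empty array
            if outer:           # OUTER: keep the row, element is null
                out.append({**r, 'id': None})
            continue            # plain explode: the whole row is lost
        for v in vals:
            out.append({**r, 'id': v})
    return out

plain = lateral_view_explode(rows, 'ids')              # uid 2 vanishes
outer = lateral_view_explode(rows, 'ids', outer=True)  # uid 2 kept, id is None
```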
6) Multi-dimensional analysis: grouping() = 1
In cube-style multi-dimensional aggregation, cleaning up the dimension columns with coalesce or nvl is risky: you cannot tell the rolled-up 'ALL' rows apart from rows where the dimension value itself is null. Use grouping() instead, like so:
if(grouping(xxx) = 1, 'ALL', xxx) as xxx
Official documentation screenshot attached:
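A toy illustration on hypothetical data of why coalesce/nvl is ambiguous after CUBE: a rollup row and a row whose dimension is genuinely NULL both carry None, and only a grouping()-style flag can tell them apart.

```python
# Simulated CUBE output: the grouping flag marks rollup rows, as grouping() does.
cube_rows = [
    {'os': 'ios', 'grouping': 0, 'cnt': 10},  # detail row
    {'os': None,  'grouping': 0, 'cnt': 5},   # detail row: os itself is NULL
    {'os': None,  'grouping': 1, 'cnt': 15},  # rollup ('ALL') row
]

# if(grouping(os) = 1, 'ALL', os): only the rollup row becomes 'ALL'
labeled = [('ALL' if r['grouping'] == 1 else r['os'], r['cnt']) for r in cube_rows]

# coalesce(os, 'ALL'): the NULL detail row is mislabeled as 'ALL' too
coalesced = [(r['os'] if r['os'] is not None else 'ALL', r['cnt']) for r in cube_rows]
```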
7) Different engines, different results
to_json(named_struct()) produces different JSON strings under the MR and Spark engines. With to_json(named_struct('userID', userid)), when a key contains uppercase letters Spark preserves the original case, but under MR every key is lowercased, so the two engines return different results.
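A minimal sketch of the reported difference (this models the observed behavior, not the engines' actual code): one serializer keeps field names as-is, the other lowercases them first.

```python
import json

def to_json_spark(struct):
    # Spark-like behavior: field names serialized as-is
    return json.dumps(struct, separators=(',', ':'))

def to_json_mr(struct):
    # MR-like behavior: field names lowercased before serializing
    return json.dumps({k.lower(): v for k, v in struct.items()},
                      separators=(',', ':'))

row = {'userID': 123}   # to_json(named_struct('userID', userid))
spark_out = to_json_spark(row)
mr_out = to_json_mr(row)
```

Anything downstream that parses these strings by key name will break when the job is moved between engines, so pick all-lowercase keys or pin the engine.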
Have you run into any of these at work? Feel free to discuss and learn together.