标题 创建时间 发布时间 最后一次修改 备注
Hive 常见问题速查 2020.08.25 2020.12.21 2021.01.15 /

函数大全

https://www.iteblog.com/archives/2258.html

Hive 结果采样

select * from my_table
where rand() <= 0.0001
distribute by rand()
sort by rand()
limit 10000;

--- source: http://www.joefkelley.com/736/

计算时间差

--- https://blog.csdn.net/kent7306/article/details/50441967
--- 计算前后差

--- https://blog.csdn.net/qq_33290422/article/details/81204532
--- 直接计算时间差

json 提取

--- refer: https://www.cnblogs.com/drjava/p/10486134.html
--- [{"name":"张三","sex":"男","age":"25"},{"name":"李","sex":"男","age":"47"}]

SELECT get_json_object(xjson,"$.[0]") FROM person;
--- {"name":"王二狗","sex":"男","age":"25"}

--- 按字段提取
SELECT get_json_object(xjson,"$.[0].age") FROM person;
--- 25

--- **通配提取,不带索引即可**
SELECT get_json_object(xjson,"$.[].age") FROM person;
--- ["25","47"]

临时表运用

--- refer: https://www.jianshu.com/p/d518e9f5d5f9
--- note: hive 嵌套使用不支持

with temp as (
    select * from xxx
)
select * from temp;

行转列

select user
   , concat_ws(',', collect_set(item)) as item_list -- collect set list 最后结果不同
from user_item_table
group by user;

结果存储

--- https://www.cnblogs.com/Allen-rg/p/9295615.html

取尾数

***
and substr(item,-1) = 0  -- 采样取尾数为0

left join 注意事项

--- select 无选定内容时,会出现null结果

关于作者