Hive实战项目——影音网站数据分析

如题所述

第1个回答  2022-07-14
统计 谷粒视频 网站的常规指标,各种 TopN 指标:

对将要处理的数据先进行一次数据清洗,过滤掉不合格的脏数据,同时调整数据的格式

pom.xml

ETLUtilMapper.java

ETLUtilDriver.java

处理前数据

处理后数据

gulivideo_ori

guli_user_ori

3.2.1 将表中category字段数组行转列
select views,hot from
gulivideo_orc lateral view explode (category) category_t as hot;t1

3.2.2 统计每个类别的观看总数
select hot,count(*)
from t1
group by hot;t2

3.2.3 获取观看前10的类别
select hot,total_view
from ()t2
order by total_view desc limit 10;

3.3.1 观看数top20视频
select views,category
from gulivideo_orc
order by views desc
limit 20;t1
3.3.2 所属类别
select views,category
from t1 lateral view explode(category)ct as category_name;

3.4.1 观看数top10,关联视频
select
videoid,views,category,relatedid
from
gulivideo_orc
order by
views desc
limit 50;t1

3.4.2 关联视频行转列
select distinct(r_id)
from
t1 lateral view explode(relatedid) relatedtable as r_id;t2

3.4.3 视频所属类别
select r_id,g.category
from
t2.join gulivideo_orc g on r_id = g.videoid;t3

select r_id,g.category
from
t2 join gulivideo_orc g on r_id = g.videoid;t3

3.4.4 类别展开
select category_name
from ()t3 lateral view explode(category)t as category_name;t4

3.4.5 统计类别个数
select category_name,count(*) hot
from
t4 group by category_name,t_sum;t5

3.4.6 所属类别排名
select * from
t5
order by hot desc;t6

1.找出上传前10的用户
select uploader,
videos
from
guli_user_orc
order by videos desc
limit 10;t1

2.找到上传的所有视频
select t1.uploader,
videoid,
views
from
()t1 join gulivideo_orc g
on
t.uploader=g.uploader
order by uploader,views desc; t2

1.统计所有类别对应的视频
select
category_name,videoid,views
from
gulivideo_orc
lateral view explode(category) t as category_name;t1

2.对每个类观看数排名
select *,rank() over(partition by category_name order by views desc) rank_no
from
()t1;t2

3.取前十
select * from
()t2
where rank_no<=10;
相似回答
大家正在搜