SQL Practice-Social Media Company

You just try your sql query and then slide the answer sql code.

SocialFlow Analytics Dashboard
BigQuery · Excel · SQL
Executive Dashboard
6-year performance summary · SocialFlow Inc. · 2020–2025
Monthly Active Users
320M
▲ 18.4% vs 2023
👤
Total Revenue (2024)
$4.82B
▲ 24.1% vs 2023
💰
DAU / MAU Ratio
27.2%
▲ 1.8pp vs 2023
📈
ARPU (Annual)
$15.06
▲ 4.8% vs 2023
🎯
User Churn Rate
8.3%
▼ -1.2pp vs 2023
⚠️
Avg Session Duration
28.4 min
▲ 3.1 min vs 2023
⏱️
Executive Summary: SocialFlow has grown from 145M MAU in 2020 to 320M in 2024, a +120% increase over 5 years. Revenue scaled 4× from $1.2B to $4.82B, driven primarily by ad monetization (78% revenue share). The DAU/MAU ratio of 27.2% signals healthy daily habit formation, while churn improved to 8.3% — the best since 2021. Premium subscriptions are the fastest-growing revenue segment at +41% YoY.
Annual Revenue Trend (2020–2025)
Total revenue by year — Ads + Subscriptions + Premium
Revenue Mix (2024)
Ad Revenue vs Subscriptions vs Premium Features
MAU Growth (2020–2025)
Monthly Active Users — Year-end figures (Millions)
Revenue by Region (2024)
Geographic revenue distribution
Year-over-Year Performance Summary
2020–2025
YearMAUDAURevenueARPUAd Rev ShareChurnAvg SessionYoY Growth
2020145M34M$1.20B$8.2882%11.2%21.3 minBaseline
2021198M48M$1.85B$9.3480%10.5%23.1 min▲ 54.2%
2022245M61M$2.60B$10.6179%9.8%25.4 min▲ 40.5%
2023270M69M$3.88B$14.3778%9.5%25.3 min▲ 49.2%
2024320M87M$4.82B$15.0678%8.3%28.4 min▲ 24.2%
2025 (est.)358M101M$5.90B$16.4876%7.8%30.1 min▲ 22.4%
User Growth Analysis
Acquisition, retention & demographic breakdown · 2020–2025
New Users (2024)
72M
▲ 22.0% vs 2023
🆕
30-Day Retention
68.5%
▲ 3.2pp vs 2023
🔄
Power Users (Weekly)
41.3M
▲ 15.6% vs 2023
Creator Accounts
4.2M
▲ 28.0% vs 2023
🎨
Dormant Users
38M
▼ Re-engagement needed
😴
Organic Acquisition
61%
▲ 4pp vs 2023
🌱
MAU vs DAU Trend (2020–2025)
Monthly & Daily Active Users — Annual snapshots (Millions)
User Acquisition Channels (2024)
How new users discover SocialFlow
User Age Group Distribution (2020 vs 2024)
Demographic shift — platform maturing audience
User Cohort Retention (Month 1 → Month 12)
2022–2024 Signup Cohorts
CohortMonth 1Month 2Month 3Month 6Month 9Month 12
Q1 2022 100% 72% 61% 48% 38% 31%
Q3 2022 100% 74% 63% 50% 40% 33%
Q1 2023 100% 76% 65% 53% 43% 36%
Q3 2023 100% 78% 67% 55% 45% 38%
Q1 2024 100% 80% 69% 57%
Revenue & Monetization
Ad revenue, subscriptions, ARPU & advertiser analysis · 2020–2025
Ad Revenue (2024)
$3.76B
▲ 21.8% vs 2023
📣
Subscription Revenue
$724M
▲ 41.0% vs 2023
Premium Features
$336M
▲ 18.3% vs 2023
🚀
Cost Per 1K Impressions
$5.82
▲ 8.2% vs 2023
📊
Premium Subscribers
9.2M
▲ 36.0% vs 2023
👑
Ad Spend Concentration
Top 100
= 42% of ad rev
🎯
Revenue Insight: Ad revenue remains the core at 78% of total, but subscription revenue is the fastest-growing segment at +41% YoY. The shift toward subscriptions reduces dependency on ad markets. North America contributes 48% of ad revenue despite being only 22% of MAU — indicating 3.6× monetization efficiency vs other regions. Video ads carry a 62% premium CPM over static image ads.
Revenue by Stream — 2020–2025 (Stacked)
Ad Revenue · Subscriptions · Premium Features (USD Billions)
ARPU by Region (2024)
Average Revenue Per User — geographic monetization gap
Ad Revenue by Content Format
Which content type drives the most ad revenue (2024)
Top Advertiser Industries (2024)
By Ad Spend
IndustryTotal SpendImpressionsAvg CPMClick RateYoY ChangeTier
E-Commerce / Retail$812M139.6B$5.822.41%▲ 28%Premium
Consumer Tech$624M96.1B$6.491.92%▲ 19%Premium
Financial Services$518M68.3B$7.581.38%▲ 22%Premium
Entertainment / Media$441M81.7B$5.403.12%▲ 35%Standard
Healthcare / Pharma$328M43.7B$7.511.05%▲ 41%Standard
Automotive$291M44.0B$6.611.18%▶ 2%Standard
Travel & Hospitality$248M39.4B$6.302.88%▲ 52%Standard
Education / EdTech$186M35.5B$5.242.65%▲ 48%Self-Serve
Engagement & Content Analysis
Content performance, creator ecosystem & interaction patterns · 2024
Avg Engagement Rate
4.82%
▲ 0.6pp vs 2023
❤️
Daily Posts Created
48.3M
▲ 31% vs 2023
✏️
Video Watch Time
2.1B hrs/day
▲ 44% vs 2023
▶️
Viral Content (>1M views)
12,400/day
▲ 22% vs 2023
🔥
Creator Revenue Share
$480M
▲ 60% vs 2023
🎨
Avg Comments/Post
23.1
▲ 18% vs 2023
💬
Engagement Rate by Content Type
Likes + Comments + Shares / Impressions · 2024
Content Type Distribution
Share of all posts created on platform (2024)
Engagement Trend by Content Type (2020–2024)
Reels/Short-video vs Photos vs Stories — engagement rate over time
Creator Tier Performance (2024)
By Follower Count
Creator TierFollowersCountAvg Eng. RateAvg CPMAvg Posts/WeekRevenue Gen.
Mega> 1M8,2002.1%$12.404.2$218M
Macro100K–1M142,0003.8%$8.606.1$312M
Micro10K–100K1,840,0006.4%$5.208.3$198M
Nano1K–10K2,210,0008.9%$3.1011.2$82M
Platform & Geographic Analysis
Device breakdown, OS split, top countries & regional monetization · 2024
iOS Share
38.2%
of all sessions
🍎
Android Share
49.6%
▲ 2.1pp vs 2023
🤖
Web / Desktop
12.2%
▼ 1.8pp vs 2023
💻
North America Users
70.4M
▲ 8.2% vs 2023
🌎
SEA Users (Fastest)
88.3M
▲ 34.1% vs 2023
🌏
Countries Active
148
▲ 12 new markets
🌍
Platform / Device Mix (2024)
Session share by device type
Top 10 Countries by MAU (2024)
Monthly Active Users — Millions
Mobile Platform Trend (2020–2025)
iOS vs Android session share — annual evolution
Regional Performance Summary (2024)
MAU + Revenue + ARPU
RegionMAURevenueARPUAvg SessionDAU/MAUGrowth YoY
North America70.4M$2,316M$32.9031.2 min34.1%▲ 8.2%
Europe62.1M$964M$15.5229.4 min28.6%▲ 11.4%
Southeast Asia88.3M$718M$8.1333.7 min31.2%▲ 34.1%
Latin America54.8M$398M$7.2627.1 min26.3%▲ 22.5%
Middle East & Africa28.6M$214M$7.4824.8 min22.4%▲ 41.2%
South Asia15.8M$112M$7.0926.3 min24.1%▲ 18.9%
Data Schema
BigQuery star schema — SocialFlow data warehouse · 100,000 session rows
Schema Design: The data warehouse follows a star schema pattern with fact_user_sessions as the central fact table (100K rows, 2020–2025) surrounded by 4 dimension tables. Each session row represents one user visit. Revenue is tracked at session level (ad impressions × CPM + subscription allocation). This design enables efficient BigQuery partitioning by session_date and clustering by country and platform.
📋
fact_user_sessions
FACT TABLE · 100,000 rows
🔑session_idSTRING
user_idSTRING
content_idSTRING
session_dateDATE
yearINT64
monthINT64
quarterSTRING
platformSTRING
countrySTRING
regionSTRING
session_duration_minFLOAT64
pages_viewedINT64
content_typeSTRING
action_typeSTRING
ad_impressionsINT64
ad_revenue_usdFLOAT64
subscription_rev_usdFLOAT64
total_revenue_usdFLOAT64
👤
dim_users
DIMENSION TABLE · ~75K users
🔑user_idSTRING
signup_dateDATE
countrySTRING
age_groupSTRING
genderSTRING
subscription_typeSTRING
is_creatorBOOLEAN
creator_tierSTRING
lifetime_value_usdFLOAT64
acquisition_channelSTRING
is_activeBOOLEAN
days_since_last_loginINT64
🎬
dim_content
DIMENSION TABLE · ~2M content items
🔑content_idSTRING
creator_idSTRING
categorySTRING
content_typeSTRING
created_dateDATE
total_viewsINT64
total_likesINT64
total_sharesINT64
total_commentsINT64
duration_secondsINT64
is_sponsoredBOOLEAN
📣
dim_advertisers
DIMENSION TABLE · ~8K advertisers
🔑advertiser_idSTRING
advertiser_nameSTRING
industrySTRING
countrySTRING
tierSTRING
annual_spend_usdFLOAT64
contract_startDATE
is_activeBOOLEAN
BigQuery SQL — Business Questions
15 production-grade queries · Basic → Intermediate → Advanced · Google BigQuery syntax
Filter:
B1
Monthly Active Users by Year (2020–2025)
Count distinct users who had at least one session per month
Basic
Business Question: How has our MAU grown year-over-year? Are we accelerating or decelerating user growth? This metric is the single most watched number by investors and product teams.
-- Monthly Active Users by Year and Month
SELECT
  year,
  month,
  FORMAT_DATE('%Y-%m', DATE(year, month, 1))    AS year_month,
  COUNT(DISTINCT user_id)                         AS monthly_active_users,
  LAG(COUNT(DISTINCT user_id)) OVER (
    ORDER BY year, month
  )                                               AS prev_month_mau,
  ROUND(
    (COUNT(DISTINCT user_id) - LAG(COUNT(DISTINCT user_id))
       OVER (ORDER BY year, month))
    / NULLIF(LAG(COUNT(DISTINCT user_id))
       OVER (ORDER BY year, month), 0) * 100, 2
  )                                               AS mom_growth_pct
FROM `socialflow.analytics.fact_user_sessions`
WHERE year BETWEEN 2020 AND 2025
GROUP BY year, month
ORDER BY year, month;
B2
Total Revenue by Year and Revenue Stream
Break down revenue into ad, subscription, and premium components
Basic
Business Question: How is our revenue diversifying over time? Investors want to see subscription revenue growing as a share of total to reduce ad market volatility risk.
-- Annual Revenue Breakdown by Stream
SELECT
  year,
  ROUND(SUM(ad_revenue_usd) / 1e9, 3)           AS ad_revenue_billion,
  ROUND(SUM(subscription_rev_usd) / 1e9, 3)     AS subscription_revenue_billion,
  ROUND(SUM(total_revenue_usd) / 1e9, 3)        AS total_revenue_billion,
  ROUND(
    SUM(ad_revenue_usd) / NULLIF(SUM(total_revenue_usd), 0) * 100, 1
  )                                               AS ad_revenue_share_pct,
  ROUND(
    SUM(subscription_rev_usd) / NULLIF(SUM(total_revenue_usd), 0) * 100, 1
  )                                               AS subscription_share_pct
FROM `socialflow.analytics.fact_user_sessions`
WHERE year BETWEEN 2020 AND 2025
GROUP BY year
ORDER BY year;
B3
Top 10 Countries by Total Sessions and Revenue
Identify the highest-value markets for resource allocation
Basic
Business Question: Where should we focus sales & marketing investment? Which markets have the best monetization efficiency (revenue per session)?
-- Top 10 Countries: Sessions, Revenue, ARPU
SELECT
  country,
  region,
  COUNT(*)                                        AS total_sessions,
  COUNT(DISTINCT user_id)                         AS unique_users,
  ROUND(SUM(total_revenue_usd), 2)               AS total_revenue_usd,
  ROUND(
    SUM(total_revenue_usd) / NULLIF(COUNT(DISTINCT user_id), 0), 2
  )                                               AS arpu_usd,
  ROUND(AVG(session_duration_min), 1)            AS avg_session_min
FROM `socialflow.analytics.fact_user_sessions`
WHERE year = 2024
GROUP BY country, region
ORDER BY total_revenue_usd DESC
LIMIT 10;
B4
Platform (iOS/Android/Web) Usage Split by Year
Track mobile dominance and web decline over time
Basic
Business Question: Are we investing appropriately in mobile vs web? How has the platform mix shifted? This drives our engineering roadmap prioritization.
-- Platform Session Share by Year
SELECT
  year,
  platform,
  COUNT(*)                                        AS sessions,
  ROUND(
    COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (PARTITION BY year), 2
  )                                               AS session_share_pct,
  ROUND(AVG(session_duration_min), 2)            AS avg_duration_min,
  ROUND(AVG(ad_revenue_usd), 4)                  AS avg_ad_rev_per_session
FROM `socialflow.analytics.fact_user_sessions`
GROUP BY year, platform
ORDER BY year, sessions DESC;
B5
Content Type Performance — Engagement & Revenue
Compare Video, Photo, Story, Reel, Live across key metrics
Basic
Business Question: Which content types should we invest in algorithmically promoting? Short-form video is hypothesized to drive both engagement and ad revenue — confirm with data.
-- Content Type: Session Volume, Duration, Revenue
SELECT
  content_type,
  COUNT(*)                                        AS total_sessions,
  ROUND(AVG(session_duration_min), 2)            AS avg_session_min,
  ROUND(AVG(pages_viewed), 1)                    AS avg_pages_viewed,
  ROUND(AVG(ad_impressions), 1)                  AS avg_ad_impressions,
  ROUND(SUM(ad_revenue_usd), 2)                  AS total_ad_revenue,
  ROUND(
    SUM(ad_revenue_usd) * 100.0 / SUM(SUM(ad_revenue_usd)) OVER (), 2
  )                                               AS revenue_share_pct
FROM `socialflow.analytics.fact_user_sessions`
WHERE year = 2024
GROUP BY content_type
ORDER BY total_ad_revenue DESC;
I1
7-Day Rolling Average DAU with Anomaly Flag
Smooth daily volatility and flag >2σ deviations
Intermediate
Business Question: Is our DAU trend healthy or hiding volatility? Sudden drops may indicate app store issues, negative press, or algorithmic problems. We need an automated alert signal.
-- 7-Day Rolling DAU with Anomaly Detection
WITH daily_active AS (
  SELECT
    session_date,
    COUNT(DISTINCT user_id)   AS dau
  FROM `socialflow.analytics.fact_user_sessions`
  GROUP BY session_date
),
rolling AS (
  SELECT
    session_date,
    dau,
    AVG(dau) OVER (
      ORDER BY session_date
      ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    )                         AS dau_7d_avg,
    STDDEV(dau) OVER (
      ORDER BY session_date
      ROWS BETWEEN 29 PRECEDING AND CURRENT ROW
    )                         AS dau_30d_stddev,
    AVG(dau) OVER (
      ORDER BY session_date
      ROWS BETWEEN 29 PRECEDING AND CURRENT ROW
    )                         AS dau_30d_avg
  FROM daily_active
)
SELECT
  session_date,
  dau,
  ROUND(dau_7d_avg, 0)        AS dau_7d_rolling_avg,
  CASE
    WHEN ABS(dau - dau_30d_avg) > 2 * dau_30d_stddev THEN 'ANOMALY'
    WHEN ABS(dau - dau_30d_avg) > 1.5 * dau_30d_stddev THEN 'WARNING'
    ELSE 'NORMAL'
  END                         AS anomaly_flag
FROM rolling
ORDER BY session_date DESC;
I2
User Segmentation by RFM (Recency · Frequency · Monetization)
Score users 1–5 on each dimension and assign behavioral segments
Intermediate
Business Question: Who are our Champions vs At-Risk users? RFM segmentation lets the marketing team run targeted re-engagement campaigns for each segment rather than blasting everyone.
-- RFM Segmentation
WITH user_metrics AS (
  SELECT
    user_id,
    DATE_DIFF(
      DATE '2025-01-01',
      MAX(session_date), DAY
    )                         AS recency_days,
    COUNT(DISTINCT session_date) AS frequency,
    SUM(total_revenue_usd)    AS monetary
  FROM `socialflow.analytics.fact_user_sessions`
  WHERE year BETWEEN 2024 AND 2025
  GROUP BY user_id
),
rfm_scores AS (
  SELECT
    user_id,
    recency_days,
    frequency,
    ROUND(monetary, 2)        AS monetary,
    NTILE(5) OVER (ORDER BY recency_days DESC) AS r_score,
    NTILE(5) OVER (ORDER BY frequency)         AS f_score,
    NTILE(5) OVER (ORDER BY monetary)          AS m_score
  FROM user_metrics
)
SELECT
  user_id,
  r_score,
  f_score,
  m_score,
  (r_score + f_score + m_score)                 AS rfm_total,
  CASE
    WHEN r_score >= 4 AND f_score >= 4           THEN 'Champion'
    WHEN r_score >= 3 AND f_score >= 3           THEN 'Loyal User'
    WHEN r_score >= 4 AND f_score <= 2           THEN 'New User'
    WHEN r_score <= 2 AND f_score >= 4           THEN 'At Risk'
    WHEN r_score <= 2 AND f_score <= 2           THEN 'Lost User'
    ELSE 'Needs Attention'
  END                                             AS user_segment
FROM rfm_scores
ORDER BY rfm_total DESC;
I3
ARPU Trend by Subscription Tier and Region (2022–2024)
Track monetization efficiency across tiers over 3 years
Intermediate
Business Question: Are Premium subscribers becoming more or less valuable over time? Is the pricing strategy working — are we leaving money on the table in high-ARPU regions?
-- ARPU by Subscription Tier and Region
SELECT
  f.year,
  f.region,
  u.subscription_type,
  COUNT(DISTINCT f.user_id)                       AS user_count,
  ROUND(SUM(f.total_revenue_usd), 2)             AS total_revenue,
  ROUND(
    SUM(f.total_revenue_usd) / NULLIF(COUNT(DISTINCT f.user_id), 0), 2
  )                                               AS arpu_usd,
  ROUND(
    SUM(f.total_revenue_usd) / NULLIF(COUNT(DISTINCT f.user_id), 0)
    - LAG(
        SUM(f.total_revenue_usd) / NULLIF(COUNT(DISTINCT f.user_id), 0)
      ) OVER (
        PARTITION BY f.region, u.subscription_type
        ORDER BY f.year
      ), 2
  )                                               AS arpu_yoy_change
FROM `socialflow.analytics.fact_user_sessions` f
JOIN `socialflow.analytics.dim_users` u
  ON f.user_id = u.user_id
WHERE f.year BETWEEN 2022 AND 2024
GROUP BY f.year, f.region, u.subscription_type
ORDER BY f.year, f.region, arpu_usd DESC;
I4
Weekly Retention: D1 / D7 / D30 by Acquisition Channel
Which acquisition channels bring the highest-quality users?
Intermediate
Business Question: Our paid marketing team spends $180M/year acquiring users. But high install count ≠ high quality. Organic users may retain 2× better. We need retention-adjusted CAC by channel.
-- D1 / D7 / D30 Retention by Acquisition Channel
WITH user_first_session AS (
  SELECT
    f.user_id,
    u.acquisition_channel,
    MIN(f.session_date)       AS first_session_date
  FROM `socialflow.analytics.fact_user_sessions` f
  JOIN `socialflow.analytics.dim_users` u
    ON f.user_id = u.user_id
  WHERE f.year = 2024
  GROUP BY f.user_id, u.acquisition_channel
),
retention_base AS (
  SELECT
    ufs.user_id,
    ufs.acquisition_channel,
    ufs.first_session_date,
    MAX(CASE
      WHEN DATE_DIFF(f.session_date, ufs.first_session_date, DAY) BETWEEN 1 AND 1
      THEN 1 ELSE 0 END)      AS retained_d1,
    MAX(CASE
      WHEN DATE_DIFF(f.session_date, ufs.first_session_date, DAY) BETWEEN 6 AND 8
      THEN 1 ELSE 0 END)      AS retained_d7,
    MAX(CASE
      WHEN DATE_DIFF(f.session_date, ufs.first_session_date, DAY) BETWEEN 28 AND 32
      THEN 1 ELSE 0 END)      AS retained_d30
  FROM user_first_session ufs
  JOIN `socialflow.analytics.fact_user_sessions` f
    ON ufs.user_id = f.user_id
  GROUP BY ufs.user_id, ufs.acquisition_channel, ufs.first_session_date
)
SELECT
  acquisition_channel,
  COUNT(*)                                        AS new_users,
  ROUND(AVG(retained_d1) * 100, 1)              AS d1_retention_pct,
  ROUND(AVG(retained_d7) * 100, 1)              AS d7_retention_pct,
  ROUND(AVG(retained_d30) * 100, 1)             AS d30_retention_pct
FROM retention_base
GROUP BY acquisition_channel
ORDER BY d30_retention_pct DESC;
I5
Quarterly Revenue Seasonality Index
How much does Q4 outperform the annual average?
Intermediate
Business Question: Ad revenue spikes in Q4 due to holiday spending. Finance needs seasonality factors to set quarterly targets correctly — otherwise Q1 always looks like underperformance.
-- Quarterly Seasonality Index
WITH quarterly_rev AS (
  SELECT
    year,
    quarter,
    SUM(total_revenue_usd)    AS quarterly_revenue
  FROM `socialflow.analytics.fact_user_sessions`
  WHERE year BETWEEN 2021 AND 2024
  GROUP BY year, quarter
),
annual_avg AS (
  SELECT
    year,
    AVG(quarterly_revenue)    AS avg_quarterly_revenue
  FROM quarterly_rev
  GROUP BY year
)
SELECT
  q.year,
  q.quarter,
  ROUND(q.quarterly_revenue / 1e6, 2)            AS revenue_million,
  ROUND(
    q.quarterly_revenue / a.avg_quarterly_revenue, 3
  )                                               AS seasonality_index,
  CASE
    WHEN q.quarterly_revenue / a.avg_quarterly_revenue > 1.1 THEN 'Peak'
    WHEN q.quarterly_revenue / a.avg_quarterly_revenue < 0.9 THEN 'Trough'
    ELSE 'Normal'
  END                                             AS season_label
FROM quarterly_rev q
JOIN annual_avg a ON q.year = a.year
ORDER BY q.year, q.quarter;
A1
Cohort LTV Analysis — 12-Month Lifetime Value by Signup Year
Calculate cumulative revenue generated per user over 12 months post-signup
Advanced
Business Question: What is the true LTV of users acquired in 2022 vs 2023? We need this to set maximum CAC thresholds for paid acquisition and determine payback period for each cohort.
-- 12-Month Cohort LTV Analysis
WITH user_cohorts AS (
  SELECT
    u.user_id,
    DATE_TRUNC(u.signup_date, YEAR)               AS cohort_year,
    u.signup_date
  FROM `socialflow.analytics.dim_users` u
  WHERE u.signup_date IS NOT NULL
),
monthly_revenue AS (
  SELECT
    f.user_id,
    DATE_DIFF(
      DATE_TRUNC(f.session_date, MONTH),
      DATE_TRUNC(uc.signup_date, MONTH),
      MONTH
    )                                             AS months_since_signup,
    SUM(f.total_revenue_usd)                      AS monthly_rev
  FROM `socialflow.analytics.fact_user_sessions` f
  JOIN user_cohorts uc ON f.user_id = uc.user_id
  WHERE DATE_DIFF(f.session_date, uc.signup_date, MONTH) BETWEEN 0 AND 11
  GROUP BY f.user_id, months_since_signup
),
cumulative_ltv AS (
  SELECT
    uc.cohort_year,
    mr.months_since_signup,
    COUNT(DISTINCT mr.user_id)                    AS active_users,
    ROUND(SUM(mr.monthly_rev), 2)                 AS total_rev_at_month,
    ROUND(
      SUM(SUM(mr.monthly_rev)) OVER (
        PARTITION BY uc.cohort_year
        ORDER BY mr.months_since_signup
      ) / COUNT(DISTINCT uc.user_id), 2
    )                                             AS cumulative_ltv_per_user
  FROM user_cohorts uc
  JOIN monthly_revenue mr ON uc.user_id = mr.user_id
  GROUP BY uc.cohort_year, mr.months_since_signup
)
SELECT
  cohort_year,
  months_since_signup,
  active_users,
  cumulative_ltv_per_user
FROM cumulative_ltv
ORDER BY cohort_year, months_since_signup;
A2
Pareto Analysis — Top Creators Driving 80% of Engagement
Which 20% of creators generate 80% of platform engagement?
Advanced
Business Question: The creator economy is highly concentrated. Identifying the top 20% lets us build a VIP creator partnership program, prioritize support resources, and prevent high-value creator churn.
-- Pareto: Top Creators by Engagement (80/20 Rule)
WITH creator_engagement AS (
  SELECT
    c.creator_id,
    u.creator_tier,
    SUM(c.total_views)                            AS total_views,
    SUM(c.total_likes + c.total_shares + c.total_comments)
                                                  AS total_interactions,
    COUNT(c.content_id)                           AS content_pieces
  FROM `socialflow.analytics.dim_content` c
  JOIN `socialflow.analytics.dim_users` u
    ON c.creator_id = u.user_id
  WHERE u.is_creator = TRUE
    AND EXTRACT(YEAR FROM c.created_date) = 2024
  GROUP BY c.creator_id, u.creator_tier
),
ranked AS (
  SELECT
    creator_id,
    creator_tier,
    total_interactions,
    content_pieces,
    ROUND(total_interactions * 1.0 /
      SUM(total_interactions) OVER () * 100, 4)   AS interaction_share_pct,
    RANK() OVER (ORDER BY total_interactions DESC) AS engagement_rank,
    COUNT(*) OVER ()                               AS total_creators
  FROM creator_engagement
),
cumulative AS (
  SELECT
    *,
    SUM(interaction_share_pct) OVER (
      ORDER BY engagement_rank
      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    )                                             AS cumulative_share_pct,
    ROUND(engagement_rank * 100.0 / total_creators, 1) AS creator_percentile
  FROM ranked
)
SELECT
  creator_id,
  creator_tier,
  engagement_rank,
  creator_percentile,
  ROUND(total_interactions, 0)                    AS total_interactions,
  ROUND(interaction_share_pct, 4)                AS share_pct,
  ROUND(cumulative_share_pct, 2)                 AS cumulative_share_pct,
  CASE WHEN cumulative_share_pct <= 80 THEN 'Top 20%' ELSE 'Long Tail' END
                                                  AS pareto_group
FROM cumulative
ORDER BY engagement_rank
LIMIT 200;
A3
User Session Funnel — View → Like → Share → Comment → Post
Calculate drop-off at each engagement action step
Advanced
Business Question: Where do users drop off in the engagement funnel? If only 3% of viewers comment, can product changes move this needle? Every +1pp on share rate = significant viral coefficient improvement.
-- Engagement Funnel Analysis
WITH action_counts AS (
  SELECT
    year,
    quarter,
    COUNTIF(action_type IN ('View','Like','Share','Comment','Post')) AS total_sessions,
    COUNTIF(action_type IN ('Like','Share','Comment','Post'))        AS liked_or_more,
    COUNTIF(action_type IN ('Share','Comment','Post'))               AS shared_or_more,
    COUNTIF(action_type IN ('Comment','Post'))                      AS commented_or_more,
    COUNTIF(action_type = 'Post')                                   AS posted
  FROM `socialflow.analytics.fact_user_sessions`
  WHERE year BETWEEN 2022 AND 2024
  GROUP BY year, quarter
)
SELECT
  year,
  quarter,
  total_sessions                                  AS step1_views,
  liked_or_more                                   AS step2_likes,
  shared_or_more                                  AS step3_shares,
  commented_or_more                               AS step4_comments,
  posted                                          AS step5_posts,
  ROUND(liked_or_more     * 100.0 / NULLIF(total_sessions, 0), 1)   AS like_rate_pct,
  ROUND(shared_or_more    * 100.0 / NULLIF(total_sessions, 0), 1)   AS share_rate_pct,
  ROUND(commented_or_more * 100.0 / NULLIF(total_sessions, 0), 1)   AS comment_rate_pct,
  ROUND(posted            * 100.0 / NULLIF(total_sessions, 0), 1)   AS post_rate_pct
FROM action_counts
ORDER BY year, quarter;
A4
Churn Prediction Features — 30-Day Behavioral Signals
Engineer features for an ML churn model using session patterns
Advanced
Business Question: Can we predict which users will churn before they do? This feature table feeds into our ML pipeline (XGBoost classifier). Proactive retention campaigns triggered 2 weeks early reduce churn by ~18%.
-- Churn Prediction Feature Engineering
WITH user_activity AS (
  SELECT
    user_id,
    -- Recency signals
    DATE_DIFF(DATE '2025-01-01', MAX(session_date), DAY)  AS days_since_last_session,
    -- Frequency signals
    COUNT(DISTINCT session_date)                           AS active_days_l30,
    COUNT(*)                                               AS total_sessions_l30,
    -- Engagement depth
    ROUND(AVG(session_duration_min), 2)                   AS avg_session_duration,
    ROUND(AVG(pages_viewed), 2)                           AS avg_pages_per_session,
    -- Content affinity
    COUNTIF(content_type = 'Video') * 1.0 / COUNT(*)     AS video_session_ratio,
    COUNTIF(action_type = 'Post') * 1.0 / COUNT(*)       AS posting_ratio,
    -- Revenue
    ROUND(SUM(total_revenue_usd), 4)                      AS revenue_l30,
    -- Trend: compare last 7 vs prior 23 days
    COUNTIF(session_date >= DATE_SUB(DATE '2025-01-01', INTERVAL 7 DAY))
                                                           AS sessions_last7d,
    COUNTIF(session_date < DATE_SUB(DATE '2025-01-01', INTERVAL 7 DAY))
                                                           AS sessions_prior23d
  FROM `socialflow.analytics.fact_user_sessions`
  WHERE session_date BETWEEN
    DATE_SUB(DATE '2025-01-01', INTERVAL 30 DAY)
    AND DATE '2025-01-01'
  GROUP BY user_id
)
SELECT
  ua.*,
  u.subscription_type,
  u.acquisition_channel,
  u.creator_tier,
  u.is_creator,
  -- Derived churn signal
  ROUND(sessions_last7d * 23.0 /
    NULLIF(sessions_prior23d * 7.0, 0), 3)                AS activity_trend_ratio,
  CASE
    WHEN days_since_last_session > 14 AND sessions_last7d = 0 THEN 1
    ELSE 0
  END                                                       AS churn_label
FROM user_activity ua
JOIN `socialflow.analytics.dim_users` u
  ON ua.user_id = u.user_id;
A5
Ad Revenue Attribution — CPM Optimization by Targeting Cohort
Which user segment × content type combination yields highest CPM?
Advanced
Business Question: Our ad team needs to price inventory intelligently. A 25–34 yr old Premium user watching a Reel commands a very different CPM than a Free user viewing a Story. This matrix drives dynamic floor pricing.
-- CPM Optimization Matrix: User Segment × Content Type
WITH session_cpm AS (
  SELECT
    f.user_id,
    f.content_type,
    f.platform,
    u.age_group,
    u.subscription_type,
    u.is_creator,
    f.ad_impressions,
    f.ad_revenue_usd,
    SAFE_DIVIDE(f.ad_revenue_usd, f.ad_impressions) * 1000 AS cpm
  FROM `socialflow.analytics.fact_user_sessions` f
  JOIN `socialflow.analytics.dim_users` u
    ON f.user_id = u.user_id
  WHERE f.year = 2024
    AND f.ad_impressions > 0
),
cpm_matrix AS (
  SELECT
    age_group,
    subscription_type,
    content_type,
    platform,
    COUNT(*)                                      AS session_count,
    SUM(ad_impressions)                           AS total_impressions,
    ROUND(SUM(ad_revenue_usd), 2)                AS total_ad_revenue,
    ROUND(AVG(cpm), 4)                           AS avg_cpm,
    ROUND(PERCENTILE_CONT(cpm, 0.5) OVER (
      PARTITION BY age_group, subscription_type, content_type, platform
    ), 4)                                        AS median_cpm,
    ROUND(STDDEV(cpm), 4)                        AS cpm_stddev
  FROM session_cpm
  GROUP BY age_group, subscription_type, content_type, platform
)
SELECT
  *,
  RANK() OVER (ORDER BY avg_cpm DESC)            AS cpm_rank
FROM cpm_matrix
WHERE session_count >= 100          -- statistical significance filter
ORDER BY avg_cpm DESC
LIMIT 50;

mutmainkalb sitesinden daha fazla şey keşfedin

Okumaya devam etmek ve tüm arşive erişim kazanmak için hemen abone olun.

Okumaya Devam Edin

mutmainkalb sitesinden daha fazla şey keşfedin

Okumaya devam etmek ve tüm arşive erişim kazanmak için hemen abone olun.

Okumaya Devam Edin