JOIN ME IN SOLVING A META/FACEBOOK SQL INTERVIEW QUESTION

Adekolu Peter
4 min readJul 25, 2023

--

Greetings, my fellow SQLNinjas! Once again, I’m thrilled to join you on another exciting journey to tackle an intriguing problem, this time with T-SQL. Oh, but there’s a twist! We’re diving into a challenge from the big leagues — Meta/Facebook, as presented by Stratascratch.

Our problem is to calculate each user’s average session time, defined as the time difference between a page_load and page_exit. Don’t fret; we’ll break it down into manageable steps, and soon it’ll all make perfect sense. Imagine yourself as a user loading a webpage for the first time, spending a few moments reading the content, and then gracefully exiting the page.

Now, picture this: you loaded the page at a specific time, right? Later, you exited the page at another time. In this problem, a session is precisely the time difference between your exit and load events. Simple enough! But hold on, there’s more. A user can only have one session per day, even if they visit the page multiple times. So, we’ll consider only the earliest load time and the latest exit time for each day as the user’s session.

With that understanding, let’s move on to the exciting part — solving the problem! We’ll be working with a table called facebook_web_log, containing user_id, timestamp, and action columns that indicate page_load and page_exit events. Our goal is to find the average session time for each user.

Sample Table Preview:

Imagine our output sample showcasing the user_id and their average session time in seconds:

user_id | Average_Session_Time (seconds)
------- | ------------------------
0 | 750
1 | 40

To get here, we’ll follow a logical flow, breaking the problem down step-by-step:

  1. Find the user’s session time for each day.
  2. Calculate the average session time for each user.

In our facebook_web_log table, we'll create two separate tables using common table expressions (CTEs) to help us along the way. The first one, PAGE_LOAD_TABLE,will store the maximum page_load time for each user per day using the MAX() function. Similarly, PAGE_EXIT_TABLE will hold the earliest page_exit time for each user per day using the MIN() function.

We can create these tables using the following T-SQL code:

    --FOR THE PAGE LOAD TABLE

WITH PAGE_LOAD_TABLE AS (

SELECT user_id, part_date, max(timestamp) as max_page_load
from
(SELECT user_id , CAST(timestamp as DATE) as part_date, timestamp
from facebook_web_log
where action = 'page_load'
group by user_id, timestamp) pl_table
group by user_id, part_date

),

--FOR THE PAGE EXIT TABLE

PAGE_EXIT_TABLE AS (
(SELECT user_id, part_date, min(timestamp) as min_page_exit
from
(SELECT user_id, CAST(timestamp as date) as part_date,
timestamp
from facebook_web_log
where action = 'page_exit'
group by user_id, timestamp ) pe_table
group by user_id, part_date )
)

Finally, to obtain the user’s session time for each day, we’ll combine these derived tables using a join on user_id and part_date. This allows us to access all necessary columns in one table and easily find the time difference between the min_page_exit column and max_page_loadcolumn.

As we’re almost done, we’ll make use of the AVG() function to calculate the average session time for each user, which can be created using the codes below:

--CALCULATING THE USER-AVERAGE SESSION(DIFFERENCE OF page_load and page_exit) 

SELECT pl.user_id,AVG(CAST(DATEDIFF(second, pl.max_page_load, pe.min_page_exit) AS FLOAT))
FROM page_load_table pl
join page_exit_table pe
on pl.user_id = pe.user_id
and pl.part_date = pe.part_date
group by pl.user_id

Here’s the final code snippet that brings it all together and there you have it the final solution to our problem — the average session time per user.

--FOR THE PAGE LOAD TABLE

WITH PAGE_LOAD_TABLE AS (

SELECT user_id, part_date, max(timestamp) as max_page_load
from
(SELECT user_id , CAST(timestamp as DATE) as part_date, timestamp
from facebook_web_log
where action = 'page_load'
group by user_id, timestamp) pl_table
group by user_id, part_date
),

--FOR THE PAGE EXIT TABLE

PAGE_EXIT_TABLE AS (
(SELECT user_id, part_date, min(timestamp) as min_page_exit
from
(SELECT user_id, CAST(timestamp as date) as part_date, timestamp
from facebook_web_log
where action = 'page_exit'
group by user_id, timestamp ) pe_table
group by user_id, part_date )
)

--CALCULATING THE USER-AVERAGE SESSION(DIFFERENCE OF page_load and page_exit)

SELECT pl.user_id,AVG(CAST(DATEDIFF(second, pl.max_page_load, pe.min_page_exit) AS FLOAT))
FROM page_load_table pl
join page_exit_table pe
on pl.user_id = pe.user_id
and pl.part_date = pe.part_date
group by pl.user_id

I hope you enjoyed our problem-solving approach as much as I did share it with you. But wait, I’m eager to hear what you think! Feel free to access the question here (if you already have a Stratascratch account) and give it a try yourself.
If you have any questions, suggestions, or tips to share, I’d be delighted to connect with you on LinkedIn or Gmail. Your feedback means a lot to me, and together, we can learn and grow.

--

--

Adekolu Peter
Adekolu Peter

No responses yet