- Newest
- Most votes
- Most comments
Hola.
I've not worked out an answer for you, but I do want to reassure you about performed.
I think (although I've not tested, but I will be soon) that when the sort order of the table matches the partition and ordering of the window function, the window function is in effect free.
Once you do work out which window function and how to use it, performance will not be a problem.
Your process can be completed in Redshift using "sql sessionization" techniques. Essentially you use a LAG() statement to find the start and end of each "session" (on/off period) and then aggregate over the session.
I recommend using temporary or staging tables for the phases of the sessionization. Breaking the logic into simpler pieces usually makes it easier to follow. Make sure to declare the dist key as the window partition column and the sort key as the partition column followed by the session timestamp. This will speed up the final calculation.
Several detailed articles on this technique:
• https://sonra.io/2017/08/14/redshifts-window-functions-advanced-use-case-sessionization/
• https://www.dataiku.com/learn/guide/code/reshaping_data/sessionization.html
• https://blog.modeanalytics.com/finding-user-sessions-sql/
Worked example:
DROP TABLE IF EXISTS test_data
;
CREATE TEMP TABLE test_data (status CHAR(3), macaddress VARCHAR(32), event_ts TIMESTAMP)
DISTKEY (macaddress) SORTKEY (macaddress, event_ts)
;
INSERT INTO test_data (status, macaddress, event_ts)
SELECT 'ON' , '606405abb378', '2019-04-22 06:00:00'::TIMESTAMP
UNION ALL SELECT 'OFF', '606405abb378', '2019-04-22 06:20:00'::TIMESTAMP
UNION ALL SELECT 'OFF', '606405abb378', '2019-04-22 06:40:00'::TIMESTAMP
UNION ALL SELECT 'OFF', '606405abb378', '2019-04-22 07:00:00'::TIMESTAMP
UNION ALL SELECT 'ON' , '606405abb378', '2019-04-22 07:20:00'::TIMESTAMP
UNION ALL SELECT 'ON' , '606405abb378', '2019-04-22 07:40:00'::TIMESTAMP
UNION ALL SELECT 'ON' , '606405abb378', '2019-04-22 08:00:00'::TIMESTAMP
UNION ALL SELECT 'ON' , '606405abb378', '2019-04-22 08:20:00'::TIMESTAMP
UNION ALL SELECT 'OFF', '606405abb378', '2019-04-22 08:40:00'::TIMESTAMP
;
--Has the status changed since the last message?
WITH status_changes AS (
SELECT macaddress
, event_ts
--We will use the timestamp of the next event as the end of this event
, NVL(LEAD(event_ts, 1) OVER (PARTITION BY macaddress ORDER BY event_ts),event_ts) next_event_ts
, status
, LAG(status, 1) OVER (PARTITION BY macaddress ORDER BY event_ts) lag_status
, CASE WHEN lag_status IS NULL THEN 1
WHEN lag_status <> status THEN 1
ELSE 0 END AS status_start
FROM test_data
--Generate a unique ID for each session
), status_sessions AS (
SELECT macaddress , event_ts, next_event_ts, status, lag_status, status_start
, SUM(status_start) OVER (PARTITION BY macaddress ORDER BY event_ts ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) session_id
FROM status_changes
)
--Final select
SELECT macaddress
, status
, session_id
, MIN(event_ts) start_ts
, MAX(next_event_ts) end_ts
, DATEDIFF('minute',start_ts,end_ts) session_minutes
FROM status_sessions
GROUP BY macaddress, status, session_id
ORDER BY macaddress, session_id
;
Output
macaddress | status | start_ts | end_ts | minutes
--------------+--------+------------------+------------------+---------
606405abb378 | ON | 2019-04-22 06:00 | 2019-04-22 06:20 | 20
606405abb378 | OFF | 2019-04-22 06:20 | 2019-04-22 07:20 | 60
606405abb378 | ON | 2019-04-22 07:20 | 2019-04-22 08:40 | 80
606405abb378 | OFF | 2019-04-22 08:40 | 2019-04-22 08:40 | 0
Relevant content
- asked a year ago
- AWS OFFICIALUpdated 9 months ago
- AWS OFFICIALUpdated 2 years ago
- AWS OFFICIALUpdated 2 years ago
- AWS OFFICIALUpdated 2 years ago