The Time Machine Problem
How LAG and LEAD Let You Look Backward and Forward Without Breaking Your Query
This stack is part of an ongoing series on window functions. If you are landing here for the first time, start with last week’s piece: “I Wish I Could Group and Still See the Rows.” It builds the foundation this essay stands on. Once you have read it, come back. Everything here will make immediate sense.
Last week we talked about the wish. The moment every analyst feels when GROUP BY gives you the summary and takes the rows. We gave that wish a name and built the syntax that grants it one piece at a time.
This week the wish is different. But it comes from the same place.
I was at Antara Health, sitting with a monthly engagement table that the team relied on to understand whether the chronic illness management program was actually working. The columns were straightforward: total members, active members, engagement rate, total sessions, churned members, new members. One row per month, clean and well structured. The question the team was asking was equally straightforward: is engagement improving or declining compared to last month?
Simple question. The data existed. The answer should have taken five minutes.
Instead I spent an afternoon writing this:
sql
SELECT
current.month,
current.engagement_rate,
previous.engagement_rate AS prev_engagement_rate,
current.engagement_rate - previous.engagement_rate AS mom_change
FROM monthly_engagement current
LEFT JOIN monthly_engagement previous
ON previous.month = DATE_TRUNC('month', current.month - INTERVAL '1 month')
ORDER BY current.month;

A self join. The table joined to itself. Current month on the left, previous month on the right, a date calculation threading them together. It worked. It produced the right numbers. And it was one of the most unnecessarily complicated queries I had written for a question that was, at its core, just: what was last month’s number?
I showed it to a senior analyst on the team. She looked at it for a moment and said: why didn’t you just use LAG?
I did not know what LAG was. By the end of that conversation I understood that I had built a time machine out of spare parts when SQL had a built-in one the whole time.
Let me be clear about what I am not going to do. I am not going to define LAG and LEAD the way every other article does. LAG returns the value from a previous row, LEAD returns the value from a future row, here is the syntax, good luck. That definition is accurate and completely useless until you understand exactly how each argument works and what happens to your data when you add or remove it.
What I am going to do instead is build the syntax in front of you. One argument at a time, on the same dataset, so you can see exactly what each addition does and why it exists. Same table. Same question. One new piece each time.
Here is the one idea that makes LAG and LEAD obvious.
Every row in your data exists in time. LAG lets each row reach backward and read the row behind it. LEAD lets each row reach forward and read the row ahead of it. No join required. No date arithmetic. Just a row looking over its shoulder, or around the corner, and reading what it finds.
That is it. That is the entire concept.
Think about a hospital ward where patients are checked every month and their vitals recorded in a logbook. One entry per month, in order. A nurse reading the logbook does not need to photocopy the previous page and lay it next to the current one to compare last month’s reading to this month’s. She just looks one entry back. The information is already there, in sequence, right behind the current row.
LAG is that backward glance. LEAD is looking one entry ahead to anticipate what is coming. The logbook is your table. The sequence is your ORDER BY. The glance is the function.
Self joins are what happen when you photocopy the previous page, tape it next to the current one, and carefully align the columns before you can read the comparison. It works. It is the long way around something that was always just a glance away.
Here is the dataset we will use for everything that follows. The monthly engagement table from Antara Health:
sql
SELECT * FROM monthly_engagement ORDER BY month;
month | total_members | active_members | engagement_rate | churned_members | new_members
-----------|---------------|----------------|-----------------|-----------------|------------
2023-01-01 | 850 | 612 | 0.72 | 45 | 120
2023-02-01 | 920 | 644 | 0.70 | 61 | 70
2023-03-01 | 980 | 735 | 0.75 | 38 | 60
2023-04-01 | 1020 | 785 | 0.77 | 29 | 40
2023-05-01 | 1050 | 756 | 0.72 | 72 | 30
2023-06-01 | 1100 | 814 | 0.74 | 55 | 50

One row per month. Clean. Structured. Every column meaningful. Same table throughout. Watch what changes as we build.
Understanding LAG: the full syntax before we build
LAG takes three arguments. Not one. Three. And most articles only ever show you the first one, which is exactly why most analysts never fully understand what LAG can do.
Here is the complete signature:
sql
LAG(column, offset, default) OVER (PARTITION BY ... ORDER BY ...)

Breaking it down in plain language:

column: which column to look back at
offset: how many rows back to look
default: what to return if no row exists
PARTITION BY: which rows belong together
ORDER BY: how to sequence the rows

We are going to build each argument one at a time. Same dataset. Every stage shows you exactly what that argument does to the data.
LAG Stage 1: Just the column
Start with the minimum. One argument. The column you want to look back at. No offset specified means the default is one row back. No default specified means NULL is returned when there is no previous row.
sql
SELECT
month,
engagement_rate,
LAG(engagement_rate) OVER (ORDER BY month) AS prev_engagement_rate
FROM monthly_engagement
ORDER BY month;

month | engagement_rate | prev_engagement_rate
-----------|-----------------|---------------------
2023-01-01 | 0.72 | NULL
2023-02-01 | 0.70 | 0.72
2023-03-01 | 0.75 | 0.70
2023-04-01 | 0.77 | 0.75
2023-05-01 | 0.72 | 0.77
2023-06-01 | 0.74 | 0.72

January returns NULL because there is no row behind it. It is the first entry in the logbook. Every other month reaches back exactly one row and reads the engagement rate it finds there.
The ORDER BY inside the OVER clause is what tells LAG which direction backward means. Without ORDER BY, the backward glance lands on an arbitrary row: the function has no sequence to follow, and the database is free to process rows in any order it likes. ORDER BY is not optional with LAG. It is the definition of what previous means.
Now you can compute month over month change directly:
sql
SELECT
month,
engagement_rate,
LAG(engagement_rate) OVER (ORDER BY month) AS prev_engagement_rate,
ROUND(
engagement_rate - LAG(engagement_rate) OVER (ORDER BY month),
2) AS mom_change
FROM monthly_engagement
ORDER BY month;

month | engagement_rate | prev_engagement_rate | mom_change
-----------|-----------------|----------------------|-----------
2023-01-01 | 0.72 | NULL | NULL
2023-02-01 | 0.70 | 0.72 | -0.02
2023-03-01 | 0.75 | 0.70 | +0.05
2023-04-01 | 0.77 | 0.75 | +0.02
2023-05-01 | 0.72 | 0.77 | -0.05
2023-06-01 | 0.74 | 0.72 | +0.02

The self join that took an afternoon is now four lines. Same result. One backward glance per row.
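The “same result” claim is easy to verify end to end. Here is a minimal, runnable sketch using Python’s sqlite3 module against an in-memory copy of the table (values mirror the article’s dataset; SQLite 3.25 or later is assumed for window function support, and SQLite’s DATE() modifier stands in for the Postgres-style date arithmetic):

```python
import sqlite3

# In-memory copy of the monthly engagement table (SQLite >= 3.25 for window functions).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE monthly_engagement (month TEXT, engagement_rate REAL)")
con.executemany(
    "INSERT INTO monthly_engagement VALUES (?, ?)",
    [("2023-01-01", 0.72), ("2023-02-01", 0.70), ("2023-03-01", 0.75),
     ("2023-04-01", 0.77), ("2023-05-01", 0.72), ("2023-06-01", 0.74)],
)

# The long way: the table joined to itself on "previous month".
self_join = con.execute("""
    SELECT cur.month, prev.engagement_rate
    FROM monthly_engagement cur
    LEFT JOIN monthly_engagement prev
      ON prev.month = DATE(cur.month, '-1 month')
    ORDER BY cur.month
""").fetchall()

# The glance: one LAG call, no join, no date arithmetic.
lag = con.execute("""
    SELECT month, LAG(engagement_rate) OVER (ORDER BY month)
    FROM monthly_engagement
    ORDER BY month
""").fetchall()

print(self_join == lag)  # the two approaches agree row for row
```

Row for row, the self join and the single LAG call return the same previous-month column. Only the amount of machinery differs.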
The common mistake analysts make here: they write LAG without ORDER BY and get results that look plausible but are wrong. The function ran. It looked back. But it looked back at an arbitrary row, because no sequence was defined. Always include ORDER BY. It is not optional. It is the instruction that tells the glance which direction to face.
Ask yourself before moving forward: do I always need to look exactly one row back, or do I sometimes need to look further? If further, add the offset argument.
LAG Stage 2: Adding the offset
The second argument is the offset. It tells LAG how many rows back to look. The default is one. But sometimes the question requires looking two months back, three months back, or further.
sql
SELECT
month,
engagement_rate,
LAG(engagement_rate, 1) OVER (ORDER BY month) AS prev_1_month,
LAG(engagement_rate, 2) OVER (ORDER BY month) AS prev_2_months,
LAG(engagement_rate, 3) OVER (ORDER BY month) AS prev_3_months
FROM monthly_engagement
ORDER BY month;

month | engagement_rate | prev_1_month | prev_2_months | prev_3_months
-----------|-----------------|--------------|---------------|---------------
2023-01-01 | 0.72 | NULL | NULL | NULL
2023-02-01 | 0.70 | 0.72 | NULL | NULL
2023-03-01 | 0.75 | 0.70 | 0.72 | NULL
2023-04-01 | 0.77 | 0.75 | 0.70 | 0.72
2023-05-01 | 0.72 | 0.77 | 0.75 | 0.70
2023-06-01 | 0.74 | 0.72 | 0.77 | 0.75

Each column reaches back a different distance. January through March accumulate NULLs as expected. There are no rows behind them at those distances. April is the first month with a full three month history visible in a single row.
This is how you build quarter over quarter comparisons, year over year comparisons, or any analysis where the relevant past is not immediately adjacent. LAG(engagement_rate, 12) looks back exactly twelve rows, one year of monthly data, without a single join.
The common mistake analysts make here: they use a large offset without checking whether enough historical rows exist in their data. LAG(column, 12) on six months of data returns NULL for every row because the twelfth previous row does not exist. Always verify your data history depth before choosing your offset.
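A quick way to see both the offset and the too-deep-offset pitfall in one run, sketched with Python’s sqlite3 against the same six months of data (an illustrative setup, not the production table):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE monthly_engagement (month TEXT, engagement_rate REAL)")
con.executemany(
    "INSERT INTO monthly_engagement VALUES (?, ?)",
    [("2023-01-01", 0.72), ("2023-02-01", 0.70), ("2023-03-01", 0.75),
     ("2023-04-01", 0.77), ("2023-05-01", 0.72), ("2023-06-01", 0.74)],
)

rows = con.execute("""
    SELECT month,
           LAG(engagement_rate, 3)  OVER (ORDER BY month) AS prev_3_months,
           LAG(engagement_rate, 12) OVER (ORDER BY month) AS prev_12_months
    FROM monthly_engagement
    ORDER BY month
""").fetchall()

# prev_3_months fills in from April onward; prev_12_months is NULL on every
# row, because a twelfth previous row never exists in six months of data.
for month, prev3, prev12 in rows:
    print(month, prev3, prev12)
```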
Ask yourself before moving forward: what should happen when LAG reaches the beginning of the data and finds nothing? If NULL causes problems downstream, add the default argument.
LAG Stage 3: Adding the default
The third argument is the default value. It tells LAG what to return when there is no previous row. When the backward glance reaches the edge of the data and finds nothing.
Without a default, LAG returns NULL for those rows. Sometimes NULL is correct and meaningful. Sometimes NULL breaks downstream calculations. Averages that ignore NULLs, ratios that produce errors, dashboards that display blank cells where a real value would be more honest.
sql
SELECT
month,
engagement_rate,
LAG(engagement_rate, 1, 0) OVER (ORDER BY month) AS prev_rate_default_zero,
LAG(engagement_rate, 1, engagement_rate) OVER (ORDER BY month) AS prev_rate_default_self
FROM monthly_engagement
ORDER BY month;

month | engagement_rate | prev_rate_default_zero | prev_rate_default_self
-----------|-----------------|------------------------|------------------------
2023-01-01 | 0.72 | 0 | 0.72
2023-02-01 | 0.70 | 0.72 | 0.72
2023-03-01 | 0.75 | 0.70 | 0.70
2023-04-01 | 0.77 | 0.75 | 0.75
2023-05-01 | 0.72 | 0.77 | 0.77
2023-06-01 | 0.74 | 0.72 | 0.72

Default zero means January’s previous rate is treated as zero, making the month over month change for January appear as positive 0.72. That implies the program went from zero engagement to 72% in one month. Technically accurate. Also misleading if the program existed before January and zero is not an honest representation of history.
Default self means January’s previous rate is treated as its own current rate, making the month over month change appear as zero. No change. Not strictly true either, but often the safest choice when you want to avoid NULL without distorting the trend.
The right default depends entirely on what the number means in your specific context. Zero is rarely the honest answer. NULL is often the most truthful. Default self is the safest option when NULL is not acceptable downstream.
The common mistake analysts make here: they default to zero without thinking about what zero implies in their data. In an engagement rate context, zero means no engagement existed. That is almost never what you want to communicate about the first month of data. Think before you default.
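Here is that trade-off made concrete, again sketched with Python’s sqlite3. The column-expression default in the last line assumes your database, like SQLite and PostgreSQL, evaluates the default against the current row:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE monthly_engagement (month TEXT, engagement_rate REAL)")
con.executemany(
    "INSERT INTO monthly_engagement VALUES (?, ?)",
    [("2023-01-01", 0.72), ("2023-02-01", 0.70), ("2023-03-01", 0.75)],
)

rows = con.execute("""
    SELECT month,
           LAG(engagement_rate, 1)    OVER (ORDER BY month) AS prev_null,
           LAG(engagement_rate, 1, 0) OVER (ORDER BY month) AS prev_zero,
           -- the default can be an expression; here, the current row's own rate
           LAG(engagement_rate, 1, engagement_rate) OVER (ORDER BY month) AS prev_self
    FROM monthly_engagement
    ORDER BY month
""").fetchall()

print(rows[0])  # January: NULL, zero, or its own rate, depending on the default
```

Every row after January is identical across the three columns; the default only matters at the edge of the data.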
Ask yourself before moving forward: does my data have multiple groups that should be compared only within themselves, like conditions, regions, or product lines? If yes, add PARTITION BY.
LAG Stage 4: Adding PARTITION BY
This is where LAG becomes genuinely powerful for any multi-group dataset. PARTITION BY tells LAG to reset at the boundary of each group, so the backward glance never crosses from one group into another.
Imagine the monthly engagement table now has a condition column, tracking engagement separately for Diabetes, Hypertension, and Hyperlipidemia patients:
sql
SELECT * FROM monthly_engagement_by_condition ORDER BY condition, month;
condition | month | active_members | engagement_rate
---------------|------------|----------------|----------------
Diabetes | 2023-01-01 | 287 | 0.74
Diabetes | 2023-02-01 | 301 | 0.71
Diabetes | 2023-03-01 | 334 | 0.76
Hypertension | 2023-01-01 | 198 | 0.70
Hypertension | 2023-02-01 | 210 | 0.69
Hypertension | 2023-03-01 | 241 | 0.73
Hyperlipidemia | 2023-01-01 | 127 | 0.68
Hyperlipidemia | 2023-02-01 | 133 | 0.71
Hyperlipidemia | 2023-03-01 | 160 | 0.75

Without PARTITION BY, LAG crosses condition boundaries. The January Hypertension row looks back and reads the March Diabetes row as its previous value. That is not a previous month. That is a completely different condition. The backward glance has jumped the fence.
sql
-- WRONG: No PARTITION BY — LAG crosses condition boundaries
SELECT
condition,
month,
engagement_rate,
LAG(engagement_rate) OVER (ORDER BY condition, month) AS prev_rate_WRONG
FROM monthly_engagement_by_condition
ORDER BY condition, month;

condition | month | engagement_rate | prev_rate_WRONG
---------------|------------|-----------------|----------------
Diabetes | 2023-01-01 | 0.74 | NULL
Diabetes | 2023-02-01 | 0.71 | 0.74
Diabetes | 2023-03-01 | 0.76 | 0.71
Hypertension | 2023-01-01 | 0.70 | 0.76 -- WRONG: this is March Diabetes
Hypertension | 2023-02-01 | 0.69 | 0.70
Hypertension | 2023-03-01 | 0.73 | 0.69
Hyperlipidemia | 2023-01-01 | 0.68 | 0.73 -- WRONG: this is March Hypertension
Hyperlipidemia | 2023-02-01 | 0.71 | 0.68
Hyperlipidemia | 2023-03-01 | 0.75 | 0.71

January Hypertension shows 0.76 as its previous engagement rate. That is March Diabetes. The glance jumped the fence. The numbers look plausible. They are completely wrong.
sql
-- CORRECT: PARTITION BY condition keeps groups separate
SELECT
condition,
month,
engagement_rate,
LAG(engagement_rate) OVER (PARTITION BY condition ORDER BY month) AS prev_rate_CORRECT,
ROUND(
engagement_rate - LAG(engagement_rate) OVER (PARTITION BY condition ORDER BY month),
2) AS mom_change
FROM monthly_engagement_by_condition
ORDER BY condition, month;

condition | month | engagement_rate | prev_rate_CORRECT | mom_change
---------------|------------|-----------------|-------------------|------------
Diabetes | 2023-01-01 | 0.74 | NULL | NULL
Diabetes | 2023-02-01 | 0.71 | 0.74 | -0.03
Diabetes | 2023-03-01 | 0.76 | 0.71 | +0.05
Hypertension | 2023-01-01 | 0.70 | NULL | NULL
Hypertension | 2023-02-01 | 0.69 | 0.70 | -0.01
Hypertension | 2023-03-01 | 0.73 | 0.69 | +0.04
Hyperlipidemia | 2023-01-01 | 0.68 | NULL | NULL
Hyperlipidemia | 2023-02-01 | 0.71 | 0.68 | +0.03
Hyperlipidemia | 2023-03-01 | 0.75 | 0.71 | +0.04

Now every condition resets at its own boundary. January Hypertension returns NULL, correctly, because there is no previous Hypertension month to look back at. The glance stays inside the fence.
The principle is this: whenever your data has groups and you need LAG to respect those groups, add PARTITION BY. Without it LAG will cross boundaries silently and produce numbers that look right and mean nothing.
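A minimal sketch of that principle with Python’s sqlite3, using a cut-down two-condition version of the table (illustrative values only):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE monthly_engagement_by_condition "
    "(condition TEXT, month TEXT, engagement_rate REAL)"
)
con.executemany(
    "INSERT INTO monthly_engagement_by_condition VALUES (?, ?, ?)",
    [("Diabetes", "2023-01-01", 0.74), ("Diabetes", "2023-02-01", 0.71),
     ("Hypertension", "2023-01-01", 0.70), ("Hypertension", "2023-02-01", 0.69)],
)

rows = con.execute("""
    SELECT condition, month,
           LAG(engagement_rate) OVER (PARTITION BY condition ORDER BY month) AS prev_rate
    FROM monthly_engagement_by_condition
    ORDER BY condition, month
""").fetchall()

# Each condition's January is NULL: the backward glance never leaves its partition.
for row in rows:
    print(row)
```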
Understanding LEAD: the same four stages, the opposite direction
LEAD is LAG’s mirror. Identical syntax. Identical arguments. The only difference is direction. Instead of looking backward at the row behind, it looks forward at the row ahead.
sql
LEAD(column, offset, default) OVER (PARTITION BY ... ORDER BY ...)

Same map. Different direction:

column: which column to look forward at
offset: how many rows forward to look
default: what to return if no row exists
PARTITION BY: which rows belong together
ORDER BY: how to sequence the rows

We will build it the same way. One argument at a time.
LEAD Stage 1: Just the column
sql
SELECT
month,
engagement_rate,
LEAD(engagement_rate) OVER (ORDER BY month) AS next_engagement_rate
FROM monthly_engagement
ORDER BY month;

month | engagement_rate | next_engagement_rate
-----------|-----------------|---------------------
2023-01-01 | 0.72 | 0.70
2023-02-01 | 0.70 | 0.75
2023-03-01 | 0.75 | 0.77
2023-04-01 | 0.77 | 0.72
2023-05-01 | 0.72 | 0.74
2023-06-01 | 0.74 | NULL

June returns NULL because there is no row ahead of it. It is the last entry in the logbook. Every other month reaches forward one row and reads the engagement rate waiting there.
January already knows that next month engagement will drop to 0.70. April already knows that next month will drop to 0.72. Combined with LAG you can see that April is sitting at its peak, about to fall. The data already contains that future. LEAD just lets each row read it.
The common mistake analysts make here: they confuse LEAD with forecasting or prediction. LEAD does not predict future data that does not exist. It reads future rows that are already in your table. If June has not been recorded yet, LEAD on May returns NULL. It is a forward glance through existing data, not a crystal ball.
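You can watch the not-a-crystal-ball behavior directly. In this sqlite3 sketch, June has not been recorded, so May’s forward glance finds nothing (values are the article’s April and May figures):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE monthly_engagement (month TEXT, engagement_rate REAL)")
# June has not been recorded yet: the table simply stops at May.
con.executemany(
    "INSERT INTO monthly_engagement VALUES (?, ?)",
    [("2023-04-01", 0.77), ("2023-05-01", 0.72)],
)

rows = con.execute("""
    SELECT month, LEAD(engagement_rate) OVER (ORDER BY month) AS next_rate
    FROM monthly_engagement
    ORDER BY month
""").fetchall()

print(rows)  # May's next_rate is NULL: LEAD reads existing rows, it does not predict
```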
LEAD Stage 2: Adding the offset
sql
SELECT
month,
engagement_rate,
LEAD(engagement_rate, 1) OVER (ORDER BY month) AS next_1_month,
LEAD(engagement_rate, 2) OVER (ORDER BY month) AS next_2_months,
LEAD(engagement_rate, 3) OVER (ORDER BY month) AS next_3_months
FROM monthly_engagement
ORDER BY month;

month | engagement_rate | next_1_month | next_2_months | next_3_months
-----------|-----------------|--------------|---------------|---------------
2023-01-01 | 0.72 | 0.70 | 0.75 | 0.77
2023-02-01 | 0.70 | 0.75 | 0.77 | 0.72
2023-03-01 | 0.75 | 0.77 | 0.72 | 0.74
2023-04-01 | 0.77 | 0.72 | 0.74 | NULL
2023-05-01 | 0.72 | 0.74 | NULL | NULL
2023-06-01 | 0.74 | NULL | NULL | NULL

January can see the next three months from where it sits. April can only see two months ahead before the data runs out. This is how you build forward looking windows, identifying which months have enough future data to make a meaningful comparison and which are approaching the edge of what is known.
LEAD Stage 3: Adding the default
sql
SELECT
month,
engagement_rate,
LEAD(engagement_rate, 1, engagement_rate) OVER (ORDER BY month) AS next_rate_default_self,
CASE
WHEN LEAD(engagement_rate) OVER (ORDER BY month) > engagement_rate THEN 'Improving'
WHEN LEAD(engagement_rate) OVER (ORDER BY month) < engagement_rate THEN 'Declining'
WHEN LEAD(engagement_rate) OVER (ORDER BY month) IS NULL THEN 'No Data'
ELSE 'Stable'
END AS next_month_trend
FROM monthly_engagement
ORDER BY month;

month | engagement_rate | next_rate_default_self | next_month_trend
-----------|-----------------|------------------------|------------------
2023-01-01 | 0.72 | 0.70 | Declining
2023-02-01 | 0.70 | 0.75 | Improving
2023-03-01 | 0.75 | 0.77 | Improving
2023-04-01 | 0.77 | 0.72 | Declining
2023-05-01 | 0.72 | 0.74 | Improving
2023-06-01 | 0.74 | 0.74 | No Data

The trend column is computed directly from LEAD. No subquery, no join, no separate calculation. January already knows it is heading into a declining month. April is at its peak and already knows the fall is coming. The CASE expression converts the forward glance into a human readable signal that any stakeholder can act on immediately.
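The same CASE-over-LEAD pattern, runnable end to end with Python’s sqlite3 on the article’s six months of data, produces exactly the trend labels shown above:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE monthly_engagement (month TEXT, engagement_rate REAL)")
con.executemany(
    "INSERT INTO monthly_engagement VALUES (?, ?)",
    [("2023-01-01", 0.72), ("2023-02-01", 0.70), ("2023-03-01", 0.75),
     ("2023-04-01", 0.77), ("2023-05-01", 0.72), ("2023-06-01", 0.74)],
)

rows = con.execute("""
    SELECT month,
           CASE
             WHEN LEAD(engagement_rate) OVER (ORDER BY month) > engagement_rate THEN 'Improving'
             WHEN LEAD(engagement_rate) OVER (ORDER BY month) < engagement_rate THEN 'Declining'
             WHEN LEAD(engagement_rate) OVER (ORDER BY month) IS NULL THEN 'No Data'
             ELSE 'Stable'
           END AS next_month_trend
    FROM monthly_engagement
    ORDER BY month
""").fetchall()

trends = [trend for _, trend in rows]
print(trends)
```

Note how a NULL forward glance falls through the two comparisons (NULL compares as unknown, never true) and is caught by the IS NULL branch.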
LEAD Stage 4: Adding PARTITION BY
Exactly like LAG, LEAD without PARTITION BY will cross group boundaries and read the wrong future row. The fix is identical. Add PARTITION BY to keep the forward glance inside the correct fence.
sql
SELECT
condition,
month,
engagement_rate,
LEAD(engagement_rate) OVER (PARTITION BY condition ORDER BY month) AS next_rate,
CASE
WHEN LEAD(engagement_rate) OVER (PARTITION BY condition ORDER BY month) > engagement_rate
THEN 'Improving'
WHEN LEAD(engagement_rate) OVER (PARTITION BY condition ORDER BY month) < engagement_rate
THEN 'Declining'
WHEN LEAD(engagement_rate) OVER (PARTITION BY condition ORDER BY month) IS NULL
THEN 'No Data'
ELSE 'Stable'
END AS next_month_trend
FROM monthly_engagement_by_condition
ORDER BY condition, month;

condition | month | engagement_rate | next_rate | next_month_trend
---------------|------------|-----------------|-----------|------------------
Diabetes | 2023-01-01 | 0.74 | 0.71 | Declining
Diabetes | 2023-02-01 | 0.71 | 0.76 | Improving
Diabetes | 2023-03-01 | 0.76 | NULL | No Data
Hypertension | 2023-01-01 | 0.70 | 0.69 | Declining
Hypertension | 2023-02-01 | 0.69 | 0.73 | Improving
Hypertension | 2023-03-01 | 0.73 | NULL | No Data
Hyperlipidemia | 2023-01-01 | 0.68 | 0.71 | Improving
Hyperlipidemia | 2023-02-01 | 0.71 | 0.75 | Improving
Hyperlipidemia | 2023-03-01 | 0.75 | NULL | No Data

Every condition stays inside its own fence. March of every condition returns No Data because there is no future row within that condition’s partition. The forward glance respects the boundary.
Bringing LAG and LEAD together: the complete picture
The most powerful use of LAG and LEAD is when you combine them. Giving every row both a memory and a glimpse of the future simultaneously. Here is the complete query that gives the Antara Health team everything they need in a single clean result:
sql
SELECT
month,
active_members,
LAG(active_members) OVER (ORDER BY month) AS prev_active_members,
LEAD(active_members) OVER (ORDER BY month) AS next_active_members,
engagement_rate,
LAG(engagement_rate) OVER (ORDER BY month) AS prev_engagement,
LEAD(engagement_rate) OVER (ORDER BY month) AS next_engagement,
ROUND(engagement_rate - LAG(engagement_rate) OVER (ORDER BY month), 2) AS mom_change,
churned_members,
LAG(churned_members) OVER (ORDER BY month) AS prev_churned,
LEAD(churned_members) OVER (ORDER BY month) AS next_churned,
CASE
WHEN LEAD(engagement_rate) OVER (ORDER BY month) > engagement_rate THEN 'Improving'
WHEN LEAD(engagement_rate) OVER (ORDER BY month) < engagement_rate THEN 'Declining'
WHEN LEAD(engagement_rate) OVER (ORDER BY month) IS NULL THEN 'No Data'
ELSE 'Stable'
END AS next_month_trend
FROM monthly_engagement
ORDER BY month;

month | active | prev_active | next_active | eng_rate | prev_eng | next_eng | mom_change | churned | prev_churned | next_churned | trend
-----------|--------|-------------|-------------|----------|----------|----------|------------|---------|--------------|--------------|----------
2023-01-01 | 612 | NULL | 644 | 0.72 | NULL | 0.70 | NULL | 45 | NULL | 61 | Declining
2023-02-01 | 644 | 612 | 735 | 0.70 | 0.72 | 0.75 | -0.02 | 61 | 45 | 38 | Improving
2023-03-01 | 735 | 644 | 785 | 0.75 | 0.70 | 0.77 | +0.05 | 38 | 61 | 29 | Improving
2023-04-01 | 785 | 735 | 756 | 0.77 | 0.75 | 0.72 | +0.02 | 29 | 38 | 72 | Declining
2023-05-01 | 756 | 785 | 814 | 0.72 | 0.77 | 0.74 | -0.05 | 72 | 29 | 55 | Improving
2023-06-01 | 814 | 756 | NULL | 0.74 | 0.72 | NULL | +0.02 | 55 | 72 | NULL | No Data

Now look at what April is telling you. Active members are at their highest point so far, 785. Engagement rate is at its peak, 0.77. Everything looks healthy. But LEAD on churn shows that next month churn will spike from 29 to 72. The trend column says Declining. April is the calm before the storm. And this table shows it clearly while April is still happening, not after May has already arrived.
That is the table I wish I had built at Antara Health on the first day of that project. History, current state, and direction of travel, in a single query that replaced an afternoon of self join archaeology.
The principle is this: LAG gives rows a memory. LEAD gives rows a forecast. Together they give every row a complete sense of time, where it has been, where it is, and where it is heading.
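Here is the memory-plus-forecast idea reduced to its smallest runnable form: a sqlite3 sketch over three of the article’s months, showing how April’s row carries both last month’s rate and next month’s churn spike at once:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE monthly_engagement "
    "(month TEXT, engagement_rate REAL, churned_members INTEGER)"
)
con.executemany(
    "INSERT INTO monthly_engagement VALUES (?, ?, ?)",
    [("2023-03-01", 0.75, 38), ("2023-04-01", 0.77, 29), ("2023-05-01", 0.72, 72)],
)

rows = con.execute("""
    SELECT month,
           LAG(engagement_rate)  OVER (ORDER BY month) AS prev_engagement,
           LEAD(churned_members) OVER (ORDER BY month) AS next_churned
    FROM monthly_engagement
    ORDER BY month
""").fetchall()

april = rows[1]
print(april)  # the April row holds March's rate and May's churn in one place
```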
Here is what the cascade looks like when analysts reach for self joins instead of LAG and LEAD.
Stage 1: The First Self Join Works. The analyst writes the join, gets the month over month comparison, delivers the report. The query is long but correct. Nobody questions it.
Stage 2: The Query Gets Handed to Someone Else. A new analyst joins the team and needs to update the report. They open the query and spend an hour understanding why the table is being joined to itself before they can touch a single line. What should have been a five minute update becomes an afternoon of archaeology.
Stage 3: The Date Logic Breaks. The date arithmetic inside the join breaks when the data format changes slightly. The analyst who wrote it is no longer on the team. Nobody knows why the join is structured the way it is. The debugging session takes a day.
Stage 4: The Numbers Come Back Wrong. The fix introduces a subtle error in the date calculation. The engagement rates for two months are transposed. The stakeholder notices. Trust erodes.
Stage 5: Someone Rebuilds It From Scratch. Another analyst rewrites the entire thing using LAG. It takes twenty minutes. It is readable. It has been correct ever since.
I call this the Complexity Tax. The hidden cost of solving a simple problem with a complicated tool. You pay it once when you write the query. You pay it again every time someone reads it. You pay it a third time when it breaks and nobody knows why.
LAG and LEAD exist to eliminate that tax entirely.
Here is the part where I tell you something that might sting.
Most analysts who write self joins for time based comparisons are not doing it because self joins are the right tool. They are doing it because self joins are the tool they learned first. And switching to a better tool requires admitting the old approach was unnecessary.
That admission is uncomfortable. It means the complicated query you spent an afternoon on could have been four lines. It means the debugging session that cost you a day was optional. It means the complexity was not sophistication. It was a gap in your toolkit dressed up as thoroughness.
In my years of working across analytics teams I have watched senior analysts defend self joins long after they discovered LAG existed. Not because self joins were better. Because changing how you work requires acknowledging how you worked before.
The analysts who grow fastest feel no loyalty to the tools they learned first. They ask only one question: is there a simpler way? When the answer is yes they switch without hesitation and never look back.
There is a simpler way. There has always been a simpler way. It was called LAG.
Here is where this connects to your trajectory.
Junior analysts solve time based comparison problems with whatever tool they encountered first. Usually a self join, sometimes a subquery, occasionally a spreadsheet export with manual calculations. The answer is correct. The path to it is unnecessarily complicated.
Senior analysts recognize the shape of the problem before they write a single line. They hear the words month over month, compared to last period, trend over time and immediately know which tool the question demands. They write four lines where a junior analyst writes forty. The query is readable, maintainable, and correct the first time.
The mindset shift is this: stop solving the problem and start recognizing the pattern. Time based comparisons are a pattern. LAG and LEAD are the answer to that pattern. Once you have internalized the connection between the two the tool selects itself.
The next time you find yourself joining a table to itself to compare a row to the row before it, stop. Ask yourself: am I just looking one row backward? If yes, close the join. Open LAG. Write four lines. Move on.
Last week we gave the first wish a name. The wish to group and still see the rows.
This week the wish was different. The wish to look backward without losing your place. The wish to see what is coming without leaving the current row. The wish to give every row a memory and a glimpse of what comes next.
That wish has two names. And now you know both of them.
The data remembers everything. Learn to look both ways.
If this essay helped something click, and you want to support more writing like it, you can buy me a coffee via PayPal: phelisiajeruto@gmail.com
It keeps the ideas coming and the analogies flowing. No pressure, always appreciated.
If you have ever stared at a case study prompt and felt completely lost about how to structure your thinking, my ebook, Data Case Study Mastery Guide, breaks down the exact framework I use. No fluff, just a system that works. You can find it here.
Next week: FIRST_VALUE and LAST_VALUE. The two window functions that let you anchor every row to the beginning or end of its partition, and why they solve problems that neither LAG nor LEAD can touch.




