The Trajectory Problem: Why FIRST_VALUE and LAST_VALUE See What Every Other Report Misses
Seeing where you started and where you ended without losing the row in between.
This essay is part of an ongoing series on window functions. If you are landing here for the first time, start at the beginning: “I Wish I Could Group and Still See the Rows” covers the foundation. Then read “The Time Machine Problem” on LAG and LEAD. Once you have read both, come back. Everything here will make immediate sense.
The clinicians kept raising concerns about specific patients. And the data kept telling us everything was fine.
We were tracking patient engagement across a chronic illness care program. Every interaction recorded: consultations, follow-ups, check-ins, missed sessions. Every patient had a score, updated with each interaction, reflecting how actively they were participating in their own care.
Our reporting was thorough. Engagement this week. Engagement this month. Cohort averages by condition. Trend lines by provider. On paper, the program looked stable. Averages holding. No dramatic drops. No obvious red flags.
But the clinicians kept saying the same thing in different ways: something is off with this patient. Something is off with that one. And when we pulled the numbers, the numbers disagreed. The patient looked average. The patient looked fine.
It took weeks before someone asked the question that changed everything.
Not: where is this patient now? But: where did this patient start?
That shift, from snapshot to trajectory, from current state to distance traveled, was the moment I understood what FIRST_VALUE and LAST_VALUE actually exist to answer. Not where someone is. Where they are relative to where they began. And whether the direction of travel is one a clinician should be worried about.
Let me be clear about what I am not going to do. I am not going to define FIRST_VALUE as the function that returns the first value in a partition and LAST_VALUE as the function that returns the last. Every article does that. You leave with a definition and no understanding of when or why you would actually reach for either one.
What I am going to do instead is show you the specific problem that neither LAG nor LEAD can solve, why FIRST_VALUE and LAST_VALUE were built for exactly that problem, and how to build the syntax one piece at a time until the logic is instinctive.
Here is the one idea that makes FIRST_VALUE and LAST_VALUE obvious.
LAG looks one row back. LEAD looks one row forward. But sometimes the question is not about the adjacent row. Sometimes the question is about the anchor. The very first moment. The very last moment. And how every row in between relates to one of those two fixed points.
FIRST_VALUE anchors every row to the beginning of its partition. LAST_VALUE anchors every row to the end. Together they give you trajectory, not just position.
That is the entire concept.
Think about a patient’s medical file. Every visit is recorded in order. A doctor reviewing the file does not just read the most recent entry and form a judgment. They read from the beginning. They ask: what was this patient like when they first came in? How does that compare to where they are now? Is the distance between those two points a story of progress or a story of decline?
That question, from first to last, is the one FIRST_VALUE and LAST_VALUE answer. Every row in the file already knows where the file began and where it currently ends. The doctor does not need to flip back to page one every time. The context travels with every entry.
LAG and LEAD are about looking at the row next to you. FIRST_VALUE and LAST_VALUE are about knowing where the story started and where it stands right now, no matter which page you are on.
Here is the dataset we will use for everything that follows. The patient engagement table from Antara Health:
sql
SELECT * FROM patient_engagement ORDER BY patient_id, session_date;
patient_id | session_date | engagement_score | session_type | status
-----------|--------------|------------------|--------------|--------
P001 | 2023-01-05 | 92 | consultation | active
P001 | 2023-01-12 | 88 | follow-up | active
P001 | 2023-01-20 | 75 | follow-up | active
P001 | 2023-02-02 | 60 | missed | at-risk
P002 | 2023-01-06 | 55 | consultation | active
P002 | 2023-01-14 | 62 | follow-up | active
P002 | 2023-01-22 | 70 | follow-up | active
P002 | 2023-02-03 | 78 | consultation | active
P003 | 2023-01-07 | 80 | consultation | active
P003 | 2023-01-15 | 80 | follow-up | active
P003 | 2023-01-23 | 79 | follow-up | active
P003 | 2023-02-04 | 78 | follow-up | active

Three patients. Multiple sessions each. All ordered by date. P001 started high and dropped. P002 started low and climbed. P003 looks stable throughout. In a standard snapshot report all three might look unremarkable. The trajectory tells a completely different story.
Same table throughout. Watch what changes as we build.
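If you want to follow along in your own database, here is a minimal sketch of the table setup. The column types are my assumptions; adjust DATE and VARCHAR to whatever your dialect prefers.

sql
-- Minimal reproduction of the sample table
-- (types are assumptions; adjust for your dialect)
CREATE TABLE patient_engagement (
    patient_id       VARCHAR(10),
    session_date     DATE,
    engagement_score INT,
    session_type     VARCHAR(20),
    status           VARCHAR(10)
);

INSERT INTO patient_engagement VALUES
    ('P001', '2023-01-05', 92, 'consultation', 'active'),
    ('P001', '2023-01-12', 88, 'follow-up',    'active'),
    ('P001', '2023-01-20', 75, 'follow-up',    'active'),
    ('P001', '2023-02-02', 60, 'missed',       'at-risk'),
    ('P002', '2023-01-06', 55, 'consultation', 'active'),
    ('P002', '2023-01-14', 62, 'follow-up',    'active'),
    ('P002', '2023-01-22', 70, 'follow-up',    'active'),
    ('P002', '2023-02-03', 78, 'consultation', 'active'),
    ('P003', '2023-01-07', 80, 'consultation', 'active'),
    ('P003', '2023-01-15', 80, 'follow-up',    'active'),
    ('P003', '2023-01-23', 79, 'follow-up',    'active'),
    ('P003', '2023-02-04', 78, 'follow-up',    'active');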
Understanding FIRST_VALUE: the full syntax before we build
FIRST_VALUE takes one argument and sits inside an OVER clause. But it has a critical detail that most articles gloss over entirely: the frame clause. Without understanding it, FIRST_VALUE can silently return wrong results in ways that look correct.
Here is the complete signature:
sql
FIRST_VALUE(column) OVER (PARTITION BY ... ORDER BY ... ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)

Breaking it down in plain language:
FIRST_VALUE(column) OVER (PARTITION BY ...  ORDER BY ...  ROWS BETWEEN ...)
        |                       |                |               |
  Which column            Which rows         How to         How wide to
  to anchor to            belong             sequence       open the
  the first value         together           the rows       window

We are going to build each piece one at a time. Same dataset. Every stage shows you exactly what each piece does to the data.
FIRST_VALUE Stage 1: Just the column and OVER
Start with the minimum. The column you want to anchor to and the OVER clause that opens the window.
sql
SELECT
patient_id,
session_date,
engagement_score,
FIRST_VALUE(engagement_score) OVER (
PARTITION BY patient_id
ORDER BY session_date
) AS first_engagement_score
FROM patient_engagement
ORDER BY patient_id, session_date;

patient_id | session_date | engagement_score | first_engagement_score
-----------|--------------|------------------|----------------------
P001 | 2023-01-05 | 92 | 92
P001 | 2023-01-12 | 88 | 92
P001 | 2023-01-20 | 75 | 92
P001 | 2023-02-02 | 60 | 92
P002 | 2023-01-06 | 55 | 55
P002 | 2023-01-14 | 62 | 55
P002 | 2023-01-22 | 70 | 55
P002 | 2023-02-03 | 78 | 55
P003 | 2023-01-07 | 80 | 80
P003 | 2023-01-15 | 80 | 80
P003 | 2023-01-23 | 79 | 80
P003 | 2023-02-04 | 78 | 80

Every row now knows the first engagement score recorded for its patient. P001 started at 92. Every subsequent row for P001 carries that anchor. P002 started at 55. P003 started at 80.
The anchor travels with every row. No matter which session you are looking at, the starting point is always visible.
PARTITION BY patient_id tells FIRST_VALUE to reset for each patient. Without it every row across all patients would anchor to the single first row in the entire table, which is meaningless. ORDER BY session_date tells it which row is first, the earliest session date within each partition.
The common mistake analysts make here: they forget PARTITION BY and every row anchors to the same first value across the entire dataset. The numbers look plausible. They are wrong. Always ask: first within what group? The answer to that question is your PARTITION BY.
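To make that mistake concrete, here is a sketch of what the forgotten PARTITION BY looks like against this dataset. With no partition, the anchor is the single earliest session in the whole table, P001's 92 on 2023-01-05, so every patient appears to have started at 92:

sql
-- WRONG: no PARTITION BY, so every row anchors to the single
-- earliest session across ALL patients (P001's 92 on 2023-01-05)
SELECT
    patient_id,
    session_date,
    engagement_score,
    FIRST_VALUE(engagement_score) OVER (
        ORDER BY session_date
    ) AS first_score_across_everyone   -- plausible-looking, but wrong
FROM patient_engagement
ORDER BY patient_id, session_date;

The output runs without a single warning, which is exactly what makes it dangerous.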
Ask yourself before moving forward: is the result I am seeing truly the first value, or could an ambiguous sort order be silently changing which row counts as first? If you are not sure, make the frame clause and the ORDER BY explicit.
FIRST_VALUE Stage 2: Adding the frame clause
This is the part most articles skip. And it is the part that causes silent errors in real data.
When you add ORDER BY inside a window function without specifying a frame clause, SQL uses a default frame. That default is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. For FIRST_VALUE this is usually fine because the first value does not change as you move through rows.
But the moment you have ties in your ORDER BY column, which row counts as first becomes ambiguous, and the database is free to resolve that tie differently from one run to the next. The safest practice, the one that makes your intent explicit and your query readable, is to always specify the frame clause with FIRST_VALUE, and to add a unique tiebreaker to the ORDER BY wherever ties are possible.
sql
SELECT
patient_id,
session_date,
engagement_score,
FIRST_VALUE(engagement_score) OVER (
PARTITION BY patient_id
ORDER BY session_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS first_engagement_score
FROM patient_engagement
ORDER BY patient_id, session_date;

The result is identical to Stage 1 on this dataset. But the intent is now explicit.
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW means: start from the very first row in this partition and include every row up to the one currently being processed. For FIRST_VALUE this means the anchor is always the first row in the partition, no matter where in the sequence the current row sits.
Breaking it down word by word. ROWS means we are counting individual rows. UNBOUNDED PRECEDING means go all the way back to the first row in the partition without stopping. CURRENT ROW means the window closes at the row currently being evaluated. Together they produce a window that always includes the first row of the partition, which is exactly what FIRST_VALUE needs to return the correct anchor consistently.
The common mistake analysts make here: they skip the frame clause entirely and leave ties to chance. On clean data with no duplicate dates this works. On real data with duplicate dates, reprocessed records, or edge cases, the query can silently anchor to a different row from one run to the next. Specify the frame. Make your intent explicit. Your future self will thank you.
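One concrete defense against ties, worth pairing with the explicit frame, is a deterministic tiebreaker in the ORDER BY. A sketch; the session_id column here is hypothetical, any column that is unique per row will do:

sql
-- If two sessions share a date, ORDER BY session_date alone leaves
-- "first" ambiguous. A unique tiebreaker (hypothetical session_id)
-- makes the anchor deterministic on every run.
FIRST_VALUE(engagement_score) OVER (
    PARTITION BY patient_id
    ORDER BY session_date, session_id
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS first_engagement_score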
FIRST_VALUE Stage 3: Using the anchor to compute trajectory
Now that every row knows where the patient started, we can compute the exact change from baseline for every session.
sql
SELECT
patient_id,
session_date,
engagement_score,
FIRST_VALUE(engagement_score) OVER (
PARTITION BY patient_id
ORDER BY session_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS baseline_score,
engagement_score - FIRST_VALUE(engagement_score) OVER (
PARTITION BY patient_id
ORDER BY session_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS change_from_baseline
FROM patient_engagement
ORDER BY patient_id, session_date;

patient_id | session_date | engagement_score | baseline_score | change_from_baseline
-----------|--------------|------------------|----------------|---------------------
P001 | 2023-01-05 | 92 | 92 | 0
P001 | 2023-01-12 | 88 | 92 | -4
P001 | 2023-01-20 | 75 | 92 | -17
P001 | 2023-02-02 | 60 | 92 | -32
P002 | 2023-01-06 | 55 | 55 | 0
P002 | 2023-01-14 | 62 | 55 | +7
P002 | 2023-01-22 | 70 | 55 | +15
P002 | 2023-02-03 | 78 | 55 | +23
P003 | 2023-01-07 | 80 | 80 | 0
P003 | 2023-01-15 | 80 | 80 | 0
P003 | 2023-01-23 | 79 | 80 | -1
P003 | 2023-02-04 | 78 | 80 | -2

Now the story is visible. P001 started at 92 and has drifted 32 points below baseline. A snapshot report that shows only current engagement cannot tell you that P001's 60 is the bottom of a steep decline while P003's 78 is a two-point drift from 80. P001 is in freefall. P003 is essentially stable. The number alone does not reveal that. The distance from baseline does.
P002 started at 55 and has climbed 23 points above baseline. In a threshold based report that asks who is below 70, P002 would have appeared as a problem patient for the first two sessions. The trajectory reveals something entirely different: a patient who is improving steadily and consistently. Not a problem. A success.
That is what the clinicians were sensing and the snapshots were hiding. The direction of travel. The distance from where each patient began.
Decline from baseline is risk. Improvement from baseline is progress. The number in isolation is neither.
Understanding LAST_VALUE: the same structure, the opposite anchor
LAST_VALUE anchors every row to the last value in its partition rather than the first. The syntax is identical. The frame clause is not optional with LAST_VALUE. It is essential. And here is why.
Without an explicit frame clause, LAST_VALUE uses the default frame: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. For LAST_VALUE this means the last value within the window ending at the current row, which changes with every row. The result is that LAST_VALUE without an explicit frame does not return the last value in the partition. It returns the current row’s value, because the current row is always the last row in a window that ends at the current row.
This is the most common LAST_VALUE mistake in SQL. The query runs. The results look plausible. Every row shows its own current engagement score as the last value. Nothing errors. Everything is wrong.
The fix is explicit and simple: extend the frame to UNBOUNDED FOLLOWING.
sql
LAST_VALUE(column) OVER (
PARTITION BY ...
ORDER BY ...
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
)

UNBOUNDED FOLLOWING means extend the window all the way to the last row in the partition. Now every row can see the final value regardless of where in the sequence it sits.
LAST_VALUE Stage 1: Without the correct frame
This is what goes wrong when analysts skip the frame clause:
sql
-- WRONG: Default frame means LAST_VALUE returns current row value
SELECT
patient_id,
session_date,
engagement_score,
LAST_VALUE(engagement_score) OVER (
PARTITION BY patient_id
ORDER BY session_date
) AS last_engagement_score_WRONG
FROM patient_engagement
ORDER BY patient_id, session_date;

patient_id | session_date | engagement_score | last_engagement_score_WRONG
-----------|--------------|------------------|----------------------------
P001 | 2023-01-05 | 92 | 92
P001 | 2023-01-12 | 88 | 88
P001 | 2023-01-20 | 75 | 75
P001 | 2023-02-02 | 60 | 60
P002 | 2023-01-06 | 55 | 55
P002 | 2023-01-14 | 62 | 62
P002 | 2023-01-22 | 70 | 70
P002 | 2023-02-03 | 78 | 78
P003 | 2023-01-07 | 80 | 80
P003 | 2023-01-15 | 80 | 80
P003 | 2023-01-23 | 79 | 79
P003 | 2023-02-04 | 78 | 78

Every row shows its own current value as the last value. No errors. No warnings. Completely wrong. This is the silent failure that makes LAST_VALUE dangerous without the frame clause.
LAST_VALUE Stage 2: With the correct frame
sql
-- CORRECT: UNBOUNDED FOLLOWING extends the window to the end of the partition
SELECT
patient_id,
session_date,
engagement_score,
LAST_VALUE(engagement_score) OVER (
PARTITION BY patient_id
ORDER BY session_date
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS last_engagement_score
FROM patient_engagement
ORDER BY patient_id, session_date;

patient_id | session_date | engagement_score | last_engagement_score
-----------|--------------|------------------|----------------------
P001 | 2023-01-05 | 92 | 60
P001 | 2023-01-12 | 88 | 60
P001 | 2023-01-20 | 75 | 60
P001 | 2023-02-02 | 60 | 60
P002 | 2023-01-06 | 55 | 78
P002 | 2023-01-14 | 62 | 78
P002 | 2023-01-22 | 70 | 78
P002 | 2023-02-03 | 78 | 78
P003 | 2023-01-07 | 80 | 78
P003 | 2023-01-15 | 80 | 78
P003 | 2023-01-23 | 79 | 78
P003 | 2023-02-04 | 78 | 78

Now every row knows the most recent engagement score recorded for its patient. P001's most recent score is 60. Every row for P001 carries that reality. P002's most recent score is 78. P003's most recent score is 78.
The principle is this: always specify the frame clause with LAST_VALUE. Without ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING, LAST_VALUE silently returns the wrong answer on every row except the last one. It is the most common window function mistake in production SQL and it leaves no error behind to find it.
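A pattern worth knowing as a cross-check: some analysts sidestep the LAST_VALUE frame trap entirely by using FIRST_VALUE with the sort reversed. The first row of a descending date order is the latest session, and the default frame always contains it. A sketch:

sql
-- Equivalent result without touching the frame clause:
-- the first value in descending date order is the latest session
FIRST_VALUE(engagement_score) OVER (
    PARTITION BY patient_id
    ORDER BY session_date DESC
) AS last_engagement_score

Either form works. The FIRST_VALUE version is harder to get silently wrong; the LAST_VALUE version with an explicit frame states the intent more literally. Pick one and be consistent.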
Bringing FIRST_VALUE and LAST_VALUE together: the complete trajectory
This is the query that changed how the care team at Antara Health understood their patients. Every patient. Every session. Baseline, current state, and the exact trajectory between them, in a single result:
sql
SELECT
patient_id,
session_date,
engagement_score,
session_type,
status,
FIRST_VALUE(engagement_score) OVER (
PARTITION BY patient_id
ORDER BY session_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS baseline_score,
LAST_VALUE(engagement_score) OVER (
PARTITION BY patient_id
ORDER BY session_date
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS current_score,
LAST_VALUE(engagement_score) OVER (
PARTITION BY patient_id
ORDER BY session_date
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) -
FIRST_VALUE(engagement_score) OVER (
PARTITION BY patient_id
ORDER BY session_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS trajectory,
CASE
WHEN LAST_VALUE(engagement_score) OVER (
PARTITION BY patient_id
ORDER BY session_date
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) -
FIRST_VALUE(engagement_score) OVER (
PARTITION BY patient_id
ORDER BY session_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) <= -20 THEN 'High Risk'
WHEN LAST_VALUE(engagement_score) OVER (
PARTITION BY patient_id
ORDER BY session_date
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) -
FIRST_VALUE(engagement_score) OVER (
PARTITION BY patient_id
ORDER BY session_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) >= 10 THEN 'Improving'
ELSE 'Stable'
END AS patient_trajectory
FROM patient_engagement
ORDER BY patient_id, session_date;

patient_id | session_date | engagement_score | baseline | current | trajectory | patient_trajectory
-----------|--------------|------------------|----------|---------|------------|-------------------
P001 | 2023-01-05 | 92 | 92 | 60 | -32 | High Risk
P001 | 2023-01-12 | 88 | 92 | 60 | -32 | High Risk
P001 | 2023-01-20 | 75 | 92 | 60 | -32 | High Risk
P001 | 2023-02-02 | 60 | 92 | 60 | -32 | High Risk
P002 | 2023-01-06 | 55 | 55 | 78 | +23 | Improving
P002 | 2023-01-14 | 62 | 55 | 78 | +23 | Improving
P002 | 2023-01-22 | 70 | 55 | 78 | +23 | Improving
P002 | 2023-02-03 | 78 | 55 | 78 | +23 | Improving
P003 | 2023-01-07 | 80 | 80 | 78 | -2 | Stable
P003 | 2023-01-15 | 80 | 80 | 78 | -2 | Stable
P003 | 2023-01-23 | 79 | 80 | 78 | -2 | Stable
P003 | 2023-02-04 | 78 | 80 | 78 | -2 | Stable

Now look at what this table is actually saying.
P001 has a trajectory of negative 32. Every single session for P001 now carries that label: High Risk. Not because this week’s number is low. Because the distance from where they started to where they are now is 32 points of decline. A clinician looking at P001’s current score of 60 in isolation might flag it or might not depending on the threshold. A clinician looking at a trajectory of negative 32 from a baseline of 92 has no ambiguity. This patient is drifting. And they have been drifting since the beginning.
P002 has a trajectory of positive 23. Every session for P002 carries the label: Improving. P002’s first session score of 55 would have triggered concern in a threshold based report. The trajectory reveals the opposite story. This patient started low and has climbed consistently. Not a problem patient. A patient making genuine progress.
P003 is stable. Two points of drift over four sessions. No alarm needed.
This is what the clinicians were sensing. Not current state. Trajectory. And this is what the snapshot reports could not show them, because snapshots have no memory of where each patient began.
The principle is this: FIRST_VALUE gives you context. LAST_VALUE gives you reality. Together they give you trajectory. And in any domain where change over time matters more than current position, trajectory is the number that drives the right decision.
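One practical note on the full trajectory query: it repeats the same two window specifications five times. If your database supports the standard WINDOW clause (PostgreSQL and SQLite do, for example), you can name each specification once. A sketch of the same query, behavior unchanged:

sql
-- Named windows remove the repetition; the logic is identical
SELECT
    patient_id,
    session_date,
    engagement_score,
    FIRST_VALUE(engagement_score) OVER w_first AS baseline_score,
    LAST_VALUE(engagement_score)  OVER w_last  AS current_score,
    LAST_VALUE(engagement_score)  OVER w_last
      - FIRST_VALUE(engagement_score) OVER w_first AS trajectory,
    CASE
        WHEN LAST_VALUE(engagement_score) OVER w_last
             - FIRST_VALUE(engagement_score) OVER w_first <= -20 THEN 'High Risk'
        WHEN LAST_VALUE(engagement_score) OVER w_last
             - FIRST_VALUE(engagement_score) OVER w_first >= 10  THEN 'Improving'
        ELSE 'Stable'
    END AS patient_trajectory
FROM patient_engagement
WINDOW
    w_first AS (PARTITION BY patient_id ORDER BY session_date
                ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW),
    w_last  AS (PARTITION BY patient_id ORDER BY session_date
                ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
ORDER BY patient_id, session_date;

If your database lacks the WINDOW clause, a CTE that computes the two anchors once and is then joined or re-selected achieves the same readability.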
Here is what the cascade looks like when analytics teams rely on snapshots instead of trajectory.
Stage 1: The Report Looks Fine. Current engagement scores are pulled, averages computed, thresholds checked. Most patients are above the line. The program looks healthy.
Stage 2: The Clinicians Keep Raising Concerns. They see something in their patient interactions that the numbers are not capturing. They flag specific patients. The data does not confirm the concern. The disconnect grows.
Stage 3: The Wrong Patients Get Attention. Threshold based alerts fire on patients who are currently low but improving steadily. Those patients get intervention resources they do not need. Patients who are declining from a high baseline stay invisible because their current score is still above the threshold.
Stage 4: The Right Patients Get Missed. P001, who started at 92 and has dropped to 60, sits above a threshold of 55. No alert fires. No intervention is triggered. The decline that started months ago goes unaddressed until the score drops low enough to finally cross the line.
Stage 5: The Intervention Arrives Too Late. By the time the threshold based system flags the patient, the decline is advanced. The relationship with the care team has weakened. Re-engagement is harder and more expensive than early intervention would have been.
I call this the Snapshot Trap. You measure where people are and make decisions based on position alone. You miss the patients who are moving in the wrong direction because they have not yet arrived at the threshold. And you flag the patients who are moving in the right direction because they have not yet cleared it.
Trajectory fixes the Snapshot Trap. Not by replacing current state, but by adding the context that makes current state meaningful.
Here is the part where I tell you something that might sting.
Most analytics teams build threshold based reports because thresholds are simple. You pick a number. You check whether the current value is above or below it. You act on the ones that fall below. It is clean. It is automated. It is consistently wrong for a significant portion of the population it is meant to serve.
Because a threshold treats 60 as 60 regardless of whether it was reached from 92 or from 45. It treats position as if it exists independently of history. And in any domain where the direction of change matters, that treatment produces systematically bad decisions.
In my years of working across analytics teams, the analysts who build the most impactful work are the ones who ask not just where are we but how did we get here and where are we going. That question requires memory. It requires knowing the beginning. It requires a function that travels with every row and carries the starting point forward through the entire sequence.
Most analysts never reach for FIRST_VALUE and LAST_VALUE because most analytics courses never teach them. They are not exotic. They are not advanced. They are just the answer to a question most teams have not learned to ask yet.
The question is: compared to where they started, where are they now?
Here is where this connects to your trajectory.
Junior analysts build snapshot reports. They answer the question as it is asked: what is the current state? They pull the number, check the threshold, flag the exceptions, and deliver the report. The work is correct. It is incomplete.
Senior analysts build trajectory reports. They hear the question and ask the question behind the question: compared to what? Current state compared to last week is LAG. Current state compared to the very beginning is FIRST_VALUE. Current state compared to the very end is LAST_VALUE. The right function depends on what the comparison actually needs to be.
The mindset shift is this: stop thinking of analysis as a snapshot and start thinking of it as a story. Every row in your data is a moment in a sequence. FIRST_VALUE is the opening line. LAST_VALUE is where the story currently stands. The distance between them is what the story is actually about.
The next time a stakeholder asks you to build a patient report, a customer report, or any report that tracks people over time, before you pull a single number ask one question: are we measuring where they are, or are we measuring how far they have traveled? If the answer is how far they have traveled, you already know which functions to reach for.
The clinicians were right. Something was off. The data just did not know how to show it yet.
Every patient carries a history. Every score exists at a distance from where it began. The number alone does not tell you whether 60 is a crisis or a plateau. The baseline does.
FIRST_VALUE gives you context. LAST_VALUE gives you reality. Together they give you the only thing that actually drives the right decision.
The difference between 90 dropping to 60 and 60 holding at 60 is not in the number. It is in the journey.
Learn to read the distance.
If this essay helped something click, and you want to support more writing like it, you can buy me a coffee. PayPal: Phelisiajeruto@gmail.com
It keeps the ideas coming and the analogies flowing. No pressure, always appreciated.
If you have ever stared at a case study prompt and felt completely lost about how to structure your thinking, my ebook, Data Case Study Mastery Guide, breaks down the exact framework I use. No fluff, just a system that works. You can find it here.
Next week: NTILE. The window function that divides your data into equal buckets and why it is the cleanest way to build quartile analysis, patient risk tiers, and performance rankings without writing a single CASE statement.