Intraday Evaluation
Work in progress — playground deployment
This page documents the evaluation framework currently deployed on the playground. It is a draft and may change before the production rollout — treat figures and examples here as preliminary.
In intraday forecasting challenges, you submit a new forecast every hour (i.e., every session / gate closure). Each forecast predicts the next 24 hours, starting from that session’s gate closure time.
As a result, the same future timestamp t ends up with multiple forecast values, one from each earlier submission that includes it.
Scoring happens per timestamp, not per submission. Every forecast that reaches t contributes equally to that timestamp's score.
Once you have scores for all timestamps, your daily score becomes the average of those scores (see Evaluation).
This page covers what is specific to Intraday forecasting challenges and answers the following questions:
- How does Predico organize the multiple intraday forecasts that reach the same timestamp t?
- What if you skip a session / gate closure?
- How do multiple forecasts for the same timestamp combine into a single score?
- How is your daily score computed, and what can prevent you from getting one?
The K-slots model
How does Predico organize the multiple intraday forecasts that reach the same timestamp t?
For each target timestamp t, up to K submissions reach it, one per gate (K = 24 under today's schedule). We call these the K slots that cover t. The slot at lead L = 0 holds the intraday forecast from the most recent gate (shortest lead time); L = K−1 holds the forecast from the earliest gate that still covers t (longest lead time in the forecast horizon). Moving from L = K−1 to L = 0, successive gates cover t at progressively shorter lead times.
Today's operational schedule
We have hourly sessions with intraday challenges. Each challenge covers a forecast window of 24 hours starting from the gate closure of that session, so K = 24.
Filling the slots
What if you skip a session / gate closure?
Each of the K slots is filled by the first forecast source you have available, in this order:
| Priority | Source | When it applies |
|---|---|---|
| 1 | Live submission | You submitted to that exact gate. |
| 2 | Intraday forward-fill | An earlier intraday submission still covers t. |
| 3 | Cross-horizon substitute | A D+1 or D+N forecast you submitted earlier covers t. |
| – | Empty | None of the above → t gets no score → the entire day fails qualification. |
Live is your best forecast at that lead. Forward-fill catches the case where you skip a gate. Cross-horizon catches the case where you only committed to longer horizons. Anything still empty fails the day.
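The priority chain can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Predico's implementation: the names `fill_slots` and `Source` are hypothetical, a single `cross_horizon` value stands in for all D+1 / D+N sources, and forward-fill is assumed to take the nearest earlier gate (next larger L) that has a live submission.

```python
from __future__ import annotations

from enum import Enum
from typing import Optional, Sequence


class Source(Enum):
    LIVE = "live"
    FORWARD_FILL = "forward_fill"
    CROSS_HORIZON = "cross_horizon"
    EMPTY = "empty"


def fill_slots(
    live: Sequence[Optional[float]],
    cross_horizon: Optional[float] = None,
) -> list[tuple[Source, Optional[float]]]:
    """Fill the K slots covering one target timestamp t (hypothetical sketch).

    live[L] is your forecast for t from the gate of slot L, or None if you
    skipped that gate. L = 0 is the most recent gate, L = K-1 the earliest.
    cross_horizon is an earlier D+1 / D+N forecast for t, if any
    (assumption: one value stands in for every cross-horizon source).
    """
    k = len(live)
    filled: list[tuple[Source, Optional[float]]] = []
    for lead in range(k):
        # Priority 1: a live submission at this exact gate.
        if live[lead] is not None:
            filled.append((Source.LIVE, live[lead]))
            continue
        # Priority 2: the nearest earlier gate (larger L) that still covers t.
        earlier = next(
            (live[j] for j in range(lead + 1, k) if live[j] is not None), None
        )
        if earlier is not None:
            filled.append((Source.FORWARD_FILL, earlier))
        # Priority 3: a cross-horizon substitute.
        elif cross_horizon is not None:
            filled.append((Source.CROSS_HORIZON, cross_horizon))
        # Nothing available: the slot stays empty.
        else:
            filled.append((Source.EMPTY, None))
    return filled
```

For example, `fill_slots([1.0, None, 2.0, None])` forward-fills slot 1 from slot 2 and leaves slot 3 empty, because no earlier gate covers it; passing `cross_horizon=9.0` fills slot 3 with the substitute instead.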
Scoring the timestamp
How do multiple forecasts for the same timestamp combine into a single score?
Once all K slots are filled, the timestamp’s score is the average of the K slot contributions. Using an equal-weighted average makes the score easy to interpret. It also encourages participants to submit the best forecast they can for every lead time, rather than focusing only on the easiest short-lead forecasts and skipping the rest.
This ensures the intraday challenge rewards a complete forecasting strategy across the full 24-hour horizon, not just performance on short-lead targets.
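As a formula, with c_L(t) standing for the contribution of slot L to timestamp t (notation introduced here for illustration; in the worked examples it is a squared Q50 residual):

```latex
s(t) = \frac{1}{K} \sum_{L=0}^{K-1} c_L(t)
```

Each slot carries the same weight 1/K, so a poor long-lead forecast moves s(t) exactly as much as an equally poor short-lead one.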
Daily score and qualification
How is your daily score computed, and what can prevent you from getting one?
Your daily score is the average of all timestamp scores for that day. Plain and simple. The only twist is that if any timestamp fails to get a score (i.e., at least one of its K slots is empty), the whole day fails qualification and gets a penalty score instead.
Qualification: all slots or nothing
Strict full coverage
Every applicable slot of every timestamp must be filled: by a live submission, by forward-fill, or by a cross-horizon (D+1 / D+N) substitute.
One empty applicable slot → the whole day fails → peer 75th-percentile penalty.
A failed day's score is replaced by the 75th percentile of qualifying peers' scores for that day, and the day counts as a "missed day" for monthly league eligibility (see Rewards).
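A minimal Python sketch of the daily rule, assuming every slot passed in is applicable (the cold-start exemption is described in the next section). The names `daily_score` and `peer_p75` are hypothetical, and this page does not say how the 75th percentile is interpolated; the sketch assumes linear interpolation via `statistics.quantiles(..., method="inclusive")`.

```python
from __future__ import annotations

import statistics
from typing import Optional, Sequence


def peer_p75(peer_scores: Sequence[float]) -> float:
    """75th percentile of qualifying peers' daily scores (linear interpolation assumed)."""
    if len(peer_scores) == 1:
        return float(peer_scores[0])
    return statistics.quantiles(peer_scores, n=4, method="inclusive")[2]


def daily_score(
    timestamps: Sequence[Sequence[Optional[float]]],
    qualifying_peer_scores: Sequence[float],
) -> tuple[float, bool]:
    """Return (score, qualified) for one day.

    timestamps[i][L] is the contribution of slot L to the i-th timestamp,
    or None if the slot stayed empty after the live / forward-fill /
    cross-horizon chain.
    """
    per_timestamp: list[float] = []
    for slots in timestamps:
        if any(c is None for c in slots):
            # One empty slot anywhere fails the whole day: peer penalty applies.
            return peer_p75(qualifying_peer_scores), False
        # Timestamp score: equal-weighted mean of its K slot contributions.
        per_timestamp.append(statistics.fmean(slots))
    # Daily score: plain average of the timestamp scores.
    return statistics.fmean(per_timestamp), True
```

With peers scoring `[1, 2, 3, 4, 5]`, `daily_score([[1.0, 3.0], [2.0, 2.0]], peers)` returns `(2.0, True)`, while replacing `3.0` with `None` returns the penalty `(4.0, False)`.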
Cold-start / not-applicable slots
A slot is only required if an intraday session actually existed at its gate-closure hour. The set of valid gate hours is the resource's own intraday gate closures, rounded to the hour. A slot whose gate-closure hour had no intraday session is not-applicable; it does not count as a forecaster gap:
- it is exempt from the strict full-coverage check (it never needed filling);
- it is excluded from the per-timestamp mean (it skips the live / forward-fill / cross-horizon chain entirely, so it is never cross-horizon-filled);
- a timestamp whose every slot is not-applicable is dropped from the day mean (not failed), and the day still needs at least 50% of its timestamps to survive (i.e., not be dropped).
This matters for the first operational intraday day: its early-morning timestamps have long-lead slots whose gates fall before the first intraday session ever fired. Those slots are not-applicable (the session could not have existed) rather than missing, so the first day is scored on the slots and timestamps that could exist instead of receiving a blanket penalty.
A real (session-backed) slot left uncovered still fails the day — the exemption applies only to gates that genuinely had no session. On normal days every gate has a session, so nothing is not-applicable and the strict full-coverage rule above applies unchanged.
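The exemption can be folded into the same kind of sketch. Again this is a hypothetical illustration: the `NA` sentinel marks a slot whose gate had no intraday session, `None` marks a real slot left uncovered, and the sketch assumes that a day keeping fewer than 50% of its timestamps fails qualification.

```python
from __future__ import annotations

import statistics
from typing import Optional, Sequence, Union


class _NotApplicable:
    """Marker for a slot whose gate-closure hour had no intraday session."""

    def __repr__(self) -> str:
        return "NA"


NA = _NotApplicable()
Slot = Union[float, None, _NotApplicable]


def cold_start_day_mean(
    timestamps: Sequence[Sequence[Slot]],
    min_surviving_fraction: float = 0.5,
) -> Optional[float]:
    """Day mean with not-applicable slots exempted; None means the day fails."""
    surviving: list[float] = []
    for slots in timestamps:
        applicable = [s for s in slots if s is not NA]
        if not applicable:
            continue  # every slot not-applicable: timestamp dropped, not failed
        if any(s is None for s in applicable):
            return None  # a session-backed slot left uncovered still fails the day
        surviving.append(statistics.fmean(applicable))  # mean over applicable slots only
    if not surviving or len(surviving) < min_surviving_fraction * len(timestamps):
        return None  # too few timestamps survive
    return statistics.fmean(surviving)
```

On a normal day no slot is `NA`, so the function reduces to the strict full-coverage rule.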
Worked examples
Concrete scenarios with metric calculations.
All four scenarios look at the same target timestamp, t = 12:00 D, so the K = 24 slots are directly comparable.
Numbers inside the squares are illustrative Q50 contributions, i.e. squared residuals, where smaller is better.
Contributions naturally vary by lead time:
- shorter leads (right side of the strip, L=0) tend to be more accurate
- longer leads (left side, L=23) less so.
Forward-fill borrows the adjacent earlier gate's forecast, so its contribution lands very close to what a live submission would have given.
A D+1 substitute, by contrast, was issued ~24h ahead of the target, so its contributions are in a noticeably higher band.
Scenario 1: Perfect coverage (live everywhere)
Setup. You submitted at every gate from 13:00 D−1 onwards. All 24 slots are filled by live intraday submissions.
The staircase below shows the same situation from the gate-timing perspective: one row per gate, ordered by real time. Every gate from 13:00 D−1 onwards has a live submission.
Result. Day qualifies with the best achievable per-timestamp score. Every slot is a live submission, nothing borrowed, nothing substituted. The gentle left-to-right gradient is expected: long-lead forecasts (L=23) are harder than short-lead ones (L=0).
Scenario 2: A few skipped gates (forward-fill saves you)
Setup. Same as Scenario 1, but you skipped 2 gates, say 04:00 D (slot L=7) and 08:00 D (slot L=3).
Forward-fill borrows the most recent earlier intraday submission's value at t for those 2 slots, so they reuse a forecast that is one gate older than a live submission would have been.
The staircase below shows the same situation from the gate-timing perspective: one row per gate, ordered by real time. The 2 dashed bars are the skipped gates (L=7 and L=3); forward-fill borrows their slot value from the previous gate (one row up), so coverage is intact.
Result. Day qualifies. The dashed positions are gates you skipped. Predico auto-fills them with the value from the most recent earlier intraday submission (a 1-hour-older forecast for the same target t).
The borrowed contribution is essentially the same magnitude as a live one, so your per-timestamp score stays very close to Scenario 1.
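Why the borrowed contribution stays so close: a slot's contribution depends only on its forecast value at t and the observed value, so a forward-filled slot L scores exactly what slot L+1 scores. The sketch below checks this with made-up forecasts whose error grows with lead (assumed values, not real data) and a hypothetical `forward_fill` helper.

```python
from __future__ import annotations

import statistics
from typing import Optional, Sequence

K = 24
ACTUAL = 100.0  # hypothetical observed value at t = 12:00 D (MW)


def forward_fill(live: Sequence[Optional[float]]) -> list[Optional[float]]:
    """Fill each skipped slot from the nearest earlier gate (larger L)."""
    filled = list(live)
    # Walk from the earliest gate towards the most recent one so that each
    # gap picks up the value already resolved for slot L + 1.
    for lead in range(len(filled) - 2, -1, -1):
        if filled[lead] is None:
            filled[lead] = filled[lead + 1]
    return filled


def timestamp_score(forecasts: Sequence[float]) -> float:
    """Equal-weighted mean of squared Q50 residuals over the K slots."""
    return statistics.fmean((f - ACTUAL) ** 2 for f in forecasts)


# Made-up forecasts: 1.4 MW error at L = 0, growing by 0.02 MW per lead hour.
scenario_1 = [ACTUAL + 1.4 + 0.02 * lead for lead in range(K)]

# Scenario 2: gates 04:00 D (L = 7) and 08:00 D (L = 3) skipped, then forward-filled.
scenario_2 = forward_fill(
    [None if lead in (3, 7) else f for lead, f in enumerate(scenario_1)]
)

s1 = timestamp_score(scenario_1)
s2 = timestamp_score(scenario_2)
```

Under these assumptions `s2` exceeds `s1` by only about 0.005 MW², because the two borrowed slots simply repeat the contributions of slots 4 and 8.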
Scenario 3: Intraday only, no longer-horizon commitment
Setup. You submit at every intraday gate on D, but you never submit a D+1 (or longer) forecast for D. The 11 long-lead slots correspond to gates that fired on D−1, before you started submitting. Forward-fill can only borrow from an earlier intraday submission, and you have none before those gates, so those slots stay empty.
The staircase below shows the same situation from the gate-timing perspective: one row per gate, ordered by real time. The 11 long-lead slots come from D−1 gates that fired before your first submission; forward-fill has no earlier submission to borrow from, so they stay empty.
Result. Day fails qualification: when any slot is empty, the per-timestamp score cannot be computed. Your daily score is replaced by the peer 75th percentile, and the day counts as a missed day for monthly league eligibility.
Scenario 4: Intraday + D+1 substitute
Setup. Same intraday submissions as Scenario 3, plus one D+1 forecast that covers target day D. The D+1 forecast substitutes into every slot that neither a live submission nor forward-fill covers; here, that means the 11 long-lead D−1 gates.
The staircase below shows the same situation from the gate-timing perspective: the 11 long-lead slots (from D−1 gates you didn't submit) are now backfilled by the D+1 forecast.
Result. Day qualifies, but the 11 long-lead slots carry the D+1 residual (~13–17 MW² vs. ~2–4 MW² for live), pushing the per-timestamp score well above Scenario 1. You stay in the running for the monthly league, but with a noticeably worse (higher) score.
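The size of that drag can be estimated with simple arithmetic. The sketch assumes typical contributions at the midpoints of the ranges quoted above (3 MW² for live, 15 MW² for D+1); these are illustrative, not measured, values. It also reproduces Scenario 3, where the same 11 slots are empty.

```python
from __future__ import annotations

import statistics
from typing import Optional, Sequence

K = 24
LIVE = 3.0       # assumed typical live contribution (midpoint of ~2 to 4 MW²)
D_PLUS_1 = 15.0  # assumed typical D+1 contribution (midpoint of ~13 to 17 MW²)


def timestamp_score(contributions: Sequence[Optional[float]]) -> Optional[float]:
    """Mean of the K slot contributions, or None if any slot is empty."""
    if any(c is None for c in contributions):
        return None
    return statistics.fmean(contributions)


# Index = lead L; slots L = 13..23 are the 11 long-lead gates that fired on D-1.
scenario_1 = [LIVE] * K
scenario_3 = [LIVE] * 13 + [None] * 11
scenario_4 = [LIVE] * 13 + [D_PLUS_1] * 11

for name, slots in [("1", scenario_1), ("3", scenario_3), ("4", scenario_4)]:
    print(f"Scenario {name}: {timestamp_score(slots)}")
# Scenario 1: 3.0
# Scenario 3: None
# Scenario 4: 8.5
```

Under these assumptions the D+1 backfill lifts the per-timestamp score from 3.0 to 8.5 MW², almost three times Scenario 1, while Scenario 3 gets no per-timestamp score at all.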