[{"content":"","date":"14 June 2026","externalUrl":null,"permalink":"/categories/","section":"Categories","summary":"","title":"Categories","type":"categories"},{"content":"It\u0026rsquo;s the last Friday of the quarter. The OKRs are up on the screen, a number sitting next to each Key Result. 0.8. 0.6. One sad little 0.3 that everyone scrolls past a bit too quickly.\nThe room runs the ritual. The 0.8s get a nod. The 0.3 gets a wince and an action to \u0026ldquo;dig into what went wrong there.\u0026rdquo; Someone reminds everyone that 0.7 is the sweet spot, so really, we\u0026rsquo;re doing fine. Scores logged. Next quarter\u0026rsquo;s targets loaded. Everyone feels like they\u0026rsquo;ve been held to account.\nYou just spent an hour judging the quality of your decisions by the quality of your outcomes.\nThere\u0026rsquo;s a name for that. It isn\u0026rsquo;t accountability.\nResulting, for people who\u0026rsquo;ve never lost money at poker # Resulting is judging a decision by how it turned out. Annie Duke named it, and poker players learn it the hard way: if you grade every hand by whether you won the pot, you start praising the idiot who called all-in with rubbish and happened to hit, and bollocking the player who made the correct fold and watched the cards fall the other way.\nA good decision can have a bad outcome. A bad decision can have a good outcome. On any one hand, the result tells you almost nothing about the quality of the call.\nYou already know this. You\u0026rsquo;d never let someone say \u0026ldquo;I bet my mortgage on red, it came up red, therefore that was sound financial planning.\u0026rdquo; It\u0026rsquo;s obviously broken. The result was a coin flip; the decision was reckless either way.\nAnd then on the last Friday of the quarter, you put a 0.3 on the screen and call it a disappointment.\nBut a quarter isn\u0026rsquo;t one hand # Fair. 
And this is where the poker comparison hits its limit, so let me not paper over it.\nPoker has a clean break. You make the call, then you have no say in the cards. That gap is the whole reason resulting is a fallacy there: the decision and the outcome are genuinely separate events.\nA quarter has no such gap. It\u0026rsquo;s 90 days of continuous decisions, adjustments and execution, all of it feeding the result. The team that ran honest experiments and learned a pile made dozens of calls after the opening bet. So a low score can genuinely reflect bad execution, not only bad luck. Outcomes carry signal, just a noisy one.\nHere\u0026rsquo;s the twist: that makes the score worse to grade, not safer. Because the number is carrying three things at once. The quality of the bet. The quality of the doing. And the weather, the part nobody in the room controlled. A single figure between 0 and 1 can\u0026rsquo;t tell you which is which. Was that 0.4 a sharp, well-executed bet in a market that simply didn\u0026rsquo;t move? A sound bet fumbled in delivery? A weak bet that got lucky to reach 0.4 at all? The score blends all three and hands you the blend as a verdict.\nAnd you can\u0026rsquo;t lean on volume to wash the noise out, the way a poker player can. A quarter is three or four bets, each one different, each played once. The clock ran for 90 days; the result is still a single tangled resolution, not a sample you can trust.\nOver enough bets and enough quarters, a pattern does surface. A team that consistently places good bets wins more of them over the year. That\u0026rsquo;s an argument for watching the quality of your bets over time, not for reading this quarter\u0026rsquo;s number as a grade.\nWhat the number is actually measuring # Picture two Key Results.\nThe first was a real bet. 
The team looked at a genuinely uncertain market, picked a hard target, ran honest experiments, executed well, and landed at 0.4 because the world didn\u0026rsquo;t move the way anyone hoped. Good call, bad cards.\nThe second was sandbagged. The team quietly picked a target they were already 90% sure they\u0026rsquo;d hit, did the obvious work, and cruised to a 1.0. No risk taken. Nothing learned. A foregone conclusion wearing the costume of an ambitious goal.\nOn grading Friday, the 0.4 gets the wince and the 1.0 gets the nod.\nYou\u0026rsquo;ve just rewarded the worst decision in the room and punished the best one. And you\u0026rsquo;ve taught everyone watching exactly what to do next quarter.\nIt cuts both ways, and the upward cut is worse # Most people, when they finally clock resulting, worry about the unfair 0.3. The good bet that got marked down.\nThe dangerous one is the other direction. The 1.0 that was luck or sandbagging gets canonised. The approach that produced it becomes \u0026ldquo;what good looks like.\u0026rdquo; It goes in the deck. It gets copied to other teams. You scale the thing that happened to win, with no idea whether the decision behind it was any good.\nA team that grades outcomes drifts toward hittable targets. They get safer every quarter, because safe targets score well and ambitious bets score badly, and the spreadsheet doesn\u0026rsquo;t know the difference between a sandbag and a triumph. Goodhart\u0026rsquo;s law, on a quarterly timer.\nThe branding bit # Here\u0026rsquo;s why it survives when roadmap grading and velocity worship get laughed out of the room.\nThe score feels like rigour. A number between 0 and 1, logged against a target, reviewed on a cadence. It has the grammar of measurement. It looks like the opposite of hand-waving. 
Leaders reach for it because \u0026ldquo;we scored 0.68 against our objectives\u0026rdquo; sounds defensible in a way that \u0026ldquo;we placed some sensible bets and a few didn\u0026rsquo;t land\u0026rdquo; never does, even when the second sentence is the more honest one.\nSo resulting gets a dashboard, a quarterly ceremony, and a respectable name. Same bias a poker player would recognise in a heartbeat, with better branding.\nThe steelman, and the part that actually bites # The serious OKR people are ahead of me on some of this. The \u0026ldquo;0.7 is the sweet spot\u0026rdquo; convention exists precisely to punish sandbagging: hit 1.0, and you\u0026rsquo;re told you aimed too low. The better guidance has said for years that grades are meant to start a conversation, not end one. So before anyone tells me I\u0026rsquo;m flogging a 2014 ritual: I know the good version exists.\nHold the good version in your head, though, and the real problem gets sharper, because now you have to pick your poison.\nA score has one honest virtue: it\u0026rsquo;s hard to fake after the fact. You hit the number, or you didn\u0026rsquo;t. What it invites instead is faking before the fact, which is what sandbagging is. You game the scoreboard by setting the target low. The 0.7 norm is a patch for that, and a leaky one, because a culture can learn to sandbag to 0.7 as happily as to 1.0.\nNow grade the bet instead of the result, which is what I\u0026rsquo;ve been pointing to. That kills the sandbag because a safe bet is, by definition, a bad bet. But it opens the opposite hole. Once the outcome is known, \u0026ldquo;it was a brave, well-reasoned bet\u0026rdquo; is the easiest story in the world to tell, and the most articulate person in the room tells it best. Grade bets carelessly and you\u0026rsquo;ve traded a scoreboard you can rig in advance for a story you can rewrite in hindsight.\nSo neither end is safe alone. The score resists the story and invites the sandbag. 
The bet resists the sandbag and invites the story. There\u0026rsquo;s only one thing that closes both holes, and it\u0026rsquo;s the same move either way. You write it down first.\nGrade what you wrote down before you knew # A score grades the destination. The road you took, the only part you get to keep, goes unmarked. So grade the road, under one rule that keeps it honest: you don\u0026rsquo;t get to invent the standard after you\u0026rsquo;ve seen the result.\nWhich means the bet has to be written down before the quarter, in a form that can embarrass you later.\nThe bet, and what would prove it wrong. \u0026ldquo;Improve retention\u0026rdquo; won\u0026rsquo;t do it. The sentence has to have teeth: \u0026ldquo;We believe personalised re-engagement lifts 30-day retention; if it hasn\u0026rsquo;t moved 5 points by quarter end, the bet is dead, and the money moves.\u0026rdquo; Now, \u0026ldquo;was it a good bet\u0026rdquo; has an anchor that isn\u0026rsquo;t a story. You\u0026rsquo;re holding the call to a standard you set when you didn\u0026rsquo;t even know the answer. That\u0026rsquo;s the thing poker has, and OKR grading throws away: the odds were knowable before the cards turned.\nThe learning, measured against what you said you\u0026rsquo;d learn. \u0026ldquo;We learned loads\u0026rdquo; doesn\u0026rsquo;t count, because everyone learned loads. What did you believe in week one that you don\u0026rsquo;t believe now, and did you kill the bets that earned killing? A hypothesis you wrote down and can hold up against reality is hard to fake. The free-floating \u0026ldquo;we did our best\u0026rdquo; is the easiest fake there is.\nAnd yes, this is gameable too. Put \u0026ldquo;bets killed\u0026rdquo; on the wall as the number to beat, and teams will manufacture theatrical kills and perform their learning, the same sandbag in a new costume. Goodhart doesn\u0026rsquo;t spare my metric just because I prefer it. A tighter measure won\u0026rsquo;t save you. 
The guard is that you\u0026rsquo;re checking pre-committed reasoning against what actually happened, out loud, and a bet you timestamped in week one is much harder to narrate your way out of than a result you\u0026rsquo;re explaining after the fact. The honesty comes from the timestamp.\nOne honest limit, because it\u0026rsquo;s a different fight. Grading the bet well tells you whether the call was sound. It doesn\u0026rsquo;t widen the motorway. A good bet placed on a system that can\u0026rsquo;t carry it still stalls, and no grading ritual fixes that, which is the argument in You Don\u0026rsquo;t Rise to the Level of Your Goals, You Flow to the Level of Your Systems. This piece is about judging the call. That one\u0026rsquo;s about building the road.\nWhy we won\u0026rsquo;t, and the small thing that helps # The honest reason teams keep grading outcomes is that the number is easy, and bet quality is hard to defend upward. \u0026ldquo;We killed three bets and learned our retention assumption was wrong\u0026rdquo; doesn\u0026rsquo;t fit in a cell. A score does. It\u0026rsquo;s comforting precisely because it hides everything that matters.\nSo here\u0026rsquo;s the small move, and it costs one sentence per Key Result. At the start of next quarter, before any work begins, write the kill criteria next to each one: \u0026ldquo;this bet is dead if ___ hasn\u0026rsquo;t happened by ___.\u0026rdquo; That\u0026rsquo;s it.\nAnnie Duke calls these kill criteria. Her argument in Quit is that you have to set them while you can still think straight, because in the moment, you\u0026rsquo;ll always find a reason to push on. Sunk cost doesn\u0026rsquo;t show up on day one.\nDo that, and grading Friday changes character. You stop inventing a story to fit a number. You hold this quarter\u0026rsquo;s result against a standard your past self set before anyone knew how it would land. The sandbagged bet has no kill criteria worth the name, and it shows. 
The brave one that missed has people reading what you wrote in week one and arguing about the call, which is the conversation you actually wanted.\nThe scoreboard was never the point. The quality of the bets was. One of those you can improve. The other one you just read out.\nWhen did your team last call a low score a good decision, out loud, in the room? That\u0026rsquo;s the tell. If the answer is never, you\u0026rsquo;re not grading performance. You\u0026rsquo;re resulting.\n","date":"14 June 2026","externalUrl":null,"permalink":"/writing/grading-okrs-is-resulting/","section":"Writing","summary":"","title":"Grading OKRs is just resulting with better branding","type":"writing"},{"content":"","date":"14 June 2026","externalUrl":null,"permalink":"/tags/leadership/","section":"Tags","summary":"","title":"Leadership","type":"tags"},{"content":"","date":"14 June 2026","externalUrl":null,"permalink":"/tags/okrs/","section":"Tags","summary":"","title":"OKRs","type":"tags"},{"content":"","date":"14 June 2026","externalUrl":null,"permalink":"/","section":"Paul Brown · Building capability, not dependency · Product \u0026 Flow Practitioner","summary":"","title":"Paul Brown · Building capability, not dependency · Product \u0026 Flow Practitioner","type":"page"},{"content":"","date":"14 June 2026","externalUrl":null,"permalink":"/tags/product/","section":"Tags","summary":"","title":"Product","type":"tags"},{"content":"","date":"14 June 2026","externalUrl":null,"permalink":"/categories/product-practice/","section":"Categories","summary":"","title":"Product Practice","type":"categories"},{"content":"","date":"14 June 2026","externalUrl":null,"permalink":"/tags/","section":"Tags","summary":"","title":"Tags","type":"tags"},{"content":"Pieces on probabilistic forecasting, flow metrics, Kanban as strategy, and the conversations that make grown-up uncertainty stick in delivery organisations. 
Cross-posted on Medium under Thrivve Partners and paulisthrivving.\n","date":"14 June 2026","externalUrl":null,"permalink":"/writing/","section":"Writing","summary":"","title":"Writing","type":"writing"},{"content":"","date":"7 June 2026","externalUrl":null,"permalink":"/tags/change/","section":"Tags","summary":"","title":"Change","type":"tags"},{"content":"","date":"7 June 2026","externalUrl":null,"permalink":"/tags/flow/","section":"Tags","summary":"","title":"Flow","type":"tags"},{"content":"","date":"7 June 2026","externalUrl":null,"permalink":"/categories/flow-and-delivery/","section":"Categories","summary":"","title":"Flow and Delivery","type":"categories"},{"content":"","date":"7 June 2026","externalUrl":null,"permalink":"/tags/rituals/","section":"Tags","summary":"","title":"Rituals","type":"tags"},{"content":"Two teams kill story points in the same week. Both have the same flow data sitting in their work tracker, an afternoon away from a cycle-time scatterplot and a forecast. Neither is short of an instrument.\nThe first one falls apart anyway. The scatterplot is right there, and the team doesn\u0026rsquo;t believe it. They\u0026rsquo;ve trusted the velocity number for two years, and they trust this \u0026ldquo;new\u0026rdquo; honest one for about a day. Anxiety floods the gap between losing the old reassurance and earning the new one; someone quietly starts keeping the old number on a private spreadsheet for comfort, and within a month, leadership has decided this flow thing doesn\u0026rsquo;t work.\nThe second one exhales. Same scatterplot, same forecast, but this team had been watching those signals predict reality for a couple of months before anyone touched estimation. The fake number had been masking a slow rot in their cycle times, and once it went away, they finally had to look at how long things actually took. 
They don\u0026rsquo;t like what they see, and that\u0026rsquo;s the first useful Tuesday they\u0026rsquo;ve had in months.\nSame move, same tooling, but opposite outcomes. The difference came down to whether the team trusted the new number yet, and that\u0026rsquo;s the part the standard change-management advice fumbles.\nThe advice that\u0026rsquo;s half right # It goes like this - a ritual does two jobs: the operational one written on the tin, and an emotional one nobody wrote down. Story points produce a velocity number (operational) and the reassurance that \u0026ldquo;we\u0026rsquo;re okay\u0026rdquo; (emotional). When you bin the ritual, you bin both. So build the replacement before you cut the old thing loose, run the new signals in parallel, let confidence in the new thing outgrow reliance on the old thing, and trade up slowly.\nMost of that is right. But the word \u0026ldquo;build\u0026rdquo; hides the whole problem, because it makes the replacement sound like construction work that takes weeks. The metric isn\u0026rsquo;t construction work. The events are already in your tracker. Point a tool at them, and you have cycle time, throughput, ageing and a forecast before the kettle boils. The instrument is never the thing that takes the time.\nSo when a transition stalls on \u0026ldquo;we\u0026rsquo;re not ready, we need to build the metrics first,\u0026rdquo; be suspicious. The metrics are an afternoon away, and everyone in the room knows it. \u0026ldquo;We need to build the metrics first\u0026rdquo; is usually not a tooling problem. It\u0026rsquo;s a comfort: a procedural delay that lets the team put off the actual hard part, which is giving up a reassuring fake number for an honest one they don\u0026rsquo;t trust yet. The tooling excuse is itself an anaesthetic.\nStrip away the excuse, and you can see what the parallel-running period was ever for: earning the team\u0026rsquo;s belief in a tool that already exists. 
Which means the only thing you are ever really replacing is the emotional job. And here the standard advice quietly assumes something false: that every emotional job is worth replacing.\nIt\u0026rsquo;s good advice for half of them. For the other half, it\u0026rsquo;s a recipe for re-numbing a team that needed to wake up. Some comforting rituals are holding the building up. Some are taping the team\u0026rsquo;s eyes shut, and you do not treat those the same way.\nTwo kinds of comfort # Scaffolding comfort holds real function up. Remove it, and the team loses the ability to do something genuine: coordinate with stakeholders, set expectations, and make a decision. The velocity number was fake, but it was the only language the team and the business shared for \u0026ldquo;when.\u0026rdquo; Pull it before the team trusts the replacement and a real capability goes with it. The wobble is real loss.\nAnaesthetic comfort tapes the team\u0026rsquo;s eyes shut. Remove it, and the team loses nothing real. What they lose is the ability to look away from something they should have been looking at for months. The green dashboard that let everyone ignore creeping cycle times. The \u0026ldquo;Done\u0026rdquo; column that delivered a hit of completion while outcomes quietly flatlined. The average time that read healthy because it buried the long tail where the actual suffering lived.\nHere is the test in a single question: When you take the comfort away, does the team lose the ability to do something, or only the ability to not see something?\nLose the ability to do something: scaffolding. Replace it first.\nLose the ability to look away: anaesthetic. The discomfort is the intervention. Replace it too smoothly, and you have just installed a fresh anaesthetic with better branding.\nThe prescriptions invert # For scaffolding, the gentle-transition playbook is exactly right, and this is where the replacement mapping earns its place. 
The team needs each job that the old ritual was quietly doing, handed over before you take the old one away. The signals below take an afternoon to stand up. What takes time is the team coming to trust them.\nA new sense of certainty: \u0026ldquo;70% of items like this finish in 5 days or less, based on the last 90 days of work we shipped.\u0026rdquo; A forecast does the reassurance job that velocity was faking, without the fake precision. A new sense of progress: cycle time reducing, throughput holding steady, ageing items being dealt with instead of rotting in column three. A signal that moves so they can feel movement. A new sense of control: WIP limits they set, swarming the stuck item instead of starting a new one, and a five-minute flow check-in that catches problems while they\u0026rsquo;re small. Stand those up, then leave them running where the team can watch them predict reality, week after week, while the old number is still there to fall back on. The parallel period isn\u0026rsquo;t construction time. It\u0026rsquo;s the time it takes for the new signal to become believable. Only when confidence in it has outgrown reliance on the old number do you let go of the old comfort. The instrument is quick. The trust is slow. Don\u0026rsquo;t confuse the two.\nFor anaesthetic, you do close to the opposite, and it\u0026rsquo;s worth being precise about what that does and doesn\u0026rsquo;t mean. It does not mean pushing the team off a ledge. A ledge is removing scaffolding with nothing underneath and no warning: a real capability gone, no agency, the team finding out by falling. That damages trust, and it should. You never engineer that.\nRemoving an anaesthetic isn\u0026rsquo;t a ledge, because no capability moves. The team can still coordinate, forecast, and decide. What changes is that a truth the ritual was muffling, the cycle-time scatter they were avoiding, the ageing chart, the long tail the average was hiding, is now visible instead of hidden. 
The floor is still there. They\u0026rsquo;re looking at an uncomfortable number, and the discomfort is the data doing its job. That\u0026rsquo;s discomfort they can act on, not fear imposed on them.\nSo the move is restraint, not cruelty: stop hiding the truth, keep every real capability intact, stand next to the team and name what they\u0026rsquo;re looking at, and then don\u0026rsquo;t rush to make them comfortable about it. The mistake is reaching for a soothing replacement the moment the room goes quiet. Do that, and you\u0026rsquo;ve spent effort making them comfortable about a problem you needed them to be alert to. The thing you put in place of an anaesthetic is a true signal they can respond to.\nThe trust argument cuts the other way here, and this is the part that so often gets missed. The breach of trust isn\u0026rsquo;t removing the anaesthetic - it\u0026rsquo;s the anaesthetic itself. A team kept calm by a green dashboard while delivery quietly degrades has been misled, usually by people who meant well, and when it surfaces, and it always surfaces, the wound is \u0026ldquo;why did everyone act like this was fine?\u0026rdquo; Letting the true number show treats the team as adults. Maintaining the comfortable fiction is what actually costs you their trust.\nThis is the part no \u0026ldquo;manage the transition gently\u0026rdquo; post will tell you, because the entire genre treats all comfort as worth preserving. Chesterton\u0026rsquo;s fence says don\u0026rsquo;t remove what you don\u0026rsquo;t understand the purpose of. Fair. But some fences are blindfolds, and \u0026ldquo;I don\u0026rsquo;t understand why this is here\u0026rdquo; is sometimes the correct prelude to taking it down rather than a reason to leave it standing.\nMost rituals are both at once, and that\u0026rsquo;s the real skill # Velocity is the awkward case, because it does both jobs at the same time. 
It is scaffolding for stakeholder coordination and anaesthetic for the team\u0026rsquo;s avoidance of cycle-time truth. Treat it as purely one or the other, and you get it wrong in opposite directions: replace the whole thing smoothly and you re-numb the team; rip the whole thing out and you break the conversation with the business.\nThe honest move is to split the jobs and handle each on its own terms. Replace the scaffolding job, give stakeholders a real probabilistic forecast they can plan against, while deliberately not replacing the anaesthetic job. Let the team sit with what their actual flow data says. You are putting the load-bearing piece back and letting the blindfold drop in the same week, and those are two different decisions wearing one ritual\u0026rsquo;s clothes.\nThat\u0026rsquo;s the work: figuring out which emotional jobs deserve to be replaced and which deserve to be removed, because they live within the same ritual and point in opposite directions.\nFour questions before you kill a ritual # 1. When this goes, does the team lose the ability to do something, or only the ability to not see something?\n2. Which is it doing more of right now: holding real function up, or taping their eyes shut?\n3. If it\u0026rsquo;s scaffolding, what signal does the same emotional job honestly, and is it standing yet?\n4. If it\u0026rsquo;s anaesthetic, what truth has it been muffling, and am I willing to let that truth show rather than reach for something soothing to replace it?\nAnswer one and two, and you know which playbook you\u0026rsquo;re in. Most failed migrations are just the wrong playbook: a team running the scaffolding playbook on an anaesthetic, or the anaesthetic playbook on scaffolding, then surprised when the building either sways or stays comfortably asleep. 
The new system was rarely the problem.\nIf you\u0026rsquo;ve taken a comforting ritual away from a team and watched them either wobble or wake up, I\u0026rsquo;d love to know which it was, and whether you could tell in advance. That\u0026rsquo;s the call I still get wrong.\n","date":"7 June 2026","externalUrl":null,"permalink":"/writing/some-rituals-are-scaffolding-some-are-anaesthetic/","section":"Writing","summary":"","title":"Some rituals are scaffolding. Some are anaesthetic.","type":"writing"},{"content":"","date":"10 May 2026","externalUrl":null,"permalink":"/tags/forecasting/","section":"Tags","summary":"","title":"Forecasting","type":"tags"},{"content":"","date":"10 May 2026","externalUrl":null,"permalink":"/tags/kanban/","section":"Tags","summary":"","title":"Kanban","type":"tags"},{"content":"The team forecasted 85% confidence by the end of Q2.\nIt shipped mid-Q3.\nIn the post-mortem, someone leaned back, folded their arms, and said the line that quietly kills every probabilistic practice I\u0026rsquo;ve ever helped a team adopt:\n\u0026ldquo;So this forecasting stuff doesn\u0026rsquo;t really work for us, does it?\u0026rdquo;\nHere\u0026rsquo;s the uncomfortable truth: the forecast did its job. The conversation around it didn\u0026rsquo;t.\nThe setup that keeps repeating # You\u0026rsquo;ve seen this play out. Maybe you\u0026rsquo;re living it right now.\nA team gets serious about flow. They stop pretending story points were ever a forecasting tool. They start using real data, ranges, and confidence levels. They forecast at 85%, the conservative standard most practitioners settle on, because it gives leadership something they can plan around without anyone pretending the future is a spreadsheet or a crystal ball.\nThe first few forecasts land inside the range. Confidence in the practice grows. People stop snorting when you say \u0026ldquo;probabilistic.\u0026rdquo;\nThen one misses. Not by a little. 
By weeks.\nAnd in the room afterwards, the language quietly reverts. \u0026ldquo;Can we just commit to a date?\u0026rdquo; \u0026ldquo;This isn\u0026rsquo;t giving us what we need.\u0026rdquo; \u0026ldquo;Maybe Monte Carlo is fine for big teams, but\u0026hellip;\u0026rdquo;\nSix months of cultural work, gone in one sentence.\nHere\u0026rsquo;s what actually happened # An 85% forecast that misses is the 15% showing up on schedule.\nThat\u0026rsquo;s the model working, not breaking.\nForecast a hundred things at 85%, and you should miss roughly fifteen of them. The whole point of saying \u0026ldquo;85%\u0026rdquo; out loud was to acknowledge that the other 15% was real. You priced it in. You named it. You agreed to live with it. So did they (or so you thought).\nAnd then the 15% happened, and everyone forgot they\u0026rsquo;d agreed.\nThe room hears \u0026ldquo;85%\u0026rdquo; and, silently and instantly, translates it into \u0026ldquo;definitely, with paperwork.\u0026rdquo; You smuggled a contract into a probability. The contract broke. The probability did exactly what it said it would.\nThe bias that does the damage # There\u0026rsquo;s a name for what\u0026rsquo;s happening in that post-mortem.\nAnnie Duke, the former professional poker player turned decision-making writer, calls it resulting: judging the quality of a decision by the quality of the outcome. They look like the same thing. They aren\u0026rsquo;t.\nA good forecast can have a bad outcome. A bad forecast can have a good outcome. The two are loosely correlated and never identical. Poker players learn this the hard way, because if you judge every hand by whether you won the pot, you\u0026rsquo;ll start playing terrible cards that happened to win and folding strong hands that happened to lose. Judge the decision. The result is downstream noise.\nMost delivery rooms cannot hold this distinction without scaffolding.\nWatch the next time something misses. 
See how quickly the conversation collapses the model into the result. \u0026ldquo;It missed, so it must have been wrong.\u0026rdquo; It\u0026rsquo;s the same logic as \u0026ldquo;I bet on red, it came up black, therefore betting was a stupid idea.\u0026rdquo;\nYou wouldn\u0026rsquo;t accept that reasoning at a poker table. Don\u0026rsquo;t accept it in a delivery review.\nThe four things that could have gone wrong # When a forecast misses, the room jumps straight to \u0026ldquo;the model was wrong.\u0026rdquo; But there are at least four things in play here, and a single data point can\u0026rsquo;t tell them apart:\nThe 15% showed up. The model was fine. The unlikely happened. This is the cause people most want to avoid naming, because there\u0026rsquo;s nothing to fix and nothing (no one?) to blame. It also can\u0026rsquo;t be proven from a single miss, which is part of what makes it uncomfortable to sit with.\nYour system isn\u0026rsquo;t stable. Cycle times are spread so wide that any forecast becomes a lottery. The model is downstream of a sick system. The forecast is doing its best with the data the system gave it.\nSomething material changed. New tech, new team, new scope, new dependency. The past stopped predicting the future partway through. The forecast was made with information that later quietly went out of date.\nThe model itself was actually wrong. Wrong sampling window, wrong assumptions, wrong base data. This happens, and it\u0026rsquo;s worth ruling out, but it\u0026rsquo;s also the cause people reach for first because it\u0026rsquo;s the most actionable. A miss looks like a maths problem long before it looks like an assumption problem.\nA single miss can\u0026rsquo;t tell you which of the four fired. That\u0026rsquo;s not a flaw in the list; it\u0026rsquo;s a property of probability. 
You need many forecasts and tracked outcomes to make frequency claims about any of these, which is why this post is about how to respond to a miss with discipline rather than how to assign it a cause from one data point.\nThree places to look in the conversation # Three places worth examining alongside the model:\nTranslation. Nobody in the room knew what 85% felt like in their lives. (Hint: it\u0026rsquo;s roughly one miss in seven.) Without a felt reference, percentages are decoration. Try this in your next forecast meeting: \u0026ldquo;If we ran this same thing seven times, we\u0026rsquo;d expect to miss it once. We\u0026rsquo;re betting we\u0026rsquo;re not in the one.\u0026rdquo; Watch the energy in the room change.\nPre-commitment. Nobody said \u0026ldquo;if this misses, here\u0026rsquo;s what we\u0026rsquo;ll do\u0026rdquo; before the forecast was made. So when it missed, the only available script was blame. The fix is small and embarrassing in its obviousness: agree on the response before you need it.\n\u0026ldquo;If the re-forecast pushes past the date by more than a week (or 5%, or whatever band we\u0026rsquo;ve agreed), we go back to stakeholders the same day, show them the new range, and open the trade-off conversation: move the date, cut scope, or accept the risk.\u0026rdquo;\nNow, a miss has a script.\nTwo things matter about that trigger. First, the trigger is the next forecast itself. Gut feel doesn\u0026rsquo;t count. \u0026ldquo;It feels like we\u0026rsquo;re slipping\u0026rdquo; doesn\u0026rsquo;t fire it. The data does. Which only works if you\u0026rsquo;re actually doing the next forecast.\nSecond, the trigger has a tolerance band baked in. Monte Carlo is noisy, especially week-to-week, and a forecast that wobbles a day or two in either direction is what I call Monte Carlo Hula Hula, the model swinging its hips a bit, not the system actually changing. 
Agree on the band up front, and you stop dragging stakeholders into renegotiations over the wobble. The trigger fires when the drift is real, not when the model is just \u0026ldquo;dancing\u0026rdquo;.\nCadence. Most teams forecast at the start and reopen the conversation only when something has already gone wrong. Don\u0026rsquo;t mistake forecasting for prophecy. Real forecasting is a rolling instrument. Re-forecast often, and re-forecast cheaply. Do it whenever new information arrives. Most of the pain in the post-miss conversation comes from the fact that nobody re-forecasted in the four weeks before the miss became inevitable.\nTighten the range # Here\u0026rsquo;s the practitioner move teams almost always miss.\nWhen something misses, the instinct is \u0026ldquo;let\u0026rsquo;s forecast at 95% next time, that\u0026rsquo;ll fix it.\u0026rdquo; It won\u0026rsquo;t. All you\u0026rsquo;ve done is widened the range so the upper bound sits even further out, and bought one more round of \u0026ldquo;definitely, with paperwork\u0026rdquo; before the next 5% shows up.\nThe mature move: keep the confidence level the same, and make the range tighter.\nA tighter range comes from a more stable system. Less variation in cycle time. Tighter bounds on work item age. Fewer surprises when halfway home. Less work sitting in queues. The model is downstream of all of this. You fix forecasts by fixing the system the forecast reads from.\nIf your forecast range is \u0026ldquo;between three weeks and four months,\u0026rdquo; look at why your cycle times are spread across a continent. A fancier algorithm won\u0026rsquo;t help.\nAnd here\u0026rsquo;s the practical question that always shows up next: \u0026ldquo;How do we know when our throughput has actually changed versus when it\u0026rsquo;s just doing what throughput does?\u0026rdquo; This is where XmR charts (also called process behaviour charts) earn their keep. 
Plot your daily throughput, calculate the natural process limits from the moving range, and you get a picture of what routine variation looks like for your team. Anything inside the limits is the system breathing. Anything that breaks through the limits or shows a run of points drifting in one direction is a signal worth investigating.\nIt\u0026rsquo;s the same discipline as the tolerance band on the re-forecast trigger, applied one layer deeper. With it, you can finally tell the difference between \u0026ldquo;our system changed\u0026rdquo; and \u0026ldquo;our system is fine, we just had a quiet week.\u0026rdquo;\nChart: ValueFlow\nThe post-miss conversation: a recipe # When the next forecast misses (and it will, because that\u0026rsquo;s what probabilistic means), try this:\nOpen with the distribution. \u0026ldquo;We agreed there was a 1-in-7 chance of missing. That\u0026rsquo;s the world we said we\u0026rsquo;d accept.\u0026rdquo; Get the language of probability into the room before the language of blame arrives. Separate decision quality from outcome quality. Name the resulting trap explicitly. \u0026ldquo;A miss inside the range we agreed to means the range was honest.\u0026rdquo; Audit the conversation. What did people hear when you said 85%? What do they need to hear next time? Ask whether the system was stable. Stability is whether arrival and departure rates are roughly in balance, and whether the variation you\u0026rsquo;re seeing is common cause (the system breathing) or special cause (something actually changed). Put your throughput on an XmR chart. Pull up work item age. Look at the cycle time spread. If arrivals are outpacing departures, your queues are quietly filling up, and every future forecast will drift right. If the XmR shows a signal you missed, the forecast wasn\u0026rsquo;t reading from the system you thought it was. If those charts are a mess, you have a flow problem wearing a forecasting costume. Pre-commit before the next forecast. 
Agree on what happens on a miss before the miss happens. If you do nothing else, do the first one. It rewires the room.\nWhat this is really about # The team that survives the first miss is the team that learns probabilistic literacy lives at the leadership layer.\nThe room ran out of tolerance for uncertainty before the forecast did.\nThat\u0026rsquo;s a leadership problem dressed up as a forecasting problem. Because the team can do everything right (run good simulations, present honest ranges, name confidence levels) and one leader saying \u0026ldquo;just give me a date\u0026rdquo; in the wrong meeting will untrain the lot of them in thirty seconds.\nCalibrated confidence is grown-up thinking. Grown-ups sometimes miss too.\nThe practice survives if the conversation does its job.\nBringing it home # Pull up your last missed forecast and ask three honest questions:\nWas the miss inside the confidence range we agreed to? Did anyone in the room translate \u0026ldquo;85%\u0026rdquo; as \u0026ldquo;definitely\u0026rdquo;? What did we change after the miss: the model, or the conversation? If you changed the model, you fixed the wrong thing.\nThe forecast was probably right. The conversation was wrong.\nStop tuning the maths. Start tuning the room.\n","date":"10 May 2026","externalUrl":null,"permalink":"/writing/forecast-was-right-conversation-wrong/","section":"Writing","summary":"","title":"The forecast was right. The conversation was wrong.","type":"writing"},{"content":"For years, I thought my job as a trainer and coach was to change teams.\nThen a pharma conference taught me I was wrong.\nI was speaking at an internal event for a large pharma company. Two hundred people in the room: product managers, delivery leads, engineers, a scattering of senior leaders. My slot was lean experimentation. How small, safe-to-fail experiments bridge product and delivery. How learning happens through both success and failure. 
How the teams that thrive are the ones where people can say “that didn’t work, here’s what we learned” without anyone’s career ending.\nIt had gone well. Good questions. People stayed behind during the coffee break to discuss how they’d apply it. You can feel it when a room is leaning in.\nThen the head of Product took the stage to talk about building a 12-month roadmap. Halfway through, his slide went up:\n“We want innovation, but we will not tolerate failure.”\nI was dying quietly in the audience.\nBecause in one sentence, he’d just untrained everyone. And he had no idea he’d done it.\nThe Untraining Moment # Here’s what I’ve come to call that instant: The Untraining Moment.\nIt’s the specific moment when a leader, usually without realising it, cancels out what their team was just taught. Sometimes it’s a slide. Sometimes it’s a passing comment in a roadmap review. Sometimes it’s who gets promoted, or who gets quietly sidelined after a project didn’t land.\nThe training ends. The Untraining begins. And the Untraining always wins, because the leader is the louder signal in the room — every day, in every meeting, in every decision. A trainer speaks to a team for a day. A leader speaks to them for the next twelve months.\nThis is the thing I had to learn the hard way: leaders don’t sponsor training. They ARE training, continuously, whether they know it or not.\nEvery reward is a lesson. Every punishment is a lesson. Every slide, every aside, every eye-roll at the word “experiment” is a lesson. Your team is always being trained. The only question is by whom.\nWhat the pharma room actually learned # Watch what happens in real time when “we will not tolerate failure” lands in a room that’s just spent the morning being taught to experiment.\nThe product managers do the maths. If failure isn’t tolerated, you don’t run experiments, you run launches. You don’t validate with customers but build business cases that get approved. 
You don’t kill features, you polish them until they look inevitable. The job stops being “find what’s true” and becomes “defend what’s committed.”\nThe delivery leads do a different calculation. If failure isn’t tolerated upstream, everything that reaches them is non-negotiable. Flow, right-sizing, WIP limits: those are luxuries you earn when leadership is prepared to hear that something’s going to take longer, or isn’t worth doing at all. They’d just been told leadership isn’t, so the backlog stays stuffed, the deadlines stay fixed, and the team burns out shipping stuff nobody validated.\nThe engineers do the oldest calculation in the book. Keep their heads down. Don’t volunteer anything risky. Don’t say the roadmap is fiction, even when they know it is.\nEverything I’d taught that morning required one thing: safety to say “this didn’t work.” The head of Product had just removed it in thirty seconds.\nHe wasn’t a fool. He thought he was being decisive. He thought “we will not tolerate failure” was leadership, a high standard, a commitment to quality, and a reassurance to his own bosses that the roadmap would land. If you’d asked him whether he supported experimentation and learning, he’d have said yes, of course. He’d probably have said he loved the morning’s talk, but I never asked.\nThat’s the thing about Untraining. Leaders almost never know they’re doing it.\nWhy this breaks product AND delivery # What makes the Untraining Moment especially brutal is that it doesn’t just undo one team’s training. It triggers a schism between product and delivery.\nTrain product to think in outcomes whilst delivery is still measured on output, and you haven’t built friction, you’ve drawn a border. Two tribes, same company, opposing scoreboards. Product wants to pause and learn. Delivery needs to ship to hit their numbers. Stand-ups become negotiations. 
Retros become hostage exchanges.\nTrain delivery to manage flow whilst product keeps stuffing the pipeline with unvalidated bets, and you’ve built a beautifully efficient machine for shipping the wrong things faster. The team doesn’t break. It just stops caring whether the work was worth doing.\nAnd if you train both tribes to experiment, learn, and fail safely — then stand up and say “we will not tolerate failure”, you’ve just told them their new shared language is career-limiting. The Untraining Moment doesn’t just cost you the training. It costs you the truce.\nThis is why I always insist on training product and delivery together now. Not because it’s tidier, but because when the Untraining Moment lands, both tribes need to recognise it. Otherwise, each of them thinks the other one broke the peace.\nHow to spot your own Untraining Moments # The uncomfortable work for leaders isn’t sponsoring training. It’s noticing when you’re undoing it.\nWatch for:\nThe “but” after the encouragement. “I love that we’re experimenting, but make sure it doesn’t delay the launch.” The team only hears what comes after the but. The hero you just promoted. Did they get there by delivering outcomes and learning from failures? Or by shipping on time and never admitting the bits that didn’t work? Every promotion is a syllabus. The meeting tone-shift. When a team says “we tried that, and it didn’t work”, does the room lean in with curiosity or move on faster? Your face is a slide. The word “innovation” on the same page as “no failure.” If you’ve ever put this on a slide, in a town hall, or in a performance review, you’ve had an Untraining Moment. You just haven’t had anyone brave enough to point it out. The silence after bad news. If no one brings you bad news early, it’s not because there isn’t any. Silence is a curriculum, and yours is being taught daily. None of these are dramatic, and that’s what makes them dangerous. The Untraining Moment is almost never a speech. 
It’s a look, a slide, a promotion, a joke, a silence. Small signals, repeated daily, louder than any trainer. Your move # If you’re commissioning training for your product and delivery teams, the most useful thing you can do isn’t to pick the right trainer, but to audit yourself for Untraining Moments before, during, and especially after the training lands.\nThree questions to sit with:\nWhat does a good failure look like in this organisation? If you can’t answer, your team can’t either, which means every failure is about to be a bad one. Where do we actively make it safe to say “we were wrong”? Not “where do we tolerate it.” Where do we make it safe. If the answer is “nowhere really,” psychological safety for everything else falls with it. Are product and delivery measured on the same things, or on opposing things? If product is on outcomes and delivery is on throughput, you don’t have a product-and-delivery organisation. You have two tribes in a cold war, and no amount of training will end it. My advice is to read your own slides the morning after training ends. Read them as someone who spent that day being taught to experiment, to learn, to bring two tribes together around the truth rather than the plan. Ask yourself if anything you’re about to say would make that person give up.\nBecause if it would, you’re not leading innovation, you’re leading the Untraining.\n","date":"18 April 2026","externalUrl":null,"permalink":"/writing/the-untraining-moment/","section":"Writing","summary":"","title":"The untraining moment: why your training never survives Monday","type":"writing"},{"content":"She was one of the best delivery managers I’d worked with; both consistent and curious in equal measure. She’d taken the flow check-in and made it her own; not a ritual she performed for my benefit, but a genuine daily conversation she had with her team about their system. 
In the coaching sessions, I used to score the quality of the check-in from 1 to 10, and she consistently scored 7s and 8s. The team were reading the board rather than reporting tickets, and the oldest items led the conversation. The expedites got named and challenged.\nThen I stepped back, as you’re supposed to.\nFive weeks later, I rejoined a check-in one morning at her request, and the board told a different story. The 30-day SLE had gone south by 40%. The check-in had quietly stopped happening. And two weeks after that, she handed in her notice.\nShe told me, “I can’t hold back the swell or steer the ship.”\nThat sentence has stayed with me.\nWhat happened in those five weeks # It wasn’t a failure of skill or commitment, but a pressure event. A sustained period of high demand, with expedited requests piling in from multiple directions, and stakeholders wanting things now if not sooner: the kind of organisational noise that makes everything feel urgent and nothing feel manageable.\nAnd when everything feels urgent, the first thing to get sacrificed is the practice that kept urgency in proportion.\nThe flow check-in stopped because there wasn’t enough time. Except there was; I have yet to meet a team who don’t have five minutes. What there wasn’t was a system robust enough to survive this kind of pressure. The practice had been built into the calm. It hadn’t been designed into the chaos.\nSo the check-in went. And with it went the only daily mechanism that surfaced how much work was actually in flight, how old the oldest items were, and how far beyond capacity the team had drifted. The expedites kept arriving. The WIP kept climbing. The SLE stretched. Nobody saw it happening, because the thing designed to make it visible had been the first casualty.\nThis is what drift looks like. Not a dramatic collapse. 
A quiet, reasonable, entirely understandable series of small decisions that each make sense in isolation and are catastrophic in aggregate.\nThe bit nobody writes about # There’s a standard critique of Kanban that I’ve heard many times. It’s cold. Mechanistic. Focused on tickets, flow, and metrics, not on the people doing the work. A process that treats humans as throughput variables.\nI want to push back on that hard.\nBecause what happened over those five weeks was not a flow problem but a human one that first showed up in the flow data. The SLE going south 40% isn’t an abstract metric. It’s a signal that a team was overwhelmed. More was being asked of them than the system could absorb. That someone, somewhere, was carrying a weight that the work items on the board were only partially describing.\nWIP control is not a productivity optimisation. It’s a protection mechanism.\nWhen you actively manage what’s in flight, when you name the expedites, challenge the urgency, make visible what’s actually being carried, you are doing something fundamentally humane. You are making it harder for demand to overwhelm people invisibly. You are creating the conditions for someone to say, “We are at capacity, and this new thing needs to wait.”\nThat conversation is only possible if the system is making the situation visible. Without it, the answer to every new request is “yes” because there’s no shared picture of what yes actually costs.\nShe couldn’t hold back the swell. But the swell shouldn’t have been invisible.\nThe flow check-in is a pressure relief valve. Design it like one. # The flow check-in, when it’s working, is part of what makes demand visible before it becomes damage. Five weeks of expedites with no check-in means five weeks of that demand landing without being named, without being challenged, without anyone asking: what are we stopping in order to start this?\nThe practice didn’t fail her. The practice stopped. 
And it stopped because it had been built as a habit, not designed as a valve.\nA habit depends on conditions staying roughly the same. A valve is specifically designed for the moments when they don’t. You don’t skip the pressure relief mechanism when things get hard. That’s precisely when it matters most.\nThis is why I’m pedantic about how the flow check-in is designed and who owns it. Not because of process correctness. Because a valve that only works when the delivery manager is in the room isn’t a valve, it’s a person. And people get overwhelmed. People hand in their notice. People, quite reasonably, are the first thing to buckle when the swell comes up.\nA valve that functions as protection has three properties:\nThe team owns it, not the delivery manager. I’ve written before about what it looks like when a team starts reading the system themselves, and what it takes to get there. If the check-in only happens because one person runs it, the practice is one absence away from stopping. The reason I work hard to transfer facilitation to the team and get to the point where anyone in the room can run a recognisable check-in is not about resilience for its own sake. It’s because a team that owns the practice has a collective pressure relief mechanism. A team that watches someone else run it has nothing to fall back on when that person is swamped. The artefacts are always visible, not summoned. If the Work Item Age chart only appears when the delivery manager opens it, the signal requires a person to exist. When it’s always on the board, and part of the shared environment the team lives in, the oldest item is visible to everyone, all the time. The data doesn’t wait for someone to notice. And crucially, when the pressure is highest, and the check-in feels most skippable, the board is still there. Still showing what’s ageing. Still making the swell visible, whether or not anyone has called a meeting about it. Every expedited arrival raises a cost question. 
What waits for this to move? Not as a bureaucratic gate, but as a shared understanding. When the cost of urgency is invisible, demand is limitless. When it’s named, the team has something to stand behind in the conversation with the stakeholder. I have written more elsewhere about what unpriced urgency actually costs. The valve releases pressure only if it can also indicate the source of the pressure.\nTo be clear, none of this would have automatically saved her. Some swells are too strong, and some organisations too committed to the fiction that everything is urgent, for any system to absorb fully. But a check-in designed this way, i.e., team-owned, always visible, cost-aware, gives people something to point at that isn’t themselves. It turns “I’m overwhelmed” from a personal admission into a systems observation. And that is a profoundly different conversation to have.\nThe question the SLE was trying to answer # A 30-day SLE stretching by 40% means work that was typically completed in 30 days now takes 42 days. That’s not just a number. It’s five weeks of accumulated pressure made measurable after the fact.\nThe thing about flow metrics is that they tell you the truth, even when nobody in the system is comfortable saying it out loud. The SLE doesn’t care about stakeholder politics, the urgency of the expedited request, or the fact that the delivery manager was doing her absolute best. It just shows what happened.\nAnd what happened was: the system was overwhelmed; the practice that was making it visible stopped; and by the time the signal was unmistakable, the person carrying it had already decided she couldn’t stay.\nThat’s what drift costs. Not always. But sometimes.\nThe goal of the flow check-in was never a better standup. It was a team that could read their own system, and a system that could protect them when things got hard.\nSimple isn’t easy. Sustaining the practice through pressure is the actual work. 
And that work starts not with a better check-in, but with a structure that doesn’t depend on calm conditions to survive.\nShe shouldn’t have had to hold back the swell alone. The system should have been making the swell visible.\n","date":"3 April 2026","externalUrl":null,"permalink":"/writing/practice-didnt-fail-system-did/","section":"Writing","summary":"","title":"The practice didn’t fail. The system did.","type":"writing"},{"content":"","date":"3 April 2026","externalUrl":null,"permalink":"/tags/wip/","section":"Tags","summary":"","title":"WIP","type":"tags"},{"content":"","externalUrl":null,"permalink":"/authors/","section":"Authors","summary":"","title":"Authors","type":"authors"},{"content":"","externalUrl":null,"permalink":"/series/","section":"Series","summary":"","title":"Series","type":"series"}]