Please report on your experience with the session in comments to this post. And, if you’re happy to, please post a link to your code.
One day the software development industry will be grown-up enough to talk about what we do in its own terms, but for now we have, for some reason, to use metaphors. And we love, for some reason†, transport related ones. A recent entry is the (Agile) Release Train. It’s a terrible metaphor. Here are some other mass–transportation metaphors for how you might organise the release schedule for your software development activities, ordered from less to more meritorious in respect of the figures of merit batch size, and the cost of delay if you miss one.
In these metaphors, people stand for features.
- Ocean liner—several thousands of people move slowly at great expense once in a while. Historically, this has been the status quo for scheduling releases.
- Wide–body airliner—several hundreds of people move quickly, a couple of times a day
- Train—several hundreds of people move at moderately high speeds several times a day
Some failure modes of the train metaphor…
Although the intent of a “release train” is that it leaves on time no matter what, and you can get on or off at any time until then, and if you miss this one, another will be along in a little while, in practice we see attempts to either:
- cram more and more people onto the train in a desperate attempt to avoid having to wait for the next one, à la rush-hour in many large cities, or
- add more and more rolling stock to the train to avoid having to run another one
More generally, what a metaphor based on trains will mean to you may depend a lot on your personal experience of trains. For my colleagues in Zürich trains are frequent, swift, punctual, reliable, capacious & cheap. For my colleagues in London, not so much any of those things…
- Tram—several dozens of people move significantly faster than walking pace every few minutes
I like TDD. But if you’re reading this blog you probably knew that already.
“coding like a bastard” considered harmful
I can remember a time when I thought, because I’d been taught so and it seemed to make sense, that software—programs—were designed on paper, using ▭s and →s of various kinds and, in several of my earlier jobs, more than a few ∀s and ∃s and that the goodness of a design was determined by printing it out in a beautifully formatted document† and having an older, wiser, better—well, older and therefore presumably wiser, and more senior so presumably better, anyway—designer write comments and suggestions all over it in red pen during a series of grotesquely painful “review” meetings and then trying to fix it up until the older etc. designer was happy with it. After which came an activity known at the time as “coding like a bastard” after which came the agony of integration, after which came the dismaying emotional wasteland of the “test and debug” activity which took a duration essentially unbounded above, even in principle.
Hmm, now I come to write it down like that, it seems like that was a colossally idiotic way to proceed.
There were some guidelines about what made a good design. There were the design patterns. There were Parnas’s papers, such as On the Criteria… and there were all these textbook ideas about various kinds of coupling and cohesion and…ah, yes, the textbooks. Well, there was Pressman’s, and there was Sommerville’s and some more specialist volumes and some also–rans. When I returned to university after a fairly hair–raising time in my first job as a programmer, wanting to learn how to do this software thing properly, we used whatever edition of Sommerville was current at the time—it’s now in its 9th—as our main textbook for the “software engineering” component: project management, planning, risk, that sort of thing.
So, it’s a bit…startling, we might say, to see Ian’s writeup of his experiment with TDD dealt with by Uncle Bob in quite such…robust, we might say, terms. Startling even for someone as…forthright, we might say, as I usually am myself.
As described, doing TDD wrongly
Thing is, though, Ian is, as described, doing TDD wrongly. And the disappointments that he reports with it are those commonly experienced by… by… by people who are very confident—rightly or wrongly and in Ian’s case, probably rightly—in their ability to design software well. I used to be very confident—perhaps wrongly, but I don’t think so—of my ability to design software. I mean to say, I could produce systems in C++ which worked at all—mid 1990s C++, at that—and this is no mean feat.
Interestingly, at the time I first heard about TDD by reading and then conversing with Kent and Ron and those guys on the wiki—C2, I mean, the wiki—I was already firmly convinced of the benefits of comprehensive automated unit testing, having been made to do that by a previous boss—who had himself learned it long before that—but of course we wrote the tests after we wrote the code, or, to be more honest about it, while debugging. And, yes, even with that experience behind me, I thought that TDD sounded just crazy. Because to someone used to the ▭s and →s and design as an activity that goes on away from a keyboard and especially to someone who does that well—or believes that they do—it does sound crazy.
And so a lot of the objections to TDD that Ian makes in his blog post seem eerily familiar to me. And not only because I’ve heard them often from others since I started embracing TDD.
Thorough, if unnecessarily harsh
Well, anyway, Bob’s critique of what Ian reports is pretty thorough, if unnecessarily harshly worded in places, but there are a few observations that I’d add.
Test-first or test-driven development (TDD) is an approach to software development where you write the tests before you write the program.
Apart from the fact that writing tests first is merely necessary, but very much not sufficient, for doing TDD, so far so good.
You write a program to pass the test, extend the test or add further tests and then extend the functionality of the program to pass these tests.
You build up a set of tests over time that you can run automatically every time you make program changes.
That does happen, yes…
The aim is to ensure that, at all times, the program is operational and passes all the tests.
Yep. I especially like the distinction between merely passing all the tests and also being operational.
You refactor your program periodically to improve its structure and make it easier to read and change.
and, sadly, this last sentence misses a key practice of TDD and largely invalidates what comes before. Which practice is that you refactor your code with maniacal determination up to as frequently as after every green bar.
Every. Green. Bar.
Technically, we could claim to be doing something “periodically” if we did it every 29th of February or every millisecond but I think that to say we do something “periodically” points to a lower frequency. But in TDD we should be refactoring often. Very often. Many times an hour. To really do TDD requires that we spend quite a large proportion of all the time invested in any given programming exercise on refactoring. So, Ian has kind–of fallen at the first hurdle because he’s not really doing TDD right in the first instance.
Now, it used to be a frequent complaint about TDD advocates that we sounded like Communists: it was claimed that we would immediately respond to anyone who said that “I tried TDD and it doesn’t work” by claiming that they weren’t even doing TDD, really, in the same way that fans of Communism would contend that it had never really been tried properly so, hey, it might work, you don’t know.
Not a useful response
The thing is, though, a lot of people who dismiss TDD really haven’t tried it properly—and a lot who say that they do TDD aren’t doing it right either and are missing some benefits, but that’s another story—so of course they didn’t get the advertised effect. And by now we have lots of examples of people who really have tried TDD properly and the interesting and positive results they’ve obtained. Ian did not try doing TDD properly.
And then since that wasn’t going so well, he stopped even trying to:
[…] as I started implementing a GUI, the tests got harder to write and I didn’t think that the time spent on writing these tests was worthwhile.
Well, yes, we know that writing automated tests for GUIs is 1) hard and 2) relatively low value. But this:
So, I became less rigid (impure, perhaps) in my approach, so that I didn’t have automated tests for everything and sometimes implemented things before writing tests.
is not a useful response.
One useful response is to use something like MVC, or MVP, or ports-and-adaptors or one of the many other ways to make the GUI very, very thin and do automated tests behind that and test the actual GUI by hand. But from this point on Ian has basically invalidated his own exercise in TDD because although he wasn’t really doing it to begin with he was at least trying but it turned out to be tough and so he stopped trying. And also stopped learning. Which is a missed opportunity for him, and also for the rest of us. I encourage Ian to try again, maybe with some coaching, and see how that goes, because I would be genuinely interested to see how a seasoned software engineering academic gets on with that.
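To make “very, very thin” a bit more concrete, here is a minimal sketch of the MVP flavour of that idea, in Python. Everything in it is invented for illustration: the `GreetingView` interface, the `GreetingPresenter` and the `FakeView` test double are hypothetical names, not anything from Ian’s program. The point is only that the decisions live in a plain object that tests can drive directly, while the real widgets implement the same two methods and get checked by hand.

```python
from typing import Protocol


class GreetingView(Protocol):
    """What the presenter needs from any GUI (hypothetical interface)."""

    def show_greeting(self, text: str) -> None: ...

    def show_error(self, text: str) -> None: ...


class GreetingPresenter:
    """All the decisions live here, away from any widget toolkit."""

    def __init__(self, view: GreetingView) -> None:
        self._view = view

    def name_submitted(self, raw_name: str) -> None:
        # The real GUI would call this from its button handler.
        name = raw_name.strip()
        if not name:
            self._view.show_error("Please enter a name")
        else:
            self._view.show_greeting(f"Hello, {name}!")


class FakeView:
    """Test double standing in for the real, hand-tested GUI."""

    def __init__(self) -> None:
        self.greetings: list = []
        self.errors: list = []

    def show_greeting(self, text: str) -> None:
        self.greetings.append(text)

    def show_error(self, text: str) -> None:
        self.errors.append(text)


def test_greets_a_trimmed_name() -> None:
    view = FakeView()
    GreetingPresenter(view).name_submitted("  Ada ")
    assert view.greetings == ["Hello, Ada!"]
    assert view.errors == []


def test_rejects_a_blank_name() -> None:
    view = FakeView()
    GreetingPresenter(view).name_submitted("   ")
    assert view.errors == ["Please enter a name"]
    assert view.greetings == []
```

The real view class would do nothing except forward events to the presenter and paint whatever it is told to paint, which leaves very little behind the glass that needs testing by hand.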
Not your daddy’s COBOL compiler
Think-first rather than test-first is the way to go.
he also says:
I started programming at a time where computer time was limited and you had to spend time looking at and thinking about the program as a whole.
Yes. There’s a whole hour long presentation that I have about this but—the microeconomics of programming have changed in quite a fundamental way over the last few decades. Even since I started working.
In my second job as a programmer I worked on a product written in C++ where, no joke, a full build was something you started on Friday lunchtime and went down the pub, hoping that it would be finished by the time you strolled in late on Monday morning. Even incremental builds on just the sub-system I was working on took “go have a cup of tea” amounts of time. Running our comprehensive automated unit test suite (written post hoc, as described above) took “go have lunch” amounts of time.
The time period that Ian is talking about was much worse even than that. In that era the rare and expensive resource was machine cycles and they needed to be dedicated to doing the useful, revenue–earning thing. Programmer thinking time was, relatively, cheap and abundant so the mode of working tended to use lots of that to avoid wasting machine cycles on code that was not strongly expected to be correct.
If you wanted to work the way we do now—for example, with approximately one computer per programmer—you had to be, say, NASA, and you had to have, say, basically unlimited resources because your project was, say, considered to be a matter of national survival. But for most programmers, their employer could not afford that. The entire organisation might have as few as one computer. Maybe one to a department.
The whole edifice of traditional software engineering can be seen as a perfectly reasonable attempt to deal with the constraint that you can’t afford to have a programmer use machine cycles to do programming with. So you need to find ways to write programs away from a computer. That’s what the ▭s and →s were trying to do. The people who came up with that stuff meant well, but ended up creating that world of the colossally idiotic ways to proceed.
I was once sent on a COBOL programming course—it’s a long and dreary story—and on this course we worked within a simulation of those bad old good old days: programs were designed using what I later realised was Jackson Structured Programming, written out in pencil on pre-printed 80-column coding sheets, desk-checked, and then typed in to a COBOL development system. One PC for a class of about 20 students—before which we formed a queue—and we each only had three goes at the compiler. If it took more than three compile/test/debug episodes to get your program running you failed the course.
Today, we are awash with machine cycles. I have many billions of them available to me here right now every second and all I’m using them for is writing this blog post. John von Neumann* must be spinning in his grave.
Don’t play dumb
If I were programming right now, rather than doing this, then I could use those billions of cycles to get prompt, concrete feedback from a large body of tests and from other tools about my current position in a long series of small design decisions.
Rather than thinking in big, speculative lumps I could think in tiny, tiny increments—always with the ever important continual, frequent and determined refactoring.
There is a failure mode, though. Ian says:
[…] with TDD, you dive into the detail in different parts of the program and rarely step back and look at the big picture.
Don’t do that.
I don’t think that there’s anything in TDD that says not to step back and look at the big picture. There’s nothing that says to do that, it’s true, but why wouldn’t you? It’s disappointing to see a retired Professor of Software Engineering playing dumb like this—if he feels the need to step back and look at the big picture then he should. He shouldn’t not do that merely because he’s making an attempt to try out a technique that doesn’t say to do that. I mean, really!
Mighty thinking is not the winning strategy
Added to which, I don’t recall anyone ever saying that TDD is the only design technique—and it is a design technique—that anyone needs to use at any scale to produce a good system. What is said, by me for one, is that by using TDD to guide design thinking and most importantly, to make it quick, easy, cheap and safe to explore different design options, we can get to better results sooner and more reliably than we can by mighty thinking, which was previously the only economically viable method.
I understand that this can be discomforting to those whose thoughts tend to the mighty. It’s almost as if in contemporary** software development mighty thinking has turned out not to be the winning strategy, long term.
Neither for individuals in their careers nor for their employers, nor for their industry. It might be time to come to terms with that. And for a certain kind of very smart, very capable, very confident designer of programs that means letting go. Letting go of the code, of the design, letting go of a certain sense of control and gaining in return a safe way to explore design options that you were too smart to think up yourself.
And that’s not easy.
† we had to use professional quality document preparation systems to do that, because of all the ▭s and →s and ∀s and ∃s. Which was fun.
* He’s supposed to have responded to a demo of some tools written by a programmer to make programming easier by saying that “it is a waste of a valuable scientific computing instrument to use it to do clerical work”
** that is, since about 2006…
So, there are reports that in France there is outrage amongst (right-wing, conservative) commentators that the current government of President Hollande (a socialist) has reconfirmed the orthographic changes proposed originally in 1990 and agreed by the government of President Chirac (right-wing, conservative) which—amongst other things—delete the circumflex from words where it makes no difference.
Some of these reports seem to follow the Wikipedia article on the circumflex in mentioning that in English, apart from loan words, the circumflex is not used today but was once: in the days when posting a letter was priced by weight an ô was used to abbreviate ough. As in “thô” for “though”. This seems like a fine convention, and one that I intend to adopt in tweets and instant messages. Now that we can pretty much assume that both ends of any messaging app conversation will have good Unicode support we can do a range of interesting things.
For example, althô you can put newlines in tweets† it seems as if many messaging apps are designed on the assumption that no–one using them ever has two consecutive thoughts and interprets a [RETURN] as send. I’ve started using ¶ in messages. I wish it could be typed on an iPhone soft keyboard. For some reason § can be, which I think is no more obscure. Anyway, the pilcrow can be copied and pasted, as can ‘∀’ to mean “all” & ‘∃’ to mean “there’s a” or similar. I’d like to use ‘¬’ for “not” but that might be a step too far, althô I do see a lot of “!=” and “=/=” type of thing in my twitter stream. I also tend to use pairs of unspaced em–dash for parenthetical remarks—like this—which saves two characters in a tweet vs. using actual parens (like this). The ellipsis comes in very handy in several ways… ¶ Over time I’m getting more relaxed about using ‘&’ which of course has a particularly long heritage, although not so long as is sometimes thôt.¶ What other punctuation can we revive, re-purpose or re-use?
Update: how do we feel about ‘þ’ or ‘ð’, both easily available from the Icelandic keyboard, for the?
† I’ve used this to sneak footnotes into tweets. Of course, this will all become a bit pointless if the managers at Twitter really do continue to force fit their brilliant ideas into the product, rather than continuing their previously successful strategy of paving cowpaths.
You may be pleased to learn that this is probably the penultimate thing I have to say here about #NoEstimates.
Anyway, it’s for these reasons…
It’s conceptually incoherent
From what I can gather from following twitter discussions, and reading blogs, and articles, and buying and reading the book, then, in #NoEstimates land, supposing that someone were to come and ask you “how long will it take to develop XYZ piece of software?” then any one of the below could be an acceptable #NoEstimates answer, depending on which advocate’s line of reasoning you follow:
1. Estimation is morally bankrupt and I shall have no part in continuing the evil of it. You are a monster! Get out of my office! But fund the endeavour anyway. Yes, I do mean an open-ended commitment to spend however much it turns out to take.
2. Estimation is impossible even in principle so there is no way to answer that question, however roughly. But do please still fund the endeavour. No, I can’t indicate a reasonable budget.
3. Estimation is impossible even in principle so there is no way to answer that question and even if there were I still wouldn’t because you can’t be trusted with such information. No, I can’t indicate a reasonable budget. It’ll be ok. Trust me. No, I don’t trust you; but trust me.
4. Estimation is so difficult and the results so vague that you’re better off just starting the work and correcting as you go. It’ll be ok. Trust me. No, I still don’t trust you.
5. Estimation is so difficult and the results so vague that you’re better off choosing to invest a small, but not too small, amount of money to do something and learn from it and then decide if you’ve come to trust me and want to do some more (or not, which would be disappointing but OK). For your kind of thing, I suggest we start with $BUDGET_FOR_A_BIT, expect to end up spending something in $TOP_DOWN_SYNTOPIC_ESTIMATED_SPEND_AS_A_RANGE
6. Estimation is difficult to do with any useful level of confidence and the results of it hard to use effectively. What would you do with an estimate if I did provide it? How could we meet that need some other way?
7. Here is a very rough estimate, heavily encrusted with caveats and hedges, of the effort required to deliver something of a size such as experience suggests that what you asked for will end up being. No, I will not convert that into a delivery date for you. Let me explain a better way to plan.
8. OK, OK, since you insist, here is a grudgingly made estimate of a delivery date in which I have little faith, I hope it makes you happy. Please don’t use it against me.
For the record: my preferred answer is some sort of combination of 5 and 6, with a hint of 4, and 7 as a backup. And I have turned down potentially lucrative work on the basis of those kinds of answer being rejected.
That’s a huge range of options, many subsets of which are fundamentally, conceptually, incompatible with other subsets. Which means that #NoEstimates doesn’t really seem to me as if it’s much help in deciding what to do.
Except…one good thing about #NE is that it does rule out this answer: “let me fire up MS Project and do a WBS and figure out the critical path and…” which is madness, for software development, but you still see people trying to do it.
Also for the record: In my opinion far too many teams spend far too much time estimating far too many things in far too much detail, and in ways that aren’t sufficiently smart or useful.
Even in an “Agile” setting you see this, and for that I blame Scrum which has had from the beginning in it this weird obsession with estimating and re-estimating, and re-re-estimating again and again and again. I don’t do that. And I certainly don’t do, and do not recommend task-level estimates (or even having tasks smaller than a story).
I can’t understand what anyone’s saying
It seems as if the “no” in #NoEstimates doesn’t mean no. Or maybe it does. Or it might mean: prefer not to but do if you have to. Or it might mean: I’d prefer that you didn’t, but if it’s working for you carry on.
And the “estimate” in #NoEstimates doesn’t mean estimate. It means: an involuntary commitment to an unsubstantiated wild-arsed guess that someone in authority later punishes you for not meeting§. Or it might mean estimate after all, but only if the estimate is based on no data. If there’s data, then that’s a “forecast”, which is both a kind of estimate and not an estimate.
“Control” seems to be a magic word for some #NE people. It’s said to them in the morally neutral, cybernetics sense but they hear it in some Theory X sense, as if it always has the prefix “Command and ”. This creates the impression that they have no interest in being in control of development, budgets, etc. Which might or might not be true. Who can say?
So not only are the #NoEstimates concepts all over the place, they’re discussed in something close to a private vocabulary—maybe more than one private vocabulary. This is not an effective approach to an engineering topic.
Nevertheless: it’s strong medicine and it’s being handled sloppily
…which, if you’ve ever taken strong medicine you’ll know is a poor policy.
In the contexts for software development that I’m familiar with† the idea of making estimates as an input to a budgetary process at the level of months, quarters, years and maybe (hopefully) beyond is really deeply baked in. Maybe this is part of why Scrum has managed to find such a better fit in corporate land than, say, XP ever did, because a Scrum team can seem to still play that game.
For a development team to turn around and say even that estimates are too difficult to make useful so let’s do something else instead is very challenging to the conventions of the corporation. Conventions which I believe should be challenged, in principle. To turn around and say estimation is (and always was) impossibly difficult and management were doing bad things with the results of it is going to deeply challenge and upset many people in an unhealthy way. That’s not the way to effectively change organisational habits. We saw this before with Scrum.
Now, I happen to be of the opinion that estimation is hard, but can be done well, and you can learn to do it better, and the results of it are often misapplied. And I’ve come to the opinion that the most effective and/because least upsetting route to dealing with that is to re-educate managers to do their work in a better way such that they stop asking for estimates. I find that coaching managers to ask more helpful questions beats coaching programmers to give fewer unhelpful answers.
For the record, too: I agree that too many enterprises use developers’ estimates in a way that is invalid in theory, unhelpful in practice, and questionable in its effect on the long term health of the business (and the developers). But, also for the record, I do not agree that this is an inevitable consequence of some intrinsic problem with estimation.
But in the #NE materials that I’ve seen there’s not really much recognition of these organizational aspects of the proposed change. It seems mainly to be about convincing developers that they shouldn’t be producing estimates and explaining how misguided (at best) or evil (at worst) management are to ask for estimates in the first place.
We’re just not smart about this kind of thing
…and the treatment of #NoEstimates that I’ve seen fosters exactly the kind of not-smartness that can get us into a real mess.
The industry, and corporations within it, and teams within corporations have a tendency to lurch from one preposterous extreme to another, and to wildly mis-interpret very well intentioned recommendations. This is a particular problem when the recommendation is to do less of something that’s perceived as irksome.
eXtreme Programming offers a good example. When considering a proposed way of working I often find it useful to consider to what it has arisen as a reaction. On one hand, #NoEstimates seems to be partly a reaction against the very degenerate Scrum-and-Jira style of “Agile” development that many corporations are far too comfortable with. And on another hand, it seems to be a reaction against some really terrible management behaviour* that’s connected with estimation.
eXtreme Programming can be usefully read as a reaction against the insane conditions** in large corporate software shops in the mid 1990s. People who really wanted to be programmers rushed to XP with joyous relief. As it happens I took some convincing, because I kind-of wanted the models thing—and not just boxes-and-arrows, I love, love, love me some ∃s and ∀s—to work. But it doesn’t. So, you know, I’m able to recognise that my ideas need to change, and I’m prepared to do the work.
Anyway, in part of the rush to XP we found that people abandoned the writing of design documents—these days they’d be condemned as muda, but that Lean/kanban vocabulary wasn’t so widespread then—but unfortunately the design work that we were meant to do instead in other ways didn’t really get picked up. Similarly, BRDs and Use Cases and all that stuff went out of the window but good requirements practices tended to vanish too. And the results were not pretty.
And so, over a long and painful period we had to invent things like Software Craftsmanship to try to re-inject design thinking, and we—that is, Jeff Patton—had to introduce things like Story Mapping to get back to a way to talk about big, complex scope.
I invite you to get back to me in, oh, say, 5 years and check on this…forecast: Either #NoEstimates will have burned out and no-one will really remember what it was, or…
- There will be a[t least one] subscription taking membership organization devoted to #NoEstimates
- Leading members of that organization will make their living as #NoEstimates consultants and trainers, far removed from any actual development
- This organization will operate a multi-level marketing scheme in which the organization certifies, at great expense, senior trainers who train, at substantial expense, certified trainers who train, at not outrageous expense, certified #NoEstimates “practitioners”
- adoption of #NoEstimates will turn out to lack some benefit of estimation that #NoEstimates advocates won’t see and can’t imagine and some other practice will have to have been invented to get it back.
And then the circle will be complete. I don’t think that we’re collectively smart enough to avoid this dreary, self-defeating outcome.
Update—as if by magic, this twitter exchange appears within 12 hours of my post (NB I mean no criticism of either party and I thank them for holding this conversation in public):
Noel asks: What is the first thing one should consider when contemplating a move to NoEstimates?
And Woody replies:
#NoEstimates isn’t something you “move to”. It is about exploring our dependence on and use of estimates, and seeking better.
I expect that many conversations like this are taking place. And that’s how the subtle but valuable message fades away and the industry’s hunger to be told the right thing to do (and then, worse, do it) takes over.
Although I have worked at, as it were, newly-founded companies with few staff, little money and one big idea, working out of adapted loft space I don’t characterise those as “startups”. That’s because the business model was not to throw fairly random things at the wall in the hope that one of them stuck long enough to pay the bills until the next funding round arrived; repeat until exit strategy kicks in or broke—which is how I understand “startups” to work. That’s a world that I don’t understand very well because I haven’t done it.
So: some of the corporations I’ve worked at have been product companies, and some sold a service, and some were small and local and some were large and global, and plenty of variations on that. That’s the world I understand, and what I write here grows out of that understanding.
The F&A people wanted to know when, during an iterative, incremental development process they were supposed to create the intangible asset on the balance sheet representing the value-add on the CAPEX on building the system, so that they could start amortising it.
The HR people wanted to know how, with a cross-functional, self-organizing team in place, they could do performance management and, and I quote, figure out “who to pay a bonus to and who to fire”.
I’ve recently heard of companies that link the “accuracy” (whatever that might mean…) of task estimates to bonus pay. And I agree with J.B. that it’s fucking disgusting. What I very much doubt is that fixing that state of affairs is primarily a bottom-up exercise in not estimating.
** Around that time I held—mercifully briefly—a job as a “Software Engineer” in which the ideal we were meant to strive for was that no code was created other than by forward-generation from a model in Rational Rose.
That is, in the technical sense used in Lean manufacturing, whose first two principles include:
- Specify value from the standpoint of the end customer by product family.
- Identify all the steps in the value stream for each product family, eliminating whenever possible those steps that do not create value.
The “steps that do not create value” are waste. If our product is, or contains a lot of, software, is the action of testing that software waste, that is, not creating value from the standpoint of the end customer?
At the time of writing I am choosing the carpets tiles for our new office. On the back of the sample book is a list of 11 test results for the carpet relating to various ISO, EN and BS standards, eg the EN 986* dimensional stability of these carpet tiles is < 0.2%—good to know! There are also the marks of cradletocradle certification, GUT registration, BREEAM registration, a few other exotica and a CE mark. Why would the manufacturer go to all this trouble? Partly, because of regulation: an office fitter would baulk at using carpet that did not meet certain mandatory standards. And partly because customers like to see a certain amount of testing.
Take a look around your home or office, I’ll bet you have a lot of small electrical items of various kinds. Low-voltage power supplies, in particular. Take a look a them. You will find on some the mark of Underwriters Laboratories, which indicates that the manufacturer has voluntarily had the product tested by UL for safety, and maybe for other things. If you’re in the habit of taking things apart, or building things, you might also be familiar with the UL’s “recognised component” mark for parts of products. On British made goods† you might see the venerable British Standards Institution “Kite Mark” , or maybe on Canadian gear the CSA mark , on German kit one of the TÜV marks, and so on. These certifications are for the most part voluntary. Manufacturers will not be sanctioned for not obtaining these marks for their products, nor will—other than in some quite specialised cases—anyone be sanctioned for buying a product which does not bear these marks.
Sometimes a manufacturer will obtain many marks for a product, and sometimes fewer, and sometimes none. I invite you to do a little survey of the electrical items in your office or home: how many marks does each one have? Do you notice a pattern?
I’ll bet that the more high-end a device—in the case of power supplies, the more high-end what they drive—the more marks the device will bear, and the more prestigious those marks will be. Cheaper gear will have fewer, less prestigious marks—ones that make you say “uh?!”†† and the very cheapest will have none.
If testing is waste, why do manufacturers do this?
How does your answer translate to software development?
†† There are persistent rumours that some Chinese manufacturers of questionable business ethics have concocted a mark of their own which looks from a distance like the mark
Well, this feels like a conversation from a long time ago. This presentation got tweeted about, which asserts that
Mocks kill TDD. [sic]
which seems bold. And also that
TDD = Design Methodology
which seems wrong. And also that
Test-first encourages you to design code well enough to test…and no further
which seems to have profoundly misunderstood TDD.
Just so we can all agree what we’re talking about, I think that TDD works like this:
repeat until done:
- write a little test, reflecting the next thing that your code needs to do, but doesn’t yet
- see it fail
- make all tests—including the new one—pass, as quickly and easily as possible
- refactor your working code to produce an improved design
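As a minimal sketch of one turn of that loop, here is what the code might look like just after the new test has gone green. The `total_price` function and its tests are hypothetical, invented purely for illustration; the comments say which step of the loop each piece belongs to.

```python
import unittest


# Step 3: the simplest code that makes every test pass.
# `total_price` is a hypothetical example function, not from any real code base.
def total_price(unit_prices):
    return sum(unit_prices)


class TotalPriceTest(unittest.TestCase):
    # Written in an earlier turn of the loop; it passed when
    # total_price simply returned a hard-coded 0.
    def test_empty_basket_costs_nothing(self):
        self.assertEqual(total_price([]), 0)

    # Steps 1 and 2: the new little test, written first and seen to
    # fail against the hard-coded version, before sum() was introduced.
    def test_basket_costs_sum_of_unit_prices(self):
        self.assertEqual(total_price([3, 4]), 7)


def run_tests():
    # Runs the whole suite, old tests included, and returns the result.
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TotalPriceTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_tests()
```

Step 4, refactoring, has nothing to do yet in code this small; the point is only that the whole suite is run each time round, not just the newest test.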
I don’t see that as being a design methodology. It’s a small-scale process for making rapid progress towards done while knowing that you’ve not broken anything that was working, and which contains a publicly stated commitment to creating and maintaining a good design. There’s nothing there about what makes a good design—although TDD typically comes with guidance about well designed code being simple, well designed code lacking duplication and—often overlooked, this—well designed code being easy to change. I also often suggest that if the next test turns out to be hard to write, you should probably do some more refactoring.
Note that in TDD we don’t, or shouldn’t, test a design; that is, we shouldn’t come up with a design and then test for it. Instead we discover a design through writing tests. TDD doesn’t design for you, but it does give you a set of behaviours within which to do design. And I’m pretty sure that when followed strictly, TDD leads to designs that have measurably different properties than designs arrived at other ways. Which is why this blog existed in the first place (yes, I have been a bit lax about that stuff recently). UPDATE: a commentator on lobste.rs (no, me neither) quotes me saying that “TDD doesn’t design for you, but it does give you a set of behaviours within which to do design.” and asks: how is TDD not a design methodology, then?! And I answer: because it doesn’t provide a vocabulary of terms with which to talk about design, it doesn’t provide a goal for design, it doesn’t provide any criteria by which a design could be assessed, it doesn’t provide any guidance for doing design beyond this: do some, do it a little bit at a time, do it by improving the design of already working code. If that looks like a methodology to you, then OK.
But Ken does have a substantive objection to code that he’s seen written with mocks. Code which has tests like this:
and I certainly agree that this is a terrible test. There are far too many mocks in it, and their expectations are far too complex and far too specific. Worst of all, the expectations refer to other mocks. This is terrible stuff. You can’t tell what the hell’s going on, and this test will be extraordinarily brittle because it reaches out far too far into the system. It probably has a net negative value to the programmers who wrote it. That’s bad. Don’t do that.
Is this the fault of mocks? Not really. The code under test here wouldn’t be much different, I’ll bet, if it hadn’t been TDD’d (if this code even was TDD’d; I have my doubts, although people do do this sort of thing, I know). This confusing, brittle, unhelpful test has been written with mocks, but not because of mocks. One could speculate that it was written by someone who’d got far, far too carried away with the things that mock frameworks can do, and failed to apply good taste, common sense and any kind of design sensibility to what they were doing. Is that the fault of mocks? Not really. Show me a tool that can’t be abused and I’ll show you a tool that isn’t worth having.
Other Styles of Programming
Ken, of course, has an agenda, which is really to promote a functional style of programming in which mock objects are not much help in writing the tests. I think he’s right about that and it should be no surprise as mocks are about writing tests that have something to say about what method invocations happen in what order, and as you move towards a functional style that becomes less and less of a concern. So maybe Ken’s issue with mocks is that they don’t stop you from writing non-functional code—to which I say: that doesn’t mean that you have to.
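To make that concrete, here is a small sketch of a test in a functional style. The `apply_discount` function is a hypothetical example, not taken from Ken's presentation. Its result depends only on its arguments, so the test compares values and there is no sequence of method invocations for a mock to have an opinion about.

```python
# Hypothetical pure function: the result depends only on the arguments,
# and nothing is called for its side effects.
def apply_discount(prices, percent):
    return [round(p * (100 - percent) / 100, 2) for p in prices]


# The tests only compare values. No collaborator is told to do anything,
# so there is no interaction for a mock to check.
assert apply_discount([10.0, 20.0], 25) == [7.5, 15.0]
assert apply_discount([], 25) == []
```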
If you can move to functional programming (spoiler: not everyone can) and if your problem is one that is best solved through a functional solution (spoiler: not all of them are), then off you go, and mocks will not be a big part of your world and fair enough and more power to you. But if not…
Now, I tweeted to this effect and that got Ron wondering about that kind of variation, and why it might be that Smalltalk programmers don’t use mocks when doing TDD. Ron kind-of conflates what he calls the “Detroit School” of TDD and “doing TDD in Smalltalk”, which is kind-of fair enough as Kent and he and the others developed their thinking about TDD in Smalltalk and that’s the style of TDD that was first widely discussed on the wiki and spread from there.
Ron says that he does use “test doubles” for:
“slow” operations, and operations across an interface to software that I don’t have control of
and of course mocks are very handy in those cases. But that’s not what they’re for. Ron says:
Perhaps our system relies on a slow operation, such as a database access […] When we TDD such a thing, we will often build a little object that pretends to be a database […] that responds instantly without actually exercising any of the real mechanism. This is dead center in the Mock Object territory,
Well, no. Again, you can use mocks for such tests, but you’ll only get much value from that if your test cares about, say, what the query to the database is (rather than merely using the result). And while it will make your tests go fast, that’s not the real motivation for the mock, handy as it may be.
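The difference can be sketched with Python's standard `unittest.mock`. Everything about the code under test here is an assumption made up for illustration: the `customer_name` function, the `query(sql, params)` method on the database collaborator, and the SQL itself. The first test only uses the canned result; the second is the one that says something about what the database was asked.

```python
from unittest.mock import Mock


# Hypothetical code under test: looks up a customer's name through a
# database collaborator that offers a query(sql, params) method.
def customer_name(db, customer_id):
    rows = db.query("SELECT name FROM customers WHERE id = ?", (customer_id,))
    return rows[0]["name"] if rows else None


# Test 1: only the result matters, so the double is just a fast stand-in
# for the real database.
fake_db = Mock()
fake_db.query.return_value = [{"name": "Ada"}]
assert customer_name(fake_db, 42) == "Ada"

# Test 2: the test cares what was asked of the database, so the
# expectation is on the call itself.
mock_db = Mock()
mock_db.query.return_value = []
assert customer_name(mock_db, 42) is None
mock_db.query.assert_called_once_with(
    "SELECT name FROM customers WHERE id = ?", (42,)
)
```

Both tests run fast, but only the second would notice if the code started sending a different query.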
A Brief History Lesson
Mocks were invented to solve a very specific problem: how to test Java objects which do not expose any state. Really not any. No public fields, no public getters. It was kind-of a whim of a CTO. And the solution was to pass in a collaborating object which would test the object which was the target of the test “from the inside” by expecting to be called with certain values (in a certain order, blah blah blah) by the object under test and failing the test otherwise.
A paper from 2001 by the originators of mocks describes the characteristics of a good mock very well:
A Mock Object is a substitute implementation to emulate or instrument other domain code. It should be simpler than the real code, not duplicate its implementation, and allow you to set up private state to aid in testing. The emphasis in mock implementations is on absolute simplicity, rather than completeness. […] We have found that a warning sign of a Mock Object becoming too complex is that it starts calling other Mock Objects – which might mean that the unit test is not sufficiently local. [emphasis added]
The object under test in a mock object test is surrounded by a little cloud of collaborating mocks which are simple, incomplete and local. UPDATE: Nat Pryce reminds me that process calculi, such as CSP, had an influence on the JMock approach to mocking.
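A hand-rolled sketch of that idea, in Python rather than the Java of the original story, might look like the following. `Account` and `MockAuditLog` are hypothetical names invented for this example. The object under test exposes no state, so the only way to test it is for the collaborator to check, from the inside, that it is called with the expected values in the expected order.

```python
# A hand-rolled mock: simple, incomplete and local. It fails as soon as
# it is called with something it was not told to expect.
class MockAuditLog:
    def __init__(self, expected_entries):
        self._expected = list(expected_entries)

    def record(self, entry):
        assert self._expected, f"unexpected call: record({entry!r})"
        expected = self._expected.pop(0)
        assert entry == expected, f"expected {expected!r}, got {entry!r}"

    def verify(self):
        # Fails if some expected calls never happened.
        assert not self._expected, f"never called with: {self._expected!r}"


# Hypothetical object under test. It exposes no state: no public fields,
# no getters. All it does is talk to its collaborator.
class Account:
    def __init__(self, audit_log):
        self.__balance = 0
        self.__audit_log = audit_log

    def deposit(self, amount):
        self.__balance += amount
        self.__audit_log.record(f"balance is now {self.__balance}")


def test_deposits_are_audited_in_order():
    log = MockAuditLog(["balance is now 10", "balance is now 15"])
    account = Account(log)
    account.deposit(10)
    account.deposit(5)
    log.verify()


test_deposits_are_audited_in_order()
```

Note that the mock calls no other mocks and duplicates none of the real audit log's behaviour, which is the simplicity the paper asks for.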
Ron talks about Detroit/Smalltalk TDD-ers developing their test doubles by this means:
just code such a thing up […] Generally we’d build them up because we work very incrementally – I think more incrementally than London Schoolers often do – so it is natural for our mock objects to come into being gradually. [emphasis added]
I don’t know where he gets that impression about the “London School”. In my experience, in London and elsewhere, mocks made with frameworks also come into being gradually, one expectation or so at a time. How else? UPDATE: Rachel Davies reminds me that the originators of mocking had a background in Smalltalk programming anyway.
Ron speculates that mocks are likely to be more popular amongst programmers who work with libraries that they don’t control, and I expect so. Smalltalkers don’t do that much, almost everyone else does, lots. He speculates that mocks are likely to be more popular amongst programmers who work with distributed systems of various kinds, and I expect so. Smalltalkers don’t do that much, almost everyone else does, lots. Now, if we could all write our software in Smalltalk the world would undeniably be a better place, but…
In fact, I suspect that Smalltalkers write a lot of mocks, but that these tend to develop quite naturally into the real objects. The Smalltalk environment and tools afford that well. Almost everyone else’s environment and tooling fights against that every step of the way. And Smalltalkers won’t generally use a mocking framework, although there are some super cute ones, because they don’t have to overcome the stumbling blocks that languages like Java put in the way of anyone who actually wants to get anything done.
Anyway, there’s this thing about tools. Tools have affordances, and good tools strongly afford using them the right way and weakly—or not at all—afford using them the wrong way. And there are very special purpose tools, and there are tools that are very flexible. I read somewhere that the screwdriver is the most abused tool in the toolbox, because a steel rod that’s almost sharp at one end and has a handle at the other is just so damn useful. But that doesn’t mean that it’s a good idea to use one as a chisel. I grew up on a farm and I remember an old Ferguson tractor which was started by using a (very large) screwdriver to short between the starter motor solenoid and the engine block. Also not a good idea.
That we can do these things with them does not make screwdrivers bad. And the screwdriver does not encourage us to be idiots—it just doesn’t stop us. And so it is with mocks—they are enormously powerful and useful and flexible and will not stop us from being stupid. In particular, they will not stop us from doing our design work badly. And neither will TDD.
What I think they do do, in fact, is make the implementation of bad design conspicuously painful—remember that line about the next test being hard to write? But programmers tend to suffer from very bad target fixation when a tool becomes difficult to use and they put their head down and power through, when they should really stop and take a step back and think about what the hell they’re doing.