Vlad's Roam Garden

Powered by 🌱Roam Garden

Clean Code: A Handbook of Agile Software Craftsmanship

This is an "Orphan" page. Its core content has not been shared: what you see below is a loose collection of pages and page snippets that mention this page, as well as snippets of this page that were quoted elsewhere.

G5: duplication This is one of the most important rules in this book, and you should take it very seriously. Virtually every author who writes about software design mentions this rule.

Dave Thomas and Andy Hunt called it the DRY principle (Don’t Repeat Yourself).

Kent Beck made it one of the core principles of Extreme Programming and called it: “Once, and only once.”

Ron Jeffries ranks this rule second, just below getting all the tests to pass.

Every time you see duplication in the code, it represents a missed opportunity for abstraction. That duplication could probably become a subroutine or perhaps another class outright. By folding the duplication into such an abstraction, you {{=:1|increase the vocabulary of the language of your design. Other programmers can use the abstract facilities you create}}. Coding becomes faster and less error prone because you have raised the abstraction level.

The most obvious form of duplication is when you have clumps of identical code that look like some programmers went wild with the mouse, pasting the same code over and over again. These should be replaced with simple methods.

A more subtle form is {{=:2|the switch/case or if/else chain that appears again and again in various modules, always testing for the same set of conditions}}. These should be replaced with polymorphism.

Still more subtle are {{=:3|the modules that have similar algorithms, but that don’t share similar lines of code}}. This is still duplication and should be addressed by using the Template Method , Strategy pattern. Indeed, most of the design patterns that have appeared in the last fifteen years are simply well-known ways to eliminate duplication. So too the Codd Normal Forms are a strategy for eliminating duplication in database schemae. OO itself is a strategy for organizing modules and eliminating duplication. Not surprisingly, so is structured programming. I think the point has been made. Find and eliminate duplication wherever you can.

DSLs, when used effectively, raise the abstraction level above code idioms and design patterns. They allow the developer to reveal the intent of the code at the appropriate level of abstraction.

The first rule of functions is that they should be small. The second rule of functions is that they should be smaller than that

One of the more common motivations for writing code/comments is bad code. We write a module and we know it is confusing and disorganized. We know it’s a mess. So we say to ourselves, “Ooh, I’d better comment that!” No! You’d better clean it! Clear and expressive code with few comments is far superior to cluttered and complex code with lots of comments. Rather than spend your time writing the comments that explain the mess you’ve made, spend it cleaning that mess.

Anything that forces you to check the function signature is equivalent to a double-take. It’s a cognitive break and should be avoided.

Kent Beck wrote about this in his great book Smalltalk Best Practice Patterns and again more recently in his equally great book Implementation Patterns. One of the more powerful ways to make a program readable is to break the calculations up into intermediate values that are held in variables with meaningful names.

Consider this example from FitNesse:

if(match.find())  { 
  String key = match.group(1);  
  String value = match.group(2);  
  headers.put(key.toLowerCase(), value);  
}

The simple use of explanatory variables makes it clear that the first matched group is the key, and the second matched group is the value.

It is hard to overdo this. More explanatory variables are generally better than fewer. It is remarkable how an opaque module can suddenly become transparent simply by breaking the calculations up into well-named intermediate values.

This implies that the blocks within if statements, else statements, while statements, and so on should be one line long. Probably that line should be a function call. Not only does this keep the enclosing function small, but it also adds documentary value because the function called within the block can have a nicely descriptive name.

Mixing level of abstraction within a function is always confusing. Readers may not be able to tell whether a particular expression is an essential concept or a detail. Worse, like broken windows theory, once details are mixed with essential concepts, more and more details tend to accrete within the function.

G34: Functions Should Descend Only One level of abstraction

The statements within a function should all be written at the same level of abstraction, which should be one level below the operation described by the name of the function. This may be the hardest of these heuristics to interpret and follow. Though the idea is plain enough, humans are just far too good at seamlessly mixing level of abstraction. Consider, for example, the following code taken from FitNesse:

 public String render() throws Exception {
     StringBuffer html = new StringBuffer(“<hr”);
     if(size > 0)
       html.append(” size=\“”).append(size + 1).append(”\“”);
     html.append(“>”);

     return html.toString();
   }

A moment’s study and you can see what’s going on. This function constructs the HTML tag that draws a horizontal rule across the page. The height of that rule is specified in the size variable.

Now look again. This method is mixing at least two level of abstraction. The first is the notion that a horizontal rule has a size. The second is the syntax of the HR tag itself. This code comes from the HruleWidget module in FitNesse. This module detects a row of four or more dashes and converts it into the appropriate HR tag. The more dashes, the larger the size.

I refactored this bit of code as follows. Note that I changed the name of the size field to reflect its true purpose. It held the number of extra dashes.

   public String render() throws Exception
   {
     HtmlTag hr = new HtmlTag(“hr”);
     if (extraDashes > 0)
       hr.addAttribute(“size”, hrSize(extraDashes));
     return hr.html();
   }

   private String hrSize(int height)
   {
     int hrSize = height + 1;
     return String.format(“%d”, hrSize);
   }

This change separates the two level of abstraction nicely. The render function simply constructs an HR tag, without having to know anything about the HTML syntax of that tag. The HtmlTag module takes care of all the nasty syntax issues.

Indeed, by making this change I caught a subtle error. The original code did not put the closing slash on the HR tag, as the XHTML standard would have it. (In other words, it emitted <hr> instead of <hr/>.) The HtmlTag module had been changed to conform to XHTML long ago.

Separating level of abstraction is one of the most important functions of refactoring, and it’s one of the hardest to do well. As an example, look at the code below. This was my first attempt at separating the abstraction levels in the HruleWidget.render method.

   public String render() throws Exception
   {
     HtmlTag hr = new HtmlTag(“hr”);
     if (size > 0) {
       hr.addAttribute(“size”, “”+(size+1));
     }
     return hr.html();
   }

My goal, at this point, was to create the necessary separation and get the tests to pass. I accomplished that goal easily, but the result was a function that still had mixed level of abstraction. In this case the mixed levels were the construction of the HR tag and the interpretation and formatting of the size variable. This points out that when you break a function along lines of abstraction, you often uncover new lines of abstraction that were obscured by the previous structure.

If a function does only those steps that are one level below the stated name of the function, then the function is doing one thing. After all, the reason we write functions is to decompose a larger concept (in other words, the name of the function) into a set of steps at the next level of abstraction.

complexity kills. It sucks the life out of developers, it makes products difficult to plan, build, and test.

The length of a name should be related to the length of the scope. You can use very short variable names for tiny scopes, but for big scopes you should use longer names.

Variable names like i and j are just fine if their scope is five lines long. Consider this snippet from the old standard “Bowling Game”:

  for (int i=0; i<n; i++)  g.roll(pins);  
}

This is perfectly clear and would be obfuscated if the variable i were replaced with something annoying like rollCount. On the other hand, variables and functions with short names lose their meaning over long distances. So the longer the scope of the name, the longer and more precise the name should be

Hiding implementation is not just a matter of putting a layer of functions between the variables. hiding implementation is about abstractions! A class does not simply push its variables out through getters and setters. Rather it exposes abstract interfaces that allow its users to manipulate the essence of the data, without having to know its implementation.

G36: Avoid Transitive Navigation In general we don’t want a single module to know much about its collaborators. More specifically, if A collaborates with B, and B collaborates with C, we don’t want modules that use A to know about C. (For example, we don’t want a.getB().getC().doSomething();)

This is sometimes called the Law of Demeter. The Pragmatic Programmers call it “Writing Shy Code.” In either case it comes down to making sure that modules know only about their immediate collaborators and do not know the navigation map of the whole system.

If many modules used some form of the statement a.getB().getC(), then it would be difficult to change the design and architecture to interpose a Q between B and C. You’d have to find every instance of a.getB().getC() and convert it to a.getB().getQ().getC(). This is how architectures become rigid. Too many modules know too much about the architecture.

Rather we want our immediate collaborators to offer all the services we need. We should not have to roam through the object graph of the system, hunting for the method we want to call. Rather we should simply be able to say:

myCollaborator.doSomething()

G8: Too Much Information

Well-defined modules have very small interfaces that allow you to do a lot with a little. Poorly defined modules have wide and deep interfaces that force you to use many different gestures to get simple things done. A well-defined interface does not offer very many functions to depend upon, so coupling is low. A poorly defined interface provides lots of functions that you must call, so coupling is high. Good software developers learn to limit what they expose at the interfaces of their classes and modules.

The fewer methods a class has, the better.

The fewer variables a function knows about, the better.

The fewer instance variables a class has, the better.

Hide your data.

Hide your utility functions.

Hide your constants and your temporaries.

Don’t create classes with lots of methods or lots of instance variables. Don’t create lots of protected variables and functions for your subclasses. Concentrate on keeping interfaces very tight and very small. Help keep coupling low by limiting information.

G27: Structure over Convention

Enforce design decisions with structure over convention. Naming conventions are good, but they are inferior to structures that force compliance. For example, switch/cases with nicely named enumerations are inferior to base classes with abstract methods. No one is forced to implement the switch/case statement the same way each time; but the base classes do enforce that concrete classes have all abstract methods implemented.

G22: Make Logical Dependencies Physical If one module depends upon another, that dependency should be physical, not just logical. The dependent module should not make assumptions (in other words, logical dependencies) about the module it depends upon. Rather it should explicitly ask that module for all the information it depends upon.

For example, imagine that you are writing a function that prints a plain text report of hours worked by employees. One class named HourlyReporter gathers all the data into a convenient form and then passes it to HourlyReportFormatter to print it. (See Listing 17-1 .)

Listing 17-1 HourlyReporter.java

   public class HourlyReporter {
     private HourlyReportFormatter formatter;
     private List<LineItem> page;
     private final int PAGE_SIZE = 55;
     public HourlyReporter(HourlyReportFormatter formatter) {
       this.formatter = formatter;
       page = new ArrayList<LineItem>();
     }

     public void generateReport(List<HourlyEmployee> employees) {
       for (HourlyEmployee e : employees) {
         addLineItemToPage(e);
         if (page.size() == PAGE_SIZE)
           printAndClearItemList();
       }
       if (page.size() > 0)
         printAndClearItemList();
     }

     private void printAndClearItemList() {
       formatter.format(page);
       page.clear();
     }
     
     private void addLineItemToPage(HourlyEmployee e) {
       LineItem item = new LineItem();
       item.name = e.getName();
       item.hours = e.getTenthsWorked() / 10;
       item.tenths = e.getTenthsWorked() % 10;
       page.add(item);
     }
     public class LineItem {
       public String name;
       public int hours;
       public int tenths;
     }
   }

This code has a logical dependency that has not been physicalized. Can you spot it? It is the constant PAGE_SIZE. Why should the HourlyReporter know the size of the page? Page size should be the responsibility of the HourlyReportFormatter.

The fact that PAGE_SIZE is declared in HourlyReporter represents a misplaced responsibility [G17] that causes HourlyReporter to assume that it knows what the page size ought to be. Such an assumption is a logical dependency. HourlyReporter depends on the fact that HourlyReportFormatter can deal with page sizes of 55. If some implementation of HourlyReportFormatter could not deal with such sizes, then there would be an error.

We can physicalize this dependency by creating a new method in HourlyReport-Formatter named getMaxPageSize(). HourlyReporter will then call that function rather than using the PAGE_SIZE constant.

G31: Hidden Temporal Couplings

Temporal couplings are often necessary, but you should not hide the coupling. Structure the arguments of your functions such that the order in which they should be called is obvious. Consider the following:

   public class MoogDiver {
     Gradient gradient;
     List<Spline> splines;

     public void dive(String reason) {
       saturateGradient();
       reticulateSplines();
       diveForMoog(reason);
     }
     …
   }

The order of the three functions is important. You must saturate the gradient before you can reticulate the splines, and only then can you dive for the moog. Unfortunately, the code does not enforce this temporal coupling. Another programmer could call reticulateSplines before saturateGradient was called, leading to an UnsaturatedGradientException. A better solution is:

   public class MoogDiver {
     Gradient gradient;
     List<Spline> splines;

     public void dive(String reason) {
       Gradient gradient = saturateGradient();
       List<Spline> splines = reticulateSplines(gradient);
       diveForMoog(splines, reason);
     }
     …
   }

This exposes the temporal coupling by creating a bucket brigade. Each function produces a result that the next function needs, so there is no reasonable way to call them out of order.

You might complain that this increases the complexity of the functions, and you’d be right. But that extra syntactic complexity exposes the true temporal complexity of the situation.

Note that I left the instance variables in place. I presume that they are needed by private methods in the class. Even so, I want the arguments in place to make the temporal coupling explicit.

N3: Use Standard Nomenclature Where Possible

Names are easier to understand if they are based on existing convention or usage. For example, if you are using the Decorator pattern, you should use the word Decorator in the names of the decorating classes. For example, AutoHangupModemDecorator might be the name of a class that decorates a Modem with the ability to automatically hang up at the end of a session.

Patterns are just one kind of standard. In Java, for example, functions that convert objects to string representations are often named toString. It is better to follow conventions like these than to invent your own.

Teams will often invent their own standard system of names for a particular project. Eric Evans refers to this as a ubiquitous language for the project. Your code should use the terms from this language extensively. In short, the more you can use names that are overloaded with special meanings that are relevant to your project, the easier it will be for readers to know what your code is talking about.

G6: Code at Wrong level of abstraction

It is important to create abstractions that separate higher level general concepts from lower level detailed concepts. Sometimes we do this by creating abstract classes to hold the higher level concepts and derivatives to hold the lower level concepts. When we do this, we need to make sure that the separation is complete. We want all the lower level concepts to be in the derivatives and all the higher level concepts to be in the base class.

For example, constants, variables, or utility functions that pertain only to the detailed implementation should not be present in the base class. The base class should know nothing about them.

This rule also pertains to source files, components, and modules. Good software design requires that we separate concepts at different levels and place them in different containers. Sometimes these containers are base classes or derivatives and sometimes they are source files, modules, or components. Whatever the case may be, the separation needs to be complete. We don’t want lower and higher level concepts mixed together.

Consider the following code:

   public interface Stack {
     Object pop() throws EmptyException;
     void push(Object o) throws FullException;
     double percentFull();

     class EmptyException extends Exception {}
     class FullException extends Exception {}
   }

The percentFull function is at the wrong level of abstraction. Although there are many implementations of Stack where the concept of fullness is reasonable, there are other implementations that simply could not know how full they are. So the function would be better placed in a derivative interface such as BoundedStack.

Perhaps you are thinking that the implementation could just return zero if the stack were boundless. The problem with that is that no stack is truly boundless. You cannot really prevent an OutOfMemoryException by checking for

stack.percentFull() < 50.0

Implementing the function to return 0 would be telling a lie.

The point is that you cannot lie or fake your way out of a misplaced abstraction. Isolating abstractions is one of the hardest things that software developers do, and there is no quick fix when you get it wrong.

G30: Functions Should Do One Thing

It is often tempting to create functions that have multiple sections that perform a series of operations. Functions of this kind do more than one thing, and should be converted into many smaller functions, each of which does one thing.

For example:

   public void pay() {
     for (Employee e : employees) {
       if (e.isPayday()) {
         Money pay = e.calculatePay();
         e.deliverPay(pay);
       }
     }
   }

This bit of code does three things. It loops over all the employees, checks to see whether each employee ought to be paid, and then pays the employee. This code would be better written as:

   public void pay() {
     for (Employee e : employees)
       payIfNecessary(e);
   }

   private void payIfNecessary(Employee e) {
     if (e.isPayday())
       calculateAndDeliverPay(e);
   }

   private void calculateAndDeliverPay(Employee e) {
     Money pay = e.calculatePay();
     e.deliverPay(pay);
   }

Each of these functions does one thing. (See “Do One Thing” on page 35.)