Code

March 15, 2012

About HTML semantics and front-end architecture

A collection of thoughts, experiences, ideas that I like, and ideas that I have been experimenting with over the last year. It covers HTML semantics, components and approaches to front-end architecture, class naming patterns, and HTTP compression.

About semantics

Semantics is the study of the relationships between signs and symbols and what they represent. In linguistics, this is primarily the study of the meaning of signs (such as words, phrases, or sounds) in language. In the context of front-end web development, semantics are largely concerned with the agreed meaning of HTML elements, attributes, and attribute values (including extensions like Microdata). These agreed semantics, which are usually formalised in specifications, can be used to help programmes (and subsequently humans) better understand aspects of the information on a website. However, even after formalisation, the semantics of elements, attributes, and attribute values are subject to adaptation and co-option by developers. This can lead to subsequent modifications of the formally agreed semantics (and is an HTML design principle).

Distinguishing between different types of HTML semantics

The principle of writing “semantic HTML” is one of the foundations of modern, professional front-end development. Most semantics are related to aspects of the nature of the existing or expected content (e.g. h1 element, lang attribute, email value of the type attribute, Microdata).

However, not all semantics need to be content-derived. Class names cannot be “unsemantic”. Whatever names are being used: they have meaning, they have purpose. Class name semantics can be different to those of HTML elements. We can leverage the agreed “global” semantics of HTML elements, certain HTML attributes, Microdata, etc., without confusing their purpose with those of the “local” website/application-specific semantics that are usually contained in the values of attributes like the class attribute.

Despite the HTML5 specification section on classes repeating the assumed “best practice” that…

…authors are encouraged to use [class attribute] values that describe the nature of the content, rather than values that describe the desired presentation of the content.

…there is no inherent reason to do this. In fact, it’s often a hindrance when working on large websites or applications.

Content-layer semantics are already served by HTML elements and other attributes.
Class names impart little or no useful semantic information to machines or human visitors unless it is part of a small set of agreed upon (and machine readable) names – Microformats.
The primary purpose of a class name is to be a hook for CSS and JavaScript. If you don’t need to add presentation and behaviour to your web documents, then you probably don’t need classes in your HTML.
Class names should communicate useful information to developers. It’s helpful to understand what a specific class name is going to do when you read a DOM snippet, especially in multi-developer teams where front-enders won’t be the only people working with HTML components.

Take this very simple example:

<div class="news">
  <h2>News</h2>
  [news content]
</div>

The class name news doesn’t tell you anything that is not already obvious from the content. It gives you no information about the architectural structure of the component, and it cannot be used with content that isn’t “news”. Tying your class name semantics tightly to the nature of the content has already reduced the ability of your architecture to scale or be easily put to use by other developers.

Content-independent class names

An alternative is to derive class name semantics from repeating structural and functional patterns in a design. The most reusable components are those with class names that are independent of the content.

We shouldn’t be afraid of making the connections between layers clear and explicit rather than having class names rigidly reflect specific content. Doing this doesn’t make classes “unsemantic”, it just means that their semantics are not derived from the content. We shouldn’t be afraid to include additional HTML elements if they help create more robust, flexible, and reusable components. Doing so does not make the HTML “unsemantic”, it just means that you use elements beyond the bare minimum needed to markup the content.

Front-end architecture

The aim of a component/template/object-oriented architecture is to be able to develop a limited number of reusable components that can contain a range of different content types. The important thing for class name semantics in non-trivial applications is that they be driven by pragmatism and best serve their primary purpose – providing meaningful, flexible, and reusable presentational/behavioural hooks for developers to use.

Reusable and combinable components

Scalable HTML/CSS must, by and large, rely on classes within the HTML to allow for the creation of reusable components. A flexible and reusable component is one which neither relies on existing within a certain part of the DOM tree, nor requires the use of specific element types. It should be able to adapt to different containers and be easily themed. If necessary, extra HTML elements (beyond those needed just to markup the content) and can be used to make the component more robust. A good example is what Nicole Sullivan calls the media object.

Components that can be easily combined benefit from the avoidance of type selectors in favour of classes. The following example prevents the easy combination of the btn component with the uilist component. The problems are that the specificity of .btn is less than that of .uilist a (which will override any shared properties), and the uilist component requires anchors as child nodes.

.btn { /* styles */ }
.uilist { /* styles */ }
.uilist a { /* styles */ }

<nav class="uilist">
  <a href="#">Home</a>
  <a href="#">About</a>
  <a class="btn" href="#">Login</a>
</nav>

An approach that improves the ease with which you can combine other components with uilist is to use classes to style the child DOM elements. Although this helps to reduce the specificity of the rule, the main benefit is that it gives you the option to apply the structural styles to any type of child node.

.btn { /* styles */ }
.uilist { /* styles */ }
.uilist-item { /* styles */ }

<nav class="uilist">
  <a class="uilist-item" href="#">Home</a>
  <a class="uilist-item" href="#">About</a>
  <span class="uilist-item">
      <a class="btn" href="#">Login</a>
  </span>
</nav>

JavaScript-specific classes

Using some form of JavaScript-specific classes can help to reduce the risk that thematic or structural changes to components will break any JavaScript that is also applied. An approach that I’ve found helpful is to use certain classes only for JavaScript hooks – js-* – and not to hang any presentation off them.

<a href="/login" class="btn btn-primary js-login"></a>

This way, you can reduce the chance that changing the structure or theme of components will inadvertently affect any required JavaScript behaviour and complex functionality.

Component modifiers

Components often have variants with slightly different presentations from the base component, e.g., a different coloured background or border. There are two mains patterns used to create these component variants. I’m going to call them the “single-class” and “multi-class” patterns.

The “single-class” pattern

.btn, .btn-primary { /* button template styles */ }
.btn-primary { /* styles specific to save button */ }

<button class="btn">Default</button>
<button class="btn-primary">Login</button>

The “multi-class” pattern

.btn { /* button template styles */ }
.btn-primary { /* styles specific to primary button */ }

<button class="btn">Default</button>
<button class="btn btn-primary">Login</button>

If you use a pre-processor, you might use Sass’s @extend functionality to reduce some of the maintenance work involved in using the “single-class” pattern. However, even with the help of a pre-processor, my preference is to use the “multi-class” pattern and add modifier classes in the HTML.

I’ve found it to be a more scalable pattern. For example, take the base btn component and add a further 5 types of button and 3 additional sizes. Using a “multi-class” pattern you end up with 9 classes that can be mixed-and-matched. Using a “single-class” pattern you end up with 24 classes.

It is also easier to make contextual tweaks to a component, if absolutely necessary. You might want to make small adjustments to any btn that appears within another component.

/* "multi-class" adjustment */
.thing .btn { /* adjustments */ }

/* "single-class" adjustment */
.thing .btn,
.thing .btn-primary,
.thing .btn-danger,
.thing .btn-etc { /* adjustments */ }

A “multi-class” pattern means you only need a single intra-component selector to target any type of btn-styled element within the component. A “single-class” pattern would mean that you may have to account for any possible button type, and adjust the selector whenever a new button variant is created.

Structured class names

When creating components – and “themes” that build upon them – some classes are used as component boundaries, some are used as component modifiers, and others are used to associate a collection of DOM nodes into a larger abstract presentational component.

It’s hard to deduce the relationship between btn (component), btn-primary (modifier), btn-group (component), and btn-group-item (component sub-object) because the names don’t clearly surface the purpose of the class. There is no consistent pattern.

In early 2011, I started experimenting with naming patterns that help me to more quickly understand the presentational relationship between nodes in a DOM snippet, rather than trying to piece together the site’s architecture by switching back-and-forth between HTML, CSS, and JS files. The notation in the gist is primarily influenced by the BEM system’s approach to naming, but adapted into a form that I found easier to scan.

Since I first wrote this post, several other teams and frameworks have adopted this approach. MontageJS modified the notation into a different style, which I prefer and currently use in the SUIT framework:

/* Utility */
.u-utilityName {}

/* Component */
.ComponentName {}

/* Component modifier */
.ComponentName--modifierName {}

/* Component descendant */
.ComponentName-descendant {}

/* Component descendant modifier */
.ComponentName-descendant--modifierName {}

/* Component state (scoped to component) */
.ComponentName.is-stateOfComponent {}

This is merely a naming pattern that I’m finding helpful at the moment. It could take any form. But the benefit lies in removing the ambiguity of class names that rely only on (single) hyphens, or underscores, or camel case.

A note on raw file size and HTTP compression

Related to any discussion about modular/scalable CSS is a concern about file size and “bloat”. Nicole Sullivan’s talks often mention the file size savings (as well as maintenance improvements) that companies like Facebook experienced when adopting this kind of approach. Further to that, I thought I’d share my anecdotes about the effects of HTTP compression on pre-processor output and the extensive use of HTML classes.

When Twitter Bootstrap first came out, I rewrote the compiled CSS to better reflect how I would author it by hand and to compare the file sizes. After minifying both files, the hand-crafted CSS was about 10% smaller than the pre-processor output. But when both files were also gzipped, the pre-processor output was about 5% smaller than the hand-crafted CSS.

This highlights how important it is to compare the size of files after HTTP compression, because minified file sizes do not tell the whole story. It suggests that experienced CSS developers using pre-processors don’t need to be overly concerned about a certain degree of repetition in the compiled CSS because it can lend itself well to smaller file sizes after HTTP compression. The benefits of more maintainable “CSS” code via pre-processors should trump concerns about the aesthetics or size of the raw and minified output CSS.

In another experiment, I removed every class attribute from a 60KB HTML file pulled from a live site (already made up of many reusable components). Doing this reduced the file size to 25KB. When the original and stripped files were gzipped, their sizes were 7.6KB and 6KB respectively – a difference of 1.6KB. The actual file size consequences of liberal class use are rarely going to be worth stressing over.

How I learned to stop worrying…

The experience of many skilled developers, over many years, has led to a shift in how large-scale website and applications are developed. Despite this, for individuals weaned on an ideology where “semantic HTML” means using content-derived class names (and even then, only as a last resort), it usually requires you to work on a large application before you can become acutely aware of the impractical nature of that approach. You have to be prepared to disgard old ideas, look at alternatives, and even revisit ways that you may have previously dismissed.

Once you start writing non-trivial websites and applications that you and others must not only maintain but actively iterate upon, you quickly realise that despite your best efforts, your code starts to get harder and harder to maintain. It’s well worth taking the time to explore the work of some people who have proposed their own approaches to tackling these problems: Nicole’s blog and Object Oriented CSS project, Jonathan Snook’s Scalable Modular Architecture CSS, and the Block Element Modifier method that Yandex have developed.

When you choose to author HTML and CSS in a way that seeks to reduce the amount of time you spend writing and editing CSS, it involves accepting that you must instead spend more time changing HTML classes on elements if you want to change their styles. This turns out to be fairly practical, both for front-end and back-end developers – anyone can rearrange pre-built “lego blocks”; no one can perform CSS-alchemy.