Using HTML5 elements in WordPress post content

Here are two ways to include HTML5 elements in your WordPress post content without WordPress’ wpautop function wrapping them in p tags or littering your code with line breaks.

HTML5 has several new elements that you may want to use in your post content to markup document sections, headers, footers, pullquotes, figures, or groups of headings. One way to safely include these elements in your posts is simple; the other way is a bit more complicated. Both ways rely on hand-coding the HTML5 markup in the WordPress editor’s HTML view.

If you are adding HTML5 elements to your post content then you should use an HTML5 doctype.

Disable wpautop for your theme

This is the simple way. Disable the wpautop function so that WordPress makes no attempt to correct your markup and leaves you to hand-code every line of your posts. If you want total control over every line of your HTML then this is the option for you.

To disable wpautop entirely add these lines to your theme’s functions.php:

remove_filter('the_excerpt', 'wpautop');
remove_filter('the_content', 'wpautop');

However, wpautop is generally quite useful if most of your posts are simple text content and you only occasionally want to include HTML5 elements. Therefore, modifying wpautop to recognise HTML5 elements might be more practical.

Modify wpautop to recognise HTML5 elements

WordPress’ wpautop is part of the core functions and can be found in this file within your WordPress installation: wp-includes/formatting.php. It controls how and where paragraphs and line breaks are inserted in excerpts and post content.

In order to create a modified version of WordPress’ core wpautop function I started off by duplicating it in my theme’s functions.php file.

What I’ve experimented with is disabling wpautop and adding a modified copy of it – which includes HTML5 elements in its arrayss – to my theme’s functions.php file.

Add the following to your theme’s functions.php file and you’ll be able to use section, article, aside, header, footer, hgroup, figure, details, figcaption, and summary in your post content. (Probably best to try this in a testing environment first!)

/* -----------------------------
MODIFIED WPAUTOP - Allow HTML5 block elements in wordpress posts
----------------------------- */

function html5autop($pee, $br = 1) {
   if ( trim($pee) === '' )
      return '';
   $pee = $pee . "\n"; // just to make things a little easier, pad the end
   $pee = preg_replace('|<br />\s*<br />|', "\n\n", $pee);
   // Space things out a little
    // *insertion* of section|article|aside|header|footer|hgroup|figure|details|figcaption|summary
   $allblocks = '(?:table|thead|tfoot|caption|col|colgroup|tbody|tr|td|th|div|dl|dd|dt|ul|ol|li|pre|select|form|map|area|blockquote|address|math|style|input|p|h[1-6]|hr|fieldset|legend|section|article|aside|header|footer|hgroup|figure|details|figcaption|summary)';
   $pee = preg_replace('!(<' . $allblocks . '[^>]*>)!', "\n$1", $pee);
   $pee = preg_replace('!(</' . $allblocks . '>)!', "$1\n\n", $pee);
   $pee = str_replace(array("\r\n", "\r"), "\n", $pee); // cross-platform newlines
   if ( strpos($pee, '<object') !== false ) {
      $pee = preg_replace('|\s*<param([^>]*)>\s*|', "<param$1>", $pee); // no pee inside object/embed
      $pee = preg_replace('|\s*</embed>\s*|', '</embed>', $pee);
   }
   $pee = preg_replace("/\n\n+/", "\n\n", $pee); // take care of duplicates
// make paragraphs, including one at the end
   $pees = preg_split('/\n\s*\n/', $pee, -1, PREG_SPLIT_NO_EMPTY);
   $pee = '';
   foreach ( $pees as $tinkle )
      $pee .= '<p>' . trim($tinkle, "\n") . "</p>\n";
   $pee = preg_replace('|<p>\s*</p>|', '', $pee); // under certain strange conditions it could create a P of entirely whitespace
// *insertion* of section|article|aside
   $pee = preg_replace('!<p>([^<]+)</(div|address|form|section|article|aside)>!', "<p>$1</p></$2>", $pee);
   $pee = preg_replace('!<p>\s*(</?' . $allblocks . '[^>]*>)\s*</p>!', "$1", $pee); // don't pee all over a tag
   $pee = preg_replace("|<p>(<li.+?)</p>|", "$1", $pee); // problem with nested lists
   $pee = preg_replace('|<p><blockquote([^>]*)>|i', "<blockquote$1><p>", $pee);
   $pee = str_replace('</blockquote></p>', '</p></blockquote>', $pee);
   $pee = preg_replace('!<p>\s*(</?' . $allblocks . '[^>]*>)!', "$1", $pee);
   $pee = preg_replace('!(</?' . $allblocks . '[^>]*>)\s*</p>!', "$1", $pee);
   if ($br) {
      $pee = preg_replace_callback('/<(script|style).*?<\/\\1>/s', create_function('$matches', 'return str_replace("\n", "<WPPreserveNewline />", $matches[0]);'), $pee);
      $pee = preg_replace('|(?<!<br />)\s*\n|', "<br />\n", $pee); // optionally make line breaks
      $pee = str_replace('<WPPreserveNewline />', "\n", $pee);
   }
   $pee = preg_replace('!(</?' . $allblocks . '[^>]*>)\s*<br />!', "$1", $pee);
// *insertion* of img|figcaption|summary
   $pee = preg_replace('!<br />(\s*</?(?:p|li|div|dl|dd|dt|th|pre|td|ul|ol|img|figcaption|summary)[^>]*>)!', '$1', $pee);
   if (strpos($pee, '<pre') !== false)
      $pee = preg_replace_callback('!(<pre[^>]*>)(.*?)</pre>!is', 'clean_pre', $pee );
   $pee = preg_replace( "|\n</p>$|", '</p>', $pee );

   return $pee;
}

// remove the original wpautop function
remove_filter('the_excerpt', 'wpautop');
remove_filter('the_content', 'wpautop');

// add our new html5autop function
add_filter('the_excerpt', 'html5autop');
add_filter('the_content', 'html5autop');

The results are not absolutely perfect but then neither is the original wpautop function. Certain ways of formatting the code will result in unwanted trailing </p> tags or a missing opening <p> tags.

For example, to insert a figure with caption into a post you should avoid adding the figcaption on a new line because an image or link appearing before the figcaption will end up with a trailing </p>.

<!-- this turns out ok -->
<figure>
  <a href="#"><img src="image.jpg" alt="" /></a><figcaption>A figure caption for your reading pleasure</figcaption>
</figure>

<!-- this turns out not so ok -->
<figure>
  <a href="#"><img src="image.jpg" alt="" /></a>
  <figcaption>A figure caption for your reading pleasure</figcaption>
</figure>

Another example would be when beginning the contents of an aside with a paragraph. You’ll have to leave a blank line between the opening aside tag and the first paragraph.

<aside>

This content could be a pullquote or information that is tangentially related to the surrounding content. But to get it wrapped in a paragraph you have to leave those blank lines either side of it before the tags.

</aside>

Room for improvement

Obviously there are still a few issues with this because if you format your post content in certain ways then you can end up with invalid HTML, even if it doesn’t actually affect the rendering of the page. But it seems to be pretty close!

Leave a comment or email me if you are using this function and find there that are instances where it breaks down. I ran numerous tests and formatting variations to try and iron out as many problems as possible but it’s unlikely that I tried or spotted everything.

Hopefully someone with more PHP and WordPress experience will be able to improve upon what I’ve been experimenting with, or find a simpler and more elegant solution that retains the useful wpautop functionality while allowing for the use of HTML5 elements in posts. Please share anything you find!