Using HTML5 elements in WordPress post content

Here are two ways to include HTML5 elements in your WordPress post content without WordPress’ wpautop function wrapping them in p tags or littering your code with line breaks.

HTML5 has several new elements that you may want to use in your post content to markup document sections, headers, footers, pullquotes, figures, or groups of headings. One way to safely include these elements in your posts is simple; the other way is a bit more complicated. Both ways rely on hand-coding the HTML5 markup in the WordPress editor’s HTML view.

If you are adding HTML5 elements to your post content then you should use an HTML5 doctype.

Disable wpautop for your theme

This is the simple way. Disable the wpautop function so that WordPress makes no attempt to correct your markup and leaves you to hand-code every line of your posts. If you want total control over every line of your HTML then this is the option for you.

To disable wpautop entirely add these lines to your theme’s functions.php:

remove_filter('the_excerpt', 'wpautop');
remove_filter('the_content', 'wpautop');

However, wpautop is generally quite useful if most of your posts are simple text content and you only occasionally want to include HTML5 elements. Therefore, modifying wpautop to recognise HTML5 elements might be more practical.

Modify wpautop to recognise HTML5 elements

WordPress’ wpautop is part of the core functions and can be found in this file within your WordPress installation: wp-includes/formatting.php. It controls how and where paragraphs and line breaks are inserted in excerpts and post content.

In order to create a modified version of WordPress’ core wpautop function I started off by duplicating it in my theme’s functions.php file.

What I’ve experimented with is disabling wpautop and adding a modified copy of it – which includes HTML5 elements in its arrayss – to my theme’s functions.php file.

Add the following to your theme’s functions.php file and you’ll be able to use section, article, aside, header, footer, hgroup, figure, details, figcaption, and summary in your post content. (Probably best to try this in a testing environment first!)

/* -----------------------------
MODIFIED WPAUTOP - Allow HTML5 block elements in wordpress posts
----------------------------- */

function html5autop($pee, $br = 1) {
   if ( trim($pee) === '' )
      return '';
   $pee = $pee . "\n"; // just to make things a little easier, pad the end
   $pee = preg_replace('|<br />\s*<br />|', "\n\n", $pee);
   // Space things out a little
// *insertion* of section|article|aside|header|footer|hgroup|figure|details|figcaption|summary
   $allblocks = '(?:table|thead|tfoot|caption|col|colgroup|tbody|tr|td|th|div|dl|dd|dt|ul|ol|li|pre|select|form|map|area|blockquote|address|math|style|input|p|h[1-6]|hr|fieldset|legend|section|article|aside|header|footer|hgroup|figure|details|figcaption|summary)';
   $pee = preg_replace('!(<' . $allblocks . '[^>]*>)!', "\n$1", $pee);
   $pee = preg_replace('!(</' . $allblocks . '>)!', "$1\n\n", $pee);
   $pee = str_replace(array("\r\n", "\r"), "\n", $pee); // cross-platform newlines
   if ( strpos($pee, '<object') !== false ) {
      $pee = preg_replace('|\s*<param([^>]*)>\s*|', "<param$1>", $pee); // no pee inside object/embed
      $pee = preg_replace('|\s*</embed>\s*|', '</embed>', $pee);
   }
   $pee = preg_replace("/\n\n+/", "\n\n", $pee); // take care of duplicates
   // make paragraphs, including one at the end
   $pees = preg_split('/\n\s*\n/', $pee, -1, PREG_SPLIT_NO_EMPTY);
   $pee = '';
   foreach ( $pees as $tinkle )
      $pee .= '<p>' . trim($tinkle, "\n") . "</p>\n";
   $pee = preg_replace('|<p>\s*</p>|', '', $pee); // under certain strange conditions it could create a P of entirely whitespace
// *insertion* of section|article|aside
   $pee = preg_replace('!<p>([^<]+)</(div|address|form|section|article|aside)>!', "<p>$1</p></$2>", $pee);
   $pee = preg_replace('!<p>\s*(</?' . $allblocks . '[^>]*>)\s*</p>!', "$1", $pee); // don't pee all over a tag
   $pee = preg_replace("|<p>(<li.+?)</p>|", "$1", $pee); // problem with nested lists
   $pee = preg_replace('|<p><blockquote([^>]*)>|i', "<blockquote$1><p>", $pee);
   $pee = str_replace('</blockquote></p>', '</p></blockquote>', $pee);
   $pee = preg_replace('!<p>\s*(</?' . $allblocks . '[^>]*>)!', "$1", $pee);
   $pee = preg_replace('!(</?' . $allblocks . '[^>]*>)\s*</p>!', "$1", $pee);
   if ($br) {
      $pee = preg_replace_callback('/<(script|style).*?<\/\\1>/s', create_function('$matches', 'return str_replace("\n", "<WPPreserveNewline />", $matches[0]);'), $pee);
      $pee = preg_replace('|(?<!<br />)\s*\n|', "<br />\n", $pee); // optionally make line breaks
      $pee = str_replace('<WPPreserveNewline />', "\n", $pee);
   }
   $pee = preg_replace('!(</?' . $allblocks . '[^>]*>)\s*<br />!', "$1", $pee);
// *insertion* of img|figcaption|summary
   $pee = preg_replace('!<br />(\s*</?(?:p|li|div|dl|dd|dt|th|pre|td|ul|ol|img|figcaption|summary)[^>]*>)!', '$1', $pee);
   if (strpos($pee, '<pre') !== false)
      $pee = preg_replace_callback('!(<pre[^>]*>)(.*?)</pre>!is', 'clean_pre', $pee );
   $pee = preg_replace( "|\n</p>$|", '</p>', $pee );

   return $pee;
}

// remove the original wpautop function
remove_filter('the_excerpt', 'wpautop');
remove_filter('the_content', 'wpautop');

// add our new html5autop function
add_filter('the_excerpt', 'html5autop');
add_filter('the_content', 'html5autop');

The results are not absolutely perfect but then neither is the original wpautop function. Certain ways of formatting the code will result in unwanted trailing </p> tags or a missing opening <p> tags.

For example, to insert a figure with caption into a post you should avoid adding the figcaption on a new line because an image or link appearing before the figcaption will end up with a trailing </p>.

<!-- this turns out ok -->
<figure>
<a href="#"><img src="image.jpg" alt="" /></a><figcaption>A figure caption for your reading pleasure</figcaption>
</figure>

<!-- this turns out not so ok -->
<figure>
<a href="#"><img src="image.jpg" alt="" /></a>
<figcaption>A figure caption for your reading pleasure</figcaption>
</figure>

Another example would be when beginning the contents of an aside with a paragraph. You’ll have to leave a blank line between the opening aside tag and the first paragraph.

<aside>

This content could be a pullquote or information that is tangentially related to the surrounding content. But to get it wrapped in a paragraph you have to leave those blank lines either side of it before the tags.

</aside>

Room for improvement

Obviously there are still a few issues with this because if you format your post content in certain ways then you can end up with invalid HTML, even if it doesn’t actually affect the rendering of the page. But it seems to be pretty close!

Leave a comment or email me if you are using this function and find there that are instances where it breaks down. I ran numerous tests and formatting variations to try and iron out as many problems as possible but it’s unlikely that I tried or spotted everything.

Hopefully someone with more PHP and WordPress experience will be able to improve upon what I’ve been experimenting with, or find a simpler and more elegant solution that retains the useful wpautop functionality while allowing for the use of HTML5 elements in posts. Please share anything you find!

10 comments

#

Frank says…

Nice work and i hope it is possible to use this function on my project WPBasis Theme for WordPress with xHTML5. I hope you enjoy this and i can use your function.

#

Frank says…

another comment: i think this works only fine on the html-editor, great for nerds. I think, the TinyMCE must also learn this elements and attributes. I have write a small post to add new elements to the TInyMCE.

#

Nicolas says…

@Frank Yeah it’s only for people using the HTML view of the editor. It’s not problem for you to use this function on your project. Let me know if you make any improvements to it.

Thanks for linking to your article about adding elements to TinyMCE, that looks pretty useful. I’ll see if I can combine the two sets of functions.

#

Dave Doolin says…

I’ve never understood the problem wpautop is supposed to solve.

#

Tobi says…

you jsut amde my day! i got stuck in trying myself with preg_match and such to get rid of the starting tags before my ..

again – thanks a million!

#

Frank says…

$pee = str_replace( array( '/>', ' checked="checked"', ' selected="selected"', ' >' ), array( '>', 'checked', 'selected', '>' ), $pee);

Maybe another strings for this function.
Best regards

#

Brent Lagerman says…

I always thought that wpautop should be off by default in the HTML side of the editor, not sure why they don’t do that… anyway something else that’s kind of helpful instead of turning off wpautop for the entire site you could just turn it off on a page by page basis.

If you want to deliver a site to a client who doesn’t know much HTML, you’d want they to have wpautop enabled, then when you go in and make pages you can just disable autop with a custom field.

You can find the plug-in here:
http://rolandog.com/archives/2008/11/12/wp-plugin-rm_wpautop/

#

Jacob says…

I’m having an issue with autop only in FireFox 3.6.x. Works fine in IE7. Basically, I have a figure and a figcaption. The figcaption + img are wrapped in an anchor tag. FireFox removes all of the figcaption contents (an h3, h4, p) and wraps each one in an anchor, which pretty much destroys the layout.

FireFox 3.6.x on Mac does this no matter what. Here’s a fiddle: http://jsfiddle.net/CsJr2/1/

Is there anyway to take account of the (now standardized) practice of wrapping multiple block level elements in an anchor?

#

Nicolas says…

@Jacob: The figcaption element must be the first or last child of the figure element. You cannot wrap it in an anchor is you want it to be valid HTML. This should also remedy the issue you are experiencing.

#

Ilyon says…

Thanks! Your will be useful for my new template!

Comments are now closed.