WYSIWYG Editors And Bad Markup
WYSIWYG editors are a key component of content management systems. They empower non-technical users to manage rich content efficiently and intuitively. Unfortunately, WYSIWYG editors are notorious for generating "bad" markup (or "dirty code"). In the longer term, the problems that bad markup creates can outweigh the benefits that WYSIWYGs offer.
So what is bad markup and what can you do about it? This article gives examples of bad markup created by other WYSIWYG editors and explains how XStandard makes sure that business users generate "clean", standards-compliant markup every time.
Since Web developers and designers have differing opinions on what constitutes bad markup, it's more productive to define what "clean" markup is.
- Clean markup is standards-compliant. Whether you are using HTML 4 or the latest version of XHTML, markup tags must be used correctly - i.e. according to W3C specifications.
- Clean markup is also based on best practices. This involves using techniques that have successfully emerged after extensive professional use and that favor one approach over another.
Markup that is not based on standards and that does not follow best practices can therefore be considered "bad". The following are classic examples of "bad" markup practices, and how using XStandard avoids them.
Incorrect Use Of <blockquote>
Of all the "bad" markup generated by using tags incorrectly, the most commonly misused tag is <blockquote>. Although block quote should only be used to contain text that is a quotation, nearly all WYSIWYG editors wrongly use <blockquote> for indenting, as we see in the toolbar screenshot below.
There are two strong reasons for not using block quotes for indenting.
First, indenting only moves text in from the left, whereas <blockquote> is typically rendered with margins on both the left and the right. So <blockquote> is simply the wrong markup for the job. Instead, indents should be rendered using CSS and the following is an example of how to do this:
CSS
p.indent {margin-left:40px}
Markup
<p class="indent">The quick brown fox jumped over the lazy dog.</p>
Second, block quotes are used to transmit semantic meaning, whereas "indent" has no semantic meaning at all. "Indent" says nothing about the data that is indented. It cannot for instance require indented text to be read by a male voice via a screen reader. By contrast, text surrounded by <blockquote> can "tell" an application to read text in a male voice, or in any other way supported by the auditory user-agent.
How XStandard addresses <blockquote>
XStandard has different toolbar buttons for indenting and for block quotes. See the screenshot below:
By default, the indent icon in XStandard creates the following markup but can be easily customized to suit your specific CSS:
<p class="indent">
Incorrect Use Of <div>
To imitate the tighter line spacing between paragraphs that is typically found in word processors, many WYSIWYG editors use <div> tags instead of <p>. Here is an example:
The <div> tag is semantically meaningless and should only be used for grouping, whereas the correct tag for marking up paragraphs is <p>. If a line break is needed without beginning a new paragraph, then the <br> tag should be used, not <div>. In most WYSIWYG editors, pressing Shift-Enter creates a <br> tag. Spacing between paragraphs is formatting, and tighter spacing should be done via CSS. For example: p {margin: .2em 0}
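As a hypothetical illustration (the text content is invented for this sketch), the first fragment below shows the div-based pattern described above. The second shows the same content with <p> for paragraphs, <br> for a line break inside a paragraph, and the tighter spacing moved into CSS.

```html
<!-- Pattern to avoid: <div> standing in for paragraphs and line breaks -->
<div>Thank you for your order.</div>
<div>Your parcel will ship on Monday.</div>
<div>Jane Smith</div>
<div>Sales Manager</div>

<!-- Preferred: paragraphs are <p>, a break inside a paragraph is <br>.
     The style rule belongs in the document head or an external stylesheet;
     it is shown inline here only to keep the sketch in one place. -->
<style type="text/css">
  p {margin: .2em 0}
</style>
<p>Thank you for your order.</p>
<p>Your parcel will ship on Monday.</p>
<p>Jane Smith<br />Sales Manager</p>
```

With this approach the spacing can later be changed in one place, without touching the content.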
How XStandard Uses The <div> Tag Correctly
XStandard uses <p> for paragraph breaks and <br> for line breaks. XStandard treats <div> tags like a layer for grouping content.
Illegal Characters
In order to seamlessly copy & paste text from word processors, WYSIWYG editors accept characters that are in fact illegal for the encoding they support. The most common illegal characters are curly quotation marks (”), long dashes (—) and ellipses (…). If the markup generated by the WYSIWYG does not support Unicode, then special characters should be represented as entities or decimal values.
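As a sketch of that fallback, the fragment below writes curly quotation marks, a long dash and an ellipsis first as decimal values and then as named entities. The sample sentence is invented for illustration; the numbers are the standard Unicode code points for these characters (U+201C, U+201D, U+2014 and U+2026).

```html
<!-- Decimal values: 8220 and 8221 are the curly quotes,
     8212 is the long dash, 8230 is the ellipsis -->
<p>&#8220;Wait&#8221; &#8212; she said &#8212; &#8220;there is more&#8230;&#8221;</p>

<!-- The same sentence using the named entities defined in HTML 4 -->
<p>&ldquo;Wait&rdquo; &mdash; she said &mdash; &ldquo;there is more&hellip;&rdquo;</p>
```

Both forms use only ASCII characters in the source, so they survive in systems that cannot store Unicode. The decimal form is generally the more portable of the two, since it does not depend on a DTD that defines the entity names.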
How XStandard Deals With Illegal Characters
XStandard's native character encoding is Unicode so it can use special characters without escaping them. When interacting with content management systems that do not support Unicode, XStandard can convert Unicode (and special characters) to their decimal values.
Bloated Markup
WYSIWYG editors are notorious for generating bloated markup, and the tag that generates most bloated markup is the <font> tag. Whether the editor inserts the <font> tag itself, or has a color-picker or font-selector that lets users do it manually, the end result is bloated markup. For example:
Using CSS is far more efficient as we can see in the example below:
CSS:
table {font-family:arial;font-size:1em;color: #000000}
Markup:
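A hypothetical before-and-after, with invented table content: the first fragment repeats a <font> tag in every cell, while the second carries no formatting at all and relies on a stylesheet rule such as the table rule shown above.

```html
<!-- Bloated: the same font settings repeated in every cell -->
<table>
  <tr>
    <td><font face="arial" size="2" color="#000000">Widget</font></td>
    <td><font face="arial" size="2" color="#000000">$10</font></td>
  </tr>
  <tr>
    <td><font face="arial" size="2" color="#000000">Gadget</font></td>
    <td><font face="arial" size="2" color="#000000">$15</font></td>
  </tr>
</table>

<!-- Lean: formatting comes from the stylesheet, not from the markup -->
<table>
  <tr>
    <td>Widget</td>
    <td>$10</td>
  </tr>
  <tr>
    <td>Gadget</td>
    <td>$15</td>
  </tr>
</table>
```

The difference grows with every row added, because the lean version pays for its formatting only once.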
How XStandard Creates Lean Markup Every Time
XStandard generates lean code. Formatting is done exclusively through external or embedded CSS, so the constructs responsible for bloated code (<font> tags and style attributes) are never used.
Mixing Formatting Models
Combining external or embedded CSS with inline CSS, <font> tags and formatting elements is bad because it results in "spaghetti code", meaning the intent of the markup is not evident from the way it looks. The screenshot below shows one example of this:
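A hypothetical sketch of the problem, using an invented class name ("note") and invented text: four formatting mechanisms compete on a single paragraph, so it is hard to tell from the markup which color or font the author actually intended.

```html
<!-- Mixed models: a CSS class, an inline style, a <font> tag and a <b>
     element all try to format the same sentence -->
<p class="note" style="color: blue; font-size: 12px">
  <font face="verdana" color="red"><b>Shipping is free this week.</b></font>
</p>

<!-- One model: a single class whose appearance is defined in CSS -->
<p class="note">Shipping is free this week.</p>
```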
How XStandard Avoids Mixing Formatting Models
As recommended by the latest XHTML specification, XStandard uses only external or embedded CSS for formatting. So deprecated or outdated constructs like the <font> tag and the style attribute are never used.
Incorrect Use Of Alt (Alternate) Text
Images enhance the visual experience of those who are sighted, but for those with disabilities that limit vision, for users of small screen devices with limited display areas, or for search engine applications, alternate text becomes an important replacement for images. Alt text is therefore a crucial aspect of "clean" markup, yet most WYSIWYG editors do not encourage the use of alt text at all.
Some WYSIWYG editors that support file upload insert the image file name as the alt text, but this results in meaningless alt text such as the one seen below:
<img src="images/x123001.gif" alt="x123001.gif" />
Many WYSIWYG editors also make the mistake of considering alt text and "tooltip" to be interchangeable, which they are not. Tooltip is placed inside the title attribute while alt text is placed inside the alt attribute. For example:
<img src="tv.gif" alt="Wide-screen television." title="On Sale Now!" />
WYSIWYG editors also rarely distinguish between images that are decorative versus images that are informative, leading to distortions in the meaning of content. Informative images transmit semantic meaning to devices such as accessibility screen readers and so require alt text. By contrast, decorative images (such as spacers, bullets, borders, etc.) are merely "eye-candy", convey no semantic meaning at all and should not use alt text. To make decorative images invisible to non-visual devices, the setting should be alt="".
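The distinction can be sketched as follows. The file names and alt text are invented for illustration.

```html
<!-- Decorative: an empty alt attribute lets non-visual devices skip the image -->
<img src="images/spacer.gif" alt="" />
<img src="images/bullet.gif" alt="" /> Free delivery

<!-- Informative: the alt text states what the image conveys -->
<img src="images/sales-chart.gif" alt="Sales rose steadily from January to June." />

<!-- Pattern to avoid: alt text on a decorative image is read out as noise -->
<img src="images/spacer.gif" alt="spacer" />
```

Note that the alt attribute is still present on decorative images; it is only its value that is empty.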
The example below shows markup where alt text is used incorrectly for decorative images. Listen to the sound file to hear the confusion this creates when the markup is processed by an auditory user-agent such as a screen reader:
Listen.
How XStandard Uses Alt (Alternate) Text Correctly
When users upload images into XStandard, they are prompted to identify the image as decorative or informative. If the user identifies the image as decorative, an empty alt attribute is automatically created and the title and longdesc attributes are removed. If the image is identified as informative, the alternate text becomes required. To make sure the alt text is not confused with the tooltip, XStandard has separate fields for "Alternate Text" and "Description" (tooltip) as shown in the screenshot below.
Proprietary Tags
Business users love to copy content from Microsoft Word then paste it into WYSIWYG editors. Unfortunately, when this happens, most WYSIWYG editors retain proprietary MS Office tags, creating meaningless and non-validating code. The illustration below shows examples of proprietary markup that cannot be understood outside of Word:
MS Office markup can also reference proprietary inline CSS such as seen below:
style="mso-fareast-font-family: 'Times New Roman'; mso-ansi-language: EN-US; mso-fareast-language: EN-US; mso-bidi-language: AR-SA; mso-bidi-font-family: 'Times New Roman'; mso-highlight: yellow"
MS Office markup also references proprietary CSS class names such as:
class=MsoNormal
How XStandard Neutralizes Proprietary Tags
When content is copied from Word and pasted into XStandard, the editor strips out all proprietary tags and inline formatting so that only important structural elements survive such as tables, lists, headings, images, hyperlinks and semantic tags like <strong>, <abbr>, <code>, <cite>, <kbd>, etc. Formatting is easily and more effectively replaced using XStandard's "styles" menu that references CSS.
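The kind of cleanup described here can be pictured with a before-and-after sketch. This is an illustration built from the proprietary class name and style properties quoted earlier plus invented text; it is not captured XStandard output.

```html
<!-- Before: typical of markup pasted from Word -->
<p class=MsoNormal style="mso-fareast-font-family: 'Times New Roman'; mso-highlight: yellow">
  <span style="mso-ansi-language: EN-US">Order <strong>today</strong>.</span><o:p></o:p>
</p>

<!-- After: proprietary class, inline styles and Office tags are gone,
     while the paragraph and the semantic <strong> tag survive -->
<p>Order <strong>today</strong>.</p>
```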
Empty Tags
WYSIWYG editors tend to generate empty tags. This often occurs when formatting has been applied to text and the text is later deleted. The following is an example of empty tags.
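As a hypothetical sketch (the sentence and class name are invented), leftover empty tags typically look like the first fragment below; the second shows the same content once they are removed.

```html
<!-- Empty inline tags left behind after formatted text was deleted -->
<p>Our <b></b>new <span class="highlight"></span>catalog is <i><u></u></i>available.</p>

<!-- The same paragraph without the empty tags -->
<p>Our new catalog is available.</p>
```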
How XStandard Avoids Empty Tags
XStandard removes inline tags that are meaningless because they are empty of content. So you never get markup that looks like this:
... <span></span> ...
Using Formatting To Convey Meaning
Colors and fonts add no meaning (no semantics) to data. A font is a font and no more. The color red only says "red". Regardless, the practice of WYSIWYG vendors has been to encourage the use of color and font selectors to assign or suggest importance to data. This is a futile exercise since no information about the data is actually transmitted, as we see from the meaningless markup generated by old-fashioned formatting tools below:
How XStandard Uses Meaningful Markup To Convey Meaning
XStandard has no color-pickers or font-selectors since these tools create semantically barren markup. Instead, XStandard's easy-to-use "styles" menu generates the type of meaningful markup seen in the illustration below. User-friendly style names apply semantic markup and at the same time reference CSS that offers limitless formatting options in a single mouse click. What better way to ensure a consistent look-and-feel to content?
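The contrast between the two approaches can be sketched as follows. The class name "warning", the color value and the sentence are assumptions made for this example, not names taken from XStandard's styles menu.

```html
<!-- Formatting only: says "red" and "bold", but nothing about the data -->
<p><font color="red"><b>Do not unplug the device during the update.</b></font></p>

<!-- Meaningful: <strong> marks the text as important, and the class
     hooks into a CSS rule (which belongs in the head or an external
     stylesheet; shown inline here for brevity) -->
<style type="text/css">
  strong.warning {color: #cc0000}
</style>
<p><strong class="warning">Do not unplug the device during the update.</strong></p>
```

A screen reader or search engine can act on <strong>, whereas a red font tells it nothing.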
Incorrect Use Of Tables
Most WYSIWYG editors use tables incorrectly, whether for layout or for tabular data. Below is an example of a data table, where the data in the table can only be understood in relation to column and/or row headers.
Cups of coffee consumed by each person

| Name | Cups | Type | Sugar |
|---|---|---|---|
| Wendy | 10 | Regular | yes |
| Jim | 15 | Decaf | no |
If the markup behind this table does not associate each cell with the appropriate header, the cells will be processed like <div> tags by non-visual devices. Listen to how an auditory user-agent "reads" the table when the markup is incorrect. Now listen to the same table using correct markup.
How XStandard Uses Tables Correctly
When users of XStandard create tables, they can explicitly select the type of table required (data table or layout table), as shown in the screenshot below:
In XStandard, layout tables use only <table>, <tr> and <td> tags. Data tables use <table>, <caption>, <thead>, <tbody>, <tr>, <th> and <td> tags, together with the id attribute on header cells (<th id="a">) and the headers attribute on data cells (<td headers="a b">). Below is a screenshot of a data table and the correct markup created by XStandard.
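As a sketch of what such markup can look like, here is the coffee table from earlier in this section written with the tags and attributes listed above. The id values are invented for the example, and the fragment is an illustration rather than captured XStandard output.

```html
<table>
  <caption>Cups of coffee consumed by each person</caption>
  <thead>
    <tr>
      <th id="name">Name</th>
      <th id="cups">Cups</th>
      <th id="type">Type</th>
      <th id="sugar">Sugar</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <!-- The row header gets its own id so that each data cell can
           point at both its row and its column -->
      <th id="wendy">Wendy</th>
      <td headers="wendy cups">10</td>
      <td headers="wendy type">Regular</td>
      <td headers="wendy sugar">yes</td>
    </tr>
    <tr>
      <th id="jim">Jim</th>
      <td headers="jim cups">15</td>
      <td headers="jim type">Decaf</td>
      <td headers="jim sugar">no</td>
    </tr>
  </tbody>
</table>
```

Each headers value lists the ids of the header cells that apply to that data cell, which gives a screen reader the information it needs to announce, for instance, the row and column a value of 15 belongs to.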