HTML Table Elements and SEO – Part 1
Let’s analyze the HTML table elements from a Search Engine Optimization (SEO) perspective. According to the W3C, those elements are: <table>, <caption>, <thead>, <tfoot>, <tbody>, <colgroup>, <col>, <tr>, <th> and <td>.
Their purpose is to structure tabular data within your web documents. As mentioned in many of my previous articles, tables are not exactly the best approach to building web page layouts, but even in 2011 many of us still use tables to display non-tabular data instead of using <div> elements.
Before analyzing each of the elements, I would like to stress something less known to many search engine optimization professionals: how search bots actually read information from web documents built with tables. It’s important to know this because the way we lay out data with HTML tables, using <td>, <tr> or other elements, affects how browsers render the document. Equally, and maybe more importantly for SEO, it affects how search engine bots “read” web pages.
Humans read from top to bottom and left to right (in western European languages). Browsers and bots render and read an HTML page from top to bottom, but not exactly from left to right: they jump through <td>s and <tr>s depending on how you structure the markup. In other words, what you see in a browser might be read in a different order by search engine bots.
Here’s an example of how search engine bots would crawl an HTML table structure:
- They will look for the first opening instance of the <table> tag
- Find the table head tag <thead>, then the first table row <tr>, then search for the <th> tags, which should contain tabular header data, and finally find the closing </thead>. Next they aim for the <tbody> tag; the browser will automatically add (or imply) a tbody element if it is not present.
- Look for the <tr> table row tags of the tbody and read the data inside the table data tags, the <td>s
- If they find a closing </table> tag it means that the respective table has been closed.
- Repeat the above for the next tables within the page
To visualize the process described above, it helps to look at the source code of a typical tabular structure.
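Here is a minimal, hypothetical table (not the original excerpt from this page) showing the structure that bots walk through in the order listed above — <table>, then <thead> with its <th> cells, then the <tbody> rows:

```html
<table>
  <thead>
    <tr>
      <!-- bots read the header cells first -->
      <th>Element</th>
      <th>Purpose</th>
    </tr>
  </thead>
  <tbody>
    <!-- then each row of table data, top to bottom -->
    <tr>
      <td>th</td>
      <td>Tabular header data</td>
    </tr>
    <tr>
      <td>td</td>
      <td>Tabular data</td>
    </tr>
  </tbody>
</table>
```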
Why is it important to understand how search engine bots read the source code? There is (or used to be) a strong belief that search engines give more importance to text closer to the top of the source code. With the classic table implementation, the <tbody> content that matters to search engines may be buried far down in the code, depending on how much markup and “noise” content you have before it.
Do a simple experiment. Identify your heaviest pages, in terms of both kilobytes (as rendered by browsers) and top and left navigation data (typical of ecommerce sites). Take a look at the cached version in Google using the cache: command.
If the cached version of that page contains a lot of irrelevant content at the top and your main content is buried far down (or missing altogether because of excessive irrelevant content indexed at the top), you should consider some alternatives: a) use the “table trick” described below or, better, b) migrate to tableless development with CSS and <div> elements.
Let’s take a look at a possible situation where irrelevant content could bury important content: a website with lots of links in the top menu and tens or hundreds of navigation links in the left navigation section of the page.
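A simplified, hypothetical sketch of such a layout table — the comments stand in for the actual links and content — shows why the main content ends up so far down the source:

```html
<table>
  <tr>
    <!-- top menu: dozens of links, read first by bots -->
    <td colspan="2"><!-- ...top navigation links... --></td>
  </tr>
  <tr>
    <!-- left navigation: tens or hundreds of links, read next -->
    <td><!-- ...left navigation links... --></td>
    <!-- main body content: last in the source, despite being the most important -->
    <td><!-- ...main body content... --></td>
  </tr>
</table>
```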
So, what’s the SEO “table trick”? Insert an empty <td> right before your body content and move the navigation data after the body content.
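A minimal sketch of the trick (hypothetical markup): the empty placeholder cell plus a rowspan on the content cell keeps the rendered layout the same — navigation on the left, content on the right — while the content comes first in the source:

```html
<table>
  <tr>
    <!-- empty placeholder cell, sits above the navigation column -->
    <td></td>
    <!-- main body content: now first in the source order -->
    <td rowspan="2"><!-- ...main body content... --></td>
  </tr>
  <tr>
    <!-- left navigation: moved after the content in the source -->
    <td><!-- ...left navigation links... --></td>
  </tr>
</table>
```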
A variation of the “table trick” is the “css trick”, described here, which puts the content that matters (the main body content) at the top of the source code while moving “noise” content (header, navigation, footer) to the bottom. The page renders normally for browsers and users, but search engine bots find the main content first, then the rest.
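One common way to achieve this (a sketch with hypothetical ids, using CSS floats) is to place the content <div> first in the source and float the navigation into position:

```html
<style>
  /* content is first in the source but rendered on the right */
  #content { float: right; width: 75%; }
  /* navigation is last in the source but rendered on the left */
  #nav     { float: left;  width: 25%; }
</style>

<div id="content"><!-- ...main body content... --></div>
<div id="nav"><!-- ...navigation links... --></div>
```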
A long time ago, another SEO best practice was to keep your pages under 100kb because of search engine crawling and indexing limitations; later that limit was “increased” to 300kb. The “table trick” and “css trick” described above were helpful in those days, but I think you don’t need to worry too much nowadays, for at least two reasons:
- This Wikipedia page is almost 700kb of pure text and is cached in Google, so size doesn’t seem to matter anymore for strong, deeply linked websites
- Bing has a Vision-based Page Segmentation algorithm (aka VIPS) which is used to identify the portion of your pages that contains the “body content” and to remove navigation, menus and footer (aka boilerplate content) from its ranking calculations. Yahoo! and Google have similar techniques.
While it is important to understand how search engine bots read and index pages, your real focus should not be on these techniques, but rather on writing quality, targeted content within the main body section of the page and on becoming an authority in your niche.
As a final piece of advice for this first part of the article: if you still design with tables, I encourage you to use the <tbody> tag where your main body tables start, to give search engines a clue about where the content that matters is, just in case the VIPS-style algorithms fail.
Pitstop Media offers ROI-focused SEO services. If you need an SEO company to help you rank #1, please contact us for a free, no-obligation quote. We’ve helped companies rank first on Google in short periods of time, for highly competitive terms.