Wordpress Blog Search Engine Optimization

May 26th, 2010

I have recently noticed that many of the relevant search queries that lead to this site land on things like the category and archive pages instead of the specific posts that contain the relevant content.

This makes the search results look dirty and disorganized. It also means the same content appears on several different pages, which is what confuses the search engines.

A bit of digging turned up a few ways to steer the search engines toward indexing the content I want them to, instead of whatever they choose.

robots.txt
One quick way to keep Google and other search engines from indexing parts of a site is to add a robots.txt file to the site’s root directory. This file contains instructions for “well behaved” search engine crawlers.

The first section applies to all user agents and blocks access to private WordPress directories as well as virtual paths that we don’t want indexed, such as the RSS feeds and categories.

The second section allows the Mediapartners-Google robot full access to the site. This is the robot used by AdSense, so any page serving ads can still be crawled for keyword context matching. Without this, AdSense would not be able to review the contents of the page to help match ads.

The last line “Sitemap:” identifies the sitemap built by the XML-Sitemap plugin.
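As a rough illustration of the layout described above, here is a minimal sketch in Python that holds a robots.txt of that shape and checks it with the standard library’s urllib.robotparser. The specific paths, the example.com domain, and the sitemap URL are assumptions for illustration, not the actual contents of this site’s file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt with the three parts described above.
# The paths and the sitemap URL are assumptions, not the real file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /feed/
Disallow: /category/

User-agent: Mediapartners-Google
Allow: /

Sitemap: http://example.com/sitemap.xml
"""


def build_parser(text=ROBOTS_TXT):
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def is_allowed(user_agent, path, parser=None):
    """Return True if user_agent may fetch the given path."""
    if parser is None:
        parser = build_parser()
    return parser.can_fetch(user_agent, "http://example.com" + path)
```

With these rules, is_allowed("Googlebot", "/category/seo/") is refused while the same path is allowed for "Mediapartners-Google", which is the split between general crawlers and the AdSense robot that the two sections are meant to produce.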

<meta>
The robots <meta> tag in the head of a page can be used to tell robots, dynamically, what to do with that page. I use this rather than robots.txt for the archives because the format of the archive page names is somewhat dynamic (if I were to change it, I would have to update robots.txt as well).

Instead of just blocking archives, I chose to block anything that is not a page, a post, or the homepage. The following code goes inside the <head> tag in my theme’s header.php.

If the requested page is not a single post, a page, or the homepage, or if it is a paged listing, then block it from being indexed and archived by search engines, but allow them to follow its links to other pages.
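The rule can be modelled outside WordPress as a small Python function. This is only a sketch of the decision logic: the four boolean arguments stand in for the WordPress conditional tags is_single(), is_page(), is_home() and is_paged(), and the exact content value of the tag is my assumption of what “not indexed, not archived, but followed” looks like.

```python
def robots_meta(is_single=False, is_page=False, is_home=False, is_paged=False):
    """Return the robots <meta> tag for a page, or "" if it may be indexed.

    The arguments mirror the WordPress conditional tags of the same names;
    here they are plain booleans so the logic can run on its own.
    """
    is_primary = is_single or is_page or is_home
    if not is_primary or is_paged:
        # Assumed directive string: keep the page out of the index and the
        # cache, but let crawlers follow its links.
        return '<meta name="robots" content="noindex,noarchive,follow" />'
    return ""
```

For example, a single post (is_single=True) gets no tag, while a category archive (all arguments False) and the second page of the homepage (is_home=True, is_paged=True) both get the blocking tag.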

I chose to block the is_paged() pages (for example, the older pages of the homepage, like /page/2) this way instead of through robots.txt so that they would still get “followed”. Anything excluded in robots.txt is, in theory, never loaded by a search engine robot, so the robot cannot follow any links on that page. I’m not sure this is strictly necessary, since those links should all be reachable through the posts themselves.

This <meta> tag will also block category pages, so the /category exclusion in the robots.txt is not strictly necessary.

I’m not clear on exactly how the AdSense robot treats the <meta> tag, but it seems like it might be blocked too. We will have to see how this plays out.

Title Sorting

Another feature I discovered, and like, is reversing the order of the page title. By default, my theme built hierarchical titles with “Notions” on the left and the post name on the right. I switched them so that the article name is on the left and the blog name is on the right. This makes the most significant thing, the page subject, the first thing you read.

So now this entry is titled

“Wordpress Blog Search Engine Optimization « Notions”

Instead of

“Notions » Wordpress Blog Search Engine Optimization”

The following was inserted into the <head> tag in header.php and replaces any other reference to <title>.
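The ordering itself is simple enough to sketch in a few lines of Python. The function name and its fallback for an empty post title are hypothetical; in the theme, the same effect comes from the order in which the post title and blog name are printed inside <title>.

```python
def page_title(post_title, blog_name, separator="«"):
    """Build a title with the post name first and the blog name last."""
    if not post_title:
        # Assumed fallback: pages without their own title show only the blog name.
        return blog_name
    return f"{post_title} {separator} {blog_name}"
```

Calling page_title("Wordpress Blog Search Engine Optimization", "Notions") gives the reversed form shown above for this entry.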

To block categories or not to block categories?
The reason to block real pages such as the feeds and categories is to prevent duplicate content from being indexed. Category and archive pages contain copies of the original posts, as they should, but this confuses search engines because they see the same content at more than one URL on your site. Blocking the extra copies makes it more obvious to the search engine where the real content is and what to index, and it sends users to the real pages (with comments, etc.) rather than to an archive page.

I debated for a while whether I should block the category pages, since they do provide a service for users searching for things related to those categories, and they group related posts together. In the end I decided that it was still not worth the confusion of the extra pages. An alternative would be to block regular posts and only allow the categories to be indexed, as they may provide more keywords to search and contain more relevant content.

Updated Border Wait Times Google Gadget

May 5th, 2010

I’ve upgraded the Border Wait Times Google Gadget so that it now supports both Canada and US bound directions.

It now uses a Google App Engine server to collect and combine the data into a more useful format for the Gadget. The client-side gadget code is now much simpler and faster, as is the server it talks to, which can probably respond faster under high load than the original data sources.

Get it here

Google Calendar Wordpress Widget 1.2

February 17th, 2010

Version 1.2 of the Google Calendar Widget Plugin adds an option to expand the event entries automatically, without clicking on them.

More info here

Google Calendar Wordpress Widget

February 8th, 2010

I have created a WordPress plugin that installs a sidebar widget showing an agenda view of one or more Google Calendars. Version 1.1 is now live.

Read more about it and download it from here

Blogger Image Import Update

November 10th, 2009

The Blogger Image Import WordPress plugin that I wrote some time ago is getting a bit old and has some issues with the newer Blogger structure.

People have continued to use this plugin, and some have even made suggestions on how to fix it in the comments of that page.

I have collected some of that information here if anyone is still interested in using this plugin. I do not have time to update the plugin, but I am happy to include fixes that others have made to it.

There are two main issues that I know of now.

  1. The domain on which the images are hosted is not constant. It has changed over time from blogspot.com to blogger.com.
  2. The links to the full size images are actually links to HTML pages that contain the image.

Some commenters have pointed out solutions:

  1. Run the script once with the image server domain set to blogspot.com and again with it set to blogger.com. Ultimately, it would be best if the script were updated to support both, or possibly any domain, since there is no reason to restrict it to just those two. However, it might then import images that were only meant to be linked.
  2. If the name of the image contains “s1600-h”, remove the “-h” and convert it to “s1600” when importing. It seems that Blogger is currently storing the images at those locations.
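The two workarounds can be sketched in Python as follows. This is not the plugin’s code (the plugin is a WordPress/PHP script); the function names are hypothetical, and I am assuming that “s1600-h” appears as its own path segment in the image URL.

```python
from urllib.parse import urlparse

# Domains mentioned above as image hosts; accepting both avoids running twice.
IMAGE_DOMAINS = ("blogspot.com", "blogger.com")


def is_blogger_image(url):
    """Return True if the URL is hosted on one of the known image domains."""
    host = urlparse(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in IMAGE_DOMAINS)


def full_size_url(url):
    """Turn the link to the HTML wrapper page into the image URL.

    Assumption: the wrapper is marked by an "s1600-h" path segment, and
    dropping the "-h" yields the location of the image itself.
    """
    return url.replace("/s1600-h/", "/s1600/")
```

An importer built this way would call is_blogger_image() to decide whether to fetch a linked image at all, and full_size_url() before downloading it. Restricting the host check to the two known domains keeps it from importing images that were only meant to be linked.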

Here is a link to a version of the script that was sent to me by JT and is described on the comment page:

blogger-image-import.zip

I have not tested it, so use it at your own risk, but it looks good and shows the changes that were made to the original. Look for “MODIFIED:” comments in the script for a description of the changes.

I welcome any updates that people make to improve the script. If you find any issues, or fix the existing issues better, please let me know and send a copy of your fixes so we can share them with others.
