SEO tweaks using mod_rewrite to prevent duplicate content being crawled

Posted on November 14, 2011 by Stephen

Over the past few years, with the rise in popularity of no-www URLs, there have been issues where content is crawled by search engines twice and, in some cases, scored negatively as duplicated content, i.e. the same website being reachable at both http://www.site.com and http://site.com.

More recently I’ve come across the fact that http://www.site.com/index.html and http://www.site.com/ (just ending in a trailing slash) are being indexed as two separate pages, and are also getting a negative score due to duplicated content.

I’m sure search engines are getting more intelligent and no longer score websites negatively for these oversights; however, I decided to use mod_rewrite for the Apache web server to overcome these issues.

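vi .htaccess
RewriteEngine On
# Redirect the bare domain to the www host, keeping the requested path
RewriteCond %{HTTP_HOST} ^domain\.com$ [NC]
RewriteRule (.*) http://www.domain.com/$1 [R=301,L]
# Redirect the site root to the explicit index.html URL
RewriteRule ^$ http://www.domain.com/index.html [R=301,L]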

In your Apache vhost config, you may have to change “AllowOverride None” to “AllowOverride FileInfo” (finer-grained and safer than using “AllowOverride All”).
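
As a rough illustration, the relevant part of a vhost might look something like the sketch below. The DocumentRoot path is a made-up placeholder, the access control directives are left out, and your own vhost will differ. Note that mod_rewrite in .htaccess also needs FollowSymLinks (or SymLinksIfOwnerMatch) enabled for the directory, and that the bare domain has to be served by the same vhost for the redirect to fire.

```apache
<VirtualHost *:80>
    ServerName www.domain.com
    # The bare domain must reach this vhost so the .htaccess rules can redirect it
    ServerAlias domain.com
    # Hypothetical path, substitute your own document root
    DocumentRoot /var/www/domain.com

    <Directory /var/www/domain.com>
        # mod_rewrite in .htaccess requires symlink following to be enabled
        Options FollowSymLinks
        # FileInfo allows RewriteEngine, RewriteCond and RewriteRule in .htaccess
        AllowOverride FileInfo
    </Directory>
</VirtualHost>
```

Apache needs to be reloaded after changing the vhost, whereas changes to .htaccess are picked up on the next request.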

So the above mod_rewrite code redirects http://domain.com to http://www.domain.com/index.html (in two hops, via http://www.domain.com/). It also redirects http://www.domain.com/ to http://www.domain.com/index.html.
This was for a recent web design project for a simple static website. Your mileage may vary, so if you use the above, make sure to test thoroughly.
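
With the rules as written, a request for http://domain.com/ is redirected twice: first to http://www.domain.com/ and then on to http://www.domain.com/index.html. If you would rather do that in a single redirect, a variant along these lines may work. It is only a sketch using the same placeholder domain, and I have not run it on the live site, so test it before relying on it.

```apache
RewriteEngine On

# Bare domain, site root: go straight to the www index page in one hop
RewriteCond %{HTTP_HOST} ^domain\.com$ [NC]
RewriteRule ^$ http://www.domain.com/index.html [R=301,L]

# Bare domain, any other path: redirect to the www host, keeping the path
RewriteCond %{HTTP_HOST} ^domain\.com$ [NC]
RewriteRule (.*) http://www.domain.com/$1 [R=301,L]

# www host, site root: redirect to the explicit index.html URL
RewriteRule ^$ http://www.domain.com/index.html [R=301,L]
```

A RewriteCond only applies to the RewriteRule that immediately follows it, which is why the host check is repeated before each of the first two rules.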
