SEO tweaks using mod_rewrite to prevent duplicate content being crawled

Over the past few years, with the rising popularity of no-www (dropping the www. prefix from a site’s address), there have been issues with search engines crawling the same content twice, in some cases scoring it negatively as duplicate content. For example, this happens when a website can be reached at both http://www.site.com and http://site.com.

More recently, I’ve come across http://www.site.com/index.html and http://www.site.com/ (ending in just a trailing slash) being indexed as two separate pages, again with a negative score for duplicate content.

I’m sure search engines are getting more intelligent and no longer penalise websites for these oversights. However, I thought I’d use mod_rewrite on the Apache web server to avoid these issues altogether.

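vi .htaccess

RewriteEngine On
# Send requests for the bare domain to the www host, keeping the requested path
RewriteCond %{HTTP_HOST} ^domain\.com$ [NC]
RewriteRule (.*) http://www.domain.com/$1 [R=301,L]
# Send requests for the site root (/) to /index.html
RewriteRule ^$ http://www.domain.com/index.html [R=301,L]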

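In your Apache vhost config, you may have to change “AllowOverride None” to “AllowOverride FileInfo” for the site’s directory. This is more fine-grained and safer than “AllowOverride All”. mod_rewrite rules in a .htaccess file also need “Options FollowSymLinks” (or “SymLinksIfOwnerMatch”) enabled for that directory. Otherwise Apache refuses the request with 403 Forbidden.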

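The rules above issue permanent (301) redirects rather than internal rewrites. http://domain.com is redirected to http://www.domain.com/, and that URL is then redirected to http://www.domain.com/index.html. As a result, a visitor to the bare domain reaches index.html after two redirects. Any other path on the bare domain, such as http://domain.com/about.html, is redirected to the same path on www.domain.com.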
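To make the two-hop behaviour concrete, here is a minimal Python sketch that models the two rules on paper. It does not run Apache. It assumes the post’s placeholder hosts domain.com and www.domain.com. The function names apply_rules and redirect_chain are hypothetical, and the model ignores ports and query strings.

```python
from typing import List, Optional
from urllib.parse import urlsplit

# Placeholder hosts from the post (assumed; substitute your own).
BARE_HOST = "domain.com"
WWW_HOST = "www.domain.com"


def apply_rules(url: str) -> Optional[str]:
    """Return the Location header a 301 from the two rules would carry, or None."""
    parts = urlsplit(url)
    # urlsplit lower-cases the hostname and drops any port, so this model
    # compares hosts case-insensitively and ignores ports and query strings.
    host = parts.hostname or ""
    # In .htaccess (per-directory) context Apache strips the leading "/"
    # before RewriteRule sees the path, so the site root becomes "".
    path = parts.path[1:] if parts.path.startswith("/") else parts.path
    # Rule 1: RewriteCond %{HTTP_HOST} ^domain\.com$ followed by RewriteRule (.*)
    if host == BARE_HOST:
        return f"http://{WWW_HOST}/{path}"
    # Rule 2: RewriteRule ^$ has no condition, so it matches an empty path on any host
    if path == "":
        return f"http://{WWW_HOST}/index.html"
    return None


def redirect_chain(url: str, max_hops: int = 5) -> List[str]:
    """Follow the modelled redirects and return every URL visited, in order."""
    chain = [url]
    for _ in range(max_hops):  # hop limit guards against a redirect loop
        target = apply_rules(chain[-1])
        if target is None:
            break
        chain.append(target)
    return chain


if __name__ == "__main__":
    for start in ("http://domain.com", "http://www.domain.com/", "http://domain.com/about.html"):
        print(" -> ".join(redirect_chain(start)))
```

The first line printed is `http://domain.com -> http://www.domain.com/ -> http://www.domain.com/index.html`, which shows the two redirects a bare-domain visitor goes through.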
This was for a recent web design project, a simple static website. Your mileage may vary, so if you use the above, make sure to test thoroughly.
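One way to test is to request each URL without following redirects, then check the status code and Location header. The sketch below uses Python’s standard http.client module, which never follows redirects on its own. check_redirect and its host_header parameter are hypothetical names. host_header lets you send the placeholder Host (domain.com) to a staging server’s IP before DNS points at it.

```python
import http.client
from typing import Dict, Optional, Tuple


def check_redirect(
    host: str,
    path: str,
    port: int = 80,
    host_header: Optional[str] = None,
    timeout: float = 10.0,
) -> Tuple[int, Optional[str]]:
    """Send a single GET (redirects are NOT followed) and return (status, Location)."""
    headers: Dict[str, str] = {}
    if host_header is not None:
        # An explicit Host header replaces the one http.client would add itself.
        headers["Host"] = host_header
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path, headers=headers)
        resp = conn.getresponse()
        resp.read()  # drain the body so the connection closes cleanly
        # getheader returns None when the response has no Location header
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()
```

With the rules live, `check_redirect("domain.com", "/")` should return `(301, "http://www.domain.com/")` and `check_redirect("www.domain.com", "/")` should return `(301, "http://www.domain.com/index.html")`. A 200 for `/index.html` confirms the chain stops there.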

This entry was posted in IT, Web Design, Web Development.