Does Your Website Have Duplicate Content Issues?

(Image: The Duplicate Content Monster)

Much like some Western electoral systems, Google works on a ‘first past the post’ basis when it comes to content: the original version it finds tends to win out over later copies. This means fresh, unique content is key to success.

‘That’s great!’ I hear you saying, ‘because I write my own product descriptions and all the content on my website is unique’. But is it?

If you run an eCommerce website then it’s a fair bet you have a duplicate content issue, but there are many other ways your website’s structure can be harming your bottom line. Read on for some ways in which your website content isn’t as unique as you thought it was.

eCommerce and Parameters

Most eCommerce software uses parameter strings of some kind (for example ?colour=red&sort=price) to control the content which appears on a page. Whilst URLs can be rewritten to hide parameter strings from visitors, behind the scenes you will often find dozens, if not hundreds, of parameter-driven variants of each page: every combination of sort order, filter or session ID produces a different URL serving essentially the same content. Google can happily step through and index all of these pages, creating a massive duplicate content issue.
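To see how quickly parameter variants add up, here is a minimal Python sketch that normalises a list of crawled URLs by dropping parameters assumed not to change a page’s content. The parameter names (sort, sessionid, utm_*) and the normalise_url function are hypothetical examples. Which parameters are safe to ignore depends entirely on your eCommerce platform.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameters assumed NOT to change what a page shows on this
# example store: sort order, session IDs and campaign-tracking tags.
NON_CONTENT_PARAMS = {"sort", "sessionid"}
NON_CONTENT_PREFIXES = ("utm_",)


def normalise_url(url: str) -> str:
    """Drop non-content parameters and sort the rest, so every variant
    of the same page collapses to a single URL."""
    parts = urlsplit(url)
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in NON_CONTENT_PARAMS
        and not key.lower().startswith(NON_CONTENT_PREFIXES)
    )
    # Rebuild the URL without the fragment, which never reaches the server.
    return urlunsplit((parts.scheme, parts.netloc.lower(), parts.path, urlencode(kept), ""))


if __name__ == "__main__":
    crawled = [
        "https://www.example.com/shoes?colour=red",
        "https://www.example.com/shoes?colour=red&sort=price",
        "https://www.example.com/shoes?sessionid=abc123&colour=red",
        "https://www.example.com/shoes?utm_source=newsletter&colour=red",
    ]
    unique = {normalise_url(u) for u in crawled}
    print(f"{len(crawled)} crawled URLs -> {len(unique)} distinct page(s)")
    for u in sorted(unique):
        print(u)
```

Running it prints "4 crawled URLs -> 1 distinct page(s)". A search engine that cannot tell which parameters matter sees four pages where there is really one.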

Google’s own Search Console is a great tool for showing where parameters are causing duplicate content. Using the ‘URL Parameters’ section of Search Console, website owners can quickly identify parameter-led duplication problems, and the great thing is that this tool lets you tell Google’s search spiders directly to discount the duplicated pages. At the time of writing you’ll have to use the ‘old’ Search Console for this, as the new Search Console has yet to include this function.

If you have a Google Search Console account, go to https://www.google.com/webmasters/tools/crawl-url-parameters and check; if you don’t, sign up for one first. It’s a real eye-opener.

Is Your Site Secure?

Google has been very keen to push adoption of the secure https protocol for all pages on websites. The reason is that pages served over plain http send data unencrypted, so anything passing between a visitor and the website can potentially be intercepted or tampered with by attackers. It isn’t just eCommerce checkout pages which are a security issue: even something as innocuous as a contact form can expose visitors’ details if it isn’t served over https!

The rise in the use of https has, in turn, caused a rise in duplication between the http and https versions of websites.

Google treats the http and https versions of your website as separate entities, so if your website can be reached over both http and https you in effect have 100% duplication across the website. Not good! If you’re on https, make sure page-level 301 redirects are in place from every http URL to its https equivalent.
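As a minimal sketch, assuming a Python site served through WSGI (the article doesn’t say what your site runs on), page-level 301 redirection can be done in a small middleware. The https_redirect name is hypothetical. On most sites you would configure this in the web server or hosting control panel instead. Behind a load balancer or proxy that handles https for you, wsgi.url_scheme may read "http" on every request, so this check would loop. In that setup you would need to rely on the proxy’s forwarded-protocol header instead.

```python
from typing import Callable, Iterable

# Loose WSGI type for brevity: app(environ, start_response) -> body chunks.
WSGIApp = Callable[[dict, Callable], Iterable[bytes]]


def https_redirect(app: WSGIApp) -> WSGIApp:
    """Wrap a WSGI app (hypothetical helper) so any plain-http request
    gets a page-level 301 to the same path and query string on https."""

    def wrapper(environ: dict, start_response: Callable) -> Iterable[bytes]:
        if environ.get("wsgi.url_scheme") == "https":
            return app(environ, start_response)  # already secure: serve the page
        host = environ.get("HTTP_HOST") or environ.get("SERVER_NAME", "")
        host = host.split(":", 1)[0]  # drop any :80 port; assumes a hostname, not IPv6
        path = environ.get("SCRIPT_NAME", "") + environ.get("PATH_INFO", "") or "/"
        query = environ.get("QUERY_STRING", "")
        target = f"https://{host}{path}" + (f"?{query}" if query else "")
        start_response("301 Moved Permanently", [("Location", target)])
        return [b""]

    return wrapper
```

Redirecting each http URL to its own https equivalent, rather than sending everything to the homepage, is what keeps the redirect page-level.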

Are You WWW?

Whilst you’re checking your http to https redirects, make sure that you don’t have both the https:// and https://www versions of your pages indexable in search results. The ‘www’ part of the URL is technically a subdomain of your website.

Google again treats https:// and https://www as two separate entities, and if both are indexable Google will see 100% duplicate content. Make sure that non-www to www redirects are set up and, as a ‘belt and braces’ approach, add rel="canonical" tags to all pages pointing towards the https://www URLs so that Google knows which ones to index.
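Here is a minimal Python sketch of the ‘one preferred address’ idea. It assumes, as the paragraph above does, that https://www is the version you want indexed, and that the site lives on a bare domain (a subdomain such as shop.example.com would wrongly gain a www. prefix). The function names are hypothetical. preferred_url gives the target for a non-www to www redirect, and canonical_link_tag builds the matching rel="canonical" tag for the page’s head.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def preferred_url(url: str) -> str:
    """Map any http/https, www/non-www variant to the single https://www form.
    Assumes a bare domain; any port in the original URL is dropped."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if not host.startswith("www."):
        host = "www." + host
    return urlunsplit(("https", host, parts.path or "/", parts.query, ""))


def canonical_link_tag(url: str) -> str:
    """Build the rel="canonical" tag, HTML-escaping the URL for the attribute."""
    return f'<link rel="canonical" href="{escape(preferred_url(url), quote=True)}">'
```

For example, canonical_link_tag("http://example.com/shoes") returns <link rel="canonical" href="https://www.example.com/shoes">. Using the same function for both the redirect and the tag keeps the two signals from ever disagreeing.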

Getting a New Website?

Do you have a current website but are getting a new one developed? When you get a new website designed, it is always good practice to block all search engine spiders from accessing it until it goes live.

There are a few reasons for this, but from a search engine perspective you really don’t want spiders crawling all over the new website in its development folder if it has the same content as your live website. This is, again, 100% duplication, which can harm both your current website and your development site.

Block search engine spiders from the development site using server-level security such as password protection (for example, a username and password set up via .htaccess), a robots.txt file that disallows all crawling, or a noindex robots meta tag on every page (Google does not support noindex inside robots.txt itself). It’s also a good idea to add rel="canonical" tags to all pages, pointing towards the actual domain the new site will sit on.
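If you go the robots.txt route, you can check what a compliant crawler will make of it with Python’s standard urllib.robotparser module. The sketch below assumes a development site at a hypothetical dev.example.com. is_crawl_allowed is a hypothetical helper name. Keep in mind that robots.txt is only a request: it stops well-behaved crawlers from fetching pages, but it does not guarantee a URL stays out of the index if it is linked from elsewhere. That is why password protection is the more reliable option.

```python
from urllib.robotparser import RobotFileParser

# robots.txt for the development site only: ask every crawler to stay out.
# Never deploy this file to the live site.
DEV_ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_crawl_allowed(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Check a robots.txt body the way a compliant crawler would."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://dev.example.com/products/shoes"
    print(is_crawl_allowed(DEV_ROBOTS_TXT, url))  # False: the whole site is disallowed
```

Running the same check against the live site’s robots.txt before launch is a cheap way to make sure the development block hasn’t been carried over by mistake.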

Is Somebody Stealing Your Content?

Your web content is valuable and spammers know it! You may find that scraper websites copy your website and display it as their own in order to make money, which isn’t great news for your own website as it can land you with duplicate content problems. Again, it’s a good idea to use rel="canonical" tags on all pages of your website, so that if someone does ‘borrow’ your pages wholesale, HTML and all, Google won’t be in any doubt about where that content came from.

You should also think about using absolute URLs (for instance https://www.site.com/page rather than /page) if your website can support this. Absolute URLs embedded in your pages point clearly back to where the original content came from, which helps deter people who want to clone your content.
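If your pages are built from templates that emit relative links, a minimal Python sketch of converting them is shown below. It resolves each href against the page’s own URL with urllib.parse.urljoin, so both /page and page-relative links like red come out absolute. absolutise_links is a hypothetical helper. The regex only handles double-quoted href attributes and is an illustration, not a full HTML parser. Ideally your CMS or template layer would output absolute links at render time.

```python
import re
from urllib.parse import urljoin

# Deliberately simple pattern: double-quoted href attributes only.
HREF_PATTERN = re.compile(r'href="([^"]*)"')


def absolutise_links(html: str, page_url: str) -> str:
    """Rewrite relative href values to absolute URLs, resolved against the
    page's own address. Already-absolute links are left unchanged."""
    return HREF_PATTERN.sub(lambda m: f'href="{urljoin(page_url, m.group(1))}"', html)
```

For example, absolutise_links('<a href="/page">x</a>', "https://www.example.com/shop/shoes") returns '<a href="https://www.example.com/page">x</a>'.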

David Fairhurst

Head of eCommerce

Intelligent Retail

David has been involved with Search Engine Optimisation and web development since 1999 and has spoken at many different retail and SEO conferences, including Spring Fair and SES London.
