Guest Post: Duplicate Content and How to Avoid It

Friday, August 24, 2012

It is very easy to inadvertently create duplicate content on a website. In this context, duplication means the same content appearing under multiple URLs. Duplicate content can decrease traffic to a web page because the search engines will be unsure which URL is the most relevant for a particular query. Below are some common issues and solutions:

Canonicalization

This is the most common form of duplicate content problem and is caused when the same web page can be accessed using various URLs, e.g.

www.mywebsite.com

http://mywebsite.com

www.mywebsite.com/index.html

Although all of these URLs access the same web page, the search engines view them as separate pages and cannot tell which one should be treated as the original.

Solution

This can be resolved by using a server-side redirect to ensure that only one page URL is served; how to carry this out will depend on the specific server setup. Alternatively, the preferred URL can be chosen via Google Webmaster Tools. Specific instructions for the latter option can be found here: https://support.google.com/webmasters/bin/answer.py?hl=en&answer=44231
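As a rough illustration of what the redirect has to do, here is a minimal Python sketch. It assumes the preferred form is http://www.mywebsite.com/ and that the site lives on a single host; the names canonical_url and redirect_target are made up for this example, and in practice the same rule would normally live in the web server configuration.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumptions for illustration: one site, one preferred scheme and host.
PREFERRED_SCHEME = "http"
PREFERRED_HOST = "www.mywebsite.com"
INDEX_FILES = ("index.html", "index.htm", "index.php")


def canonical_url(url):
    """Map any variant of a page URL onto the single preferred URL."""
    # A bare "www.mywebsite.com" has no scheme, so add one before parsing.
    if "://" not in url:
        url = PREFERRED_SCHEME + "://" + url
    parts = urlsplit(url)
    path = parts.path or "/"
    # Treat /index.html and similar files as the directory itself.
    for name in INDEX_FILES:
        if path.endswith("/" + name):
            path = path[: -len(name)]
            break
    # Always answer with the preferred scheme and host; drop any fragment.
    return urlunsplit((PREFERRED_SCHEME, PREFERRED_HOST, path, parts.query, ""))


def redirect_target(requested_url):
    """Return the URL to redirect to, or None if the request is already canonical."""
    target = canonical_url(requested_url)
    return None if target == requested_url else target
```

With these assumptions, all three example URLs above collapse to http://www.mywebsite.com/, and a request for that URL itself needs no redirect. A site that deliberately uses several subdomains would need a less aggressive host rule.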

Printable Pages

Websites with information pages, such as news sites and RSS feeds, often have a 'Print' feature which loads the page content in a format stripped of CSS styling and images, making it more suitable for printing. However, the printable format creates a new URL by adding /print, and this is seen as duplicate content.

Solution

There is a very simple way to overcome this type of duplication: use a rel=canonical tag on the /print page pointing towards the original page.
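The tag itself is a single line in the head of the print page. A small Python sketch of how a template might build it is shown here; it assumes print pages are simply the original URL with /print appended, and canonical_link_tag is a hypothetical helper name.

```python
from html import escape

# Assumption: the print version of a page is the original URL plus "/print".
PRINT_SUFFIX = "/print"


def canonical_link_tag(page_url):
    """Build the rel=canonical tag for a page, pointing print pages at the original."""
    trimmed = page_url.rstrip("/")
    if trimmed.endswith(PRINT_SUFFIX):
        original = trimmed[: -len(PRINT_SUFFIX)]
    else:
        original = page_url
    # Escape the URL so characters such as & are safe inside the attribute.
    return '<link rel="canonical" href="' + escape(original, quote=True) + '">'
```

For example, canonical_link_tag("http://www.mywebsite.com/news/story/print") produces a tag whose href is http://www.mywebsite.com/news/story.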

Blog Pages

The problem with blogs is not the pages themselves; it is the tags that are associated with them or the categories that they are placed under. Different blog articles will often share the same tags, and each category will inevitably hold more than one article, so the tag and category pages end up repeating the same content under different URLs.

Solution

The solution to this particular problem is again fairly simple. Most bloggers use a format which features only a few different categories but numerous tags, in which case adding a "noindex, nofollow" robots meta tag to all of the 'tag' pages will resolve the problem. For those who have a larger number of categories than tags, add the code to the 'category' pages instead.
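As a sketch of how a blog template might decide where to emit that tag, the Python below assumes tag pages live under /tag/ and category pages under /category/; those paths and the function name robots_meta_tag are illustrative only, since every blogging platform lays out its URLs differently.

```python
# Assumption: tag archives are served under /tag/; change to ("/category/",)
# for a blog with more categories than tags.
NOINDEX_PREFIXES = ("/tag/",)


def robots_meta_tag(path, prefixes=NOINDEX_PREFIXES):
    """Return the robots meta tag for archive pages, or an empty string otherwise."""
    if path.startswith(tuple(prefixes)):
        return '<meta name="robots" content="noindex, nofollow">'
    return ""
```

The returned string would go inside the head of the page; ordinary posts get nothing extra and remain indexable.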

Relative Linking

Many webmasters use relative internal linking as the faster, easier option, but when it is used in conjunction with HTTPS pages and subdomains especially, it can result in duplicate pages.

Solution

Some issues can be overcome by simply using a 'self-referential' rel=canonical tag for each page that could be affected in this way. However, there is no guarantee that this will be successful for every search engine.

The only other way is to make all internal links 'absolute', which means using the full URL in the link, including the protocol and domain, rather than a path that is resolved relative to the current page. Each absolute URL is unique and points directly to the file required.
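To see what 'absolute' means in practice, the following Python sketch resolves relative links against the canonical URL of the page they appear on, using urljoin from the standard library. The page URL and the helper name absolutize are assumptions for the example.

```python
from urllib.parse import urljoin


def absolutize(href, canonical_page_url):
    """Resolve a link against the canonical URL of the page that contains it."""
    # Resolving against the canonical URL (not whichever variant was requested)
    # keeps HTTPS or subdomain variants from leaking into internal links.
    return urljoin(canonical_page_url, href)


page = "http://www.mywebsite.com/blog/post.html"  # assumed canonical page URL
links = ["../about.html", "images/chart.png", "/contact"]
absolute_links = [absolutize(link, page) for link in links]
```

Here the three relative links become http://www.mywebsite.com/about.html, http://www.mywebsite.com/blog/images/chart.png and http://www.mywebsite.com/contact. A link that is already absolute is returned unchanged.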

Summary

There are other issues that can cause duplicated content to arise quite innocently, but the ones mentioned above are by far the most commonplace.

Most are easily resolved with a little time and application; others can be avoided altogether by putting a little more thought into the design and coding of any new web pages.
