What is duplicate content, exactly?


The term ‘duplicate content’ refers to content that appears on the Internet in more than one place. A ‘place’ here means a location with a unique website address (URL). In other words, if the same content is displayed at more than one web address, it is duplicate content.

While it is not strictly a penalty, duplicate content can nonetheless affect search engine rankings. When several pieces of, as Google terms it, “appreciably similar” content exist in more than one place on the Internet, it can be hard for search engines to decide which version is most relevant to a given search query.

Put another way, the issue of duplicate content is best understood from the perspective of a search engine trying to provide the best possible experience to its users. If you have a number of pages with similar content on your site, you waste Googlebot’s time. Rather than having crawling bandwidth spent on duplicate content, you want all of your unique landing pages and valuable content indexed and shown in search results.

Duplicate content also dilutes the value of your links. If people link to your page via URLs with tracking or sorting parameters, those links actually point to different URLs. This dilutes link equity and can prevent the correct page from ranking.
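To make the parameter problem concrete, here is a minimal Python sketch that collapses URL variants onto a single address by dropping parameters that do not change the page content. The parameter names in `IGNORED_PARAMS` and the `example.com` URLs are assumptions chosen for illustration; the right list depends on your own site.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameters that alter the URL but not the content shown.
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sort", "sessionid"}


def normalize_url(url: str) -> str:
    """Return the URL without ignored query parameters or a fragment."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path, urlencode(kept), "")
    )


# Three inbound links that look different but show the same page.
links = [
    "https://example.com/shoes?utm_source=newsletter",
    "https://example.com/shoes?sort=price",
    "https://example.com/shoes",
]
unique_targets = {normalize_url(link) for link in links}
print(unique_targets)
```

All three links reduce to one address, which is the page that should receive the combined link equity. A search engine does not necessarily apply the same rules, so this is only a way to audit your own link data.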

The types of duplicate content

Two types of duplicate content are found on the Internet:

Internal duplicate content

Internal duplicate content occurs when one domain creates duplicate content through multiple internal URLs on the same website.

Online shops often have to manage duplicate content. A very common case is that a product detail page can be reached both with and without the related category or product page in its web address.

Both versions of the page are then commonly indexed by search engines, and both URLs attract external links. Another cause of internal duplicate content is an inconsistent internal linking strategy.
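One way to spot internal duplicates of this kind is to compare the text served at each URL. The following Python sketch groups URLs whose text is identical once case and whitespace are normalized. The `shop.example` URLs and page texts are hypothetical stand-ins for the result of a crawl, and the approach only catches exact copies, not content that is merely similar.

```python
import hashlib
from collections import defaultdict


def fingerprint(text: str) -> str:
    """Hash the page text after lowercasing and collapsing whitespace."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_duplicates(pages: dict[str, str]) -> list[list[str]]:
    """Return groups of URLs that serve the same normalized text."""
    groups = defaultdict(list)
    for url, text in pages.items():
        groups[fingerprint(text)].append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


# Hypothetical crawl result: URL -> extracted page text.
pages = {
    "https://shop.example/shoes/red-sneaker": "Red sneaker. Canvas upper, rubber sole.",
    "https://shop.example/red-sneaker": "Red sneaker.  Canvas upper, rubber sole.",
    "https://shop.example/about": "About our shop.",
}
print(find_duplicates(pages))
```

Here the product page is reported twice, once with the category in its path and once without, while the unique page is left out of the result.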

External duplicate content

Another name for ‘external duplicate content’ is ‘cross-domain duplicates’. It occurs when two or more different domains have the same page copy indexed by the search engines.

Many websites can be accessed through a number of domain names. There is nothing wrong with this practice, provided that all other domain versions are redirected to the corresponding main domain with a 301 redirect.

If this does not happen, Google sees different domains that all have the same content. This makes it difficult for Googlebot to determine the relevance of each individual page, which can lead to ranking challenges for the website.

Examples of duplicate content

Quoting from another source is fine. If the source is another web page, link back to it so that the website owner gets additional value from your quoting them. The more backlinks a website has, the better it is for its SEO.

Replicating chunks of text verbatim, without adding any other value, won’t benefit your users. Steer clear of this practice, as it can harm your SEO and turn readers away from your site.

Here are some examples of duplicates which are seen often:

  • Specifications in product descriptions.
  • Products appearing in more than one category on your e-commerce website.
  • Pages of content which appear in two places on your website (for businesses or consumers, for example).
  • Using another website’s content openly (press releases or feeds for local events, for example).

Why is duplicate content bad for SEO?

There are several reasons why duplicate content is harmful to the SEO health of your website.

Duplicate content can present three main challenges for search engines:

  • Search engines – such as Google and Bing – don’t know which version(s) to include or exclude from their indexes.
  • They don’t know if they need to direct the link metrics (trust, authority, anchor text, link equity, etc.) to one page or, alternatively, keep it separated between multiple versions.
  • Search engines don’t know which version(s) they need to rank for search results.

When duplicate content is present, website owners can suffer ranking and traffic losses. These losses often stem from one of two main problems:

  • To offer users the best search experience possible, search engines rarely show multiple versions of the same content. They are forced to select the version most likely to deliver the best result, which reduces the visibility of each of the duplicates.
  • Link equity can be weakened further because other sites also have to choose between the duplicates. Instead of all inbound links pointing to one piece of content, they point to multiple pieces, which spreads the link equity among the duplicates. Because inbound links are a ranking factor, this can affect the search visibility of a piece of content.

Because people today use Google to find what they are looking for, it’s vital that your website can be found on the Internet. This means doing everything in your power to make sure that nothing on your website harms its visibility.

Solving Duplicate Content Issues

Fortunately, there are several ways to fix duplicate content issues. These include:

301 Redirect

A simple way to prevent duplicate content from being indexed is a 301 redirect. Users and search engines are redirected from the duplicate content to the original. As a result, all of the link juice is sent to the original page.

A 301 redirect is applied on Apache servers by adding rules to the server’s .htaccess file. Bear in mind that visitors can no longer reach the duplicate page once its URL is redirected, so if you would prefer to keep duplicate pages accessible, use the next method.
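As a rough illustration, the Python sketch below generates one `Redirect 301` line per duplicate path, using the `Redirect` directive from Apache’s mod_alias module. The helper name `htaccess_redirects`, the paths and the `shop.example` domain are all hypothetical, and the sketch assumes mod_alias is enabled on the server.

```python
def htaccess_redirects(mapping: dict[str, str]) -> str:
    """Build one mod_alias `Redirect 301` line per duplicate path."""
    lines = []
    for duplicate_path, original_url in sorted(mapping.items()):
        if not duplicate_path.startswith("/"):
            raise ValueError(f"path must start with '/': {duplicate_path!r}")
        lines.append(f"Redirect 301 {duplicate_path} {original_url}")
    return "\n".join(lines)


# Hypothetical duplicate paths mapped to the original pages.
redirects = {
    "/red-sneaker": "https://shop.example/shoes/red-sneaker",
    "/old-about": "https://shop.example/about",
}
rules = htaccess_redirects(redirects)
print(rules)
```

The printed lines can be pasted into the .htaccess file. Note that `Redirect` matches any request path beginning with the given path, so it is worth testing the rules on a staging copy of the site first.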


Rel="canonical" Tag

The rel="canonical" tag is another way to tell search engines about duplicate content. This code is inserted in the <head> of a web page. If Page B is a duplicate of Page A and we want to inform Google of this, we put the following code in the mark-up of Page B:

<link href="http://website.com/Page-A" rel="canonical" />

This code tells search engines that the current page is a copy of the page at Page A’s URL. Once it is implemented, search engines generally consolidate link juice on the original page, which improves the ranking power of that page. Unlike with the 301 redirect, duplicate pages in this case will still be accessible.
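To check that a duplicate page really declares its original, you can read the canonical link back out of the mark-up. This is a minimal sketch using Python’s standard `html.parser` module; the class and function names are made up for this example, and the page source is a shortened stand-in for Page B.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> tag."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rels and self.canonical is None:
            self.canonical = attr.get("href")


def canonical_of(html: str):
    """Return the canonical URL declared in the page, or None."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Shortened stand-in for the mark-up of Page B.
page_b = (
    '<html><head><link href="http://website.com/Page-A" rel="canonical" />'
    "</head><body>Copy</body></html>"
)
print(canonical_of(page_b))
```

A result of `None` for a page you know to be a duplicate means the tag is missing or malformed.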

Meta Robots Tag

A simple solution to the problem of duplicate content is the meta robots tag. By adding a meta robots tag with the value “noindex” to the <head> of a duplicate page, you can prevent that page from being indexed.
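The tag itself takes the form `<meta name="robots" content="noindex">`. As a sketch of how you might verify it across pages, the Python snippet below reports whether a page carries a “noindex” directive. The class and function names and the two page sources are hypothetical examples.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def is_noindex(html: str) -> bool:
    """Return True if the page asks search engines not to index it."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives


# Hypothetical page sources: a duplicate marked noindex and an original.
duplicate_page = (
    '<html><head><meta name="robots" content="noindex, follow"></head>'
    "<body>Copy</body></html>"
)
original_page = "<html><head><title>Original</title></head><body>Copy</body></html>"
print(is_noindex(duplicate_page), is_noindex(original_page))
```

Running the check on the original page as well helps to catch a “noindex” that was added to the wrong version by mistake.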

Duplicate content keeps popping up on every website. This doesn’t have to be terrible news. Fix what you can, and don’t attempt to turn duplicate content, copied content and thin content into a viable SEO strategy.

Get in touch with the Digital School of Marketing

Want to learn more about SEO? Then consider doing our SEO and Web Analytics Course.
