Boosting SEO: How Do I Find Duplicate Content—and Fix It?

With new websites popping up every day all over the globe, competition for high rankings in search engine results pages (SERPs) has increased dramatically. Over the last decade alone, the landscape has shifted so that individuals and small business owners now compete alongside multi-billion-dollar players such as Facebook, Twitter, YouTube, Amazon, and Alibaba.

With everyone fighting for the same Google featured snippets and top rankings, how can you compete? To boost your SEO (search engine optimization) and make your website more discoverable, use Google Search Console to help identify and fix duplicate content. Doing so is a key factor in building your site's success and increasing your revenue.

What Is Duplicate Content?

Duplicate content is the same or highly similar content that appears on more than one URL. It can be an exact copy, or a close version, of content that's already been published online. Duplicate content may mirror content on your own website, or it may be a copy of content found on other sites owned by different entities.

Some websites use tools to paraphrase previously published content and then paste the results onto their own pages. This may seem like an easy way to create new content, but duplicate content adds no value to your website, and Google has strict policies on scraped and plagiarized content. Even pages with little or no body content beyond their supporting meta information can be treated as duplicates and filtered out of search results.

Can Publishing Duplicate Content Hurt My Business?

When web crawlers detect duplicate content, search engines can get confused. They may be unable to properly index your site's version of the content, and they don't know how to weigh your site or which version of the content to rank higher.

When web crawlers find duplicate content, search engines often treat all the copies alike and rank them lower in SERPs than content that comes from a single source. This lessens your site's impact and weakens your SEO, making you even harder for potential clients to find online.

How Do I Avoid Duplicate Content?

Being careful and conscientious is the simplest way to avoid publishing duplicate content. Common sense dictates that you simply don't copy previously published content. But this is easier said than done.

Sometimes, without even realizing it, you may follow the content you researched a little too closely. To help avoid this, you can run your newly written content through one of the many free or paid online plagiarism detection tools. These tools flag passages that closely match previously published text, so you can fix the issue before you publish.
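Plagiarism checkers differ in how they work, but a common building block is comparing overlapping word sequences ("shingles") between two texts. The sketch below is a simplified, hypothetical illustration of that idea, not the method of any particular tool; the five-word shingle size and the 0.5 threshold are arbitrary assumptions.

```python
import re


def shingles(text: str, k: int = 5) -> set:
    """Return the set of overlapping k-word sequences ("shingles") in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        # Very short texts become a single shingle (or none at all).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard_similarity(a: str, b: str, k: int = 5) -> float:
    """Share of shingles the two texts have in common: 0.0 = none, 1.0 = identical."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


draft = ("Duplicate content is the same or highly similar content "
         "that appears on more than one URL.")
source = ("Duplicate content is defined as the same or highly similar content "
          "that appears on more than one URL.")

score = jaccard_similarity(draft, source)
if score > 0.5:  # hypothetical threshold; tune it for your own content
    print(f"Possible overlap with a published source: {score:.2f}")
```

Real tools compare your draft against a large index of published pages; in this sketch you have to supply the source text yourself.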

How Do I Find and Fix Duplicate Content?

But what about content that’s already on your website? There are a variety of ways you can find duplicate content across your published pages. 

Let’s take a look at three of the most popular and effective ways to identify duplicate content across all your website’s pages:

  1. Use a Tool to Find Duplicate Content
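  • Google Search Console: The Page indexing report (formerly the Index Coverage report) is a great way to see which URLs on your website Google has indexed and which it has excluded as duplicates, under statuses such as "Duplicate without user-selected canonical" and "Duplicate, Google chose different canonical than user".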
[Image: The Google Search Console Page indexing report, used to find duplicate content]

To access this report, open your property in Search Console and select Pages under Indexing in the left-hand menu.

  • Siteliner: An easy, no-cost way to find duplicate content on your own website is Siteliner, a third-party tool designed specifically to identify duplicate content. Enter the URL of your choice, and you'll get a near-instant report. Both on-screen and downloadable reports include the total percentage of internal duplicate content on your site, and you can check the details of every duplicate match found. (Note: The free version is limited to 250 pages per month.)
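Siteliner doesn't spell out exactly how it calculates its percentage, so the following is only a rough, hypothetical approximation of the idea: measure how much of your site's text sits in paragraphs that appear on more than one page. It assumes you have already fetched your pages and split them into paragraphs; the URLs are placeholders.

```python
import re
from collections import defaultdict


def normalize(paragraph: str) -> str:
    """Lowercase and collapse whitespace so formatting differences don't hide duplicates."""
    return re.sub(r"\s+", " ", paragraph.strip().lower())


def internal_duplicate_report(pages: dict) -> tuple:
    """Return (paragraphs found on more than one page, % of words in those paragraphs).

    `pages` maps each URL to the list of paragraphs on that page.
    """
    urls_by_paragraph = defaultdict(set)
    for url, paragraphs in pages.items():
        for p in paragraphs:
            key = normalize(p)
            if key:
                urls_by_paragraph[key].add(url)

    total_words = duplicate_words = 0
    for paragraphs in pages.values():
        for p in paragraphs:
            key = normalize(p)
            n = len(key.split())
            total_words += n
            # Count the words only if this paragraph also appears on another page.
            if len(urls_by_paragraph.get(key, ())) > 1:
                duplicate_words += n

    duplicates = {p: sorted(urls) for p, urls in urls_by_paragraph.items() if len(urls) > 1}
    percent = 100 * duplicate_words / total_words if total_words else 0.0
    return duplicates, percent


pages = {
    "https://example.com/red-widget": ["Free shipping on all orders.", "Our red widget is durable."],
    "https://example.com/blue-widget": ["Free shipping on  all orders.", "Our blue widget is lightweight."],
}
duplicates, percent = internal_duplicate_report(pages)
print(f"{percent:.0f}% of words appear in duplicated paragraphs")
for paragraph, urls in duplicates.items():
    print(paragraph, "->", urls)
```

Shared boilerplate such as footers or shipping notices will show up in a report like this; that kind of repetition is often expected, so focus on duplicated body content.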
  2. Search for Duplicate Content on Other Websites

There are plenty of online tools that can help you find out whether anyone is duplicating your website's content. We like Copyscape, a free duplicate content checker that's very easy to use. Just enter your URL in the search field, and Copyscape will search the web for similar content.

You can then select results from the list to see which parts of your content may have been copied. (Note: The free version of Copyscape has limited capabilities, but its paid options, Copyscape Premium and the Copysentry monitoring service, unlock more searches and features, including automatic monitoring. That can be a cost-effective way for owners of multiple websites to keep their content protected.)

  3. Use Google to Check for Duplicate Content

Google itself is completely free, and checking for duplicate content is as simple as copying a few words from the start of a sentence and pasting them, wrapped in quotation marks, into a Google search.

If you make a few searches and don’t come across any other websites with an exact match, you likely have no duplication. However, if you see a bunch of web pages popping up that don’t look trustworthy, with snippets of your content, that could be a red flag.
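If you check many pages this way, a small script can build the quoted-phrase searches for you. This sketch only constructs standard Google search URLs (https://www.google.com/search?q=...) for you to open in a browser; it doesn't fetch or scrape results, and the eight-word phrase length and three-query limit are arbitrary choices.

```python
import re
from urllib.parse import quote_plus


def exact_phrase_search_url(sentence: str, words: int = 8) -> str:
    """Quote the first `words` words of a sentence and build a Google search URL for them."""
    phrase = " ".join(sentence.split()[:words])
    return "https://www.google.com/search?q=" + quote_plus(f'"{phrase}"')


def phrase_queries(text: str, words: int = 8, max_queries: int = 3) -> list:
    """Build one exact-phrase search URL for each of the first few sentences of text."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [exact_phrase_search_url(s, words) for s in sentences[:max_queries]]


article = ("Duplicate content is the same or highly similar content that appears on more than one URL. "
           "It can be an exact copy or a close version of content published elsewhere.")
for url in phrase_queries(article):
    print(url)
```

Sampling phrases from several sentences, rather than just the opening line, makes it more likely you'll catch a page that copied only part of your article.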

A Google hint: The original source of the content usually, though not always, appears as the first search result. So, if you see another website ranked above yours for your own text, check which version was published first; if the content on your site isn't the original, consider taking it down and rewriting it. If you do find content issues, focus your efforts on content quality; a guide to optimizing your existing content is a good place to start.

How Do I Fix Duplicate Content Issues?

Sometimes Google may inappropriately flag your content as duplicate, or may not be indexing your pages correctly. In such cases, you can take a few proactive steps to protect your website’s image and content.

Here are a few things you can do to resolve replication issues on your website: 

Canonical links

If you're sharing your own content with other websites, make sure the site republishing it adds a canonical tag (rel="canonical") that points back to your original URL. This tells search engines that your URL is the master copy of the page. Search engines treat the tag as a strong hint rather than a strict rule, but it's a helpful way to get your content recognized as the original source and the other versions recognized as duplicates.
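To confirm that a partner site has actually added the tag, you can look for it in the page's HTML. The sketch below uses Python's standard-library HTML parser on markup you have already downloaded; the URLs are placeholders, and ignoring a trailing slash when comparing URLs is a simplifying assumption.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collect the href of every <link rel="canonical"> tag on a page."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = dict(attrs)
        rel_values = (a.get("rel") or "").lower().split()
        if "canonical" in rel_values and a.get("href"):
            self.canonicals.append(a["href"])


def canonical_urls(html: str) -> list:
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    return finder.canonicals


def points_to_original(partner_html: str, original_url: str) -> bool:
    """True if the partner page has exactly one canonical tag and it names original_url."""
    found = canonical_urls(partner_html)
    # Conflicting canonical tags may be ignored by search engines, so require exactly one.
    return len(found) == 1 and found[0].rstrip("/") == original_url.rstrip("/")


partner_html = """<html><head>
<link rel="canonical" href="https://example.com/original-post/" />
</head><body>Republished article...</body></html>"""
print(points_to_original(partner_html, "https://example.com/original-post"))
```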

301 Redirects

301 redirects are your friend when it comes to dealing with duplicate content and uncrawled pages.

On Apache servers, you can add 301s to your .htaccess file to permanently redirect visitors, Googlebot, and other crawlers from a duplicate URL to the preferred one. This is immensely helpful for removing duplication issues, and it keeps alternate versions of the content from being visible, too.

Here’s a helpful YouTube video tutorial to easily set up 301 redirects for SEO purposes.
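If you have many duplicate URLs to consolidate, a short script can generate the redirect rules for you. This sketch assumes an Apache server with mod_alias enabled, which provides the `Redirect` directive; the paths and domain are hypothetical.

```python
def htaccess_redirects(mapping: dict) -> str:
    """Turn {old_path: new_url} into Apache mod_alias 'Redirect 301' lines for .htaccess."""
    lines = []
    for old_path, new_url in mapping.items():
        # mod_alias matches on the URL path, which must start with a slash.
        if not old_path.startswith("/"):
            raise ValueError(f"old path must start with '/': {old_path!r}")
        lines.append(f"Redirect 301 {old_path} {new_url}")
    return "\n".join(lines) + "\n"


rules = htaccess_redirects({
    "/old-page.html": "https://www.example.com/new-page/",
    "/products/widget-copy/": "https://www.example.com/products/widget/",
})
print(rules)
```

Keep in mind that `Redirect` matches by path prefix, so a rule for `/old-page` also redirects `/old-page/anything`; Apache's `RedirectMatch` directive takes a regular expression when you need an exact match. Test each rule after uploading it.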

Consistent Interlinking

If you want to attract significant traffic, there are no shortcuts. But if you're dedicated to internal linking, and consistently link to the same preferred URL for each page, your hard work will pay off in stronger SEO.

For help forming good SEO habits, TopContent offers a very useful internal linking guide and strategy.
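Consistent internal linking also means always linking to the same preferred version of each URL. The sketch below flags internal links that use a different scheme or hostname, such as `http://` or the non-www host. The preferred scheme and host are hypothetical settings you would replace with your own, and the check only covers the HTML you supply.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

# Hypothetical site settings: replace with your own preferred scheme and host.
PREFERRED_SCHEME = "https"
PREFERRED_HOST = "www.example.com"


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def inconsistent_internal_links(page_url: str, html: str) -> list:
    """Return internal links that don't use the preferred scheme and host."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    site = PREFERRED_HOST.removeprefix("www.")
    flagged = []
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links like "/pricing/"
        parts = urlsplit(absolute)
        host = parts.hostname or ""
        if host.removeprefix("www.") != site:
            continue  # external link, mailto:, etc.
        if parts.scheme != PREFERRED_SCHEME or host != PREFERRED_HOST:
            flagged.append(absolute)
    return flagged


html = ('<a href="/pricing/">Pricing</a> '
        '<a href="http://example.com/blog/">Blog</a> '
        '<a href="https://other.org/">Partner</a>')
print(inconsistent_internal_links("https://www.example.com/", html))
```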

Google’s URL parameters tool

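Google's URL Parameters tool in Search Console used to let you tell Google how parameters (such as ?sort= or ?sessionid=) affected the content on your site, so it could tell which parameters were useful and which could be ignored. Google retired the tool in 2022 and now works out parameter handling on its own, so the best approach today is to keep parameterized URLs to a minimum, link internally to the clean version of each URL, and point parameterized variants at it with a canonical tag. This gives you a better chance of improving your SEO game.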
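Whatever tools you use, it helps to see which of your own URLs differ only by parameters. This sketch groups URLs that collapse to the same address once parameters assumed not to change the page are removed. The parameter list is a hypothetical example, so adjust it to how your own site uses parameters.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical: parameters assumed NOT to change page content on this site.
IGNORED_PARAMS = {"sessionid", "ref"}
IGNORED_PREFIXES = ("utm_",)


def normalize_url(url: str) -> str:
    """Drop ignored parameters, sort the rest, and lowercase the scheme and host."""
    parts = urlsplit(url)
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS and not key.lower().startswith(IGNORED_PREFIXES)
    )
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", urlencode(kept), ""))


def group_parameter_duplicates(urls: list) -> dict:
    """Map each normalized URL to the original URLs that collapse onto it (groups of 2+ only)."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalize_url(url)].append(url)
    return {clean: originals for clean, originals in groups.items() if len(originals) > 1}


crawled = [
    "https://example.com/shoes?color=red",
    "https://Example.com/shoes?utm_source=newsletter&color=red",
    "https://example.com/shoes?color=red&sessionid=42",
    "https://example.com/shoes?color=blue",
]
print(group_parameter_duplicates(crawled))
```

Each normalized URL in the output is a natural candidate for the canonical target of the variants grouped under it.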

Country-Specific Domains

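Another nifty trick for resolving issues, especially if you publish similar content for different countries, is to use country-specific URLs to help Google serve the right version of your pages. These can be country-code top-level domains (such as example.ca) or country subdirectories on a single domain.

For example, https://www.apple.com/ca/ is a Canada-specific subdirectory, so Google can identify it as containing Canada-related content and Apple products. A clear, consistent country-specific URL structure can noticeably improve your SEO results in each market.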

Syndicate Content

You don't have to let Google decide which version of duplicated content is indexed and which is ignored. Instead, you can add a "noindex" robots meta tag to pages you don't want indexed. This helps Google identify the correct version of the content to index.
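Before relying on noindex, it's worth checking that the directive is actually present. The robots meta tag (`<meta name="robots" content="noindex">`) and the `X-Robots-Tag` HTTP header are the two standard places Google reads it from. This simplified sketch checks both, but it ignores crawler-specific header values such as `googlebot: noindex` and assumes you pass response headers as a plain dict.

```python
from html.parser import HTMLParser
from typing import Optional

NOINDEX_VALUES = {"noindex", "none"}  # "none" is shorthand for noindex, nofollow


class RobotsMetaFinder(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        a = dict(attrs)
        if (a.get("name") or "").lower() in ("robots", "googlebot"):
            for d in (a.get("content") or "").split(","):
                self.directives.add(d.strip().lower())


def is_noindexed(html: str, headers: Optional[dict] = None) -> bool:
    """True if the page's meta tags or X-Robots-Tag header contain noindex (or none)."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    finder.close()
    header = (headers or {}).get("X-Robots-Tag", "")
    header_directives = {d.strip().lower() for d in header.split(",")}
    return bool((finder.directives | header_directives) & NOINDEX_VALUES)


print(is_noindexed('<head><meta name="robots" content="noindex, follow"></head>'))
```

Google has to be able to crawl a page to see its noindex directive, so don't also block that page in robots.txt.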

Protect Content with DMCA

If you own the copyright to the original content published on your site and want to avoid duplication issues, you can place a protection badge from DMCA.com, a private service named after the Digital Millennium Copyright Act (DMCA), on your website to keep content scrapers away.

The badge signals that your content is actively protected. The service says it will remove duplicated content at no additional charge if your badge-protected text is ever stolen.

This can help deter thieves from plagiarizing your content. DMCA.com also offers tools to help you locate unauthorized copies of your content on other sites, and its takedown service covers pictures and videos as well.

Minimize Repetition

It's good practice to give every page its own title tag and meta description for better SEO results. As mentioned earlier, handling URL parameters carefully also helps control content repetition. And rather than relying on a content supplier whose text may also appear on other sites, create unique, SEO-optimized content of your own.
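A quick way to spot repeated meta information is to compare title tags and meta descriptions across your pages. This sketch assumes you already have the HTML for each page in memory; the paths and markup are placeholders.

```python
from collections import defaultdict
from html.parser import HTMLParser


class HeadInfo(HTMLParser):
    """Pull the <title> text and the meta description out of a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            a = dict(attrs)
            if (a.get("name") or "").lower() == "description":
                self.description = (a.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def _repeated(groups: dict) -> dict:
    """Keep only values shared by more than one page."""
    return {value: urls for value, urls in groups.items() if len(urls) > 1}


def duplicate_head_tags(pages: dict) -> tuple:
    """Return ({title: urls}, {description: urls}) for values used on more than one page."""
    by_title, by_description = defaultdict(list), defaultdict(list)
    for url, html in pages.items():
        info = HeadInfo()
        info.feed(html)
        info.close()
        title = " ".join(info.title.split())
        if title:
            by_title[title].append(url)
        if info.description:
            by_description[info.description].append(url)
    return _repeated(by_title), _repeated(by_description)


pages = {
    "/red-widget": "<title>Widgets | Example Shop</title><meta name='description' content='Buy durable widgets.'>",
    "/blue-widget": "<title>Widgets | Example Shop</title><meta name='description' content='Buy lightweight widgets.'>",
}
titles, descriptions = duplicate_head_tags(pages)
print("Repeated titles:", titles)
print("Repeated descriptions:", descriptions)
```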

Manage Duplications and Alt Pages

Make sure to avoid duplicate content at all costs. However, if you have several similar pages, it's a good idea to consolidate them. For alternate versions of a page, such as a separate mobile version or a country- or language-specific page, use rel="alternate" link tags to connect them to the main page. For language and regional versions, add the hreflang attribute to those tags so Google can show each audience the correct version.
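Hreflang annotations are usually written as `<link rel="alternate" hreflang="..." href="...">` tags in each page's `<head>`, and every version should list all of its alternates, including itself. This sketch generates that shared set from a mapping of language-region codes to URLs; the codes and URLs are hypothetical, and the optional `x-default` entry marks the fallback page for everyone else.

```python
from html import escape
from typing import Optional


def hreflang_tags(alternates: dict, x_default: Optional[str] = None) -> str:
    """Build the full set of rel="alternate" hreflang tags shared by every version of a page."""
    tags = [
        f'<link rel="alternate" hreflang="{escape(code)}" href="{escape(url)}">'
        for code, url in sorted(alternates.items())
    ]
    if x_default:
        # Fallback for users whose language/region matches none of the listed versions.
        tags.append(f'<link rel="alternate" hreflang="x-default" href="{escape(x_default)}">')
    return "\n".join(tags)


print(hreflang_tags(
    {"en-us": "https://example.com/us/", "en-ca": "https://example.com/ca/", "fr-ca": "https://example.com/ca/fr/"},
    x_default="https://example.com/",
))
```

Paste the same block into the `<head>` of every listed version; if one version omits the tags pointing back to the others, search engines may ignore the annotations.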

Don’t Block Web Crawler Access

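Google recommends that you avoid blocking web crawlers' access to duplicate content pages and instead let them be crawled and identified as duplicates, then consolidated with tools such as canonical tags or redirects. If you're worried about crawl budget, note that Google retired the crawl rate setting in Search Console in early 2024; Google now adjusts its crawl rate automatically, and if crawling is overloading your server, its documentation suggests temporarily returning 500, 503, or 429 status codes.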
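To check that you aren't accidentally blocking crawlers from duplicate pages, you can test URLs against your robots.txt rules with Python's built-in `urllib.robotparser`. Treat the result as an approximation: Python's parser doesn't implement Google's wildcard matching (`*` and `$` inside paths) or its most-specific-rule precedence, so confirm anything surprising in Search Console. The rules and URLs below are placeholders.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt: str, urls: list, user_agent: str = "Googlebot") -> list:
    """Return the URLs that the given robots.txt rules would block for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse rules from a string instead of fetching them
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


robots_txt = """User-agent: *
Disallow: /print/
"""
pages = ["https://example.com/print/widget", "https://example.com/widget"]
print(blocked_urls(robots_txt, pages))
```

If a duplicate page shows up in this list, consider unblocking it and consolidating it with a canonical tag or a 301 redirect instead.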

Why Do Websites with Duplicate Content Fail?

Good SEO is crucial for any web-based business’s survival because better online visibility means more traffic, which can translate to better revenue.

Duplicate content is undesirable for three main reasons:

  1. Search engines have difficulty effectively indexing the same content when it appears on different pages or websites. Because each duplicate competes against the original, every version of the content performs worse on the web.
  2. Search engines rank content using authority, relevancy, and trust as metrics, so duplicate content can weaken your website's trust and authority. Ultimately, this can lead to fewer visitors coming to your site.
  3. Along with SEO issues, duplicate content on your site can send conflicting signals to search engines and cost you business. To avoid this, make sure each page has a unique URL and no duplicate content, since Google ranks pages with duplicate content lower than other pages.

Final Thoughts

Maintaining a high-quality standard is an important part of helping your website succeed. Duplicate content significantly degrades your visitors' experience and may even have serious repercussions.

Instead, aim to build a positive relationship with your site visitors by providing high-quality, original content. Establishing a good relationship with readers ultimately boosts your traffic and your revenue.