How to fix duplicate content in Magento 2
Duplicate content is a constant source of anxiety for many site owners.
Read almost anything about it, and you might come to believe that your website is riddled with duplicate content problems and that a Google penalty could arrive any day.
Today’s post will give you a thorough understanding of duplicate content: why it happens, how it affects your Magento 2 website, and how to fix it effectively.
Are you ready to begin? Let’s dive right in.
Table of Contents
- What is Duplicate Content?
- Why Does Duplicate Content Happen?
- Does Duplicate Content Harm Your SEO?
- How to Identify Duplicate Content Issues?
- How to Fix Duplicate Content in Magento 2?
What is Duplicate Content?
Duplicate content is content that appears at more than one URL, on your own site or elsewhere on the Internet. When several versions are identical, it is difficult for a search engine to decide which version best matches the user’s search query. As a result, search engines rarely display multiple duplicate pages; instead, they choose the version that is most likely the original or the most relevant one.
Why Does Duplicate Content Happen?
Duplicate product descriptions
E-commerce sites often use the manufacturer’s product description to detail the items they sell. The issue is that those items are often sold by multiple e-commerce sites. The same description then shows up on various sites and generates duplicate content.
Sorting and multi-page lists
Most e-commerce sites have filter and category options that generate unique URLs. Product pages can show up in various categories and be ordered differently depending on how the list is filtered. For instance, if you sort the same 45 items alphabetically and then by price, you end up with two pages that have the same content but different URLs.
Session IDs
You usually want to keep track of your visitors and let them, for example, store products they want to purchase in a shopping cart. To do that, you need to give them a “session”. A session is a short history of what the visitor did on your website and can include things like the items in the shopping cart.
To keep that session as a visitor clicks from one page to another, the unique identifier for that session (called the Session ID) needs to be stored somewhere. The most common solution is to store it in a cookie.
When cookies are not available, some systems fall back to putting the Session ID in the URL. Every internal link on the site then gets that Session ID appended to its URL, and because the Session ID is unique to that session, each link becomes a new URL and thus creates duplicate content.
URL parameters for tracking and filtering
Another cause of duplicate content is the use of URL parameters that do not change the content of a page, for example, in tracking links. For a search engine, a URL such as https://www.mageplaza.com/keyword-x/?source=rss is not the same URL as that address without the parameter. The parameterized version might tell you what source people came from, but it might also make it harder to rank high.
This doesn’t just go for tracking parameters. It goes for every parameter you add to a URL that doesn’t change the main piece of content, whether that parameter is for “changing the sorting on a set of items” or for “displaying another sidebar”: all of them lead to duplicate content.
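One way to see which of your URLs collapse into the same page is to strip the parameters that don’t change the content before comparing them. The sketch below is only an illustration: the parameter names in NON_CONTENT_PARAMS are assumptions, not a list taken from Magento 2, so adjust them to whatever your store actually appends.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: parameters that never change the main content of a page.
# These names are illustrative; replace them with the ones your site uses.
NON_CONTENT_PARAMS = {"source", "sid", "utm_source", "utm_medium", "utm_campaign"}


def strip_non_content_params(url: str) -> str:
    """Return the URL without parameters that do not change the page content."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in NON_CONTENT_PARAMS
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


if __name__ == "__main__":
    print(strip_non_content_params("https://www.mageplaza.com/keyword-x/?source=rss"))
```

Under these assumptions, the tracking link and the plain address normalize to one URL, which is how you would like a search engine to treat them.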
WWW vs non-WWW pages
If your Magento 2 website is reachable at both “www.website.com” and “website.com” (with and without the “www” prefix), and the same content lives on both versions, you’ve effectively created duplicates of every page. The same goes for websites that are served at both http:// and https://. If both versions of a page are live and visible to search engines, you might run into a duplicate content problem.
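The usual remedy is to pick one scheme and one host and send every other variant there. As a rough sketch of that decision, the function below returns the address a non-preferred variant should redirect to. The choice of https and of the “www” host is an assumption made for the example, not a recommendation from this article.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumptions for illustration: https and the "www" host are the preferred version.
PREFERRED_SCHEME = "https"
PREFERRED_HOST = "www.website.com"
KNOWN_HOSTS = {"website.com", "www.website.com"}


def preferred_url(url: str):
    """Return the redirect target for a non-preferred variant, or None if no redirect is needed."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host not in KNOWN_HOSTS:
        return None  # not one of our hosts, leave it alone
    if parts.scheme == PREFERRED_SCHEME and host == PREFERRED_HOST:
        return None  # already the preferred version
    return urlunsplit(
        (PREFERRED_SCHEME, PREFERRED_HOST, parts.path, parts.query, parts.fragment)
    )


if __name__ == "__main__":
    print(preferred_url("http://website.com/shoes?page=2"))
```

In practice this rule normally lives in the web server or CDN configuration rather than in application code; the function only shows the logic.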
Scraped or copied content
Content includes blog posts, product information pages, and editorial content. Scrapers republishing your blog posts on their own websites are a familiar source of duplicate content, but e-commerce websites face another common one: product details. If multiple sites sell the same products and all of them use the manufacturer’s descriptions, identical content appears in many locations across the web.
Does Duplicate Content Harm Your SEO?
If duplicate content doesn’t result in a Google penalty, can you just leave it on your website? No. Duplicate content can still hurt your page rankings and organic traffic.
Firstly, search engines avoid returning duplicate entries on their results pages. A results page with 10 near-identical results from different pages is less useful than a page with 10 different, original results.
Search engines therefore need to determine which version of the duplicate content is most relevant. To do this, they assess domain authority and which page appears to be the original, most authoritative source of the content. The other versions are filtered out of the results pages:
If you publish content that also appears on a more authoritative site, your URL will be filtered out of the results pages to make room for that site. If the same content is repeated across several pages of your own site, most of those pages will be filtered out of the search engine results pages as well, and overall website visibility will suffer.
Next, duplicate pages can dilute link equity and page credibility. If your website keeps two different URLs with the same content, websites linking to that content have to pick one of the two versions. Inbound links end up split across both URLs instead of concentrated on one, which weakens the ranking signals for the pages in question.
How to Identify Duplicate Content Issues?
Although duplicate content is sometimes visible to the naked eye, it is often hidden in the code of a site. That’s why you should rely on software to check for duplicate content.
On-site duplicate content
Many SEO audit tools can identify different URLs with identical content and suggest how to fix them. These tools can also flag general duplicate content issues.
Typically, site audit tools detect duplicate content through matching meta descriptions and titles, and produce an exportable list of URLs that makes the issue easier to identify and fix. Fixing these technical problems also improves your meta tags, which can bring higher click-through rates from search engine results pages.
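To make the title-and-description check concrete, here is a minimal sketch of how such a comparison could work. It is not how any particular audit tool is implemented; the function name and the input format, a list of (url, title, meta description) tuples, are assumptions for the example.

```python
from collections import defaultdict


def find_duplicate_meta(pages):
    """Group URLs that share the same title and meta description.

    `pages` is an iterable of (url, title, meta_description) tuples.
    Returns a list of URL groups; each group has at least two URLs.
    """
    groups = defaultdict(list)
    for url, title, description in pages:
        # Normalize case and whitespace so trivial differences don't hide duplicates.
        key = (" ".join(title.lower().split()), " ".join(description.lower().split()))
        groups[key].append(url)
    return [urls for urls in groups.values() if len(urls) > 1]


if __name__ == "__main__":
    crawl = [
        ("/gadgets.html", "Blue Gadget", "Buy the blue gadget."),
        ("/gadgets.html?sort=price", "Blue Gadget", "Buy the blue gadget."),
        ("/widgets.html", "Red Widget", "Buy the red widget."),
    ]
    for group in find_duplicate_meta(crawl):
        print(group)
```

Each group it returns is a candidate for one of the fixes described later: a redirect, a canonical tag, or a rewritten title.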
Off-site duplicate content
Off-site duplicate content is spread across different websites, so it’s more difficult to find.
To make sure you’re not publishing content that already exists on another website, consider running it through a plagiarism tool before release. This is especially important if you work with outsourced writers or new team members who may not realize the significance of unique content.
You can also use such a tool to check whether other websites are copying your content. There are smart tools that scan the web to identify instances of content copied from your website.
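If you are curious how two texts can be compared for overlap, one common technique is to break each text into short word sequences (“shingles”) and measure how many they share. The sketch below uses that approach; it is an assumption about how a checker might work, not a description of any specific plagiarism tool, and the shingle size of three words is arbitrary.

```python
def shingles(text: str, size: int = 3) -> set:
    """Return the set of consecutive word sequences of length `size`."""
    words = text.lower().split()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard_similarity(a: str, b: str, size: int = 3) -> float:
    """Share of shingles the two texts have in common, from 0.0 to 1.0."""
    set_a, set_b = shingles(a, size), shingles(b, size)
    if not set_a and not set_b:
        return 0.0  # both texts are too short to compare
    return len(set_a & set_b) / len(set_a | set_b)


if __name__ == "__main__":
    ours = "soft cotton shirt with a round neck"
    theirs = "soft cotton shirt with a v neck"
    print(round(jaccard_similarity(ours, theirs), 2))
```

A score close to 1.0 suggests that one text was copied from the other, while a score near 0.0 suggests unrelated wording. Where you set the threshold is a judgment call.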
How to Fix Duplicate Content in Magento 2?
Set up 301 redirects
In many cases, the best way to combat duplicate content is to set up a 301 redirect from the duplicate page to the original content page. When several pages that could each rank well are combined into a single page, they no longer compete with each other, and the combined page sends a stronger relevance signal. This improves its chances of ranking well in search engines.
Don’t make users run into the same content at several addresses. When you discover duplicate pages, use a 301 redirect that takes both users and search bots to a single address.
Use the meta robots noindex tag
The meta robots tag is a helpful technical tool for reducing the risk of duplicate content on your Magento 2 website.
Meta robots tags are useful when you need to exclude a specific page from Google’s index and don’t want it to appear on search results pages.
By including the “noindex” meta robots directive in the HTML code of a page, you tell Google that you don’t want the page to appear on SERPs. This is often a better method than robots.txt blocking because it lets you block a specific page or file, while robots.txt is usually used for large-scale rules.
This directive can be used for several reasons, but in each case Google will understand it and should remove the duplicate pages from SERPs.
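For reference, the tag itself is a single line in the page’s head section. The small helper below builds it as a string so you can see exactly what ends up in the HTML; the function name is made up for this example and is not part of Magento 2.

```python
def meta_robots_tag(index: bool = True, follow: bool = True) -> str:
    """Build a meta robots tag from the two most common directives."""
    directives = [
        "index" if index else "noindex",
        "follow" if follow else "nofollow",
    ]
    content = ", ".join(directives)
    return f'<meta name="robots" content="{content}">'


if __name__ == "__main__":
    # A duplicate page you want kept out of the index, while its links stay followable.
    print(meta_robots_tag(index=False))
```

Keeping “follow” while setting “noindex” lets crawlers continue through the links on the page even though the page itself is dropped from the index.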
Implement canonical URLs
Another way to deal with duplicate content is to use the canonical link element to indicate which version of a page should be evaluated for rankings. It helps bots understand your site better, which indirectly helps users because it affects which pages rank.
Let’s take an example: suppose the same product list is available at three URLs, the first without a sort parameter, the second sorted by size, and the third sorted by color.
You might think of the first URL as the official or preferred version of that page. It doesn’t contain a sort parameter, which makes the URL cleaner, and the page might display the items in the order you’d prefer most people to see.
Nevertheless, if sorting by size is the most common choice for your visitors, you might want the canonical URL to be /product-list.html?sort=size instead. Similarly, you may find that the third URL (sorted by color) attracts the most attention from other sites or on social media platforms, so the third URL might be more appropriate as the preferred version. No matter which URL you choose as the official version, you declare it by adding a canonical tag.
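To show what declaring the official version looks like, the sketch below produces the canonical tag that every variant of the page would carry. The www.example.com domain and the exact variant URLs are placeholders invented for the example; only /product-list.html?sort=size comes from the text above.

```python
from html import escape


def canonical_tags(variants, preferred):
    """Map every variant URL to the canonical tag it should include.

    All variants, the preferred one included, point at the same URL.
    """
    if preferred not in variants:
        raise ValueError("the preferred URL must be one of the variants")
    tag = f'<link rel="canonical" href="{escape(preferred, quote=True)}">'
    return {url: tag for url in variants}


if __name__ == "__main__":
    # Placeholder URLs; the domain and the color variant are assumptions.
    variants = [
        "https://www.example.com/product-list.html",
        "https://www.example.com/product-list.html?sort=size",
        "https://www.example.com/product-list.html?sort=color",
    ]
    for url, tag in canonical_tags(variants, variants[0]).items():
        print(url, "->", tag)
```

The tag belongs in the head section of each page, and the preferred page can safely point to itself.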
Make title tag changes
If two pages have the same title tag or page header, but the pages are substantially different and serve distinct purposes, then you don’t have to make a big change to either page. Instead, you can edit the title tag or the page header to make the purpose of each page clear. This practice also keeps several pages from competing for the same terms and can help you rank for new ones.
Eliminate or consolidate content & redirects
If you have a serious duplicate content issue, editing a title tag or implementing a canonical tag won’t be enough of a solution. In these scenarios, the fix requires consolidating or eliminating pages, then redirecting eliminated URLs to the kept URL.
For instance, if Page A and Page B are similar, you can delete Page B and keep Page A to resolve the duplication. Of course, users might still come looking for Page B, and you won’t want to lose those visitors. Therefore, you redirect Page B to Page A after removing Page B from the site. Besides removing the duplicate page, this makes sure that people and bots visiting the website can still find the content they wanted.
This gets more complex if the pages are only nearly identical or if only their intent is similar. In those cases, you consolidate the content from the duplicated pages into one page and keep some of the content from each version.
For instance, say you have three pages about the gadgets you sell, and each of these pages has unique content and images. You can consolidate them into one page about gadgets, keeping some of the content and pictures from each page. Once the content is consolidated, you need to redirect the retired pages to the one page you kept.
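When you retire several pages over time, redirects can start pointing at pages that were themselves redirected later. The sketch below checks a redirect map for such chains and for loops. The paths are hypothetical, and the dictionary is just a stand-in for wherever your redirects are actually stored.

```python
def resolve_redirect(path: str, redirects: dict) -> str:
    """Follow the redirect map until reaching a path that is not redirected."""
    seen = {path}
    while path in redirects:
        path = redirects[path]
        if path in seen:
            raise ValueError(f"redirect loop at {path}")
        seen.add(path)
    return path


def flatten_redirects(redirects: dict) -> dict:
    """Point every retired path straight at its final destination."""
    return {old: resolve_redirect(old, redirects) for old in redirects}


if __name__ == "__main__":
    # Hypothetical paths: two retired gadget pages consolidated into one.
    redirects = {
        "/gadgets-old.html": "/gadgets-sale.html",
        "/gadgets-sale.html": "/gadgets.html",
    }
    print(flatten_redirects(redirects))
```

Flattening the map means visitors and bots reach the kept page in a single hop instead of following a chain.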
How do you select which page to keep and which page to delete and redirect?
It depends on how the pages are performing. As with choosing a canonical, check which page your visitors prefer and keep the version that gets the most engagement and traffic.
Preferred domain and parameter handling in Google Search Console
Google Search Console enables you to set the preferred domain of your Magento 2 website (for example, https://www.mageplaza.com rather than http://www.yoursite.com) and to specify whether Googlebot should crawl different URL parameters differently.
Depending on your URL structure and the cause of your duplicate content problems, setting up your preferred domain or parameter handling may offer a solution.
The big drawback of relying on parameter handling as your main means of solving duplicate content is that the changes only work for Google. Rules set in Google Search Console do not affect how other search engines’ crawlers scan your website. You will have to use the webmaster tools of the other search engines in addition to editing the Search Console settings.
Solve hidden technical or structural problems
Duplicate content might also stem from a hidden technical issue. In some cases, it is an old development environment mirroring the live website that somehow became exposed to users.
It can also occur with dynamic pages that serve the same content at different URLs. The problem may also arise from user-generated content when nobody checks whether people are posting identical content in several locations (for example, a forum where participants can post the same question in four different categories).
Duplicate content might also be caused by a structural issue. For instance, there are five sections of the site where a page could live, so the webmaster places a copy of the page in each of them. That makes some sense, since it lets users find the page in multiple places, all of which might be relevant locations for it.
However, the better solution is to reorganize the site. One section can link to another without keeping separate copies of the page in different places on the site.
Duplicate content is a persistent problem for SEO, so it is worth minimizing the risk. Invest in content that offers useful information, differentiate your pages from everything else on the web, and raise copyright terms when another website wants to reuse your information. Hopefully, this article has helped you understand what duplicate content is and why it deserves your attention.
With the Mageplaza SEO extension, adding the Canonical URL Meta automatically prevents this kind of duplicate content and helps boost your SEO performance.