Overcoming Content Duplication Issues

Without a doubt, one of the most difficult things an SEO must contend with on a website is duplicate content. Too many content management systems handle content admirably but give little thought to search engine optimization when deploying that content across the website.

Duplicate content comes in two flavors: onsite and offsite. Onsite duplication is content that appears on two or more pages of your own website. Offsite duplication is when content from one website appears on another. While offsite duplication may be out of your control, onsite duplication is something you can control. Both pose issues.

Why is Duplicate Content an Issue?

Before discussing the drawbacks of duplicate content, it is worth outlining the benefits of unique content. Original content makes you stand out. It distinguishes you from everyone else. Why? Because the content is exclusive to you.

When you use the same text to describe your products as everyone else, nothing gives you an advantage over the competition. When you have numerous URLs that are exactly the same, nothing gives one URL the advantage over another, and none of them perform properly.

Duplicate content essentially diminishes the value of your content. Search engines look for content that is distinct from everything else because they don't want to send users to multiple pages that all say the same thing. When you produce original content, you compete with your rivals rather than with yourself.

When search engines crawl your site, they extract the content from each page and store it in their index. When they examine that content and find page after page of duplicate information, they divert their efforts elsewhere, perhaps to indexing unique pages on your competitors' websites.

Dealing With Offsite Duplicate Content Problems

There are two primary reasons for offsite duplicate content: you or someone else is to blame! Essentially, it is content that you either stole or that someone else took from you. Offsite duplicate content, whether allowed or not, is probably damaging your site’s ability to rank higher in search engine results.

Content Scrapers and Thieves

Those who steal content from the internet and post it on their own websites are the worst culprits when it comes to content theft. Usually, what emerges is a Frankensteinian jumble of disparate material fragments that add up to less of a cohesive whole than the green blockhead himself. These pages are usually created with the express purpose of drawing users in and luring them out as soon as possible by clicking on the numerous advertisements strewn throughout the content. These kinds of material scrapers are difficult to stop, and search engines are actively working to identify them as such in order to remove them from their indexes.

Not all content theft occurs through scraping. Some people simply copy what you've written and claim it as their own. While the content on these sites is indeed plagiarized without permission, the sites are generally of a higher caliber than scraper sites. That makes this kind of copying more damaging than scraping: the stolen content is probably attracting links, and the sites are often regarded as high-quality resources. If the stolen copy earns more inbound links than your original, it may well outrank you!

Scrapers can generally be ignored, though the most egregious offenders and outright thieves can be pursued through the legal system or with a DMCA takedown request.

Article Distribution

Content is frequently released through distribution channels in the hope that it will be picked up and republished on other websites. The value of this duplication usually comes from one or more links directing readers back to the author's website. I write a lot of content for our E-Marketing Performance blog, and much of it gets republished elsewhere. This is strategic duplication, so I have to weigh the advantages and disadvantages carefully.

When other publications pick up one of my articles, I receive links back to my website. These are valuable links. I also get significantly more exposure than I do on my own blog, which lets me reach far beyond my usual audience. By limiting this duplication rather than doing it wholesale, I avoid the kind of mass offsite duplication that typically harms sites the most.

The drawback is the duplicate content itself. My content is no longer exclusive to my site, so there's a chance that visits to these other blogs will replace traffic to my own website. In fact, these sites frequently appear above mine in search results because they carry far more authority.

In this instance, however, the benefits exceed the drawbacks. For now, at least. That might not always hold true.

Though I haven’t seen this materialize in any significant way, search engines talk about identifying the “canonical” version of such duplication to guarantee the original content gets greater points than the duplicate versions. I posed this question to some search engine engineers years ago. Do links leading to the copied version count as links to the original version if there are two identical pieces of content and search engines can plainly identify which one came first?

If this were the case, that would be fantastic. Even if the duplicate site and the original site received equal link juice from the search engines, I would still be satisfied. Of course, that would also need to include links and social media shares, but search engines can still favor original content over duplicate content that has been reprinted, whether or not that is done with malicious intent.

Generic Product Descriptions

Product descriptions are among the formats where duplicate material appears most frequently. Many of the things sold on thousands of websites are identical or quite similar. Consider any website that offers Blu-Ray discs, CDs, DVDs, or books. The product collection is essentially the same on all websites. Where do you think the product descriptions on these websites come from? Most likely the content producer, publisher, manufacturer, or film studio. Additionally, as these goods all eventually originate from the same location, their description language is typically 100% similar.

Multiply that by the millions of distinct products and the hundreds of thousands of websites that sell them. If each website doesn't make the effort to write its own product descriptions, there is enough duplicate text to circle the solar system multiple times.

How, then, does a search engine choose between the hundreds of websites that all use identical product information when a search is conducted? Search engines prioritize original content, so even if you're selling the same product as everyone else, writing a compelling and distinctive product description improves your chances of appearing higher in the search results.

When the content offers nothing to differentiate one site from another, the search engines have to look at the rest of the picture. The site's overall weight and the quantity and quality of its backlinks are typically the deciding factors. If two websites have identical content, the more popular site, with more users, a better backlink profile, and a wider social media following, will probably outrank the others.

Unique product descriptions do give a website an advantage, but unique content alone can't keep up with sites that have a strong, established history. On sites of comparable size, however, original content will nearly always outperform duplicate content, giving you the chance to build an ever-stronger website. Original content is the secret to climbing out of the duplicate content abyss, even though it takes time.

Dealing with Onsite Duplicate Content Problems

Duplicate content on your own website is the most harmful kind, and also the kind you can best combat. Fighting duplicate content on websites you don't own is one thing; fighting duplicate content that exists within your own organization is another, because in theory you can fix it.

In most cases, duplicate material on a website is the result of poor site architecture—or, more accurately, poor website programming! Inadequate site structure gives rise to a variety of duplicate content issues, many of which are difficult to find and resolve.

Those who argue against good architecture typically lean on Google propaganda, claiming that Google can “figure out” these issues and so they won't affect your website. The flaw in that scenario is that it depends on Google to figure things out. Yes, Google's algorithms can determine that certain duplicate content isn't meant to be duplicated, and they will consider this when evaluating your website. However, there's no assurance they'll find everything, or that they'll implement the “fix” as effectively as possible for your own website.

Having an intelligent partner is no excuse for playing the fool yourself. And you can't use the possibility that Google will diagnose your issues and apply the appropriate remedies as a justification for not addressing them. If Google fails to figure it out, you're in big trouble. The less work you ask Google to do for you, the better Google will work for you.

Here are some typical onsite duplicate content problems, along with their solutions.

The Problem: Product Categorization Duplication

Many websites use content management systems that allow products to be categorized. In doing so, a distinct URL is created for each product within each particular category. The issue occurs when the same product appears in several categories: the CMS creates a separate URL for every category the product belongs to.

These kinds of websites have been known to generate five or even ten URLs for each product page. For the engines, this kind of duplication is extremely problematic. Suddenly, a website with 5,000 products becomes one with 50,000 pages. But when the search engines examine and analyze those pages, they discover that 45,000 of them are duplicates!

If there was ever a reason for a search engine spider to abandon your website mid-crawl, this is it. The repetition burdens the search engines needlessly, forcing them to divert their resources to more valuable areas and leaving a significant portion of your pages out of the results.

Here's a screen grab I captured from The Home Depot's website a few years back, showing two separate navigation routes to the same product. A book like this might easily be assigned to many categories, resulting in distinct URLs and, thus, redundant content pages.

Remember that even though the navigation path differs, the content on each page is 100% identical, with the possible exception of the breadcrumb trail shown at the top. If ten people linked to these duplicate pages, and a rival received the same ten links all pointing to a single URL, which do you think would rank higher in search results? You guessed it: the rival!

[Screenshot: the same Home Depot product reached through two different navigation paths]
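To make that concrete, here is a hypothetical pair of URLs for the same product (the first appears again in the canonical tag example later in this article; the second path is invented purely for illustration):

www.thehomedepot.com/building-materials/landscaping/books/123book
www.thehomedepot.com/books/landscaping/123book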

The Solution: Master Categorization

A simple way to address this type of duplication is to prevent any product from appearing in more than one category. However, that isn't exactly helpful for your customers, because shoppers who approach from a different category will have a harder time finding the product they need.

There are two ways to keep the ability to assign products to multiple categories while avoiding duplicate content. One is to manually construct each product's URL path, a time-consuming process that can leave your directory structure a little disorganized. The other is to place every product into the same directory, regardless of its navigational category. This somewhat undermines the architecture of your site and keeps your product URLs from reinforcing your categorization, so I'm not a big believer in it.

I think the simplest way to handle this is to give each product its own master category, which determines the product's URL. The product can still be assigned to additional categories, giving visitors several navigational paths to it, as in the sketch below. However visitors get there, the product's URL remains the same.
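As a hypothetical sketch (the category paths are illustrative only), the landscaping book from the earlier example would live at a single master URL no matter which menu the visitor clicked through:

Master (and only) product URL:
www.thehomedepot.com/building-materials/landscaping/books/123book

Navigational paths that all land on that one URL:
Building Materials > Landscaping > Books
Books > Landscaping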

In an effort to “fix” this issue, a lot of programmers block search engines from indexing any URL other than the one for each product. This prevents duplicate pages from appearing in the search index, but it doesn’t deal with the problem of link splitting. Consequently, any link juice that points to a non-indexable URL is effectively lost and does not improve the product’s position in the search results.

Band-Aid Solution: Canonical Tags

The above-mentioned solution won’t work with certain content management systems. If so, you may either apply a temporary fix or look for a more robust and search-friendly CMS. One kind of approach is the use of canonical tags.

Search engines created canonical tags to indicate to them which URL is the “correct” or canonical version. Thus, in the aforementioned cases, you select the URL you wish to be the canonical URL and then add the canonical tag to the code of every other product page with a duplicate URL.

<link rel="canonical" href="http://www.thehomedepot.com/building-materials/landscaping/books/123book" />
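
For context, the tag belongs in the <head> section of each duplicate product page. A minimal sketch (the page title is just a placeholder):

<head>
<title>Landscaping Book</title>
<link rel="canonical" href="http://www.thehomedepot.com/building-materials/landscaping/books/123book" />
</head>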

In theory, once that tag is added to all of the duplicate product URLs, the search engines will credit any links pointing at the non-canonical URLs to the canonical URL. The tag should also pass any internal link value to the canonical URL and keep the alternative URLs out of the search index. But that is theory.

In practice, this tag is a “signal” to the search engines about your intent, and they decide how best to use it. There is no guarantee that all of the link juice will flow to the right page or that the non-canonical pages will stay out of the index. In other words, they will take your canonical tag under consideration.

The Problem: Product Summary Duplication

A typical example of duplicate content is placing brief summaries of product descriptions on higher-level category pages. Assume for the moment that you want a Burton snowboard. When you select the Burton link from the main menu, a full catalog of Burton products appears, with snippets of product descriptions and several filterable subcategories. Since snowboards are all you want, you click the “snowboards” subcategory, which displays a list of Burton snowboards, each with a brief synopsis of the product.

You return to the “all snowboards” section of the website and see snowboards from every manufacturer along with product descriptions, which include the same descriptions you’ve previously seen twice for the Burton snowboards!

Category pages are an excellent way to rank well for general searches (such as “burton snowboards”). But most of these product category pages contain nothing more than product links, each with a brief synopsis that is repeated across multiple category pages. That renders these pages nearly worthless!

The Solution: Create Unique Content for All Pages

The goal is for every product category page to stand on its own by offering visitors useful information and solutions. The easiest way to accomplish that is to write a paragraph or more of original content for each category page. Take the chance to promote Burton snowboards and Burton products, and talk about product specifics the visitor might not be aware of in order to influence their decision to buy.

Even if all of the products were removed from a category page, it should still be valuable enough for search engines to index. At that point, the page's content retains its value despite the repeated content fragments.

The Problem: Secure/Non-Secure URL Duplication

E-commerce websites that use secure checkout can develop duplicate content issues between the secure and non-secure areas of the site. The outcome is comparable to the multiple-URL problem above, with a small variation: the search engines index a secure version of the same URL in addition to the conventional product URL.

http://www.site.com/category/product1/

https://www.site.com/category/product1/

As you can see, the only difference is the “s” at the end of the “http,” which indicates the URL is secure. Product pages do not generally need to be secure; only pages that collect sensitive data do.

This kind of duplication typically occurs when a visitor navigates from a non-secure area of the website into the secure shopping cart, then leaves the cart and continues shopping before checking out. The problem arises when the links leading out of the secure shopping cart use “https” rather than “http.”

The Solution: Use Absolute Links

Linking the products in a customer's cart back to their product pages is, in my opinion, a smart idea. The trouble is that for internal URLs, web developers often choose to use relative links rather than absolute links.

For those who are unaware, a relative link simply carries the data necessary for the browser to locate the page (that is, everything after the “.com”). An absolute link, on the other hand, comprises the entire URL, including the “http://www.site.com.”

Absolute link:

<a href="https://www.polepositionmarketing.com/about-us/pit-crew/stoney-degeyter/"></a>

Relative link:

<a href="/about-us/pit-crew/stoney-degeyter/"></a>

Once a customer is in the secure portion of the cart, every relative link automatically resolves to an “https” page, because the visitor's current location is used to fill in the missing portion of the URL. Use absolute links to point customers back to your merchandise instead. This switches the visitor back from “https” to “http” and keeps both customers and search engines off the secure duplicate URLs.
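
For example (the cart URL here is hypothetical; the product URLs are the ones shown above), suppose a visitor is sitting in the secure cart at https://www.site.com/cart/. The two link styles behave very differently:

Relative link clicked from the secure cart:
<a href="/category/product1/">Keep shopping</a> resolves to https://www.site.com/category/product1/

Absolute link clicked from the secure cart:
<a href="http://www.site.com/category/product1/">Keep shopping</a> always lands on http://www.site.com/category/product1/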

You might be asking why anyone would use relative links at all these days. Before content management systems, each page existed as an actual file on the server and pages were coded by hand; for many sites this is still the case. During routine maintenance and site structure changes, page files would be shifted around for better organization. Programs like Microsoft FrontPage and Adobe Dreamweaver let you move files around and would update the relative links automatically, which prevented broken links. With absolute links, each link had to be updated manually.

For that purpose, relative links are the accepted standard. Nonetheless, I favor absolute links, particularly for product links in shopping carts and, even more importantly, for site navigation. The graphic below shows the link structure to and from your shopping cart.

[Diagram: link structure to and from the shopping cart area]

Ideally, search engines should never be allowed inside the shopping cart area at all; those pages and URLs ought to be blocked. However, blocking those URLs alone is insufficient. If a visitor can move from the blocked cart pages to a duplicate (secure) product page that isn't blocked, Google's index may pick that page up. With absolute links pointing back to the non-secure product pages, those secure duplicates cannot be crawled or indexed.

The Problem: Session ID Duplication

Session IDs cause some of the worst duplicate content violations possible. Session IDs were developed to follow visitors as they navigate a website, letting them add items to a shopping cart and ensuring that the cart belongs to them and them alone.

Every time a visitor visits a website, a special ID number is added to the URL that is only for them.

Actual URL:

www.site.com/product

Visitor 1:

www.site.com/product?id=1234567890

Visitor 2:

www.site.com/product?id=1234567891

Visitor 3:

www.site.com/product?id=1234567892

That session number is attached to every URL they visit, tracking them around the site. Get out your calculators, because we're about to do some serious arithmetic. Suppose your website has fifty pages. Every visitor receives a session ID, so every visitor generates 50 distinct URLs. If the site gets 50 visitors a day, that's 2,500 unique, indexable URLs every day. Multiply that by the number of days in a year and you have nearly a million distinct URLs, all for a meager 50 pages!

Is that something a search engine would want to index?

The Solution: Don’t Use Session IDs

Since I'm not a programmer, my expertise in this area is somewhat limited, but here is what I do know: session IDs are horrible. You can accomplish the same tracking goals without duplicate content sticking to your shoes like dog feces! Other methods, such as cookies, not only let you follow visitors through the site, they do it much better and can keep tracking beyond a single session.

I'll let you and your programmer determine which tracking method works best for your system; in the meantime, you can let them know I said session IDs are not the solution.
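
As a rough sketch of the idea (the header values below are illustrative, not tied to any particular platform), the server hands the browser an ID once via a cookie instead of appending it to every URL:

HTTP/1.1 200 OK
Set-Cookie: session_id=1234567890; Path=/; HttpOnly

Every visitor, and every search engine spider, then sees the same clean URL: www.site.com/product.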

The Problem: Redundant URL Duplication

One of the most fundamental site architecture issues is the way pages can be accessed in the browser. Most pages can only be reached via their main URL, but there are exceptions, such as when the page is the default page of a virtual or non-virtual subdirectory.

The picture below serves as an example of this. Left unchecked, these URLs all point to the same page with the same information.

[Image: the same page accessible through several different URLs]

This applies to any page (such as www.site.com/page et al.) at the top of a directory structure. That is one page with four different URLs, which causes the website to have duplicate content and divides your link juice.
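
As a hypothetical illustration (assuming the server answers on both the www and non-www hostnames and the directory's default file is index.html), the four URLs for such a page might be:

www.site.com/page/
www.site.com/page/index.html
site.com/page/
site.com/page/index.html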

The Solution: Server Side Redirects and Internal Link Consistency

This type of duplicate content issue can be fixed in several ways, and I advise applying them all. Each has its advantages, but each is also prone to letting things slip through. By putting them all into practice, you can construct an impenetrable duplicate content solution!

Server Side Redirects

On Apache servers, one way to solve this problem is to use your .htaccess file to redirect non-www URLs to www URLs (or vice versa). I won't go into depth here, but plenty of resources explain the details. This won't work on every server, but you can work with your programmers and web host to create a comparable approach that has the same effect.

Whether you want the www in the URL or not, this technique works. Simply choose your preferred path and nudge the other in that direction.
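
As a minimal sketch (assuming an Apache server with mod_rewrite enabled, and using the fictional site.com domain from the earlier examples), the rules might look something like this:

RewriteEngine On
RewriteCond %{HTTP_HOST} ^site\.com$ [NC]
RewriteRule ^(.*)$ http://www.site.com/$1 [R=301,L]

The 301 tells browsers and search engines that the move is permanent, so link juice consolidates on the www version of each URL.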

Internal Link Consistency

Once you've decided whether or not to use the www in your URLs, make sure all of your absolute internal links follow suit. Yes, the server side redirect will take care of an inconsistent link. However, if the redirect ever fails, you run the risk of duplicate pages being indexed. I've seen redirects stop working after someone makes changes to the server, and the issue frequently isn't identified for months, and even then only after duplicate pages have entered the search index.

Never link to /index.html (or .php, etc.)

When referencing a page that sits at the top of a directory or subdirectory, avoid linking to the page's file name; link to the directory folder instead. This server side redirect tends to happen automatically for the home page, but internal subdirectory pages do not get it automatically. If all of your links consistently point to the root folder of these top-level pages, you won't have to worry about a duplicate page appearing in the search results.

Link to:

www.site.com/
www.site.com/subdirectory/

not

www.site.com/index.html (or .asp, .php, etc.)
www.site.com/subfolder/index.html

Although it may seem excessive to implement EVERY one of these remedies for duplicate content, there is really no excuse not to, as most of them are quite simple. It takes some time, but the assurance that these duplicate content issues are resolved is well worth the effort.

The Problem: Historical File Duplication

Although most people wouldn't consider this a duplicate content issue, it certainly can be. A typical site goes through several designs, revisions, and rounds of development and redevelopment over time. As things get rearranged, duplicated, moved, and tested in beta, duplicate content is easily created by accident. I have watched developers completely change a website's directory structure and upload it without ever deleting or redirecting the original files.

When internal content links are not updated to point to the new URLs, the issue is made worse. Referring back to the developers I mentioned earlier, I spent more than five hours repairing content links that were directing to the outdated files after they launched the “finished” new website!

As long as those outdated pages are still on the server and, worse, are still being linked to, the search engines will keep indexing them, which puts the old and new pages in competition for search engine favor.

The Solution: Delete Files and Fix Broken Links

A broken link check won't do you any good if you haven't first removed the outdated files from your server, so let's begin there. Make sure you have a backup of your website so you don't accidentally delete a crucial page. Once all outdated pages are deleted, use a tool like Xenu Link Sleuth to run broken link checks.

The resulting report should tell you which pages contain broken links and where those links point. Use it to figure out where each link should now go and fix it. After addressing them all, run the broken link check again; chances are it will keep finding links to correct. I've had to run these checks as many as 20 times before I was certain every broken link had been fixed. Even then, it's a good idea to run them regularly to see if anything has changed.

Not all duplicate content will undermine your on-site SEO efforts, but some of it will certainly keep your website from performing at its best. The best place for duplicate content is your competitors' websites, not your own. Wherever possible, eliminate duplicate content in all its forms to give your website a chance to perform. If you can replace duplicate content with original, well-thought-out content that benefits search engines and users alike, you'll have an advantage over competitors who don't bother.
