Unmasking the truth about the Google duplicate content debate.
Google takes out a patent that will settle the debate for good.
Some believe that duplicate content will hurt your SEO efforts, while others dismiss the claim as a myth. Who is right? Well, the answer lies in the new patent Google just filed: Duplicate document detection in a web crawler system. The patent explains how the search engine's content server interacts with a duplicate content server. It covers what duplicate content is, how it is detected, and how it affects you.
I guess the myths persist because of how people interpret the webmaster guidelines' outline on duplicate content. Some believed the guidelines meant one thing, while others believed they meant something else entirely. What is duplicate content according to Google's patent? Well, the patent defines it by stating:
"Duplicate documents are documents that have substantially identical content, and in some embodiments wholly identical content, but different document addresses."
The patent also details three separate scenarios in which duplicate documents are encountered by a web crawler:
- Two pages, comprising any combination of regular web page(s) and temporary redirect page(s), are duplicate documents if they share the same page content, but have different URLs.
- Two temporary redirect pages are duplicate documents if they share the same target URL, but have different source URLs.
- A regular web page and a temporary redirect page are duplicate documents if the URL of the regular web page is the target URL of the temporary redirect page or the content of the regular web page is the same as that of the temporary redirect page.
A permanent redirect page is not directly involved in duplicate document detection because the crawlers won't download the contents of a redirect page.
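To make the three scenarios concrete, here is a minimal Python sketch of how they could be written as a single check. This is only an illustration of the rules as described above, not Google's implementation: the `Page` class, its field names, the `kind` labels, and the `are_duplicates` function are all hypothetical, and "same content" is simplified to exact string equality.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Page:
    """Hypothetical record for a crawled URL (names are assumptions)."""
    url: str
    kind: str  # "regular", "temporary_redirect", or "permanent_redirect"
    content: Optional[str] = None      # page content, if any was downloaded
    target_url: Optional[str] = None   # redirect target, for redirect pages


def are_duplicates(a: Page, b: Page) -> bool:
    """Apply the three duplicate scenarios listed above."""
    # Permanent redirects are not directly involved in duplicate detection.
    if "permanent_redirect" in (a.kind, b.kind):
        return False
    # Duplicates must have different addresses.
    if a.url == b.url:
        return False
    # Scenario 1: same page content, different URLs (any mix of page types).
    if a.content is not None and a.content == b.content:
        return True
    # Scenario 2: two temporary redirects sharing the same target URL.
    if a.kind == b.kind == "temporary_redirect":
        return a.target_url is not None and a.target_url == b.target_url
    # Scenario 3: a regular page that is the target of a temporary redirect.
    if {a.kind, b.kind} == {"regular", "temporary_redirect"}:
        regular, redirect = (a, b) if a.kind == "regular" else (b, a)
        return redirect.target_url == regular.url
    return False
```

For example, a regular page at `http://example.com/` and a temporary redirect at `http://example.com/a` that points to it would be treated as duplicates under the third scenario, while a permanent redirect to the same page would not be compared at all.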
According to the patent description, Google's web crawler consults the duplicate content server to check whether a newly found page is a copy of another document. The algorithm then determines which version is the most important version.
Google can use different methods to detect duplicate content. For example, Google might take "content fingerprints" and compare them when a new web page is found.
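The idea of a content fingerprint can be illustrated with a short Python sketch. The article does not say how Google computes its fingerprints, so everything here is an assumption made for illustration: the fingerprint is a SHA-256 hash of the whitespace-normalized, lowercased text, and `content_fingerprint` and `FingerprintIndex` are hypothetical names. An exact hash like this only catches content that is identical after normalization; detecting "substantially identical" pages would need a near-duplicate technique instead.

```python
import hashlib
import re
from typing import Dict, Optional


def content_fingerprint(text: str) -> str:
    """Hash the text after collapsing whitespace and lowercasing it."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class FingerprintIndex:
    """In-memory map from fingerprint to the first URL seen with it."""

    def __init__(self) -> None:
        self._seen: Dict[str, str] = {}

    def check(self, url: str, text: str) -> Optional[str]:
        """Return the earlier URL with the same fingerprint, if any.

        If the fingerprint is new, record it for this URL and return None.
        """
        fingerprint = content_fingerprint(text)
        earlier = self._seen.get(fingerprint)
        if earlier is None:
            self._seen[fingerprint] = url
            return None
        # The same URL seen again is not a duplicate of itself.
        return earlier if earlier != url else None
```

A crawler loop could call `check` for each newly found page and treat a non-`None` result as a signal that the page duplicates one it has already seen.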
Interestingly, it's not always the page with the highest PageRank that is chosen as the most important URL for the content. The patent states:
"In some embodiments, a canonical page of an equivalence class is not necessarily the document that has the highest score (e.g., the highest page rank or other query-independent metric)."