URL Canonicalization
Recently I read about SEO and canonical URLs, and I started to think that these days we have to be aware of them if we want our websites to be "compatible and friendly" to search engines.
1. What is Canonicalization?
<link rel="canonical" href="http://www.example.com/" />
Simply put, the canonical URL is the URL that you want visitors to see. Normally it is the simplest and most representative of all the URLs that lead to the same page.
2. Why is canonicalization of URLs important?
Why bother with canonicalization? What is the point, since any of these URLs leads to the same page? It matters because if we care about how easily people can find our site through search engines, we need to understand it clearly.
Examples of different URLs that point to the same page
Some URLs contain tracking parameters:
http://www.example.com/product/computer/?l_id=home_page
http://www.example.com/product/computer/?l_id=browsing_history
http://www.example.com/product/computer/
The first two URLs point to the same page as the third, but add a parameter to track where the visitor came from.
Another case is:
http://example.com/black-shoes
https://example.com/black-shoes
http://www.example.com/black-shoes
Here the server is configured to serve the same content with or without the www subdomain, and over both http and https.
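To make this concrete, here is a minimal Python sketch that maps the three black-shoes URLs onto a single form. The choice of https and the www host as the preferred form is an assumption made for illustration, and preferred_form is a hypothetical helper name, not part of any library.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption for illustration: the site prefers https and the www host.
PREFERRED_SCHEME = "https"
PREFERRED_HOST = "www.example.com"
# Hosts that serve the same content as the preferred host.
EQUIVALENT_HOSTS = {"example.com", "www.example.com"}


def preferred_form(url: str) -> str:
    """Rewrite scheme and host to the preferred ones; leave other sites alone."""
    parts = urlsplit(url)
    # hostname is lowercased and drops any port, which is enough for this sketch.
    host = parts.hostname or ""
    if host not in EQUIVALENT_HOSTS:
        return url
    return urlunsplit(
        (PREFERRED_SCHEME, PREFERRED_HOST, parts.path, parts.query, parts.fragment)
    )


variants = [
    "http://example.com/black-shoes",
    "https://example.com/black-shoes",
    "http://www.example.com/black-shoes",
]
# All three variants collapse into one entry once they are rewritten.
unique = {preferred_form(u) for u in variants}
```

A search engine has to do this kind of grouping on its own unless the site tells it which form is preferred.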
OK, I start to see your point, but why would the same page with different URLs be a problem after all, you might ask.
The reasons are:
- Before a search engine such as Google shows links to a searcher, it has to make sure that different results show different content. Search engines, and we as searchers, hate to see duplicate content at different URLs.
So what do search engines do?
- They have to consolidate link signals for duplicate or similar content. A canonical URL helps search engines consolidate the information they have for the individual URLs (such as links to them) on a single, preferred URL. This means that links from other sites to http://www.example.com/product/computer/?l_id=home_page get consolidated with links to http://www.example.com/product/computer/
- More often than not, a variety of URLs makes it a challenge, for the search engine as much as for you, to track a single topic.
- You might also have a preferred URL that you want people to see.
- If you syndicate your content for publication on other domains, you want to consolidate page ranking on your preferred URL.
This is where the canonical URL comes in.
3. How to set a canonical URL
We can tell search engines which URL we prefer by doing the following:
1. Set preferred domain
Whether it is:
http://example.com or
http://www.example.com
You can tell Google your preferred domain. You can do so as follows:
- On the Search Console Home page, click the site you want.
- Click the gear icon, and then click Site Settings.
- In the Preferred domain section, select the option you want.
2. Indicate the preferred URL with the rel="canonical" link element
Mark up the canonical page and any other variants with a rel="canonical" link element. Add a <link> element with the attribute rel="canonical" to the <head> section of these pages:
<link rel="canonical" href="http://www.example.com/product/computer/" />
This indicates the preferred URL to use to access the computer product page, so that search results will be more likely to show users that URL structure. (Note: Google attempts to respect this, but cannot guarantee it in all cases.)
Avoid errors: use absolute paths rather than relative paths with the rel="canonical" link element.
Use this structure: http://www.example.com/product/computer/
Not this: /product/computer/
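If the pages are generated by code, the absolute-path rule can be enforced where the tag is built. The following is a minimal Python sketch under that assumption; canonical_link_tag is a hypothetical helper name, not part of any framework.

```python
from html import escape
from urllib.parse import urlsplit


def canonical_link_tag(canonical_url: str) -> str:
    """Return a <link rel="canonical"> element for an absolute URL."""
    parts = urlsplit(canonical_url)
    # Reject relative paths such as /product/computer/.
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"canonical URL must be absolute: {canonical_url!r}")
    # Escape the URL so characters like & and " stay valid inside the attribute.
    return f'<link rel="canonical" href="{escape(canonical_url, quote=True)}" />'


tag = canonical_link_tag("http://www.example.com/product/computer/")
```

The returned string would be placed in the <head> of the canonical page and of every variant of it.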
3. Use 301 redirects for URLs that are not canonical
Suppose your page can be reached in multiple ways:
https://example.com/home
https://home.example.com
https://www.example.com
It's a good idea to pick one of those URLs as your preferred (canonical) destination, and use 301 redirects to send traffic from the other URLs to your preferred URL. A server-side 301 redirect is the best way to ensure that users and search engines are directed to the correct page. The 301 status code means that a page has permanently moved to a new location.
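As a rough illustration, the redirect decision for the three URLs above could look like the Python sketch below. It assumes that https://www.example.com/ is the URL you picked as canonical, and redirect_for is a hypothetical function that a server or framework would call before serving the page.

```python
from typing import Dict, Optional, Tuple

# Assumption: https://www.example.com/ is the preferred (canonical) URL.
CANONICAL = "https://www.example.com/"
# Non-canonical (host, path) pairs that show the same page.
ALIASES = {
    ("example.com", "/home"),
    ("home.example.com", "/"),
}


def redirect_for(host: str, path: str) -> Optional[Tuple[int, Dict[str, str]]]:
    """Return (status, headers) for a 301 response, or None to serve the page."""
    # An empty path is treated the same as "/".
    if (host.lower(), path or "/") in ALIASES:
        # 301 tells browsers and crawlers that the move is permanent.
        return 301, {"Location": CANONICAL}
    return None
```

In practice this rule usually lives in the web server or load balancer configuration rather than in application code, but the logic is the same.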
4. Indicate how to handle dynamic parameters
Use Parameter Handling to tell Google about any parameters you would like ignored. Ignoring certain parameters can reduce duplicate content in Google's index, and make your site more crawlable. For example, if you specify that the parameter l_id should be ignored, Google will consider http://www.example.com/product/computer/?l_id=browsing_history to be the same as http://www.example.com/product/computer/
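The effect of ignoring a parameter can be pictured with a small Python sketch. This only illustrates the idea and is not how Google implements it; the set IGNORED_PARAMS and the function strip_ignored_params are names made up for this example.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: l_id is the only parameter that never changes the page content.
IGNORED_PARAMS = {"l_id"}


def strip_ignored_params(url: str) -> str:
    """Drop ignored query parameters and keep the rest in their original order."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in IGNORED_PARAMS
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


a = strip_ignored_params("http://www.example.com/product/computer/?l_id=browsing_history")
b = strip_ignored_params("http://www.example.com/product/computer/")
# a and b are now the same string, so the two URLs count as one page.
```

Parameters that do change the content, such as a page number or a product variant, should stay out of the ignored set.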
5. Specify a canonical link in your HTTP header
If you can configure your server, you can use rel="canonical" HTTP headers to indicate the canonical URL for HTML documents and other files such as PDFs. Say your site makes the same PDF available via different URLs (for example, for tracking purposes), like this:
https://www.example.com/downloads/book.pdf
https://www.example.com/downloads/partner-1/book.pdf
https://www.example.com/downloads/partner-2/book.pdf
https://www.example.com/downloads/partner-3/book.pdf
In this case, you can use a rel="canonical" HTTP header to specify to Google the canonical URL for the PDF file, as follows:
Link: <https://www.example.com/downloads/book.pdf>; rel="canonical"
Note that the URL in the Link header must be enclosed in angle brackets. Google currently supports these link header elements for Web Search only.
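A server-side handler could attach that header along these lines. This is a minimal Python sketch; canonical_headers and the DUPLICATE_PATHS set are hypothetical names, and the paths are taken from the example above.

```python
from typing import Dict

CANONICAL_PDF = "https://www.example.com/downloads/book.pdf"
# Assumption: every path listed here serves the same PDF file.
DUPLICATE_PATHS = {
    "/downloads/book.pdf",
    "/downloads/partner-1/book.pdf",
    "/downloads/partner-2/book.pdf",
    "/downloads/partner-3/book.pdf",
}


def canonical_headers(path: str) -> Dict[str, str]:
    """Return the extra response headers for a requested path."""
    if path in DUPLICATE_PATHS:
        # The target URL goes inside angle brackets in a Link header.
        return {"Link": f'<{CANONICAL_PDF}>; rel="canonical"'}
    return {}
```

The returned dictionary would be merged into the response headers by whatever server or framework serves the file.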
6. Prefer HTTPS over HTTP for canonical URLs
Google prefers HTTPS pages over equivalent HTTP pages as canonical, except when there are conflicting signals such as the following:
- The HTTPS page has an invalid SSL certificate.
- The HTTPS page contains insecure dependencies.
- The HTTPS page is blocked by robots.txt (and the HTTP page is not).
- The HTTPS page redirects users to or through an HTTP page.
- The HTTPS page has a rel="canonical" link to the HTTP page.
- The HTTPS page contains a noindex robots meta tag.
Although Google's systems prefer HTTPS pages over HTTP pages by default, you can ensure this behavior by taking any of the following actions:
- Add 301 or 302 redirects from the HTTP page to the HTTPS page.
- Add a rel="canonical" link from the HTTP page to the HTTPS page.
- Implement HSTS (HTTP Strict Transport Security).
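The redirect and HSTS options can be combined, as in the following minimal Python sketch. The function https_policy is a hypothetical name, and the one-year max-age with includeSubDomains is an assumed policy; choose values that suit your own site.

```python
from typing import Dict, Optional, Tuple

# Assumption: a one-year max-age that also covers subdomains.
HSTS_VALUE = "max-age=31536000; includeSubDomains"


def https_policy(scheme: str, host: str, path: str) -> Tuple[Optional[int], Dict[str, str]]:
    """Return (redirect_status, headers) for a request.

    HTTP requests get a 301 to the HTTPS URL. HTTPS responses carry the
    HSTS header; browsers ignore that header when it arrives over plain HTTP.
    """
    if scheme == "http":
        return 301, {"Location": f"https://{host}{path}"}
    return None, {"Strict-Transport-Security": HSTS_VALUE}
```

Once a browser has seen the HSTS header over HTTPS, it goes straight to HTTPS on later visits for as long as max-age lasts, without waiting for the redirect.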
4. Conclusion
I hope this gives you some idea of how important canonical URLs are, and that you will try to apply them in your project.