
Google and Canonical

I’ve found some more Google weirdness … this time related to how it handles declared ‘canonical’ URLs.

A canonical URL is declared with a link tag that you put in the head of a web page that says “This is the correct URL to this particular page”.

Here’s an example of the canonical tag …

<head>
   <link rel="canonical" href="https://example.com/page.php" />
</head> 
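To check which canonical URL a page actually declares, a small parser is enough. The following is a minimal sketch using only the Python standard library; the names `CanonicalFinder` and `find_canonical` are made up for this illustration, and it only reads the tag out of HTML you already have as a string.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> in a document."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <link ... /> also end up here.
        if tag != "link":
            return
        attr_map = dict(attrs)
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_tokens and attr_map.get("href"):
            self.canonicals.append(attr_map["href"])


def find_canonical(html_text):
    """Return the first declared canonical URL, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html_text)
    return finder.canonicals[0] if finder.canonicals else None


sample = '<head><link rel="canonical" href="https://example.com/page.php" /></head>'
print(find_canonical(sample))
```

This only tells you what the page declares. It says nothing about which URL Google ends up selecting.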

However, even when you declare a canonical URL, Google sometimes decides that there is a better URL.

In some cases it picks another page on your site …

In this case, both pages are pretty similar (actually, they are identical) … but they are two distinct pages and both have their own canonical URL declared.

I noticed at least one case where the canonical URL that Google selected wasn’t even on my site.

Granted, this is the same content … but it should not be considered the canonical version of a page on my site.

Unfortunately I really don’t know how to resolve this issue … as Google doesn’t respond to webmaster-raised issues related to their search engine functionality.

Google & ‘Soft 404’

Many of us who manage websites are familiar with Google’s ‘Search Console’. The search console is a way for webmasters to manage how Google interacts with our web sites. It provides functions to tell Google what parts of the site to search and what parts to ignore, and to determine which pages are doing better than others.

One of the functions it provides is a way to see which parts of a web site Google has indexed and which parts it hasn’t. It can also tell you which parts of a site it is ignoring and, to a certain extent, why it’s ignoring them.

One of the reasons that Google might be ignoring a page is that it has been determined to be a ‘Soft 404’.

What’s a Soft 404 error?

Well, a REAL 404 error means the page was not found: the web server answers the request with an HTTP 404 status code. It’s a function of the web server software. Most web servers provide the ability to use a custom page when a 404 error is encountered. You can see an example of one here.

As for a ‘Soft 404’ … according to Google …

A soft 404 means that a URL on your site returns a page telling the user that the page does not exist and also a 200-level (success) code to the browser.

https://support.google.com/webmasters/answer/181708?hl=en
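Google’s definition boils down to two conditions: a success status code, and a body that tells the user the page is missing. Here is a minimal sketch of that check, assuming you already have the status code and the page text in hand. The phrase list and the function name `looks_like_soft_404` are hypothetical and chosen only for illustration; this is not how Google actually classifies pages.

```python
# Hypothetical phrase list, chosen for illustration only.
NOT_FOUND_PHRASES = ("page not found", "does not exist", "no longer available")


def looks_like_soft_404(status_code, body_text):
    """Return True for a 2xx response whose body reads like an error page."""
    is_success = 200 <= status_code < 300
    text = body_text.lower()
    says_missing = any(phrase in text for phrase in NOT_FOUND_PHRASES)
    return is_success and says_missing


# A friendly error page served with a 200 status matches the definition.
print(looks_like_soft_404(200, "Sorry, page not found."))
# The same page served with a real 404 status does not.
print(looks_like_soft_404(404, "Sorry, page not found."))
```

A phrase match like this is crude, and a perfectly valid page that happens to contain one of the phrases would be flagged too.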

While some sites might actually do that … handle a page not found error with a friendly page but indicate to the browser that it’s a normal page (200 status code) … I suspect it’s actually a minority of sites (granted, it may be a way to game the system).

However … it turns out that pages that contain the words ‘not found’, ‘error’, ‘authorized’, ‘not allowed’, etc., in the title or body are often treated by Google as a soft 404 error … even if the page isn’t a 404 at all. Additionally, the words do not even need to appear on the page at all. The details of what constitutes a ‘soft 404’ are very mysterious.
