Google & ‘Soft 404’

Many of us who manage websites are familiar with Google’s ‘Search Console’. The search console is a way for webmasters to manage how Google interacts with our web sites. It provides functions to tell Google what parts of the site to search and what parts to ignore, and to see which pages are doing better than others.

One of the functions it provides is a way to see which parts of a web site Google has indexed and which parts it hasn’t. It can also tell you which parts of a site it is ignoring and, to a certain extent, why it’s ignoring them.

One of the reasons that Google might be ignoring a page is that it has been determined to be a ‘Soft 404’.

What’s a Soft 404 error?

Well, a REAL 404 error means the page was not found. It’s generated by the web server software, which returns a 404 status code to the browser. Most web servers provide the ability to use a custom page when a 404 error is encountered. You can see an example of one here.
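To make the distinction concrete, here is a minimal sketch in Python of how a server can show a friendly custom page and still report a real 404. It is not how any particular web server does it; the names `PAGES`, `NOT_FOUND_PAGE`, and `respond` are made up for illustration.

```python
from http import HTTPStatus
from typing import Tuple

# Hypothetical site content, keyed by URL path (assumption for illustration).
PAGES = {"/": "<h1>Welcome</h1>"}

# Hypothetical custom error page shown for unknown paths.
NOT_FOUND_PAGE = "<h1>Sorry, we could not find that page</h1>"


def respond(path: str) -> Tuple[int, str]:
    """Return (status code, body) for a requested path."""
    if path in PAGES:
        return HTTPStatus.OK.value, PAGES[path]
    # The body is friendly, but the status code still says 404.
    return HTTPStatus.NOT_FOUND.value, NOT_FOUND_PAGE
```

The point is that the custom page only changes what the visitor sees. The status code sent along with it is still 404.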

As for a ‘Soft 404’ … according to Google …

A soft 404 means that a URL on your site returns a page telling the user that the page does not exist and also a 200-level (success) code to the browser.

https://support.google.com/webmasters/answer/181708?hl=en

While some sites might actually do that … handle a page not found error with a friendly page but indicate to the browser that it’s a normal page (200 status code) … I suspect it’s actually a minority of sites (granted, it may be a way to game the system).
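Google’s definition can be read as a simple rule: a success status code combined with a body that says the page is missing. The sketch below is only my reading of that definition, not Google’s actual algorithm. The phrase list `NOT_FOUND_PHRASES` and the function `classify` are assumptions for illustration.

```python
# Assumed phrases; Google does not publish the signals it really uses.
NOT_FOUND_PHRASES = ("page not found", "does not exist")


def classify(status: int, body: str) -> str:
    """Label a response as '404', 'soft 404', or 'ok' using a naive rule."""
    text = body.lower()
    looks_missing = any(phrase in text for phrase in NOT_FOUND_PHRASES)
    if status == 404:
        return "404"
    # A 200-level code plus "not found" wording matches the quoted definition.
    if 200 <= status < 300 and looks_missing:
        return "soft 404"
    return "ok"
```

A check this naive would also flag a perfectly good page that merely quotes such a message, for example `classify(200, "The compiler says the file does not exist")`.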

However … it turns out that pages that contain the words ‘not found’, ‘error’, ‘authorized’, ‘not allowed’, etc., in the title or body are often treated by Google as a soft 404 error … even if the page isn’t a 404 at all. Stranger still, some pages get flagged even though none of those words appear on them. The details of what constitutes a ‘soft 404’ are very mysterious.

In my mailing list archives there are thousands of ‘Soft 404’ errors because people are discussing a file not found, a program not found, information not found, authorization, errors, etc. However, some of the flagged pages don’t mention those words at all.

Here’s a quick sampling of some of the pages that Google has identified as ‘Soft 404’ …

https://archive.midrange.com/web400/200702/msg00040.html
https://archive.midrange.com/midrange-l/201804/msg01255.html
https://archive.midrange.com/midrange-l/200808/msg01254.html
https://archive.midrange.com/domino400/201801/
https://archive.midrange.com/domino400/200702/msg00030.html
https://archive.midrange.com/rpg400-l/201307/msg00210.html

If you visit those pages you’ll see they aren’t 404 pages at all, but Google has identified them as such.

Sadly, Google hasn’t provided a mechanism to correct its misinterpretation, so there is nothing to be done … except, maybe, hope that Google’s algorithms get smarter.

On a side note … I’m really curious as to how Google is going to treat this page. It certainly contains all the indicators of a Soft 404.
