Category Archives: Internet

Google and Canonical

I’ve found some more Google weirdness … this time related to how it handles declared ‘canonical’ URL’s.

A canonical URL is a meta tag that you put in a web page that says “This is the correct URL to this particular page”.

Here’s an example of the canonical tag …

<head>
   <link rel="canonical" href="https://example.com/page.php" />
</head> 

However, even when you declare a canonical URL, Google sometimes decides that there is a better URL.

In some cases it’s to another page on your site…

In this case, both pages are pretty similar (actually, they are identical) … but they are two distinct pages and both have their own canonical URL declared.

I noticed at least one case where the canonical URL that Google selected wasn’t even on my site.

Granted, this is the same content … but it’s should not be considered the canonical version of a page on my site.

Unfortunately I really don’t know how to resolve this issue … as Google doesn’t respond to webmaster raised issues related to their search engine functionality.

Google & ‘Soft 404’

Many of us who manage websites are familiar with Google’s ‘Search Console‘. The search console is a way for webmasters to manage how Google interacts with our web sites. It provides functions to tell Google what parts of the site to search, what parts to ignore, and determine what pages are doing better than others.

One of the functions it provides is a way to see what parts of a web site that Google has indexed and what part it hasn’t. It also can tell what parts of a site it is ignoring and, to a certian extent, why it’s ignoring them.

One of the reasons that Google might be ignoring a page is because it’s been to be determined to be a ‘Soft 404’.

What’s a Soft 404 error?

Well, a REAL 404 error is a page not found. It’s a function of the web server software. Most web servers provide the ability to use a custom page when a 404 error is encountered. You can see an example of one here.

As for a ‘Soft 404’ … according to Google …

A soft 404 means that a URL on your site returns a page telling the user that the page does not exist and also a 200-level (success) code to the browser.

https://support.google.com/webmasters/answer/181708?hl=en

While some sites might actually do that … handle a page not found error with a friendly page but indicate to the browser that it’s a normal page (200 status code) … I suspect it’s actually a minority of sites (granted, it may be a way to game the system).

However … it turns out that pages that contain the words ‘not found’, ‘error’, ‘authorized’, ‘not allowed’, etc., in the title or body are often treated by Google as a soft 404 error … even if the page isn’t a 404 at all. Additionally, the words do not even need to appear on the page at all. The details of what constitutes a ‘soft 404’ are very mysterious.

Continue reading

Your Email Address

We all agree that email is crucial to modern life.

But what email should you use?

Everyone gets email when they sign up for high speed internet service … the problem is that you’re tied to that internet service for that email address. If you switch service providers, you could lose the address. Even worse, if your provider goes out of business, you could loose access entirely. Sometimes the email provider charges a fee for better service and/or removing advertising.

Yes, you could use Gmail, Hotmail, Yahoo, or AOL, but you’re still tired to the provider. Plus, you don’t often get to choose the best address (johnsmith5734563@xyz.com just isn’t that sexy).

Wouldn’t it be nice if you could get an email address that belongs to you forever?

Continue reading

Your ISP’s DHCP does not function properly

As part of my migration to the cloud, I terminated the Comcast Business internet service and switched to Xfinity internet.

When I initially signed up for the Xfinity service, I got their cable modem / router / wifi appliance. My plan was to get my own cable modem eventually because Xfinity charges $13 / month to lease the appliance.

I was at Best Buy and saw that cable modems weren’t expensive, so I decided to purchase a mid-level model (Netgear CM600) so I could save the lease fee. The CM600 would pay for itself in about 8 months.

It took a while to get setup … and there were a few false starts, but eventually I got it working connected directly to my MacBook.

I ran into a problem when I switched the CM600 over to my ASUS RT-5300 wifi router.

I kept getting the message “Your ISP’s DHCP does not function properly” on the ASUS network map page.

Continue reading

LetsEncrypt, Certbot, and Lightsail

Although not directly supported, it’s quite possible to use the LetsEncrypt certbot client on Amazon Lightsail Linux.

First of all … what is LetsEncrypt?

Let’s Encrypt is a free service that offers basic SSL certificates any web site.  The certificates are good for 90 days but can be renewed indefinitely. With the proper software, the installation & renewal of the certificates can be fully automated.

There are a few things to be aware of and workarounds that need to be done.

First, download the certbot-auto client itself…

Continue reading

mailbait redux

For anyone who runs a mailing list and has gotten pummeled recently with a rash of subscription attempts, they may be coming from mailbait.info.

A while ago I blogged on how to block mailbait, but it appears they have changed their host.

Their new host is ‘themailbait.bitbucket.org’.

I suggest you update your web server configuration to block any referrer that references the word ‘mailbait’ in the URL.

Here’s my new httpd.conf entry to block mailbait…

SetEnvIf Referer mailbait mailbait
Deny from env=mailbait

Adding Envelope Sender in sendmail

Fair warning: This post is pretty darn technical and is of little interest to people who don’t muck around with Linux and/or mail servers.

Recently I had a problem with someone on a midrange.com mailing list where they sent obvious spam.

The problem was, they were a subscriber to the list and had posted before … so the normal counter measures for that didn’t work (the first post for all subscribers are held until approved, to prevent people from subscribing, posting spam, and unsubscribing).

The puzzling thing about this was … the ‘from address’ on the message was not in the subscriber list.

Turns out that Mailman will accept message based on the FROM address of the message or the SENDER address (also known as the envelope-from).  The sender addressed is set by the sending mail server and is not normally in the body of the message.

After a bit of digging around, I figured out a way to add this information to the message headers so I can more easily diagnose the problem in the future.

Continue reading

Snow Days

Garage ViewAs you might have noticed (or heard), the Chicago area has had a bit of snow recently.

Lots of people are off work because the the streets are impassible.

Ginny’s home, there’s just no way she could have gotten to her office.

Due to this contraption they call “The Internet”, and an invention called “VPN“, I have no excuse to not work.

In fact, I’m pretty sure there isn’t anyone in our office right now.

Thunderbird message list out of sync

Sometimes I find that the message list in Thunderbird gets out of sync with the message bodies. When this happens, if I click on a message in the list, the message body that is brought up doesn’t match the subject.

I found a easy solution … just shut down Thunderbird, delete the corresponding .msf file from the accounts data directory, and start Thunderbird back up. Thunderbird will rebuild the .msf file and everything should be fine again.

To find accounts data directory, click on the “Server Settings” category of the effected account and look at the “Local directory” field.

[tags]thunderbird, mozilla, email[/tags]

Net Neutrality

Ok, I don’t watch the Daily Show all that much … but I usually get a good laugh when I do catch it.

I found this clip, however, on the internet that’s pretty darn funny … it corrects some of Ted Steven’s … um … ‘errors’ that he made when describing what the internet is.