| 5:29 pm on Jan 21, 2014 (gmt 0)|
Incorporating the link rel="canonical" tag into your pages should clean this up.
Google's support and Matt Cutts on the topic
| 6:09 pm on Jan 21, 2014 (gmt 0)|
You have a relative URL in an href or src on your soft 404 page. It's causing Google to infinitely crawl 404 pages.
Make sure all of your relative URLs start with a '/' in hrefs and srcs, or add a <base> tag to <head>.
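To see why a relative href can trap a crawler, here is a minimal sketch using Python's standard `urljoin`. The starting URL and the href `dir1/page.htm` are made up for illustration, and it assumes the soft 404 page serves the same link for every URL requested.

```python
from urllib.parse import urljoin

def crawl_chain(start, href, steps):
    """Follow the same href repeatedly, resolving it against each new URL,
    the way a crawler would if every URL returned the same soft 404 page."""
    urls = [start]
    for _ in range(steps):
        urls.append(urljoin(urls[-1], href))
    return urls

# Hypothetical link on the soft 404 page, written two ways.
relative = crawl_chain("http://example.com/missing/", "dir1/page.htm", 3)
rooted = crawl_chain("http://example.com/missing/", "/dir1/page.htm", 3)

for url in relative:
    print("relative:", url)   # the path gains another dir1/ on every hop
for url in rooted:
    print("rooted:  ", url)   # settles on one URL after the first hop
```

With the leading '/', every hop resolves to the same URL, so there is nothing new for the crawler to discover.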
| 8:40 pm on Jan 21, 2014 (gmt 0)|
|Incorporating the link rel="canonical" tag into your pages should clean this up. |
I don't imagine this will stop Googlebot wasting time hammering the site for those nonexistent URLs, which is more important to me than the ranking of that one page.
|You have a relative URL in an href or src on your soft 404 page. It's causing Google to infinitely crawl 404 pages |
No. The URLs now resolve to a generic hard 404 "page not found". The URLs listed in the 'linked from' tab in WMT return a fetch status of "not found" when Fetch as Google is used. I've set up the redirect incorrectly. All the pages should be resolving to example.com/a-zIndex.htm. Instead, I am seeing something like:
example.com/a-zIndex.htmExampleDirectory1/ExampleDirectory36/ExampleDirectory8/ExampleDirectory65/anotherPage.htm which returns the 404. I would expect them all to gradually disappear as the links to them are removed, but I am seeing the opposite: it's as if the redirect is not working for Googlebot.
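One way a URL of that shape can arise is a prefix-matching redirect, which appends whatever follows the matched prefix to the target. This is only a guess, since the actual rule isn't shown here. The sketch below mimics that behaviour; the function names and the prefix "/" are hypothetical, not the real server configuration.

```python
def prefix_redirect(request_path, prefix, target):
    """Mimic a prefix-matching redirect: the part of the path after the
    matched prefix is appended to the target unchanged."""
    if not request_path.startswith(prefix):
        return None
    return target + request_path[len(prefix):]

def fixed_redirect(request_path, prefix, target):
    """Send every matching request to the target and discard the rest."""
    if not request_path.startswith(prefix):
        return None
    return target

old = "/ExampleDirectory1/ExampleDirectory36/anotherPage.htm"
target = "http://example.com/a-zIndex.htm"

print(prefix_redirect(old, "/", target))  # remainder glued onto a-zIndex.htm
print(fixed_redirect(old, "/", target))   # always the index page itself
```

If that is what is happening, the fix is a rule that matches the old paths with a pattern and redirects to the fixed target without carrying the remainder along.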
| 11:11 pm on Jan 21, 2014 (gmt 0)|
A few months back there was a thread in the Apache subforum started by someone who wanted to screen out every possible type of bad request, whether or not they'd ever happened. One category of "things you don't need unless you need them" is the AcceptPathInfo setting. Depending on how .html files are handled, anything appended to a URL after .html may simply be ignored, so everything resolves. This is not a problem ... until the day someone asks for such a bogus URL. At that point, you need to set up a global redirect from the URL with the extra path to the page URL alone. Exact formulation will depend on whether you're on Apache or IIS. There's no need to constrain it to Googlebot; you want to redirect everyone, so checking a condition is needless work for the server.
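As a rough illustration of the matching logic only (not Apache or IIS syntax), here is a sketch in Python. The pattern and the function name `strip_path_info` are assumptions for the example; a real rule would be written in the server's own rewrite language.

```python
import re

# Capture everything up to and including the first ".htm" or ".html"
# that is immediately followed by a slash and (possibly empty) extra path.
_PATH_INFO = re.compile(r"^(/.*?\.html?)/.*$", re.IGNORECASE)

def strip_path_info(path):
    """Return the page URL alone if the request carries trailing path
    info, or None if the request needs no redirect."""
    match = _PATH_INFO.match(path)
    return match.group(1) if match else None

for requested in ("/a-zIndex.htm/ExampleDirectory1/anotherPage.htm",
                  "/dir/page.html/",
                  "/a-zIndex.htm"):
    print(requested, "->", strip_path_info(requested))
```

Note that the rule looks only at the requested path, never at the user agent, so every visitor gets the same redirect.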
| 7:39 pm on Jan 22, 2014 (gmt 0)|
I converted our website to HTML5 (with pages ending in .html) and still see "Not found 404" in GWT for old pages that ended in .htm, in spite of the fact that I used a redirect. So, as I understand it, no matter what we do, G reports a Not found 404 error for pages that no longer exist.
| 10:54 pm on Jan 22, 2014 (gmt 0)|
Do you mean that every single page that used to be .htm is now .html? If so, then any reported 404 means the redirect isn't taking place.
I suppose it's too late to point out that there was no earthly reason to change either your visible URLs or the physical file extensions :(