URLs Questions about Fixing Crawl Errors

         

iteri

5:05 pm on Sep 14, 2011 (gmt 0)

10+ Year Member



I'm new at SEO. I am learning as much as I can on my own. I work for an ecommerce site.
In Google Webmaster Tools I see quite a few crawl errors and I want to fix them. For instance, one of the 404 URLs is for a product we no longer carry, and a couple of forums still link to it. One of these errors was first reported only last week, yet the forum post containing the link dates from 2006. I don't understand why it showed up so late. Also, what's the best way to correct this type of crawl error?

g1smd

5:47 pm on Sep 14, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



You have several options:

1. You know this page doesn't exist. Your site returns a 404 error to confirm that. Ignore it and move on.

2. You have incoming links that are going to waste. Create some content and put it online at that URL to recover some traffic and incoming link benefit.

3. You have incoming links to a nonexistent URL. Find another page on your site with similar content and install a 301 redirect to it to retain the traffic and some of the incoming link benefit.
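
The choice between options 1 and 3 can be sketched as a small lookup. This is only an illustration in Python, not how any particular shop platform does it; the URL paths, `REDIRECTS`, `LIVE_PAGES`, and `resolve` are all hypothetical names made up for the example.

```python
from typing import Optional, Tuple

# Hypothetical mapping of retired product URLs to the closest live replacement.
REDIRECTS = {
    "/products/old-widget.html": "/products/new-widget.html",
}

# Hypothetical set of URLs that currently exist on the site.
LIVE_PAGES = {"/products/new-widget.html"}


def resolve(path: str) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, redirect target or None) for a requested path."""
    if path in LIVE_PAGES:
        return 200, None
    target = REDIRECTS.get(path)
    if target is not None:
        # Option 3: permanent redirect to a page with similar content.
        return 301, target
    # Option 1: the page is gone and nothing similar exists, so the 404 stands.
    return 404, None


print(resolve("/products/old-widget.html"))
print(resolve("/products/never-existed.html"))
```

Only URLs with a genuinely similar replacement go into the mapping; everything else keeps returning 404.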

iteri

6:20 pm on Sep 14, 2011 (gmt 0)

10+ Year Member



I am guessing option #3 would be the most beneficial...

g1smd

8:21 pm on Sep 14, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



...but it depends on the site and the circumstances, and you're likely correct.

Sally Stitts

11:38 pm on Sep 14, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I have over 200 WMT-reported malformed incoming links, such as -
example.com/pagename.ht
example.com/pagename.htm.
example.com/pagename.htm..
example.com/pagename..
example.com/pagenam..
etc., etc.

Google can clearly see the malformed URLs. They KNOW that they are no good, and they KNOW that it is not my fault. I guess they show them to us only for FYI purposes, just in case we want to spend all our time on endless minutiae.

I just ignore them. It would take many hours to email the sites, and most would probably not respond anyway.

Addressing this through htaccess also does not appeal to me.
It is only 200 out of 35,000, about 1/2 of 1%, so who cares?
Am I wrong? Do you do something else?

tedster

5:01 am on Sep 15, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I only do something else if that malformed link is coming from a powerful page - and the first thing I'd try is writing to the website and asking them to fix it.

g1smd

6:13 am on Sep 15, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If you have incoming links with appended trailing junk, you can use a couple of simple .htaccess rules to strip that junk and redirect. It is often worthwhile doing so.
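
To make the "strip trailing junk" idea concrete, here is a rough Python sketch of the matching logic. On Apache the same pattern would normally live in a rewrite rule in .htaccess; this version just shows which requests would be cleaned up. It assumes every real page ends in .htm or .html, and `strip_trailing_junk` is a hypothetical helper name.

```python
import re
from typing import Optional

# Assumption: real pages end in .htm or .html, and anything after the
# extension that consists only of punctuation is junk added by a bad link.
_TRAILING_JUNK = re.compile(r"^(?P<clean>/.+?\.html?)(?P<junk>[^A-Za-z0-9/]+)$")


def strip_trailing_junk(path: str) -> Optional[str]:
    """Return the cleaned path to 301-redirect to, or None if nothing to strip."""
    match = _TRAILING_JUNK.match(path)
    return match.group("clean") if match else None


for requested in ("/pagename.htm.", "/pagename.htm..", "/pagename.htm", "/pagename.ht"):
    print(requested, "->", strip_trailing_junk(requested))
```

A request such as /pagename.ht returns None here because characters are missing rather than appended, so it is not something this pattern can repair.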

If there are characters missing from a URL request you'll require a much more complex solution (usually involving a database lookup) to fix it.
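
As a rough illustration of the lookup approach for truncated URLs, the Python sketch below uses the standard library's `difflib.get_close_matches` against a list of known paths. The list stands in for the database lookup, and `closest_page`, the sample paths, and the 0.85 cutoff are assumptions chosen for the example, not a recommended setting.

```python
import difflib
from typing import Iterable, Optional


def closest_page(path: str, known_paths: Iterable[str], cutoff: float = 0.85) -> Optional[str]:
    """Return the single known path that closely matches, or None.

    Redirecting to a wrong page is worse than a 404, so an ambiguous
    result (more than one close match) is treated as no match.
    """
    matches = difflib.get_close_matches(path, list(known_paths), n=2, cutoff=cutoff)
    return matches[0] if len(matches) == 1 else None


# Hypothetical list of live pages; a real site would query its database.
KNOWN = ["/pagename.htm", "/contact.htm", "/about.htm"]

print(closest_page("/pagename.ht", KNOWN))
print(closest_page("/zzz", KNOWN))
```

Returning None for ambiguous requests keeps the 404 in place whenever the intended page cannot be identified with reasonable confidence.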

tedster

6:26 am on Sep 15, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Here's a link to some amazingly thorough code that jdMorgan shared for handling a TON of URL issues in .htaccess:

A guide to fixing duplicate content & URL issues on Apache [webmasterworld.com]

lucy24

8:04 am on Sep 17, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I have over 200 WMT-reported malformed incoming links, such as -
example.com/pagename.ht
example.com/pagename.htm.
example.com/pagename.htm..

G### must have done some housekeeping recently because instead of the usual string of 404s (since corrected to 410 or 301) and server hiccups I got hit in the face with one of these and went to investigate.

Turns out the blasted thing isn't even a bad link; the actual link on the site is perfectly correct. (But should never have been seen by google, because it's one of those obscure search-result pages. Site's fault, not google's.) What they're reading as a faulty url is a truncated-for-space header line:

<span class="Heading-Sponsors">
www.example.com/directory/filename.ht...</span>

A format g### should certainly recognize, since they do the exact same thing themselves except that theirs is called <cite>.

If we are now supposed to be concerned about errors that don't exist in the first place...

g1smd

8:13 am on Sep 17, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



In Google WMT I have
www.example.com/$1
listed as being 404, with almost every page of the site cited as linking to it.

After pulling out chunks of hair looking for some sort of major .htaccess screw-up, I eventually realised they are getting this "URL" from a chunk of JavaScript at the top of the HTML page.

iteri

3:15 pm on Sep 19, 2011 (gmt 0)

10+ Year Member



I can't find the source of some of the URL errors that GWMT lists. Guess I'll have to keep digging deeper....

lucy24

4:26 pm on Sep 19, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yes, I love it when they complain about a nonexistent page and then put "unavailable" in the Links column. Doesn't give you a lot to work on, does it? You'd have to check for it in your logs and see if any human ever tries it-- and hope they don't have referers blocked.
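
Checking the logs for who requests a missing URL could look something like the Python sketch below. It assumes the server writes the common "combined" access log format; the sample lines, addresses, and the `referrers_for_404s` name are invented for illustration. A referer of "-" means none was sent.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable

# Assumption: "combined" log format, i.e.
# host ident user [time] "METHOD path protocol" status bytes "referer" "agent"
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] '
    r'"\S+ (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)"'
)


def referrers_for_404s(lines: Iterable[str]) -> Dict[str, Counter]:
    """Map each path that returned 404 to a count of the referers that sent it."""
    hits: Dict[str, Counter] = defaultdict(Counter)
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group("status") == "404":
            hits[match.group("path")][match.group("referer")] += 1
    return dict(hits)


# Made-up sample lines standing in for a real access log.
SAMPLE = [
    '203.0.113.5 - - [19/Sep/2011:16:26:00 +0000] "GET /pagename.htm. HTTP/1.1" 404 512 "http://forum.example.org/thread/42" "Mozilla/5.0"',
    '203.0.113.9 - - [19/Sep/2011:16:27:10 +0000] "GET /pagename.htm. HTTP/1.1" 404 512 "-" "Mozilla/5.0"',
    '203.0.113.7 - - [19/Sep/2011:16:28:30 +0000] "GET /index.htm HTTP/1.1" 200 2048 "-" "Mozilla/5.0"',
]

for path, referers in referrers_for_404s(SAMPLE).items():
    print(path, dict(referers))
```

To run it against a real log, pass an open file object instead of `SAMPLE`; the user-agent field is the next place to look when deciding whether a hit came from a person or a crawler.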