Forum Moderators: Robert Charlton & goodroi


Corrected broken links - but Google WMT still reports "not found"


cssatsc

3:21 pm on Jun 28, 2009 (gmt 0)

10+ Year Member



I hope that I am posting in the right forum. If not, please direct me to the most appropriate one.

I am relatively new to Google's webmaster tools but in my troubleshooting efforts to find out why some of my site's pages don't get indexed, I discovered the "Crawl errors" section in Google's webmaster tools.

While all "Crawl errors" categories show 0 errors, the "Not found" category shows 7 errors.

I checked those and, sure enough, they were broken *external* links that someone had posted on various other websites. So I redirected them using Redirect 301 in .htaccess. Now, none of those 7 links generates a 404 "page not found" error.
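For reference, the .htaccess lines for that kind of fix look something like this (the paths here are made-up examples, not the actual URLs involved):

```apache
# Hypothetical examples -- each rule maps a misspelled or outdated
# external link to the page it was meant to reach.
Redirect 301 /producs/widget.html http://www.example.com/products/widget.html
Redirect 301 /about-us.htm http://www.example.com/about-us.html
```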

Still, Google's webmaster tools continues to report them as "not found", several days after the corrections have been made.

Any idea why?

How can I reach the ideal state of 0 "not found" errors?

Thank you.

tedster

5:33 pm on Jun 28, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



First, are you sure that googlebot has spidered those urls since they were fixed? But even if they were spidered, the Webmaster Tools reporting is often slow to respond.

In fact, it's not always practical to correct things for external links that result in a 404, so it's not a problem anyway. The report is there as more of an "FYI".

cssatsc

5:50 pm on Jun 28, 2009 (gmt 0)

10+ Year Member



First, are you sure that googlebot has spidered those urls since they were fixed?

No, I am not sure. I don't even know how to tell which URLs were spidered. How do I find out?

tedster

6:15 pm on Jun 28, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You find out from your site's server logs.

cssatsc

6:54 pm on Jun 28, 2009 (gmt 0)

10+ Year Member



You find out from your site's server logs

Thank you. I have numerous logs on my site's server. Which ones should I look at? (sorry for being clueless. I am still in the learning process)

I mostly use AWStats via cPanel, but I can log in via SSH and display any file in text mode. It's a Linux-based host.

Thanks.

tedster

6:59 pm on Jun 28, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



For the raw logs, check with your web host. It depends on how they set things up.

AWStats can be configured to show "Robots visits including search engines crawlers." Again this will depend on how it is configured by your web hosting service.

cssatsc

6:59 pm on Jun 28, 2009 (gmt 0)

10+ Year Member



I just found "Raw Access Logs" in the "Logs" section of cPanel. It looks like what you were referring to. I am going to examine it shortly.

cssatsc

7:19 pm on Jun 28, 2009 (gmt 0)

10+ Year Member



OK, I checked the raw access logs and unfortunately I was so clueless that I never knew I had to configure my site's account to archive the logs. So currently, only the one from today is available. It shows only three crawled URLs from Googlebot/2.1.
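In case it helps anyone else, a one-liner like this pulls the Googlebot requests out of a raw access log. The log line below is a made-up sample in the common Apache "combined" format, not one of my real entries:

```shell
# Real use: grep 'Googlebot' /path/to/access_log | awk '{print $7, $9}'
# Sample log line in Apache "combined" format (hypothetical entry):
line='66.249.66.1 - - [02/Jul/2009:10:00:00 +0000] "GET /old-page.html HTTP/1.1" 301 - "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
# Keep only Googlebot requests; print the requested URL and the status code:
echo "$line" | grep 'Googlebot' | awk '{print $7, $9}'
# prints: /old-page.html 301
```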

I turned on the archiving feature. I will have to wait a few days to start deriving meaningful information.

Thank you for your help so far. I already learned a lot from it.

cssatsc

8:24 pm on Jul 2, 2009 (gmt 0)

10+ Year Member



First, are you sure that googlebot has spidered those urls since they were fixed?

Yes. Now, after looking at the raw access logs, I can say that googlebot has spidered those urls since they were fixed.

The interesting thing is that WMT reports only 5 broken links (instead of the original 7). That's already an improvement.

However, although the previously broken links now lead to valid pages when clicked, WMT still reports them as 404 (Not found).

Any idea why?

Should I bother about this at all?

If I shouldn't bother, how do I know what makes Google stop crawling? (I had a product that was not found by Google for almost a month, and it only started showing up in Google's search results after I fixed those broken links.)

tedster

8:31 pm on Jul 2, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It's just a reporting time lag - once you've fixed it, just move on.

g1smd

2:38 am on Jul 3, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



They will return to check the status of those URLs from time to time, almost forever. It is always possible that the URL might come back into use at some point. Once you have your fix in place and verified it using Live HTTP Headers, you can forget it.
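Besides Live HTTP Headers, curl from a shell does the same check. A sketch (the URL and response below are placeholders; the real command is in the comment):

```shell
# Real check (substitute one of the formerly-broken URLs for the placeholder):
#   curl -sI "http://www.example.com/old-page.html" | head -n 1
# Simulated here with a canned response header:
response='HTTP/1.1 301 Moved Permanently'
# The second field of the status line is the HTTP status code:
echo "$response" | head -n 1 | awk '{print $2}'
# prints: 301 -- anything other than 301 means the redirect is not permanent
```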

Mrkay

10:12 am on Jul 6, 2009 (gmt 0)

10+ Year Member



Still, Google's webmaster tools continues to report them as "not found", several days after the corrections have been made.

Any idea why?

WMT strikes me as very slow to update itself. My WMT shows broken links that were repaired weeks ago, as well as duplicate meta descriptions that were repaired months ago.