Google SEO News and Discussion Forum

    
Crawl Errors in Google Webmaster Tools
jo3y




msg:4507546
 5:17 pm on Oct 12, 2012 (gmt 0)

Hi everyone,

I am having a crawl error issue with Google Webmaster Tools. I have researched and tested, and I cannot find the problem. It is reporting over 1000 crawl errors and climbing.

Google Webmaster Tools reports tons of pages that do not exist due to a wrong path. When I go to the page listed as having the bad link, everything looks correct. I even went as far as making all the links on the entire site absolute links (full URL), and I still get the errors. Here is what it's reporting:

Pages like this that do not exist (wrong path):

http://www.example.com/details/B7427/parts/5/images/Claimer%202011%20Front.pdf

The proper directory for that file is:

http://www.example.com/images/Claimer%202011%20Front.pdf

So it seems the link is being treated as a relative path, because here is the URL that is reported as containing the bad link:

http://www.example.com/details/B7427/parts/5/Ford.html

All the links on the page above (the page that allegedly has the bad links) are absolute links. I am at a loss.

The other possibility is that all my pages run through index.php with GET variables directing the content, so I have rewrite rules in place for more user-friendly URLs. Could my rewrite rules be the cause of this? Here are my htaccess rewrite rules:



RewriteEngine On
RewriteRule ^([^/]*)\.html$ /index.php?content=$1 [L]
RewriteRule ^([^/]*)/([^/]*)\.html$ /index.php?content=$1&manufacturer=$2 [L]
RewriteRule ^([^/]*)/([^/]*)/machine%20shop/\.html$ /index.php?content=$1&service=$2 [L]
RewriteRule ^([^/]*)/([^/]*)/([^/]*)\.html$ /index.php?content=$1&product=$2&service=$3 [L]
RewriteRule ^([^/]*)/([^/]*)/([^/]*)/([^/]*)\.html$ /index.php?content=$1&MID=$2&category=$3&manufacturer=$4 [L]
RewriteRule ^([^/]*)/([^/]*)/([^/]*)/([^/]*)/([^/]*)\.html$ /index.php?content=$1&part=$2&category=$3&MID=$4&manufacturer=$5 [L]


Thank you guys in advance for any advice/assistance on this issue. Even though crawl errors don't play too big of a role in a site's ranking and performance, this seems like something is wrong and needs to be fixed.

Thanks again,

-Joey

[edited by: Robert_Charlton at 7:51 pm (utc) on Oct 12, 2012]
[edit reason] examplified domain [/edit]

 

aakk9999




msg:4507592
 8:01 pm on Oct 12, 2012 (gmt 0)

Mods will soon remove the URL in your post - you should have used example.com.
And yes, mods have removed the URL whilst I was typing the response...

Anyway, with regards to your question:

Firstly, it would be good to crawl your site with a tool such as Xenu's Link Sleuth and then inspect the crawl results to confirm that there is indeed no such link on your site.

If the crawl does not find anything, then the next thing to check with this kind of error is whether the URL is constructed somewhere via JavaScript, and if so, whether the constructed URL is missing the leading slash from the root. JavaScript is often missed when checking for URLs that should not be generated.

It is also possible that the link existed in the past and that you have fixed it since, but Google is still requesting it. If Google has not re-crawled the page that had the incorrect link (the page that you fixed), it will think the link is still out there on the web, and in my experience these links get crawled more often and reported as errors in WMT.

And then it is also possible that someone linked to that page whilst the link was not in the correct format, and therefore such a link exists somewhere else on the web - and Google will keep retrying it and reporting the error.

Once you fix the error (which it appears you have), and the fixed page has been re-crawled by Google, I would mark the WMT error as "Fixed". In my experience, once the page that returns 404 is not linked from anywhere, and you have cleared the error in WMT via "Fixed", the error for this URL will not appear again in the WMT error report (this may take a while and may need several cycles of marking the URL as "Fixed"). But eventually Google will drop this error from the WMT errors report.

It is also worth knowing that if there is another page (on your domain or on an external domain) that links to your page using that old incorrect URL format, then the 404 error will re-appear in WMT even if you mark it as "Fixed" (in which case you will probably see which page links to it, i.e. where Google found it).

However, if the error reported in WMT is a 404, I would not worry as long as your site is not linking to such a URL. The 404 report in WMT is useful exactly for this reason - to check whether your own site is inadvertently linking to an incorrect URL format. Otherwise, just ignore the error.

g1smd




msg:4507604
 8:33 pm on Oct 12, 2012 (gmt 0)

If internal links begin with a leading slash (or begin with protocol and hostname) then you have "fixed" the problem.

Google will request the duff URLs forever. Make sure they return 404 and then move on. The frequency of requests will diminish over time.

The * in ^([^/]*)/([^/]*)/([^/]*)/([^/]*)\.html$ allows a request for example.com////.html to be considered valid and be rewritten (with blank parameter values passed to the PHP script). The * should be replaced with a + in each case.

The final [^/] in each rule should be [^/.] as you're looking to stop at the period before the extension, no longer looking only for folder slashes.
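
As a rough sketch of the two changes above applied to the rules from the first post (not tested against the real site): every * becomes +, and the last character class in each pattern becomes [^/.]. The parameter names are copied from the original rules. The machine%20shop rule is left out here because its intended URL is not clear from the post.

```apache
RewriteEngine On

# One path segment: /name.html
RewriteRule ^([^/.]+)\.html$ /index.php?content=$1 [L]

# Two segments: /a/name.html
RewriteRule ^([^/]+)/([^/.]+)\.html$ /index.php?content=$1&manufacturer=$2 [L]

# Three segments: /a/b/name.html
RewriteRule ^([^/]+)/([^/]+)/([^/.]+)\.html$ /index.php?content=$1&product=$2&service=$3 [L]

# Four segments: /a/b/c/name.html
RewriteRule ^([^/]+)/([^/]+)/([^/]+)/([^/.]+)\.html$ /index.php?content=$1&MID=$2&category=$3&manufacturer=$4 [L]

# Five segments: /a/b/c/d/name.html
RewriteRule ^([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/.]+)\.html$ /index.php?content=$1&part=$2&category=$3&MID=$4&manufacturer=$5 [L]
```

With + in place of *, a request such as example.com////.html no longer matches any rule, because each segment must contain at least one character. One side effect to be aware of: with [^/.] as the final class, a page name that itself contains a period (for example a hypothetical v1.2.html) would stop matching.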
jo3y




msg:4507632
 10:33 pm on Oct 12, 2012 (gmt 0)

Thank you guys very much. I think the consensus is to wait it out and see if they diminish. All the links are absolute, verified through viewing the source, so we will see! Also thank you for the mod rewrite advice. I made the changes.

You guys have always rocked! :D

g1smd




msg:4507640
 10:51 pm on Oct 12, 2012 (gmt 0)

This also looks problematical...

/machine%20shop/\.html$

Not only the space, but the slash immediately before the .html extension.
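
A possible cleaned-up form of that rule, as a sketch only. It assumes the intended public URL is /content/service/machine-shop.html, with a hyphen replacing the space and no slash before the extension; that URL shape is a guess, and any links to the page would need to change to match. Note also that mod_rewrite matches the pattern against the %-decoded URL path, so a literal %20 in a pattern would not match a request containing an encoded space.

```apache
# Hypothetical URL shape: /<content>/<service>/machine-shop.html
# Keep this above the generic three-segment rule, which would otherwise match first.
RewriteRule ^([^/]+)/([^/]+)/machine-shop\.html$ /index.php?content=$1&service=$2 [L]
```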
