Sitemaps, Meta Data, and robots.txt Forum

    
Crawl Errors - How to manage them effectively?
shaunm
msg:4517484
11:12 am on Nov 9, 2012 (gmt 0)

Hi,

I am getting a lot of errors from a directory on one of my websites. The pages in this directory are not indexed, but the errors keep showing up in the Crawl section of Google Webmaster Tools.

What do I do now?

Can I just go ahead and add 'Disallow: /directoryname/' to my robots.txt file? Would you suggest that? Will that stop Google from accessing that particular directory the next time it comes to crawl, so that eventually I won't see those errors anymore?
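Something like this is what I mean (just a sketch; /directoryname/ stands in for the real directory):

    User-agent: *
    Disallow: /directoryname/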


Thanks!

 

phranque
msg:4517485
11:23 am on Nov 9, 2012 (gmt 0)

what type of errors are you getting?

shaunm
msg:4517487
11:34 am on Nov 9, 2012 (gmt 0)

@phranque
Thanks for stopping by :)

It shows some weird URLs which are not part of my site, but I strongly suspect they come from the CMS malfunctioning.

These errors are tagged as 'not followed' by Google Webmaster Tools.

When I mouse over one it says 'These URLs have active content or there might be a problem with the redirects'. When I click on 'more info' it goes to this Google page: https://support.google.com/webmasters/bin/answer.py?hl=en&answer=2409684

Now, coming to the URLs reported as having problems:

I see all of them with a response code of 301, and the URLs look something like example.com/errordirectory/actualpage?page1/page2/page4/page45 and so on.

But when I click on one of the URLs, it goes to its clean version.

Please help me fix these errors more effectively.

Thanks

phranque
msg:4517502
12:50 pm on Nov 9, 2012 (gmt 0)

I see all of them with a response code of 301, and the URLs look something like example.com/errordirectory/actualpage?page1/page2/page4/page45 and so on.

"you see?" or "google reports?"
have you tried those URLs in GWT's "fetch as googlebot"?

But when I click on one of the URLs, it goes to its clean version.

"their"?
whose?
what response code do YOU get when you request the reported URLs?
if you get a 301 what does the Location: URL look like?
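for example, a quick check along these lines (a sketch in python with the requests library; the URL is just a placeholder):

    import requests

    # request the reported URL without following redirects,
    # so the raw response code and Location header are visible
    url = "http://example.com/errordirectory/actualpage?page1/page2/page4/page45"
    r = requests.get(url, allow_redirects=False)
    print(r.status_code)              # e.g. 301
    print(r.headers.get("Location"))  # where the redirect points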

shaunm
msg:4519607
6:59 am on Nov 16, 2012 (gmt 0)

@phranque
Thanks, and sorry for the delayed reply.

"you see?" or "google reports?"

This is what happens:
I am logged into my GWT account. I click on one of my websites' profiles and get to its dashboard.

When I go to 'Health' -> 'Crawl errors', there are these tabs: 'Server error', 'Soft 404', 'Access denied', 'Not found', 'Not followed', 'Others'.

I know about all the other tabs except this 'Not followed' thing. When I go to that tab I see around 35k URLs reported as 'Not followed'. As you know, Google only shows 1,000 of them on that page.

There are 3 columns on that page: URL, Response Code, Detected.

All 1,000 listed URLs appear with a response code of 301.

Now when I click on a URL, another window opens with tabs such as 'Error details', 'In sitemaps', and 'Linked from', and above them, the complete URL.

It also shows a message as follows:
There was a problem with active content or redirects. More info.
Google couldn't follow your URL because it redirected too many times.


But when I check personally, there aren't multiple redirects, only a single one.

Among those 1,000 URLs reported as 301, there are URLs that don't exist on my site; they look more like URL parameters.

If the actual URL is example.com/service/review.aspx, the reported URL is example.com/service/review.aspx?~99566565/page2/page3/page4=

When I copy one and paste it into a web browser, it redirects twice and finally lands on a clean, short URL. At the same time, when I do 'Fetch as Google' it gets a 'Success' status. How come Google reports a URL as 301, and 'Fetch as Google' shows 'Success', while the same URL redirects to another URL when I paste it into a web browser?

Thanks for helping me out.

phranque
msg:4519612
7:10 am on Nov 16, 2012 (gmt 0)

have you used a header checker to verify it is only 2 redirects?
are they both 301s?
does one go through another domain or hostname?
why isn't it 1 redirect?
=8)

i would be surprised to hear google doesn't follow 2 redirects unless there is some reason not to trust one of them.
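if you want to trace the whole chain yourself rather than rely on a web tool, a rough sketch (python with the requests library, using the example URL from above):

    import requests

    # follow the redirect chain one hop at a time, printing each 3xx hop
    url = "http://example.com/service/review.aspx?~99566565/page2/page3/page4="
    for hop in range(10):  # safety cap in case of a redirect loop
        r = requests.get(url, allow_redirects=False)
        if r.status_code not in (301, 302, 303, 307, 308):
            break
        url = requests.compat.urljoin(url, r.headers["Location"])
        print(hop + 1, r.status_code, "->", url)
    print("final:", r.status_code, url)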

after you tried fetch as googlebot did you use Submit to Google index?
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=158587 [support.google.com]

it's not uncommon for GWT to report errors incorrectly.

shaunm
msg:4519614
7:17 am on Nov 16, 2012 (gmt 0)

Thanks for the quick response!

I used SEOBook's HTTP status checker and yes, it's only 2 redirects, and both are 301s.

Nope, the redirects happen on the same domain. I am analyzing the redirect chain to reduce it to a single redirect. But still, doesn't Google follow a URL with 2 redirects? Google says they accept up to 2 redirects.

I didn't use Submit to Google Index, since it's not an actual URL.

it's not uncommon for GWT to report errors incorrectly.

:))))

lucy24
msg:4519623
7:59 am on Nov 16, 2012 (gmt 0)

Can I just go ahead and add 'Disallow: /directoryname/' to my robots.txt file? Would you suggest that? Will that stop Google from accessing that particular directory the next time it comes to crawl, so that eventually I won't see those errors anymore?

Won't do any good now. Google never forgets a URL. The URLs would simply be shifted from the "not followed" tab to the "blocked by robots.txt" tab.

'server error' 'soft 404' 'Access denied' 'Not found' 'Not followed' 'Others'

Gosh. Some of those I've never even seen.

But still, doesn't Google follow a URl with 2 redirects? Google says they accept up to 2 redirects.

It's got to be at least three. Mechanical redirects alone can be two separate steps (with/without www and directory-slash) if you've got a sloppy host and/or carelessly written htaccess. Add one more if you throw a "real" redirect into the mix. If they excluded all sites that weren't optimally coded, all results for all queries would drop into the triple digits :)

But the same URL redirects to another URL when I paste it into a web browser?

Do you have any rules about handling requests in different ways depending on the referer? Or cookies? Almost all search engines come in with no referer-- and of course no cookies. But some of your pages probably expect people to be coming in from another page, or with some kind of background.
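If you want to check, compare the two kinds of request directly. A sketch in Python with the requests library (the URL and referer are placeholders):

    import requests

    # crawlers arrive with no referer and no cookies,
    # so compare a bare request against a referred one
    url = "http://example.com/service/review.aspx"
    bare = requests.get(url, allow_redirects=False)
    referred = requests.get(url, allow_redirects=False,
                            headers={"Referer": "http://example.com/"})
    print("no referer:  ", bare.status_code)
    print("with referer:", referred.status_code)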

:: idly wondering about disparity between "1 error/25 attempts" robots.txt listed in WMT under robots.txt fetch, and 0 errors/1 attempt shown in logs for same date ::

:: not-so-idle wondering about enormous number of different Googlebot-Mobile from correct IP ::

Whoops. Sorry. I'm outta here.

phranque
msg:4519632
8:45 am on Nov 16, 2012 (gmt 0)

you need to find out where google discovered those urls.
check your logs for referred traffic to those urls.
use link intelligence tools.
you may be able to solve the url problem at the source.
or maybe you'll decide to 404 those urls.
otherwise i would fix the redirect to make the canonical request in one hop.
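for example, in apache something along these lines sends any non-canonical hostname to the canonical URL in a single 301 (a sketch only; the host is a placeholder to adapt to your setup):

    RewriteEngine On
    # any request on a non-canonical hostname goes straight
    # to the same path on the canonical host, in one hop
    RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
    RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]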

shaunm
msg:4519650
10:11 am on Nov 16, 2012 (gmt 0)

@lucy24,
Thanks!
Your answers are always complicated to understand, but I like them very much :P

@phranque
Do you mean tools like Majestic SEO? There are many link intelligence tools; which one should I try for this problem?
Btw, thank you so much for your other suggestions.
