what type of errors are you getting?
Thanks for stopping by :)
It shows some weird URLs which are not part of my site. I strongly believe this is because of a malfunctioning CMS.
These errors were tagged as 'Not followed' errors by Google Webmaster Tools.
When I mouse over it, it says 'These URLs have active content or there might be a problem with the redirects'. When I click on 'More info', it goes to this Google page: https://support.google.com/webmasters/bin/answer.py?hl=en&answer=2409684
Now, coming to the URLs reported as having problems:
I see all of them with a response code of 301. And the URLs look something like example.com/errordirectory/actualpage?page1/page2/page4/page45 blah blah
But when I click on a URL, it goes to the clean version of the URL.
Please help me fix these errors more effectively.
|I see all of them with a response code of 301. And the URLs look something like example.com/errordirectory/actualpage?page1/page2/page4/page45 blah blah |
"you see?" or "google reports?"
have you tried those URLs in GWT's "fetch as googlebot"?
|But when I click on a URL, it goes to the clean version of the URL. |
what response code do YOU get when you request the reported URLs?
if you get a 301 what does the Location: URL look like?
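A browser follows redirects silently, so it can't answer either question. One way to see the status code and the Location: header of every hop is to request the URL without following redirects. Below is a minimal sketch in Python, assuming plain HTTP/HTTPS URLs; `fetch_once` and `trace_redirects` are illustrative names, not part of GWT or any tool mentioned in this thread.

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = {301, 302, 303, 307, 308}


def fetch_once(url, user_agent="redirect-check/0.1"):
    """Request `url` once, without following redirects.

    Returns (status_code, location_header_or_None).
    """
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path, headers={"User-Agent": user_agent})
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def trace_redirects(url, fetch=fetch_once, max_hops=10):
    """Follow redirects by hand; return a list of (url, status, location)."""
    hops = []
    for _ in range(max_hops):
        status, location = fetch(url)
        hops.append((url, status, location))
        if status not in REDIRECT_CODES or not location:
            break
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
    return hops
```

Calling `trace_redirects("http://example.com/some/reported/url")` and printing each tuple shows the number of hops, whether each one is a 301, and whether any Location: points at another hostname. Some servers answer HEAD differently from GET, so it may be worth switching the method to "GET" if the results look odd.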
Thanks, and sorry for the delayed reply.
|"you see?" or "google reports?" |
This is what happens.
I am logged into my GWT account. I clicked on one of my website profiles and got to its dashboard.
When I go to 'Health' -> 'Crawl errors', there are these tabs: 'Server error', 'Soft 404', 'Access denied', 'Not found', 'Not followed', 'Others'.
I know about all the other tabs except this 'Not followed' one. When I mouse over that tab I see around 35k URLs reported as 'Not followed'. As you know, Google only shows 1000 of them on that page.
There are 3 columns in that page: URL, Response Code, Detected.
All the URLs (1000) appear with a response code of 301.
Now when I click on a URL, another window opens with tabs such as 'Error details', 'In sitemaps' and 'Linked from', and above them all, the complete URL.
It also shows a message as follows:
|There was a problem with active content or redirects. More info. |
Google couldn't follow your URL because it redirected too many times.
But when I check personally, there aren't multiple redirects, only a single redirect.
Among those 1000 URLs reported as 301, there are URLs that don't exist on my site. I mean they look more like real URLs with parameters appended.
If the actual URL is example.com/service/review.aspx, the reported URL is example.com/service/review.aspx?~99566565/page2/page3/page4=
When I copy it and paste it into any web browser, it redirects 2 times and finally lands on a clean short URL. At the same time, when I do 'Fetch as Google' it gets a 'Success' status. How come Google reports a URL as 301 when 'Fetch as Google' shows 'Success' for it?!? But the same URL is redirected to another URL when I copy it and paste it into a web browser?
Thanks for helping me out.
have you used a header checker to verify it is only 2 redirects?
are they both 301s?
does one go through another domain or hostname?
why isn't it 1 redirect?
i would be surprised to hear google doesn't follow 2 redirects unless there is some reason not to trust one of them.
after you tried fetch as googlebot did you use Submit to Google index?
it's not uncommon for GWT to report errors incorrectly.
Thanks for the quick response!
I used SEOBook's http status checker and yes, it's only 2 redirects. And both are 301.
Nope, the redirect happens in the same domain. I am analyzing the redirect chain to make it 1 redirect. But still, doesn't Google follow a URL with 2 redirects? Google says they accept up to 2 redirects.
I didn't use Submit to Google index. It's not an actual URL, so I didn't do that.
|it's not uncommon for GWT to report errors incorrectly. |
|Can I just go ahead and add 'Disallow: /directoryname/' in my robots.txt file? Will you suggest that? Will that stop Google from accessing that particular directory when it arrives for crawling the next time? So eventually, I won't be seeing those errors anymore right? |
Won't do any good now. Google never forgets an URL. The URLs would simply be shifted from the "not followed" tab to the "blocked by robots.txt" tab.
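For what it's worth, the rule itself would do what the question assumes: it keeps a compliant crawler out of that directory. It just doesn't make already-known URLs disappear from the report. Here is a small sketch for checking a rule before deploying it, using Python's standard `urllib.robotparser`. `/directoryname/` is the placeholder from the question, the example.com URLs are made up, and Python's parser is not Google's, so treat the result as a sanity check only.

```python
from urllib.robotparser import RobotFileParser

# The rule from the question, applied to every crawler.
ROBOTS_TXT = """\
User-agent: *
Disallow: /directoryname/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if `user_agent` may fetch `url` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A URL inside the disallowed directory and one outside it.
blocked = is_allowed(ROBOTS_TXT, "Googlebot",
                     "http://example.com/directoryname/page")
allowed = is_allowed(ROBOTS_TXT, "Googlebot",
                     "http://example.com/service/review.aspx")
```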
|'server error' 'soft 404' 'Access denied' 'Not found' 'Not followed' 'Others' |
Gosh. Some of those I've never even seen.
|But still, doesn't Google follow a URL with 2 redirects? Google says they accept up to 2 redirects. |
It's got to be at least three. Mechanical redirects alone can be two separate steps (with/without www and directory-slash) if you've got a sloppy host and/or carelessly written htaccess. Add one more if you throw a "real" redirect into the mix. If they excluded all sites that weren't optimally coded, all results for all queries would drop into the triple digits :)
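To make the stacking concrete: if the hostname fix, the directory-slash fix and the "real" redirect each fire as their own rule, one request can cost three 301s. The usual cure is to work out the final URL in one pass and redirect once. Here is a rough sketch of the idea in Python. The canonical host `www.example.com`, the set of directory paths, and the rule that drops the whole query string are all assumptions for illustration, not anything known about the site in question.

```python
from urllib.parse import urlsplit, urlunsplit

CANONICAL_HOST = "www.example.com"   # assumption: the preferred hostname
DIRECTORY_PATHS = {"/service"}       # assumption: paths needing a trailing slash


def canonical_url(url):
    """Apply every normalisation at once and return the final URL."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if path in DIRECTORY_PATHS:
        path += "/"
    # Assumption: canonical pages take no query string, so it is dropped.
    return urlunsplit((parts.scheme, CANONICAL_HOST, path, "", ""))


def redirect_target(url):
    """Return the single 301 target, or None if `url` is already canonical."""
    target = canonical_url(url)
    return None if target == url else target
```

The point of the design is that the server answers with `redirect_target(url)` directly, instead of letting each rule issue its own redirect and re-enter the chain.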
|But the same URL is redirected to another URL when I copy it and paste it into a web browser? |
Do you have any rules about handling requests in different ways depending on the referer? Or cookies? Almost all search engines come in with no referer-- and of course no cookies. But some of your pages probably expect people to be coming in from another page, or with some kind of background.
:: idly wondering about disparity between "1 error/25 attempts" robots.txt listed in WMT under robots.txt fetch, and 0 errors/1 attempt shown in logs for same date ::
:: not-so-idle wondering about enormous number of different Googlebot-Mobile from correct IP ::
Whoops. Sorry. I'm outta here.
you need to find out where google discovered those urls.
check your logs for referred traffic to those urls.
use link intelligence tools.
you may be able to solve the url problem at the source.
or maybe you'll decide to 404 those urls.
otherwise i would fix the redirect to make the canonical request in one hop.
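The log check can be scripted. Below is a rough sketch in Python, assuming the server writes the Apache/nginx "combined" log format (a site serving .aspx pages from IIS will likely need a different pattern) and that the bad URLs can be recognised by a query string starting with `~`, as in the example earlier in the thread. The function names are made up for illustration.

```python
import re
from collections import Counter

# Matches the request, status, referer and user-agent fields of a
# combined-format access log line.
COMBINED = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def has_junk_query(path):
    """Assumed marker for the reported URLs: a query starting with '~'."""
    return "?~" in path


def referers_for(lines, is_suspect=has_junk_query):
    """Count referers of requests whose path looks like a reported URL."""
    counts = Counter()
    for line in lines:
        match = COMBINED.search(line)
        if match and is_suspect(match.group("path")):
            counts[match.group("referer")] += 1
    return counts
```

Running `referers_for(open("access.log"))` and looking at `.most_common(20)` shows which pages, if any, are sending visitors to those URLs. A referer of "-" means the request arrived with none, which is what crawler hits normally look like.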
Your answers are always very complicated to understand, but I like them very much :P
Do you mean something like the Majestic SEO tools? There are many link intelligence tools; which one should I try for this problem?
Btw, thank you so much for your other suggestions.