Forum Moderators: open
Does this mean I've inadvertently banned Googlebot from checking my site?
64.68.82.74 - - [19/Nov/2002:05:00:58 -0800] "GET /robots.txt HTTP/1.0" 403 208 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
I've been tweaking my .htaccess file, but have no idea what I could have done so wrong.
I should point out that Inktomi and the others are doing fine from what I can see.
I appreciate any help on this. Don't want to lose that PR, 'ya know...
Thank You.
Pendanticist.
Even if you ban the GoogleBot, it will still come back to check whether the ban is still in place.
But I would say wait up to 2.5 weeks, and if it hasn't spidered any pages of your site by then, something is probably wrong.
Could this have been the culprit?
RewriteCond %{HTTP_USER_AGENT} bot [NC,OR]
It was contained in my .htaccess file until I saw those 403s.
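If I'm reading that condition right, the problem is that `bot` is an unanchored substring match, so it catches any user agent containing "bot", including Googlebot/2.1 (and msnbot, for that matter). A safer sketch would name the specific offenders instead; the bot names below are taken from the ones mentioned later in this thread, and the exact list is of course your call:

```apache
# Match specific bad bots by name rather than the generic "bot"
# substring, which also matches Googlebot, msnbot, etc.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} EasyDL [NC,OR]
RewriteCond %{HTTP_USER_AGENT} grub-client [NC,OR]
RewriteCond %{HTTP_USER_AGENT} URL_Spider_Pro [NC]
RewriteRule .* - [F]
```

Note the last RewriteCond deliberately drops the [OR] flag; leaving [OR] on the final condition would make the rule fire for every request. The [F] flag is what returns the 403.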
My robots.txt hasn't been modified for a long, long time, whereas .htaccess has been an ongoing tweak for the last few days.
GoogleBot visits my pages fairly frequently as a result of my ongoing link maintenance. (In time I want to do that "Recent" thing I read about in other posts. You know, the one that saves GB some time by re-spidering only those files which have recently been modified.)
Yes jomaxx, it is the one in my profile.
Any suggestions/solutions will be appreciated.
Pendanticist.
I am pretty sure (cannot lay my hands on the relevant stuff) that according to the standard a robot should (not must) treat a 403 on robots.txt as denying it permission to spider the site. That was Google's policy until recently.
They observed that most 403 codes were due to configuration errors, and that as a result they often failed to spider sites whose owners wanted them spidered. So recently they started ignoring 403 codes.
I certainly hope that's the case. <phew!> Suddenly that would make my day.
My .htaccess file is finally doing what it was supposed to do and that is block those nasty bots. (The GoogleBot thing was a mistake on my part.)
EasyDL/3.02 - Shut this one right down.
grub-client-0.3.0 - served 8 distinct IPs worldwide.
/_vti_bin/owssvr.dll - Renders 403 now too.
URL_Spider_Pro/3.0 - requested robots and accounting_forensic.html both 403'd.
These preliminary results look good. Uh, with the exception of the good bots, that is :-)
FAST-WebCrawler/3.6/FirstPage and ZyBorg/1.0 are good spiders, but I'll just have to wait for them to return. The remaining spiders should return in a day or two as well.
Inktomi (I think it is) can't seem to grasp the concept of following the redirect to the new destination page and storing it. Instead, it keeps asking for the same old pages, getting redirected, and coming back to do the same thing all over again, every time. <shrug>
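If those redirects are going out as 302 (temporary), that might explain the behavior: spiders generally keep the old URL on a 302 and only replace it with the new one on a 301 (permanent). A minimal sketch of forcing a 301 with mod_rewrite, using hypothetical page names:

```apache
# Send a permanent (301) redirect so spiders store the new URL
# instead of re-requesting the old one. Paths here are placeholders.
RewriteEngine On
RewriteRule ^oldpage\.html$ /newpage.html [R=301,L]
```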
So, if they all share GoogleBot's directives, then I may not have lost anything at all. SEs bring me half of my traffic and I don't want that to just 'go away' because of one ignorant mistake. Here we go, crossing fingers until morning...
Thanks Again Mohamed_E.
Pendanticist.
Agreed with Mohamed_E on 403s (I'd forgotten); that change was at least a year ago, maybe longer. I think that SSL/HTTPS not stopping spiders came in about the same time.