Googlebot has been coming to my site at least three times a day for the last 10 days, but the only trace it leaves in my logs is:
216.239.46.166 - - [06/Dec/2002:05:41:33 +0530] "GET /robots.txt HTTP/1.0" 403 - "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
with a variation in the IP; it is also of the form 64.68.82.*
I don't have a robots.txt on my server. Searching my logs for Google's IPs, I see that it only requests robots.txt and then vamooses.
Off the cuff, it looks like Google is not deep crawling my site. *BUT* what has me stumped is why the status code in the log entry is 403 (which means Forbidden, I think) instead of the familiar 404 (Not Found).
Also, it is not even looking at the index.html file, let alone the other files.
Is this normal? I mean, coming to my site three times a day and not even looking at the index file? Could it be that my site administrator has done some mischief to ban access to the robots.txt file? I am able to open this file in the browser, though.
thanks
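For anyone wanting to run the same check on their own logs, here is a minimal sketch in Python. It assumes the log is in the Apache combined format, as the entry quoted above appears to be; the names `LOG_RE`, `parse_line` and `googlebot_hits` are made up for this illustration. It lists the path and status code of every request whose user-agent mentions Googlebot, which makes it easy to see whether anything other than robots.txt was requested and what status each request got.

```python
import re

# Assumed layout (Apache combined log format):
# host ident user [time] "method path protocol" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_RE.match(line)
    return match.groupdict() if match else None


def googlebot_hits(lines):
    """Return (path, status) for each request whose user-agent mentions Googlebot."""
    hits = []
    for line in lines:
        record = parse_line(line)
        if record and "Googlebot" in record["agent"]:
            hits.append((record["path"], int(record["status"])))
    return hits


if __name__ == "__main__":
    sample = (
        '216.239.46.166 - - [06/Dec/2002:05:41:33 +0530] '
        '"GET /robots.txt HTTP/1.0" 403 - "-" '
        '"Googlebot/2.1 (+http://www.googlebot.com/bot.html)"'
    )
    print(googlebot_hits([sample]))  # prints [('/robots.txt', 403)]
```

Matching on the user-agent string alone is a simplification, since any client can claim to be Googlebot; filtering on the IP ranges seen in the logs would be a stricter variant.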
[edited by: gujgifts at 6:38 pm (utc) on Dec. 13, 2002]
If this is the site in your profile, then something has changed. Using the WebmasterWorld server header checker, the Search Engine World Robots.txt validator, and WannaBrowser, I see the following:
You have a valid robots.txt on your site, with a single subdirectory disallowed.
Requests for robots.txt return 200-OK responses to Googlebot's User-agent.
Requests to [yourdomain.com...] are redirected to [yourdomain.com...] correctly.
Unless you have forbidden Googlebot by IP address (which I can't test), it looks like it works to me...
Jim
I never imagined that you guys would go to so much trouble to help me out... WebmasterWorld rocks!
I did put up a robots.txt and disallowed one redundant directory, as some SEOs said that disallowing nothing may be interpreted by some bots as disallowing everything.
BTW, if Google can ask for robots.txt, surely it cannot be banned by IP, right? I am asking because I want to make sure that my site administrator is not up to any mischief.
BTW, I saw your request for my root using WannaBrowser and was majorly happy for about 15 minutes, thinking that Googlebot was here to cache my homepage! Hope the real one turns up soon.
Another question: even if it only skims my site, shouldn't it at least ask for my root or index page, apart from the robots.txt file?
ciao
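On the worry that "disallowing nothing" might be read as "disallowing everything": one way to see how a standards-following parser treats the two cases is to feed both to Python's `urllib.robotparser`. This is only a sketch of how that one parser behaves, not a statement about Googlebot or any other bot; the `/old/` directory, the `example.com` URLs and the `allowed` helper are assumptions made up for the illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, given as lists of lines.
EMPTY_DISALLOW = ["User-agent: *", "Disallow:"]   # disallows nothing
ONE_DIR = ["User-agent: *", "Disallow: /old/"]    # disallows one directory


def allowed(robots_lines, agent, url):
    """Parse the given robots.txt lines and report whether agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    print(allowed(EMPTY_DISALLOW, "Googlebot", "http://example.com/index.html"))
    print(allowed(ONE_DIR, "Googlebot", "http://example.com/index.html"))
    print(allowed(ONE_DIR, "Googlebot", "http://example.com/old/page.html"))
```

With this parser, the empty `Disallow:` line permits everything, and the single-directory rule blocks only URLs under that directory.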
Your site appears to be indexed by Google, and I see a PR5 for your home page. I can't say why your pages are not cached, or why Google might have received a 403 on robots.txt.
We are not supposed to do site reviews here, but since I was already in there I'll offer a couple of suggestions:
1) Dump the revisit-after meta tag. Historically, only one search engine in Canada ever used it, and that one died years ago. All it does is take up space and dilute the value of the text on your page.
2) Add more text to the top of your pages, like maybe an introductory paragraph. You've sort of got a little of that now, but the text is "hidden" inside images - and spiders don't read images.
3) Add a DOCTYPE declaration to the top of your pages to tell browsers what flavor of HTML you are using, and run your pages through the W3C validator at validator.w3.org to make sure they validate.
Jim