Then last week I found a message in google webmaster tools "can't find robots.txt file". That's because I've never had one.
Well, I added the file with "all allowed", and within days Google accessed my site and I immediately went back to page 1 of the search results.
I thought that robots.txt was only there to disallow, but Google says they look for this first and if it is not found they won't crawl the site, hence the lost ranking.
I am probably saying something that you all already know but I was knocked for six at the devastating effect of not having this file. Especially as I managed perfectly well without it until December.
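For anyone wondering what an "all allowed" file looks like, here is a minimal sketch that checks one with Python's standard urllib.robotparser. The helper name allows() and the example.com URL are my own placeholders, not anything from the poster's site.

```python
from urllib.robotparser import RobotFileParser

# A common "allow everything" robots.txt: an empty Disallow means nothing is blocked.
ALLOW_ALL = """\
User-agent: *
Disallow:
"""

def allows(robots_txt, url, agent="*"):
    """Return True if the given robots.txt text lets `agent` fetch `url`.

    `allows` is a hypothetical helper for illustration only.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)
```

With this parser, an empty rule set behaves the same as the allow-all file, which fits the point that robots.txt is an opt-out mechanism: allows("", "http://www.example.com/page.html") is also True.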
Having tried many changes since December, I added a robots.txt file last week and was accessed by Google within days, with immediate results.
I would appreciate any advice as to what else may have caused my problem, and its solution, as I am only a novice.
Maybe I read in a forum that google looks for this first and won't crawl if not found. I did read it somewhere.
That was 100% incorrect then. If anything, Google (and other search engines) welcome the opt-out nature of robots.txt, which allowed Web search to become what it is today. If robots.txt were required to be present for crawling, then 80% of the Web wouldn't be indexed.
next time this happens you should try the google robots.txt analysis tool [google.com].
Here is the message picked up from Google Webmaster Tools that caused me to think that the absence of a robots.txt file prevented Google crawling my site. As I said before I have never had such a file so I wonder if you could advise me how to interpret this message?
URL unreachable /robots.txt unreachable
Before we crawled the pages of your site, we tried to check your robots.txt file to ensure we didn't crawl any pages that you had roboted out. However, your robots.txt file was unreachable. To make sure we didn't crawl any pages listed in that file, we postponed our crawl. When this happens, we return to your site later and crawl it once we can reach your robots.txt file. Note that this is different from a 404 response when looking for a robots.txt file. If we receive a 404, we assume that a robots.txt file does not exist and we continue the crawl.
Looks like your server wasn't returning a 404 Not Found and that was the problem.
It even says that if they get a 404 they will crawl everything. So it must have gotten something else. My guess is it was returning a 503 Service Unavailable; I guess that because they use the word "unreachable". Try hitting a page that doesn't exist, such as robot.txt (no s), and see what the server returns. It should be a 404; if not, then you have found your issue.
[edited by: Demaestro at 10:06 pm (utc) on Feb. 13, 2008]
whether or not you have a robots.txt file, you still need to figure out exactly why google can't figure out if you have one.
it's very simple - you do an HTTP GET of http://www.example.com/robots.txt and you check the HTTP response status chain.
your initial response must be either a 200 or 404 status.
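A rough sketch of that check in Python, using only the standard library. It follows redirects by hand so every status in the chain is visible. The function names, the User-Agent string and the verdict wording are my own assumptions for illustration; this is not how Google does it.

```python
import http.client
from urllib.parse import urlsplit, urljoin

REDIRECTS = (301, 302, 303, 307, 308)

def status_chain(url, max_hops=5, timeout=10):
    """Return every HTTP status seen while fetching `url`, one per hop."""
    chain = []
    for _ in range(max_hops):
        parts = urlsplit(url)
        conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                    else http.client.HTTPConnection)
        conn = conn_cls(parts.netloc, timeout=timeout)
        try:
            path = parts.path or "/"
            if parts.query:
                path += "?" + parts.query
            # Hypothetical User-Agent, chosen only for this sketch.
            conn.request("GET", path, headers={"User-Agent": "robots-check-sketch"})
            resp = conn.getresponse()
            status = resp.status
            location = resp.getheader("Location")
        finally:
            conn.close()
        chain.append(status)
        if status in REDIRECTS and location:
            url = urljoin(url, location)  # follow the redirect manually
        else:
            break
    return chain

def verdict(chain):
    """Judge a status chain by its first response, per the rule above."""
    if not chain:
        return "unreachable"
    first = chain[0]
    if first == 200:
        return "robots.txt found"
    if first == 404:
        return "no robots.txt, crawl can proceed"
    return "problem: first response was %d" % first
```

You would call it as verdict(status_chain("http://www.example.com/robots.txt")). A connection failure raises an exception from status_chain rather than returning an empty chain, so wrap the call in try/except OSError if you want to treat that as "unreachable".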
are you on shared hosting?
wild guess here...
someone installed an alternate 404 page two months ago that wasn't returning a 404 status.
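If you want to test that guess, one way is to request a page that certainly does not exist and look at the status code rather than at what the page says. A minimal sketch, assuming plain HTTP on www.example.com; the helper names and the random path prefix are hypothetical.

```python
import http.client
import uuid

def random_missing_path():
    """Build a path that almost certainly does not exist on the server."""
    return "/no-such-page-" + uuid.uuid4().hex + ".txt"

def probe_status(host, path, use_https=False, timeout=10):
    """Return the status code of a single GET, without following redirects."""
    conn_cls = http.client.HTTPSConnection if use_https else http.client.HTTPConnection
    conn = conn_cls(host, timeout=timeout)
    try:
        conn.request("GET", path)
        return conn.getresponse().status
    finally:
        conn.close()

def is_soft_404(status):
    """A missing page should answer 404; anything else suggests a custom
    error page that is not sending the right status."""
    return status != 404
```

For example, is_soft_404(probe_status("www.example.com", random_missing_path())) returning True would mean the custom error page is answering with something other than a 404, which would also affect a missing robots.txt.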
By shared hosting, do you mean that I am on a paid server rather than on my own server? If so, yes, shared.
Google hits to my website climbed from virtually zero to 30 per day from the day Google again crawled my site with the new robots.txt file in place. Funny thing though, MSN searches (according to Google stats) fell from a regular couple per day to zero. Not a big problem, but MSN has now flatlined at zero for 7 days.
It's a funny old world. Win some, lose some.