Forum Moderators: goodroi

No life without robots.txt file

This may be old hat to you but it knocked me for six

         

folkranger

9:45 pm on Feb 12, 2008 (gmt 0)

10+ Year Member



My four year old site disappeared from Google searches 2 months ago. I tried everything to find the problem. Had I been bombed by Google, and if so, why? In desperation I changed almost everything. No joy.

Then last week I found a message in Google Webmaster Tools: "can't find robots.txt file". That's because I've never had one.

Well, I added the file with "all allowed" and within days Google accessed my site and immediately I went back to page 1 search results.
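For reference, an "all allowed" robots.txt is conventionally a wildcard User-agent line followed by an empty Disallow line. The poster's actual file is not shown in the thread, so the contents below are an assumption. This minimal sketch uses Python's standard urllib.robotparser to confirm that such a file permits any URL; www.example.com and the helper name allows are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Assumed contents of an "all allowed" robots.txt (the original file is not shown).
ALLOW_ALL = """\
User-agent: *
Disallow:
"""

# For contrast: a file that blocks every compliant crawler.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""


def allows(robots_txt, url, agent="Googlebot"):
    """Return True if robots_txt lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    page = "http://www.example.com/some/page.html"  # placeholder URL
    print("allow-all file permits page:", allows(ALLOW_ALL, page))
    print("block-all file permits page:", allows(BLOCK_ALL, page))
```

Nothing here touches the network; the parser only reads the text it is given.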

I thought that robots.txt was only to disallow but Google say they look for this first and if not found they won't crawl the site, hence losing ranking.

I am probably saying something that you all already know but I was knocked for six at the devastating effect of not having this file. Especially as I managed perfectly well without it until December.

Lord Majestic

9:51 pm on Feb 12, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Google say they look for this first and if not found they won't crawl the site, hence losing ranking.

No, they don't say it and won't say it - most websites don't have any robots.txt at all and they rank just fine.

folkranger

10:02 pm on Feb 12, 2008 (gmt 0)

10+ Year Member



I bow to his majestic lordship. Maybe I read in a forum that Google looks for this first and won't crawl if not found. I did read it somewhere.
What Webmaster Tools did say was:
1) something in my site was preventing them accessing it (last accessed Dec 2 2007).
2) they could not find my robots.txt file.

Having tried many changes since December I added a robots.txt file last week and was accessed by Google within days with immediate results.

I would appreciate any advice as to what else may have caused my problem, and its solution, as I am only a novice.

Achernar

10:06 pm on Feb 12, 2008 (gmt 0)

10+ Year Member Top Contributors Of The Month



Not finding (404) a robots.txt is not a problem. But you're in trouble if your server is misconfigured, and returns a different error number.

Lord Majestic

10:17 pm on Feb 12, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Maybe I read in a forum that Google looks for this first and won't crawl if not found. I did read it somewhere.

That was 100% incorrect then - if anything Google (and other search engines) welcome the opt-out nature of robots.txt, which allowed Web search to be what it is today. If robots.txt were required to be present for crawling then 80% of the Web wouldn't be indexed.

folkranger

10:21 pm on Feb 12, 2008 (gmt 0)

10+ Year Member



When my site was not reachable by Google search you could still get to it by keying in the full URL.

Keying in any keywords, including the name of the site, failed to bring up anything at all.

It was still there with most other search engines.

Not sure what you mean by nisconfigured server.

folkranger

10:24 pm on Feb 12, 2008 (gmt 0)

10+ Year Member



equally not even sure what I mean by nisconfigured server.
(dyslexic finger)

folkranger

10:29 pm on Feb 12, 2008 (gmt 0)

10+ Year Member



So it's not the robots.txt file or its absence. It must be something else.

I'm delighted that it's up and running again but worried that if I don't know why, it could happen again.

Cheers

phranque

7:05 am on Feb 13, 2008 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



welcome to WebmasterWorld [webmasterworld.com], folkranger!

next time this happens you should try the google robots.txt analysis tool [google.com].

folkranger

9:02 pm on Feb 13, 2008 (gmt 0)

10+ Year Member



Thankee Phranquee, always welcome advice. Only thing is I didn't ever have a robots.txt file before and it worked well until recently. So it wasn't that file that was at fault.

folkranger

9:22 pm on Feb 13, 2008 (gmt 0)

10+ Year Member



Lord Majestic

Here is the message picked up from Google Webmaster Tools that caused me to think that the absence of a robots.txt file prevented Google crawling my site. As I said before I have never had such a file so I wonder if you could advise me how to interpret this message?

URL unreachable /robots.txt unreachable

Before we crawled the pages of your site, we tried to check your robots.txt file to ensure we didn't crawl any pages that you had roboted out. However, your robots.txt file was unreachable. To make sure we didn't crawl any pages listed in that file, we postponed our crawl. When this happens, we return to your site later and crawl it once we can reach your robots.txt file. Note that this is different from a 404 response when looking for a robots.txt file. If we receive a 404, we assume that a robots.txt file does not exist and we continue the crawl.

Demaestro

10:05 pm on Feb 13, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Before we crawled the pages of your site, we tried to check your robots.txt file to ensure we didn't crawl any pages that you had roboted out. However, your robots.txt file was unreachable. To make sure we didn't crawl any pages listed in that file, we postponed our crawl. When this happens, we return to your site later and crawl it once we can reach your robots.txt file. Note that this is different from a 404 response when looking for a robots.txt file. If we receive a 404, we assume that a robots.txt file does not exist and we continue the crawl.

Looks like your server wasn't returning a 404 Not Found and that was the problem.

It even says that if they get a 404 they will crawl everything, so it must have gotten something else. My guess is it was returning a 503 Service Unavailable; I guess that because they use the word "unreachable". Try hitting a page that doesn't exist, such as robot.txt (no s), and see what the server returns. It should be a 404; if not, then you have found your issue.
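One way to run that test without a browser is a short script. The following is a rough sketch only, using Python's standard urllib; the URL is a placeholder for your own domain, and the function names fetch_status and missing_file_verdict are made up for illustration.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=10):
    """Return the HTTP status for url, or None if no HTTP response arrived.

    urlopen follows redirects, so this is the status of the final response.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses are raised as HTTPError; err.code is the status.
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout and similar.
        return None


def missing_file_verdict(status):
    """Interpret the status returned for a file known not to exist."""
    if status == 404:
        return "ok: missing files return a real 404"
    if status is None:
        return "unreachable: no HTTP response at all"
    return "suspect: missing file returned %d instead of 404" % status


if __name__ == "__main__":
    # Placeholder URL: swap in your own domain; robot.txt (no s) should not exist.
    probe = "http://www.example.com/robot.txt"
    print(missing_file_verdict(fetch_status(probe)))
```

A "suspect" result, for example a 200 from a custom error page, would point at the kind of misconfiguration mentioned earlier in the thread. Note that this only shows what the server returns today, not what it returned when Google last tried.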

[edited by: Demaestro at 10:06 pm (utc) on Feb. 13, 2008]

phranque

2:58 am on Feb 14, 2008 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



robots.txt:
here = good
not here = good
there = bad
maybe(not) here = bad

whether or not you have a robots.txt file, you still need to figure out exactly why google can't figure out if you have one.

it's very simple - you do an HTTP GET of http://www.example.com/robots.txt and you check the HTTP RESPONSE status chain.
your initial response must be either a 200 or 404 status.

are you on shared hosting?

wild guess here...
someone installed an alternate 404 page two months ago that wasn't returning a 404 status.
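The status chain check described above could be sketched like this. It is an illustration under assumptions, not a definitive tool: it uses Python's standard http.client so that redirects are followed by hand and every hop is recorded, and the names status_chain and initial_response_ok, the User-Agent string and the hop limit are all invented for the example.

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECTS = (301, 302, 303, 307, 308)


def status_chain(url, max_hops=5, timeout=10):
    """Return [(status, url), ...] for each hop, following redirects by hand."""
    chain = []
    for _ in range(max_hops):
        parts = urlsplit(url)
        if parts.scheme == "https":
            conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
        else:
            conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
        try:
            path = parts.path or "/"
            if parts.query:
                path += "?" + parts.query
            conn.request("GET", path, headers={"User-Agent": "robots-check-sketch"})
            resp = conn.getresponse()
            status = resp.status
            location = resp.getheader("Location")
        finally:
            conn.close()
        chain.append((status, url))
        if status in REDIRECTS and location:
            # Location may be relative, so resolve it against the current URL.
            url = urljoin(url, location)
        else:
            break
    return chain


def initial_response_ok(chain):
    """True if the first response in the chain is a 200 or a 404."""
    return bool(chain) and chain[0][0] in (200, 404)


if __name__ == "__main__":
    chain = status_chain("http://www.example.com/robots.txt")  # placeholder host
    for status, hop in chain:
        print(status, hop)
    print("initial response ok:", initial_response_ok(chain))
```

A chain that starts with a redirect and ends on an error page served with a 200 is the sort of thing an alternate 404 page can produce.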

folkranger

6:43 pm on Feb 15, 2008 (gmt 0)

10+ Year Member



Demaestro
Thanks. Tried the robot.txt and got 404

Could it be that my server was down when crawled?

Lord Majestic

7:07 pm on Feb 15, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



folkranger: it is possible, as people above suggest, that when robots.txt is not there they want to get a 404 code to be sure crawling is allowed. All other error codes are treated as a possible issue on the site, so to avoid crawling disallowed URLs they choose not to crawl any at all at that time. A very wise strategy, I have to say.
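Written out as code, that strategy might look something like the sketch below. This is only a reading of the Webmaster Tools message quoted earlier in the thread, not Google's actual implementation; the function name crawl_decision and the wording of the returned actions are made up.

```python
def crawl_decision(robots_status):
    """Map the result of fetching robots.txt to a crawl action.

    robots_status is an HTTP status code, or None when the server
    gave no response at all.
    """
    if robots_status == 200:
        # The file exists: read it and stay out of anything disallowed.
        return "crawl, obeying the rules in robots.txt"
    if robots_status == 404:
        # The server says clearly that no such file exists.
        return "crawl everything, no robots.txt exists"
    # Anything else is ambiguous, so play safe and come back later.
    return "postpone the crawl and retry later"


if __name__ == "__main__":
    for status in (200, 404, 500, 503, None):
        print(status, "->", crawl_decision(status))
```

The point of the last branch is that an unclear answer is treated as "maybe disallowed", which is why a site with no robots.txt at all can still be dropped if the server answers with the wrong status.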

folkranger

7:07 pm on Feb 15, 2008 (gmt 0)

10+ Year Member



phranque
I did my [mysite...] and it came up with the file ok.

By shared hosting do you mean that I am on a paid server rather than on my own server? If so, yes, shared.

Google hits to my website climbed from virtually zero to 30 per day from the day Google again crawled my site with the new robots.txt file in place. Funny thing though, MSN searches (according to Google stats) fell from a regular couple per day to zero. Not a big problem but MSN has now flatlined at zero for 7 days.

It's a funny old world. Win some - lose some

folkranger

7:12 pm on Feb 15, 2008 (gmt 0)

10+ Year Member



L. M.
I agree, wise move. But it returned a 404 when I tried the misspelling so that may not be the problem. Obviously I don't know if a 404 has always been returned or if the server was screwed up at some point.

folkranger

7:17 pm on Feb 15, 2008 (gmt 0)

10+ Year Member



I've got to say I'm on cloud nine at the moment having my site returning results. I'm an absolute beginner and I'm also now very wary because Google can turn you off just like that!