
Google problem with robots.txt

Webmaster Tools isn't helping.


wesg

3:03 am on Jan 30, 2008 (gmt 0)

10+ Year Member



Finally I have made it to this forum, and hope that I can get a solution to my problem.

I started a blog in December, and Google indexed it often once I set it up with an XML sitemap. Within the last two weeks, though, I have dropped out of the rankings because Google can't connect to my robots.txt file. The funny thing is that I never had one originally; I added one to help the crawling process.

Google Webmaster Tools now continually reports a "robots.txt" timeout and will not find the robots file. I have tested it with header checkers and have visited the robots file myself, and each time I received a 200 response, meaning the file was served. I don't know what to do now, because my web host doesn't seem to believe the problem is on their end.

Any suggestions? I am grateful for your help.

vincevincevince

3:05 am on Jan 30, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Delete the robots.txt file perhaps?

phranque

5:38 am on Jan 30, 2008 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Welcome to WebmasterWorld, wesg!

Have you tried the robots.txt analysis tool in Google Webmaster Tools?

wesg

3:08 pm on Jan 30, 2008 (gmt 0)

10+ Year Member



Thanks for the responses.

Yes, I've deleted the robots.txt file, and the strange thing is that I get redirected to a folder called robots.txt/

I've used a header checker on the file: it returns a 301 when the file is deleted and a 200 when it is there. With the robots.txt file in place, Google Webmaster Tools says it can't connect to the file. Not a 404, but a confused Googlebot.

For these reasons I think my web host has somehow blocked Google (Googlebot's IP addresses start with 66.249, I think).
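One way to follow up on this suspicion is to check whether a firewall deny list covers the range Googlebot crawls from. The Python sketch below assumes Googlebot uses 66.249.64.0/19, a commonly cited block rather than an official or complete list, and uses a hypothetical deny list; `is_googlebot_range` and `blocking_rules` are illustrative names, not part of any hosting tool. For identifying genuine Googlebot requests, Google recommends a reverse DNS lookup confirmed by a forward lookup rather than trusting a fixed range.

```python
import ipaddress

# Assumption: 66.249.64.0/19 is the block commonly associated with Googlebot.
# Google's actual crawler ranges can change; treat this as illustrative.
GOOGLEBOT_RANGE = ipaddress.ip_network("66.249.64.0/19")


def is_googlebot_range(ip: str) -> bool:
    """Return True if the address falls inside the assumed Googlebot block."""
    return ipaddress.ip_address(ip) in GOOGLEBOT_RANGE


def blocking_rules(blocklist: list, network=GOOGLEBOT_RANGE) -> list:
    """Return the deny-list entries (CIDR strings) that overlap the given network."""
    return [rule for rule in blocklist
            if ipaddress.ip_network(rule, strict=False).overlaps(network)]


if __name__ == "__main__":
    # Hypothetical firewall deny list, for illustration only.
    deny = ["203.0.113.0/24", "66.249.0.0/16"]
    print(is_googlebot_range("66.249.66.1"))  # True
    print(blocking_rules(deny))               # ['66.249.0.0/16']
```

A broad entry such as 66.249.0.0/16 in a host's firewall would silently swallow every request from that block, which would never appear in the site's own logs.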

Thanks for the help!

phranque

11:18 pm on Jan 30, 2008 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



What type of server are you on?
I'm guessing there is a RewriteRule in your configuration that needs to be conditionally bypassed.
Have you checked the chain of HTTP response statuses for the robots.txt request?
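Checking the status chain means requesting the URL without automatically following redirects, so that each hop (such as the 301 to robots.txt/ described above) is visible. This is a minimal Python sketch using the standard library's `http.client`; `head` and `status_chain` are hypothetical helper names, and the `fetch` parameter exists only so the redirect logic can be exercised without a live server.

```python
import http.client
from typing import Optional, Tuple
from urllib.parse import urljoin, urlsplit

REDIRECTS = {301, 302, 303, 307, 308}


def head(url: str, timeout: float = 10.0) -> Tuple[int, Optional[str]]:
    """Send a single HEAD request (no redirect following); return (status, Location)."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def status_chain(url: str, fetch=head, max_hops: int = 5):
    """Follow redirects by hand and return the list of (status, url) hops."""
    chain = []
    for _ in range(max_hops):
        status, location = fetch(url)
        chain.append((status, url))
        if status not in REDIRECTS or not location:
            break  # final response reached (or a redirect without a target)
        url = urljoin(url, location)  # Location may be relative
    return chain
```

For example, `status_chain("http://www.example.com/robots.txt")` would list every status the server returns on the way to the final response. Some servers answer HEAD differently from GET, so a surprising result is worth repeating with a GET request.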

wesg

11:22 pm on Jan 30, 2008 (gmt 0)

10+ Year Member



I have a rewrite rule for my WordPress blog permalinks set up in my .htaccess, along with a custom 404 document. The site is hosted on Red Hat, as far as I know.

If you need the contents of my .htaccess file, let me know, but there isn't anything in it about blocking or ignoring.

phranque

2:24 am on Jan 31, 2008 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Please post the exemplified .htaccess content.
(Use generic keywords and example.com if necessary.)

wesg

4:52 am on Jan 31, 2008 (gmt 0)

10+ Year Member



I have a WordPress blog in the directory /blog.

.htaccess:

<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /blog/
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /blog/index.php [L]
</IfModule>

ErrorDocument 404 /index.php
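These rules send every request that is not an existing file (`!-f`) or directory (`!-d`) to /blog/index.php. The Python sketch below models only that decision; it assumes sets of paths stand in for Apache's filesystem checks, that paths are given relative to the .htaccess directory without the leading slash (as in per-directory rewriting), and it ignores RewriteBase and the ErrorDocument line. `route` is a hypothetical name. The model is consistent with what wesg saw: with robots.txt present the file is served directly, and with it deleted the request falls through to WordPress, which then decides the response.

```python
def route(path: str, files: set, dirs: set) -> str:
    """Simplified model of the WordPress rules: which resource handles a path?

    `path` is relative to the .htaccess directory (no leading slash);
    `files` and `dirs` stand in for Apache's -f and -d filesystem tests.
    """
    # Both RewriteConds (!-f and !-d) must hold for the rewrite to happen,
    # so an existing file or directory is served as-is.
    if path in files or path in dirs:
        return path
    # "RewriteRule ." matches any path with at least one character.
    if path:
        return "/blog/index.php"
    # An empty path (the directory itself) is not rewritten.
    return path
```

Excluding robots.txt from WordPress would mean adding a condition or an earlier rule for that path; the model makes it easy to see which requests currently reach index.php.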

phranque

6:31 am on Jan 31, 2008 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Is that the .htaccess file in the /blog/ directory?
Do you have a .htaccess file in the server root?
Is your robots.txt file in the /blog/ directory or the server root?
Or both?

wesg

4:30 pm on Jan 31, 2008 (gmt 0)

10+ Year Member



Checking my FTP structure, it appears I have this .htaccess file in the root folder and another identical one in the /blog directory. I have now renamed the one in /blog so that it is no longer used.

phranque

10:34 pm on Jan 31, 2008 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



You should now test your robots.txt file in Google Webmaster Tools to ensure Google can crawl and index your blog URLs properly.
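Alongside the Webmaster Tools tester, Python's standard-library `urllib.robotparser` gives a quick offline check of what a robots.txt file allows. The robots.txt content and URLs below are hypothetical examples, and `allowed` is an illustrative helper name. This parser uses simple prefix matching and does not implement Google's `*` and `$` path wildcards, so treat it as a sanity check rather than a replica of Googlebot; and, as this thread shows, a file that parses correctly can still be unreachable for Google.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Parse robots.txt text and ask whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt for a WordPress blog under /blog/.
SAMPLE = """\
User-agent: *
Disallow: /blog/wp-admin/
Sitemap: http://www.example.com/sitemap.xml
"""

if __name__ == "__main__":
    print(allowed(SAMPLE, "Googlebot", "http://www.example.com/blog/2008/01/post/"))
    print(allowed(SAMPLE, "Googlebot", "http://www.example.com/blog/wp-admin/"))
```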

wesg

10:02 pm on Feb 1, 2008 (gmt 0)

10+ Year Member



The problem is still not solved, and I have asked my host repeatedly whether it is an IP block. They insist nothing is blocked, but I'm not so sure. Nothing is in my error logs (probably because it is a new month), and in my access log I see Yahoo Slurp and other bots, but no Googlebot. This has to be some kind of firewall block, doesn't it?
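Scanning the access log for crawler user agents makes the "Slurp but no Googlebot" observation concrete. This Python sketch assumes Apache's combined log format, where the user agent is the last quoted field; the crawler substrings in `BOTS` and the name `bot_hits` are illustrative choices. User agents can be spoofed, so a match is not proof that the real crawler visited.

```python
import re
from collections import Counter

# Apache "combined" format: host ident user [time] "request" status bytes "referer" "agent"
COMBINED = re.compile(r'^\S+ \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ "[^"]*" "([^"]*)"')

# Hypothetical selection of crawler name -> user-agent substring.
BOTS = {"Googlebot": "Googlebot", "Yahoo Slurp": "Slurp", "msnbot": "msnbot"}


def bot_hits(lines):
    """Count log lines per crawler by matching substrings of the user-agent field."""
    counts = Counter()
    for line in lines:
        match = COMBINED.match(line)
        if not match:
            continue  # skip lines that are not in combined format
        agent = match.group(1)
        for name, needle in BOTS.items():
            if needle in agent:
                counts[name] += 1
    return counts
```

Running it over the access log, for example `bot_hits(open("access_log"))` with a hypothetical file name, and getting a zero count for Googlebot while other crawlers appear is consistent with Google's requests being stopped before they reach the web server.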

londrum

10:24 pm on Feb 1, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Try visiting the file yourself in your browser and see whether it comes up:
http://www.example.com/robots.txt

wesg

10:52 pm on Feb 1, 2008 (gmt 0)

10+ Year Member



Visiting the files (robots.txt and sitemap.xml) works. I have even checked the headers while sending Googlebot's and other user agents, and they come back 200 too.
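A header check like this can be scripted. The Python sketch below sends a HEAD request with Googlebot's published user-agent string and reports the final status or a timeout; `make_request` and `check` are hypothetical helper names. Note its limit: it changes only the User-Agent header, not the source IP address, so an IP-based firewall rule (the cause eventually found in this thread) would not show up. That is why such checks can return 200 while Google reports a timeout.

```python
import socket
import urllib.error
import urllib.request

# Googlebot's user-agent string. A client-side test can copy this header,
# but not the IP address Google's requests come from.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def make_request(url, user_agent=GOOGLEBOT_UA):
    """Build a HEAD request that presents the given user agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent}, method="HEAD")


def check(url, user_agent=GOOGLEBOT_UA, timeout=10):
    """Return the final HTTP status, or a short string such as 'timeout'."""
    try:
        with urllib.request.urlopen(make_request(url, user_agent), timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:  # 4xx/5xx responses
        return err.code
    except (socket.timeout, TimeoutError):  # timed out while reading
        return "timeout"
    except urllib.error.URLError as err:  # DNS failure, refused or timed-out connect
        if isinstance(err.reason, (socket.timeout, TimeoutError)):
            return "timeout"
        return f"error: {err.reason}"
```

For example, `check("http://www.example.com/robots.txt")`. Because `urlopen` follows redirects, the status returned is that of the last response in the chain.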

Is it possible to test a file from a specific IP address? I.e., can I copy Google's IP and watch the response I get? That would give me a definitive answer.

jdMorgan

12:28 am on Feb 2, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> can I copy Google's IP and watch the response I get?

Yes, but no.

You could fake up a packet with Google's IP address, but then your server would send the response to Google, not to you. And since Google did not send the request, their firewall is likely to reject (ignore) your server's response. Not much use...

This sounds like the result of a bug in Google's report of your robots.txt status.

Jim

wesg

1:55 am on Feb 2, 2008 (gmt 0)

10+ Year Member



In my server root directory, I have an .htaccess file with these contents for the WordPress rewrite and custom 404:

<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /blog/
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /blog/index.php [L]
</IfModule>

ErrorDocument 404 /index.php

phranque

2:47 am on Feb 2, 2008 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



And still no .htaccess file in /blog/?

wesg

2:50 am on Feb 2, 2008 (gmt 0)

10+ Year Member



I have removed the .htaccess file in the /blog folder.

jdMorgan

6:43 am on Feb 2, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Note the RewriteBase directive in that root .htaccess, guys...

Jim

wesg

3:16 pm on Feb 2, 2008 (gmt 0)

10+ Year Member



I was right!

Though my hosting provider had a hard time believing it, there WAS in fact an IP block for Google in the firewall that was preventing access. It has been removed, and Google has successfully downloaded the robots file. Now I'll wait for Google to download my sitemap, and I'll be back in business.

Thanks for all the responses, everyone, I will definitely be back to this forum should I have another problem.

bwnbwn

8:16 pm on Feb 5, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I have a question here. I am in contact with a TV reporter who buys from our e-commerce site.
He sent me his blog and station URLs and said he would add links to me on them. Wonderful news: I need some link love, especially from a high-powered TV site... :)

When I go to his blog in IE7, it connects, then about 3 seconds later I get kicked out to my preferred search engine with the URL

http :// http :// www his site com/robots.txt like I typed this url in the browser.

It is as if Google fetches the robots.txt file before loading the site and, on getting a bad URL from it, kicks me out. Really strange.

I asked him who created the file. He doesn't have a clue and says he didn't do it. It looks like a custom file, but I am not sure whether WordPress comes with this file already in place when the blog is purchased and installed.

It doesn't happen in Firefox, only in IE7 on Windows, and he said I am the only one who has ever reported it. I wonder if it is because I have more features enabled in my Google Toolbar than the average person does, and the toolbar looks up the robots.txt file when I visit a site.

I am just wondering, as I was reading this thread and figured I could ask the question now that wesg has found his issue.

I have never run across this before and am really puzzled by this.