Forum Moderators: phranque


.htaccess is rewriting my robots.txt!

         

Potsie88

3:33 am on Aug 20, 2009 (gmt 0)

10+ Year Member


My .htaccess file has rewrite commands in it. I thought it was corrupted, so I tried taking it down, but then none of my interior pages would display. It seems to create a robots.txt that disallows access to my entire site...even when I upload a robots.txt file that allows all access. Here is what my .htaccess file looks like now:

# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>

# END WordPress

I welcome your much-appreciated help in solving this mystery! Thank you!

jdMorgan

3:40 am on Aug 20, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



This is the "standard" WP code. It rewrites *any* non-blank requested URL-path to WP if that requested URL-path does not resolve to an existing file or directory.
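If you want to make that behavior explicit, one option (just a sketch, not something this thread requires) is a pass-through rule for robots.txt placed above the WordPress block. Note that the existing `!-f` condition already exempts any robots.txt file that actually exists on disk, so this rule is redundant once the file is uploaded to the correct directory:

```apache
# Hypothetical explicit exemption: serve robots.txt as-is.
# "-" means no substitution; [L] stops further rewrite processing.
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteRule ^robots\.txt$ - [L]
</IfModule>
```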

The solution is to upload a robots.txt file and then completely flush (delete) your browser cache. That forces your browser to request a fresh copy of the page/file from your server instead of showing you a previously-cached page/file and server response (this being the likely reason you thought your robots.txt file "didn't work").

Jim

Potsie88

3:58 am on Aug 20, 2009 (gmt 0)

10+ Year Member



Thanks for the speedy response, Jim! I have been at this all day! I just finished this site and wanted to add a Google sitemap, so I used the WP plugin that creates an XML sitemap automatically and updates it with every change to the site. When I submitted my sitemap (as I've done with my other sites), it gave me an error that my robots.txt was preventing Googlebot from crawling my site! I've posted the original problem here with no responses: [webmasterworld.com...]

This was really odd, as I never created a robots.txt file for this domain! So I then created one that allows all robots to crawl all areas of the domain and uploaded it to the root directory. This new file did not solve the problem. My web hosting IT person is the one who said it was the .htaccess, and that there must be something rewriting it to specifically block the site from being crawled and indexed. I'm lost; where do I go from here? Maybe Google has a cache that needs to clear? I'll have to sleep on it. Thanks again for your response!

jdMorgan

7:27 pm on Aug 20, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



1) Your robots.txt Disallows all robots from fetching all pages.
2) Google cannot "write" to your robots.txt file, so either:
a) You're not uploading your robots.txt file to the correct directory, and the robots.txt file that contains the sitemap line remains in the correct directory despite your uploading a new one.
b) Google is messed up, and says you've got a sitemap declaration in your robots.txt file when in fact you do not.

In order to allow robots to crawl your site, your robots.txt file should be changed to read:


User-agent: *
Disallow:

This will disallow "nothing", and so allow everything to be fetched.
For more information from the original source, see A Standard for Robot Exclusion [robotstxt.org].
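If you want to sanity-check a robots.txt file before uploading it, one way (an illustrative sketch using Python's standard-library parser, not something mentioned in this thread) is urllib.robotparser. It confirms that an empty `Disallow:` permits everything, while `Disallow: /` blocks the whole site:

```python
from urllib.robotparser import RobotFileParser

# The "allow everything" file suggested above.
allow_all = RobotFileParser()
allow_all.parse(["User-agent: *", "Disallow:"])

# A file that blocks the entire site.
block_all = RobotFileParser()
block_all.parse(["User-agent: *", "Disallow: /"])

print(allow_all.can_fetch("Googlebot", "/any/page.html"))  # True
print(block_all.can_fetch("Googlebot", "/any/page.html"))  # False
```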

Whether you want to add the sitemap declaration is up to you.

Jim

Potsie88

7:40 pm on Aug 20, 2009 (gmt 0)

10+ Year Member



Thanks for the help, Jim. I'm not sure what in the world was going on, and my hosting company sent me on a wild goose chase! In any case, my new robots.txt has finally been recognized by Google. Based on this experience, I would say that Google caches robots.txt on their end, and that cache has to clear before an update becomes visible to them. Not sure if that makes sense, but at least the problem is solved for me! Take care.