Google, .htaccess 404 redirect and robots.txt

not using robots.txt -- am using .htaccess redirect

         

nonprofit

2:44 pm on Apr 7, 2003 (gmt 0)

10+ Year Member



I just learned about how to use .htaccess to send people getting 404 errors (file not found) to another page on our site.

I played around with it a while and decided to send everyone getting a 404 back to our home page.

I don't have a robots.txt file for our site and that would normally produce a 404 error. But now google and other search engines requesting robots.txt will be sent to our front page.

Is this bad? Could it cause google to somehow misinterpret our home page as robots.txt instructions?
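One rough way to get a feel for this question is to hand an HTML page to a robots.txt parser and see what it makes of it. The sketch below uses Python's standard-library `urllib.robotparser`, which is not Google's parser and may behave differently from any real crawler; the HTML string and the `example.org` URL are made-up stand-ins for the home page that the redirect would return.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical stand-in for the home page that the 404 redirect would
# return in place of a real robots.txt file.
home_page_html = """<html>
<head><title>Example nonprofit</title></head>
<body><a href="http://www.example.org/about.html">About us</a></body>
</html>"""

parser = RobotFileParser()
parser.parse(home_page_html.splitlines())

# None of the HTML lines is a recognised robots.txt directive,
# so this parser stores no rule groups at all.
no_rules_found = parser.entries == [] and parser.default_entry is None

# With no rules stored, this parser treats every path as crawlable.
home_allowed = parser.can_fetch("Googlebot", "/index.html")

print(no_rules_found, home_allowed)
```

In this parser, at least, the HTML is simply ignored rather than read as crawl instructions. That says nothing definite about how any particular search engine handles the same response.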

John_Caius

2:51 pm on Apr 7, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Why don't you create a robots.txt file that allows all bots to crawl then?

nonprofit

3:11 pm on Apr 7, 2003 (gmt 0)

10+ Year Member



How would I do that -- create a robots.txt that allows all robots to crawl? I'm a little nervous about creating anything that might prevent search engines from indexing my site.

Birdman

3:15 pm on Apr 7, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



ROBOTS.TXT
User-agent: *

That's it!

nonprofit

3:16 pm on Apr 7, 2003 (gmt 0)

10+ Year Member



Thanks!

jdMorgan

3:26 pm on Apr 7, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I don't think that will validate [searchengineworld.com]. Try:

User-agent: *
Disallow:


This disallows nothing, thus telling the 'bots to crawl the whole site. Include the blank line at the end.

Jim

Birdman

3:28 pm on Apr 7, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Sorry to misinform you, nonprofit. Now to fix mine. Thanks JD!

Added: Okay, mine did validate. I just remembered I removed the second line of mine because I didn't think it was needed.

User-agent: *
Disallow: /cgi-bin/
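
For anyone who wants to compare the two files posted in this thread, here is a minimal sketch using Python's standard-library `urllib.robotparser`. It is only one parser's reading of the rules, not a statement about any search engine; the helper name `load_rules` and the sample paths are invented for illustration.

```python
from urllib.robotparser import RobotFileParser


def load_rules(text):
    """Parse robots.txt content held in a string (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


# The allow-everything file: an empty Disallow value blocks nothing.
allow_all = load_rules("User-agent: *\nDisallow:\n")

# The variant that keeps robots out of /cgi-bin/ only.
block_cgi = load_rules("User-agent: *\nDisallow: /cgi-bin/\n")

# Sample paths are made up for the comparison.
for path in ("/index.html", "/cgi-bin/form.pl"):
    print(path,
          allow_all.can_fetch("Googlebot", path),
          block_cgi.can_fetch("Googlebot", path))
```

Under this parser the first file permits both paths, while the second permits `/index.html` and refuses anything under `/cgi-bin/`.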