Forum Moderators: goodroi


Robots.txt File to Block from Dynamic URLs

Trying to eliminate duplicate content...


BillyS

6:48 pm on Oct 27, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I've got a website that publishes an RSS feed with a URL in the following form:

[sitename.com...]

I use mod_rewrite, but this form of URL leaks out to robots through the RSS feed. To prevent Googlebot and others from indexing this type of URL, I'd like to block it in my robots.txt file.

My "normal" URLs look more like:

[sitename.com...]

Do I just do this?

User-agent: googlebot
Disallow: /index.php?

Dijkgraaf

10:29 pm on Oct 27, 2005 (gmt 0)




Well, that would only stop Googlebot. If you want to tell all bots not to index those URLs, you'll need to use a wildcard for the user agent, as below. Otherwise, yes, I would think that should do the trick.

User-agent: *
Disallow: /index.php?

BillyS

2:07 am on Oct 28, 2005 (gmt 0)




Dijkgraaf - After I posted, I rethought my strategy because I noticed that all my RSS feeds start with index2.php. That's why I only wanted to block Googlebot.

So now I've just added this disallow for all bots:

User-agent: *
Disallow: /index.php?
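For anyone who wants to sanity-check a rule like this before relying on it, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The robots.txt content and the sample URLs are made up for illustration; note that this parser treats the rule as a plain prefix match, which approximates but does not exactly replicate Googlebot's own matching behavior.

```python
from urllib import robotparser

# Hypothetical robots.txt mirroring the rule discussed in this thread
ROBOTS_TXT = [
    "User-agent: *",
    "Disallow: /index.php?",
]

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT)

# A dynamic, query-string URL of the kind that leaks via the feed -- blocked
print(rp.can_fetch("Googlebot", "http://sitename.com/index.php?feed=rss2"))

# A rewritten "normal" URL -- still crawlable
print(rp.can_fetch("Googlebot", "http://sitename.com/some-article/"))
```

Google also offers a robots.txt testing tool in its webmaster console, which is worth double-checking against, since each crawler can interpret edge cases slightly differently.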