robots.txt wildcard

Will this work?

gosman

11:12 am on Sep 6, 2005 (gmt 0)

I have a URL jump script on my site. The purpose of this script is to track visits to our clients' sites. I have noticed that the PHP URLs are being cached, and I think I may be incurring a duplicate content penalty because of it.

The links are formatted as follows:

show_website.php?id=XX

Will the following in my robots.txt file stop these links from being followed?

Disallow: /show_website.*

Lord Majestic

1:22 pm on Sep 6, 2005 (gmt 0)

It may work for Googlebot, but it won't for others -- the robots.txt standard is pretty clear that the only place a wildcard is acceptable is in the User-agent line.

Now, the good news is that in your case you don't need wildcards -- the standard requires checking whether a given URL starts with whatever you have in the Disallow field. This means every rule effectively has a wildcard at the end of it, so you get what you want by changing your robots.txt to:

Disallow: /show_website.
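As a quick sanity check, Python's standard-library robotparser implements exactly this literal prefix matching from the original standard (the example.com URLs and the id value are placeholders):

```python
from urllib.robotparser import RobotFileParser

# Standard-library parser following the original robots.txt standard:
# Disallow values are matched as plain path prefixes, no wildcards.
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /show_website.",
])

# The rule is a prefix, so it covers the script plus any query string.
print(rp.can_fetch("*", "http://www.example.com/show_website.php?id=42"))  # False
print(rp.can_fetch("*", "http://www.example.com/index.html"))              # True
```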

gosman

1:32 pm on Sep 6, 2005 (gmt 0)

Thanks, Lord_Majestic

twebdonny

1:43 pm on Sep 6, 2005 (gmt 0)



Hello,
We are trying to remove wildcard-type pages from our indexed site, i.e. pages that are being indexed with a question mark after them, such as www.name.com/?

So we used the information for such removal and placed
User-agent: Googlebot
Disallow: /*?

in our robots.txt file.

However, when attempting to use the URL removal tool for this we get the following message:

URLs cannot have wild cards in them (e.g. "*"). The following line
contains a wild card:
DISALLOW /*?

So which is it? How can we get these pages removed?

Thanks

Lord Majestic

1:51 pm on Sep 6, 2005 (gmt 0)

The URL removal tool evidently does not support wildcards, which should not have been used in the first place. You will have to follow the robots.txt standard to ensure good compatibility with tools that consume robots.txt.
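To illustrate the compatibility problem, here is a sketch using Python's standard-library robotparser, which follows the original standard (example.com stands in for the real site). A strict parser treats the Disallow value as a literal prefix rather than a pattern, so the `/*?` rule never matches ordinary URLs at all:

```python
from urllib.robotparser import RobotFileParser

# A strict, standard-compliant parser treats "Disallow: /*?" as a
# literal path prefix, not as a wildcard pattern.
rp = RobotFileParser()
rp.parse([
    "User-agent: Googlebot",
    "Disallow: /*?",
])

# No ordinary path starts with a literal "/*", so the rule never
# matches and the page stays crawlable for parsers like this one.
print(rp.can_fetch("Googlebot", "http://www.example.com/?"))  # True
```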

twebdonny

1:54 pm on Sep 6, 2005 (gmt 0)



So any idea on how to remove these "?" pages
that we didn't want indexed in the first place?
I am afraid that if I attempt the Google removal
tool specifically on them, it may also remove the original
page without the "?" and that would be bery berry bad.

Lord Majestic

1:59 pm on Sep 6, 2005 (gmt 0)

You will need to read up on exactly what the URL removal tool requires; I've never used it, and I think it's off-topic in this forum. As far as robots.txt is concerned, it is not advisable to use wildcards.