Scooter ignoring one page entry in robots

how can i get it to not index this page?

semick

3:56 pm on Sep 11, 2002 (gmt 0)

10+ Year Member



I have one page - xt_kc.asp that is almost always followed with parameters like xt_kc.asp?WTL=50&id=20

In my robots.txt:

useragent: *
disallow: /indexconnect/
disallow: /common/
disallow: /products/
disallow: /none/
disallow: /xt_kc.asp

Does Scooter ignore this because the parameters make the request look like a different file name? For now I have resorted to checking the user-agent at the top of this page and redirecting when Scooter comes along.
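[For reference: robots.txt matching is a plain prefix test against the requested path, so query parameters should not defeat a Disallow rule. A sketch in Python (the site itself is ASP; this is only to illustrate the matching rule, using the paths from the post above):]

```python
# robots.txt Disallow matching is a prefix test on the request path.
# Everything after "?" comes later in the string, so a parameterized
# request like /xt_kc.asp?WTL=50&id=20 still starts with /xt_kc.asp.
rules = ["/indexconnect/", "/common/", "/products/", "/none/", "/xt_kc.asp"]

def disallowed(path, rules):
    """True if any Disallow rule is a prefix of the requested path."""
    return any(path.startswith(rule) for rule in rules)

print(disallowed("/xt_kc.asp?WTL=50&id=20", rules))  # True - prefix still matches
print(disallowed("/default.asp", rules))             # False
```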

Thanks,
Scott

volatilegx

4:25 pm on Sep 11, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



(oops sorry)

Jack_Straw

6:00 pm on Sep 11, 2002 (gmt 0)

10+ Year Member



Your robots.txt file has invalid syntax, so it is ineffective.

You can check the robots.txt syntax at:

[searchengineworld.com...]

jdMorgan

9:18 pm on Sep 11, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



semick,

As Jack_Straw points out, your robots.txt is invalid - "useragent" is missing its hyphen, so robots will not recognize the field - and you cannot expect it to work the way it is now. There is a short robots.txt tutorial here [searchengineworld.com] on WebmasterWorld's sister site. After fixing your robots.txt, run it through the robots.txt checker that Jack_Straw cited.

Your robots.txt should read:

User-agent: *
Disallow: /indexconnect/
Disallow: /common/
Disallow: /products/
Disallow: /none/
Disallow: /xt_kc.asp
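[One way to sanity-check the corrected file before uploading is Python's standard-library robots.txt parser; the hostname below is just a placeholder:]

```python
from urllib import robotparser

ROBOTS_TXT = """\
User-agent: *
Disallow: /indexconnect/
Disallow: /common/
Disallow: /products/
Disallow: /none/
Disallow: /xt_kc.asp
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# The Disallow covers the bare page and any parameterized request for it.
print(rp.can_fetch("Scooter", "http://example.com/xt_kc.asp?WTL=50&id=20"))  # False
print(rp.can_fetch("Scooter", "http://example.com/default.asp"))             # True
```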

Jim

semick

3:53 am on Sep 13, 2002 (gmt 0)

10+ Year Member



Thanks very much!

Scott Emick

(And the reason I am asking search engines to ignore this page: it is used by PPC and affiliate links to send traffic to the site. The page records a keycode and redirects to a regular page inside the site, and I don't want search engines to record duplicate content under different URLs.)

jdMorgan

4:03 am on Sep 13, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Scott,

I'm just glad you did see the responses about robots.txt, before heading down the complicated path of trying to actually block - rather than disallow - a specific robot from one page.

I've hosed up my own robots.txt more than once... :o

Anyway, glad ya got the word!

Jim

semick

5:55 pm on Sep 16, 2002 (gmt 0)

10+ Year Member



What is funny is that I had the correct robots.txt on other client sites... I must have deleted the hyphen and not noticed it...

Anyway, I am now considering the following (does this look OK?). I am allowing Scooter once again... I just don't want Google on that page:

user-agent: googlebot
disallow: /indexconnect/
disallow: /common/
disallow: /products/
disallow: /none/
disallow: /xt_kc.asp

user-agent: *
disallow: /indexconnect/
disallow: /common/
disallow: /products/
disallow: /none/
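[The idea here is group selection: a robot is supposed to obey the first record whose User-agent value names it, and fall back to the "*" record otherwise, so Googlebot is kept off /xt_kc.asp while Scooter can still fetch it. A sketch with Python's standard-library parser, with the field names capitalized conventionally:]

```python
from urllib import robotparser

ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /indexconnect/
Disallow: /common/
Disallow: /products/
Disallow: /none/
Disallow: /xt_kc.asp

User-agent: *
Disallow: /indexconnect/
Disallow: /common/
Disallow: /products/
Disallow: /none/
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# Googlebot obeys its own record; every other robot falls back to "*".
print(rp.can_fetch("Googlebot", "http://example.com/xt_kc.asp"))     # False
print(rp.can_fetch("Scooter", "http://example.com/xt_kc.asp"))       # True
print(rp.can_fetch("Scooter", "http://example.com/products/x.asp"))  # False
```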

jdMorgan

7:48 pm on Sep 16, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Scott,

That will probably work, but I wouldn't rely on it.

Strictly speaking, the robots.txt field names are case-insensitive, so most robots should accept your all-lowercase records. But some robots are stricter than the standard, so the safest course is the conventional capitalization:

User-agent:
-and-
Disallow:

(rather than the all-lowercase "user-agent" and "disallow")

Get your robots.txt fixed up, and then upload it to your domain, but under a different filename like "rabbits.txt". Then use the Search Engine World robots.txt checker [searchengineworld.com] to validate it. If it is good, change the filename to "robots.txt", and you're done. If not, the checker will tell you what is wrong.

I strongly suggest using this tool every time you update your robots.txt file - it can save you months of misery from having a bad robots.txt file requested and used by the robots.

Jim