Contrary advice over robots.txt validity

Who is right, Google or the other?

         

netchicken1

4:31 am on Sep 18, 2006 (gmt 0)

10+ Year Member



I have made the following robots.txt file, designed to prevent the bots from creating duplicate content by accessing parts of the board where the page is in a different format...

User-agent: *
Disallow: /post.php?
Disallow: /member.php?
Disallow: /misc.php?
Disallow: /memcp.php?
Disallow: /chat/
Disallow: /cp2.php?

Using the Google robots.txt checker, I put in some URLs with the offending strings...

http://example.com/xmb/post.php?action=newthread&fid=12
http://example.com/xmb/memcp.php?action=favorites&favadd=370

Google marked them as "Allowed", meaning, I suppose, that the bots can access them.

The same robots.txt code, put into a robots validation tool here [tool.motoricerca.info], said:

The following block of code DISALLOWS the crawling of the following files and directories: /post.php? /member.php? /misc.php? /memcp.php? /chat/ /cp2.php? to all spiders/robots.

meaning, I suppose, that the addresses have been disallowed, which is what I want.
So which is right?

I love the internet: it's one part logical and 10 parts confusing.

sonjay

12:37 pm on Sep 18, 2006 (gmt 0)

10+ Year Member



They're both correct!

You're disallowing all pages that begin with /post.php? and your robots validation tool correctly reported that you are disallowing such pages. But it appears that you actually want to disallow pages that begin with /xmb/post.php?, and that is what you need to put in your robots.txt file.

The Disallow entries in robots.txt are matched against the beginning of the URL path, starting immediately after the ".com" in your domain.
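
If you want to check this yourself, here is a rough sketch using Python's standard urllib.robotparser module. It is only an illustration, not how Google or the other tool works internally. The "fixed" rules assume the board's PHP scripts all live under /xmb/ (as the test URLs suggest). The /chat/ line is left unchanged because the thread doesn't say where that directory sits.

```python
from urllib.robotparser import RobotFileParser

# Rules exactly as first posted in the thread.
ORIGINAL_RULES = """\
User-agent: *
Disallow: /post.php?
Disallow: /member.php?
Disallow: /misc.php?
Disallow: /memcp.php?
Disallow: /chat/
Disallow: /cp2.php?
"""

# Assumed correction: the scripts are served from /xmb/, so the prefix
# has to include that directory. /chat/ is left as posted (location unknown).
FIXED_RULES = """\
User-agent: *
Disallow: /xmb/post.php?
Disallow: /xmb/member.php?
Disallow: /xmb/misc.php?
Disallow: /xmb/memcp.php?
Disallow: /chat/
Disallow: /xmb/cp2.php?
"""

def make_parser(robots_txt):
    """Build a parser from robots.txt text instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser

URL = "http://example.com/xmb/post.php?action=newthread&fid=12"

original = make_parser(ORIGINAL_RULES)
fixed = make_parser(FIXED_RULES)

# The path starts with /xmb/, not /post.php, so the original rules do not match.
print("original rules allow it:", original.can_fetch("*", URL))
# With /xmb/ in the prefix, the same URL is blocked.
print("fixed rules allow it:", fixed.can_fetch("*", URL))
```

With the original rules the URL should come back as allowed, which matches what Google's checker reported. With the /xmb/ prefix added it should come back as blocked.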

netchicken1

1:05 pm on Sep 18, 2006 (gmt 0)

10+ Year Member



Thanks very much :)

That's great advice and I will follow through on it. :)