Problem with forum robots.txt and parameters

         

serenoo

9:53 am on Oct 5, 2009 (gmt 0)

10+ Year Member



I created a forum on my website. While I was working on it (building it, modifying it, keeping it SEO-friendly with no parameters, and testing it), it was not linked from my website. When it was ready I uploaded the robots.txt, and a week later I linked the forum from my website. I waited a week because I wanted Google to see my robots.txt before it crawled the forum.
But now a site:www.example.com search shows that Google lists 15 pages like these:

/forum/viewtopic.php?f=1&t=4&view=unread
/forum/viewtopic.php?f=1&t=7&view=unread
/forum/viewtopic.php?f=1&t=5&view=unread

The content of these pages is always the same: a forum message saying that the post does not exist (not a 404). They are my old tests.
So I have a big duplicate content problem.

I thought of renaming the forum directory to forum123, so that none of those pages exist anymore, and then asking Google to remove them from its index.
Is this the correct way?
Please note that my robots.txt already has the line Disallow: /forum/viewtopic.php.
Or should I do nothing and wait until Google understands that it should not crawl those pages?
I think the Google Toolbar is to blame.

[edited by: tedster at 12:23 pm (utc) on Oct. 5, 2009]
[edit reason] switch to example.com - it cannot be owned [/edit]

tedster

8:29 pm on Oct 5, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If your robots.txt now has a Disallow line for those URLs, you can just submit a URL removal request and they will drop out of the index.

acemi

6:43 am on Oct 6, 2009 (gmt 0)

10+ Year Member



Add this to your robots.txt file:

Disallow: /*&view=unread
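
To see which URLs a wildcard rule like the one above would cover, here is a minimal Python sketch. It assumes Google-style matching, where `*` stands for any run of characters, a trailing `$` anchors the end of the URL, and everything else is a plain prefix match. The helper names `robots_pattern_to_regex` and `is_blocked` are made up for illustration, and `/forum/post-12.html` is a hypothetical path. Python's standard `urllib.robotparser` only does prefix matching, so the wildcard is handled by hand here.

```python
import re


def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a wildcard robots.txt path pattern into a compiled regex.

    Assumed rules: '*' matches any run of characters, a trailing '$'
    anchors the end of the URL, everything else is literal.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' where '*' appeared.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(pattern: str, path: str) -> bool:
    """Return True if the path (including query string) matches the pattern."""
    # re.match anchors at the start, which gives the prefix behaviour.
    return robots_pattern_to_regex(pattern).match(path) is not None


if __name__ == "__main__":
    for path in (
        "/forum/viewtopic.php?f=1&t=4&view=unread",
        "/forum/viewtopic.php?f=1&t=4",
        "/forum/post-12.html",  # hypothetical URL, not from the thread
    ):
        print(path, is_blocked("/*&view=unread", path))
```

Under these assumptions the wildcard rule only catches URLs that contain `&view=unread`, while the plain prefix rule `/forum/viewtopic.php` catches every viewtopic.php URL regardless of its parameters.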

serenoo

7:14 pm on Oct 7, 2009 (gmt 0)

10+ Year Member



I do not want to add Disallow: /*&view=unread because the existing Disallow: /forum/viewtopic.php should already cover those URLs.
I renamed the directory because there are some /forum/post-numbers.html posts that are not covered by my robots.txt, and I do not want to add irrelevant lines to robots.txt. I submitted the 404 pages to Google, and it already removed them today. When I added the 404 pages to the URL removal tool, Google told me that they would be removed for 90 days.
Does that mean they will reappear after 91 days (even though their path is included in my robots.txt)?
Is there a way to submit the robots.txt to Google too?

TheMadScientist

7:30 pm on Oct 7, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I'd be inclined to drop a 'noindex' meta tag on the pages... This will allow the content to be spidered, the links to pass weight, Google (and others) to know the content is there, and at the same time tell them those pages should not be part of the results (index). IMO it should have the desired effect.
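
If you want to confirm that a page really carries the tag once it has been added, something like the following Python sketch could do it. It only uses the standard library `html.parser`; the class name `RobotsMetaFinder` and the function `has_noindex` are made up for this example, and the sample markup is an assumption about what the forum template would output. Keep in mind that a crawler has to be able to fetch the page to see the tag at all, so a URL that is also disallowed in robots.txt would never have its noindex read.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect the directives from every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # The content attribute is a comma-separated list of directives.
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )


def has_noindex(html: str) -> bool:
    """Return True if the HTML contains a robots meta tag with 'noindex'."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives


if __name__ == "__main__":
    # Hypothetical page head, for illustration only.
    sample = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
    print(has_noindex(sample))
```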

BTW: GoogleBot and all other compliant Bots will access your robots.txt every visit, so you don't need to worry about submitting it to them or waiting for it to be found. As long as it's properly formatted any changes you make should be noticed and take effect the next time they spider any pages from your site.
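
One way to check the "properly formatted" part yourself is to run the file through a parser and ask whether a given URL is allowed. Below is a rough sketch using Python's standard `urllib.robotparser`. The robots.txt content is an assumption (the thread only quotes the Disallow line, so the `User-agent: *` group is a guess), the function name `allowed` is made up, and `/forum/post-12.html` is a hypothetical URL. It shows what a compliant parser makes of the rule, not what Googlebot actually did.

```python
from urllib.robotparser import RobotFileParser

# Assumed file content: only the Disallow line is quoted in the thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /forum/viewtopic.php
"""


def allowed(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in (
        "http://www.example.com/forum/viewtopic.php?f=1&t=4&view=unread",
        "http://www.example.com/forum/post-12.html",  # hypothetical URL
    ):
        print(url, "allowed" if allowed(ROBOTS_TXT, url) else "blocked")
```

Note that a Disallow rule only stops crawling. A blocked URL can still be listed in the index if it is discovered some other way, which is why the URL removal request is the step that actually takes it out.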

serenoo

7:09 am on Oct 8, 2009 (gmt 0)

10+ Year Member



It is hard to add noindex to the forum pages because the forum is third-party software that I am not familiar with.
I do not think that "GoogleBot and all other compliant Bots will access your robots.txt every visit", because I added the line Disallow: /forum/viewtopic.php to robots.txt, linked the forum from my website one week later, and Google still ignored the robots.txt.