robots.txt - disallow any url containing.
using robots.txt and dynamic urls for phpbb3 forum
Msg#: 4314267 posted 6:00 pm on May 18, 2011 (gmt 0)
My goal is to get Google to stop crawling specific URLs and to set up an accurate sitemap. I am running a phpBB3 message board with SEF URLs. The problem is that the forum script generates a URL for every reply in a topic, basically anchors.
This creates thousands of URLs that are useless in the eyes of the search engine, even though the users like them for bookmarking.
TOPIC = Domain brewerscubs.com/messageboard/milwaukee-brewers/carlos-gomez-16796.html
Direct Link to post = brewerscubs.com/messageboard/milwaukee-brewers/carlos-gomez-16796.html#p412994
I have been researching and trying to find a way to tell robots.txt to disallow any URL containing "#p", but have not had any luck. Also, my host, SiteGround, is busting my marbles about CPU usage from the testing I have been doing with GSiteCrawler, so my days of testing are numbered... I need to get it right this time, so I turn to the experts :)
Msg#: 4314267 posted 6:09 pm on May 18, 2011 (gmt 0)
Google should not treat anchors as if they were separate URLs; the bit after the # should be ignored.
Anchors are interpreted only within the browser, as a page-relative link. The part after the # is not sent to the server when a link is clicked.
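To illustrate the point about anchors, here is a minimal Python sketch of how a client splits a link before making the request. The helper name url_sent_to_server is made up for this example, and the http:// scheme is assumed; the URLs are the ones from the first post.

```python
from urllib.parse import urldefrag


def url_sent_to_server(link: str) -> str:
    """Return the part of a link that is actually requested: everything before '#'."""
    url, _fragment = urldefrag(link)
    return url


# Topic URL and direct-post link from the first post (scheme assumed).
topic = "http://brewerscubs.com/messageboard/milwaukee-brewers/carlos-gomez-16796.html"
post = topic + "#p412994"

# Both links resolve to the same request, so the server sees a single URL.
print(url_sent_to_server(post) == topic)
```

Because the "#p412994" part never reaches the server, a robots.txt rule has nothing to match against; the post link and the topic link are the same URL as far as a crawler's request is concerned.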
Msg#: 4314267 posted 6:35 pm on May 18, 2011 (gmt 0)
Thank you for the reply.
Msg#: 4314267 posted 8:10 pm on May 18, 2011 (gmt 0)
That's not to say there isn't some duplicate content somewhere to clean up. These searches will be useful to begin with:
In particular, change the setting to 100 results per page.
Msg#: 4314267 posted 9:48 pm on May 18, 2011 (gmt 0)
I changed my URL structure just last week, so 90% of those indexed pages are returning a 404 error.
I thought about doing a 301 redirect, but I don't think it would work. Here is what I am working with.
Structure was /messageboard/(forum number)/(topic number)
Structure is /messageboard/(forum NAME)/(topic NAME)
The topic number in the URL is converted on the fly.
So the redirect would be:
Redirect 301 /messageboard/(forum number) http://www.example.com/messageboard/(forum NAME)
But what about the change to the second part, the topic? No way could I create a redirect for every topic as there are thousands.
Would it be best to add a disallow for the old URLs, ignore them, or take another route?
The site has been active for years, and I am just now paying attention to SEO. The pages in the board had no meta data at all prior to last week. Now the description is pulled from the text on the page and the title is the topic title.
Msg#: 4314267 posted 10:07 pm on May 18, 2011 (gmt 0)
Use a RewriteRule to match incoming external URL requests and internally rewrite them to a PHP script that can then interpret the old URL from the request, look up the new URL in an array or database, and then send the correct HTTP 301 redirect.
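The lookup step can be sketched roughly as follows. This is written in Python only to show the logic; the script suggested above would be PHP. The tables FORUM_SLUGS and TOPIC_SLUGS, the forum number 12, and the function redirect_for are all hypothetical; on a real board the names would come from the forum database. The target format (topic name, then topic number, then .html) is assumed from the example URL in the first post.

```python
import re
from typing import Optional, Tuple

# Hypothetical lookup tables; a real script would query the forum database.
FORUM_SLUGS = {12: "milwaukee-brewers"}
TOPIC_SLUGS = {16796: "carlos-gomez"}

# Old structure: /messageboard/(forum number)/(topic number)
OLD_URL = re.compile(r"^/messageboard/(\d+)/(\d+)/?$")


def redirect_for(path: str) -> Optional[Tuple[int, str]]:
    """Return (301, new_path) for a known old-style URL, otherwise None."""
    match = OLD_URL.match(path)
    if match is None:
        return None  # not an old-style URL, leave it alone
    forum_id, topic_id = int(match.group(1)), int(match.group(2))
    forum = FORUM_SLUGS.get(forum_id)
    topic = TOPIC_SLUGS.get(topic_id)
    if forum is None or topic is None:
        return None  # unknown forum or topic, let the server answer 404
    # New structure: /messageboard/(forum NAME)/(topic NAME)-(topic number).html
    return 301, f"/messageboard/{forum}/{topic}-{topic_id}.html"


print(redirect_for("/messageboard/12/16796"))
```

One lookup handles every topic, so there is no need to write a separate redirect line for each of the thousands of old URLs.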
Msg#: 4314267 posted 10:28 pm on May 18, 2011 (gmt 0)
Wow... that's over my head. Thank you though!
Should I start a new topic in this section, [webmasterworld.com...] , with this info as we have strayed away from robots.txt?