
Forum Moderators: goodroi


robots.txt - disallow any url containing.

using robots.txt and dynamic urls for phpbb3 forum

     

BrainDed

6:00 pm on May 18, 2011 (gmt 0)



My goal is to get Google to stop crawling specific URLs and to set up an accurate sitemap. I am running a phpBB3 message board with SEF URLs. The problem is that the forum script generates a URL for every reply in a topic; these are basically anchors.

This creates thousands of URLs that are useless in the eyes of the search engine, even though users like them for bookmarking.

TOPIC = Domain brewerscubs.com/messageboard/milwaukee-brewers/carlos-gomez-16796.html

Direct Link to post = brewerscubs.com/messageboard/milwaukee-brewers/carlos-gomez-16796.html#p412994


I have been researching a way to tell robots.txt to disallow any URL containing "#p", but have not had any luck. Also, my host, SiteGround, is busting my marbles about CPU usage from the testing I have been doing with GSiteCrawler, so my days of testing are numbered... I need to get it right this time, so I turn to the experts :)

g1smd

6:09 pm on May 18, 2011 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Google should not treat anchors as separate URLs; the part after the # should be ignored.

Anchors are interpreted only within the browser, as a page-relative link. The # part is not sent to the server when a link is clicked.
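To illustrate the point, here is a minimal Python sketch (not part of the original reply) that strips the fragment from the direct-post link quoted earlier. The helper name `url_as_requested` and the `http://` scheme are assumptions added for the example; `urldefrag` is from the standard library.

```python
from urllib.parse import urldefrag


def url_as_requested(link: str) -> str:
    """Return the link without its #fragment, i.e. the part a client sends to the server."""
    url, _fragment = urldefrag(link)
    return url


# The topic URL and post anchor from the first post (scheme added for the example).
topic = "http://brewerscubs.com/messageboard/milwaukee-brewers/carlos-gomez-16796.html"
post_link = topic + "#p412994"

# Both links resolve to the same server-side URL once the fragment is dropped.
same_resource = url_as_requested(post_link) == url_as_requested(topic)
```

Because the two links collapse to one URL, there is nothing for a robots.txt rule on "#p" to match on the server side.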

BrainDed

6:35 pm on May 18, 2011 (gmt 0)



Fantastic!

Thank you for the reply.

g1smd

8:10 pm on May 18, 2011 (gmt 0)




That's not to say there isn't some duplicate content somewhere to clean up. These searches will be useful to begin with:

site:example.com -inurl:www


site:www.example.com


In particular, change the setting to 100 results per page.

BrainDed

9:48 pm on May 18, 2011 (gmt 0)



Thanks again!

I changed my URL structure just last week, so 90% of those indexed pages are returning a 404 error.

I thought about doing a 301 redirect, but I don't think it would work. Here is what I am working with.

Structure was /messageboard/(forum number)/(topic number)
Structure is /messageboard/(forum NAME)/(topic NAME)

http://www.brewerscubs.com/messageboard/16/16417.html

http://www.brewerscubs.com/messageboard/mlb/astros-have-no-chance-in-winning-the-central-16417.html


The topic number in the URL is converted on the fly.



So the redirect would be:
redirect 301 /messageboard/(forum number) http://www.example.com/messageboard/(forum NAME)

But what about the change to the second part, the topic? No way could I create a redirect for every topic as there are thousands.

Would it be best to add a disallow for the old URLs, ignore them, or take another route?

The site has been active for years, and I am just now paying attention to SEO. The pages in the board had no meta data at all prior to last week. Now the description is pulled from the text on the page and the title is the topic title.

g1smd

10:07 pm on May 18, 2011 (gmt 0)




Use a RewriteRule to match incoming external URL requests and internally rewrite them to a PHP script that can then interpret the old URL from the request, look up the new URL in an array or database, and then send the correct HTTP 301 redirect.
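The lookup step could look roughly like the sketch below. It is written in Python purely to show the logic; the advice above is to do this in a PHP script behind a RewriteRule. The names `OLD_URL`, `FORUM_SLUGS`, `TOPIC_SLUGS`, and `redirect_for` are hypothetical, and the two dictionaries stand in for whatever array or database query actually supplies the forum and topic names.

```python
import re
from typing import Optional, Tuple

# Old structure: /messageboard/(forum number)/(topic number).html
OLD_URL = re.compile(r"/messageboard/(\d+)/(\d+)\.html")

# Hypothetical lookup tables; a real script would query the forum database.
FORUM_SLUGS = {16: "mlb"}
TOPIC_SLUGS = {16417: "astros-have-no-chance-in-winning-the-central"}


def redirect_for(path: str) -> Tuple[int, Optional[str]]:
    """Return (status, location) for a requested old-style path."""
    match = OLD_URL.fullmatch(path)
    if match is None:
        return 404, None  # not an old-style URL
    forum_id, topic_id = int(match.group(1)), int(match.group(2))
    forum = FORUM_SLUGS.get(forum_id)
    topic = TOPIC_SLUGS.get(topic_id)
    if forum is None or topic is None:
        return 404, None  # unknown forum or topic, nothing to redirect to
    location = f"http://www.brewerscubs.com/messageboard/{forum}/{topic}-{topic_id}.html"
    return 301, location


status, location = redirect_for("/messageboard/16/16417.html")
```

With a lookup like this, one rule and one script cover every topic, so there is no need to write a separate redirect line for each of the thousands of old URLs.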

BrainDed

10:28 pm on May 18, 2011 (gmt 0)



Wow... that's over my head. Thank you though!

Should I start a new topic in this section, [webmasterworld.com...] , with this info as we have strayed away from robots.txt?
 
