Forum Library, Charter, Moderators: goodroi

Sitemaps, Meta Data, and robots.txt Forum

    
Blocking parts of forum but not others

AnonyMouse
msg:3607977
1:18 pm on Mar 22, 2008 (gmt 0)

Hi,

I have a well-indexed forum, but I've noticed that G is picking up all the links to "report this thread", which are useless and also lead to some 404 errors.

I want to block all URLs that start with
/foros/?func=report

but allow any other pages that are under
/foros/

If I get this wrong, my forums will drop out of G's index, so I need to make sure I do this carefully! Could anyone advise me on the correct Disallow statement that would SPECIFICALLY block the "func=report" URLs ONLY?

Many thanks!

 

bilalseo
msg:3648071
5:50 pm on May 12, 2008 (gmt 0)

One more option is to redirect those unnecessary URLs that return 404 to the most relevant existing pages, because suppressing 404 errors with the robots.txt file alone is not a good idea.

I advise you to 301 redirect all URLs that currently return 404 errors.
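As a sketch of that 301 approach, assuming an Apache server with mod_alias enabled (the paths here are placeholders, not real URLs from the site):

```apache
# Hypothetical example: permanently redirect one dead thread URL
# to the page that replaced it. Adjust the paths to your own URLs.
Redirect 301 /foros/old-thread-title /foros/new-thread-title
```

Each dead URL gets its own rule, or you can use mod_rewrite patterns if many URLs follow the same shape.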

thanks,

bilal

goodroi
msg:3648801
1:00 pm on May 13, 2008 (gmt 0)

To block googlebot from accessing those pages, add this line to the robots.txt file:
Disallow: /*func=report$

bilalseo does make a good point. It is better to identify 404 errors and, if possible, 301 redirect them to the new file location. If there is no new location and the information no longer exists on your site, you should serve a 404 error page with helpful links for the users. This makes it clear to users what is going on and what options are available to them.
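For context, goodroi's rule would sit in a robots.txt file something like this (a sketch; the * and $ wildcards are supported by Googlebot but are not guaranteed to work for every crawler):

```
User-agent: Googlebot
Disallow: /*func=report$
```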

bilalseo
msg:3652013
5:40 pm on May 16, 2008 (gmt 0)

Thanks goodroi :)

bilalseo
msg:3652015
5:43 pm on May 16, 2008 (gmt 0)

Goodroi, I want to ask one thing: if the 404 pages remain cached and have been indexed for a while, what is the right method to remove them from the cache, and what steps should I take? I read in the Google Webmaster area about permanent removal of directories, pages and URLs from Google, but it didn't answer my question as well as I had hoped.

thanks,

bilal

martinibuster
msg:3652019
5:52 pm on May 16, 2008 (gmt 0)

You can also find where that link is defined within the forum software and add the nofollow to it. I added the nofollow to member messaging links (PM Member). Who needs bots indexing those, right?
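The template change might look roughly like this (hypothetical markup; the actual link markup depends on your forum software):

```html
<!-- rel="nofollow" tells search engines not to follow the report link -->
<a href="/foros/?func=report&amp;thread=123" rel="nofollow">Report this thread</a>
```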

Receptional Andy
msg:3652022
5:59 pm on May 16, 2008 (gmt 0)

Disallow: /*func=report$

Looking at the OP, the request was to block "urls that start /foros/?func=report" so I assume we need to drop the $ at least. And doesn't Google allow question marks in disallow? So, the simplest would be:

Disallow: /foros/?func=report

Sorry if I'm being excessively picky ;)

AnonyMouse
msg:3652045
6:29 pm on May 16, 2008 (gmt 0)

Ah martinibuster, that's the answer I was looking for! Thanks :-)

bilalseo
msg:3652079
7:12 pm on May 16, 2008 (gmt 0)

Yes, nofollow is another solution you might use. But in the case of a 404, there should be a 301... I suggest :)

bilalseo
msg:3652082
7:13 pm on May 16, 2008 (gmt 0)

...until you get a 200 :)

WiseWebDude
msg:3709654
8:24 pm on Jul 28, 2008 (gmt 0)

Disallow: /*func=report

That would be the correct way. You don't need the $ at the end...
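To illustrate the difference the trailing $ makes, here is a small Python sketch of Google-style pattern matching (illustrative only, not Google's actual implementation):

```python
import re

def robots_pattern_matches(pattern: str, path: str) -> bool:
    """Check whether a robots.txt Disallow pattern matches a URL path,
    using Google-style wildcards: '*' matches any run of characters and
    a trailing '$' anchors the pattern to the end of the URL."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape regex metacharacters, then turn the robots '*' into '.*'
    regex = "^" + re.escape(pattern).replace(r"\*", ".*")
    if anchored:
        regex += "$"
    return re.match(regex, path) is not None

# Without $, the pattern matches any URL containing 'func=report'
print(robots_pattern_matches("/*func=report", "/foros/?func=report&id=7"))   # True
# With $, only URLs that END in 'func=report' match
print(robots_pattern_matches("/*func=report$", "/foros/?func=report&id=7"))  # False
```

So `Disallow: /*func=report$` would miss report URLs that carry extra trailing parameters, which is why dropping the $ is the safer choice here.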

© Webmaster World 1996-2014 all rights reserved