Forum Library, Charter, Moderators: goodroi

Sitemaps, Meta Data, and robots.txt Forum

    
Stop Spiders Crawling Message Board Edit
Cgi Factory Message Board
mn1dbp




msg:1525548
 10:45 pm on May 22, 2004 (gmt 0)

I run a message board; it is the one from cgi-factory.

I want search engines to index each message thread, but unfortunately there are 'edit' and 'delete' links on each message, and Google has followed each of these and indexed every page they return. This uses a huge amount of bandwidth for no reason.

So I thought I could use robots.txt to stop it, but I am new to robots.txt and don't really know how.

The message board pages I want indexed have the URL:

[www.mydomain.com/cgi-bin/mb/mesage_number.html]

When you click an edit link, you get the URL:

[www.mydomain.com/cgi-bin/mb/edit.pl?query_string]

Is there an easy way I can stop all the edit pages from being indexed?

Another approach I thought of was to add something inside <head> to stop search engines following links from each of these pages. Is there such a tag?

All messages share a master template, so it would be simple to add. However, this would only cover new messages, not all the old ones.

 

jdMorgan




msg:1525549
 2:51 am on May 23, 2004 (gmt 0)


User-agent: *
Disallow: /cgi-bin/mb/edit.pl

should do it.

For the on-page solution:

<meta name="robots" content="index,nofollow">

Jim
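
One way to sanity-check the robots.txt rule above before uploading it is to run it through a parser. The following is a minimal sketch using Python's standard `urllib.robotparser`, not Googlebot's own matching logic. The thread number `123` and the `msg=123` query string are made-up placeholders, since the thread only gives `query_string`.

```python
from urllib.robotparser import RobotFileParser

# The two rule lines suggested in the post above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/mb/edit.pl
"""


def is_allowed(url, user_agent="Googlebot"):
    """Return True if the rules above permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical example URLs; the numbers and query string are assumptions.
thread_url = "http://www.mydomain.com/cgi-bin/mb/123.html"
edit_url = "http://www.mydomain.com/cgi-bin/mb/edit.pl?msg=123"

print("thread page allowed:", is_allowed(thread_url))
print("edit page allowed:  ", is_allowed(edit_url))
```

Because Disallow matches by path prefix, the thread page should be reported as allowed and any edit.pl URL as blocked, whatever its query string. The delete link's URL is not given in the thread, so it would need its own Disallow line once the script name is known.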

maccas




msg:1525550
 4:36 am on May 23, 2004 (gmt 0)

Hi mn1dbp, I believe Googlebot ends up spidering the "bad referer" section of this script. If so, try this: open edit.pl and find $my_title="Bad referer"; above that, delete &vheader; then find sub message_box and, below "print", add your new header... <head><meta name="robots" content="noindex,nofollow">...

brotherhood of LAN




msg:1525551
 5:06 am on May 23, 2004 (gmt 0)

If you're familiar with the script you could also edit it so that the links are buttons. Google and most 'normal' bots won't be able to follow the buttons.
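
To illustrate why buttons help, here is a rough sketch of a crawler-style link extractor that only collects `<a href>` targets, which is an assumption about how a simple bot behaves rather than a description of Googlebot. The `msg=1` parameter and the form markup are hypothetical, and whether edit.pl accepts a POST from a form is not confirmed in the thread.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags, as a simple link-following bot might."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html):
    """Return every anchor href found in the given HTML fragment."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


# Hypothetical markup: the same edit action as a link and as a form button.
as_link = '<a href="/cgi-bin/mb/edit.pl?msg=1">edit</a>'
as_button = (
    '<form method="post" action="/cgi-bin/mb/edit.pl">'
    '<input type="hidden" name="msg" value="1">'
    '<input type="submit" value="edit">'
    "</form>"
)

print(extract_links(as_link))    # the edit URL is discovered
print(extract_links(as_button))  # nothing for the bot to follow
```

A bot built this way finds the edit URL only in the link version; the form version exposes no href for it to queue.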

mn1dbp




msg:1525552
 12:25 pm on May 23, 2004 (gmt 0)

Many thanks for the replies.

I will report back on how I get on. To give an idea of the scale, there are about 400 legitimate pages on my site, including message threads, but Google currently reports over 2000 pages. So at least 1600 of them are simply Google following the edit and delete links on each message in every thread and indexing the resulting pages.

Brett_Tabke




msg:1525553
 1:01 pm on May 23, 2004 (gmt 0)

This is a question for your forum software developer. They should have a solution in place for this very issue.

mn1dbp




msg:1525554
 12:36 pm on May 25, 2004 (gmt 0)

It's now up to 3,500 pages indexed!

If someone is looking at changing the script, then I also have another problem with it: the old one used to allow ten messages per page, then start a new page for messages 11-20, etc.

After 10 messages, this new version (v5.0) creates a new page for each and every extra message, so #11 goes on page 2, #12 on page 3, etc.

I currently get around this by allowing unlimited messages per page, but this leads to gigantic page sizes that are not good for users.
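
The difference between the two paging behaviours described above can be sketched as follows. This is only an illustration of the arithmetic, written in Python rather than taken from the Perl script, and both function names are made up for the example.

```python
def page_grouped(msg_no, per_page=10):
    """Old behaviour as described: messages 1-10 on page 1, 11-20 on page 2, ..."""
    return (msg_no - 1) // per_page + 1


def page_v5_as_described(msg_no, per_page=10):
    """v5.0 behaviour as described: after the first page, one message per page."""
    if msg_no <= per_page:
        return 1
    return msg_no - per_page + 1


for n in (10, 11, 12, 20, 21):
    print(n, page_grouped(n), page_v5_as_described(n))
```

With ten per page, message 20 lands on page 2 under the grouped scheme but on page 11 under the v5.0 behaviour as reported, which is why the page count grows so quickly.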


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved