
Sitemaps, Meta Data, and robots.txt Forum

    
Google follows disallowed links
Have I done something wrong?
shady




msg:1525649
 4:25 pm on Jan 11, 2005 (gmt 0)

I am running a forum and do not want SEs to follow outbound links.

In an attempt to stop this, I have replaced all outbound links as follows:

www.domain.com/subpage
becomes
/go.php?url=www.domain.com/subpage
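
As a rough illustration of this rewrite, here is a minimal Python sketch. The helper name `to_redirect_link` and the use of Python are assumptions made for the example; the forum software itself presumably does this in PHP. It only builds the link text and says nothing about how search engines treat the result.

```python
from urllib.parse import quote


def to_redirect_link(outbound_url: str, script: str = "/go.php") -> str:
    """Wrap an outbound URL in a local redirect link (hypothetical helper)."""
    # quote() leaves "/" untouched by default but escapes characters such as
    # "?" and "&" that would otherwise be read as part of go.php's own query.
    return f"{script}?url={quote(outbound_url)}"


print(to_redirect_link("www.domain.com/subpage"))
# prints: /go.php?url=www.domain.com/subpage
```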

I have created the following robots.txt:

User-agent: *
Disallow: /go.php

After all this, Google is still passing PR and listing my site as a backlink for the recipients.

Have I done something wrong or do I need to encrypt the URLs?

Best regards
Shady

 

glitterball




msg:1525650
 11:03 am on Jan 12, 2005 (gmt 0)

I have noticed that Google is ignoring my robots.txt also.

A few months ago I noticed that Google was spidering many instances of a form that is on my site, e.g. Google had form.asp?id=2, form.asp?id=3, etc.

My robots.txt now reads:
User-agent: *
Disallow: form.asp

However, more than 6 months later, Google is still including the page, even though it has been back and re-cached the page several times.

A clarification from Google would be nice.

LowLevel




msg:1525651
 12:41 am on Jan 15, 2005 (gmt 0)


After all this, Google is still passing PR and listing my site as a backlink for the recipients.

The purpose of a robots.txt file is to ask robots to not download files.

Was your "go.php" page actually downloaded and indexed by Google? If not, Google has followed your robots.txt directives.
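
One way to check what the rules ask of a compliant robot is Python's standard-library parser. This is only a sketch: `www.example.com` is a placeholder host, `is_allowed` is a hypothetical helper, and the result reflects how `urllib.robotparser` reads the file, not necessarily how Googlebot does. It also tells you nothing about PR or backlinks, only whether the URL may be downloaded.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the first post.
ROBOTS_TXT = """\
User-agent: *
Disallow: /go.php
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the rules let user_agent download url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Placeholder host; only the path and query matter for matching.
redirect = "http://www.example.com/go.php?url=www.domain.com/subpage"

print(is_allowed(ROBOTS_TXT, "Googlebot", redirect))  # False
print(is_allowed(ROBOTS_TXT, "Googlebot", "http://www.example.com/index.html"))  # True
```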


Disallow: form.asp

That should be "Disallow: /form.asp".

Try to analyze your robots.txt files with a robots.txt validator.
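
If no validator is at hand, the effect of the leading slash can be sketched with Python's standard-library parser. The helper `blocks` and the host `www.example.com` are made up for the example, and other parsers (including Google's) may treat the slash-less form differently.

```python
from urllib.robotparser import RobotFileParser


def blocks(disallow_value: str, url: str) -> bool:
    """Return True if 'User-agent: *' plus this Disallow value blocks url."""
    parser = RobotFileParser()
    parser.parse(["User-agent: *", f"Disallow: {disallow_value}"])
    return not parser.can_fetch("*", url)


url = "http://www.example.com/form.asp?id=2"

# Rules are matched as prefixes of the URL path, which always starts with "/".
print(blocks("form.asp", url))   # False: "form.asp" is not a prefix of "/form.asp..."
print(blocks("/form.asp", url))  # True
```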

glitterball




msg:1525652
 1:59 pm on Jan 17, 2005 (gmt 0)

Hi LowLevel

Thanks for your advice regarding the use of "/" before the filename.

Actually, many of the "authority" sites on robots.txt do not mention the need for a preceding forward slash, and my robots.txt file seems to validate okay without it.

Anyway, I have updated my robots.txt with your suggestion, so hopefully the "/" will do the trick.

dedmond29




msg:1525653
 4:54 pm on Jan 18, 2005 (gmt 0)

Hello - I am having the same problem, but with a cgi-bin folder whose query results Google is indexing. I did a site check on Google and am certain that it has indexed these, which I do not want it to do.

Here is my current Robots.txt file info:

User-agent: *
Disallow: /cgi-bin/

Should I specify the Googlebot as well? Thanks!

WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved