Forum Library, Charter, Moderators: Robert Charlton & aakk9999 & brotherhood of lan & goodroi

Google SEO News and Discussion Forum

    
Consequences of robots.txt file
RoadTrips




msg:776265
 8:19 pm on Jun 26, 2005 (gmt 0)

Hi, I created a robots.txt to keep Google and the other engines out of one folder that holds several thousand product detail pages. Google was always counting thousands of them that were old and expired. In reality, there are only 2000 of these detail pages at any one time. My guess is that these overlapping detail pages were making it look like duplicate content.

Now that Google seems to recognize the robots.txt file, I am wondering if there is a consequence to all this. Will I be penalized for this robots.txt file?

Thanks

 

Clint




msg:776266
 11:08 am on Jun 27, 2005 (gmt 0)

I'm giving you a "bump" since I want to know this as well.

I put the lines in my robots.txt file to only block the G bot from one PDF file. I started dropping in G after I did that!

FWIW, G doesn't seem to obey the robots meta tag line.

topr8




msg:776267
 11:11 am on Jun 27, 2005 (gmt 0)

I have all robots excluded from some folders on my site and haven't noticed any kind of penalty for it.

I don't see why there should be. If anything the reverse, as you're doing them a favour by pointing out areas that they shouldn't bother indexing, thus saving them bandwidth (trivial, I know).

Reid




msg:776268
 4:14 pm on Jun 27, 2005 (gmt 0)

robots.txt is part of the Robots Exclusion Standard.
It was designed to control bots' use of bandwidth and keep them out of unwanted folders.
Google obeys robots.txt, and there is NO penalty for having a robots.txt file.
That being said, it is possible to affect ranking with robots.txt:
if you ban bots from pages you want indexed, they won't be.
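A minimal sketch of the point above, using Python's standard `urllib.robotparser` to check which URLs a folder-level rule blocks. The folder name `/details/` and the `example.com` URLs are assumptions made for illustration; they are not taken from the thread.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every crawler is kept out of one folder.
# The folder name "/details/" is an assumption, not taken from the thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /details/
"""


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

# A page inside the disallowed folder versus a page outside it.
detail_ok = rules.can_fetch("Googlebot", "http://www.example.com/details/item-123.html")
home_ok = rules.can_fetch("Googlebot", "http://www.example.com/index.html")

print("detail page crawlable:", detail_ok)
print("home page crawlable:", home_ok)
```

Running a handful of real URLs through a check like this before uploading the file is a cheap way to confirm that only the intended folder is excluded.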

kgun




msg:776269
 10:02 pm on Jun 27, 2005 (gmt 0)

If you look at my robots.txt file, you will find a lot of bots that I have allowed and more that I have disallowed.

I also have an htaccess.txt file (not deployed as a live .htaccess) that you may look at.

KBleivik
Make it simple, as simple as possible, but no simpler.

Reid




msg:776270
 6:59 am on Jul 7, 2005 (gmt 0)

That's quite the robots.txt collection, kgun. Did you build it over the years or get it from somewhere?

latimer




msg:776271
 4:36 pm on Jul 7, 2005 (gmt 0)

Can someone help me understand this one:

We have used robots.txt on one of our sites to prevent Google from accessing any of the files, as follows:

User-agent: Googlebot
Disallow: /

What I have noticed is that Google is somehow getting some of the pages anyway. Out of about 20,000, they now have about 3,670.

Also interesting is what appears on the search results page for:

oursitename site:www.oursite.com

Google shows: Results 1 - 9 of about 3,670

And only 9 URL links, without title or description, show up. There is no way to access any of the other supposed 3,670 results.

We have another site that has the same pages, and the reason we block Google from the mirror site is to avoid a penalty. We are concerned about these pages getting in despite the robots.txt block, and about a possible penalty.

Any help on understanding this would be appreciated.
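As a rough check of what the two-line file quoted above does, the sketch below feeds it to Python's standard `urllib.robotparser`. The page path and the `ExampleBot` name are hypothetical. Note that this only models whether a crawler may fetch a URL; it says nothing about whether a search engine may still list a URL it learned about elsewhere.

```python
from urllib.robotparser import RobotFileParser

# The exact rules quoted in the post: Googlebot is barred from the whole site.
mirror_rules = RobotFileParser()
mirror_rules.parse([
    "User-agent: Googlebot",
    "Disallow: /",
])

# Hypothetical page path on the mirror site.
page = "http://www.oursite.com/some/page.html"

googlebot_allowed = mirror_rules.can_fetch("Googlebot", page)
# "ExampleBot" is a made-up crawler name; no record in the file applies to it.
other_allowed = mirror_rules.can_fetch("ExampleBot", page)

print("Googlebot may fetch:", googlebot_allowed)
print("ExampleBot may fetch:", other_allowed)
```

Because the record names only Googlebot, other crawlers remain free to fetch the mirror; a `User-agent: *` record would be needed to exclude them as well.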

Reid




msg:776272
 5:41 am on Jul 8, 2005 (gmt 0)

Googlebot can find URLs from inbound links and list them as URL-only.
Another Googlebot pass would normally crawl those links and index the title and description, but in your case it won't, because it will find the robots.txt block.
If you submit that robots.txt to the Google URL removal tool, it will remove the entire site and not 'find it' again for at least six months.

I'm not sure about your search results though. It is possible to have a lot of 'hidden' IBLs (inbound links), possibly from your mirror, in Google.

If you are confident of your robots.txt (validate it), you can submit both robots.txt files; it will remove the one site and clean up anything that is disallowed on the other.
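For the "validate it" step, a very small sanity check can catch the most common slips before a file is submitted anywhere. The sketch below is an assumption-laden illustration, not an official validator: the `lint_robots` helper and its list of accepted field names are made up for this example, and `Allow`, `Sitemap`, and `Crawl-delay` are extensions that not every crawler honours.

```python
# Field names this sketch accepts; the list is an assumption for illustration.
KNOWN_FIELDS = {"user-agent", "disallow", "allow", "sitemap", "crawl-delay"}


def lint_robots(text: str) -> list:
    """Return (line_number, message) pairs for lines that look wrong."""
    problems = []
    seen_agent = False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        if ":" not in line:
            problems.append((number, "missing ':' separator"))
            continue
        field, value = line.split(":", 1)
        field, value = field.strip().lower(), value.strip()
        if field not in KNOWN_FIELDS:
            problems.append((number, "unknown field: " + field))
        elif field == "user-agent":
            seen_agent = True
            if not value:
                problems.append((number, "empty User-agent value"))
        elif field in ("disallow", "allow") and not seen_agent:
            problems.append((number, field + " before any User-agent line"))
    return problems


good = lint_robots("User-agent: Googlebot\nDisallow: /\n")
bad = lint_robots("Disallow: /\nUser-agnt: *\n")

print(good)
print(bad)
```

An empty list means the sketch found nothing to complain about; it does not prove that a given search engine will interpret the file as intended.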

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved