Sitemaps, Meta Data, and robots.txt Forum

Want to Prevent Google from Indexing and Crawling My Private Site
Google keeps pestering; is my robots.txt wrong?
erlandc
msg:4185460
4:53 pm on Aug 11, 2010 (gmt 0)

Is this correct? Google keeps requesting "GET /audio/whatever.ext HTTP/1.1". How do I stop all access? Do I have to resort to .htaccess? It's a private site. Thanks. E

User-agent: Googlebot
Disallow: *googlebot=nocrawl$
User-agent: Googlebot-Image
Disallow: /
User-agent: Googlebot
Disallow: /
User-agent: *
Disallow: /*.jpg
Disallow: /*.gif
# go away
User-agent: *
Disallow: /
Disallow: /
Disallow: /http://www.mysite.tv/
Disallow: /otherpages.html

 

lammert
msg:4185779
5:14 am on Aug 12, 2010 (gmt 0)

Googlebot may be confused by the two sections where you mention User-agent: Googlebot. From your robots.txt, it looks like you want to block all access from all robots. In that case the following should be sufficient:

User-agent: *
Disallow: /

dstiles
msg:4186188
10:07 pm on Aug 12, 2010 (gmt 0)

Except it won't block all bots. Many do not actually look at robots.txt. They find a domain, they scrape it. :(

There's lots of info on this in the Search Engine forums hereabouts, but start by blocking every server farm you can find, then add broadband suppliers from the most likely scraping countries, such as the Far East, Eastern Europe, and America (South AND North). :(
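
For what it's worth, a rough sketch of what that kind of IP-range blocking can look like in .htaccess, assuming Apache 2.2 with mod_authz_host. The ranges below are documentation placeholders, not real server-farm allocations; substitute the ranges you have actually identified.

Order Allow,Deny
Allow from all
# placeholder data-centre ranges -- replace with ranges you have identified yourself
Deny from 192.0.2.0/24
Deny from 198.51.100.0/24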

phranque
msg:4186212
11:38 pm on Aug 12, 2010 (gmt 0)

Have you tried the robots.txt test function on the Crawler access page of Google Webmaster Tools?
https://www.google.com/webmasters/tools/crawl-access?hl=en&siteUrl=http://www.example.com/

tangor
msg:4186427
10:54 am on Aug 13, 2010 (gmt 0)

This robots.txt is malformed... you need a BLANK LINE between each User-agent record, and a BLANK LINE at the end of the file, too.
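
For reference, a cleaned-up "block everything" file with the records separated by blank lines would look something like this (just a sketch; the catch-all record on its own already covers the image bot):

User-agent: Googlebot-Image
Disallow: /

User-agent: *
Disallow: /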

StoutFiles
msg:4186429
11:00 am on Aug 13, 2010 (gmt 0)

Have the index be a form with a password. Upon correct entry of the password, set a cookie. Have every page check for the cookie; if there's no cookie, redirect back to the form page. Problem solved.
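
A rough sketch of that cookie check done at the .htaccess level with mod_rewrite instead of in every page, if editing each page isn't practical. The cookie name site_auth and the /login.html form page are placeholders, and the script that validates the password and sets the cookie isn't shown here:

RewriteEngine On
# no auth cookie and not already on the login form -> bounce back to the form
RewriteCond %{HTTP_COOKIE} !site_auth=ok [NC]
RewriteCond %{REQUEST_URI} !^/login\.html$
RewriteRule ^ /login.html [R=302,L]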

erlandc
msg:4189374
5:12 pm on Aug 19, 2010 (gmt 0)

Hi y'all,

Thanks for your time and help! Much appreciated!

e

g1smd
msg:4192486
12:59 pm on Aug 26, 2010 (gmt 0)

If you merely want it not crawled, then robots.txt will stop it crawling, but references to URLs on your site could still appear in Google SERPs as URL-only entries.

If you don't mind Google accessing the pages, but you don't want anything at all to appear in the SERPs, then you need a meta robots noindex on every page.
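
For reference, that tag goes in the <head> of each page and looks like this:

<meta name="robots" content="noindex">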

If it really is private, then using .htpasswd is the way to go. That way there will be no access, no crawling, no indexing at all.
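
A minimal .htaccess for that, assuming Apache basic auth and a password file created with the htpasswd utility (the AuthUserFile path below is a placeholder and should point outside the web root):

AuthType Basic
AuthName "Private Site"
AuthUserFile /full/path/to/.htpasswd
Require valid-user

Create the password file once with htpasswd -c /full/path/to/.htpasswd yourusername, then add further users without the -c flag.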
