
Forum Moderators: goodroi


Want to Deny Google from Indexing, Crawling My Private Site

Google keeps pestering, is my robots text wrong?

4:53 pm on Aug 11, 2010 (gmt 0)

Junior Member

10+ Year Member

joined:Apr 29, 2003
posts: 126
votes: 0


Is this correct? Google keeps requesting "GET /audio/whatever.ext HTTP/1.1". How do I stop all access? Do I have to resort to .htaccess? It's a private site. Thanks. E

User-agent: Googlebot
Disallow: *googlebot=nocrawl$
User-agent: Googlebot-Image
Disallow: /
User-agent: Googlebot
Disallow: /
User-agent: *
Disallow: /*.jpg
Disallow: /*.gif
# go away
User-agent: *
Disallow: /
Disallow: /
Disallow: /http://www.mysite.tv/
Disallow: /otherpages.html
5:14 am on Aug 12, 2010 (gmt 0)

Senior Member from KZ 

WebmasterWorld Senior Member lammert is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 10, 2005
posts: 2886
votes: 1


Googlebot may be confused by the two sections where you mention User-agent: Googlebot. Looking at your robots.txt, it looks like you want to block all access by all robots. In that case the following should be sufficient:

User-agent: *
Disallow: /
10:07 pm on Aug 12, 2010 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:May 14, 2008
posts:3091
votes: 2


Except it won't block all bots. Many never actually look at robots.txt: they find a domain, they scrape it. :(

There's lots of info on this in the SE forum hereabouts, but start by blocking every server farm you can find, then add broadband suppliers from the countries scraping most often comes from, such as the Far East, Eastern Europe, and the Americas (South AND North). :(
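For what it's worth, that kind of blocking is usually done in .htaccess rather than robots.txt, since it doesn't depend on the bot cooperating. A minimal sketch in Apache 2.2 syntax (the CIDR ranges below are placeholders, not real hosting ranges; substitute ranges taken from your own logs):

```
Order Allow,Deny
Allow from all
# Placeholder data-centre ranges; replace with real ones from your logs
Deny from 192.0.2.0/24
Deny from 198.51.100.0/24
```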
11:38 pm on Aug 12, 2010 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:10542
votes: 8


have you tried the robots.txt test function on the Crawler access page of Google Webmaster Tools?
[google.com...]
10:54 am on Aug 13, 2010 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member tangor is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 29, 2005
posts:6142
votes: 280


This robots.txt is malformed... you need a BLANK LINE between each User-agent record, and a BLANK LINE at the end of the file, too.
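For illustration, a blanket-block file with its records separated by blank lines would look something like this (a sketch keeping only the sections the original file appears to need):

```
User-agent: Googlebot-Image
Disallow: /

User-agent: *
Disallow: /
```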
11:00 am on Aug 13, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member 5+ Year Member

joined:May 6, 2008
posts:2011
votes: 0


Have the index page be a form with a password. On correct entry of the password, assign a cookie. Have every page check for the cookie; no cookie, redirect back to the form page. Problem solved.
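The check itself is trivial. A minimal Python sketch of the idea (illustration only, not production code: the cookie name, cookie value, and /login.html path are made up, and a real site would use a random session token rather than a fixed value):

```python
# Sketch of the cookie gate described above: serve a page only when the
# session cookie set by the password form is present; otherwise redirect
# back to the login form. Names here are hypothetical.
from http import cookies

SESSION_COOKIE = "site_session"   # hypothetical cookie name
SESSION_VALUE = "authenticated"   # in practice, a random per-session token

def gate(path, cookie_header):
    """Return ('serve', path) for authorized requests,
    ('redirect', '/login.html') for everything else."""
    if path == "/login.html":
        return ("serve", path)    # the form itself must stay reachable
    jar = cookies.SimpleCookie(cookie_header or "")
    morsel = jar.get(SESSION_COOKIE)
    if morsel is not None and morsel.value == SESSION_VALUE:
        return ("serve", path)
    return ("redirect", "/login.html")
```

Every private page would run this check before emitting any content, so a bot (or Googlebot) with no cookie only ever sees the login form.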
5:12 pm on Aug 19, 2010 (gmt 0)

Junior Member

10+ Year Member

joined:Apr 29, 2003
posts:126
votes: 0


Hi y'all,

Thanks for your time and help! Much appreciated!

e
12:59 pm on Aug 26, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0




If you merely want it not crawled, then robots.txt will stop it crawling, but references to URLs on your site could still appear in Google SERPs as URL-only entries.

If you don't mind Google accessing the pages, but you don't want anything at all to appear in the SERPs then you need a meta robots noindex on every page.
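That tag goes in the head of every page you want kept out of the SERPs, for example:

```
<!-- In the <head> of each page that must not appear in search results -->
<meta name="robots" content="noindex">
```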

If it really is private, then using .htpasswd is the way to go. That way there will be no access, no crawling, no indexing at all.
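A minimal .htaccess sketch of that setup (the realm name and file path are placeholders; the password file itself is created with Apache's htpasswd tool):

```
AuthType Basic
AuthName "Private Site"
# Path is a placeholder; create the file with: htpasswd -c /home/example/.htpasswd username
AuthUserFile /home/example/.htpasswd
Require valid-user
```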