
Google SEO News and Discussion Forum

    
Changed Robots.txt - now googlebots are blocked?
Rockzer



 
Msg#: 4559189 posted 1:52 pm on Mar 28, 2013 (gmt 0)

Hello,

Please check our old robots.txt:

####################################################

# Robots.txt file for [forums.example.in...]
User-agent: *
User-agent: TerrawizBot
Disallow: /
User-agent: BoardReader
Disallow: /
User-agent: GurujiBot
Disallow: /
User-agent: Gigabot
Disallow: /
User-agent: Scrubby
Disallow: /
User-agent: Robozilla
Disallow: /
User-agent: psbot
Disallow: /
User-agent: twiceler
Disallow: /
#
Sitemap: [forums.example.in...]
###########################################################

We made some changes to our old robots.txt file, and afterwards we lost nearly 50% of our web traffic. Please check the new robots.txt:

# Robots.txt file for [forums.example.in...]
User-agent: *
User-agent: TerrawizBot
Disallow: /
User-agent: BoardReader
Disallow: /
User-agent: GurujiBot
Disallow: /
User-agent: Gigabot
Disallow: /
User-agent: Scrubby
Disallow: /
User-agent: Robozilla
Disallow: /
User-agent: psbot
Disallow: /
User-agent: twiceler
Disallow: /
#
Disallow: clientscript/
Disallow: cpstyles/
Disallow: customavatars/
Disallow: customprofilepics/
Disallow: images/
Disallow: vbmodcp/
Disallow: attachment.php
Disallow: editpost.php
Disallow: image.php
Disallow: misc.php
Disallow: moderator.php
Disallow: newattachment.php
Disallow: newreply.php
Disallow: newthread.php
Disallow: online.php
Disallow: poll.php
Disallow: postings.php
Disallow: printthread.php
Disallow: private.php
Disallow: profile.php
Disallow: register.php
Disallow: report.php
Disallow: reputation.php
Disallow: search.php
Disallow: sendmessage.php
Disallow: subscription.php
Disallow: threadrate.php
Disallow: usernote.php
Sitemap: http://forums.example.in/sitemap_index.xml.gz

I am unsure what went wrong with this robots.txt. Googlebot appears to be blocked on my site. Can you please help me correct it?

[edited by: tedster at 2:21 pm (utc) on Mar 28, 2013]

 

Str82u



 
Msg#: 4559189 posted 3:42 pm on Mar 28, 2013 (gmt 0)

This looks like you're blocking all bots
User-agent: *

Maybe I'm wrong and it's skipping over the disallows of the other bots.... I'd do this though
User-agent: Googlebot
Allow: /
User-agent: TerrawizBot
Disallow: /
.....


EDIT: I've heard that blocking a lot of pages in robots.txt isn't good, so maybe you just shocked Googlebot a bit with the new robots.txt. We blocked something last month to get rid of some pages and took a dip for about 4 days; could have been unrelated though.
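
The suspicion about `User-agent: *` can be tested against at least one parser. The sketch below feeds the first lines of the posted file to Python's standard-library `urllib.robotparser`. That choice is an assumption made for illustration: it is not Googlebot's parser, and `/index.php` is a made-up path. This parser treats consecutive `User-agent` lines as one record, so the `Disallow: /` that follows them covers `*` as well.

```python
from urllib.robotparser import RobotFileParser

# First lines of the posted file, exactly as structured in the post:
# "User-agent: *" is followed directly by another User-agent line.
POSTED = """\
User-agent: *
User-agent: TerrawizBot
Disallow: /
User-agent: BoardReader
Disallow: /
"""

parser = RobotFileParser()
parser.parse(POSTED.splitlines())

# Hypothetical path, used only to probe the rules.
googlebot_allowed = parser.can_fetch("Googlebot", "/index.php")
print("Googlebot allowed:", googlebot_allowed)
```

Under this parser the check comes back `False`, because `*` and `TerrawizBot` share the first `Disallow: /`. Other crawlers may group the lines differently, so treat it as one data point rather than proof of what Googlebot does.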

sujeetdit



 
Msg#: 4559189 posted 4:00 pm on Mar 28, 2013 (gmt 0)

Read this post by Matt Cutts; maybe it can help you:
[mattcutts.com...]

phranque




 
Msg#: 4559189 posted 4:08 pm on Mar 28, 2013 (gmt 0)

the initial group (*) is empty and the group of Disallows at the end are also useless since bots match paths left-to-right so there must be a leading '/'.
in other words it appears there is no exclusion rule that applies to any of google's bots.
what is GWT telling you?
have you tried "fetch as googlebot"?
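
The leading-slash point can be illustrated with a minimal sketch, again assuming Python's standard-library `urllib.robotparser` as a stand-in (Googlebot uses its own parser). The two small files and the helper name `allowed` are cut down or invented for the example; only the rule paths come from the posted robots.txt.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, agent: str, path: str) -> bool:
    """Parse robots_txt and report whether agent may fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# Rules written as in the posted file: no leading "/".
NO_SLASH = """\
User-agent: *
Disallow: images/
Disallow: search.php
"""

# The same rules with a leading "/".
WITH_SLASH = """\
User-agent: *
Disallow: /images/
Disallow: /search.php
"""

for label, text in (("no slash", NO_SLASH), ("with slash", WITH_SLASH)):
    for path in ("/images/logo.png", "/search.php"):
        print(label, path, allowed(text, "Googlebot", path))
```

Request paths always start with `/`, so a prefix match against `images/` never succeeds in this parser, and only the slash-prefixed rules block anything.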

g1smd




 
Msg#: 4559189 posted 5:58 pm on Mar 28, 2013 (gmt 0)

You MUST include a blank line after each record, i.e. before the next User-agent.

"Allow" is Non-Standard. Use
Disallow:
to allow all.

Presumably this is a cut-and-paste error. I assume the " User-agent: * " line belongs just after the # mark, as those Disallow: rules have no preceding User-agent definition.
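
A sketch of the layout described above, checked with Python's standard-library `urllib.robotparser`: a blank line closes each record, `User-agent: *` opens the general record, and every path starts with `/`. The file is a shortened version of the posted one (two of the blocked bots and four of the paths), the helper `check` and the path `/showthread.php` are assumptions made for the example, and this parser is not Googlebot's.

```python
from urllib.robotparser import RobotFileParser

# Shortened, restructured version of the posted file.
RESTRUCTURED = """\
User-agent: TerrawizBot
Disallow: /

User-agent: BoardReader
Disallow: /

User-agent: *
Disallow: /clientscript/
Disallow: /images/
Disallow: /search.php
Disallow: /register.php

Sitemap: http://forums.example.in/sitemap_index.xml.gz
"""

parser = RobotFileParser()
parser.parse(RESTRUCTURED.splitlines())


def check(agent: str, path: str) -> bool:
    """Report whether agent may fetch path under RESTRUCTURED."""
    return parser.can_fetch(agent, path)


# /showthread.php is a hypothetical content URL not listed in any rule.
print("Googlebot   /showthread.php:", check("Googlebot", "/showthread.php"))
print("Googlebot   /search.php:    ", check("Googlebot", "/search.php"))
print("TerrawizBot /showthread.php:", check("TerrawizBot", "/showthread.php"))
```

With this layout the named bots stay fully blocked, while every other crawler is kept out of only the listed paths.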

Str82u



 
Msg#: 4559189 posted 6:57 pm on Mar 28, 2013 (gmt 0)

@g1smd does this only apply to Google? [developers.google.com...] - way down the page they have:
disallow - The disallow directive specifies paths that must not be accessed by the designated crawlers. When no path is specified, the directive is ignored.
disallow: [path]
allow - The allow directive specifies paths that may be accessed by the designated crawlers. When no path is specified, the directive is ignored.
allow: [path]

I have to agree, [robotstxt.org...] does mention it:
To exclude all files except one - This is currently a bit awkward, as there is no "Allow" field. The easy way is to put all files to be disallowed into a separate directory, say "stuff", and leave the one file in the level above this directory:


We use "allow" and "disallow" with no problems that are apparent.

jimbeetle




 
Msg#: 4559189 posted 8:24 pm on Mar 28, 2013 (gmt 0)

Str82u wrote: We use "allow" and "disallow" with no problems that are apparent.

That's great for Google, what about other bots?

Str82u



 
Msg#: 4559189 posted 8:27 pm on Mar 28, 2013 (gmt 0)

No problem with Bing or Yahoo. We get plenty of other bot traffic to it as well.

EDIT: By "apparent" I was implying that there were no bot issues of any kind that were apparent.

lucy24




 
Msg#: 4559189 posted 11:20 pm on Mar 28, 2013 (gmt 0)

A compliant robot MUST honor the "disallow" directive.

A compliant robot may CHOOSE to honor non-standard directives such as "allow".

The word "compliant" means everything.

There are no numbered versions of the robots.txt standard.
The /robots.txt standard is not actively developed.


Also, you may not have blank lines in a record, as they are used to delimit multiple records.
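
The effect of a blank line inside a record can be seen in a small sketch, assuming Python's standard-library `urllib.robotparser` as the example parser. Other crawlers may be more forgiving, and the bot name `AnyBot` is made up.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, agent: str, path: str) -> bool:
    """Parse robots_txt and report whether agent may fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# One intact record.
INTACT = "User-agent: *\nDisallow: /private.php\n"

# The same record split by a blank line after the User-agent line.
SPLIT = "User-agent: *\n\nDisallow: /private.php\n"

print("intact:", allowed(INTACT, "AnyBot", "/private.php"))
print("split: ", allowed(SPLIT, "AnyBot", "/private.php"))
```

In this parser the blank line ends the record before any rule is attached, so the `Disallow` line that follows belongs to no User-agent and is ignored.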

phranque




 
Msg#: 4559189 posted 1:27 am on Mar 29, 2013 (gmt 0)

g1smd wrote: You MUST include a blank line after each record, i.e. before the next User-agent.


that's true according to the robots exclusion protocol.
http://www.robotstxt.org/robotstxt.html [robotstxt.org]:
you may not have blank lines in a record, as they are used to delimit multiple records.


however according to the (non-standard) google documentation...
http://developers.google.com/webmasters/control-crawl-index/docs/robots_txt [developers.google.com]:
Note the optional use of white-space (an empty line) to improve readability.


as jimbeetle says:
That's great for Google, what about other bots?


given that the only bots that have a chance of honoring your robots.txt are non-google, you might want to add the blank lines between groups.
