

Changed Robots.txt - now googlebots are blocked?

   
1:52 pm on Mar 28, 2013 (gmt 0)



Hello,

Please check our old robots.txt,

####################################################

# Robots.txt file for [forums.example.in...]
User-agent: *
User-agent: TerrawizBot
Disallow: /
User-agent: BoardReader
Disallow: /
User-agent: GurujiBot
Disallow: /
User-agent: Gigabot
Disallow: /
User-agent: Scrubby
Disallow: /
User-agent: Robozilla
Disallow: /
User-agent: psbot
Disallow: /
User-agent: twiceler
Disallow: /
#
Sitemap: [forums.example.in...]
###########################################################

We made some changes to our old robots.txt file, and after that we lost nearly 50% of the traffic to our site. Please check the new robots.txt:

# Robots.txt file for [forums.example.in...]
User-agent: *
User-agent: TerrawizBot
Disallow: /
User-agent: BoardReader
Disallow: /
User-agent: GurujiBot
Disallow: /
User-agent: Gigabot
Disallow: /
User-agent: Scrubby
Disallow: /
User-agent: Robozilla
Disallow: /
User-agent: psbot
Disallow: /
User-agent: twiceler
Disallow: /
#
Disallow: clientscript/
Disallow: cpstyles/
Disallow: customavatars/
Disallow: customprofilepics/
Disallow: images/
Disallow: vbmodcp/
Disallow: attachment.php
Disallow: editpost.php
Disallow: image.php
Disallow: misc.php
Disallow: moderator.php
Disallow: newattachment.php
Disallow: newreply.php
Disallow: newthread.php
Disallow: online.php
Disallow: poll.php
Disallow: postings.php
Disallow: printthread.php
Disallow: private.php
Disallow: profile.php
Disallow: register.php
Disallow: report.php
Disallow: reputation.php
Disallow: search.php
Disallow: sendmessage.php
Disallow: subscription.php
Disallow: threadrate.php
Disallow: usernote.php
Sitemap: http://forums.example.in/sitemap_index.xml.gz

I am unsure what went wrong with this robots.txt. Google's bots appear to be blocked on my site. Can you please help me correct it?
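As a quick sanity check on the file above, Python's standard-library urllib.robotparser can serve as a stand-in interpreter (real crawlers, including Google's, may parse differently). Under the original exclusion protocol, consecutive User-agent lines share the record that follows them, which is one reading under which the bare "User-agent: *" picks up the first "Disallow: /". A sketch using an abbreviated copy of the file and the thread's placeholder hostname:

```python
# Abbreviated copy of the posted robots.txt; hostname is the thread's placeholder.
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: *
User-agent: TerrawizBot
Disallow: /
User-agent: BoardReader
Disallow: /
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# This parser folds consecutive User-agent lines into one record, so the
# first "Disallow: /" attaches to "*" as well, and Googlebot is shut out.
print(rp.can_fetch("Googlebot", "http://forums.example.in/"))  # False
```

If your production parser behaves the same way, that alone would explain a large traffic drop.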

[edited by: tedster at 2:21 pm (utc) on Mar 28, 2013]

3:42 pm on Mar 28, 2013 (gmt 0)



This looks like you're blocking all bots
User-agent: *

Maybe I'm wrong and it's skipping over the disallows for the other bots... I'd do this, though:
User-agent: Googlebot
Allow: /
User-agent: TerrawizBot
Disallow: /
.....


EDIT: I've heard that blocking a lot of pages in robots.txt isn't good, so maybe you just shocked Googlebot a bit with the new rules. We blocked something last month to get rid of some pages and took a dip for about 4 days; it could have been unrelated, though.
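For what it's worth, the suggested per-Googlebot group behaves as intended under Python's urllib.robotparser, which, like Google, supports the non-standard Allow directive (a sketch only; other crawlers may ignore Allow entirely):

```python
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: Googlebot
Allow: /

User-agent: TerrawizBot
Disallow: /
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# Googlebot matches its own group and is allowed; TerrawizBot is blocked.
print(rp.can_fetch("Googlebot", "http://forums.example.in/index.php"))    # True
print(rp.can_fetch("TerrawizBot", "http://forums.example.in/index.php"))  # False
```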
4:00 pm on Mar 28, 2013 (gmt 0)



Read this post by Matt Cutts; maybe it can help you:
[mattcutts.com...]
4:08 pm on Mar 28, 2013 (gmt 0)

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



the initial group (*) is empty, and the group of Disallows at the end is also useless, since bots match paths left-to-right so there must be a leading '/'.
in other words it appears there is no exclusion rule that applies to any of google's bots.
what is GWT telling you?
have you tried "fetch as googlebot"?
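The leading-slash point is easy to verify: matching is a left-to-right prefix comparison against the request path, so a rule with no leading '/' never matches anything. A small check with Python's urllib.robotparser (other parsers may be more forgiving, but none are obliged to be):

```python
from urllib.robotparser import RobotFileParser

# No leading slash: "search.php" is not a prefix of "/search.php".
rp = RobotFileParser()
rp.parse("""\
User-agent: *
Disallow: search.php
""".splitlines())
print(rp.can_fetch("Googlebot", "http://forums.example.in/search.php"))  # True: not blocked

# With the leading slash, the prefix matches and the URL is excluded.
rp2 = RobotFileParser()
rp2.parse("""\
User-agent: *
Disallow: /search.php
""".splitlines())
print(rp2.can_fetch("Googlebot", "http://forums.example.in/search.php"))  # False: blocked
```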
5:58 pm on Mar 28, 2013 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



You MUST include a blank line after each record, i.e. before the next User-agent.

"Allow" is Non-Standard. Use
Disallow:
to allow all.

Presumably this is a cut-and-paste error. I assume the " User-agent: * " line belongs just after the # mark, as those Disallow: rules have no preceding User-agent definition.
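The standards-safe way to allow everything for one bot is an empty Disallow rather than the non-standard Allow. A quick check of that idiom with Python's urllib.robotparser (a sketch with made-up bot names):

```python
from urllib.robotparser import RobotFileParser

# "Disallow:" with no path excludes nothing, i.e. it allows all
# for that record, without using the non-standard Allow directive.
rp = RobotFileParser()
rp.parse("""\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
""".splitlines())

print(rp.can_fetch("Googlebot", "http://forums.example.in/"))      # True
print(rp.can_fetch("SomeOtherBot", "http://forums.example.in/"))   # False
```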
6:57 pm on Mar 28, 2013 (gmt 0)



@g1smd does this only apply to Google? [developers.google.com...] - way down the page they have:
disallow - The disallow directive specifies paths that must not be accessed by the designated crawlers. When no path is specified, the directive is ignored.
disallow: [path]
allow - The allow directive specifies paths that may be accessed by the designated crawlers. When no path is specified, the directive is ignored.
allow: [path]

I have to agree, [robotstxt.org...] does mention it:
To exclude all files except one - This is currently a bit awkward, as there is no "Allow" field. The easy way is to put all files to be disallowed into a separate directory, say "stuff", and leave the one file in the level above this directory:


We use "allow" and "disallow" with no problems that are apparent.
8:24 pm on Mar 28, 2013 (gmt 0)

WebmasterWorld Senior Member jimbeetle is a WebmasterWorld Top Contributor of All Time 10+ Year Member



We use "allow" and "disallow" with no problems that are apparent.

That's great for Google, what about other bots?
8:27 pm on Mar 28, 2013 (gmt 0)



No problem with Bing or Yahoo. We get plenty of other bot traffic to it as well.

EDIT: By "apparent" I meant that there were no bot issues of any kind that were apparent.
11:20 pm on Mar 28, 2013 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



A compliant robot MUST honor the "disallow" directive.

A compliant robot may CHOOSE to honor non-standard directives such as "allow".

The word "compliant" means everything.

There are no numbered versions of the robots.txt standard.
The /robots.txt standard is not actively developed.


Also, you may not have blank lines in a record, as they are used to delimit multiple records.
1:27 am on Mar 29, 2013 (gmt 0)

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



You MUST include a blank line after each record, i.e. before the next User-agent.


that's true according to the robots exclusion protocol.
http://www.robotstxt.org/robotstxt.html [robotstxt.org]:
you may not have blank lines in a record, as they are used to delimit multiple records.


however according to the (non-standard) google documentation...
http://developers.google.com/webmasters/control-crawl-index/docs/robots_txt [developers.google.com]:
Note the optional use of white-space (an empty line) to improve readability.


as jimbeetle says:
That's great for Google, what about other bots?


given that the only bots that have a chance of honoring your robots.txt are non-Google ones, you might want to add the blank lines between groups.
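Pulling the thread's advice together, one possible shape for the corrected file. This is a sketch only: the bot names, paths, and sitemap URL are taken from the posts above, every path gets its leading '/', and blank lines separate the groups as the protocol requires.

```text
# Robots.txt file for forums.example.in

User-agent: TerrawizBot
Disallow: /

User-agent: BoardReader
Disallow: /

# ...one group per blocked bot (GurujiBot, Gigabot, Scrubby,
# Robozilla, psbot, twiceler), each followed by a blank line...

User-agent: *
Disallow: /clientscript/
Disallow: /images/
Disallow: /search.php
Disallow: /printthread.php
# ...remaining directories and scripts likewise, each with a leading slash...

Sitemap: http://forums.example.in/sitemap_index.xml.gz
```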