
Forum Library, Charter, Moderators: Robert Charlton & aakk9999 & brotherhood of lan & goodroi

Google SEO News and Discussion Forum

    
Changed Robots.txt - now googlebots are blocked?
Rockzer




msg:4559191
 1:52 pm on Mar 28, 2013 (gmt 0)

Hello,

Please check our old robots.txt,

####################################################

# Robots.txt file for [forums.example.in...]
User-agent: *
User-agent: TerrawizBot
Disallow: /
User-agent: BoardReader
Disallow: /
User-agent: GurujiBot
Disallow: /
User-agent: Gigabot
Disallow: /
User-agent: Scrubby
Disallow: /
User-agent: Robozilla
Disallow: /
User-agent: psbot
Disallow: /
User-agent: twiceler
Disallow: /
#
Sitemap: [forums.example.in...]
###########################################################

We made some changes to our old robots.txt file, and afterwards we found that we had lost nearly 50% of our web traffic. Please check the new robots.txt:

# Robots.txt file for [forums.example.in...]
User-agent: *
User-agent: TerrawizBot
Disallow: /
User-agent: BoardReader
Disallow: /
User-agent: GurujiBot
Disallow: /
User-agent: Gigabot
Disallow: /
User-agent: Scrubby
Disallow: /
User-agent: Robozilla
Disallow: /
User-agent: psbot
Disallow: /
User-agent: twiceler
Disallow: /
#
Disallow: clientscript/
Disallow: cpstyles/
Disallow: customavatars/
Disallow: customprofilepics/
Disallow: images/
Disallow: vbmodcp/
Disallow: attachment.php
Disallow: editpost.php
Disallow: image.php
Disallow: misc.php
Disallow: moderator.php
Disallow: newattachment.php
Disallow: newreply.php
Disallow: newthread.php
Disallow: online.php
Disallow: poll.php
Disallow: postings.php
Disallow: printthread.php
Disallow: private.php
Disallow: profile.php
Disallow: register.php
Disallow: report.php
Disallow: reputation.php
Disallow: search.php
Disallow: sendmessage.php
Disallow: subscription.php
Disallow: threadrate.php
Disallow: usernote.php
Sitemap: http://forums.example.in/sitemap_index.xml.gz

I am unsure what went wrong with this robots.txt. Googlebot appears to be blocked on my site. Can you please help me correct it?

[edited by: tedster at 2:21 pm (utc) on Mar 28, 2013]

 

Str82u




msg:4559235
 3:42 pm on Mar 28, 2013 (gmt 0)

This looks like you're blocking all bots
User-agent: *

Maybe I'm wrong and it's skipping over the disallows for the other bots... I'd do this, though:
User-agent: Googlebot
Allow: /
User-agent: TerrawizBot
Disallow: /
.....


EDIT: I've heard that blocking a lot of pages in robots.txt isn't good, so maybe you just shocked Googlebot a bit with the new robots.txt. We blocked something last month to get rid of some pages and took a dip for about 4 days; it could have been unrelated, though.

sujeetdit




msg:4559244
 4:00 pm on Mar 28, 2013 (gmt 0)

Read this post by Matt Cutts; maybe it can help you:
[mattcutts.com...]

phranque




msg:4559246
 4:08 pm on Mar 28, 2013 (gmt 0)

the initial group (*) is empty, and the group of Disallows at the end is also useless: bots match paths left-to-right from the start of the URL path, so each rule must begin with a leading '/'.
in other words it appears there is no exclusion rule that applies to any of google's bots.
what is GWT telling you?
have you tried "fetch as googlebot"?
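
To illustrate the leading-slash point: under plain left-to-right prefix matching, a rule such as "images/" can never match, because every URL path a crawler requests begins with "/". Below is a minimal Python sketch, assuming simple prefix matching with no wildcard support. The function name and the sample path are made up for illustration.

```python
def rule_matches(rule_path: str, url_path: str) -> bool:
    """Return True if a Disallow value matches a URL path by plain prefix match."""
    # An empty Disallow value matches nothing (it means "allow everything").
    if not rule_path:
        return False
    return url_path.startswith(rule_path)


# Every requested URL path starts with "/", so the rule has to start with it too.
without_slash = rule_matches("images/", "/images/logo.gif")
with_slash = rule_matches("/images/", "/images/logo.gif")
print(without_slash, with_slash)  # prints: False True
```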

g1smd




msg:4559283
 5:58 pm on Mar 28, 2013 (gmt 0)

You MUST include a blank line after each record, i.e. before the next User-agent.

"Allow" is Non-Standard. Use
Disallow:
to allow all.

Presumably this is a cut-and-paste error. I assume the "User-agent: *" line belongs just after the # mark, as those Disallow: rules have no preceding User-agent definition.
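
A sketch of what the file might look like with those points applied: a blank line after each record, the "User-agent: *" line moved directly above the generic rules, and a leading "/" on every path. Only a few of the original lines are shown. The check uses Python's standard-library parser, which is just one parser's reading and says nothing definite about how Googlebot itself behaves. The test path /showthread.php is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Abbreviated, restructured version of the file from the first post.
ROBOTS_TXT = """\
User-agent: TerrawizBot
Disallow: /

User-agent: BoardReader
Disallow: /

User-agent: *
Disallow: /clientscript/
Disallow: /images/
Disallow: /search.php
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# /showthread.php is a hypothetical content URL used only for this check.
print(parser.can_fetch("Googlebot", "/showthread.php"))    # True
print(parser.can_fetch("Googlebot", "/search.php"))        # False
print(parser.can_fetch("TerrawizBot", "/showthread.php"))  # False
```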

Str82u




msg:4559314
 6:57 pm on Mar 28, 2013 (gmt 0)

@g1smd does this only apply to Google? [developers.google.com...] - way down the page they have:
disallow - The disallow directive specifies paths that must not be accessed by the designated crawlers. When no path is specified, the directive is ignored.
disallow: [path]
allow - The allow directive specifies paths that may be accessed by the designated crawlers. When no path is specified, the directive is ignored.
allow: [path]

I have to agree, [robotstxt.org...] does mention it:
To exclude all files except one - This is currently a bit awkward, as there is no "Allow" field. The easy way is to put all files to be disallowed into a separate directory, say "stuff", and leave the one file in the level above this directory:


We use "allow" and "disallow" with no problems that are apparent.
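
As one data point on how parsers treat "Allow": Python's standard-library urllib.robotparser does honor it, applying the first rule in file order that matches. Other parsers may ignore the directive or resolve conflicts differently, so this is a sketch of one implementation only. The bot name and paths are made up.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: *
Allow: /public/
Disallow: /
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# "ExampleBot" is a hypothetical crawler name.
print(parser.can_fetch("ExampleBot", "/public/page.html"))   # True
print(parser.can_fetch("ExampleBot", "/private/page.html"))  # False
```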

jimbeetle




msg:4559332
 8:24 pm on Mar 28, 2013 (gmt 0)

We use "allow" and "disallow" with no problems that are apparent.

That's great for Google, what about other bots?

Str82u




msg:4559334
 8:27 pm on Mar 28, 2013 (gmt 0)

No problem with Bing or Yahoo. We get plenty of other bot traffic to it as well.

EDIT: By "apparent" I was implying that there were no bot issues of any kind that were apparent.

lucy24




msg:4559392
 11:20 pm on Mar 28, 2013 (gmt 0)

A compliant robot MUST honor the "disallow" directive.

A compliant robot may CHOOSE to honor non-standard directives such as "allow".

The word "compliant" means everything.

There are no numbered versions of the robots.txt standard.
The /robots.txt standard is not actively developed.


Also, you may not have blank lines in a record, as they are used to delimit multiple records.

phranque




msg:4559409
 1:27 am on Mar 29, 2013 (gmt 0)

You MUST include a blank line after each record, i.e. before the next User-agent.


that's true according to the robots exclusion protocol.
http://www.robotstxt.org/robotstxt.html [robotstxt.org]:
you may not have blank lines in a record, as they are used to delimit multiple records.


however according to the (non-standard) google documentation...
http://developers.google.com/webmasters/control-crawl-index/docs/robots_txt [developers.google.com]:
Note the optional use of white-space (an empty line) to improve readability.


as jimbeetle says:
That's great for Google, what about other bots?


given that the only bots that have a chance of honoring your robots.txt are non-google, you might want to add the blank lines between groups.
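
The difference between the two readings can be sketched in Python. This is a toy splitter written for illustration, following the robotstxt.org rule that a blank line ends a record; it is not how any particular crawler is implemented.

```python
def split_records(text: str) -> list:
    """Split robots.txt text into records, treating blank lines as delimiters."""
    records, current = [], []
    for raw in text.splitlines():
        if not raw.strip():
            # A blank line closes the current record, if there is one.
            if current:
                records.append(current)
                current = []
            continue
        line = raw.split("#", 1)[0].strip()
        if line:  # comment-only lines are skipped and do not end a record
            current.append(line)
    if current:
        records.append(current)
    return records


packed = "User-agent: psbot\nDisallow: /\nUser-agent: twiceler\nDisallow: /\n"
spaced = "User-agent: psbot\nDisallow: /\n\nUser-agent: twiceler\nDisallow: /\n"

print(len(split_records(packed)))  # 1
print(len(split_records(spaced)))  # 2
```

A strict blank-line parser sees the packed file as a single record, which is why adding the blank lines is the safer choice for bots that follow the original convention.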


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved