
Sitemaps, Meta Data, and robots.txt Forum

    
SEs Check robots.txt and just go away?
webtamers




msg:1525864
 10:59 pm on Nov 4, 2000 (gmt 0)

Help, guys! I submitted my site a month ago. I just checked my logs, and I can see that
there have been 40 requests for the robots.txt file and nothing else.
After checking the file, they go away. Is this normal? Do search engines just check it, then come back later to crawl?

Here's my robots.txt file:

User-agent: *
Disallow: /hid/
Disallow: /images/
Allow: /
Disallow:

Anything wrong with this?

Note:
The "Allow" is there to invite the engines to crawl everything.
Anybody know if this works?

Thanks in advance,

 

Air




msg:1525865
 11:34 pm on Nov 4, 2000 (gmt 0)

The "allow" should not be there; there is no "allow" directive for the robots.txt file. Likely they are ignoring the "allow" and interpreting the "/" (root) as disallow everything.

I would change it to just read:

User-agent: *
Disallow: /hid/
Disallow: /images/
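
For anyone who wants to sanity-check a file like this before uploading it, here is a minimal sketch using Python's standard-library urllib.robotparser. The agent name "ExampleBot" and the four test paths are made up for illustration, and real crawlers may parse the file differently from this module.

```python
from urllib.robotparser import RobotFileParser

# The simplified robots.txt suggested above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /hid/
Disallow: /images/
"""


def allowed_paths(robots_txt, paths, agent="ExampleBot"):
    """Return {path: bool} as judged by Python's standard-library parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {path: parser.can_fetch(agent, path) for path in paths}


# Hypothetical paths, chosen only for illustration.
results = allowed_paths(
    ROBOTS_TXT,
    ["/", "/index.html", "/hid/notes.html", "/images/logo.gif"],
)
for path, ok in results.items():
    print(f"{path:20} {'crawlable' if ok else 'blocked'}")
```

With this file, only the paths under /hid/ and /images/ should come back as blocked.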

NFFC




msg:1525866
 12:05 am on Nov 5, 2000 (gmt 0)

What Air said.
You also have other stuff in there that needs to be taken out. See here [info.webcrawler.com] for more info on robots.txt and here [rietta.com] for a no-brainer solution.

Air




msg:1525867
 3:41 am on Nov 5, 2000 (gmt 0)

webtamers,

Checked out your site. I thought someone was messing with my cat! I've seen your site before, just can't remember where ....

DaveAtIFG




msg:1525868
 6:40 am on Nov 5, 2000 (gmt 0)

There's a syntax checker for robots.txt files here [tardis.ed.ac.uk] that's never let me down.

webtamers




msg:1525869
 10:16 am on Nov 5, 2000 (gmt 0)

Thanks Guys,

Still curious if that's standard procedure for the robots
to just check the robots.txt and come back later. If anybody knows please enlighten us all!

Just FYI, there IS an "Allow". It's something new in the standards, don't know if they all read it tho'

Following is an explanation I found somewhere, in case
it helps someone:

User-agent: *
Disallow: /org/plans.html
Allow: /org/
Allow: /serv
Disallow: /

The following shows what robots are allowed to access:

[url.com...] No
[url.com...] No
[url.com...] Yes
[url.com...] Yes
[url.com...] Yes
[url.com...] No
[url.com...] No
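
A rough sketch of how a robot that understands "Allow" might evaluate those rules, assuming first-match-wins prefix matching in file order (which is how I read the explanation). The test paths are hypothetical stand-ins, since the URLs in the list above are cut off, and is_allowed is just an illustrative helper.

```python
# Rules from the example above, in file order: (directive, path prefix).
RULES = [
    ("disallow", "/org/plans.html"),
    ("allow", "/org/"),
    ("allow", "/serv"),
    ("disallow", "/"),
]


def is_allowed(path, rules=RULES):
    """First-match evaluation: the first rule whose prefix matches decides."""
    for directive, prefix in rules:
        if path.startswith(prefix):
            return directive == "allow"
    return True  # no rule matched: assume the URL may be fetched


# Hypothetical paths, not the elided URLs from the original list.
SAMPLE_PATHS = [
    "/",
    "/index.html",
    "/org/about.html",
    "/org/plans.html",
    "/serv",
    "/services.html",
]
for path in SAMPLE_PATHS:
    print(f"{path:20} {'Yes' if is_allowed(path) else 'No'}")
```

Note that order matters here: the more specific "Disallow: /org/plans.html" has to come before "Allow: /org/", or it would never be reached.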

A note to "AIR" - I'm so embarrassed! My site is awful.
It's 5 years old, from when I first learned HTML. The amazing part is that it works. I get a lot of replies
from my "free estimate" form. Been meaning to update it,
but have been too busy with my clients' sites. Like the mechanic with the broken down car...

Air




msg:1525870
 5:27 pm on Nov 5, 2000 (gmt 0)

>A note to "AIR" - I'm so embarrassed!

Don't be, it's funny and memorable, I can see why you would get people's attention with it.

Thanks for the info on "allow". I didn't know that. Do you recall where you saw it? I'd like to read up on it.

webtamers




msg:1525871
 9:19 pm on Nov 5, 2000 (gmt 0)

To AIR,

Well, thanks I feel a little better now...

Regarding the "Allow" here is the link:

[info.webcrawler.com...]

Here's an excerpt:
"Previous of this specification didn't provide the Allow line. The
introduction of the Allow line causes robots to behave slightly
differently under either specification:

If a /robots.txt contains an Allow which overrides a later occurring
Disallow, a robot ignoring Allow lines will not retrieve those
parts. This is considered acceptable because there is no requirement
for a robot to access URLs it is allowed to retrieve, and it is safe,
in that no URLs a Web site administrator wants to Disallow are be
allowed. It is expected this may in fact encourage robots to upgrade
compliance to the specification in this memo."
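
To make the excerpt's point concrete, here is a small sketch comparing a robot that honors "Allow" with one that skips it. It assumes a single "User-agent: *" record and first-match prefix rules; parse_rules, can_fetch, the sample file and the paths are all hypothetical, not taken from any real crawler.

```python
# A file where an Allow overrides a later-occurring Disallow.
SAMPLE = """\
User-agent: *
Allow: /org/
Disallow: /
"""


def parse_rules(robots_txt, honor_allow):
    """Collect (allowed, prefix) rules in file order; optionally skip Allow lines."""
    rules = []
    for line in robots_txt.splitlines():
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "disallow" and value:
            rules.append((False, value))
        elif field == "allow" and value and honor_allow:
            rules.append((True, value))
    return rules


def can_fetch(path, rules):
    """First matching prefix decides; no match means the path may be fetched."""
    for allowed, prefix in rules:
        if path.startswith(prefix):
            return allowed
    return True


paths = ["/org/about.html", "/index.html"]  # hypothetical paths
new_robot = [can_fetch(p, parse_rules(SAMPLE, honor_allow=True)) for p in paths]
old_robot = [can_fetch(p, parse_rules(SAMPLE, honor_allow=False)) for p in paths]
print("honors Allow: ", dict(zip(paths, new_robot)))
print("ignores Allow:", dict(zip(paths, old_robot)))
```

The robot that ignores "Allow" ends up fetching less, never more, which seems to be the "safe" behavior the excerpt is describing.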

I'm so confused! Maybe we should start a new thread about this Allow thing. Nobody really seems to understand it. When I first ran into this it was on a site about promotion that claimed it would force some robots to crawl your whole site.

As a matter of fact I'm going to do that now, I think it merits some discussion, don't you agree?

Air




msg:1525872
 6:14 am on Nov 6, 2000 (gmt 0)

I think I know the paper you are referring to; it was a draft spec from a few years ago. The author proposes "Allow" but it never got anywhere. So for now I would stick with "disallow" as the only valid directive for those bots still respecting the robots.txt.

There really isn't any way that I've found to force spiders to do anything. A new thread may be a good idea, I suspect it will evolve along these lines.

WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved