Here's my robots.txt file:
User-agent: *
Disallow: /hid/
Disallow: /images/
Allow: /
Disallow:
Anything wrong with this?
Note:
The "Allow" is there to invite the engines to crawl everything.
Anybody know if this works?
Thanks in advance,
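One way to sanity-check a file like this is to feed it to Python's standard-library parser and ask it about a few paths. This is only a sketch: the agent name "ExampleBot" and the three test paths are made up for illustration, and a real crawler may not follow exactly the same logic as this parser.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the post above, one directive per line.
ROBOTS_TXT = """\
User-agent: *
Disallow: /hid/
Disallow: /images/
Allow: /
Disallow:
"""

def check(paths, agent="ExampleBot"):
    """Return {path: True/False} for whether `agent` may fetch each path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return {p: parser.can_fetch(agent, p) for p in paths}

# Hypothetical paths, chosen only to exercise each rule.
results = check(["/index.html", "/hid/secret.html", "/images/logo.gif"])
for path, ok in results.items():
    print(path, "allowed" if ok else "blocked")
```

With this parser the two Disallow lines do the real work. Anything they don't match is crawlable by default, so the "Allow: /" and the empty "Disallow:" should not change the outcome here.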
Still curious if that's standard procedure for the robots
to just check the robots.txt and come back later. If anybody knows, please enlighten us all!
Just FYI, there IS an "Allow" - it's something new in the standards, though I don't know if all the robots read it.
Following is an explanation I found somewhere, in case
it helps someone:
User-agent: *
Disallow: /org/plans.html
Allow: /org/
Allow: /serv
Disallow: /
The following shows what robots are allowed to access:
[url.com...] No
[url.com...] No
[url.com...] Yes
[url.com...] Yes
[url.com...] Yes
[url.com...] No
[url.com...] No
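For anyone who wants to play with this, here is a rough sketch of how those rules could be evaluated. It assumes a robot checks the lines in file order and stops at the first prefix that matches; some crawlers may use a different precedence (for example, the longest matching path), so treat it as an illustration rather than how every robot behaves. The test paths are hypothetical, since the URLs in the list above are truncated.

```python
# Rules from the example above, in file order: (directive, path prefix).
RULES = [
    ("Disallow", "/org/plans.html"),
    ("Allow", "/org/"),
    ("Allow", "/serv"),
    ("Disallow", "/"),
]

def is_allowed(path, rules=RULES):
    """First-match evaluation: the first rule whose prefix matches decides."""
    for directive, prefix in rules:
        if path.startswith(prefix):
            return directive == "Allow"
    return True  # assumption: no matching rule means the path is allowed

# Hypothetical paths, picked to hit each rule once.
for p in ["/org/plans.html", "/org/about.html", "/services/fast.html", "/index.html"]:
    print(p, "Yes" if is_allowed(p) else "No")
```

Under that assumption the order matters: "Disallow: /org/plans.html" has to come before "Allow: /org/", or the Allow would match first and the page would be crawlable.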
A note to "AIR" - I'm so embarrassed! My site is awful.
It's 5 years old, from when I first learned HTML. The amazing part is that it works. I get a lot of replies
from my "free estimate" form. Been meaning to update it,
but have been too busy with my clients' sites. Like the mechanic with the broken-down car...
Well, thanks, I feel a little better now...
Regarding the "Allow" here is the link:
[info.webcrawler.com...]
Here's an excerpt:
"Previous of this specification didn't provide the Allow line. The
introduction of the Allow line causes robots to behave slightly
differently under either specification:
If a /robots.txt contains an Allow which overrides a later occurring
Disallow, a robot ignoring Allow lines will not retrieve those
parts. This is considered acceptable because there is no requirement
for a robot to access URLs it is allowed to retrieve, and it is safe,
in that no URLs a Web site administrator wants to Disallow are be
allowed. It is expected this may in fact encourage robots to upgrade
compliance to the specification in this memo."
I'm so confused! Maybe we should start a new thread about this Allow thing. Nobody really seems to understand it. When I first ran into this, it was on a site about promotion that claimed it would force some robots to crawl your whole site.
As a matter of fact, I'm going to do that now. I think it merits some discussion, don't you agree?
There really isn't any way that I've found to force spiders to do anything. A new thread may be a good idea; I suspect it will evolve along these lines.