
Mistake with user agent


fatpeter

10:58 pm on Nov 26, 2002 (gmt 0)

10+ Year Member



Hi all

Being a complete newbie/idiot :) I managed to make a mistake with these two very simple lines.

Instead of putting in......

User-agent: *
Disallow:

I managed to type in........

User-agent:
* Disallow:

Obviously I want to be crawled. I did this a month ago... since then I have had a couple of fresh updates on my index page, but I don't know about a deep crawl. I haven't worked out how to get all that fancy detailed information, but I have had a few visits from googlebot/2.1.

Will this have caused me a problem or not?

Thanks

Andy

Nick_W

11:06 pm on Nov 26, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If you want to be crawled you need Allow not Disallow. Doh! ;)

Nick

fatpeter

11:27 pm on Nov 26, 2002 (gmt 0)

10+ Year Member



Hi there Nick

I must say I don't understand!

I copied the advice from one of these forums:

The following allows all robots to visit all files because the wildcard "*" specifies all robots.
User-agent: *
Disallow:

This one keeps all robots out.
User-agent: *
Disallow: /
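Both behaviours can be confirmed with Python's standard-library robots.txt parser — an editorial sketch, not part of the quoted advice; the agent name and URL are just placeholders:

```python
# Sketch: checking both robots.txt variants with Python's standard-library
# parser (urllib.robotparser).
from urllib import robotparser

def allowed(lines, agent="Googlebot", url="/index.html"):
    rp = robotparser.RobotFileParser()
    rp.modified()   # mark the file as fetched so can_fetch() will answer
    rp.parse(lines)
    return rp.can_fetch(agent, url)

# An empty Disallow value allows all robots to visit all files.
print(allowed(["User-agent: *", "Disallow:"]))    # True
# "Disallow: /" keeps all robots out.
print(allowed(["User-agent: *", "Disallow: /"]))  # False
```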

Is this wrong?

I thought I had just made a typing error!

Andy

wilderness

11:38 pm on Nov 26, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Andy,
This neat little program may help you. I have not used it in a couple of years and I'm not sure of the details, but I thought it was a freeware program.
RoboGen
[rietta.com...]

fatpeter

11:47 pm on Nov 26, 2002 (gmt 0)

10+ Year Member



Hi Wilderness,

Thanks for that...I'll have a look.

[searchengineworld.com...]

I got my information from this page, so I'm sure it is right. I used the checker and it says it's O.K. I'm just wondering if I did any harm with the typing error or not?

between...

User-agent: *
Disallow:

and....

User-agent:
* Disallow:

Andy

jdMorgan

11:59 pm on Nov 26, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



fatpeter,

If you are saying that your site was "freshed" on Google - that is, it was listed in the search results with a date next to the URL, then I wouldn't worry about your invalid robots.txt. In almost all cases, invalid robots.txt files will be ignored. If you got crawled at all, whether by the freshbot or by the deep-crawler, then your robots.txt file was simply ignored.
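Jim's point can be illustrated with Python's standard-library parser (a sketch; specific crawlers may differ): the mistyped file contains no valid User-agent/Disallow group, so the parser finds no rules and allows everything.

```python
# Sketch of what Jim describes: a malformed robots.txt is effectively
# ignored. urllib.robotparser finds no valid rule group in the broken
# file, so no URL is restricted.
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.modified()  # mark the file as fetched so can_fetch() will answer
rp.parse(["User-agent:", "* Disallow:"])  # the mistyped file

print(rp.can_fetch("Googlebot", "/any/page.html"))  # True: no rules applied
```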

Jim

fatpeter

12:05 am on Nov 27, 2002 (gmt 0)

10+ Year Member



Jim

It was already listed at Google, and I definitely saw a fresh date at least a few days after putting up the bad robots.txt... I just hope the fresh didn't come from before I put up the robots.txt. So... I guess that's a relief if that's the case.

Would that count as invalid?

User-agent:
* Disallow:

Was Nick_W playing with me when he talked about "Allow"? :)

Thanks a lot.

Andy

andreasfriedrich

12:07 am on Nov 27, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yes, there is no Allow in the robots.txt standard.

fatpeter

12:09 am on Nov 27, 2002 (gmt 0)

10+ Year Member



:) Ha ha

Thanks a lot Andreas

Andy

jdMorgan

2:04 am on Nov 27, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



fatpeter,

Yeah, your file with the typos was invalid. So it was most likely ignored. Ignored means the robot will crawl all pages of your site that it can find - if it feels like it and has the time. The only risk with an invalid robots.txt is that the robot will crawl a page you don't want it to crawl.

You've got basically nothing to worry about. All the decent robots will re-fetch robots.txt before they start a spidering run, and many will fetch it several times per day during an on-going spidering run. Therefore there is and can be no "permanent damage".

Don't worry, be happy. Get it cleaned up and validated before the next Google deep-crawl (very likely sometime in the next 2 weeks) and relax.

Jim

fatpeter

9:40 am on Nov 27, 2002 (gmt 0)

10+ Year Member



O.K. Jim

Thanks for the info and for putting my mind at rest.

Just a little off-topic question: I notice everyone seems to know exactly what the robots are doing on their site. How do you do that? All the info I get is like the list I pasted below. This comes through my web site provider.

Regards

Andy

Browser Report
Listing the top 40 browsers by the number of requests for pages, sorted by the number of requests for pages.

reqs: pages: browser
-----: -----: -------
27410: 3396: "Mozilla/4.0
364: 63: "Mozilla/5.0
49: 39: "Scooter/3.2.SF0"
262: 36: "Mozilla/4.7
29: 27: "Googlebot/2.1
66: 23: "Mozilla/4.74
97: 23: "Mozilla/3.0C-WorldNet
109: 20: "Mozilla/4.5
21: 20: "FAST-WebCrawler/3.6
122: 19: "Mozilla/4.61
18: 18: "Scooter/3.2"
115: 18: "Mozilla/4.75
16: 15: "Mozilla/3.0
75: 13: "Mozilla/4.8
49: 13: "Mozilla/4.05
19: 13: "-"
23: 11: "libwww-perl/5.48"
9: 9: "FAST-WebCrawler/3.6/FirstPage
81: 8: "Mozilla/4.73
6: 6: "Mozilla/2.0
41: 5: "Mozilla/4.72
5: 5: "Mozilla/4.0_(compatible;_MSIE_5.0;_Windows_95)_VoilaBot/1.6
35: 4: "Opera/6.0
18: 4: "Mozilla/4.77
19: 3: "Mozilla/4.78
16: 3: "Mozilla/4.51

This analysis was produced by analog 5.1.

Nick_W

9:44 am on Nov 27, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hehe, seems I'm not too hot on robots.txt ;)

Nick

fatpeter

9:55 am on Nov 27, 2002 (gmt 0)

10+ Year Member



Haha I thought you were sending me up as a newbie :)

wilderness

2:23 pm on Nov 27, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



<snip>like the list I pasted below. This comes through my web site provider.</snip>

fat,
These broad settings that hosts provide for stats are almost worthless.
You need a custom (although free) log analyzer program
[statslab.cam.ac.uk...]
which eliminates the most-used worthless stats.
I suggest you also find out where your ACTUAL logs are stored and begin viewing them. One host I had for a short while provided a shortened version of logs that was created by a script, rather than industry-standard full logs.
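Once you have the raw log, a few lines of Python will show you which robots visited — a sketch assuming the common "combined" log format, where the user agent is the last quoted field; the bot names are taken from the report above and the log path is hypothetical:

```python
# Sketch: counting robot hits in a raw access log (combined log format).
import re
from collections import Counter

# Bot substrings to look for; extend as needed.
BOTS = ("Googlebot", "Scooter", "FAST-WebCrawler")

# In the combined log format, the user agent is the last quoted field.
UA_RE = re.compile(r'"([^"]*)"\s*$')

def bot_hits(lines, bots=BOTS):
    """Count hits per bot from an iterable of access-log lines."""
    hits = Counter()
    for line in lines:
        m = UA_RE.search(line)
        if not m:
            continue
        for bot in bots:
            if bot in m.group(1):
                hits[bot] += 1
    return hits

# Typical use (path is an assumption):
#   with open("access.log") as fh:
#       print(bot_hits(fh))
```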

This provides some examples of log types:
[verges.ch...]

Analog at one time had some very extensive explanations of log file types and structure, which I'm currently unable to find on their pages.