Forum Moderators: open
Being a complete newbie/idiot :) I managed to make a mistake with these two very simple lines.
Instead of putting in......
User-agent: *
Disallow:
I managed to type in........
User-agent:
* Disallow:
Obviously I want to be crawled. I did this a month ago... since then I have had a couple of fresh crawls on my index page, but I don't know about a deep crawl... I haven't worked out how to get all that fancy detailed information, but I have had a few visits from Googlebot/2.1.
Will this have caused me a problem or not?
Thanks
Andy
I must say I don't understand!
I copied the advice from one of these forums
The following allows all robots to visit all files because the wildcard "*" specifies all robots.
User-agent: *
Disallow:
This one keeps all robots out.
User-agent: *
Disallow: /
Is this wrong?
I thought I had just made a typing error!
Andy
Thanks for that...I'll have a look.
I got my information from this page, so I'm sure it is right. I used the checker and it says it's OK. I'm just wondering if I did any harm with the typing error or not?
between...
User-agent: *
Disallow:
and....
User-agent:
* Disallow:
Andy
If you are saying that your site was "freshed" on Google - that is, it was listed in the search results with a date next to the URL, then I wouldn't worry about your invalid robots.txt. In almost all cases, invalid robots.txt files will be ignored. If you got crawled at all, whether by the freshbot or by the deep-crawler, then your robots.txt file was simply ignored.
Jim
It was already listed at Google and I definitely saw a fresh date at least a few days after putting up the bad robots.txt... just hope the fresh didn't come from before I put up the robots.txt. So... I guess that's a relief if that's the case.
Would that count as invalid?
User-agent:
* Disallow:
Was nick w playing with me when he talked about "allow"? :)
Thanks a lot.
Andy
Yeah, your file with the typos was invalid. So it was most likely ignored. Ignored means the robot will crawl all pages of your site that it can find - if it feels like it and has the time. The only risk with an invalid robots.txt is that the robot will crawl a page you don't want it to crawl.
You've got basically nothing to worry about. All the decent robots will re-fetch robots.txt before they start a spidering run, and many will fetch it several times a day during an ongoing spidering run. Therefore there can be no "permanent damage".
Don't worry, be happy. Get it cleaned up and validated before the next Google deep-crawl (very likely sometime in the next 2 weeks) and relax.
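If you want to sanity-check the difference yourself, Python's standard-library robots.txt parser is one quick way. This is just a rough sketch; the example URL and the "Googlebot" agent name are arbitrary inputs, not anything from your site:

```python
# Compare how a standard robots.txt parser treats the three files from
# this thread. Uses only Python's standard library.
from urllib.robotparser import RobotFileParser

def can_googlebot_fetch(robots_txt, url="http://example.com/page.html"):
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch("Googlebot", url)

allow_all = "User-agent: *\nDisallow:\n"    # empty Disallow = allow everything
block_all = "User-agent: *\nDisallow: /\n"  # "/" = block everything
typo_file = "User-agent:\n* Disallow:\n"    # the broken version from this thread

print(can_googlebot_fetch(allow_all))  # True
print(can_googlebot_fetch(block_all))  # False
# The malformed entry yields no usable rules, so the parser falls back
# to allowing everything - consistent with "invalid files get ignored".
print(can_googlebot_fetch(typo_file))  # True
```

Other crawlers may parse malformed files differently, but the well-behaved ones all default to allowing access when the rules are unusable.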
Jim
Thanks for the info and putting my mind at rest
Just a little off-topic question. I notice everyone seems to know exactly what the robots are doing on their site. How do you do that? All the info I get is like the list I pasted below. This comes through my web site provider.
Regards
Andy
Browser Report
Listing the top 40 browsers by the number of requests for pages, sorted by the number of requests for pages.
reqs: pages: browser
-----: -----: -------
27410: 3396: "Mozilla/4.0
364: 63: "Mozilla/5.0
49: 39: "Scooter/3.2.SF0"
262: 36: "Mozilla/4.7
29: 27: "Googlebot/2.1
66: 23: "Mozilla/4.74
97: 23: "Mozilla/3.0C-WorldNet
109: 20: "Mozilla/4.5
21: 20: "FAST-WebCrawler/3.6
122: 19: "Mozilla/4.61
18: 18: "Scooter/3.2"
115: 18: "Mozilla/4.75
16: 15: "Mozilla/3.0
75: 13: "Mozilla/4.8
49: 13: "Mozilla/4.05
19: 13: "-"
23: 11: "libwww-perl/5.48"
9: 9: "FAST-WebCrawler/3.6/FirstPage
81: 8: "Mozilla/4.73
6: 6: "Mozilla/2.0
41: 5: "Mozilla/4.72
5: 5: "Mozilla/4.0_(compatible;_MSIE_5.0;_Windows_95)_VoilaBot/1.6
35: 4: "Opera/6.0
18: 4: "Mozilla/4.77
19: 3: "Mozilla/4.78
16: 3: "Mozilla/4.51
This analysis was produced by analog 5.1.
fat,
these broad settings that hosts provide for stats are almost worthless.
You need a custom (although free) log analyzer program
[statslab.cam.ac.uk...]
which eliminates the most common worthless stats.
I suggest you also find out where your ACTUAL logs are stored and begin viewing them. One host I had for a short while provided a shortened version of the logs, created by a script, rather than industry-standard full logs.
This provides some examples of log types:
[verges.ch...]
Analog at one time had a very extensive explanation of log file types and structure, which I'm currently unable to find on their pages.
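Once you have the raw access log, even a few lines of script will separate the robots from the browsers. A minimal sketch, assuming Combined Log Format; the sample lines and the list of bot names are just illustrative assumptions:

```python
# Count crawler hits in a Combined Log Format access log.
# The user-agent is the last quoted field on each line.
import re
from collections import Counter

# Bots seen in the analog report above; extend as needed.
BOT_PATTERN = re.compile(r"Googlebot|FAST-WebCrawler|Scooter", re.IGNORECASE)

LINE_RE = re.compile(
    r'^(\S+) \S+ \S+ \[([^\]]+)\] "([^"]*)" \d+ \S+ "[^"]*" "([^"]*)"'
)

def bot_hits(lines):
    hits = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip lines that aren't in Combined Log Format
        ip, timestamp, request, agent = m.groups()
        if BOT_PATTERN.search(agent):
            hits[agent] += 1
    return hits

# Two made-up example lines: one crawler hit, one ordinary browser hit.
sample = [
    '66.249.66.1 - - [10/Oct/2003:13:55:36 +0000] "GET /index.html HTTP/1.0" '
    '200 2326 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"',
    '10.0.0.5 - - [10/Oct/2003:13:56:01 +0000] "GET /page.html HTTP/1.0" '
    '200 512 "-" "Mozilla/4.0 (compatible; MSIE 6.0)"',
]
print(bot_hits(sample))
```

In practice you'd read the lines from the log file your host exposes and maybe key the counts by day as well, but this is enough to see exactly when each robot came by.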