|How Intelligent is Scooter?|
huge urls crawled
| 9:27 pm on Jun 1, 2003 (gmt 0)|
When I checked my stats today, I could not believe what I saw: Scooter had indexed a huge number of pages. But happiness turned into astonishment when I had a closer look at my log files:
Just yesterday, Scooter indexed thousands of URLs nine or more levels deep, each about 300 to 400 characters long!
Due to a bug in my URL rewriting code it got href="widget-name/manufacturer/" instead of href="/...". The page containing the error was not reachable by a link (probably the reason why I hadn't noticed the error before), so Scooter must have dropped the "manufacturer" and gone to /widget-name instead.
Glad I checked my stats today. I don't know where this would have ended (Scooter going down the infinite pyramid).
Now I am anxious to see how I rank! Hope I don't get banned ;-)
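For anyone wondering how one missing leading slash can produce ever-deeper URLs, here is a minimal Python sketch of standard relative-URL resolution. It assumes a hypothetical page at http://example.com/widget-name/manufacturer/ that links to itself with the relative href from the post; the host name and the number of hops are made up for illustration. It shows how a crawler that follows the usual resolution rules would behave, not how Scooter actually works.

```python
from urllib.parse import urljoin


def follow_relative_link(start_url, href, hops):
    """Resolve the same href repeatedly, as a crawler would if every
    generated page contained that link again."""
    urls = []
    current = start_url
    for _ in range(hops):
        # A relative href is resolved against the directory of the current page.
        current = urljoin(current, href)
        urls.append(current)
    return urls


start = "http://example.com/widget-name/manufacturer/"  # hypothetical page

# Missing leading slash: each hop appends another copy of the path.
broken = follow_relative_link(start, "widget-name/manufacturer/", 3)

# Leading slash present: every hop resolves to the same URL.
fixed = follow_relative_link(start, "/widget-name/manufacturer/", 3)

for url in broken:
    print(url)
```

With the root-relative href the crawler keeps landing on the same page, while the relative one grows by one path pair per hop, which would explain URLs many levels deep and hundreds of characters long.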
| 9:34 pm on Jun 1, 2003 (gmt 0)|
What I also noticed on a second look: there are a lot of URLs like "/Widg" included!
Somehow it seems to have lost the ending of some words!
| 9:56 pm on Jun 1, 2003 (gmt 0)|
Happy for ya globay
| 9:43 am on Jun 3, 2003 (gmt 0)|
From what I see, this bot is really buggy: it requests URIs that do not exist on my server, does not respect absolute paths (like href="/something"), and sends incomplete URIs like "login.ph" instead of "login.php".
Scooter produces tons of 404s every time it comes by my sites, even though all my links are valid.
| 12:33 pm on Jun 3, 2003 (gmt 0)|
Scooter produces heaps of 404s on my site as well.
As mentioned before, it seems to chop off URLs all the time.
It also seems to 'try' to connect to anything/java
Very strange, as I don't have anything "java" on my site.
I get an email every time a 404 is produced, and each time I get scootered, I have to crawl through a heap of mails!
| 1:23 pm on Jun 3, 2003 (gmt 0)|
Hey driesie, you just touched on a subject that interests me here: you get an email every time you get a 404... What kind of script or program do you use, and where can I get it?
| 1:26 pm on Jun 3, 2003 (gmt 0)|
I am not a mod_rewrite expert, but you could do it with a custom 404 error document (ErrorDocument 404 in Apache) that points to a script which automatically sends an email.
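A rough sketch of what such a script could look like in Python, in case it helps. This is not driesie's script. It assumes the web server passes the failed request to the error handler in CGI-style environment variables (REDIRECT_URL, HTTP_REFERER, HTTP_USER_AGENT), that a mail server accepts connections on localhost, and the function names and addresses are made up for illustration.

```python
import smtplib
from email.message import EmailMessage


def build_404_report(environ, sender, recipient):
    """Build a notification mail from CGI-style environment variables."""
    # Assumption: the server exposes the original path as REDIRECT_URL;
    # fall back to REQUEST_URI if it does not.
    uri = environ.get("REDIRECT_URL") or environ.get("REQUEST_URI", "(unknown)")
    msg = EmailMessage()
    msg["Subject"] = f"404 Not Found: {uri}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(
        f"Requested URI: {uri}\n"
        f"Referer: {environ.get('HTTP_REFERER', '(none)')}\n"
        f"User-Agent: {environ.get('HTTP_USER_AGENT', '(none)')}\n"
    )
    return msg


def send_report(msg, host="localhost"):
    """Hand the message to an SMTP server (assumed to be running on host)."""
    with smtplib.SMTP(host) as smtp:
        smtp.send_message(msg)
```

In a script used as the error document you would call build_404_report with os.environ and your own addresses, pass the result to send_report, and then print the normal "not found" page. Including the user agent in the mail makes it easy to filter out the ones caused by Scooter.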
| 1:34 pm on Jun 3, 2003 (gmt 0)|