Forum Moderators: open
I just want to know where it originates from. Whose spider is it? Also, while you are all here, what spiders are these: InternetSeer
Internet Archive
NPBot
Thanks a lot
Happy Surfing;-)
Last night 206.40.146.58
206.40.128.0 - 206.40.159.255
They didn't read robots.txt. Just jumped right in :(
In spite of my "new leaf" I'm going to deny them.
I've been getting some mild unidentified traffic from Missouri for some time. Too bad I didn't document the IP.
balam
I recently upgraded my bandwidth and switched ISPs, which is why the netblock has changed. The old IP address will go away in about 30 days or so. As you can imagine, with the extra bandwidth, i thought it would be appropriate to do a slightly deeper crawl than i usually do. I am also experimenting with some new algorithms (again) to try to solve some of the issues i currently have.
anyway, if you're in that seo world, go to the 'about' section on the site and there is a hoard of information about how you can optimize, if you care...
Not fetching robots.txt every so many hours is not a problem. Fetching Disallowed pages is a problem. But Fluffy doesn't fetch disallowed pages - at least not on my sites.
If you add a new page you want disallowed, always update your robots.txt before you post the page you don't want spidered - This applies to all search engines.
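To illustrate the point above: a well-behaved crawler checks robots.txt rules before fetching a page, which is why the Disallow line has to be in place before the page goes live. A minimal sketch of that check using Python's standard library (this is not any particular bot's code, and "Fluffy" here is just a stand-in user-agent name):

```python
import urllib.robotparser

rp = urllib.robotparser.RobotFileParser()
# parse() lets us test rules locally; read() would fetch a live robots.txt.
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
])

# A compliant crawler calls can_fetch() before requesting each URL.
print(rp.can_fetch("Fluffy", "/private/page.html"))  # False - disallowed
print(rp.can_fetch("Fluffy", "/public/page.html"))   # True - allowed
```

If the page is posted before robots.txt is updated, a crawler that has already cached the old rules can legitimately fetch it.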
Jim
The bot does have some sort of glitch when it comes to CASE.
Some of my early pages and folder creations still exist (despite webtrends and methods) which involve the use of UPPER case.
Nowhere on the internet are these pages in lower case, and yet fluffy attempted to read them as such, generating 404s in the process.
In fluffy's defense, he is not the only one. Most of the APNIC and a few RIPE-specific bots do the same thing on the case-sensitive folders/pages.
Fluffy may in fact be an excellent SE. For me it's a matter of my visitors finding obscure SEs, and the advantages that my content provides to the SE rather than the other way around.
I rarely use robots.txt these days. The only general exception to that is if I add a folder which I do not want the major SEs to navigate.
Don
so this causes me grief, and I intentionally lowercase everything as part of the normalization process. I did some preliminary work a few months back with a case-insensitive version that merely used reference counts to decide what the case should be, and it seemed to work pretty well. i was just not yet ready to use and deploy the new system when I started this latest round of crawling.
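The reference-count idea described above could be sketched like this: instead of forcing every URL to lowercase, tally each casing actually seen for the same lowercased key and treat the most-referenced spelling as canonical. This is only a hypothetical illustration of that approach, not the actual crawler code:

```python
from collections import Counter

def canonical_case(urls):
    """Map each lowercased URL key to its most frequently seen casing."""
    counts = {}
    for url in urls:
        # Group all casings under a case-insensitive key.
        counts.setdefault(url.lower(), Counter())[url] += 1
    # Pick the spelling with the highest reference count per key.
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}

seen = ["/Docs/Index.HTML", "/Docs/Index.HTML", "/docs/index.html"]
print(canonical_case(seen))  # {'/docs/index.html': '/Docs/Index.HTML'}
```

Crawling with the majority casing avoids the 404s on case-sensitive servers that blind lowercasing produces.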
of course, you understand that many *nix servers are case sensitive...
i've one site that has URLs that are all uppercase, and any lowercase search for them will result in 404s... no, i don't see a need to "normalize" them to lowercase... why? mainly because there are additional URLs that respond to oThErCaSe URL requests...
maybe normalizing is not such a good idea for a spiderbot?
Twice in the last week!
I contacted kmarcus off the boards, and he was more than helpful figuring out why (Stale version of robots.txt in Fluffy!)
Anyway, I just wanted to comment that while there may be the occasional problem, kmarcus does seem to be very serious about solving them!
dave