Strange Bot Behavior

Back to the Future meets 1984

         

WebGuerrilla

11:43 pm on Jan 25, 2002 (gmt 0)

I have a client who fell victim to the Google December glitch. At the beginning of the month, I spent a great deal of time digging through error logs trying to determine if something on our end contributed to the problem.

One thing that jumped out was a ton of requests for very old pages (3+ years) that were generating 404s. The requests were coming in rapid succession from Google, AV, and Direct Hit. It was like I was looking at a log file from '98.

After seeing this, I got the idea that maybe the high volume of 404s was causing a problem, so for the first time in this site's history, I put up a robots file that excluded these non-existent file names.
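For anyone who wants to check that a robots file like this really excludes what it is supposed to, here is a rough Python sketch using the standard library's urllib.robotparser. The Disallow paths, the www.example.com host, and the is_blocked helper are made up for illustration; the actual file names aren't listed in this thread.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: the real old file names are not given in this thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /old-page-1998.html
Disallow: /archive/
"""


def is_blocked(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text disallows url for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    base = "http://www.example.com"  # placeholder host
    for path in ("/old-page-1998.html", "/archive/news.html", "/index.html"):
        print(path, "blocked:", is_blocked(ROBOTS_TXT, "Googlebot", base + path))
```

This only shows how a compliant parser reads the rules. It says nothing about whether a particular spider actually honors them.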

I spent the next few days watching Googlebot to see if there was any change in behavior. I didn't see anything different, so I decided to run a few days without the file. I logged in via FTP and changed the file name from robots.txt to robots2.txt. (I didn't want to have to write it from scratch if I decided to put it back.)

Within a couple of days, 5 spiders showed up and made 12 total requests for robots2.txt. Two of them were AV and Google.
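If you want to pull the same kind of numbers out of your own logs, something along these lines should work. It is only a sketch, and it assumes an Apache-style common or combined log format. The sample lines, IP addresses, and user-agent strings below are invented for illustration and are not taken from the real log.

```python
import re
from collections import Counter

# Matches Apache common/combined log lines; referer and agent are optional.
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)

# Invented sample data, not the actual log from this site.
SAMPLE_LINES = [
    '192.0.2.10 - - [27/Jan/2002:04:12:09 +0000] "GET /robots2.txt HTTP/1.0" '
    '200 210 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"',
    '192.0.2.20 - - [27/Jan/2002:05:01:44 +0000] "GET /robots2.txt HTTP/1.0" '
    '200 210 "-" "Scooter/3.2"',
    '192.0.2.10 - - [27/Jan/2002:04:12:10 +0000] "GET /index.html HTTP/1.0" '
    '200 5120 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"',
]


def requests_for(lines, path):
    """Count requests for an exact path, grouped by user-agent string."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group("path") == path:
            counts[match.group("agent") or "(no user agent logged)"] += 1
    return counts


if __name__ == "__main__":
    for agent, hits in requests_for(SAMPLE_LINES, "/robots2.txt").most_common():
        print(hits, agent)
```

To run it against a real file, pass an open file handle instead of SAMPLE_LINES. The referer group is captured too, in case you want to see where a request claims to have come from.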

Now I've had this problem with Alexa before, but for the life of me, I can't figure out how/why they would suddenly show up and request that file name. I can't remember ever seeing a spider request random variations.

Now there is a possibility I viewed that file in IE while the Google toolbar was running. I've always suspected that Google might use the toolbar to gather URLs to crawl (the same way Alexa does), but what about the other spiders? Could Google be sharing data with other engines?

Has anyone seen anything similar?

jeremy goodrich

12:40 am on Jan 26, 2002 (gmt 0)

hmm...that gives me an idea.

Create a bunch of 'stuff', surf to it with the toolbar, along with the sites with good pagerank for the same 'stuff', and wait...

then, uninstall, install on a different machine with different IP/UA and repeat...

would be a good way to test to see if in fact they are using the data to crawl with...

after all, aren't the sites that users surf with the toolbar on those that people from google like? so they would merit crawling, yes?

never seen anything like that before, though, but it gives me more reasons to finally check out the toolbar...

Beyond

2:03 am on Jan 30, 2002 (gmt 0)

So how did AV find robots2.txt then? They don't have a toolbar... Did you ever view robots2.txt in a browser, or for that matter did anyone else in the company? Were your accesses to it only via FTP? If someone viewed it via a browser, "maybe" they backclicked or something, generating the URL in a referer string, and it got saved in someone's open log file?

Verdy Verdy interesting. Those bots from Google and AV don't try random variations, they had to find that file name somewhere.

Ok - just for fun, try it again and rename it to robots3.txt via FTP.

doit

2:09 pm on Mar 29, 2002 (gmt 0)



I unfortunately downloaded the Alexa toolbar, and I want to get rid of it. But how can I uninstall this Alexa toolbar?

Thank you for your help.
Best regards

PAR