Forum Moderators: coopster & jatar k

PHP Spider Trap and MSN Weirdness

There's an explanation out there somewhere

   
4:14 am on Feb 2, 2006 (gmt 0)

10+ Year Member



This refers back to a great post here on a PHP spider trap [webmasterworld.com] from about 18 months ago.

Boy, I have had fun with this one on five sites. The trap sends me an email whenever a bad bot gobbles up my /trap.php. It's been working without problems for over a year, and I'm not sure it has a problem even now. I think the issue is with MSN.
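The thread doesn't show the trap's code, so here is only a minimal sketch of the idea, written in Python rather than the PHP from the referenced post. When a client requests /trap.php, its IP goes on a ban list and an alert message is built. The in-memory `banned_ips` set and the helper names `handle_trap_hit` and `is_banned` are hypothetical. A real trap would persist the ban (to a file, database, or server config) and actually send the email.

```python
from datetime import datetime, timezone

# Hypothetical in-memory ban list; the original PHP trap's storage isn't shown in the thread.
banned_ips: set[str] = set()


def handle_trap_hit(ip: str, user_agent: str) -> str:
    """Ban the client that requested /trap.php and return an alert message.

    A real trap would persist the ban and email this text; here we only build it.
    """
    banned_ips.add(ip)
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC")
    return f"Trap hit at {stamp}: {ip} ({user_agent}) has been banned."


def is_banned(ip: str) -> bool:
    """Check to run on every normal request before serving a page."""
    return ip in banned_ips


alert = handle_trap_hit("192.0.2.10", "BadBot/1.0")
print(alert)
```

Because the trap only fires on a request to /trap.php, a crawler that merely knows the URL never triggers it.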

Here's the deal. I was going over the pages indexed by the big three for one of my sites, and there was the surprise: MSN Search had neatly indexed mydomain.com/trap.php. What a great way for anyone who happened to stumble on it to block themselves.

I checked the log file, and MSN's bot never touched trap.php or banned itself. I also checked my sitemap.xml to see whether I had stuck it in there by accident. Is it possible that MSN grabbed the URL from my server log, since it would have been recorded there while I was testing?
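One way to double-check the log for trap hits is to scan the raw access log for requests to /trap.php and see which user agents made them. This sketch assumes Apache's "combined" log format. The regex and the function name `find_trap_hits` are illustrative and not part of any tool mentioned in the thread.

```python
import re

# Apache "combined" log format:
# IP ident user [time] "METHOD PATH PROTOCOL" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def find_trap_hits(lines, trap_path="/trap.php"):
    """Return (ip, time, user-agent) for every request whose path starts with trap_path."""
    hits = []
    for line in lines:
        m = LOG_LINE.match(line)
        if m and m.group("path").startswith(trap_path):
            hits.append((m.group("ip"), m.group("time"), m.group("agent")))
    return hits


# Hypothetical sample lines; in practice, pass an open log file instead.
sample = [
    '203.0.113.5 - - [02/Feb/2006:04:10:00 +0000] "GET /trap.php HTTP/1.1" 200 512 "-" "BadBot/1.0"',
    '198.51.100.7 - - [02/Feb/2006:04:11:00 +0000] "GET /index.html HTTP/1.1" 200 2048 "-" "msnbot/1.0"',
]
for hit in find_trap_hits(sample):
    print(hit)
```

If a crawler's user agent never shows up in the results, it never requested the trap, even if the URL appears in its index.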

The site is only a couple of weeks old and is just starting to get spidered. For some reason Google is letting the others get ahead on this one. If anyone has an idea, I would sure enjoy hearing it.

[edited by: jatar_k at 5:54 am (utc) on Feb. 2, 2006]
[edit reason] fixed link [/edit]

10:55 pm on Feb 2, 2006 (gmt 0)

WebmasterWorld Administrator coopster is a WebmasterWorld Top Contributor of All Time 10+ Year Member



No, I don't know how it could ever get to your server log. I would guess that at some point the url was available and it picked it up. I'm surprised you can't find the hit in your access log.

1:05 am on Feb 3, 2006 (gmt 0)

10+ Year Member



I would guess that at some point the url was available and it picked it up

My thoughts exactly, but where? I looked hard in the log. I used AWStats with the rawlog plugin, which has a search that would have picked it up.

In MSN's index it was listed with just the URL as the title link and, of course, no link to a cached page and no description. Very strange.

1:15 am on Feb 3, 2006 (gmt 0)

10+ Year Member



If what you see is the URL of your trap page in the index, that is normal behavior for MSN. It happens to me too. The bot picks up the link from any page that links to your trap, but because it follows the rules in your robots.txt, it never requests the page itself. The page is still listed in the index, but as a URL only.
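This explanation rests on the fact that discovering a URL and fetching it are separate steps: a compliant crawler checks robots.txt before requesting a page. Python's standard `urllib.robotparser` module can illustrate the fetch decision. The robots.txt content below is an assumption (the thread implies the trap is disallowed but doesn't quote the file), and `msnbot` is used only as an example user-agent string.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt: the trap is disallowed for every crawler, so only
# bots that ignore robots.txt should ever request it.
ROBOTS_TXT = """\
User-agent: *
Disallow: /trap.php
"""


def allowed(url: str, agent: str = "msnbot") -> bool:
    """Return True if a robots.txt-compliant crawler may request url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


print(allowed("http://example.com/trap.php"))
print(allowed("http://example.com/index.html"))
```

`can_fetch()` only answers whether a crawler may request the URL. Robots.txt does not stop a search engine from listing a URL it found in a link, which matches the URL-only listing with no cache or description described in this thread.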

In fact, when I do a site:example.com search on MSN for my site, the trap comes up in the top 5, lol!

2:28 am on Feb 3, 2006 (gmt 0)

10+ Year Member



caspita, well, I'm glad it's not just me. I thought about trying to change something, but in reality, who besides me will ever find that link in MSN?