Moderators: coopster & jatar_k

PHP Server Side Scripting Forum

    
PHP Spider Trap and MSN Weirdness
There's an explanation out there somewhere
bumpaw

10+ Year Member



 
Msg#: 11617 posted 4:14 am on Feb 2, 2006 (gmt 0)

Referring back to a great post on a PHP spider trap [webmasterworld.com] from about 18 months ago.

Boy, I have had fun with this one on five sites. The trap sends me an email when a bad bot gobbles my /trap.php. It's been working without problems for over a year, and I'm not sure it has a problem even now. I think the problem is with MSN.

Here's the deal. I was going over the pages indexed by the big three for one of my sites, and there was the surprise: MSN Search had neatly indexed mydomain.com/trap.php. A great way for anyone who happens to stumble on it to block themselves.

I checked the log file, and MSN's bot never touched trap.php or banned itself. I checked my sitemap.xml to see if I had stuck it in there by accident. Is it possible that it grabbed the URL from my server log, where it would have been recorded during testing?

The site is only a couple of weeks old and just starting to get spidered. For some reason Google is letting the others get ahead on this one. If anyone has an idea I would sure enjoy hearing it.

[edited by: jatar_k at 5:54 am (utc) on Feb. 2, 2006]
[edit reason] fixed link [/edit]

 

coopster

WebmasterWorld Administrator, Top Contributor of All Time, 10+ Year Member



 
Msg#: 11617 posted 10:55 pm on Feb 2, 2006 (gmt 0)

No, I don't know how it could ever get to your server log. I would guess that at some point the url was available and it picked it up. I'm surprised you can't find the hit in your access log.

bumpaw

10+ Year Member



 
Msg#: 11617 posted 1:05 am on Feb 3, 2006 (gmt 0)

"I would guess that at some point the url was available and it picked it up"

My thoughts exactly, but where? I looked really hard in the log. I used awstats with the rawlog plugin, which has a search that would have picked it up.

In MSN's index it was listed with just the URL as the title link and, of course, no cached-page link or description. Very strange.
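As a second check alongside the awstats search, the raw access log can be scanned directly for any request to the trap. This is only a rough sketch in Python: it assumes the server writes Apache-style common or combined log lines, the function name `find_hits` is made up for illustration, and the two sample lines are invented, not taken from the real log.

```python
import re

# Assumed layout (Apache common/combined log format):
# host ident user [time] "METHOD path PROTO" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def find_hits(lines, path="/trap.php"):
    """Return parsed entries whose requested path equals `path`.

    Any query string is ignored, so /trap.php?x=1 also counts as a hit.
    Lines that do not match the assumed format are skipped.
    """
    hits = []
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group("path").split("?", 1)[0] == path:
            hits.append(match.groupdict())
    return hits


# Invented sample lines, for illustration only.
sample = [
    '192.0.2.10 - - [02/Feb/2006:04:10:00 +0000] "GET /index.html HTTP/1.1" '
    '200 5120 "-" "ExampleCrawler/1.0"',
    '192.0.2.77 - - [02/Feb/2006:04:12:31 +0000] "GET /trap.php HTTP/1.0" '
    '200 312 "-" "BadBot/0.1"',
]

hits = find_hits(sample)
for hit in hits:
    print(hit["host"], hit["time"], hit["agent"])
```

To run it against a real file, pass the open file object instead of `sample`, for example `find_hits(open("access.log"))` (the file name is a placeholder). If that comes back empty for every search-engine user agent, the crawler never requested the page, which matches what the awstats search showed.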

caspita

10+ Year Member



 
Msg#: 11617 posted 1:15 am on Feb 3, 2006 (gmt 0)

If what you see in the index is just the URL of your trap page, that is normal behavior for MSN. It happens to me too. The bot picks up the link from any page that links to your trap, but because it follows the rules in your robots.txt, it won't fetch the page. The page still gets listed in the index, as a URL only.

In fact, for my site, when I run a site:example.com query on MSN, the trap URL comes up in the top 5, lol!
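The robots.txt behavior described above can be checked with a small Python sketch using the standard library's `urllib.robotparser`. The robots.txt content and the example.com URLs here are assumptions for illustration, not the actual file from either site, and `may_crawl` is a made-up helper name.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt: every crawler is told not to fetch the trap page.
ROBOTS_TXT = """\
User-agent: *
Disallow: /trap.php
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_crawl(agent, url):
    """True if a crawler that obeys robots.txt is allowed to fetch `url`."""
    return parser.can_fetch(agent, url)


trap_allowed = may_crawl("msnbot", "http://example.com/trap.php")
home_allowed = may_crawl("msnbot", "http://example.com/index.html")
print(trap_allowed, home_allowed)
```

The Disallow rule only controls fetching. It says nothing about whether a URL discovered through a link elsewhere may be listed, which is consistent with a URL-only entry that has no cached page or description and no hit in the access log.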

bumpaw

10+ Year Member



 
Msg#: 11617 posted 2:28 am on Feb 3, 2006 (gmt 0)

caspita, well, I'm glad it's not just me. I thought about trying to change something, but in reality, who will ever find that link in MSN except me?

