For the past few months Slurp has been generating a lot of 404s. They fall into three types:
* Genuine 404s from pages that were deleted a while ago.
* 404s from what seems to be badly configured software.
* 404s from what seem to be exploit attempts.
The following 404s are for URLs that look like Yahoo Sports pages, such as blogs and video sections:
404 GET /nhl/blog/YYYYY/teams/Nashville+Predators/nhl.t.27
404 GET /nhl/players/2848/gallery/im:urn:newsml:sports.yahoo,getty:YYYYYY:nhl,photo,YYYYYYYYYYYY_nashville_pre:1
404 GET /nhl/teams/was
404 GET /nhl/teams/cob
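To see how many of these stray requests hit each section, the log lines can be tallied by the first path segment. This is only a rough sketch: it assumes every line has the "404 GET /path" layout shown above, and count_404_sections is a made-up helper name, not part of any log tool.

```python
from collections import Counter


def count_404_sections(lines):
    """Count 404 log lines by the first segment of the requested path."""
    counts = Counter()
    for line in lines:
        parts = line.split()
        # Expect "404 GET /path"; skip anything that does not match.
        if len(parts) < 3 or parts[0] != "404":
            continue
        path = parts[2]
        # "/nhl/teams/was" -> "nhl"; a bare "/" is counted as "/".
        section = path.strip("/").split("/")[0]
        counts[section or "/"] += 1
    return counts


sample = [
    "404 GET /nhl/teams/was",
    "404 GET /nhl/teams/cob",
    "200 GET /index.html",
]
print(count_404_sections(sample))
```

A section such as nhl that never existed on the site stands out straight away in the tally.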
My site is about sports, but it has nothing to do with hockey or US sports of any kind.
The referring pages contain no link to my site, so is this badly configured software?
The following seem to be some kind of exploit:
myhigheredjobs is, I believe, a job-site app with an admin login panel. Like company/contact.cfm and question/index, it is not on my site, and the requests look as if they are trawling for exploits.
The IP address does look genuine:
18.104.22.168.in-addr.arpa name = b3090812.crawl.yahoo.net.
Authoritative answers can be found from:
115.195.67.in-addr.arpa nameserver = ns2.yahoo.com.
115.195.67.in-addr.arpa nameserver = ns3.yahoo.com.
115.195.67.in-addr.arpa nameserver = ns4.yahoo.com.
115.195.67.in-addr.arpa nameserver = ns5.yahoo.com.
115.195.67.in-addr.arpa nameserver = ns1.yahoo.com.
ns1.yahoo.com internet address = 22.214.171.124
ns2.yahoo.com internet address = 126.96.36.199
ns3.yahoo.com internet address = 188.8.131.52
ns4.yahoo.com internet address = 184.108.40.206
ns5.yahoo.com internet address = 220.127.116.11
So what the heck is going on here? Is this some kind of spoofing, meant to get past my current bad-bot blocking in order to crawl the site and/or trawl for exploits?
As I said in another thread here, Slurp is crawling the site excessively. I am wondering whether some kind of spoofing is going on and whether I should block the IP entirely.
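One way to test the spoofing theory is a forward-confirmed reverse DNS check: the reverse lookup above only shows that the IP maps to a crawl.yahoo.net name, and the name should also resolve back to the same IP. Below is a minimal sketch of that check. The function name is_genuine_crawler is made up for illustration, the .crawl.yahoo.net suffix is an assumption taken from the lookup above, and it handles IPv4 only.

```python
import socket

# Assumption: genuine Slurp hosts reverse-resolve under this domain,
# as the lookup above suggests.
CRAWLER_SUFFIX = ".crawl.yahoo.net"


def is_genuine_crawler(ip, reverse=None, forward=None, suffix=CRAWLER_SUFFIX):
    """Forward-confirmed reverse DNS check for an IPv4 address.

    reverse and forward default to real DNS lookups; they can be replaced
    with stubs so the logic can be tried without network access.
    """
    reverse = reverse or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward = forward or (lambda name: socket.gethostbyname_ex(name)[2])
    try:
        host = reverse(ip)
        # Step 1: the PTR name must sit under the expected crawler domain.
        if not host.lower().rstrip(".").endswith(suffix):
            return False
        # Step 2: the name must resolve back to the IP we started with.
        return ip in forward(host)
    except OSError:
        # socket.herror and socket.gaierror are both OSError subclasses.
        return False
```

Calling is_genuine_crawler with an IP from the access log returns False when either step fails, which is what I would expect from a client that only fakes the Slurp user agent. If it returns True for these requests, the traffic really does come from Yahoo's crawler range and blocking the IP would block Slurp itself.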