Moreover

lucy24

8:38 pm on Mar 10, 2019 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I don't know if this is related to the robot discussed almost 10 years ago [webmasterworld.com].
IP: 70.39.246.abc (PacketExchange, last time I looked)
UA: Mozilla/5.0 Moreover/5.1 (+http://www.moreover.com)
robots.txt: kinda-sorta
. . . where “kinda-sorta” means that it requests robots.txt several seconds after requesting a single interior page (on which it gets a 403)

I find a visit in April 2016 from the identical IP, with UA
Mozilla/5.0 Moreover/5.1 (+http://www.moreover.com; webmaster@moreover.com)
No robots.txt that time; instead it asked for the same page twice. The current UA seems to go back to November 2018.
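
lucy24's "kinda-sorta" check asks whether robots.txt arrives before or after the first page request, and it is easy to automate once a log has been parsed. Below is a minimal Python sketch. It assumes the log lines are already split into timestamp, user agent and path. The Hit record, the robots_compliance function, the sample path and the timestamps are all hypothetical and do not come from any real log tool or from the post.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable


@dataclass
class Hit:
    """One already-parsed access-log line (hypothetical record, not a real library type)."""
    when: datetime
    user_agent: str
    path: str


def robots_compliance(hits: Iterable[Hit], ua_token: str) -> str:
    """Classify how the bot whose UA contains `ua_token` handled robots.txt.

    "before": robots.txt was requested before any other page
    "after":  robots.txt came only after some other page ("kinda-sorta")
    "never":  robots.txt was not requested at all
    """
    # Keep only this bot's hits, oldest first.
    mine = sorted((h for h in hits if ua_token in h.user_agent), key=lambda h: h.when)
    seen_page = False
    for h in mine:
        if h.path == "/robots.txt":
            return "after" if seen_page else "before"
        seen_page = True
    return "never"


ua = "Mozilla/5.0 Moreover/5.1 (+http://www.moreover.com)"
hits = [
    Hit(datetime(2019, 3, 10, 20, 38, 0), ua, "/interior/page.html"),  # made-up path and times
    Hit(datetime(2019, 3, 10, 20, 38, 7), ua, "/robots.txt"),
]
print(robots_compliance(hits, "Moreover/"))  # after
```

Matching on the token "Moreover/" rather than the full string catches both UA variants quoted above.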

tangor

12:49 am on Mar 12, 2019 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Made me look. Went through 8 years (which is what I keep current) and had no hits from this. Where do you keep finding these things? (Inquiring minds humorously want to know!)

iamlost

2:01 am on Mar 12, 2019 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I believe it's the same one: a media-monitoring business intelligence (BI) service.

I blocked it completely until it was acquired by Lexis-Nexis a few years ago. Since then I allow it to crawl the home page, about/contact pages, and newsletter archive, and block it everywhere else :) I don't see it very often, though; perhaps a couple of times a year.
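
iamlost doesn't post the actual rules, but an allow-list like this is usually written in robots.txt. The sketch below assumes hypothetical paths (/about, /contact, /newsletter/) and checks them with Python's standard urllib.robotparser. Python's parser applies the first matching rule, so the Allow lines have to come before "Disallow: /". It only has an effect if the bot actually honours robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: Moreover may crawl a few public sections only.
# The paths are placeholders, not taken from the post.
ROBOTS_TXT = """\
User-agent: Moreover
Allow: /about
Allow: /contact
Allow: /newsletter/
Disallow: /

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Pass just the product token: the parser compares against the part before the first "/".
for path in ["/about", "/newsletter/2019-03.html", "/products/widget.html"]:
    print(path, parser.can_fetch("Moreover", path))
```

Allowing only the home page itself would need a rule such as `Allow: /$`. The `$` end-of-path anchor is defined in RFC 9309, but urllib.robotparser does plain prefix matching and does not support it, so it is left out of this sketch.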

The reason for allowing it at all is to show up as a source/citation on niche matters my visitors care about, such as disaster news updates and background, e.g. product contamination.

notriddle

1:18 am on Sep 2, 2019 (gmt 0)

5+ Year Member



Resurrecting this thread to mention that I, too, have gotten hits from them. Same IP address, same UA.

They seem to have picked up on my site's RSS feed. They request the feed, then request every linked page that they don't already have a copy of. The only reason I noticed them is that, unlike smarter feed readers that do the same thing, they don't rate-limit their requests. They fetch about five pages, then start getting 429s. They don't stop when they get 429s, either, but they do re-request the failed pages an hour later when they re-fetch the feed.
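
The post doesn't say how the 429 responses are generated. One common approach is a per-client token bucket. The sketch below keys clients by IP and assumes a burst of 5 requests and a refill of one request every 2 seconds. Both numbers are made up to match "about five", and the TokenBucket class is hypothetical, not notriddle's setup.

```python
from dataclasses import dataclass, field


@dataclass
class TokenBucket:
    """Hypothetical per-client token bucket: `capacity` requests per burst, refilled at `rate`/second."""
    capacity: float = 5.0  # assumed burst size
    rate: float = 0.5      # assumed refill: one request every 2 seconds
    tokens: dict = field(default_factory=dict)  # client -> tokens left
    stamps: dict = field(default_factory=dict)  # client -> time of last request

    def allow(self, client: str, now: float) -> bool:
        """Return True to serve the request, False to answer with 429."""
        last = self.stamps.get(client, now)
        # Refill for the time elapsed since this client's last request, capped at capacity.
        level = min(self.capacity, self.tokens.get(client, self.capacity) + (now - last) * self.rate)
        self.stamps[client] = now
        if level >= 1:
            self.tokens[client] = level - 1
            return True
        self.tokens[client] = level
        return False


bucket = TokenBucket()
# Eight back-to-back requests from one (documentation-range) IP at t = 0 seconds.
statuses = [200 if bucket.allow("203.0.113.7", 0.0) else 429 for _ in range(8)]
print(statuses)  # [200, 200, 200, 200, 200, 429, 429, 429]
```

An hour later the bucket is full again, which matches the behaviour described: the re-fetched pages succeed until the next burst runs the tokens out.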

Since they don't seem to have ever requested robots.txt from my site, and they clearly identify themselves as a robot, I'm going to categorically block them.
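
The post doesn't say how the block is done. As a Python illustration only, here is a minimal WSGI middleware that answers 403 to any request whose User-Agent contains "Moreover/", the token shared by both UAs quoted earlier in the thread. The names block_bots, BLOCKED_UA_TOKENS and hello_app are hypothetical.

```python
from typing import Callable, Iterable

# Hypothetical block list; "Moreover/" matches both UA strings quoted in this thread.
BLOCKED_UA_TOKENS = ("Moreover/",)


def block_bots(app: Callable) -> Callable:
    """WSGI middleware: answer 403 to requests whose User-Agent contains a blocked token."""
    def wrapper(environ: dict, start_response: Callable) -> Iterable[bytes]:
        ua = environ.get("HTTP_USER_AGENT", "")
        if any(token in ua for token in BLOCKED_UA_TOKENS):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)
    return wrapper


def hello_app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """Stand-in application for the example."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello\n"]


app = block_bots(hello_app)
```

Matching on the UA alone is easy to evade, but it costs nothing for a bot that announces itself as openly as this one does.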