ReplazBot

         

asdkasdfadf

2:17 pm on Mar 12, 2017 (gmt 0)



UA: Mozilla/5.0 (compatible; ReplazBot/3.1; +http://www.replaz.com)
Protocol: HTTP/1.1
Robots.txt: No
Host: Changes


They have different IPs and servers.

keyplyr

2:40 pm on Mar 12, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



replaz.com appears to be a Turkish search engine

Host: sadecehosting.com (equinix)
Servers: digitalocean.com
198.211.96.0 - 198.211.127.255
198.211.96.0/19
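
The address range and the CIDR block above describe the same set of addresses. A minimal sketch using Python's standard `ipaddress` module confirms that and tests whether a visitor falls inside the block. The helper name `in_blocked_range` is made up for illustration and is not something from the thread.

```python
import ipaddress

# Start and end of the range quoted in the post above.
first = ipaddress.ip_address("198.211.96.0")
last = ipaddress.ip_address("198.211.127.255")

# Collapse the start/end pair into the smallest list of CIDR blocks.
blocks = list(ipaddress.summarize_address_range(first, last))

# The CIDR form quoted in the post above.
BLOCKED = ipaddress.ip_network("198.211.96.0/19")


def in_blocked_range(addr: str) -> bool:
    """Return True if addr lies inside the /19 quoted above (hypothetical helper)."""
    return ipaddress.ip_address(addr) in BLOCKED
```

Because the bot's host changes, a range check like this only covers the addresses seen so far; it would probably be paired with a user-agent rule.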

jonasjacek

9:46 pm on Mar 16, 2017 (gmt 0)

5+ Year Member



It may look like a search engine but it does not work. Worse: They use spam techniques to get their SERPs into other search engines. The HTML is awful.
For now I don't see any potential here. Blocked.

lucy24

1:10 am on Mar 17, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I don't care who they are--whether spam factory or legitimate search engine--if they don't ask for robots.txt they can't come in.

In any case, I gotta say the page does not look like a legitimate search engine worthy of the user's respect.
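
One way to apply the "no robots.txt, no entry" test is to scan the access log for agents that made requests but never fetched `/robots.txt`. The sketch below assumes the Apache/nginx "combined" log format and groups by user-agent string rather than IP, since this bot's host changes. The function name `agents_skipping_robots` is hypothetical.

```python
import re
from collections import defaultdict

# Assumes the "combined" log format; adjust the pattern for other layouts.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] '
    r'"(?:\S+) (?P<path>\S+)[^"]*" \d{3} \S+ '
    r'"[^"]*" "(?P<ua>[^"]*)"'
)


def agents_skipping_robots(lines):
    """Return the user-agent strings that never requested /robots.txt."""
    asked = defaultdict(bool)
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # skip lines that do not look like combined-format entries
        ua = m.group("ua")
        path = m.group("path").split("?", 1)[0]  # ignore any query string
        asked[ua] = asked[ua] or path == "/robots.txt"
    return {ua for ua, did_ask in asked.items() if not did_ask}
```

Requesting robots.txt is not the same as obeying it, so this only identifies candidates for a closer look.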

keyplyr

2:40 am on Mar 17, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I would think that webmasters active in these forums would block server farms anyway. digitalocean.com would be high on the list.

"if they don't ask for robots.txt they can't come in"
That used to be the standard test of whether a bot is legit. However, with the introduction of social media and interest-based marketing, that dynamic has changed.

I allow a lot of agents that do not request robots.txt. Yes, in a perfect world every agent would request and obey robots.txt, but that just isn't so.

lucy24

4:32 am on Mar 17, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Generally when someone doesn't ask for robots.txt--facebook being the obvious example--there's some reason. "We're not just crawling, we came here because of some human action" type of thing. You can then choose whether to consider that a valid reason. But this one's a robot, plain and simple.

:: detour to check something ::

Yes, I do have a hole poked for facebookexternalhit (one header missing) and also for visionutils. Although, come to think of it, I don't really need to in the latter case because that UA only requests images, which are subject to their own hole-poking.
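
The "hole poking" described above amounts to a default rule with named exceptions. A rough sketch follows, assuming the exception is keyed on a user-agent substring (the post mentions a header-based rule, so this is a simplification). `EXEMPT_TOKENS` and `allow_request` are hypothetical names.

```python
# Hypothetical allowlist: agents let in even though they never request robots.txt.
EXEMPT_TOKENS = ("facebookexternalhit", "visionutils")


def allow_request(user_agent: str, asked_for_robots: bool) -> bool:
    """Default rule: robots.txt askers get in; others need an explicit exemption."""
    if asked_for_robots:
        return True
    ua = user_agent.lower()
    return any(token in ua for token in EXEMPT_TOKENS)
```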

keyplyr

5:06 am on Mar 17, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Bots are generally divided into 2 categories: spiders and validators/retrievers. Validators & retrievers have never seen themselves as required to support robots.txt.

Of the spiders, there can also be 2 categories: vertical and linear. The vertical bots, those that get their targets from a list, also have never seen a need to support robots.txt.

Only the linear bots (spiders/crawlers), which follow one link to another, have volunteered to support robots.txt... and only those that care about being reputable.
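
For reference, "supporting robots.txt" on the crawler side means checking each URL against the site's rules before fetching it. Here is a minimal sketch with Python's standard `urllib.robotparser`. The rules in `ROBOTS_TXT` and the bot name `ExampleBot` are invented for illustration, and the file is parsed from a string so no network access is needed.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: ExampleBot may crawl everything except /private/,
# every other agent is shut out entirely.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /private/

User-agent: *
Disallow: /
"""


def may_fetch(user_agent: str, url: str) -> bool:
    """Return True if the rules above permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)
```

A real crawler would fetch the live file (for example with `set_url()` and `read()`) and cache the result rather than re-parsing on every call.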

lucy24

8:23 pm on Mar 17, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



"Validators & retrievers have never seen themselves as required to support robots.txt."
Except the w3 family of validators, who are scrupulous about the robots.txt thing. (At least the link checker; that's the only one that has to visit live sites.) The twitterbot is also scrupulous about robots.txt. Partly for this reason, it is the only robot permitted to visit my test site.

"The vertical bots, those that get their targets from a list, also have never seen a need to support robots.txt."
I've got one area that is visited by a slew of RSS-following robots every time something new gets added to a curated directory. The Great Divide is between the ones that ask for robots.txt and the ones that don't. If they don't ask for robots.txt, and the UA string offers no way to get information about who they are and why they're visiting, why the ### should I let them in?
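
The second half of that test, whether the UA string offers a way to find out who is visiting, can be approximated by looking for the `+http://...` contact convention seen in the ReplazBot string at the top of the thread. A small sketch follows; the function name `contact_url` is hypothetical, and a missing URL does not prove bad intent any more than a present one proves good intent.

```python
import re

# Matches an http(s) URL, optionally prefixed with "+", as many crawlers
# include in their user-agent string.
CONTACT_URL = re.compile(r'\+?(https?://[^\s;)]+)')


def contact_url(user_agent: str):
    """Return the first URL found in the user-agent string, or None."""
    m = CONTACT_URL.search(user_agent)
    return m.group(1) if m else None
```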