EveryoneSocialBot

         

keyplyr

4:14 am on Oct 19, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



UA: Mozilla/5.0 (compatible; EveryoneSocialBot/1.0; support@everyonesocial.com http://everyonesocial.com/)
Protocol: HTTP/1.1
Robots.txt: No
Host: Linode
50.116.0.0 - 50.116.63.255
50.116.0.0/18
66.175.208.0 - 66.175.223.255
66.175.208.0/20
69.164.192.0 - 69.164.223.255
69.164.192.0/19
96.126.96.0 - 96.126.127.255
96.126.96.0/19
97.107.128.0 - 97.107.143.255
97.107.128.0/20
173.255.192.0 - 173.255.255.255
173.255.192.0/18
(cloud nodes, so possibly other Linode ranges)
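
For anyone who wants to test a logged address against those ranges, here is a minimal Python sketch using the standard `ipaddress` module. The CIDR blocks are copied from the list above; the function name and the two sample addresses are made up for illustration, and since these are cloud nodes the list is not necessarily complete.

```python
import ipaddress

# CIDR blocks copied from the list above; other Linode ranges may exist.
LINODE_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "50.116.0.0/18",
        "66.175.208.0/20",
        "69.164.192.0/19",
        "96.126.96.0/19",
        "97.107.128.0/20",
        "173.255.192.0/18",
    )
]


def in_listed_ranges(ip):
    """Return True if ip falls inside one of the ranges listed above."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in LINODE_RANGES)


# Sample addresses are hypothetical, chosen only to show both outcomes.
print(in_listed_ranges("69.164.221.10"))  # True
print(in_listed_ranges("199.16.156.10"))  # False
```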

Good traffic for your site if your pages are of value to the end user.

lucy24

8:15 pm on Oct 19, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Good traffic for your site if your pages are of value [to] the end user.

And if your pages are not of value to the end user? (Yes, I know. Nobody ever tweets "Check out these great deals on sprockets!")

Out of curiosity I checked raw logs for EveryoneSocialBot, found a request for an easily isolated interior page, and then cross-checked requests for that page in the immediate surrounding time period. After a small cluster of human visits-- one of whom presumably tweeted-or-equivalent-- we get:

52.5.154.abc - - [22:35:abc -0700] "GET /dir/page.html HTTP/1.1" 403 3320 "-" "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.64 Safari/537.31" 
46.236.24.abc - - [22:35:abc -0700] "GET /dir/page.html HTTP/1.1" 200 26565 "-" "Mozilla/5.0 (TweetmemeBot/4.0; +http://datasift.com/bot.html) Gecko/20100101 Firefox/31.0"
74.112.131.abc - - [22:35:abc -0700] "GET /dir/page.html HTTP/1.1" 200 10538 "-" "Mozilla/5.0 ()"
23.29.122.abc - - [22:35:abc -0700] "HEAD /dir/page.html HTTP/1.1" 200 210 "-" "MetaURI API/2.0 +metauri.com"
173.192.79.abc - - [22:35:abc -0700] "GET /dir/page.html HTTP/1.1" 403 1716 "-" "ShowyouBot (http://showyou.com/crawler)"
17.142.152.abc - - [22:35:abc -0700] "GET /dir/page.html HTTP/1.1" 200 10538 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/600.2.5 (KHTML, like Gecko) Version/8.0.2 Safari/600.2.5 (Applebot/0.1; +http://www.apple.com/go/applebot)"
5.133.215.abc - - [22:35:abc -0700] "GET /dir/page.html HTTP/1.1" 403 1716 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:36.0) Gecko/20100101 Firefox/36.0"
23.29.122.abc - - [22:35:abc -0700] "GET /dir/page.html HTTP/1.1" 200 26584 "-" "MetaURI API/2.0 +metauri.com"
199.16.156.abc - - [22:35:abc -0700] "GET /robots.txt HTTP/1.1" 200 569 "-" "Twitterbot/1.0"
199.16.156.abc - - [22:35:abc -0700] "GET /dir/page.html HTTP/1.1" 200 10482 "-" "Twitterbot/1.0"
50.18.94.abc - - [22:35:abc -0700] "HEAD /dir/page.html HTTP/1.1" 403 220 "-" "Google-HTTP-Java-Client/1.17.0-rc (gzip)"
50.18.94.abc - - [22:35:abc -0700] "HEAD /dir/page.html HTTP/1.1" 403 220 "-" "Google-HTTP-Java-Client/1.17.0-rc (gzip)"
199.16.156.abc - - [22:35:abc -0700] "GET /dir/page.html HTTP/1.1" 200 10482 "-" "Twitterbot/1.0"
199.16.156.abc - - [22:35:abc -0700] "GET /dir/page.html HTTP/1.1" 200 10482 "-" "Twitterbot/1.0"
23.96.208.abc - - [22:35:abc -0700] "GET /dir/page.html HTTP/1.1" 200 10538 "-" "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Trident/6.0)"
54.178.213.abc - - [22:36:abc -0700] "GET /dir/page.html HTTP/1.1" 403 1772 "-" "Crowsnest/0.5 (+http://www.crowsnest.tv/)"
74.112.131.abc - - [22:36:abc -0700] "GET /dir/page.html HTTP/1.1" 200 10538 "-" "Mozilla/5.0 ()"
150.70.173.abc - - [22:37:abc -0700] "GET /dir/page.html HTTP/1.1" 403 3301 "-" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0)"
52.68.118.abc - - [22:38:abc -0700] "GET /dir/page.html HTTP/1.1" 200 10538 "-" "Crowsnest/0.5 (+http://www.crowsnest.tv/)"
54.92.110.abc - - [22:38:abc -0700] "GET /dir/page.html HTTP/1.1" 403 1772 "-" "Crowsnest/0.5 (+http://www.crowsnest.tv/)"
52.68.118.abc - - [22:38:abc -0700] "GET /dir/page.html HTTP/1.1" 200 10538 "-" "Crowsnest/0.5 (+http://www.crowsnest.tv/)"
104.130.125.abc - - [22:38:abc -0700] "HEAD /dir/page.html HTTP/1.1" 403 164 "-" "Jakarta Commons-HttpClient/3.0.1"
52.6.27.abc - - [22:42:abc -0700] "HEAD /dir/page.html HTTP/1.1" 403 183 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9) Gecko/2008052906 Firefox/3.0"
69.164.221.abc - - [22:45:abc -0700] "GET /dir/page.html HTTP/1.1" 200 26614 "-" "Mozilla/5.0 (compatible; EveryoneSocialBot/1.0; support@everyonesocial.com http://everyonesocial.com/)"
37.59.19.abc - - [22:55:abc -0700] "GET /dir/page.html HTTP/1.1" 403 1772 "-" "Mozilla/5.0 (compatible; PaperLiBot/2.1; http://support.paper.li/entries/20023257-what-is-paper-li)"
That's probably a representative list of robots who come flocking around after a tweet-or-equivalent. (Incidentally, the Twitterbot asks for robots.txt; nobody else does.) The ones with humanoid UA are robots, not inadvertently blocked human cloud users.
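
A cross-check like this can be scripted. The following is only a rough Python sketch, assuming log lines shaped like the excerpt above; the regular expression, the function name and the three sample lines (taken from the excerpt) are illustrative, not the method actually used here. It tallies status codes per user-agent and notes which user-agents asked for robots.txt.

```python
import re
from collections import Counter, defaultdict

# Assumed layout: ip, two ignored fields, [time], "request", status, size,
# "referer", "user-agent". Lines that do not fit are skipped.
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]*)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<ua>[^"]*)"\s*$'
)


def summarize(lines):
    """Return (status counts per user-agent, user-agents that fetched robots.txt)."""
    statuses = defaultdict(Counter)
    robots_askers = set()
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # not in the assumed format
        ua = match.group("ua")
        statuses[ua][match.group("status")] += 1
        if match.group("path") == "/robots.txt":
            robots_askers.add(ua)
    return statuses, robots_askers


sample = [
    '199.16.156.abc - - [22:35:abc -0700] "GET /robots.txt HTTP/1.1" 200 569 "-" "Twitterbot/1.0"',
    '199.16.156.abc - - [22:35:abc -0700] "GET /dir/page.html HTTP/1.1" 200 10482 "-" "Twitterbot/1.0"',
    '173.192.79.abc - - [22:35:abc -0700] "GET /dir/page.html HTTP/1.1" 403 1716 "-" "ShowyouBot (http://showyou.com/crawler)"',
]

statuses, robots_askers = summarize(sample)
for ua, counts in statuses.items():
    print(ua, dict(counts))
print("asked for robots.txt:", sorted(robots_askers))
```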

In fact it looks as if EveryoneSocialBot is a bit slow on the uptake compared to most others :) The only one slower-- I didn't look past 23:00-- is the PaperLiBot, a UA I wouldn't have recognized at the time, but do now, because it comes crawling for ebooks in response to one site's RSS feed.

What the heck is CrowsNest? Don't recognize the name.

keyplyr

9:02 pm on Oct 19, 2015 (gmt 0)

As you know, the Twitter stream has a bunch of scrapers, so it is an easy way of getting our pages picked up and linked from other social sites. But if we play this game, there's constant work on our end to control the bad bots. Again, IMO this takes a constant watch over the server logs.

To get the most benefit from this potential, I suggest setting up each page with a snippet and image to be used for the link-back. Most of this info can be found in the dev section of Twitter or Facebook.

What is CrowsNest?
Archived thread: [webmasterworld.com...]

CrowsNest *was* a social media platform located in Japan, however...
Notice of termination of service
Crowsnest is, actually without permission, became a thing where I am allowed to terminate the provision of this service every time.
To everyone that had you use up to now, we deeply apologize for that inconvenience.
2011 since the start of the service of, received patronage the Crowsnest, Thank you.
source: crowsnest.tv

But their bot is still active - so who knows what's up?

lucy24

10:23 pm on Oct 19, 2015 (gmt 0)

I suggest setting up each page with a snippet and image to be used for the link-back

Sometimes people tweet a page that doesn't happen to have any images; I've got a few where the only image files are text-as-image because some fonts aren't suitable for embedding and, well, there just aren't any pictures. I noticed that Facebook, in particular, crawls the <noscript> version of a page-- or maybe they grab everything in <img> tags, dunno-- so any new request will include piwik's 1x1 gif. I've taken advantage of this by rewriting select UAs to a little image containing the sitename-- the same kind of thing you'd use in a graphic link-- so there's always something for the human user to select.
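
The post doesn't say how that rewrite is done (presumably a server-level rewrite rule), so the following is only a Python sketch of the decision logic, not the actual setup. The crawler pattern, both paths and the function name are hypothetical.

```python
import re

# Hypothetical values for illustration only.
SOCIAL_UA = re.compile(r"facebookexternalhit|Twitterbot", re.IGNORECASE)
TRACKING_PIXEL = "/piwik.php"            # assumed path of the 1x1 tracking gif
SITENAME_IMAGE = "/images/sitename.png"  # assumed small image with the sitename


def image_for(user_agent, requested_path):
    """Return the path to serve: the sitename image when a selected crawler
    asks for the tracking pixel, otherwise the path that was requested."""
    path_only = requested_path.split("?", 1)[0]
    if path_only == TRACKING_PIXEL and SOCIAL_UA.search(user_agent):
        return SITENAME_IMAGE
    return requested_path


print(image_for("facebookexternalhit/1.1", "/piwik.php?idsite=1"))
print(image_for("Mozilla/5.0 (X11; Linux x86_64)", "/piwik.php?idsite=1"))
```

Ordinary visitors still get the tracking pixel; only the selected crawlers are handed the substitute image.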

:: memo to self: make sure I've still got the blowup-to-thumbnail rewrite, so requests aren't bogged down with full-size jpgs that make it pointless even to visit the page ::

keyplyr

10:58 pm on Oct 19, 2015 (gmt 0)

Google+, Facebook, and most of the other social sites will use the first, largest image on the page by default. If a webmaster wants to choose another image, there are a couple of easy ways of doing this.

In the <HEAD> of the page you can designate a custom image; anything as long as it is jpg, jpeg, gif or png and at least 200x200 px. I usually make them 400x400 to 600x600 to fill the given space at these sites. Square dimensions work best so they don't get stretched to fill space.

This is the tag most prefer:
<meta property="og:image" content="http://example.com/image.jpg">

But the property attribute is not part of HTML 4 or XHTML 1.0, so that tag will validate at W3C only under the HTML5 DOCTYPE. For legacy mark-up, this tag works in most cases:
<link rel="image_src" href="http://example.com/image.jpg">
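
To check which image a page currently designates, something like the following Python sketch may help. It uses only the standard `html.parser` module; the class and function names are made up for illustration, and preferring og:image over image_src is an assumption, not documented behavior of any particular social site.

```python
from html.parser import HTMLParser


class ShareImageFinder(HTMLParser):
    """Remember the first og:image and the first image_src found in a page."""

    def __init__(self):
        super().__init__()
        self.og_image = None
        self.image_src = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("property") == "og:image":
            self.og_image = self.og_image or attrs.get("content")
        elif tag == "link" and attrs.get("rel") == "image_src":
            self.image_src = self.image_src or attrs.get("href")


def find_share_image(html):
    """Return the og:image URL if present, else the image_src URL, else None."""
    finder = ShareImageFinder()
    finder.feed(html)
    return finder.og_image or finder.image_src


page = (
    '<html><head>'
    '<meta property="og:image" content="http://example.com/image.jpg">'
    '</head><body></body></html>'
)
print(find_share_image(page))  # http://example.com/image.jpg
```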

If anyone has posted a link at FB and you don't like the image choice, you can *try* to force your own image (if you've now installed one of those tags) here: https://developers.facebook.com/tools/debug/og/object

I'm not aware of any other social site that offers a tool to change the link image.