Jasper Technologies (Internet of Things)

         

aristotle

7:58 pm on Aug 2, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



This is really puzzling to me. Entries labeled "Jasper Technologies" have appeared in my Statcounter logs for this site nine times over the last three days. BUT THERE ARE NO CORRESPONDING ENTRIES IN THE SERVER LOGS.

All of the Statcounter entries except the last one look like this:
Jasper Technologies (128.177.161.167)
United States
31 Jul 18:37:05
www.google.com/
www.example.com/
Chrome for Android Motorola Moto G 360x640

I thought that it was some kind of bot, but today the last Statcounter entry shows that links on the page are being clicked, which proves that it's a human.

If you look up Jasper Technologies, it's described as a provider of a cloud-based software platform for the Internet of Things. Numerous large corporations are using its services.

I don't understand why these visits don't appear in the server logs. I searched for the past two months and can't find anything. Two months seems rather old for a cache. Does anyone have an explanation?
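One way to double-check the server side is to scan the raw access log for the address range that Statcounter reported. Below is a minimal Python sketch, assuming the log uses a format whose first field is the client IP (as in Apache's common and combined formats). The function name `hits_from_prefix` and the two sample log lines are made up for illustration; they are not taken from any real log.

```python
# Hypothetical log lines, invented for illustration only.
SAMPLE_LOG = [
    '128.177.161.167 - - [31/Jul/2015:18:37:05 +0000] "GET / HTTP/1.1" 200 5120 '
    '"https://www.google.com/" "ExampleBrowser/1.0"',
    '203.0.113.9 - - [31/Jul/2015:18:40:11 +0000] "GET / HTTP/1.1" 200 5120 '
    '"-" "ExampleBrowser/1.0"',
]


def hits_from_prefix(lines, prefix):
    """Return the log lines whose first field (the client IP) starts with prefix."""
    hits = []
    for line in lines:
        fields = line.split(None, 1)  # first whitespace-separated field is the IP
        if fields and fields[0].startswith(prefix):
            hits.append(line.rstrip("\n"))
    return hits


if __name__ == "__main__":
    # The trailing dot keeps the match to the 128.177.161.x range.
    for hit in hits_from_prefix(SAMPLE_LOG, "128.177.161."):
        print(hit)
```

With a real file, the same function can be fed an open file handle instead of the sample list. An empty result across all rotated logs would be consistent with the page being loaded from a copy or cache somewhere else, though it does not prove that.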

aristotle

9:02 pm on Aug 2, 2015 (gmt 0)

After further thought, it occurs to me that an old cache of my page might have been stored in the cloud, and that this is what is being accessed. This could explain why there aren't any recent entries in the server logs.

keyplyr

9:57 pm on Aug 2, 2015 (gmt 0)

Header set X-Robots-Tag "noarchive"

User-agent: Archive-It
Disallow: /

User-agent: archive.org_bot
Disallow: /

As well as blocking scrapers, bad bots & all known server farms, colos & data centers that offer hosting.

Having said that, occasionally one gets through and copies (caches) your site and publishes it somewhere. Sometimes searching for a text snippet from one of your pages may turn up something, or there are tools like Copyscape that will uncover copies of your site. Then you have to deal with the sometimes tedious task of getting your intellectual property removed from the remote server.

wilderness

10:12 pm on Aug 2, 2015 (gmt 0)

Supporting images for the page were requested from an MS IP.
The Class C of Jasper was added to my denies.
Jasper Tech is a subnet of Abovenet. For the longest while (more than a decade) I had most of the Abovenet ranges denied; however, I opened them up a while back.

128.177.161.165 - - [12/Apr/2015:14:26:28 -0600] "GET /MyFolder/SubFolderMyPage.html HTTP/1.1" 200 5443 "-" "Mozilla/5.0 (Windows Phone 8.1; ARM; Trident/7.0; Touch; rv:11.0; IEMobile/11.0; NOKIA; Lumia 630) like Gecko"
128.177.161.165 - - [12/Apr/2015:14:26:28 -0600] "GET /Myfile.css HTTP/1.1" 200 827 "http://example.com/SameFolder/SameSub/SamePage.html" "Mozilla/5.0 (Windows Phone 8.1; ARM; Trident/7.0; Touch; rv:11.0; IEMobile/11.0; NOKIA; Lumia 630) like Gecko"
157.55.80.72 - - [12/Apr/2015:14:26:28 -0600] "GET /ImageFolder/MyImage.gif HTTP/1.1" 403 647 "http://example.com/SameFolder/SameSub/SamePage" "Mozilla/5.0 (Windows Phone 8.1; ARM; Trident/7.0; Touch; rv:11.0; IEMobile/11.0; NOKIA; Lumia 630) like Gecko"
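The pattern in these lines (one page view whose requests arrive from more than one IP in the same second, with the same user agent) can be looked for mechanically. Below is a rough Python sketch, assuming Apache's combined log format. The function name `split_ip_visits` is hypothetical, and grouping on an identical timestamp plus user agent is a crude heuristic rather than a reliable way to identify a visitor. The sample uses two of the lines posted above.

```python
import re
from collections import defaultdict

# Combined log format: ip, ident, user, [time], "request", status, size, "referer", "ua"
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

UA = ("Mozilla/5.0 (Windows Phone 8.1; ARM; Trident/7.0; Touch; rv:11.0; "
      "IEMobile/11.0; NOKIA; Lumia 630) like Gecko")

SAMPLE_LOG = [
    '128.177.161.165 - - [12/Apr/2015:14:26:28 -0600] '
    '"GET /MyFolder/SubFolderMyPage.html HTTP/1.1" 200 5443 "-" "' + UA + '"',
    '157.55.80.72 - - [12/Apr/2015:14:26:28 -0600] '
    '"GET /ImageFolder/MyImage.gif HTTP/1.1" 403 647 '
    '"http://example.com/SameFolder/SameSub/SamePage" "' + UA + '"',
]


def split_ip_visits(lines):
    """Map (timestamp, user agent) to its client IPs, keeping groups with 2+ IPs."""
    groups = defaultdict(set)
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            groups[(match.group("time"), match.group("ua"))].add(match.group("ip"))
    return {key: ips for key, ips in groups.items() if len(ips) > 1}


if __name__ == "__main__":
    for (timestamp, _ua), ips in split_ip_visits(SAMPLE_LOG).items():
        print(timestamp, sorted(ips))
```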

aristotle

10:57 pm on Aug 2, 2015 (gmt 0)

Thanks for the replies.
keyplyr -- Most scraping of my articles is done by humans, not by bots. These are people who start a website and then fill it up by copying things that they see on other sites. In most cases this is harmless, and not worth spending any time on.

But I don't know exactly what Jasper Technologies does. If they are copying websites into the cloud and giving access to them there, that could be a problem.

wilderness -- Isn't that a real human visitor coming to your server?

wilderness

11:25 pm on Aug 2, 2015 (gmt 0)

"wilderness -- Isn't that a real human visitor coming to your server?"


I'm sure it was; however, two different IPs (especially with one being a cloud and the other MSN) tend to set off an alarm. MSN and all major SEs are denied from my image folders, both in robots.txt and in my denies.
New clouds are added to the denies as they appear, regardless of criteria.
This particular visitor was a result of posting on FB, although I did not provide the related requests.

keyplyr

1:24 am on Aug 3, 2015 (gmt 0)

"These are people who start a website and then fill it up by copying things that they see on other sites."
A huge percentage of new websites are created with CMS software like WordPress. These guys steal your content right from their admin panel. Blocking this can help stop a lot of scraping.

RewriteCond %{HTTP_USER_AGENT} (admin|WPDesk|winhttp|word/ |wordpress|wp-)
RewriteCond %{HTTP_REFERER} (admin|WPDesk|winhttp|word/ |wordpress|wp-)
RewriteRule - [F]

aristotle

6:21 pm on Aug 3, 2015 (gmt 0)

Thanks keyplyr
I was thinking that people find things with their browser, and if they see something that they want to add to their site, they can use their browser to save a copy of it to their hard drive. Then from there they add it to their site.

I don't use WordPress or any other CMS, so I'm not familiar with the method you're referring to. But since that's only a few lines of code, I'll add it to the .htaccess files for my sites.

wilderness

9:40 pm on Aug 3, 2015 (gmt 0)

RewriteCond %{HTTP_USER_AGENT} (admin|WPDesk|winhttp|word/ |wordpress|wp-)
RewriteCond %{HTTP_REFERER} (admin|WPDesk|winhttp|word/ |wordpress|wp-)
RewriteRule - [F]


FWIW, there's a syntax error in both lines that results in a 500.

word/ 

Not sure if your intent is to include the forward slash followed by a blank space?
or
Is your intent to escape (mistakenly) a blank space that follows the term 'word'?
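For anyone wondering why a stray space matters here: mod_rewrite directives take whitespace-separated arguments, so an unescaped space inside the pattern ends the pattern early and the leftover text is read as the next argument. The Python sketch below is only a rough model of that splitting (it is not Apache's actual parser, and `split_directive` is a made-up helper), but it shows the difference between the line as posted, the same line without the space, and a variant where the space is escaped with a backslash.

```python
import re


def split_directive(line):
    """Split on whitespace, treating backslash-escaped whitespace as literal.

    Simplified model for illustration; quoting is not handled.
    """
    return re.split(r"(?<!\\)\s+", line.strip())


AS_POSTED = "RewriteCond %{HTTP_USER_AGENT} (admin|WPDesk|winhttp|word/ |wordpress|wp-)"
NO_SPACE = "RewriteCond %{HTTP_USER_AGENT} (admin|WPDesk|winhttp|word/|wordpress|wp-)"
ESCAPED = r"RewriteCond %{HTTP_USER_AGENT} (admin|WPDesk|winhttp|word\ |wordpress|wp-)"

if __name__ == "__main__":
    for label, line in (("as posted", AS_POSTED),
                        ("no space", NO_SPACE),
                        ("escaped", ESCAPED)):
        parts = split_directive(line)
        # parts[0] is the directive name; the rest are its arguments.
        print(label, len(parts) - 1, "arguments:", parts[1:])
```

In the "as posted" case the pattern is cut off at `word/` and the remainder, `|wordpress|wp-)`, lands where the flags argument would normally go.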

[edited by: wilderness at 9:42 pm (utc) on Aug 3, 2015]

keyplyr

9:41 pm on Aug 3, 2015 (gmt 0)

Stopping individuals from using browsers to cut'n paste your work is not 100% effective, but there are a couple of things you can do as well. You can use JavaScript to disable right-click, which will also stop cut'n paste, but many users will become annoyed (it will also disrupt focus and play havoc with entering text into forms & on-page search fields).

You can also add this to your above rules to stop one of the several methods of saving to the desktop:
RewriteCond %{REQUEST_METHOD} ^(OPTIONS|PROPFIND)$

@wilderness - yes, I posted as intended. There is no syntax error and it will not cause a 500. In the UA there is a space after "word" which gets escaped in the RegEx.

wilderness

10:02 pm on Aug 3, 2015 (gmt 0)

"There is no syntax error and it will not cause a 500."

It certainly does/did on my host.

FWIW, an escape is a backslash, rather than a forward slash.

Thanks for the clarification.

keyplyr

10:33 pm on Aug 3, 2015 (gmt 0)

Sorry, I meant to say there is NO space after the slash. So in this case you are correct: it is a syntax error as posted, but it's the result of a posting inconsistency (the line likely wrapped, introducing a space). It's not so in the code on my server. (That's what I get for posting from a phone.)

So, for the record:

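# Assumption: the three tests are meant as alternatives, hence [OR]; [NC] ignores case.
RewriteCond %{HTTP_USER_AGENT} (admin|WPDesk|winhttp|word/|wordpress|wp-) [NC,OR]
RewriteCond %{HTTP_REFERER} (admin|WPDesk|winhttp|word/|wordpress|wp-) [NC,OR]
RewriteCond %{REQUEST_METHOD} ^(OPTIONS|PROPFIND)$
# RewriteRule needs a URL pattern before the "-" substitution.
RewriteRule .* - [F]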
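Before deploying a pattern like this, it can help to try the alternation against a few header values to see what it would and would not catch. Below is a small Python sketch using the `re` module, which should behave the same as Apache's regex engine for a simple alternation like this one. The sample values are hypothetical, and `is_blocked` is a made-up helper. The `ignore_case` switch stands in for mod_rewrite's `[NC]` flag.

```python
import re

# Alternation copied from the rules above.
PATTERN = r"(admin|WPDesk|winhttp|word/|wordpress|wp-)"

# Hypothetical header values, invented for illustration only.
SAMPLES = [
    "WordPress/4.2.2; http://blog.example.com",
    "Mozilla/5.0 (Linux; Android 5.0; Moto G) Chrome/44.0 Mobile Safari/537.36",
    "http://blog.example.com/wp-admin/post-new.php",
    "http://www.example.com/administration-history.html",
]


def is_blocked(value, ignore_case=False):
    """Return True if the pattern matches anywhere in the header value."""
    flags = re.IGNORECASE if ignore_case else 0
    return re.search(PATTERN, value, flags) is not None


if __name__ == "__main__":
    for value in SAMPLES:
        # Columns: case-sensitive result, case-insensitive result, tested value
        print(is_blocked(value), is_blocked(value, ignore_case=True), value)
```

Two things stand out in this sketch: a capitalised "WordPress/..." value is only caught when case is ignored, and the bare word "admin" also matches an unrelated referrer such as the last sample, so some legitimate visitors could be refused.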