Forum Moderators: Ocean10000 & keyplyr

Convincing IP spoofing / user simulation

Pointers to detection of IP spoofing

     
4:20 pm on Jan 9, 2016 (gmt 0)

Full Member

Top Contributors Of The Month

joined:Sept 28, 2015
posts: 273
votes: 171


Hi guys. I need some expert advice as this isn't my area...

Let's say a third party wants to do a very convincing job of using bots to make it appear that real users are navigating an ecommerce site. So that would mean spoofing IPs of real users and ensuring navigation patterns resemble those of real users. The obvious problem is the bidirectional issue with IP spoofing, i.e. the spoofer sends requests, but won't get a response. For example, if the spoofer wants to post forms, e.g. add to basket, they're working blind.

So, to deal with this, could the spoofer initially hit the page / add to basket / etc from a non-spoofed IP address and note the response. They then do the same thing from the spoofed IPs and ensure it matches what happened on the non-spoofed one. They could then do this multiple times from multiple spoofed IPs and it would look convincing in the server logs. They would presumably need to re-hit the page from the non-spoofed IP every now and then to ensure nothing has changed.

Is this approach plausible to make the bot more convincing or have I missed something that would preclude this, e.g. protocol handshaking issues?

BTW, I'm asking this because I'm fairly sure based on our stats that we're being intermittently hit by bots that are spoofing IPs and simulating user behaviour.
4:27 am on Jan 10, 2016 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12842
votes: 881


Keep it simple - use a CAPTCHA. The better ones are very strong at stopping what you describe. End of story.
5:09 am on Jan 10, 2016 (gmt 0)

Junior Member from CA 

Top Contributors Of The Month

joined:Dec 29, 2015
posts:43
votes: 6


I believe there are JavaScript snippets out there that can detect whether the mouse cursor moves, which would make it much harder for spoofers to fake, right?
5:20 am on Jan 10, 2016 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15199
votes: 682


How do you spoof an IP?
9:25 am on Jan 10, 2016 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12842
votes: 881


How do you spoof an IP?
Can't really, since the remote machine needs a response header. You can compromise another IP address and use it to connect to remote servers, but storage becomes the issue. You could set up relays to additional addresses, but if the bandwidth is substantial, the activity would quickly be discovered, leaving a trail.

Thieves, scrapers, script-kiddies & other bad doers use the most simple method available. There are thousands of scripts & anonymous VPNs freely available to do whatever is needed. A sophisticated IP dance is not needed.
10:16 am on Jan 10, 2016 (gmt 0)

Full Member

Top Contributors Of The Month

joined:Sept 28, 2015
posts: 273
votes: 171


Thanks for the responses. Not sure I'm explaining myself properly as I want to avoid naming names, but I'm talking about click fraud here. The third party is sending traffic to a website on a PPC basis. The question is whether all of that traffic is real or whether some of it is bots designed to look like users. I accept a small proportion of that traffic will always be basic bots and detected by looking for multiple clicks by the same IP, IPs on a blacklist, etc. But I'm talking about more sophisticated bots that spoof IPs and navigate around the site. Is the method described in my original post a way that this can be effectively done?
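
For reference, the "multiple clicks by the same IP" check mentioned above can be sketched in a few lines. This is only an illustration: the `repeat_clickers` helper and the `max_clicks` threshold are assumptions for the example, not anything a PPC platform is known to use.

```python
from collections import Counter


def repeat_clickers(click_ips, max_clicks=1):
    """Return {ip: count} for every IP with more than `max_clicks` paid clicks.

    click_ips:  iterable of client IP strings, one entry per paid click.
    max_clicks: assumed threshold; anything above it is flagged for review.
    """
    counts = Counter(click_ips)
    return {ip: n for ip, n in counts.items() if n > max_clicks}


if __name__ == "__main__":
    # Documentation-range addresses used as stand-ins for real visitors.
    clicks = ["192.0.2.1", "192.0.2.1", "198.51.100.7", "192.0.2.1"]
    print(repeat_clickers(clicks))
```

A check like this only catches the basic bots; it says nothing about traffic spread over many distinct addresses, which is the case being asked about.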
11:19 am on Jan 10, 2016 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12842
votes: 881


That info would have been helpful in the first post :)

So set up a comprehensive combination of filters (UA, IP, headers, method & behavior) that weeds out non-humans including a little script that allows a visitor one PPC limit. That is probably the best you can do.

However I do all this and Adsense still reports a small amount of click fraud on my pages occasionally (twice last year.)
12:30 pm on Jan 10, 2016 (gmt 0)

Full Member

Top Contributors Of The Month

joined:Sept 28, 2015
posts: 273
votes: 171


Sorry mate. I was trying to keep things at a purely technical level rather than get into motives, but I should have been clearer.

I'm not trying to block bot traffic. I'm trying to detect it. We already have traffic monitored for bots, showing ~80% human, ~10% likely humans, 1% search bots, and 1% confirmed bots. But the challenge is... could the third party make a bot appear human to detection? Again, they'd need to spoof IPs. That's not difficult, but the bidirectional side is difficult. But surely if they initially hit the site with a non-spoofed IP, recorded the 'conversation' and all the resulting http requests, then they could simply send the identical requests from the spoofed IP and not worry about there being no response. If this was then viewed in the server log, would this not look like a human user hit the site?
12:52 pm on Jan 10, 2016 (gmt 0)

Full Member

5+ Year Member

joined:Aug 16, 2010
posts:252
votes: 20


As long as there is no full TCP handshake the HTTP request is not passed to the webserver. So the spoofed IP requests are not processed or logged by your webserver but filtered out by the TCP stack.
1:02 pm on Jan 10, 2016 (gmt 0)

Full Member

Top Contributors Of The Month

joined:Sept 28, 2015
posts: 273
votes: 171


@bhukkel Cheers. So you're saying there's no way to spoof the SYN / SYN-ACK / ACK handshake. Aren't there still situations where it's possible to predict the sequence number?
1:09 pm on Jan 10, 2016 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12842
votes: 881


could the third party make a bot appear human to detection?
Absolutely

Again, they'd need to spoof IPs.
I don't see why. IPs are easy to hijack, no spoofing necessary. As said earlier, there are also anonymous VPNs. You can even purchase routers that will change IPs dynamically using a dump.
1:15 pm on Jan 10, 2016 (gmt 0)

Full Member

5+ Year Member

joined:Aug 16, 2010
posts:252
votes: 20


The sequence number the spoofer would have to acknowledge is a 32-bit value (2^32 possibilities), so it's very unlikely they can predict it.
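
To put a rough number on that, here is a minimal sketch. It assumes the server picks its initial sequence number uniformly at random from the 32-bit space and that every blind guess is a different value; both the uniform model and the `guess_probability` helper are illustrative assumptions, not a description of any particular TCP stack.

```python
SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32-bit values


def guess_probability(attempts: int) -> float:
    """Chance that at least one of `attempts` distinct blind guesses is correct.

    Assumes a uniformly random sequence number and no repeated guesses.
    """
    return min(attempts, SEQ_SPACE) / SEQ_SPACE


if __name__ == "__main__":
    for attempts in (1, 1_000, 1_000_000):
        print(f"{attempts:>9} guesses -> {guess_probability(attempts):.2e}")
```

Under those assumptions a single guess succeeds about once in 4.3 billion tries, and each failed guess is a packet the server simply discards.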

But to go back to your OP: why spoof at all? They can use a botnet of infected PCs belonging to real humans in combination with a headless browser. That is very hard to detect, especially if you are only looking at a single website.
3:51 pm on Jan 10, 2016 (gmt 0)

Full Member

Top Contributors Of The Month

joined:Sept 28, 2015
posts: 273
votes: 171


Thanks guys. I'll explain more. Apologies for not doing so earlier, but I wanted to avoid this thread being hijacked by the usual trolls and being turned into an anti-Google rant.

The third party is Google. I appreciate there are theories of how Google has botnets (e.g. that Chrome itself facilitates a botnet), but I'm trying to minimise the conspiracy theories. We have strong evidence of unnatural click vs transaction/basket-add patterns on Google Shopping. We have a sufficiently high number of transactions and monitor user activity sufficiently closely that unnatural patterns are absolutely clear and cannot be attributed to natural statistical variation or other factors. So something is clicking through to the site that isn't 'real' users.

As well as this, we're seeing a bot purporting to be from Google Inc IPs (not the usual googlebot) hit the site once per month for about one minute. It's direct traffic, hits exactly 30 products each time and adds them to basket (i.e. issues an http post), each 3 seconds apart, which is too frequent for a real user on a real browser. Each item is a new landing page and new session. And those 30 items are always in the top 150 most-clicked out of 10,000 products that we list on Google Shopping, so only Google could know this. I can see no legitimate reason why Google (or anyone) would need to do this, which is why I wanted to explore the idea that this is Google hitting the site from a non-spoofed IP to learn the exchange when an item is added to basket, in order that it can then reproduce this from spoofed IPs or similar. Yes, I suppose that Google could be using a botnet to do this, but that seems a bit too risky.

Any thoughts on this? Again, would be good to stick purely to the technical possibilities and avoid the temptation to bitch about Google.
8:03 pm on Jan 10, 2016 (gmt 0)

Full Member

5+ Year Member

joined:Aug 16, 2010
posts:252
votes: 20


Google delivers cloud services so the IP could be from a hosting customer. Could you share the /24 subnet?

I read the word "reproduce" several times. Do you mean that session cookies are shared between several IPs? Or is it just a click pattern?
8:28 pm on Jan 10, 2016 (gmt 0)

Full Member

Top Contributors Of The Month

joined:Sept 28, 2015
posts: 273
votes: 171


@bhukkel The IPs are 74.125.63.33, 104.132.20.64/71/77/78/86/90/93. These don't seem to match Google's published cloud IPs. Plus, they're hitting the most clicked products within Google Shopping; only Google should know that.

We get 30 items added to basket, one every 3 seconds, each with a different session, i.e. I doubt they're using session cookies. Each item is direct traffic with its own landing page, i.e. each item has been reached by an external inbound link, then the item has been added to basket, then this repeats. Previous batches have all been from the same IP. The most recent batch varied IP every now and then, although kept to 3 seconds between items.
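
The "one every 3 seconds" pattern is the kind of thing that can be flagged automatically. A minimal sketch, assuming you can pull the request times for one IP (or one batch) out of your logs as seconds; the `looks_scripted` name and both thresholds are made up for the example and would need tuning on real data.

```python
from statistics import pstdev


def looks_scripted(timestamps, max_jitter=0.5, min_hits=5):
    """Return True when hits arrive at near-constant intervals.

    timestamps: request times in seconds, in any order.
    max_jitter: largest standard deviation (seconds) of the gaps that is still
                treated as machine-regular; an assumed threshold.
    min_hits:   ignore short bursts, where regular gaps can happen by chance.
    """
    if len(timestamps) < min_hits:
        return False
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return pstdev(gaps) <= max_jitter


if __name__ == "__main__":
    bot_like = [i * 3.0 for i in range(30)]          # 30 hits, 3 s apart
    human_like = [0, 4, 31, 40, 95, 101, 180, 260]   # irregular gaps
    print(looks_scripted(bot_like), looks_scripted(human_like))
```

This only tests regularity of timing. A bot that randomises its delays would pass, so it is a first filter rather than proof either way.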
8:37 pm on Jan 10, 2016 (gmt 0)

Administrator from US 

WebmasterWorld Administrator not2easy is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2006
posts:4029
votes: 246


Those are all Google owned IPs, check the user agent in your logs or stats for those hits and they may be related to Google Shopping.
8:53 pm on Jan 10, 2016 (gmt 0)

Full Member

Top Contributors Of The Month

joined:Sept 28, 2015
posts: 273
votes: 171


@not2easy Yes, I know. That's the point. They are indeed related to Google Shopping, as they're adding to basket 30 of the most clicked items within Google Shopping. The question is why does Google have a bot doing this and does it relate in any way to the unnatural click activity we're seeing on the account. I can see no legitimate reason why it would help Google in any way to add 30 of the most-clicked Shopping items to basket, and Google themselves say that their bots won't submit post forms as they appreciate this can have unforeseen consequences. Hence, the reason for this thread; could Google be doing this for questionable reasons?
8:54 pm on Jan 10, 2016 (gmt 0)

Full Member

5+ Year Member

joined:Aug 16, 2010
posts:252
votes: 20


I have 30 hits from the same IPs, the UA is "Mozilla/5.0 (X11; CrOS x86_64 7520.63.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36"

My website is not ecommerce related.

Could it be some kind of Google proxy? I do not save any HTTP headers, so I can't see it.
9:04 pm on Jan 10, 2016 (gmt 0)

Full Member

Top Contributors Of The Month

joined:Sept 28, 2015
posts: 273
votes: 171


The UA of ours is "Mozilla/5.0 (X11; CrOS x86_64 7647.1.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.8 Safari/537.36", so quite similar.

Do you use any Google paid services, e.g. adwords?
9:10 pm on Jan 10, 2016 (gmt 0)

Full Member

5+ Year Member

joined:Aug 16, 2010
posts:252
votes: 20


I only have adsense on my site.
9:32 pm on Jan 10, 2016 (gmt 0)

Full Member

Top Contributors Of The Month

joined:Sept 28, 2015
posts: 273
votes: 171


Do you have any other information on what it's doing on your site, e.g. which pages it's hitting, time between hits, etc?
9:59 pm on Jan 10, 2016 (gmt 0)

Full Member

5+ Year Member

joined:Aug 16, 2010
posts:252
votes: 20


It seems like human behaviour. Complete page gets loaded including images, favicon, css and js. Even some lazy loaded ajax calls (at the bottom of the page) are loaded.

Mostly GET requests and 3 POST requests. Random time between hits. I got search engine traffic (google.com) and direct hits.

So I really think it is some kind of proxy.
10:03 pm on Jan 10, 2016 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12842
votes: 881


Are you sure this is not the Google Publisher Toolbar extension for Chrome? It does generate additional HTTP connections to Google IPs when clicking on Adsense/Adwords.
10:47 pm on Jan 10, 2016 (gmt 0)

Full Member

Top Contributors Of The Month

joined:Sept 28, 2015
posts: 273
votes: 171


This is getting interesting! The behaviour we see is extremely unlikely to be human. Again, once per month, we see direct hits on 30 items that are added to basket, all 3 seconds apart with no retained sessions. Due to page load times, it would almost certainly take a human with a real browser longer than 3 seconds to carry out each of these sets of operations.

@keyplyr We don't use the Google Publisher Toolbar. Could you please elaborate on how you think this could be responsible?

@bhukkel So these 30 hits are all different types of pages and doing inconsistent things on your site? Could you please elaborate on the proxy theory? I still don't see why this would be happening on our sites.
11:17 pm on Jan 10, 2016 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12842
votes: 881


@keyplyr We don't use the Google Publisher Toolbar. Could you please elaborate on how you think this could be responsible?

Not by you, by advertisers and other AdWords users evaluating your pages for the ads... why don't you install it and test it yourself?
11:12 am on Jan 11, 2016 (gmt 0)

Full Member

Top Contributors Of The Month

joined:Sept 28, 2015
posts: 273
votes: 171


@keyplyr We don't use Adsense, etc or have any ads on our site at all. We just use Google Shopping to drive traffic to the site. So I think the Publisher Toolbar is probably not responsible. It also wouldn't explain why this was happening exactly once per month.

Any other thoughts?
11:20 am on Jan 11, 2016 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12842
votes: 881


@ Simon H - sorry, it hasn't been very clear to me what you are doing.

When you say...
The third party is Google.
You may want to consider that just because the IP range may be assigned to Google, your visits may in fact be from someone else entirely. Google has many ranges, used for many things. They lease huge amounts of space to anyone who wants to pay them for it, much like Amazon or any other server farm.
11:27 am on Jan 11, 2016 (gmt 0)

Full Member

Top Contributors Of The Month

joined:Sept 28, 2015
posts: 273
votes: 171


@keyplyr Thanks and sorry again for not being clearer. Google publishes the IP ranges that they assign to their cloud services and similar services used by paying customers, and those IPs are not in that range. So this really does seem to be Google themselves.
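
Checking an address against a published list of ranges is easy to script with Python's standard `ipaddress` module. A minimal sketch follows; the `in_any_range` helper is just a name for the example, and the CIDR blocks in the demo are documentation placeholders, not any provider's real ranges.

```python
import ipaddress


def in_any_range(ip, cidr_ranges):
    """Return True if `ip` falls inside at least one CIDR block in `cidr_ranges`."""
    address = ipaddress.ip_address(ip)
    return any(address in ipaddress.ip_network(cidr) for cidr in cidr_ranges)


if __name__ == "__main__":
    # Placeholder blocks taken from the reserved documentation ranges, NOT real
    # provider ranges; substitute the list the provider actually publishes.
    published = ["192.0.2.0/24", "198.51.100.0/25"]
    for ip in ("192.0.2.17", "198.51.100.200", "203.0.113.5"):
        print(ip, in_any_range(ip, published))
```

An address falling outside the published list only shows it is not in that list as published on the day you checked; it does not by itself establish who operated the client.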
9:17 pm on Jan 11, 2016 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15199
votes: 682


each item has been reached by an external inbound link

Is this a conjecture based on the variety of landing pages, or does the request really have a referer? Is the referring URL legitimate or spam?
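
One way to answer that from the raw server log rather than from Analytics is to pull the referer field out of each request line. A rough sketch, assuming the server writes the common Apache "combined" log format; the `referer_of` helper and the sample line are invented for illustration, and the pattern does not handle escaped quotes inside fields.

```python
import re

# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ "(?P<referer>[^"]*)" "[^"]*"'
)


def referer_of(log_line):
    """Return the referer of a combined-format log line, or None if it is absent.

    Raises ValueError when the line does not look like combined format.
    """
    match = COMBINED.match(log_line)
    if match is None:
        raise ValueError("not a combined-format log line")
    referer = match.group("referer")
    return None if referer in ("", "-") else referer


if __name__ == "__main__":
    # Invented sample line; the address is from a documentation range.
    sample = (
        '203.0.113.5 - - [10/Jan/2016:20:28:00 +0000] '
        '"GET /product/123 HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (X11; CrOS x86_64)"'
    )
    print(referer_of(sample))
```

A `None` result means the request carried no referer header, which is what an analytics package would typically report as direct traffic.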
9:52 pm on Jan 11, 2016 (gmt 0)

Full Member

Top Contributors Of The Month

joined:Sept 28, 2015
posts: 273
votes: 171


@lucy24 Each product page hit is reported in Analytics as direct traffic with each individual product page also being reported as the landing page. Also, the bot is hitting an item page, then adding to basket, then hitting another item page, then adding to basket, etc but there is no link from the basket page to the next item. So, pretty sure each individual product page is being separately hit from an external inbound link.
This 62 message thread spans 3 pages.