
How can I tell if someone is cloaking

     

mike73

2:17 pm on Sep 12, 2013 (gmt 0)

10+ Year Member



I run a price comparison site. I noticed recently that for one dealer in particular, if I open his product page in 2 browsers, side-by-side--one normal view, and the other through the eyes of my web-crawling script--the prices my web-crawler sees are 3% lower across the board, making this dealer the cheapest in my pricing tables.

It really seems like they're using cloaking to be #1 on my list, but I want to give them the benefit of the doubt. I know I'm not viewing cached pages, because the values on the "fake" pages update constantly along with the real pages. Is there anything else I can check to know for sure that they are or aren't cheating? I'm a n00b at this black hat stuff, so I really don't know what to look for.

incrediBILL

2:53 pm on Sep 12, 2013 (gmt 0)

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



That sure sounds like cloaking to me.

Not only that, if they're advertising that price, in many areas they're required by law to honor it, so if your regular customers ever catch them using Lynx or something...

At any rate the FTC would fry them IMO.

An easy way to possibly solve the problem would be to make your crawler send a standard browser user-agent string, assuming they aren't checking for your IP address as well. If they think it's a browser, maybe they'll give you the right price.

Try that.
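A minimal sketch of that swap in Python, standard library only; the URL you'd pass in and the Chrome UA string below are placeholders, not anything taken from the dealer's site:

```python
# Fetch a page while presenting a mainstream browser User-Agent instead of
# the crawler's default one. BROWSER_UA is an example string, not a
# guaranteed-current browser version.
import urllib.request

BROWSER_UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
              "AppleWebKit/537.36 (KHTML, like Gecko) "
              "Chrome/120.0.0.0 Safari/537.36")

def fetch(url, user_agent=BROWSER_UA):
    """Return the page body, sending the given User-Agent header."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=15) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

If the crawler's prices match the browser view once the UA changes, the different serving was UA-based.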

[edited by: incrediBILL at 2:56 pm (utc) on Sep 12, 2013]

phranque

2:53 pm on Sep 12, 2013 (gmt 0)

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



if they were cloaking it would most likely be based on IP address or User-Agent, so you should test that first.
when you say "side-by-side" do you mean the dealer's server got the requests from the same IP?
what User-Agent string is used for your web-crawling script?
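One way to answer the UA question empirically is to run both requests from a single machine, so the source IP stays constant and only the User-Agent varies. A sketch with hypothetical UA strings and a deliberately naive price extractor:

```python
# Request the same product page twice from one machine (constant source IP),
# varying only the User-Agent, then compare the first price found on each.
# CRAWLER_UA, BROWSER_UA, and the regex are illustrative placeholders.
import re
import urllib.request

CRAWLER_UA = "MyPriceBot/1.0"  # stand-in for whatever the script sends now
BROWSER_UA = ("Mozilla/5.0 (X11; Linux x86_64; rv:109.0) "
              "Gecko/20100101 Firefox/115.0")

def get_page(url, ua):
    req = urllib.request.Request(url, headers={"User-Agent": ua})
    with urllib.request.urlopen(req, timeout=15) as resp:
        return resp.read().decode("utf-8", errors="replace")

def extract_price(html):
    """Naive: first $xx or $xx.xx figure on the page."""
    m = re.search(r"\$(\d+(?:\.\d{2})?)", html)
    return float(m.group(1)) if m else None
```

If `extract_price(get_page(url, CRAWLER_UA))` differs from the `BROWSER_UA` result, the UA is the trigger; if they match, suspect the IP instead.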

engine

3:25 pm on Sep 12, 2013 (gmt 0)

WebmasterWorld Administrator engine is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



You could try using a search engine translation, without the translation. ;)

[bing.com...]
[translate.google.com...]

topr8

3:27 pm on Sep 12, 2013 (gmt 0)

WebmasterWorld Senior Member topr8 is a WebmasterWorld Top Contributor of All Time 10+ Year Member



>>IP address or User-Agent

i'd assume user agent given that you tested your 'bot/script' and a normal browser from the same IP address and they showed different results.

arieng

3:37 pm on Sep 12, 2013 (gmt 0)

5+ Year Member



If they are honoring the lower price to your users, is it really a problem? There are lots of companies that have different prices for different customers/prospects.

mike73

5:56 pm on Sep 12, 2013 (gmt 0)

10+ Year Member



Thanks for all the replies, guys. I did do a test with a fake User-Agent, and it still looked like they were cheating. I'll try testing from another server later tonight for further confirmation.

The reason I want to get to the bottom of this is that it makes me look bad when people click their links and the prices don't match. People think either my site is not reliable or maybe I'm getting paid off to send them traffic or whatever. Regardless, I like to do things right :)

Is there anything else I could be overlooking that could turn out to be an honest mistake?

mike73

7:16 pm on Sep 12, 2013 (gmt 0)

10+ Year Member



UPDATE:

I complemented my first test with a test from another server. So I have:
* window A where I view their page with my home IP address
* window B where the page is being fetched from an alternate server
* window C where the page is being fetched from my normal web crawler

The prices in window A and B matched, while C displayed lower prices. In other words, they're cloaking. Busted!

mike73

8:10 am on Sep 13, 2013 (gmt 0)

10+ Year Member



I plan on exposing the perpetrators, but before I do I need to know if this is 100%. Is there any possible way this can be a mistake or an accident?

engine

5:05 pm on Sep 13, 2013 (gmt 0)

WebmasterWorld Administrator engine is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



>Is there any possible way this can be a mistake or an accident?
That's a defence they may put up.

I would suggest you speak to a lawyer.

lucy24

7:10 pm on Sep 13, 2013 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



Whether you choose to stomp on them or not, you can still prevent them from continuing to do it. Change your robot's UA string to something humanoid. I suggest a current Chrome, which is extremely generic.

JD_Toims

3:47 am on Sep 14, 2013 (gmt 0)

WebmasterWorld Senior Member Top Contributors Of The Month



I plan on exposing the perpetrators, but before I do I need to know if this is 100%. Is there any possible way this can be a mistake or an accident?

Accidentally serve lower prices to a bot UA string than what's being served to a browser UA string within seconds of access from one or the other? Uh, to me that sounds about as believable as Google "accidentally" storing all those e-mails and other info collected via street view wifi sniffing.

volatilegx

3:43 pm on Dec 27, 2013 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I think you should do a GET request for the site with a standard browser user-agent from the same IP address on which your crawler is operating. This will eliminate the (potentially-innocent) possibility that they are geotargeting.
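Once you have all three data points (crawler UA from the crawler's IP, browser UA from the same IP, and a browser from elsewhere), the diagnosis reduces to two comparisons. A toy helper, entirely hypothetical:

```python
# Interpret the three observed prices. Differs by UA at the same IP ->
# UA cloaking; same UA but differs by IP -> IP-based/geo serving.
def diagnose(price_crawler_ua, price_browser_ua, price_other_ip):
    if price_crawler_ua != price_browser_ua:
        return "user-agent cloaking"
    if price_browser_ua != price_other_ip:
        return "IP-based (possibly geotargeting)"
    return "no discrepancy detected"
```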

brotherhood of LAN

4:37 pm on Dec 27, 2013 (gmt 0)

WebmasterWorld Administrator brotherhood_of_lan is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



It's worth mentioning that you can mitigate these issues somewhat by spidering from an IP that is separate from your website's IP. This kind of thing was easy to manipulate with particular "directory scripts" where they give you a link for a reciprocal link. You'd simply serve a link based on a lookup of the site's IP, store it in a DB, and serve the link to GET requests from those IPs.

Using a different IP is less predictable.

Jonesy

6:32 pm on Dec 29, 2013 (gmt 0)

5+ Year Member



Many web sites either 403 or serve cloaked responses to the default UAs offered by most spidering engines.

In most of my scripts used to fetch information from the web (using wget, lynx, curl, etc.) I have a list of a dozen or so valid, different UAs, and use logic to randomly select one from the list to make each request.

It's easy to find (huge) lists of UAs on the web via a search.
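A minimal version of that rotation in Python; the three UA strings are examples only, standing in for the kind of list Jonesy describes:

```python
# Pick a real-looking browser User-Agent at random for each request.
import random

USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0",
]

def random_ua():
    return random.choice(USER_AGENTS)
```

The same idea works from shell wrappers around wget or curl via their `--user-agent` / `-A` flags.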

rominosj

11:31 pm on Jan 1, 2014 (gmt 0)

10+ Year Member



Why not just ask them? If, after asking, you see changes, then they definitely were doing something wrong. If nothing changes, at least you might get an explanation from them.
 
