That sure sounds like cloaking to me.
Not only that: if they're advertising that price, in many areas they're required by law to honor it. So if your regular customers ever catch them by using Lynx or something...
At any rate the FTC would fry them IMO.
An easy way to possibly solve the problem would be to make your crawler send a standard browser User-Agent string, assuming they aren't checking your IP address as well. If they think it's a browser, maybe they'll give you the right price.
[edited by: incrediBILL at 2:56 pm (utc) on Sep 12, 2013]
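A minimal sketch of the suggestion above in Python, using only the standard library. The URL and the Chrome-style UA string are placeholders; substitute any current mainstream browser's UA.

```python
import urllib.request

# Example Chrome-style User-Agent string (a placeholder; any current
# mainstream browser's UA will do).
BROWSER_UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
              "AppleWebKit/537.36 (KHTML, like Gecko) "
              "Chrome/116.0.0.0 Safari/537.36")

def fetch_as_browser(url):
    # Override the default "Python-urllib/3.x" UA, which sites can
    # easily single out for cloaked responses.
    req = urllib.request.Request(url, headers={"User-Agent": BROWSER_UA})
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```

This only helps if the cloaking keys on the User-Agent; if they're also checking IP addresses, the response will still differ.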
If they were cloaking, it would most likely be based on IP address or User-Agent, so you should test that first.
When you say "side-by-side", do you mean the dealer's server got the requests from the same IP?
What User-Agent string does your web-crawling script use?
You could try using a search engine translation, without the translation. ;)
>>IP address or User-Agent
I'd assume User-Agent, given that you tested your bot/script and a normal browser from the same IP address and they showed different results.
If they are honoring the lower price to your users, is it really a problem? There are lots of companies that have different prices for different customers/prospects.
Thanks for all the replies, guys. I did do a test with a fake User-Agent, and it still looked like they were cheating. I'll try testing from another server later tonight for further confirmation.
The reason I want to get to the bottom of this is that it makes me look bad when people click their links and the prices don't match. People think either my site is not reliable or maybe I'm getting paid off to send them traffic or whatever. Regardless, I like to do things right :)
Is there anything else I could be overlooking that could turn out to be an honest mistake?
I complemented my first test with one from another server. So I have:
* window A where I view their page with my home IP address
* window B where the page is being fetched from an alternate server
* window C where the page is being fetched from my normal web crawler
The prices in window A and B matched, while C displayed lower prices. In other words, they're cloaking. Busted!
I plan on exposing the perpetrators, but before I do I need to know if this is 100%. Is there any possible way this can be a mistake or an accident?
>Is there any possible way this can be a mistake or an accident?
That's a defence they may put up.
I would suggest you speak to a lawyer.
Whether you choose to stomp on them or not, you can still prevent them from continuing to do it. Change your robot's UA string to something humanoid. I suggest a current Chrome, which is extremely generic.
|I plan on exposing the perpetrators, but before I do I need to know if this is 100%. Is there any possible way this can be a mistake or an accident? |
Accidentally serve lower prices to a bot UA string than what's being served to a browser UA string within seconds of access from one or the other? Uh, to me that sounds about as believable as Google "accidentally" storing all those e-mails and other info collected via street view wifi sniffing.
I think you should do a GET request for the site with a standard browser user-agent from the same IP address on which your crawler is operating. This will eliminate the (potentially-innocent) possibility that they are geotargeting.
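A sketch of that same-IP test in Python, assuming only the standard library. Both UA strings are placeholders (`MyPriceCrawler/1.0` stands in for whatever your crawler really sends), and the price regex is a guess at "$1,234.56"-style formatting, so adjust it to the dealer's actual markup.

```python
import re
import urllib.request

# Placeholder UA strings: one browser-like, one identifying the crawler.
BROWSER_UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
              "AppleWebKit/537.36 (KHTML, like Gecko) "
              "Chrome/116.0.0.0 Safari/537.36")
BOT_UA = "MyPriceCrawler/1.0"  # substitute your crawler's real UA

def fetch(url, ua):
    # Same machine, same IP -- only the User-Agent header differs.
    req = urllib.request.Request(url, headers={"User-Agent": ua})
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8", errors="replace")

def extract_prices(html):
    # Naive scrape of "$1,299.00"-style prices for comparing the
    # two responses; tailor the pattern to the page's real markup.
    return re.findall(r"\$\d[\d,]*(?:\.\d{2})?", html)

def cloaking_suspected(url):
    # Two requests seconds apart from one IP. A price mismatch
    # strongly suggests UA-based cloaking (rule out rotating
    # content like ads or timestamps first).
    return extract_prices(fetch(url, BROWSER_UA)) != extract_prices(fetch(url, BOT_UA))
```

Comparing extracted prices rather than raw HTML avoids false positives from session IDs, ads, or other content that legitimately varies between requests.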
|brotherhood of LAN|
It's worth mentioning that you can mitigate these issues somewhat by spidering from an IP that is separate from your website's IP. This kind of thing was easy to manipulate with certain "directory scripts" that give you a link in exchange for a reciprocal link: you'd simply look up the site's IP, store it in a DB, and serve the link only to GET requests from that IP.
Using a different IP is less predictable.
Many web sites either 403 or serve cloaked responses to the default UAs offered by most spidering engines.
In most of my scripts used to fetch information from the web (using wget, lynx, curl, etc.), I have a list of a dozen or so valid, different UAs, and use logic to randomly select one from the list for each request.
It's easy to find (huge) lists of UAs on the web via a search.
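The rotation described above can be sketched in a few lines of Python. The three UA strings here are just a sample; swap in a longer list from one of those web searches.

```python
import random

# A short sample list; real lists with hundreds of current
# browser UAs are easy to find with a web search.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/16.5 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/117.0",
]

def random_ua():
    # Pick a fresh UA for each request so the fetch pattern is
    # harder to fingerprint by User-Agent alone.
    return random.choice(USER_AGENTS)
```

The chosen string can then be passed to whatever client makes the request, e.g. wget's `--user-agent` option or an HTTP library's User-Agent header.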
Why not just ask them? If, after asking, you see changes, then they definitely were doing something wrong. If nothing changes, you might at least get an explanation from them.