This 34 message thread spans 2 pages.
This bot is a very serious issue if you run an ecommerce site.
So far only seen on IP 188.8.131.52.
I'd reported it before but until tonight I did not realize what it was actually doing... because this time it sent a referrer of another website.
The referrer was two different competitors of ours... when their pages load, this bot goes out and snags images from our site and loads them not in a hot link, but snags the image then displays it in an app of sorts.
They both use a service to provide product information to them in a data feed and I feel sure that this provider has set this up with a bot. They just get to sit back and rake in the sales without all the legwork of getting actual product images.
I'm trying to decide just what photos to serve them....
Thinking very evil thoughts. Serve them up our logo then do a DMCA report on it, or something vile.
You don't block the old your-server.de, now Hetzner Online AG?
Yeah, I have run into a bunch of leeches like that which is why my hotlink code on one well abused site of mine pops an error for any image request without my domain in the referrer.
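For anyone who wants to see the logic of that kind of referrer check spelled out, here is a minimal sketch in Python. It is not the poster's actual code; the domain names in `ALLOWED_HOSTS` and the helper names `is_hotlink` and `status_for` are assumptions made up for illustration.

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical values for illustration; substitute your own domain.
ALLOWED_HOSTS = {"example.com", "www.example.com"}
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif")


def is_hotlink(path: str, referer: Optional[str]) -> bool:
    """Return True when an image request lacks a referrer from our own domain."""
    if not path.lower().endswith(IMAGE_EXTENSIONS):
        return False  # only image requests are policed
    if not referer:
        return True  # strict policy: a missing referrer is refused too
    host = urlparse(referer).hostname or ""
    return host.lower() not in ALLOWED_HOSTS


def status_for(path: str, referer: Optional[str]) -> int:
    """Map a request to an HTTP status: 403 for hotlinks, 200 otherwise."""
    return 403 if is_hotlink(path, referer) else 200
```

Note that refusing requests with no referrer at all will also turn away some legitimate visitors whose browsers or proxies strip the header, so that strict branch is a judgment call.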
There was a very nice lady that hot linked my images for child products onto all her eBay pages. To let her know how much I appreciated her efforts I redirected her hot links to some of the sickest stuff you can imagine.
I think she lost her eBay account.
If you want to have some real fun with them, start sending out GIFs with large red flashing letters that say "DON'T BUY HERE!" or something equally amusing.
Someone hot linked one of my customer's background images once and we replaced it with big thick letters "THIEVES" which beautifully tiled across the page on their entire site.
Quite amusing ;)
176.9 is Hetzner. afaik you can safely block the whole range. My notes say, quote, "too many robots to track". I don't remember who lives at 184.108.40.206 but I noted them particularly as "unattractive robot" :)
From this message I didn't understand what your claims are. As Google does, Qippo crawls websites and shows links to them. As Google does, Qippo uses some images, but only thumbnails, with a mandatory link to the source. As Google does, Qippo listens to robots.txt. If you don't want your content to be indexed, you should explicitly express this, just as you do with Google.
>>because this time it sent a referrer of another website<<
What referrer, who sent, how and when? Can you please be more specific?
>>I'm trying to decide just what photos to serve them<<
You can do it of course, but also you can just write and make all your products get excluded, as you would do with Google. No evil here.
Welcome to Webmaster World
|The referrer was two different competitors of ours... when their pages load, this bot goes out and snags images from our site and loads them not in a hot link, but snags the image then displays it in an app of sorts. |
|>>because this time it sent a referrer of another website<< |
|What referrer, who sent, how and when? Can you please be more specific? |
I'm not sure how Bewenched could be more specific than the above 1st quote?
FWIW and since you're a newcomer here:
Forum charter [webmasterworld.com]
Links that are allowed to be posted within the Spider ID forum:
Links contained within Search Engine User Agent strings are allowed
Links to the Search Engine home page
Links to the Search Engine crawler page or robots.txt page
Educational material and standards documents - Microsoft, Apache, Google Guidelines, etc.
Authoritative news stories - NY Times, Wall Street Journal, PC World, Wired, BBC, CNN, NBC, etc.
Please do not link to other forums or blogs.
In addition, it is never appropriate to link to any website that you operate or that hosts your own content - no matter how authoritative that content may be.
@qippo, Welcome to WebmasterWorld as we're definitely interested in the details of your crawler.
|In addition, it is never appropriate to link to any website that you operate or that hosts your own content - no matter how authoritative that content may be. |
Don, I think in this case, since they represent the crawler and are the authority, we would allow some links to pages that explain details about the crawler operation, IP range, etc. just like we allow for Google and Bing reps.
Basically, as long as the links are pertinent to the discussion and not promotional in nature I'll allow them.
With all due respect, it was my understanding that a request was being made for the refer links from Bewenched's visitor logs.
In the event that I've read the request wrong, then I apologize.
qippo, it is against the forum charter to post links to sites, as they've stated above.
I am not going to publicly publish my logs here. If you'd like to send me a private message with your email and direct phone number I'd be happy to discuss this serious situation further.
I'll be down right specific when we launch our copyright infringement case. You aren't referring people to us from another site.... you're snagging images for them. I have proof in my logs... and proof on their pages.
Now I need to finish parsing logs for this past year to find all the instances... I've only gone through 3 weeks worth and have found at minimum 7 sites associated with this issue.
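A rough sketch of how that kind of log pass could be automated, for anyone in the same position. It assumes the Apache/Nginx "combined" log format and uses made-up host names in `OWN_HOSTS`; the function name `external_image_referrers` is hypothetical too. It simply counts image requests per outside referring host.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Apache/Nginx "combined" log format (an assumption; adjust to your server).
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?:GET|HEAD) (?P<path>\S+)[^"]*" \d{3} \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

OWN_HOSTS = {"example.com", "www.example.com"}  # hypothetical
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif")


def external_image_referrers(lines):
    """Count image requests per foreign referring host."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in the expected format
        path = match["path"].split("?", 1)[0].lower()
        if not path.endswith(IMAGE_EXTENSIONS):
            continue
        host = (urlparse(match["referer"]).hostname or "").lower()
        if host and host not in OWN_HOSTS:
            counts[host] += 1
    return counts
```

Feeding it an open log file (for example `external_image_referrers(open("access.log"))`, where the file name is a placeholder) gives a per-site tally that can be sorted with `most_common()`.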
|With all due respect, it was my understanding that a request was being made for the refer links from Bewenched's visitor logs. |
Oops. My bad, the referrer would be inappropriate.
Oops, let's make that 12 different sites snagging and displaying our product images through qippo, fetched by qippobot for these sites with qippo's IP address and these sites as the referrer.
Just an update on this situation. What the sites are doing is using a DNS proxy service that may or may not be associated with this bot... They call the image, the bot retrieves it and it's served on their page.
After many emails with this DNS proxy service, they say they cannot help since they are not the host, even though they are listed as the hosting company in WHOIS.
I've done a tracert to the sites in question and they all resolve to Rackspace. So I've contacted Rackspace about it, and because the IP that appeared in our logs is not one of theirs they keep saying they don't host them, even though I know that they do; traces don't lie.
I'll be spending the next few days filing DMCA violations and see what transpires.
Legally what they're doing is wrong and we should be owed compensation for their abuse and use of our formatted images. I've contacted a few attorneys, but the ones I spoke with aren't tech savvy enough to know what I'm talking about. Any ideas?
You don't block rackspace?
Hi Bewenched, I've sent you a personal message, please reply. We'll investigate this situation and I'll post a reply here.
Basically saying, Qippo doesn't do anything Google doesn't. Qippo places link to the source of the image, explicitly showing where this image came from. Crawler listens to robots.txt and shows only thumbnails. Qippo removes anything on the first request.
If you're ok with Google doing it, you should be ok with Qippo. If you're not - use robots.txt or just write us. This is how crawlers work.
Using google as an example of good practice is very naive. They are now considered a major baddie by a lot of people and many of us block all but the basic googlebot; and even then we only let that through because our clients demand it.
No probs, I agree. That's why there are simple ways to block crawlers from crawling - tell them where you don't allow them to go. That's how you block google, right?
What's all this pretension comparing yourself to Google? Qippo is a shopping/marketing index IMO, nothing even close to a search engine.
|No probs, I agree. That's why there are simple ways to block crawlers from crawling - tell them where you don't allow them to go. That's how you block google, right? |
FWIW, you have your terminology confused.
A request within robots.txt is exactly that for bots that are compliant.
"Blocking access" (aka denial of access), whether a bot, or any other type of visitor, is a server action, of which the visitor has no choice.
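To make the distinction concrete, here is a small sketch of the robots.txt side using Python's standard `urllib.robotparser`. The robots.txt contents and the bot name `ExampleBot` are invented for illustration. The point is that this check runs inside the crawler: a compliant bot asks the question and obeys the answer, while a non-compliant one simply never asks.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt asking one crawler to stay out of /images/.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /images/

User-agent: *
Disallow:
"""


def compliant_bot_may_fetch(agent: str, url: str) -> bool:
    """What a well-behaved crawler decides for itself after reading robots.txt."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)
```

Nothing here stops a request from reaching the server; an actual denial has to be enforced server-side.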
dstiles offered the following, which you apparently overlooked.
|many of us block all but the basic googlebot |
|What's all this pretension comparing yourself to Google? |
Why not just deny the 176 Class A, and be done with it ;)
|Why not just deny the 176 Class A, and be done with it ;) |
I don't block the 176 A class, just the server farms (that I'm aware of):
220.127.116.11 - 18.104.22.168
22.214.171.124 - 126.96.36.199
188.8.131.52 - 184.108.40.206
220.127.116.11 - 18.104.22.168
22.214.171.124 - 126.96.36.199
188.8.131.52 - 184.108.40.206
220.127.116.11 - 18.104.22.168
22.214.171.124 - 126.96.36.199
188.8.131.52 - 184.108.40.206
220.127.116.11 - 18.104.22.168
All them 176's look alike ;)
Well all 176's don't look alike to me. Example:
NetCom Mobile Broadband
22.214.171.124 - 126.96.36.199
And there's more ISPs in there.
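As a sketch of blocking selected server-farm ranges rather than a whole Class A, the following uses Python's standard `ipaddress` module. The start-end pairs in `DENIED_RANGES` are documentation addresses chosen for illustration, not the ranges listed above, and the helper names are hypothetical.

```python
import ipaddress

# Hypothetical start-end pairs using documentation addresses,
# not the ranges from the posts above.
DENIED_RANGES = [
    ("192.0.2.0", "192.0.2.255"),
    ("198.51.100.64", "198.51.100.127"),
]


def build_networks(ranges):
    """Convert start-end address pairs into a list of CIDR networks."""
    networks = []
    for first, last in ranges:
        networks.extend(
            ipaddress.summarize_address_range(
                ipaddress.ip_address(first), ipaddress.ip_address(last)
            )
        )
    return networks


DENIED_NETWORKS = build_networks(DENIED_RANGES)


def is_denied(ip: str) -> bool:
    """Return True when the address falls inside any denied range."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in DENIED_NETWORKS)
```

Keeping the list as explicit ranges means ISP blocks that sit between the server farms stay reachable, which is the trade-off being argued here.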
Ok, that's fine with me. You, as the owner have all rights to block crawling of your website any way you wish. The good way is to use robots.txt, some crawlers don't listen to it - you block it by network. That's ok. It's not about Qippo though. We presume that any e-store wants publicity and getting more links to their products (we don't charge for it and crawl automatically), and if not - they use the most obvious way to express it: using robots.txt (or meta tags for specific web pages). I don't know any other way how any search engine could work otherwise.
For several days all I've been getting is:
|We apologize, Qippo is down for maintenance |
Planning to be back in a few moments
Looks like only a cover page and no actual search results. So maybe you're crawling for other reasons?
I don't know what to answer here, really :) Qippo has millions of pages, and all are working. Go to qippo.com, go through the catalog, or use direct links from [qippo.com...]
We're still in early stage though, but the site is working.
This is the result of using your search utility. Are you saying that is not a message generated by your server? That your server has been compromised?
Is this correct, that for any search query you enter, you get this message? This is off topic, but I'd appreciate a private message with query examples. Again, search works well here. We may have some mistakes or problems with the engine, as it's still in development, but for most queries it works ok.
Well after a few emails with qippo they said it wasn't them but doing a reverse DNS and talking with my service provider... it was them! now they're coming around without showing their user agent!
The icing on the cake ... drum roll please.
The sites that appeared to be hot-linking were using a data provider for their product information...... I've spoken with them and they swore it wasn't them either, but guess what ......
Now that hot-linking isn't showing when qippo snags an image. Amazing how that worked out.
No it wasn't us.... BS. Qippo may be out of the country, but the company and sites that stole our nice clean images without so much as some lube or a coke. And they're our competitors, but they're too lazy to do their own artwork.. how pathetic. Maybe they should go back to scraping google images and fill their site full of irrelevant pictures again.
Both companies swore it wasn't them. BOTH LIED! Amazingly the "referrer" stopped showing up and that didn't stop until my telephone conversation with the guys at %&**^%! (the product data provider) and two of the websites in question. One of the websites in question was just in court last month for copyright violations... guess they like paying for attorneys instead of images.
Sad that big corporations making millions a year in business feel the need to STEAL images from a family and employee owned business.
Talked to an attorney yesterday afternoon, nice to have a friend that does corporate law.
Just an FYI: if you search, there are plenty of examples of how to write anti-hotlinking code in your .htaccess.
I use a variation that switches the hotlinked file with one that advertises my site and embarrasses the hotlinker. Works great for those forum hotlinkers. Example:
|I am a low life thief. |
I am trying to steal an image owned by:
Bewenched, I still hope that you're doing this because you drew some mistaken conclusions and mixed us up with someone else. But it's getting more and more annoying. If you have proof of what you write, go ahead and take legal action. Of course, on the internet you can write whatever you want. Again, I would like to point out the following facts:
You wrote me a message with a list of websites and IP asking if it was us. I said NO. I don't know how to give any proofs that we are not these guys. If you're not related with someone there are usually no proofs of this. Please prove to me that you're not hired by one of our competitors. You write nonsense mixing us with some guys I've never heard of, and do it with such a big pressure. Pushing.
I asked you what your website is to investigate the situation, you refused, giving me much less chances to understand what's going on and what happened at all.
>>now they're coming around without showing their user agent! << >>Now that hot-linking isn't showing when qippo snags an image.<< Oh, really? This is a lie. Or a mistake, but this is NOT TRUE.
To close this conversation, I'd like to make a few statements:
1) Qippo does crawl some electronic stores pages and images with an idea to include it in our listing. We believe that this is good for stores to be included there.
2) We ALWAYS send our user agent when requesting the page.
3) We listen to robots.txt (and <meta name="robots" for HTML pages).
4) We will stop crawling your website at your first request, if you don't want to change your robots.txt. WE RESPECT YOUR RIGHT NOT TO BE CRAWLED OR INDEXED.
5) We don't give this data to ANYONE else, and we don't crawl it for any reason other than what's written in (1).
6) We display only thumbnails of images, with direct links to the source of where this image came from.
7) We DO NOT hot-link. Show any image that is hotlinked. How can you use this argument at all if there is not a single hotlinked image on our website?
8) We participate in any conversations and are open to it. We promise to investigate any case openly.
This is what we do, Bewenched. I know how forum conversations usually go and understand that it's always easier to blame someone than to protect yourself. But please, if you write anything else here, be more responsible. Because this is not only you who think about your rights here. You blame us publicly saying things that you cannot prove because they're not true. You did it several times. If you want to take legal actions with what you wrote above - again, go ahead and do it, we have nothing to fear here. But, again think a little bit if you may be wrong and made some wrong conclusions.
Once more, I offer to investigate this situation and send you all details regarding the crawling of your site, if you give me its address.
FWIW, I've thousands of active images.
Until 2000 or 2001, I used "names" for images, until thousands of requests appeared without any viewing of pages (for a popular "0name").
At that time, I began numbering images, and have used that method since.
If you don't know the "fish's" number, then you're not able to locate the fish. Neither are the SEs (unless you offer alt and/or name text for your images).
Unless you're a photographer selling images, there's not any benefit to allowing SEs or other websites to have access to your images, despite the benefits of added traffic from the images.