Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

This 34-message thread spans 2 pages; this is page 1.
QippoBot/1.4+(+http://www.qippo.com/bot/)
Bewenched




msg:4522781
 6:40 am on Nov 27, 2012 (gmt 0)

This bot is a very serious issue if you run an ecommerce site.

QippoBot/1.4+(+http://www.qippo.com/bot/)
So far only seen on IP 176.9.31.72.

I'd reported it before but until tonight I did not realize what it was actually doing... because this time it sent a referrer of another website.

The referrers were two different competitors of ours. When their pages load, this bot goes out and snags images from our site. It doesn't hotlink them; it fetches the image and then displays it in an app of sorts.

They both use a service to provide product information to them in a data feed and I feel sure that this provider has set this up with a bot. They just get to sit back and rake in the sales without all the legwork of getting actual product images.

I'm trying to decide just what photos to serve them....
Thinking very evil thoughts. Serve them up our logo then do a DMCA report on it, or something vile.

 

incrediBILL




msg:4522784
 6:54 am on Nov 27, 2012 (gmt 0)

You don't block the old your-server.de, now Hetzner Online AG?

Yeah, I have run into a bunch of leeches like that, which is why my hotlink code on one well-abused site of mine pops an error for any image request without my domain in the referrer.
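A minimal sketch of that kind of referrer check, written in Python rather than as server configuration. The domain `example.com`, the extension list, and the function name are assumptions for illustration; the actual hotlink code is not shown in the thread. Note that this rule also rejects requests with an empty referrer, which matches "any image request without my domain in the referrer" but will turn away visitors whose browsers or proxies strip the header.

```python
from urllib.parse import urlsplit

# Assumed list of image extensions to protect (not from the thread).
IMAGE_EXTENSIONS = (".gif", ".jpg", ".jpeg", ".png")


def image_request_status(path, referrer, own_domain="example.com"):
    """Return 403 for image requests whose referrer is not on own_domain, else 200.

    `path` is the request path without a query string; `own_domain` is a
    placeholder for the site's real domain.
    """
    if not path.lower().endswith(IMAGE_EXTENSIONS):
        return 200  # only image requests are checked
    # hostname is None when the referrer is empty or not a full URL
    host = urlsplit(referrer or "").hostname
    if host is not None and (host == own_domain or host.endswith("." + own_domain)):
        return 200
    return 403
```

A page on `www.example.com` requesting its own images passes; the same image requested with another site's page as the referrer, or with no referrer at all, gets the error.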

There was a very nice lady that hot linked my images for child products onto all her eBay pages. To let her know how much I appreciated her efforts I redirected her hot links to some of the sickest stuff you can imagine.

I think she lost her eBay account.

If you want to have some real fun with them, start sending out GIFs with large red flashing letters that say "DON'T BUY HERE!" or something equally amusing.

Someone hot linked one of my customer's background images once and we replaced it with big thick letters "THIEVES" which beautifully tiled across the page on their entire site.

Quite amusing ;)

lucy24




msg:4522789
 7:18 am on Nov 27, 2012 (gmt 0)

Oops, overlapped...

176.9 is Hetzner. AFAIK you can safely block the whole range. My notes say, quote, "too many robots to track". I don't remember who lives at 176.9.140.5 but I noted them particularly as "unattractive robot" :)

qippo




msg:4523861
 12:11 pm on Nov 30, 2012 (gmt 0)

Hi there,

From this message I didn't understand what your claims are. As Google does, Qippo crawls websites and shows links to them. As Google does, Qippo uses some images, but only thumbnails, with a mandatory link to the source. As Google does, Qippo listens to robots.txt. If you don't want your content to be indexed, you should explicitly express this, just as you do with Google.

>>because this time it sent a referrer of another website<<

What referrer, who sent, how and when? Can you please be more specific?

>>I'm trying to decide just what photos to serve them<<

You can do it of course, but you can also just write to us and have all your products excluded, as you would do with Google. No evil here.

wilderness




msg:4523888
 3:09 pm on Nov 30, 2012 (gmt 0)

Welcome to Webmaster World

>>The referrers were two different competitors of ours. When their pages load, this bot goes out and snags images from our site. It doesn't hotlink them; it fetches the image and then displays it in an app of sorts.<<


>>because this time it sent a referrer of another website<<


>>What referrer, who sent, how and when? Can you please be more specific?<<


I'm not sure how Bewenched could be more specific than the above 1st quote?

FWIW and since you're a newcomer here:
Forum charter [webmasterworld.com]

Links that are allowed to be posted within the Spider ID forum:
Links contained within Search Engine User Agent strings are allowed
Links to the Search Engine home page
Links to the Search Engine crawler page or robots.txt page
Educational material and standards documents - Microsoft, Apache, Google Guidelines, etc.
Authoritative news stories - NY Times, Wall Street Journal, PC World, Wired, BBC, CNN, NBC, etc.

Please do not link to other forums or blogs.

In addition, it is never appropriate to link to any website that you operate or that hosts your own content - no matter how authoritative that content may be.

incrediBILL




msg:4523938
 7:11 pm on Nov 30, 2012 (gmt 0)

@qippo, Welcome to WebmasterWorld as we're definitely interested in the details of your crawler.

>>In addition, it is never appropriate to link to any website that you operate or that hosts your own content - no matter how authoritative that content may be.<<


Don, I think in this case, since they represent the crawler and are the authority, we would allow some links to pages that explain details about the crawler operation, IP range, etc. just like we allow for Google and Bing reps.

Basically, as long as the links are pertinent to the discussion and not promotional in nature I'll allow them.

wilderness




msg:4523940
 7:17 pm on Nov 30, 2012 (gmt 0)

Hey Bill,
With all due respect, it was my understanding that a request was being made for the refer links from Bewenched's visitor logs.

In the event that I've read the request wrong, then I apologize.

Bewenched




msg:4524023
 2:14 am on Dec 1, 2012 (gmt 0)

qippo, it is against forum charter to post links to sites, as they've stated above.

I am not going to publicly publish my logs here. If you'd like to send me a private message with your email and direct phone number I'd be happy to discuss this serious situation further.

I'll be downright specific when we launch our copyright infringement case. You aren't referring people to us from another site.... you're snagging images for them. I have proof in my logs... and proof on their pages.

Now I need to finish parsing logs for this past year to find all the instances... I've only gone through 3 weeks' worth and have found at minimum 7 sites associated with this issue.
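A rough sketch of how that kind of log search could be automated in Python. It assumes the server writes the common Apache "combined" log format (IP, timestamp, request, status, size, referrer, user agent); the thread does not say which format or tools were actually used. The IP and user-agent token come from the first post; the function name and regular expression are illustrative.

```python
import re
from collections import Counter

# One line of Apache "combined" log format (an assumption about the server setup).
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def referrers_for_bot(lines, bot_ip="176.9.31.72", agent_token="QippoBot"):
    """Count the referrers sent with requests from the bot's IP or user agent."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in the expected format
        if match["ip"] != bot_ip and agent_token not in match["agent"]:
            continue  # request is from neither the known IP nor the known agent
        referrer = match["referrer"]
        if referrer and referrer != "-":  # "-" means no referrer was sent
            counts[referrer] += 1
    return counts
```

Passing an open log file to `referrers_for_bot` and printing `most_common()` on the result lists each referring page with its hit count. Matching on the IP as well as the user agent also catches requests from that address that arrive under a different agent string.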

incrediBILL




msg:4524028
 2:29 am on Dec 1, 2012 (gmt 0)

>>With all due respect, it was my understanding that a request was being made for the refer links from Bewenched's visitor logs.<<


Oops. My bad, the referrer would be inappropriate.

Bewenched




msg:4524166
 3:57 pm on Dec 1, 2012 (gmt 0)

Oops, let's make that 12 different sites snagging and displaying our product images through qippo: the images are fetched by qippobot for these sites, from qippo's IP address, with these sites as the referrer.

Bewenched




msg:4524715
 1:29 am on Dec 4, 2012 (gmt 0)

Just an update on this situation. What the sites are doing is using a DNS proxy service that may or may not be associated with this bot... They call the image, the bot retrieves it and it's served on their page.

After many emails with this DNS proxy service they say they cannot help since they are not the host, even though they are listed as the hosting company on WHOIS.

I've done a tracert to the sites in question and they all resolve to Rackspace. So I've contacted Rackspace about it, and because the IP that appeared in our logs is not one of theirs they keep saying they don't host them, even though I know that they do; traces don't lie.

I'll be spending the next few days filing DMCA violations and see what transpires.

Legally what they're doing is wrong and we should be owed compensation for their abuse and use of our formatted images. I've contacted a few attorneys, but the ones I spoke with aren't tech savvy enough to know what I'm talking about. Any ideas?

keyplyr




msg:4524773
 5:52 am on Dec 4, 2012 (gmt 0)

You don't block rackspace?

qippo




msg:4524780
 6:21 am on Dec 4, 2012 (gmt 0)

Hi Bewenched, I've sent you a personal message, please reply. We'll investigate this situation and I'll post a reply here.

Basically, Qippo doesn't do anything Google doesn't. Qippo places a link to the source of the image, explicitly showing where this image came from. The crawler listens to robots.txt and shows only thumbnails. Qippo removes anything on the first request.

If you're ok with Google doing it, you should be ok with Qippo. If you're not - use robots.txt or just write us. This is how crawlers work.

dstiles




msg:4525095
 8:57 pm on Dec 4, 2012 (gmt 0)

Using google as an example of good practice is very naive. They are now considered a major baddie by a lot of people and many of us block all but the basic googlebot; and even then we only let that through because our clients demand it.

qippo




msg:4525102
 9:13 pm on Dec 4, 2012 (gmt 0)

No probs, I agree. That's why there are simple ways to block crawlers from crawling - tell them where you don't allow them to go. That's how you block google, right?

keyplyr




msg:4525104
 9:38 pm on Dec 4, 2012 (gmt 0)

What's all this pretension comparing yourself to Google? Qippo is a shopping/marketing index IMO, nothing even close to a search engine.

wilderness




msg:4525123
 11:18 pm on Dec 4, 2012 (gmt 0)

>>No probs, I agree. That's why there are simple ways to block crawlers from crawling - tell them where you don't allow them to go. That's how you block google, right?<<


FWIW, you have your terminology confused.

A request within robots.txt is exactly that: a request, honored only by bots that are compliant.
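To make the point concrete, here is a small Python sketch of what a compliant crawler does before fetching a URL, using the standard library's `urllib.robotparser`. The robots.txt content and the `www.example.com` URLs are made up for illustration. The check runs entirely on the crawler's side, so a bot that never performs it is not stopped by anything in the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration: shut out QippoBot entirely,
# and keep every other crawler out of /private/.
ROBOTS_TXT = """\
User-agent: QippoBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def may_fetch(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if the robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With these rules, `may_fetch("QippoBot/1.4", "http://www.example.com/images/widget.jpg")` is false, while the same URL is allowed for a crawler identifying itself under any other name.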

"Blocking access" (aka denial of access), whether a bot, or any other type of visitor, is a server action, of which the visitor has no choice.

dstiles offered the following, which you apparently overlooked.

>>many of us block all but the basic googlebot<<


>>What's all this pretension comparing yourself to Google?<<


Ditto.
Why not just deny the 176 Class A, and be done with it ;)

keyplyr




msg:4525139
 12:15 am on Dec 5, 2012 (gmt 0)

>>Why not just deny the 176 Class A, and be done with it ;)<<

I don't block the 176 A class, just the server farms (that I'm aware of):

Hetzner
176.9.0.0 - 176.9.0.31
176.9.0.0/16

OVH
176.31.96.0 - 176.31.127.255
176.31.96.0/19

OVH
176.31.224.0 - 176.31.255.255
176.31.224.0/19

Amazon
176.32.64.0 - 176.32.95.255
176.32.64.0/19

Amazon
176.34.0.0 - 176.34.255.255
176.34.0.0/16

Radore
176.53.43.0 - 176.53.43.255
176.53.43.0/24

Nimbus
176.56.59.128 - 176.56.59.191
176.56.59.128/26

Linode
176.58.96.0 - 176.58.127.255
176.58.96.0/19

Cloudata
176.123.0.0 - 176.123.31.255
176.123.0.0/19

keyplyr




msg:4525171
 3:55 am on Dec 5, 2012 (gmt 0)

(Correction)

Hetzner
176.9.0.0 - 176.9.255.255
176.9.0.0/16
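For anyone who wants to test an address against these ranges before adding them to a deny list, here is a small Python sketch using the standard `ipaddress` module. The ranges are the CIDR blocks listed in this thread, with the corrected Hetzner /16; the function name is illustrative, and the allocations reflect what was posted in 2012, so they may have changed since.

```python
import ipaddress

# CIDR blocks as posted in this thread (Hetzner uses the corrected /16).
BLOCKED = [
    ("Hetzner", "176.9.0.0/16"),
    ("OVH", "176.31.96.0/19"),
    ("OVH", "176.31.224.0/19"),
    ("Amazon", "176.32.64.0/19"),
    ("Amazon", "176.34.0.0/16"),
    ("Radore", "176.53.43.0/24"),
    ("Nimbus", "176.56.59.128/26"),
    ("Linode", "176.58.96.0/19"),
    ("Cloudata", "176.123.0.0/19"),
]
NETWORKS = [(name, ipaddress.ip_network(cidr)) for name, cidr in BLOCKED]


def blocked_by(ip):
    """Return the name of the listed range containing ip, or None if unlisted."""
    address = ipaddress.ip_address(ip)
    for name, network in NETWORKS:
        if address in network:
            return name
    return None
```

Both addresses mentioned earlier in the thread, 176.9.31.72 and 176.9.140.5, fall inside the Hetzner block, while an address in 176.11.0.0/16 is not matched by any of these ranges.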

wilderness




msg:4525177
 5:08 am on Dec 5, 2012 (gmt 0)

>>(Correction)<<


All them 176's look alike ;)

keyplyr




msg:4525185
 5:46 am on Dec 5, 2012 (gmt 0)

Well all 176's don't look alike to me. Example:

NetCom Mobile Broadband
176.11.0.0 - 176.11.255.255
176.11.0.0/16

And there are more ISPs in there.

qippo




msg:4525193
 7:05 am on Dec 5, 2012 (gmt 0)

Ok, that's fine with me. You, as the owner, have every right to block crawling of your website any way you wish. The good way is to use robots.txt; some crawlers don't listen to it, so you block them by network. That's ok. It's not about Qippo though. We presume that any e-store wants publicity and more links to their products (we don't charge for it and crawl automatically), and if not, they use the most obvious way to express it: robots.txt (or meta tags for specific web pages). I don't know how any search engine could work otherwise.

keyplyr




msg:4525200
 8:11 am on Dec 5, 2012 (gmt 0)

RE: qippo

For several days all I've been getting is:
We apologize, Qippo is down for maintenance
Planning to be back in a few moments


Looks like only a cover page and no actual search results. So maybe you're crawling for other reasons?

qippo




msg:4525202
 8:20 am on Dec 5, 2012 (gmt 0)

I don't know what to answer here, really :) Qippo has millions of pages, all are working. Go to qippo.com, go through the catalog, or use direct links from [qippo.com...]

We're still in an early stage though, but the site is working.

keyplyr




msg:4525224
 9:22 am on Dec 5, 2012 (gmt 0)

This is the result of using your search utility. Are you saying that is not a message generated by your server? That your server has been compromised?

qippo




msg:4525227
 9:38 am on Dec 5, 2012 (gmt 0)

Is it correct that for any search query you enter, you get this message? This is off topic, but I'd appreciate a private message with query examples. Again, search works well here. We may have some mistakes or problems with the engine, as it's still developing, but for most queries it works ok.

Bewenched




msg:4525797
 6:31 am on Dec 7, 2012 (gmt 0)

Well, after a few emails with qippo they said it wasn't them, but after doing a reverse DNS and talking with my service provider... it was them! Now they're coming around without showing their user agent!
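A reverse DNS lookup on its own only shows what the owner of the IP block has chosen to publish, so a common refinement is a forward-confirmed check: resolve the IP to a hostname, then resolve that hostname back and confirm it returns the same IP. The Python sketch below shows the idea with the standard `socket` module. The thread does not say how the lookup was actually done, and the domain suffix passed in is whatever domain you expect the visitor to belong to; nothing here asserts what any particular IP resolves to.

```python
import socket


def reverse_dns_confirms(ip, expected_suffix,
                         reverse=socket.gethostbyaddr,
                         forward=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS check.

    True only if ip resolves to a hostname under expected_suffix AND that
    hostname resolves back to ip. The resolver functions are parameters so
    the logic can be exercised without network access.
    """
    try:
        hostname = reverse(ip)[0]  # (hostname, aliases, addresses)
    except OSError:
        return False  # no PTR record or lookup failure
    if hostname != expected_suffix and not hostname.endswith("." + expected_suffix):
        return False
    try:
        addresses = forward(hostname)[2]
    except OSError:
        return False
    return ip in addresses
```

Called with just an IP and a domain, it performs live lookups; a result of false means the claim could not be confirmed, not that it has been disproved.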

The icing on the cake ... drum roll please.
The sites that appeared to be hot-linking were using a data provider for their product information...... I've spoken with them and they swore it wasn't them either, but guess what ......

Now that hot-linking isn't showing when qippo snags an image. Amazing how that worked out.

No it wasn't us.... BS. Qippo may be out of the country, but the company and sites that stole our nice clean images without so much as some lube or a coke. And they're our competitors, but they're too lazy to do their own artwork.. how pathetic. Maybe they should go back to scraping google images and fill their site full of irrelevant pictures again.

Both companies swore it wasn't them. BOTH LIED! Amazingly the "referrer" stopped showing up and that didn't stop until my telephone conversation with the guys at %&**^%! (the product data provider) and two of the websites in question. One of the websites in question was just in court last month for copyright violations... guess they like paying for attorneys instead of images.

Sad that big corporations making millions a year in business feel the need to STEAL images from a family and employee owned business.

Talked to an attorney yesterday afternoon, nice to have a friend that does corporate law.

keyplyr




msg:4525808
 7:26 am on Dec 7, 2012 (gmt 0)

Just an FYI - if you search, there are plenty of examples of how to write anti-hotlinking code in your .htaccess.

I use a variation that switches the hotlinked file with one that advertises my site and embarrasses the hotlinker. Works great for those forum hotlinkers. Example:

I am a low life thief.
I am trying to steal an image owned by:
www.example.com

qippo




msg:4525820
 7:57 am on Dec 7, 2012 (gmt 0)

Bewenched, I still hope that you're doing this because you drew some mistaken conclusions and mixed us up with someone else. But it's getting more and more annoying. If you have proof of what you write, go ahead and take legal action. Of course, on the internet you can write whatever you want. Again, I would like to point out the following facts:

You wrote me a message with a list of websites and IPs asking if it was us. I said NO. I don't know how to give any proof that we are not these guys. If you're not related to someone, there is usually no proof of this. Please prove to me that you're not hired by one of our competitors. You write nonsense mixing us up with some guys I've never heard of, and do it with such big pressure. Pushing.

I asked you what your website is so I could investigate the situation. You refused, giving me much less chance to understand what's going on and what happened at all.

>>now they're coming around without showing their user agent! << >>Now that hot-linking isn't showing when qippo snags an image.<< Oh, really? This is a lie. Or a mistake, but this is NOT TRUE.

To close this conversation, I'd like to make a few statements:

1) Qippo does crawl some electronic stores' pages and images with the aim of including them in our listing. We believe that it is good for stores to be included there.

2) We ALWAYS send our user agent when requesting the page.

3) We listen to robots.txt (and <meta name="robots"> for HTML pages).

4) We will stop crawling your website on your first request, if you don't want to change your robots.txt. WE RESPECT YOUR RIGHT NOT TO BE CRAWLED OR INDEXED.

5) We don't give this data to ANYONE else, and we don't crawl it for any reason other than what's written in (1).

6) We display only thumbnails of images, with direct links to the source of where this image came from.

7) We DO NOT hot-link. Show any image that is hotlinked. How can you use this argument at all if there is not a single hotlinked image on our website?

8) We participate in any conversations and are open to it. We promise to investigate any case openly.


This is what we do, Bewenched. I know how forum conversations usually go and understand that it's always easier to blame someone than to protect yourself. But please, if you write anything else here, be more responsible, because you are not the only one thinking about your rights here. You blame us publicly, saying things that you cannot prove because they're not true. You did it several times. If you want to take legal action over what you wrote above, again, go ahead and do it; we have nothing to fear here. But, again, think a little about whether you may be wrong and may have drawn some wrong conclusions.

Once more, I offer to investigate this situation and send you all details regarding the crawling of your site, if you give me its address.

wilderness




msg:4525881
 12:58 pm on Dec 7, 2012 (gmt 0)

Bewenched,
FWIW, I've thousands of active images.
Until 2000 or 2001, I used "names" for images, until thousands of requests appeared without any viewing of pages (for a popular "0name").

At that time, I began numbering images, and have used that method since.

If you don't know the "fish's" number then you're not able to locate the fish. Neither are the SEs (unless you offer alt and/or name text for your images).

Unless you're a photographer selling images, there's not any benefit to allowing SEs or other websites to have access to your images, despite the added traffic from the images.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
© Webmaster World 1996-2014 all rights reserved