Forum Moderators: DixonJones
2003-12-26 11:28:29 128.111.29.168 GET /index.htm 206 3698 116 www.site.org pixfinder/2.0 -
2003-12-26 11:28:29 128.111.29.168 GET /index.htm 206 3640 125 www.site.org pixfinder/2.0 -
2003-12-26 11:28:29 128.111.29.168 HEAD /image001.jpg 200 248 109 www.site.org LWP::Simple/5.76 -
2003-12-26 11:28:29 128.111.29.168 GET /image001.jpg 206 8465 129 www.site.org libwww-perl/5.76 -
It's the first I've seen it. It took most of the pages on the site, twice, complete with htm pages and associated jpg images. No robots.txt request. It identified itself as pixfinder when it first came in on the index, then just "LWP::Simple/5.76" and "libwww-perl/5.76".
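For anyone wanting to check their own logs for the same pattern, here is a rough Python sketch, not a finished tool. It assumes every line has exactly the eleven space-separated fields seen in the excerpt above, and that the tenth field is the user agent with no spaces in it. The names `parse_line` and `summarise` are made up for this illustration.

```python
from collections import defaultdict

# Sample lines copied from the log excerpt above.
LOG_LINES = [
    "2003-12-26 11:28:29 128.111.29.168 GET /index.htm 206 3698 116 www.site.org pixfinder/2.0 -",
    "2003-12-26 11:28:29 128.111.29.168 GET /index.htm 206 3640 125 www.site.org pixfinder/2.0 -",
    "2003-12-26 11:28:29 128.111.29.168 HEAD /image001.jpg 200 248 109 www.site.org LWP::Simple/5.76 -",
    "2003-12-26 11:28:29 128.111.29.168 GET /image001.jpg 206 8465 129 www.site.org libwww-perl/5.76 -",
]


def parse_line(line):
    """Split one log line into named fields; return None if the shape is unexpected."""
    parts = line.split()
    if len(parts) != 11:  # assumption: eleven fields, as in the excerpt
        return None
    return {
        "ip": parts[2],
        "method": parts[3],
        "path": parts[4],
        "status": parts[5],
        "agent": parts[9],  # assumption: tenth field is the user agent
    }


def summarise(lines):
    """Per IP: the distinct user agents seen and whether /robots.txt was requested."""
    agents = defaultdict(set)
    asked_robots = defaultdict(bool)
    for line in lines:
        rec = parse_line(line)
        if rec is None:
            continue
        agents[rec["ip"]].add(rec["agent"])
        if rec["path"].lower() == "/robots.txt":
            asked_robots[rec["ip"]] = True
    return {
        ip: {"agents": sorted(names), "asked_robots": asked_robots[ip]}
        for ip, names in agents.items()
    }


if __name__ == "__main__":
    for ip, info in summarise(LOG_LINES).items():
        print(ip, info["agents"], "robots.txt requested:", info["asked_robots"])
```

One IP showing several agent names and no robots.txt request, as here, is the kind of visitor worth a closer look.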
A search for pixfinder found this:
Pix Finder, is for quickly finding and downloading pictures off the internet by searching an offline list of links to pictures with ease. Over 50,000 picture links available in sub categories including CD Covers, Textures & Super Models. The Link engine is updated regularly & automatically downloads it from the internet to keep you fully up to date, Multi-Threading environment, i.e. multi tasking, Full HTML Help, Supports Windows 95, 98, ME, NT4 & Windows 2000. + it's all totally FREE.
What is this thing, some sort of downloadable image pirate?
Here's my take on these critters [webmasterworld.com]. They're out there and there are more coming every day.
Not only those that download images, but those that, like one I had yesterday, run helter-skelter thru your work and pilfer at rates exceeding 18 files per second!
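A rate like that is easy to spot once you count hits per IP per second. The Python sketch below is only an illustration. It assumes each log line starts with date, time and client IP separated by spaces, and the function name `busiest_seconds` and the threshold value are my own choices, not anything from a real tool.

```python
from collections import Counter


def busiest_seconds(lines, threshold):
    """Return {(ip, 'date time'): hits} for every IP/second pair above threshold."""
    counts = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) < 3:  # skip blank or malformed lines
            continue
        date, time, ip = parts[0], parts[1], parts[2]
        counts[(ip, date + " " + time)] += 1
    return {key: hits for key, hits in counts.items() if hits > threshold}


if __name__ == "__main__":
    # Hypothetical sample: twenty requests from one address within one second.
    sample = ["2003-12-26 11:28:29 10.0.0.1 GET /img%03d.jpg" % i for i in range(20)]
    print(busiest_seconds(sample, threshold=18))
```

Since the timestamps only have one-second resolution, this counts per clock second rather than over a sliding window, which is crude but enough to catch the helter-skelter kind.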
If it isn't that, try some idiot trying to do formmail queries [google.com] at your expense. Bounced e-mails you never sent are a likely indicator.
Dozens of them. Cowbot/Naverbot [webmasterworld.com] and SiteSuckers [webmasterworld.com] are just two.
Last December I learned that MFC_Tear_Sample [webmasterworld.com] did not mean anything remotely associated with the threads of fabrics being severed by brute force and thusly another ominous thief in the wings. <chuckle> Uh, you see, I was thinking 'tear' as in those salty solutions sometimes emanating from both sides of that nasal entry area. Somehow sadness and aggression never came to me in the same thought.
Ah, perspective sure is a wonderful thing.
Some of 'em have real spiffy names too. P.Arthur [webmasterworld.com] was one. SecretBrowser007 [webmasterworld.com] certainly had that Connery flair to it. :)
Then again you have educational stuff like Crayon Crawler [webmasterworld.com] Ohhhh, now that one sounds really ferocious doesn't it?
Silverbytes asked once, "How to protect from site copiers like teleport?" [webmasterworld.com]
If you ever want to read a good thread on spiders agent names [webmasterworld.com], this is the one.
I don't know how much you know, nor how much you are willing to learn. I do not know nearly half as much as most of the newbies here, but I've had my domain online since the late Nineties and it's just no fun anymore.
Read as much as you can and you'll end up reading more. It goes that way you know....
The horrible bot noted in my post came through again yesterday from the same IP# grabbing jpg's. It eats bandwidth like there's no tomorrow. I'm planning on changing the site to Unix soon... I'll have a few more tools to combat this nonsense.
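To put a number on how much a visitor like that eats, you can total the bytes per IP. This Python sketch rests on a guess: that the seventh field of each log line (3698, 3640, 248, 8465 in the lines I posted) is the number of bytes sent. If your log format differs, change the index. The name `bytes_by_ip` is just for illustration.

```python
from collections import defaultdict


def bytes_by_ip(lines, suffix=""):
    """Sum the assumed bytes-sent field per client IP, optionally only for paths ending in suffix."""
    totals = defaultdict(int)
    for line in lines:
        parts = line.split()
        if len(parts) < 7 or not parts[6].isdigit():
            continue  # skip lines that do not match the assumed layout
        ip, path, sent = parts[2], parts[4], int(parts[6])  # assumption: field 7 = bytes sent
        if path.lower().endswith(suffix):
            totals[ip] += sent
    return dict(totals)


if __name__ == "__main__":
    lines = [
        "2003-12-26 11:28:29 128.111.29.168 GET /index.htm 206 3698 116 www.site.org pixfinder/2.0 -",
        "2003-12-26 11:28:29 128.111.29.168 GET /image001.jpg 206 8465 129 www.site.org libwww-perl/5.76 -",
    ]
    print(bytes_by_ip(lines))            # all requests
    print(bytes_by_ip(lines, ".jpg"))    # image requests only
```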
The bandwidth stolen by site-stealing applications, formmail queries, and hotlinking to my The Wall images, just to name a few.
Seeing your work stolen by a leading collegiate institution in the US.
Having UCE/SPAM sent with your @domain.com forged into the headers and getting al-l-l-l-l-l those e-mail account clogging bounces.
As an owner/operator of an educational/academic type portal, I get so frustrated when a site I've listed feels it is s-o-o-o-o-o important to change the very Title Tags and Descriptive Tags (that I use to annotate the site itself)...and they do it many times. Title Tags are meant to accurately Title the Page being published! Period. That's depressing because if I'm to accurately carry what I'm listing it is imperative the site be Titled using the Title Tags. The end result is a site listed that looks like crap. I Titles 'em like I sees 'em.
Ex: If a site says it's a .com in the Title Tags I damned well better see that same Title when I view source. Particularly when once the site is opened it isn't even a true domain, rather hosted by a free hoster and you got there via some lame re-direct. In other words, if the Title Tags don't match the actual Title of the page, Toi-sinloi (phonetic spelling). You're (figuratively speaking) NOT going to get listed with me.
If I did not have to take so many counter-measures to protect/prevent, it would be fun again.
So, before long, I'll sell the domains and find something else to do for fun.
So, before long, I'll sell the domains and find something else to do for fun.
Hey, Pedanticist, you sure you want to bail out? You'll miss it....
For my one site, when people steal my text, because it's so particular, I see it in the serps; then I contact them and tell them to give me a credit and link, which they usually do. The image-thief-bots really p*ss me off though... they steal my pictures and suck back a lot of bandwidth. My site is hosted using Windows, not Unix, so I can't even ban the IP#. I'm going to change this as soon as I have more time (right now, I'm working on logistics for an upcoming expedition).