Forum Moderators: DixonJones
The site is a placeholder with an email addy and a silly disclaimer.
GoDaddy WHOIS shows the admin contact is a dude from the 2wrongs site - Cyveillance apparently acquired 2wrong in August of 2001:
[dc.internet.com...]
So is Dumbot really another Cyveillance effort to mine sites, now that their usual IP blocks are on all the ban lists?
I have an exclude list, so if any of you want me to remove your sites, please just send me a list of hosts you don't want spidered. If anyone has any questions, feel free to email me at info@dumbfind.com.
I think you are getting backlash because any legitimate bot should follow the robots.txt exclusion protocol - if it doesn't, well, it's just dumb.
I put up a little info page that the user-agent now points to:
[dumbfind.com...]
The Contractor,
please let me know what your web address is so I can track down the issue. It may just be a caching thing: since my spider doesn't visit very often, it doesn't grab robots.txt very often either. I am probably going to purge the entire cache today so that the angry hordes of hatemongers congregating at this site don't come and kill me.
The reason I come from different IPs is that I have 7 DSL lines running into my house. Which is where I work. For myself. Alone. Why 7 DSL lines, you ask? Because it is the cheapest source of bandwidth available to me. Six of the lines are Verizon lines with dynamic IPs; one is a Covad line with a static IP. Again, if I wanted to hide, I WOULDN'T PUT MY WEBSITE ADDRESS IN THE USER-AGENT STRING.
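One common way to keep a rarely visiting crawler from acting on a stale robots.txt is to record when each host's copy was fetched and to re-fetch it before crawling whenever that copy is older than a fixed maximum age. The sketch below is illustrative only: `RobotsCache`, the `fetch` callback and the 24-hour default are assumptions, not Dumbot's actual code (RFC 9309, published long after this thread, recommends not using a cached robots.txt for more than 24 hours).

```python
import time
from typing import Callable, Dict, Tuple


class RobotsCache:
    """Per-host robots.txt cache that re-fetches a copy once it is too old.

    Hypothetical sketch: `fetch(host)` stands in for whatever HTTP code the
    crawler uses to download http://<host>/robots.txt.
    """

    def __init__(self, fetch: Callable[[str], str],
                 max_age: float = 24 * 60 * 60,
                 clock: Callable[[], float] = time.time) -> None:
        self._fetch = fetch
        self._max_age = max_age      # seconds a cached copy stays valid
        self._clock = clock          # injectable so expiry can be tested
        self._entries: Dict[str, Tuple[float, str]] = {}  # host -> (fetched_at, body)

    def get(self, host: str) -> str:
        """Return robots.txt for `host`, downloading it again if the copy is stale."""
        now = self._clock()
        cached = self._entries.get(host)
        if cached is not None and now - cached[0] <= self._max_age:
            return cached[1]
        body = self._fetch(host)
        self._entries[host] = (now, body)
        return body

    def purge(self) -> None:
        """Drop every cached copy, e.g. before a full re-crawl."""
        self._entries.clear()
```

Because expiry depends on the age of the cached copy rather than on how often the host is visited, a `Disallow` line added by a webmaster is picked up on the first visit after `max_age` has passed; `purge()` corresponds to purging the entire cache.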
<!-- saved from url=(0040)http://www.donkeycake.com/gunk/dumbfind/ -->
Also, what's all that about the mailto: being hidden behind the "Hey!" on your front page instead of the "email us"?
However, I do agree with points 1,2,5,6,7,8 (although not 3 and 4) of your manifesto :)
>The Contractor,
>please let me know what your web address is so I can track down the issue.
Sorry, not trying to be an idiot, but I am not giving out any website addresses. You need to run your bot against your own sites and see why it does not adhere to robots.txt - that should be a simple issue if you are the developer.
You never did say which of my two examples is correct for robots.txt. I have tried one or the other on multiple sites and neither seems to work. I am not against anyone building a bot, but it needs to adhere to robots.txt to be legit, for the same reason you have DSL: why should I burn up gigabytes of bandwidth, and pay for it, for bots I don't want (no offense)?
Until you give the robots.txt syntax, get your bot to adhere to it, and explain it on your site to those who don't have a clue how to block it, you will never be legit IMHO.
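For reference, the exclusion protocol expresses "keep this robot out" as a record: one or more `User-agent` lines naming the robot, followed by `Disallow` lines. The minimal check below uses Python's standard `urllib.robotparser` to confirm that such a file shuts out a crawler calling itself Dumbot while other robots fall under the catch-all `*` record. The `Dumbot` name token and the example.com URLs are assumptions; The Contractor's own two examples are not quoted in this thread.

```python
from urllib.robotparser import RobotFileParser

# A record that shuts Dumbot out entirely, followed by the catch-all record
# every other robot falls back to. "Dumbot" as the name token is an
# assumption; use whatever token the bot's user-agent string actually carries.
ROBOTS_TXT = """\
User-agent: Dumbot
Disallow: /

User-agent: *
Disallow: /cgi-bin/
"""


def check(agent: str, url: str) -> bool:
    """Return True if ROBOTS_TXT allows `agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for agent in ("Dumbot", "Googlebot"):
        for url in ("http://www.example.com/index.html",
                    "http://www.example.com/cgi-bin/search"):
            print(agent, url, check(agent, url))
```

`urllib.robotparser` uses the first record whose `User-agent` value appears, case-insensitively, in the crawler's name (the part before any `/`), and falls back to the `*` record otherwise.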
Spotting a new 'bot in the server logs triggers anxiety in some Webmasters because of the massive abuse prevalent on the 'net today. This anxiety is further heightened if a new robot has a bug that causes it to misinterpret robots.txt, or even when the robots.txt file itself is to blame. Anything that makes it more difficult for a Webmaster to find out the intent of the new 'bot only exacerbates this anxiety, until in the end some Webmasters adopt an attitude of "If I haven't heard of you, stay off my site." This is unfortunate... I keep thinking of the first time I saw "Googlebot" in my logs and thought, "What is that? It's sure an unappealing name..."
Can we please give Dumbfounder a break here, and treat him with the respect that every WebmasterWorld member is entitled to? Personal attacks are a violation of the WebmasterWorld terms of service, BTW.
Dumbfounder, please take a look at the pages Google has put up for Webmasters. Note the depth of their explanations and reassurances. They are completely forthright about their robot and their quality policies. Google didn't do this just for fun; this totally open approach is what is needed today to allay the fears you see expressed in this thread. Designing a robot and a ranking algorithm is one thing; managing "Webmaster relations" is something else -- and a subject not covered in the technical manuals or in CS classes.
Anyway, let's all be polite here, and not rush to judgement. Each Webmaster may decide what to do with each user-agent that visits his/her site, but I'd argue that such judgements should be made rationally based on observed behaviour, and not on fear and unreasonable doubt. Business decisions should be made calmly and coolly.
Jim
If you want to be taken seriously and are working on a project indexing sites, the more info you can give on who you are and why you are crawling, the better. There are enough site scrapers and downloaders out there that if someone doesn't say what they are doing and who they are, I block them instantly (or at least I try to, with robots.txt and .htaccess).
Either is fine. I do a case-insensitive search for "dumbot" anywhere on any line that starts with "user-agent" (again, case-insensitive). Then any disallows found after that, up until the next user-agent line, are attributed to Dumbot. Other than the pattern match, it is the same exact code I use for the user-agent "*". You are the only person ever to complain that I have not adhered to robots.txt, and I have been doing this for 8 months now, which leads me to think that the problem is not a large one. The absolute best way to track down the problem is to use real examples from real websites; I cannot account for the infinite possibilities found out on the web. I think you are doing a disservice to others by refusing to give out your web address. You can email it to me if you wish: info@dumbfind.com. Or you can call me at 703-683-3077, or you can drive to my house and hand me a letter (just do a whois on dumbfind.com).
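The matching rule described above can be sketched as follows. One caveat, offered as an assumption about a possible cause rather than a diagnosis: the exclusion protocol lets several `User-agent` lines share one block of `Disallow` lines, so stopping at "the next user-agent line" would silently drop the rules from a record such as `User-agent: Dumbot` / `User-agent: SomeOtherBot` / `Disallow: /`. The hypothetical `disallowed_paths` function keeps the case-insensitive substring match but treats consecutive `User-agent` lines as one record and prefers a matching named record over `*`; `SomeOtherBot` and `is_allowed` are likewise illustrative names.

```python
def disallowed_paths(robots_txt: str, token: str = "dumbot") -> list[str]:
    """Return the Disallow prefixes that apply to a crawler whose name contains `token`.

    Hypothetical sketch: a record applies when any of its User-agent values
    contains `token` (case-insensitive); if no record names the crawler,
    the "*" record applies instead. Allow lines are not handled.
    """
    token = token.lower()
    named: list[str] = []
    wildcard: list[str] = []
    named_record_seen = False
    agents: list[str] = []   # User-agent values of the record being read
    in_rules = False         # True once the current record has a non-User-agent line

    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()   # drop comments and padding
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()

        if field == "user-agent":
            if in_rules:             # a User-agent after rules starts a new record
                agents, in_rules = [], False
            agents.append(value.lower())
            continue

        in_rules = True
        is_named = any(token in agent for agent in agents)
        if is_named:
            named_record_seen = True
        if field != "disallow" or not value:
            continue                 # an empty Disallow restricts nothing
        if is_named:
            named.append(value)
        elif "*" in agents:
            wildcard.append(value)

    return named if named_record_seen else wildcard


def is_allowed(path: str, disallowed: list[str]) -> bool:
    """True unless `path` starts with one of the disallowed prefixes."""
    return not any(path.startswith(prefix) for prefix in disallowed)
```

For the grouped record in the caveat, followed by a `User-agent: *` record, this sketch returns `["/"]` for Dumbot; a parser that stops at the second `User-agent` line would attribute no `Disallow` lines to it at all.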
Thanks, jd. How about if I put a link to Google's bot information page at the end of mine? I would feel better about that than simply plagiarizing their content.
>I'm not accusing you of trying to hide dumbfounder - just of not being very professional.
OK, phew. I am definitely not professional, as evidenced by the fact that no one is paying me to do this.
><!-- saved from url=(0040)http://www.donkeycake.com/gunk/dumbfind/ -->
mmmmmmmmm.....donkey cake....
>Also, what's all that about the mailto: being hidden behind the "Hey!" on your frontpage instead of the "email us"?
My cousin designed the page. I try not to constrain his artistic nature, so I left it as-is. I will make it bigger just to appease you, though.
>However, I do agree with points 1,2,5,6,7,8 (although not 3 and 4) of your manifesto :)
Does that mean that if you think the developers are indeed funny, you won't support their idea? Every time you say that, a code-writing clown dies somewhere.