Dom2Dom


Pfui

4:17 pm on Jul 21, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Speaking of OVH (see "C4PC") --

ns30179*.ovh.net
Dom2Dom/0.1.0.0

robots.txt? NO
Ref? Fake

Two hits to / almost an hour apart. Fake ref goes to Florent Clairambault's site w/ project info: "The program just goes from links to links to find new domain names. ..."

Am always amazed that anyone thinks they're entitled to do anything on any site.
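Florent's project page describes the crawl as going "from links to links to find new domain names." A minimal sketch of that discovery step, assuming the page HTML has already been downloaded (fetching, and any robots.txt check, are deliberately left out), could collect the host name of every link with Python's standard html.parser and urllib.parse. The names HostCollector and discover_hosts are hypothetical, not taken from Dom2Dom.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class HostCollector(HTMLParser):
    """Collects the host names of every <a href="..."> on one page (hypothetical helper)."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.hosts: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL, then keep only the host.
                # Links without a network location (e.g. mailto:) yield None and are skipped.
                host = urlsplit(urljoin(self.base_url, value)).hostname
                if host:
                    self.hosts.add(host)


def discover_hosts(base_url: str, html: str) -> set[str]:
    """Return the set of (lowercased) host names linked from one HTML page."""
    collector = HostCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.hosts
```

Each newly seen host would then be queued for its own crawl, which is how a link-following bot ends up visiting sites that never invited it.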

superfc

8:04 pm on Jul 21, 2010 (gmt 0)

10+ Year Member



Hi Pfui,

Dom2Dom is a program I made. I set the referer ( [florent.clairambault.fr...] ) to explain to people that this program doesn't save any kind of personal data (if you agree that a server IP address isn't personal).

You're right about the robots.txt thing but as it doesn't store any personal information, it doesn't seem very necessary and it would considerably slow down the exploration process.

But well, you're right. Still, this is an experimental project to offer a free service. So you shouldn't only focus on the negative side. By the way, the service has already linked "www.webmasterworld.com" to "www.pubcon.com".

Best regards,

Florent

keyplyr

9:27 pm on Jul 21, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



@ superfc

Thanks for responding.

You're right about the robots.txt thing but as it doesn't store any personal information, it doesn't seem very necessary

I think you're missing the point regarding the robots.txt standard. It is there to tell robots how to behave if they want to access our servers, not for the robot to decide whether it's going to abide by it or not.

Banned by UA and IP until robots.txt is supported.

superfc

10:06 pm on Jul 21, 2010 (gmt 0)

10+ Year Member



Hi Keyplyr,

Well, banning the server & program seems like a fair move.

I understand that I need to test the /robots.txt file first and believe me I will do it. But it's a step-by-step process. I'm trying to have a working service as quickly as possible. Once I have something to show, I'll fix the little problems I created. Correctly handling the /robots.txt file could take me quite some time.

Best regards,

Florent
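Florent's two concerns were that honoring robots.txt "would considerably slow down the exploration process" and that handling it "could take me quite some time." A minimal sketch, assuming Python's standard urllib.robotparser, suggests neither has to be true: fetch robots.txt once per host, keep the parsed rules in memory, and check every URL against them, so the cost is one extra request per site rather than per page. The RobotsCache class and the fetch_robots callable are hypothetical; fetch_robots stands in for whatever HTTP client the crawler already uses.

```python
from typing import Callable
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


class RobotsCache:
    """Parses robots.txt once per scheme+host and answers allow/deny checks from memory."""

    def __init__(self, user_agent: str, fetch_robots: Callable[[str], str]) -> None:
        self.user_agent = user_agent
        # Hypothetical hook: given "https://host", return the robots.txt body ("" if none).
        self.fetch_robots = fetch_robots
        self._parsers: dict[str, RobotFileParser] = {}

    def allowed(self, url: str) -> bool:
        parts = urlsplit(url)
        base = f"{parts.scheme}://{parts.netloc}"
        parser = self._parsers.get(base)
        if parser is None:
            # First URL seen on this host: fetch and parse its robots.txt exactly once.
            parser = RobotFileParser()
            parser.parse(self.fetch_robots(base).splitlines())
            self._parsers[base] = parser
        return parser.can_fetch(self.user_agent, url)
```

A production crawler would also need cache expiry and a policy for robots.txt requests that fail or return errors; those are left out of this sketch.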

jmccormac

5:10 pm on Jul 28, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



It is a very inefficient way of building lists of domains hosted on the same IPs. The only benefit would be the ccTLD domains it picks up, but there are easier ways of doing a full survey of the gTLDs.

Regards...jmcc
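The lists jmcc describes, domains hosted on the same IPs, amount to a reverse index from IP address to domain names. Assuming each discovered domain has already been resolved to a single address (for example with socket.gethostbyname; real domains can have several), the grouping itself is a few lines. The function name and the sample addresses are illustrative only; the addresses come from documentation-reserved ranges.

```python
from collections import defaultdict
from typing import Iterable


def domains_by_ip(resolved: Iterable[tuple[str, str]]) -> dict[str, set[str]]:
    """Group (domain, ip) pairs into {ip: {domains resolving to that ip}}."""
    groups: dict[str, set[str]] = defaultdict(set)
    for domain, ip in resolved:
        # Normalize case and any trailing root dot so duplicates collapse.
        groups[ip].add(domain.lower().rstrip("."))
    return dict(groups)


pairs = [
    ("www.example.com", "192.0.2.10"),
    ("WWW.Example.org.", "192.0.2.10"),
    ("www.example.net", "198.51.100.7"),
]
print(domains_by_ip(pairs))
```

Where the (domain, ip) pairs come from is the part jmcc is questioning: crawling is one source, and he suggests there are easier ones for the gTLDs.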

superfc

5:16 pm on Jul 28, 2010 (gmt 0)

10+ Year Member



Hi jmccormac,

What are these ways?

Best regards,

Florent

topr8

11:52 pm on Jul 28, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



@ superfc

Thanks for dropping by and explaining your project here!

Of course many people here, including myself, will try to ban all bots that don't obey robots.txt. Obviously it's not personal; it's just that there are 1000s of bots out there sucking up our bandwidth and processing power.

jmccormac

9:41 am on Jul 30, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



@superfc
Replied by pm.

Regards...jmcc

tangor

2:38 am on Jul 31, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I understand that I need to test the /robots.txt file first and believe me I will do it


Get that part of robots.txt right and we're good to go. Honor the below FIRST and then give us a reason to allow your bot:

# Disallow all others
User-agent: *
Disallow: /


Otherwise, expect bans (403s) for failing the above.
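tangor's catch-all group is easy to check mechanically. Below, Python's standard urllib.robotparser parses those lines, preceded by an illustrative Googlebot group (not part of tangor's post; it stands in for whichever bots a site chooses to allow, which is what "Disallow all others" implies). A crawler identifying as Dom2Dom that actually runs this check is refused every URL.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

# Disallow all others
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for agent in ("Googlebot", "Dom2Dom"):
    for url in ("http://example.com/", "http://example.com/index.html"):
        # An empty Disallow allows everything; "Disallow: /" blocks the whole site.
        print(agent, url, parser.can_fetch(agent, url))
```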

Pfui

8:21 am on Aug 21, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Hard not to notice the negative when there's nothing else...

Now running from one of Florent's domains and with a new, curiously tortuous string. Alas, still no robots.txt. Still undeterred by 403s. Still log-spamming.

webingenia.com
Dom2Dom/0.1.3870.38298_2010-08-06_21:16:36

08/21 00:03:15
08/21 00:07:11
08/21 00:18:04
08/21 00:25:33

robots.txt? NO
Ref? Fake. (Another of the bot-runner's sites/bot info page.)

superfc

9:27 am on Aug 21, 2010 (gmt 0)

10+ Year Member



Hi Pfui, the "else" is that it will be a service open to everyone. If you want, I can privately give you access to the service.

Pfui

2:36 am on Aug 22, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



@superfc (a.k.a. Florent): Thank you for your offer but no thank you. All I'd like -- and echoing prior remarks -- is for you to at least craft your robot in accordance with the long-held "robots.txt" standard, please.

Beyond that, your bot embodies a bunch of Bad Bot Behaviors. As of this moment, Dom2Dom:

- crawls solely for your benefit;
- uses an atypical UA string;
- does not read/heed robots.txt, and
- log-spams.

Florent, you come across as a reasonable, pleasant fellow, so I'm baffled as to why your bot is increasingly, persistently egregious. If you would at least code Dom2Dom to respect the "robots.txt" standard, you would save your resources -- and stop wasting ours. Merci.

Pfui

2:37 am on Aug 29, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



New string. Same old bad bot:

webingenia.com
Dom2Dom/0.1.3887.14719_2010-08-23_08:10:38

robots.txt? NO
Log spam? YES

FWIW, even a UK-based university student's recent "short term experiment" bot thankfully complied with robots.txt (& bot string structure). See: PSS-Bot [webmasterworld.com...]

superfc

10:09 pm on Sep 2, 2010 (gmt 0)

10+ Year Member



Hi Pfui,

I found the time to modify the program so that it no longer sends the referrer and uses a new User-Agent ("Dom2Dom/<version> (<aboutLink>)").
I'll add robots.txt support soon.

And the website should open someday, but I'm a little frustrated by the current results.

Everything will be fixed soon.

You're right, even a student doing a short-term experiment could make a better program. The difference is that a student has a lot more free time.

Anyway, thank you for your feedback.
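The new User-Agent format, "Dom2Dom/<version> (<aboutLink>)", keeps a stable product token in front of the slash. Python's urllib.robotparser matches a robots.txt User-agent line against the part of the crawler's User-Agent string before the first "/", so a single "User-agent: Dom2Dom" group covers every version. The version below is the one from the string Pfui logged; the about link is a placeholder, since the real URL isn't given in the thread, and build_user_agent is a hypothetical helper.

```python
from urllib.robotparser import RobotFileParser


def build_user_agent(version: str, about_link: str) -> str:
    # Format described in the post: "Dom2Dom/<version> (<aboutLink>)"
    return f"Dom2Dom/{version} ({about_link})"


ua = build_user_agent("0.1.3887.14719", "http://example.com/dom2dom-about")  # placeholder link

rules = RobotFileParser()
rules.parse(["User-agent: Dom2Dom", "Disallow: /"])

print(ua)
# The full UA string is passed in; robotparser compares only "dom2dom" against the group.
print(rules.can_fetch(ua, "http://example.com/page.html"))
```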

keyplyr

11:10 pm on Sep 2, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I love this forum - LOL