freedir, tags2dir

         

dstiles

9:29 pm on Feb 14, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



This was originally going to be a request for opinions on these two "search engines". As I wrote and investigated, it morphed into the following:

I originally allowed scans by freedir.co.uk, although it's owned by a South African company, OrderWeb Software, despite being a UK domain. It recently got itself blocked, and on looking for a reason I discovered it was on an IP adjacent to tags2dir, which I had blocked as a single IP but more recently blocked as a whole server farm: 206.196.96.0 - 206.196.127.255 (InLink Communications Company).
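For anyone blocking by range rather than by single address: the farm quoted above, 206.196.96.0 - 206.196.127.255, collapses to exactly one CIDR block, 206.196.96.0/19. Below is a minimal Python sketch using the standard `ipaddress` module to derive that block and test an address against it. The sample address 206.196.111.10 is hypothetical; it is only there to show an address inside the range.

```python
import ipaddress

# First and last addresses of the InLink Communications range quoted in the thread.
FIRST = ipaddress.IPv4Address("206.196.96.0")
LAST = ipaddress.IPv4Address("206.196.127.255")

# summarize_address_range() turns a first/last pair into the minimal list of CIDR blocks.
BLOCKED = list(ipaddress.summarize_address_range(FIRST, LAST))


def is_blocked(ip: str) -> bool:
    """True if `ip` falls inside any of the blocked networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BLOCKED)


if __name__ == "__main__":
    print([str(net) for net in BLOCKED])   # ['206.196.96.0/19']
    print(is_blocked("206.196.111.10"))    # True (hypothetical address in the range)
    print(is_blocked("206.196.128.1"))     # False (just past the end of the range)
```

The single /19 is the form most server and firewall deny rules accept, which is shorter and less error-prone than listing 32 separate /24s.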

Freedir comes back every day or so looking for sites, following the general pattern:

example.com
www.example.com

It tries the default page of 16 domains on the same server, in both forms, making a total of 32 hits in as many seconds.
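This kind of sweep (the default page of every domain on one server, bare and www forms, one after the other) is easy to spot in access logs. A minimal Python sketch, assuming log lines have already been parsed into a hypothetical `Hit` record with client IP, timestamp, Host header and path. The 60-second window and 16-site threshold are illustrative values loosely based on the figures above, not anything freedir is known to use.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Hit(NamedTuple):
    """One parsed access-log entry (hypothetical format)."""
    ip: str      # client address
    ts: float    # request time, seconds since the epoch
    host: str    # Host header, e.g. "www.example.com"
    path: str    # requested path, e.g. "/"


def site_key(host: str) -> str:
    """Fold example.com and www.example.com into one site name."""
    host = host.lower()
    return host[4:] if host.startswith("www.") else host


def flag_host_sweeps(hits: Iterable[Hit], window: float = 60.0,
                     min_sites: int = 16) -> set:
    """Return client IPs that fetched the default page ("/") of at least
    `min_sites` distinct sites within any `window`-second span."""
    by_ip = defaultdict(list)
    for h in hits:
        if h.path == "/":
            by_ip[h.ip].append(h)

    flagged = set()
    for ip, seq in by_ip.items():
        seq.sort(key=lambda x: x.ts)
        start = 0
        for end in range(len(seq)):
            # Move the left edge so seq[start..end] spans at most `window` seconds.
            while seq[end].ts - seq[start].ts > window:
                start += 1
            sites = {site_key(h.host) for h in seq[start:end + 1]}
            if len(sites) >= min_sites:
                flagged.add(ip)
                break
    return flagged
```

Feeding it 32 one-second-apart hits from a single address, covering `site0.example` / `www.site0.example` through `site15.example` / `www.site15.example` (hypothetical names), flags that address; an ordinary visitor fetching the home page of one or two sites is not flagged.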

If that's all it's going to do, I'm not worried, provided it is a useful service; in fact, I'd originally passed the crawler as OK.

Looking at the site now, my feeling is that it is no longer a desirable one (if it ever was). There is no obvious table of contents; in fact, there is nothing at all apart from lists of URLs. I can't determine its reason for existing, but I wonder if it's corporate scraping and analysis.

I can no longer recall why I blocked tags2dir, but an entry in this forum suggests it was scraping, and that was probably my own experience. It belongs to the same company as freedir: OrderWeb Software of South Africa.

freedir.co.uk and tags2dir.com are within half a dozen IPs of each other at 206.196.111.nnn (InLink Communications).

Does anyone have anything to add to this that will either endorse or contradict my conclusions?

wilderness

3:00 am on Feb 15, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



dstiles,
I have this InLink Communications range:
206.196.96.0 - 206.196.127.255
Denied since 2004, after nine successive requests for the same page, each using the requested page as the referrer. ALL within THREE SECONDS.
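That trigger (the same page requested over and over within seconds, with the page itself as the referrer) can be checked mechanically. A minimal Python sketch, assuming parsed log entries in a hypothetical `Request` record; the defaults of nine hits in three seconds simply mirror the figures in this post and are not a recommended setting.

```python
from collections import defaultdict, deque
from typing import Iterable, NamedTuple


class Request(NamedTuple):
    """One parsed access-log entry (hypothetical format)."""
    ip: str        # client address
    ts: float      # request time, seconds since the epoch
    url: str       # full requested URL
    referer: str   # Referer header, "" if absent


def flag_self_referring_bursts(reqs: Iterable[Request], max_hits: int = 9,
                               window: float = 3.0) -> set:
    """Return IPs that requested the same URL, with that same URL as the
    referrer, at least `max_hits` times within `window` seconds."""
    recent = defaultdict(deque)  # (ip, url) -> timestamps still inside the window
    flagged = set()
    for r in sorted(reqs, key=lambda x: x.ts):
        if r.referer != r.url:
            continue  # only self-referred requests count toward a burst
        times = recent[(r.ip, r.url)]
        times.append(r.ts)
        # Drop timestamps more than `window` seconds older than this request.
        while r.ts - times[0] > window:
            times.popleft()
        if len(times) >= max_hits:
            flagged.add(r.ip)
    return flagged
```

Keying the sliding window on (IP, URL) means a real visitor reloading a page a few times stays well under the threshold, while a client hammering one URL with a self-referrer trips it within a single burst.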

In April 2007, a colo with the IP 206.196.111.201, under the umbrella of metatagsdir.

There are hordes of these directories, and I don't see any actual benefit in allowing any of them.
They simply detract from the SERPs for actual web pages.

Of course, I feel the same way about Wiki and all its sub-orgs that allow users to create or duplicate pages, keeping traffic within the Wiki structure when an actual organizational website already existed.
It simply draws SERPs and visitors away from their intended destination.

Don

dstiles

3:52 am on Feb 15, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Thanks, Wilderness.

I've had that range blocked for a while now; I can't recall the date, but it's been a few months. As I said, I originally allowed freedir through, but I think it might have come from another IP range then. I can't be sure without checking back through several months of logs.

The tags2dir website seems to be turned off now, by the way. I'm not sure if tags2dir was part of this; I began my posting by including it, but I couldn't find any real evidence about it.

I wonder if OrderWeb Software is scraping the net, occasionally changing the directory's name/domain and then dumping the old ones for some reason. I can't imagine why, as very few people would naturally block bots; we in this forum are amongst a very few! :(

wilderness

4:09 am on Feb 15, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



we in this forum are amongst a very few

It's surprising how many webmasters are either unaware of the location of their visitor logs, or even the mere existence of visitor logs ;)

incrediBILL

8:05 am on Feb 15, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Most webmasters simply don't know any better; it's a total lack of knowledge.

The serious online businesses may care about this stuff, but Joe Schmoe with his little hobby site, blog or photo gallery really doesn't care, as it's just a hobby to him and not worth the effort.

Even if they knew where the log files were, most wouldn't know what they were looking at anyway.

Even if they did know what they were looking at, most wouldn't have the skills or tools to stop the problem.

Even if they knew what they were looking at and had the skills and tools to solve many of the problems, they probably wouldn't keep an eye on how the things they blocked were being used and repurposed over time, and their blocks could, in the long term, cause more harm than good.

For the most part, it's probably best the majority are blissfully ignorant of the situation and ignore it for their own safety.

[edited by: incrediBILL at 8:09 am (utc) on Feb. 15, 2009]

dstiles

10:00 pm on Feb 15, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



On the other hand, it would help the cause of scum-eradication if server owners did something about it. :)

rrun

7:53 pm on Mar 28, 2009 (gmt 0)

10+ Year Member



I am new to this site, and a relative newbie as a webmaster.

I have reviewed the forums, but I am wondering if there is a good list of known bad IPs. I sometimes get some really weird hits, especially from China, and also some others that I am not sure are search engines or not.

Thanks

wilderness

10:14 pm on Mar 28, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



There are hordes of lists; however, none are recent or kept up to date.

Each webmaster has different goals for their content and must decide on their own which bots, regions and providers are either beneficial or detrimental to their own site(s).

You may begin with these threads:
[webmasterworld.com...]
[webmasterworld.com...]
[webmasterworld.com...]

wilderness

10:18 pm on Mar 28, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



BTW, as is the tradition here, Welcome to Webmaster World.

rrun

10:49 pm on Mar 28, 2009 (gmt 0)

10+ Year Member



Thanks for the resource links! I'm on it!

[edited by: tedster at 12:13 am (utc) on Mar. 29, 2009]
[edit reason] no personal urls, please [/edit]