

roll-your-own dupe filter? for testing

Is there an app out there that measures page similarity?

httpwebwitch

8:43 pm on Mar 13, 2005 (gmt 0)

Like most webmasters, I'd like to stay on the happy side of the dupe (duplicate-content) filter.

What I'd like is something I can use during development to spider my site and generate a similarity report. I have similar tools for checking broken links and reporting anchor-text reputation, but nothing that measures how similar my pages are to one another.
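The spidering half of such a tool is small. Here is a minimal sketch in Python using only the standard library: a breadth-first crawl that stays on the starting host and returns each page's raw HTML for later comparison. The names `crawl_site` and `fetch_url` are hypothetical, the UTF-8 decoding is an assumption, and there is no content-type check, robots.txt handling, or rate limiting, so treat it as a development-time toy for your own site.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects the href values of <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_url(url):
    """Hypothetical default fetcher: download a page, assuming UTF-8 text."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl_site(start_url, fetch=fetch_url, max_pages=500):
    """Breadth-first crawl restricted to start_url's host.

    Returns a dict mapping each fetched URL to its raw HTML.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except (OSError, ValueError):
            continue  # skip pages that fail to load (URLError is an OSError)
        pages[url] = html
        parser = LinkCollector()
        parser.feed(html)
        parser.close()
        for href in parser.links:
            # Resolve relative links and drop #fragments so /a and /a#top match.
            absolute, _ = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return pages
```

Passing a different `fetch` function makes it easy to test against canned pages, or to read from a local build directory instead of a live server.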

Any suggestions?

httpwebwitch

7:00 pm on Mar 14, 2005 (gmt 0)

I assume from the lack of responses that there is no such thing...

PizdusInc

1:29 am on Mar 15, 2005 (gmt 0)

Not sure if it's what you're looking for, but HTMLDiff (HTMLDiff.com) is a program that compares HTML files for similarities and differences.

httpwebwitch

6:35 am on Mar 15, 2005 (gmt 0)

Hmm... well, not really.
That tool compares files in two folders and shows the differences, more like version control.

I'm looking for a web spider that generates a page-to-page similarity matrix, both before and after parsing out the HTML tags.
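A similarity matrix like that can be sketched in Python with word shingling and Jaccard similarity, a common near-duplicate measure. This is a stand-in technique, not Google's filter (whose details aren't public). The function names and the shingle size `k=4` are arbitrary, hypothetical choices. With `strip=True` the tags (and `<script>`/`<style>` contents) are removed first; with `strip=False` the raw markup is tokenized too, so a shared template counts toward similarity.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Keeps visible text, dropping tags and <script>/<style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def strip_tags(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def shingles(text, k=4):
    """Set of k-word shingles (overlapping runs of k lowercase words)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """|A & B| / |A | B|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similarity_matrix(pages, k=4, strip=True):
    """pages: dict of name -> HTML. Returns (sorted names, matrix of scores)."""
    names = sorted(pages)
    sets = {}
    for name in names:
        source = strip_tags(pages[name]) if strip else pages[name]
        sets[name] = shingles(source, k)
    matrix = [[jaccard(sets[a], sets[b]) for b in names] for a in names]
    return names, matrix
```

Running it twice on the same pages, once with `strip=False` and once with `strip=True`, shows how much of the apparent similarity comes from the shared template rather than from the content itself.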

In essence, it would approximate the Google dupe filter and alert the webmaster (WM) if any pages don't have enough unique content to be indexed separately.
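For the alert step, one rough approach is to compare the visible text of every pair of pages with Python's `difflib.SequenceMatcher` and flag pairs above a cutoff. The function name `near_duplicates` and the 0.8 threshold are hypothetical choices, not known values from any search engine, and the input is assumed to be text with the HTML tags already stripped.

```python
import difflib
import re
from itertools import combinations


def near_duplicates(texts, threshold=0.8):
    """texts: dict of page URL -> visible text (tags already stripped).

    Returns (url_a, url_b, ratio) for every pair whose word sequences are
    at least `threshold` similar, most similar first. The threshold is an
    arbitrary assumption, not a search engine's real cutoff.
    """
    words = {url: re.findall(r"\w+", text.lower()) for url, text in texts.items()}
    flagged = []
    for a, b in combinations(sorted(words), 2):
        # autojunk=False stops difflib from ignoring very common words
        # in long pages, which would skew the ratio.
        matcher = difflib.SequenceMatcher(None, words[a], words[b], autojunk=False)
        ratio = matcher.ratio()
        if ratio >= threshold:
            flagged.append((a, b, ratio))
    flagged.sort(key=lambda item: item[2], reverse=True)
    return flagged
```

Pairwise comparison grows with the square of the page count, which is fine for a small site during development but would need something smarter (such as hashing shingles) for thousands of pages.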

PizdusInc

1:23 pm on Mar 15, 2005 (gmt 0)

Hmm, I see what you're saying... I haven't seen anything like that, so you'll probably have to code it yourself, unless someone else in the forum can help out.

Sorry...

httpwebwitch

3:49 pm on Mar 15, 2005 (gmt 0)

No, don't be sorry, it's all right. I'll make my own if the effort seems worthwhile.

Hollywood

4:28 pm on Mar 15, 2005 (gmt 0)

httpwebwitch

If you get it done, let me know; I think it would be well worth the time.

;0)

PizdusInc

4:35 pm on Mar 15, 2005 (gmt 0)

It would definitely be a very useful tool to have... If you end up coding it yourself, are you planning on making it available on the web?

httpwebwitch

9:18 pm on Mar 15, 2005 (gmt 0)

for a "modest" fee... mmmuaaahahaha ha ha (cough cough)