grnidone

msg:19411 | 7:52 pm on Jul 6, 2000 (gmt 0) |
I wish!! I have tried to kill a page from AV for almost a month, and while the spider has visited, it will not remove the 404 "file not found" page from the index. -G
|
seth_wilde

msg:19412 | 9:53 pm on Jul 6, 2000 (gmt 0) |
Did the pages have at least 75% of the same content? I haven't experienced this myself, but it reminded me of some pretty interesting info that I've been reading about duplicate detection in vector databases.
|
engine

msg:19413 | 7:36 am on Jul 7, 2000 (gmt 0) |
Yes, the pages are meant to have the same content. I'm running an experiment on this trying to get the pages out. Bait laid, Scooter came and I'm watching closely.
|
pete

msg:19414 | 8:07 am on Jul 7, 2000 (gmt 0) |
Seth, I have been whacked a number of times regarding pages with similar content (could be around the 75% mark). These pages obtained fantastic positions, stuck around for approximately 15 days and then boom - gone! Duplicate elimination tech - possibly. It would be good to know what AV's criteria are for treating pages as sufficiently dissimilar. The REAL gold for me would be to define the exact specs for Inktomi's doorway eliminator. BUT, this would be suitable for another discussion on another day.
|
JamesR

msg:19415 | 4:24 pm on Jul 7, 2000 (gmt 0) |
Pete, there are several possibilities for duplicate elimination that at least Google, if not AltaVista and Inktomi, can employ. They have some pretty slick methods of finding mirrors and duplicate pages, including the 75% thing Seth was talking about. Included are IP number matching, URL string similarities, similar link structure between two sites, and content matching. You must have unique content to have a chance at staying in these databases. Although I have seen dupes slip by, it is the exception rather than the rule, and one spam notification by a competitor can bring rankings down pretty fast. The key to slipping by the detectors is making them think you have differing, unique sites....
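To make the "content matching" idea concrete, here is a minimal sketch of one common way to score how similar two pages are: break each page into overlapping word sequences ("shingles") and compare the two sets. This is an illustration only, not AltaVista's, Google's, or Inktomi's actual method, which nobody in this thread knows. The function names, the 3-word shingle size, and the 0.75 threshold (borrowed from the 75% figure mentioned above) are all assumptions.

```python
import re

def shingles(text, k=3):
    """Return the set of k-word shingles in text (lowercased, punctuation ignored)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        # Too short to shingle: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def resemblance(page_a, page_b, k=3):
    """Jaccard similarity of the two shingle sets, from 0.0 to 1.0."""
    set_a, set_b = shingles(page_a, k), shingles(page_b, k)
    if not set_a and not set_b:
        return 1.0  # two empty pages are trivially identical
    return len(set_a & set_b) / len(set_a | set_b)

def looks_duplicate(page_a, page_b, threshold=0.75, k=3):
    """Hypothetical cutoff: flag the pair if resemblance reaches the threshold."""
    return resemblance(page_a, page_b, k) >= threshold
```

Note that under this particular measure, changing a single word knocks out every shingle that contains it, so a score of 0.75 is not the same thing as "75% of the words match". Any real engine's cutoff would depend on how it measures similarity in the first place.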
|
engine

msg:19416 | 4:44 pm on Jul 7, 2000 (gmt 0) |
JamesR, welcome. It'll be interesting to see if the simple duplication technique I'm using gets the pages/site booted out of AV et al. It's AV I'm targeting to measure the "sensitivity." It's a pretty crude technique, but the result will show how close to the line to go, or not go as the case may be. I actually want the pages out without deleting them from the site. I'll report back when the test is complete, but I expect to have to wait until the index is updated. Coming back to the question, did anyone notice which spider is the culprit?
|
seth_wilde

msg:19417 | 5:19 pm on Jul 7, 2000 (gmt 0) |
Thanks James. Did you happen to save that report? I tried going to it today and the site wouldn't pull up. (Hopefully it's just temporary server problems.)
|
JamesR

msg:19418 | 5:51 pm on Jul 7, 2000 (gmt 0) |
Seth, I did not save it, only bookmarked the URL, and I couldn't reach it yesterday either. I hope the site is down due to a system error, and not because they figured something out and someone pulled the plug. I had more extraction work to do on that one. Seems to me I may have seen that report somewhere else... if I find it I will email you. Edited by: JamesR
|
pete

msg:19419 | 6:40 pm on Jul 9, 2000 (gmt 0) |
JamesR, thanks for the info. If you get hold of that report, will you include me in that mail? Thanks, Pete
|
boyleman

msg:19420 | 2:48 am on Jul 19, 2000 (gmt 0) |
Did you guys ever find that report that Seth was talking about, the one about the 75% thing? I'm interested in that as well. Thanks!
|