| 7:52 pm on Jul 6, 2000 (gmt 0)|
I wish!! I have tried to kill a page from AV for almost a month and while the spider has visited, it will not remove the 404 file not found page from the index.
| 9:53 pm on Jul 6, 2000 (gmt 0)|
Did the pages have at least 75% of the same content? I haven't experienced this myself, but it reminded me of some pretty interesting info that I've been reading about duplicate detection in vector databases.
| 7:36 am on Jul 7, 2000 (gmt 0)|
Yes, the pages are meant to have the same content. I'm running an experiment on this trying to get the pages out. Bait laid, Scooter came and I'm watching closely.
| 8:07 am on Jul 7, 2000 (gmt 0)|
Seth, I have been whacked a number of times regarding pages with similar content (could be around the 75% mark). These pages have obtained fantastic positions, stuck around for approximately 15 days and then boom - gone!
Duplicate elimination tech - possibly. It would be good to know what AV's criteria are for treating pages as sufficiently dissimilar. The REAL gold for me would be to define the exact specs for Inktomi's doorway eliminator.
BUT, this would be suitable for another discussion on another day.
| 4:24 pm on Jul 7, 2000 (gmt 0)|
Pete, there are several possibilities for duplicate elimination that at least Google, if not AltaVista and Inktomi, can employ.
They have some pretty slick methods of finding mirrors and duplicate pages, including the 75% thing Seth was talking about. These include IP number matching, URL string similarities, similar link structure between two sites, and content matching. You must have unique content to have a chance at staying in these databases. Although I have seen dupes slip by, it is the exception rather than the rule, and one spam notification by a competitor can bring rankings down pretty fast. The key to slipping by the detectors is making them think you have differing, unique sites....
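For anyone wondering what "content matching" against a 75% cutoff could look like in practice, here is a rough sketch. It is not what AltaVista, Google, or Inktomi actually run (nobody in this thread knows that); it just assumes pages are compared by overlapping three-word windows ("shingles") and scored with Jaccard similarity. The function names, the shingle size `k`, and the 0.75 threshold are all assumptions made for illustration.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        # Too short for a full window: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def similarity(page_a, page_b, k=3):
    """Jaccard similarity of the two pages' shingle sets, from 0.0 to 1.0."""
    set_a, set_b = shingles(page_a, k), shingles(page_b, k)
    if not set_a and not set_b:
        return 1.0  # two empty pages are trivially identical
    return len(set_a & set_b) / len(set_a | set_b)


def looks_duplicate(page_a, page_b, threshold=0.75, k=3):
    """Hypothetical check: flag the pair if similarity reaches the threshold."""
    return similarity(page_a, page_b, k) >= threshold
```

Calling `similarity(page_one_text, page_two_text)` on the visible text of two pages gives a score you can compare against whatever cutoff you suspect. Note that changing one word changes every shingle that contains it, so a few scattered edits lower the score more than a raw word-for-word percentage would suggest.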
| 4:44 pm on Jul 7, 2000 (gmt 0)|
It'll be interesting to see if the simple duplication technique I'm using gets the pages/site booted out of AV et al. It's AV I'm targeting to measure the "sensitivity."
It's a pretty crude technique but the result will show how close to the line to go, or not go as the case may be.
I actually want the pages out without deleting them from the site.
I'll report back when the test is complete, but I expect to have to wait until the index is updated.
Coming back to the question, did anyone notice which spider is the culprit?
| 5:19 pm on Jul 7, 2000 (gmt 0)|
Thanks James. Did you happen to save that report? I tried going to it today and the site wouldn't pull up (hopefully it's just temporary server problems).
| 5:51 pm on Jul 7, 2000 (gmt 0)|
Seth, I did not save it, only bookmarked the URL, and I couldn't reach it yesterday either. I hope they didn't figure something out, and that the site is down due to a system error and not someone pulling the plug. I had more extraction work to do on that one. Seems to me I may have seen that report somewhere else...if I find it I will email you.
Edited by: JamesR
| 6:40 pm on Jul 9, 2000 (gmt 0)|
James R - thanks for the info - if you get hold of that report, will you include me in that mail?
| 2:48 am on Jul 19, 2000 (gmt 0)|
Did you guys ever find that report that Seth was talking about, the one about the 75% thing? I'm interested in that as well. Thanks!