

Paid Inclusion Engines and Topics Forum

Pages dropped after spider visit?
Has anyone else noticed this?
engine




msg:19410
 6:03 pm on Jul 6, 2000 (gmt 0)

Has anyone noticed pages dropping out of the AV index after this spider visits?
brillo.pa.alta-vista.net

 

grnidone




msg:19411
 7:52 pm on Jul 6, 2000 (gmt 0)

I wish!! I have been trying to kill a page from AV for almost a month, and although the spider has visited, it will not remove the 404 (file not found) page from the index.

-G

seth_wilde




msg:19412
 9:53 pm on Jul 6, 2000 (gmt 0)

Did the pages have at least 75% of the same content? I haven't experienced this myself, but it reminded me of some pretty interesting info that I've been reading about duplicate detection in vector databases.
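For anyone wondering what a "75% the same" test could look like in practice, here is a minimal sketch. It is only an assumption about how such a check might work, not AltaVista's actual method: it compares overlapping word sequences (shingles) and scores two pages by Jaccard resemblance. The function names, the shingle size `k=4`, and treating 0.75 as the cutoff are all illustrative choices.

```python
import re


def shingles(text, k=4):
    """Return the set of k-word sequences in text (hypothetical helper)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        # Very short texts become a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def resemblance(a, b, k=4):
    """Jaccard similarity of the two shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0  # two empty pages count as identical
    return len(sa & sb) / len(sa | sb)


def is_near_duplicate(a, b, threshold=0.75, k=4):
    """Assumed rule: flag pages whose resemblance meets the threshold."""
    return resemblance(a, b, k) >= threshold
```

Note that changing a single word breaks every shingle that contains it, so a page can share far more than 75% of its words with another and still score below the cutoff. Where the line falls depends heavily on the shingle size.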

engine




msg:19413
 7:36 am on Jul 7, 2000 (gmt 0)

Yes, the pages are meant to have the same content. I'm running an experiment on this trying to get the pages out. Bait laid, Scooter came and I'm watching closely.

pete




msg:19414
 8:07 am on Jul 7, 2000 (gmt 0)

Seth, I have been whacked a number of times over pages with similar content (could be around the 75% mark). These pages obtained fantastic positions, stuck around for approximately 15 days and then boom - gone!

Duplicate elimination tech - possibly. It would be good to know what AV's criteria are for treating pages as sufficiently dissimilar. The REAL gold for me would be to define the exact specs for Inktomi's doorway eliminator.

BUT, this would be suitable for another discussion on another day.

JamesR




msg:19415
 4:24 pm on Jul 7, 2000 (gmt 0)

Pete, there are several possibilities for duplicate elimination that at least Google, if not AltaVista and Inktomi, can employ.
They have some pretty slick methods of finding mirrors and duplicate pages, including the 75% thing Seth was talking about. These include IP number matching, URL string similarities, similar link structure between two sites, and content matching. You must have unique content to have a chance at staying in these databases. Although I have seen dupes slip by, it is the exception rather than the rule, and one spam notification by a competitor can bring rankings crashing down pretty fast. The key to slipping by the detectors is making them think you have differing, unique sites....
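To make the non-content signals above a bit more concrete, here is a rough sketch of how IP matching, URL string similarity, and link-structure overlap might be computed for two sites. This is purely illustrative and not a description of what Google, AltaVista, or Inktomi actually run. The dictionary layout (`ip`, `url`, `links`) and all function names are hypothetical, and `difflib.SequenceMatcher` is swapped in as a generic string-similarity measure.

```python
from difflib import SequenceMatcher
from urllib.parse import urlsplit


def url_path_similarity(url_a, url_b):
    """Similarity (0.0 to 1.0) of the path portions of two URLs."""
    path_a = urlsplit(url_a).path
    path_b = urlsplit(url_b).path
    return SequenceMatcher(None, path_a, path_b).ratio()


def link_overlap(links_a, links_b):
    """Jaccard overlap of two collections of link targets."""
    sa, sb = set(links_a), set(links_b)
    if not sa and not sb:
        return 0.0  # no links on either side: no evidence of mirroring
    return len(sa & sb) / len(sa | sb)


def mirror_signals(site_a, site_b):
    """Collect the assumed signals for a pair of sites.

    Each site is a dict with hypothetical keys: 'ip', 'url', 'links'.
    """
    return {
        "same_ip": site_a["ip"] == site_b["ip"],
        "url_similarity": url_path_similarity(site_a["url"], site_b["url"]),
        "link_overlap": link_overlap(site_a["links"], site_b["links"]),
    }
```

How an engine would weight or combine signals like these is unknown. The sketch only shows that each one is cheap to compute once the crawl data is in hand.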

engine




msg:19416
 4:44 pm on Jul 7, 2000 (gmt 0)

JamesR, welcome.
It'll be interesting to see if the simple duplication technique I'm using gets the pages/site booted out of AV et al. It's AV I'm targeting to measure the "sensitivity."
It's a pretty crude technique, but the result will show how close to the line to go, or not go, as the case may be.
I actually want the pages out without deleting them from the site.

I'll report back when the test is complete, but I expect to have to wait until the index is updated.

Coming back to the question, did anyone notice which spider is the culprit?

seth_wilde




msg:19417
 5:19 pm on Jul 7, 2000 (gmt 0)

Thanks James. Did you happen to save that report? I tried going to it today and the site wouldn't pull up (hopefully it's just temporary server problems).

JamesR




msg:19418
 5:51 pm on Jul 7, 2000 (gmt 0)

Seth, I did not save it, only bookmarked the URL, and I couldn't reach it yesterday either. I hope they didn't figure something out, and that the site is down due to a system error rather than someone pulling the plug. I had more extraction work to do on that one. Seems to me I may have seen that report somewhere else...if I find it I will email you.

Edited by: JamesR

pete




msg:19419
 6:40 pm on Jul 9, 2000 (gmt 0)

James R - thanks for the info. If you get hold of that report, will you include me in that mail?

Thanks
Pete

boyleman




msg:19420
 2:48 am on Jul 19, 2000 (gmt 0)

Did you guys ever find that report that Seth was talking about, the one about the 75% thing? I'm interested in that as well. Thanks!


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved