
Pages dropped after spider visit?

Has anyone else noticed this?

     

engine

6:03 pm on Jul 6, 2000 (gmt 0)

WebmasterWorld Administrator engine is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Has anyone noticed pages dropping out of the AV index after this spider visits?
brillo.pa.alta-vista.net

grnidone

7:52 pm on Jul 6, 2000 (gmt 0)



I wish! I have been trying to get a page out of AV for almost a month, and although the spider has visited, it will not remove the page (now a 404 File Not Found) from the index.

-G

seth_wilde

9:53 pm on Jul 6, 2000 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Did the pages have at least 75% of the same content? I haven't experienced this myself, but it reminded me of some pretty interesting info I've been reading about duplicate detection in vector databases.
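For readers wondering what a "75% of the same content" check might look like: the figure comes from this thread, not from any documented AltaVista rule. Below is a minimal sketch, assuming word shingles compared with Jaccard similarity. This is a common near-duplicate technique, but not necessarily what AV used. The names `shingles`, `jaccard` and `is_near_duplicate`, the 4-word shingle size and the 0.75 threshold are all hypothetical choices made for illustration.

```python
import re


def shingles(text: str, k: int = 4) -> set[tuple[str, ...]]:
    """Return the set of overlapping k-word sequences (shingles) in text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        # Too short for k-word shingles: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def is_near_duplicate(page_a: str, page_b: str,
                      threshold: float = 0.75, k: int = 4) -> bool:
    # Hypothetical rule: pages sharing at least `threshold` of their
    # distinct shingles are treated as duplicates of each other.
    return jaccard(shingles(page_a, k), shingles(page_b, k)) >= threshold


a = "cheap blue widgets for sale in the widget store today"
b = "cheap blue widgets for sale in the widget store tomorrow"
print(jaccard(shingles(a), shingles(b)))  # 0.75
print(is_near_duplicate(a, b))            # True
```

The two sample pages differ only in their last word, so they share 6 of 8 distinct 4-word shingles. That puts them exactly on the hypothetical 75% line.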

engine

7:36 am on Jul 7, 2000 (gmt 0)

WebmasterWorld Administrator engine is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Yes, the pages are meant to have the same content. I'm running an experiment on this trying to get the pages out. Bait laid, Scooter came and I'm watching closely.

pete

8:07 am on Jul 7, 2000 (gmt 0)

10+ Year Member



Seth, I have been whacked a number of times over pages with similar content (possibly around the 75% mark). These pages obtained fantastic positions, stuck around for approximately 15 days and then boom - gone!

Duplicate elimination tech? Possibly. It would be good to know what AV's criteria are for treating pages as sufficiently dissimilar. The REAL gold for me would be to define the exact specs for Inktomi's doorway eliminator.

BUT, that's a discussion for another day.

JamesR

4:24 pm on Jul 7, 2000 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Pete, there are several duplicate-elimination methods that Google at least, and possibly AltaVista and Inktomi as well, can employ.
They have some pretty slick methods of finding mirrors and duplicate pages, including the 75% thing Seth was talking about. These include IP address matching, URL string similarity, similar link structure between two sites, and content matching. You must have unique content to have a chance of staying in these databases. I have seen dupes slip by, but that is the exception rather than the rule, and a single spam report from a competitor can bring your rankings crashing down pretty fast. The key to slipping past the detectors is making them think you have different, unique sites...
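The signals JamesR lists could, in principle, be combined into a single mirror score. The sketch below is a hypothetical illustration only. The `Page` record, the signal names, the weights and the 0.75 cut-off are all invented for this example, and no engine published rules like these. Simple word-set overlap stands in for real content matching. Only `difflib.SequenceMatcher` and `dataclasses` are real standard-library APIs.

```python
import re
from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass
class Page:
    """Hypothetical record of what a crawler might know about a page."""
    url: str
    ip: str
    text: str
    outlinks: set[str] = field(default_factory=set)


def _jaccard(a: set, b: set) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def _words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def mirror_signals(p: Page, q: Page) -> dict[str, float]:
    """Score each signal from the post on a 0..1 scale (illustrative only)."""
    return {
        "same_ip": 1.0 if p.ip == q.ip else 0.0,
        "url_similarity": SequenceMatcher(None, p.url, q.url).ratio(),
        "link_overlap": _jaccard(p.outlinks, q.outlinks),
        "content_overlap": _jaccard(_words(p.text), _words(q.text)),
    }


# Hypothetical weights (they sum to 1.0); not any engine's real values.
WEIGHTS = {"same_ip": 0.2, "url_similarity": 0.1,
           "link_overlap": 0.3, "content_overlap": 0.4}


def looks_like_mirror(p: Page, q: Page, threshold: float = 0.75) -> bool:
    """Weighted sum of the signals compared against a hypothetical cut-off."""
    signals = mirror_signals(p, q)
    score = sum(WEIGHTS[name] * value for name, value in signals.items())
    return score >= threshold
```

A weighted sum keeps each signal visible and easy to tune. A real detector would probably combine signals differently, for example by treating an exact content match as decisive on its own.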

engine

4:44 pm on Jul 7, 2000 (gmt 0)

WebmasterWorld Administrator engine is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



JamesR, welcome.
It'll be interesting to see whether the simple duplication technique I'm using gets the pages/site booted out of AV et al. It's AV I'm targeting to measure the "sensitivity."
It's a pretty crude technique, but the result will show how close to the line you can go, or not go, as the case may be.
I actually want the pages out without deleting them from the site.

I'll report back when the test is complete, but I expect to have to wait until the index is updated.

Coming back to the original question, has anyone noticed which spider is the culprit?

seth_wilde

5:19 pm on Jul 7, 2000 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks, James. Did you happen to save that report? I tried going to it today and the site wouldn't load (hopefully it's just a temporary server problem).

JamesR

5:51 pm on Jul 7, 2000 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Seth, I didn't save it. I only bookmarked the URL, and I couldn't reach it yesterday either. I hope the site is down because of a system error, and not because someone figured something out and pulled the plug. I had more extraction work to do on that one. I think I may have seen that report somewhere else... if I find it, I will email you.


pete

6:40 pm on Jul 9, 2000 (gmt 0)

10+ Year Member



JamesR, thanks for the info. If you get hold of that report, will you include me in that email?

Thanks
Pete

boyleman

2:48 am on Jul 19, 2000 (gmt 0)



Did you guys ever find that report Seth was talking about, the one about the 75% thing? I'm interested in that as well. Thanks!
 
