
Cloaking Forum

    
Cloaking: Characteristics By Search Engine
Air




msg:678497
 6:35 am on Mar 5, 2001 (gmt 0)

Although not a definitive list, here is a breakdown of search engine characteristics useful for having cloaked pages live peacefully in the respective indexes of the following search engines.

Your additions, corrections, and suggestions are welcome.

Search Engine: Alta Vista
Stock UA: Scooter
URL: [altavista.com]

Sensitivity To Spam:
Very sensitive to spam. Pages will be removed or entire domain will be banned.

Indexing Criteria:

  • Content served to Alta Vista spiders must be very relevant to the content shown regular visitors.
  • No penalty for cloaking is applied if this criterion is met.

    Spidering Frequency:
    Will spider pages submitted through Add URL page. Major crawls occur approx. twice a year.

    IP Address Stability:
    Infrequent introduction of new IP Addresses for its spiders.

    Decloaking Hazards:

  • "Translate" link carries AltaVista IP Address
  • Will spider using a non-stock User Agent.
  • Will use IP Addresses from EXODUS for spidering at times.
  • Tends to introduce new IP Addresses at major crawl times.

    Search Engine: Excite
    Stock UA: ArchitextSpider
    URL: [excite.com]

    Sensitivity To Spam:
    Somewhat sensitive to spam. Slow to remove pages in violation. Will ban domains in some spam cases.

    Indexing Criteria:

  • Content served to Excite spiders should be relevant to the content shown regular visitors.
  • No penalty for cloaking is applied if this criterion is met.
  • Appears to favor sites with a non-shared IP.

    Spidering Frequency:
    Sporadic spidering.

    IP Address Stability:
    Infrequent introduction of new IP Addresses for its spiders.

    Decloaking Hazards:

  • May spider out of @home IP range (not confirmed)
  • Will use IP Addresses from EXODUS for spidering at times.

    Search Engine: Northern Light
    Stock UA: Gulliver/x.x
    URL: [northernlight.com]

    Sensitivity To Spam:
    Not sensitive.

    Indexing Criteria:

  • Content served to Northern Light spiders should be relevant to the content shown regular visitors. Northern Light will tolerate (read: may require) a much higher degree of keyword repetition and overall keyword density than other spidering engines.
  • No penalty for cloaking is applied if this criterion is met.

    Spidering Frequency:
    Infrequent spidering.

    IP Address Stability:
    Rarely introduces new IP Addresses for its spiders.

    Decloaking Hazards:

  • Will use IP Addresses from EXODUS for spidering at times.

    Search Engine: Inktomi
    Stock UA: Slurp
    URL: [hotbot.com] [canada.com] [anzwers.com] (<--Partial list)

    Sensitivity To Spam:
    Sensitive. Will bury pages that show different content per visit. Since its spiders visit often, the same content should be shown on short, successive requests by its multitude of spiders.

    Indexing Criteria:

  • Content served to Inktomi spiders should be relevant to the content shown regular visitors. Inktomi will tolerate some degree of higher keyword repetition and overall keyword density, but not irrelevant content. Will accept cloaked pages through paid placement as well as free submission.
  • Spider is occasionally tripped up by sites with shared IPs or bad DNS entries.
  • No penalty for cloaking is applied if these criteria are met.
  • Currently burying pages that are submitted through the free ADD URL page.

    Spidering Frequency:
    Annoyingly frequent. Will generally only spider submitted pages, rarely following links. Multiple spiders will usually arrive requesting the same page, with each spider using a different IP Address and a mix of User Agents. The index page tends to be requested most.

    IP Address Stability:
    Frequent introduction of new IP Addresses for its spiders, although lately the frequency is declining.

    Decloaking Hazards:

  • Multiple spider visits requesting the same page within short time periods.
  • Will use IP Addresses from EXODUS for spidering at times.
  • A huge number of IPs are assigned to its spiders, making IP Address list maintenance more difficult.
  • Uses a variety of User Agents including the standard browser agent Mozilla/3.0
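The first hazard above (several spiders asking for the same page in quick succession, each from a different IP) is easy to spot in your own logs. What follows is only a minimal sketch, assuming the log has already been parsed into (timestamp in seconds, path, IP) tuples. The function name, the 60 second window, and the sample addresses (documentation ranges, not real spider IPs) are all assumptions made for illustration.

```python
from collections import defaultdict


def find_bursts(hits, window_seconds=60, min_ips=2):
    """Return {path: sorted IPs} for pages hit by several IPs in a short window.

    hits is an iterable of (timestamp_seconds, path, ip) tuples.
    """
    by_path = defaultdict(list)
    for ts, path, ip in hits:
        by_path[path].append((ts, ip))

    bursts = {}
    for path, entries in by_path.items():
        entries.sort()  # order the requests for this page by time
        start = 0
        for end in range(len(entries)):
            # Shrink the window from the left until it spans at most window_seconds.
            while entries[end][0] - entries[start][0] > window_seconds:
                start += 1
            ips = {ip for _, ip in entries[start:end + 1]}
            if len(ips) >= min_ips:
                bursts.setdefault(path, set()).update(ips)
    return {path: sorted(ips) for path, ips in bursts.items()}


# Hypothetical sample data: two IPs hit "/" 20 seconds apart.
sample = [
    (0, "/", "192.0.2.1"),
    (20, "/", "192.0.2.2"),
    (500, "/a.html", "192.0.2.3"),
    (900, "/a.html", "192.0.2.4"),
]
print(find_bursts(sample))
```

Pages that show up here are the ones where serving different content per request is most likely to be noticed.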

    Search Engine: Lycos
    Stock UA: Lycos_Spider_(T-Rex)
    URL: [lycos.com]

    Sensitivity To Spam:
    Moderately sensitive.

    Indexing Criteria:

  • Content served to Lycos spiders should be relevant to the content shown regular visitors.
  • No penalty for cloaking is applied if this criterion is met.

    Spidering Frequency:
    Sporadic spidering.

    IP Address Stability:
    Infrequent introduction of new IP Addresses for its spiders.

    Decloaking Hazards:

  • Will use IP Addresses from EXODUS for spidering at times.

    Search Engine: FAST/Alltheweb
    Stock UA: FAST-WebCrawler/x.x
    URL: [alltheweb.com]

    Sensitivity To Spam:
    Moderately sensitive, but becoming more sensitive to it.

    Indexing Criteria:

  • Content served to Fast/Alltheweb spiders should be relevant to the content shown regular visitors.
  • No penalty for cloaking is applied if this criterion is met.

    Spidering Frequency:
    Infrequent but heavy spidering; this spider is becoming better behaved in this regard.

    IP Address Stability:
    Infrequent introduction of new IP Addresses for its spiders.

    Decloaking Hazards:

  • Will use IP Addresses from EXODUS for spidering at times.

    Search Engine: Google
    Stock UA: Googlebot/x.x
    URL: [google.com]

    Sensitivity To Spam:
    Moderately sensitive, and becoming more so.

    Indexing Criteria:

  • Content served to Google spiders should be relevant to the content shown regular visitors.
  • Cloaking on non-promotional domains tends to do better than on straight promotional domains, likely due to the boost from a real domain appearing in directory listings.
  • All cloaked pages should have links to each other as a "real" site would.
  • No penalty for cloaking is applied if these criteria are met.

    Spidering Frequency:
    Adequate spidering, and fairly thorough. Initial visit is usually light, with a thorough follow-up spidering after that. Tends to repeat this pattern monthly.

    IP Address Stability:
    Frequent introduction of new IP Addresses for its spiders.

    Decloaking Hazards:

  • Will use IP Addresses from EXODUS for spidering at times.
  • Will use a non-stock User Agent.
  • Caches the spidered page by default unless prevented by a "noarchive" robots meta tag.
  • Not a decloaking hazard, but be forewarned: appears responsive to user complaints about pages that violate its spam guidelines.

    Directories: Yahoo! -- LookSmart -- Open Directory Project
    Stock UA: Not Relevant
    URL: [yahoo.com] -- [looksmart.com] -- [dmoz.org]

    Sensitivity To Spam:
    Very sensitive - sites are human reviewed.

    Indexing Criteria:

  • These are not spidering engines, so no indexing takes place. Their reviewers should see the pages you show regular visitors. Follow their rules exactly when submitting a site for review.
  • No penalty for cloaking. Because sites are reviewed by humans, directories are largely oblivious to it.

    Spidering Frequency:
    Spiders from directories (that use spiders) perform link checking duties only.

    IP Address Stability:
    Not Relevant.

    Decloaking Hazards:
    Not Relevant.
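    To make the list above easier to use when reading server logs, here is a minimal sketch that tags a request with the engine it most likely came from. The User Agent tokens are the stock ones listed in this post. Everything else is an assumption for illustration: the function names are made up, and the IP networks are placeholder documentation ranges, not real spider addresses. Since several engines are noted above as using non-stock User Agents, the sketch falls back to an IP lookup when the User Agent gives nothing away.

```python
import ipaddress

# Stock User Agent tokens, taken from the list above.
STOCK_UA_TOKENS = {
    "Scooter": "Alta Vista",
    "ArchitextSpider": "Excite",
    "Gulliver": "Northern Light",
    "Slurp": "Inktomi",
    "Lycos_Spider_(T-Rex)": "Lycos",
    "FAST-WebCrawler": "FAST/Alltheweb",
    "Googlebot": "Google",
}

# Placeholder networks (documentation ranges). A real list has to be
# built and maintained from your own logs.
KNOWN_SPIDER_NETS = {
    "Inktomi": [ipaddress.ip_network("192.0.2.0/24")],
    "Google": [ipaddress.ip_network("198.51.100.0/24")],
}


def engine_from_ua(user_agent):
    """Return the engine whose stock token appears in the User Agent, else None."""
    ua = user_agent.lower()
    for token, engine in STOCK_UA_TOKENS.items():
        if token.lower() in ua:
            return engine
    return None


def engine_from_ip(remote_addr):
    """Return the engine whose known networks contain the address, else None."""
    addr = ipaddress.ip_address(remote_addr)
    for engine, nets in KNOWN_SPIDER_NETS.items():
        if any(addr in net for net in nets):
            return engine
    return None


def classify(user_agent, remote_addr):
    """Return (engine, matched_by), where matched_by is 'ua', 'ip', or None."""
    engine = engine_from_ua(user_agent)
    if engine:
        return engine, "ua"
    engine = engine_from_ip(remote_addr)
    if engine:
        return engine, "ip"  # non-stock User Agent from a known spider address
    return None, None


print(classify("Googlebot/2.1", "203.0.113.5"))
print(classify("Mozilla/3.0", "192.0.2.17"))
```

    Requests matched only by IP are the interesting ones, since those are the visits a User Agent check alone would miss.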



    2_much




    msg:678498
     7:31 am on Mar 5, 2001 (gmt 0)

    "Your additions, corrections, and suggestions are welcome. "

    Does "WOW" count as a suggestion??? Great info, Air!

    tedster




    msg:678499
     7:47 am on Mar 5, 2001 (gmt 0)

    Thanks so much for all the work you are sharing here.

    >> Inktomi: Will generally only spider submitted pages, rarely following links

    How does this relate to the current discussion [webmasterworld.com] about letting Ink first find new pages on its own, through inbound links?

    WebGuerrilla




    msg:678500
     9:24 am on Mar 5, 2001 (gmt 0)

    Great stuff Air. It all is spot on with my experiences.

    Xoc




    msg:678501
     9:31 am on Mar 5, 2001 (gmt 0)

    Nice post!

    I have to disagree about Northern Light spidering infrequently though. I see Gulliver on a pretty regular basis. They have come by and indexed pages on one site every day this month.

    makemetop




    msg:678502
     9:39 am on Mar 5, 2001 (gmt 0)

    Many thanks, Air. Invaluable information as I move into cloaking some sites (with a little trepidation). Taking pride of place on the wall in front of me!

    Air




    msg:678503
     3:08 pm on Mar 5, 2001 (gmt 0)

    Tedster,
    >How does this relate to the current discussion about letting Ink
    >first find new pages on its own, through inbound links?

    It is the same issue; cloaking doesn't affect it. It may accentuate it, because some don't give as much thought to links on/to their cloaked pages as with "real" pages.


    Xoc,
    >I have to disagree about Northern Light spidering infrequently though.

    Yeah I've heard from some that Gulliver is very active on their site, while most seem to see it only on occasion. Seems to spider more often if your content changes frequently. Is this your experience also?

    Xoc




    msg:678504
     5:12 pm on Mar 5, 2001 (gmt 0)

    A week ago, I implemented some new technology that causes the Last-Modified HTTP header to always return 18 hours ago. Before that, there was no Last-Modified date getting reported, because IIS doesn't normally return Last-Modified dates on Active Server Pages, since they can contain dynamic content. One advantage of my solution is that the last-modified field of a SERP will always tell me exactly when it spidered the page.

    Maybe that makes a difference. I don't make changes to the existing content all that often, although I'm adding new pages frequently. I don't submit to Northern Light, so that doesn't explain it.

    Fast is by all the time, too, but I submit there.

    This is on three different web sites.
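    The rolling Last-Modified trick described in this post can be sketched in a few lines. This is not Xoc's implementation (that ran on IIS/ASP and is not shown in the thread). It is a minimal Python illustration, and the function names are made up. The second function shows why a date displayed in a SERP reveals the spidering time: add the 18 hours back.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime

OFFSET_HOURS = 18  # the offset mentioned in the post


def last_modified_header(now=None, hours_ago=OFFSET_HOURS):
    """Build a Last-Modified value that is always hours_ago before 'now'."""
    if now is None:
        now = datetime.now(timezone.utc)
    stamp = now - timedelta(hours=hours_ago)
    # usegmt=True emits the HTTP date form ending in "GMT"; needs a UTC datetime.
    return format_datetime(stamp, usegmt=True)


def crawl_time_from(header_value, hours_ago=OFFSET_HOURS):
    """Recover the request time from a Last-Modified value produced above."""
    return parsedate_to_datetime(header_value) + timedelta(hours=hours_ago)


request_time = datetime(2001, 3, 5, 17, 12, tzinfo=timezone.utc)
header = last_modified_header(request_time)
print(header)                   # Sun, 04 Mar 2001 23:12:00 GMT
print(crawl_time_from(header))  # the original request time
```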

    Air




    msg:678505
     5:36 pm on Mar 5, 2001 (gmt 0)

    That starts to confirm it for me: NL must use the date as a trigger for more frequent spidering. When you consider who they are really serving, it makes sense that they would do it that way. To some extent other engines do this too, but they seem to key on page size and/or content change.

    Wouldn't mind hearing back if you see any changes in spidering from other engines, might help determine which are sensitive to Last Modified date alone as a trigger for more frequent spidering.

    If as you said, your content doesn't change all that often, you may just have a controlled experiment on your hands.
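    For anyone wanting to run that comparison, a per-day tally of visits by each spider is enough to see whether frequency changes after the header change. A minimal sketch follows, assuming the log has already been reduced to (day, User Agent) pairs. The function name and sample records are hypothetical, and the tokens are the stock User Agents from the first post.

```python
from collections import Counter

# Stock User Agent tokens from the first post in this thread.
SPIDER_TOKENS = (
    "Scooter", "ArchitextSpider", "Gulliver", "Slurp",
    "Lycos_Spider", "FAST-WebCrawler", "Googlebot",
)


def visits_per_day(records):
    """Count requests per (spider token, day) from (day, user_agent) pairs."""
    counts = Counter()
    for day, user_agent in records:
        ua = user_agent.lower()
        for token in SPIDER_TOKENS:
            if token.lower() in ua:
                counts[(token, day)] += 1
                break  # one spider per request
    return counts


# Hypothetical sample records.
sample = [
    ("2001-03-05", "Gulliver/1.3"),
    ("2001-03-05", "Gulliver/1.3"),
    ("2001-03-06", "FAST-WebCrawler/2.2"),
    ("2001-03-06", "Mozilla/4.0"),
]
for (token, day), n in sorted(visits_per_day(sample).items()):
    print(day, token, n)
```

    Note that this only counts stock User Agents, so visits made under a browser-style agent will not show up in the tally.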

    startup




    msg:678506
     2:01 pm on Mar 6, 2001 (gmt 0)


    Fast:
    Spidering Frequency:
    "Infrequent but heavy spidering,"

    Have you been able to determine if the heavy spidering is a result of submitting many pages, or is the spider following links?

    Xoc




    msg:678507
     4:35 pm on Mar 6, 2001 (gmt 0)

    Not really. I submit every page on my sites over time. But Fast has spidered to pages that I haven't submitted yet, so they are following links.

    startup




    msg:678508
     4:57 pm on Mar 6, 2001 (gmt 0)

    What I am trying to determine is this: will the spider follow the links on its first visit, return shortly after to follow them, or wait until the next spidering cycle to crawl them?
    I am in the process of building a new site and I want to submit the pages as they are built, but I don't want the spider to get a 404 after it follows a new link to a page I am working on.


    Brett_Tabke




    msg:678509
     4:58 pm on Mar 6, 2001 (gmt 0)

    Fast follows all links. They have hit _every_ link I have anywhere on the planet. It just makes me think they are building up to a big release that is hubs/authorities link vector (aka: pagerank) based.

    Nice post Air.

    han solo




    msg:678510
     5:46 pm on Mar 6, 2001 (gmt 0)

    First, I'll add: very, very nice post Air. This goes into my research file for further analysis...

    And I'd like to add to what Brett said about Fast: I have seen some ranks that I know are derived from the hubs/authorities model.

    The only question I have been wondering about is whether, for topic A, you would rank better as an authority on A or as a hub for A.

    Cheers,

    Han Solo

    Air




    msg:678511
     12:22 am on Mar 7, 2001 (gmt 0)

    >Have you been able to determine if the heavy spidering is a result
    >of submitting many pages, or is the spider following links?

    I guess BT has answered it, but just to add one more yes, it follows links relentlessly.

    Brett_Tabke




    msg:678512
     5:57 am on Mar 8, 2001 (gmt 0)

    I looked around over at Danney's place for a while but couldn't find that article on Northern Light. About two years ago, DS had a story on Northern Light's usage of page dates.

    grnidone




    msg:678513
     8:15 pm on May 15, 2001 (gmt 0)

    This is a great thread and well worth a flag.

    I am wondering how often these attitudes change, or if they do.

    -G

    Hunter




    msg:678514
     12:41 am on May 16, 2001 (gmt 0)

    *Bam*..or does that just work while cooking?

    Brett_Tabke




    msg:678515
     6:14 am on May 17, 2001 (gmt 0)

    They seem to change like the wind, grnidone. About the only constant has been AV's relentless anti-SEO efforts. It's really strange too, because AV could benefit the most of any engine by working with the SEO community. While Ink, Yahoo, Looksmart, Goto, and About.com take away from the SEO community (e.g., digging into our pockets), only AV, Google, and Fast could work with the community to their advantage. Google and Fast appear to have taken that approach, while AV has all but declared public war on SEO.


    All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
    WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
    © Webmaster World 1996-2014 all rights reserved