Cloaking: Characteristics By Search Engine

     

Air

6:35 am on Mar 5, 2001 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Although not a definitive list, here is a breakdown of search engine characteristics useful for having cloaked pages live peacefully in the respective indexes of the following search engines.

Your additions, corrections, and suggestions are welcome.

Search Engine: Alta Vista
Stock UA: Scooter
URL: [altavista.com]

Sensitivity To Spam:
Very sensitive to spam. Pages will be removed or the entire domain will be banned.

Indexing Criteria:

  • Content served to Alta Vista spiders must be very relevant to the content shown to regular visitors.
  • No penalty for cloaking is applied if this criterion is met.

    Spidering Frequency:
    Will spider pages submitted through Add URL page. Major crawls occur approx. twice a year.

    IP Address Stability:
    Infrequent introduction of new IP Addresses for its spiders.

    Decloaking Hazards:

  • "Translate" link carries AltaVista IP Address
  • Will spider using a non-stock User Agent.
  • Will use IP Addresses from EXODUS for spidering at times.
  • Tends to introduce new IP Addresses at major crawl times.

    Search Engine: Excite
    Stock UA: ArchitextSpider
    URL: [excite.com]

    Sensitivity To Spam:
    Somewhat sensitive to spam. Slow to remove pages in violation. Will ban domains in some spam cases.

    Indexing Criteria:

  • Content served to Excite spiders should be relevant to the content shown to regular visitors.
  • No penalty for cloaking is applied if this criterion is met.
  • Appears to favor sites with a non-shared IP.

    Spidering Frequency:
    Sporadic spidering.

    IP Address Stability:
    Infrequent introduction of new IP Addresses for its spiders.

    Decloaking Hazards:

  • May spider out of @home IP range (not confirmed)
  • Will use IP Addresses from EXODUS for spidering at times.

    Search Engine: Northern Light
    Stock UA: Gulliver/x.x
    URL: [northernlight.com]

    Sensitivity To Spam:
    Not sensitive.

    Indexing Criteria:

  • Content served to Northern Light spiders should be relevant to the content shown to regular visitors. Northern Light will tolerate (read: may require) a much higher degree of keyword repetition and overall keyword density than other spidering engines.
  • No penalty for cloaking is applied if this criterion is met.

    Spidering Frequency:
    Infrequent spidering.

    IP Address Stability:
    Rarely introduces new IP Addresses for its spiders.

    Decloaking Hazards:

  • Will use IP Addresses from EXODUS for spidering at times.

    Search Engine: Inktomi
    Stock UA: Slurp
    URL: [hotbot.com] [canada.com] [anzwers.com] (<--Partial list)

    Sensitivity To Spam:
    Sensitive. Will bury pages that show different content per visit. Because its many spiders visit often, the same content should be shown to all of them on requests made in quick succession.

    Indexing Criteria:

  • Content served to Inktomi spiders should be relevant to the content shown to regular visitors. Inktomi will tolerate a somewhat higher degree of keyword repetition and overall keyword density, but not irrelevant content. Will accept cloaked pages through paid placement as well as free submission.
  • Spider is occasionally tripped up by sites with shared IPs or bad DNS entries.
  • No penalty for cloaking is applied if these criteria are met.
  • Currently burying pages that are submitted through the free ADD URL page.

    Spidering Frequency:
    Annoyingly frequent. Will generally only spider submitted pages, rarely following links. Multiple spiders will usually arrive requesting the same page, each using a different IP Address and a mix of User Agents. The index page tends to be requested most.

    IP Address Stability:
    Frequent introduction of new IP Addresses for its spiders, although lately the frequency is declining.

    Decloaking Hazards:

  • Multiple spider visits requesting the same page within short time periods.
  • Will use IP Addresses from EXODUS for spidering at times.
  • A huge number of IPs are assigned to its spiders, making IP Address list maintenance more difficult.
  • Uses a variety of User Agents including the standard browser agent Mozilla/3.0
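
The IP Address list maintenance mentioned in these hazards could be sketched roughly as below. This is only an illustration: the function name `engine_for_ip` and the table `KNOWN_SPIDER_NETWORKS` are made up for this example, and the ranges are reserved documentation blocks, not real spider addresses. A real list would have to be built from your own logs and kept up to date by hand.

```python
import ipaddress

# Placeholder ranges taken from the reserved documentation blocks.
# These are NOT real spider addresses; substitute ranges from your own logs.
KNOWN_SPIDER_NETWORKS = {
    "Inktomi": ["192.0.2.0/24", "198.51.100.0/25"],
    "Google": ["203.0.113.0/24"],
}


def engine_for_ip(ip, networks=KNOWN_SPIDER_NETWORKS):
    """Return the engine whose listed network contains ip, else None."""
    address = ipaddress.ip_address(ip)
    for engine, cidrs in networks.items():
        for cidr in cidrs:
            if address in ipaddress.ip_network(cidr):
                return engine
    return None
```

Storing whole ranges rather than single addresses keeps the list shorter, but it does not help when an engine starts spidering from a block you have never seen before.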

    Search Engine: Lycos
    Stock UA: Lycos_Spider_(T-Rex)
    URL: [lycos.com]

    Sensitivity To Spam:
    Moderately sensitive.

    Indexing Criteria:

  • Content served to Lycos spiders should be relevant to the content shown to regular visitors.
  • No penalty for cloaking is applied if this criterion is met.

    Spidering Frequency:
    Sporadic spidering.

    IP Address Stability:
    Infrequent introduction of new IP Addresses for its spiders.

    Decloaking Hazards:

  • Will use IP Addresses from EXODUS for spidering at times.

    Search Engine: FAST/Alltheweb
    Stock UA: FAST-WebCrawler/x.x
    URL: [alltheweb.com]

    Sensitivity To Spam:
    Moderately sensitive, but becoming more sensitive to it.

    Indexing Criteria:

  • Content served to Fast/Alltheweb spiders should be relevant to the content shown to regular visitors.
  • No penalty for cloaking is applied if this criterion is met.

    Spidering Frequency:
    Infrequent but heavy spidering, although this spider is becoming better behaved in this regard.

    IP Address Stability:
    Infrequent introduction of new IP Addresses for its spiders.

    Decloaking Hazards:

  • Will use IP Addresses from EXODUS for spidering at times.

    Search Engine: Google
    Stock UA: Googlebot/x.x
    URL: [google.com]

    Sensitivity To Spam:
    Moderately sensitive, and becoming more so.

    Indexing Criteria:

  • Content served to Google spiders should be relevant to the content shown to regular visitors.
  • Cloaking on non-promotional domains tends to do better than on straight promotional domains, likely due to the boost from a real domain appearing in directory listings.
  • All cloaked pages should have links to each other as a "real" site would.
  • No penalty for cloaking is applied if these criteria are met.

    Spidering Frequency:
    Adequate and fairly thorough spidering. The initial visit is usually light, with a thorough follow-up spidering after that. Tends to repeat this pattern monthly.

    IP Address Stability:
    Frequent introduction of new IP Addresses for its spiders.

    Decloaking Hazards:

  • Will use IP Addresses from EXODUS for spidering at times.
  • Will use a non-stock user agent.
  • Caches the spidered page by default unless prevented by a "noarchive" robots meta tag.
  • Not a decloaking hazard, but be forewarned: appears responsive to user complaints about pages that violate its spam guidelines.

    Directories: Yahoo! - LookSmart -- Open Directory Project
    Stock UA: Not Relevant
    URL: [yahoo.com] -- [looksmart.com] -- [dmoz.org]

    Sensitivity To Spam:
    Very sensitive - sites are human reviewed.

    Indexing Criteria:

  • These are not spidering engines, so no indexing takes place. They should be reviewing the pages you show to regular visitors. Follow their rules exactly when submitting a site for review.
  • No penalty for cloaking. Because sites are reviewed by humans, directories are largely oblivious to cloaking.

    Spidering Frequency:
    Spiders from directories (that use spiders) perform link checking duties only.

    IP Address Stability:
    Not Relevant.

    Decloaking Hazards:
    Not Relevant.
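
As a rough illustration of the "Stock UA" entries above, here is a minimal sketch that maps a request's User Agent string to an engine. The names `STOCK_USER_AGENTS` and `identify_spider` are invented for this example, and the tokens are simply the stock UA names from the list. Since several engines are noted above as spidering with non-stock User Agents, a miss here does not mean the visitor is not a spider.

```python
# Stock User Agent tokens as listed in the breakdown above.
STOCK_USER_AGENTS = {
    "Scooter": "AltaVista",
    "ArchitextSpider": "Excite",
    "Gulliver": "Northern Light",
    "Slurp": "Inktomi",
    "Lycos_Spider_(T-Rex)": "Lycos",
    "FAST-WebCrawler": "FAST/Alltheweb",
    "Googlebot": "Google",
}


def identify_spider(user_agent):
    """Return the engine whose stock UA token appears in user_agent, else None."""
    lowered = user_agent.lower()
    for token, engine in STOCK_USER_AGENTS.items():
        # Case-insensitive substring match, so version suffixes are ignored.
        if token.lower() in lowered:
            return engine
    return None
```

For example, `identify_spider("Gulliver/1.3")` returns `"Northern Light"`, while a plain browser string such as `"Mozilla/3.0"` returns `None`, which is exactly the case the Inktomi hazards warn about.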

    2_much

    7:31 am on Mar 5, 2001 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member



    "Your additions, corrections, and suggestions are welcome. "

    Does "WOW" count as a suggestion??? Great info, Air!

    tedster

    7:47 am on Mar 5, 2001 (gmt 0)

    WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member



    Thanks so much for all the work you are sharing here.

    >> Inktomi: Will generally only spider submitted pages, rarely following links

    How does this relate to the current discussion [webmasterworld.com] about letting Ink first find new pages on its own, through inbound links?

    WebGuerrilla

    9:24 am on Mar 5, 2001 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member



    Great stuff, Air. It is all spot on with my experience.

    Xoc

    9:31 am on Mar 5, 2001 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member



    Nice post!

    I have to disagree about Northern Light spidering infrequently though. I see Gulliver on a pretty regular basis. They have come by and indexed pages on one site every day this month.

    makemetop

    9:39 am on Mar 5, 2001 (gmt 0)



    Many thanks, Air. Invaluable information as I move into cloaking some sites (with a little trepidation). Taking pride of place on the wall in front of me!

    Air

    3:08 pm on Mar 5, 2001 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member



    Tedster,
    >How does this relate to the current discussion about letting Ink
    >first find new pages on its own, through inbound links?

    It is the same issue; cloaking doesn't affect it. It may accentuate it, because some don't give as much thought to links on/to their cloaked pages as they do with "real" pages.


    Xoc,
    >I have to disagree about Northern Light spidering infrequently though.

    Yeah I've heard from some that Gulliver is very active on their site, while most seem to see it only on occasion. Seems to spider more often if your content changes frequently. Is this your experience also?

    Xoc

    5:12 pm on Mar 5, 2001 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member



    A week ago, I implemented some new technology that causes the Last-Modified HTTP header to always return 18 hours ago. Before that, there was no Last-Modified date getting reported, because IIS doesn't normally return Last-Modified dates on Active Server Pages, since they can contain dynamic content. One advantage of my solution is that the last-modified field of a SERP will always tell me exactly when the engine spidered the page.

    Maybe that makes a difference. I don't make changes to the existing content all that often, although I'm adding new pages frequently. I don't submit to Northern Light, so that doesn't explain it.

    Fast is by all the time, too, but I submit there.

    This is on three different web sites.
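
The Last-Modified trick described in this post was done on IIS with Active Server Pages and the actual code was not shown. As a rough sketch of the same idea in Python, with a made-up helper name `last_modified_value`, the header value can be computed like this:

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def last_modified_value(now, hours_ago=18):
    """Return an HTTP-date string for hours_ago hours before now.

    now must be a timezone-aware datetime; it is converted to UTC first.
    """
    stamp = now.astimezone(timezone.utc) - timedelta(hours=hours_ago)
    # usegmt=True emits the "GMT" zone name that HTTP dates require.
    return format_datetime(stamp, usegmt=True)
```

A request served at 5:12 pm GMT on Mar 5, 2001 would get `Last-Modified: Sun, 04 Mar 2001 23:12:00 GMT`. Because the offset is fixed, adding 18 hours to the date an engine displays gives the time of its visit.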

    Air

    5:36 pm on Mar 5, 2001 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member



    That starts to confirm it for me: NL must use the date as a trigger for more frequent spidering. When you consider who they are really serving, it makes sense that they would do it that way. To some extent other engines do this too, but they seem to key on page size and/or content change.

    Wouldn't mind hearing back if you see any changes in spidering from other engines; it might help determine which are sensitive to the Last-Modified date alone as a trigger for more frequent spidering.

    If, as you said, your content doesn't change all that often, you may just have a controlled experiment on your hands.
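
One way to watch for the spidering changes discussed here would be to tally spider requests per day from the server logs. This is a minimal sketch only: it assumes the log has already been parsed into (day, user_agent) pairs, and the names `SPIDER_TOKENS` and `visits_per_day` are invented for the example.

```python
from collections import Counter

# Stock UA tokens to look for; taken from the list earlier in the thread.
SPIDER_TOKENS = ("Gulliver", "FAST-WebCrawler", "Googlebot", "Slurp", "Scooter")


def visits_per_day(entries, tokens=SPIDER_TOKENS):
    """Count requests per (day, spider token) from (day, user_agent) pairs."""
    counts = Counter()
    for day, user_agent in entries:
        lowered = user_agent.lower()
        for token in tokens:
            if token.lower() in lowered:
                counts[(day, token)] += 1
                break  # attribute each request to at most one spider
    return counts
```

Comparing the daily counts from before and after the Last-Modified change would show whether a given engine started visiting more often. Visits made with non-stock User Agents would be missed by this tally.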

    startup

    2:01 pm on Mar 6, 2001 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member




    Fast:
    Spidering Frequency:
    "Infrequent but heavy spidering,"

    Have you been able to determine if the heavy spidering is a result of submitting many pages, or is the spider following links?

    Xoc

    4:35 pm on Mar 6, 2001 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member



    Not really. I submit every page on my sites over time. But Fast has spidered to pages that I haven't submitted yet, so they are following links.

    startup

    4:57 pm on Mar 6, 2001 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member



    What I am trying to determine is whether the spider follows the links on its first visit, returns shortly afterward to follow them, or waits until the next spidering cycle to crawl them.
    I am in the process of building a new site and I want to submit the pages as they are built, but I don't want the spider to get a 404 after it follows a new link to a page I am still working on.

    Brett_Tabke

    4:58 pm on Mar 6, 2001 (gmt 0)

    WebmasterWorld Administrator brett_tabke is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



    Fast follows all links. They have hit _every_ link I have anywhere on the planet. It just makes me think they are building up to a big release that is hubs/authorities link vector (aka: pagerank) based.

    Nice post Air.

    han solo

    5:46 pm on Mar 6, 2001 (gmt 0)

    10+ Year Member



    First, I'll add: very, very nice post Air. This goes into my research file for further analysis...

    And I'd like to add to what Brett said about Fast: I have seen some ranks that I know are derived from the hubs/authorities model.

    The only question I have been wondering about is whether, for topic A, you would rank better as an authority on A or as a hub for A.

    Cheers,

    Han Solo

    Air

    12:22 am on Mar 7, 2001 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member



    >Have you been able to determine if the heavy spidering is a result
    >of submitting many pages, or is the spider following links?

    I guess BT has answered it, but just to add one more yes: it follows links relentlessly.

    Brett_Tabke

    5:57 am on Mar 8, 2001 (gmt 0)

    WebmasterWorld Administrator brett_tabke is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



    I looked around over at Danney's place for a while but couldn't find that article on Northern Light. About two years ago, DS had a story on Northern Light's use of page dates.

    grnidone

    8:15 pm on May 15, 2001 (gmt 0)



    This is a great thread and well worth a flag.

    I am wondering how often these attitudes change, or if they do.

    -G

    Hunter

    12:41 am on May 16, 2001 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member



    *Bam*..or does that just work while cooking?

    Brett_Tabke

    6:14 am on May 17, 2001 (gmt 0)

    WebmasterWorld Administrator brett_tabke is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



    They seem to change like the wind, grnidone. About the only constant has been AV's relentless anti-SEO efforts. It's really strange too, because AV could benefit the most of any engine by working with the SEO community. While Ink, Yahoo, Looksmart, Goto, and About.com take away from the SEO community (e.g., digging into our pockets), only AV, Google, and Fast could work with the community to their advantage. Google and Fast appear to have taken that type of approach, while AV has all but declared public war on SEO.
     
