Which spiders do you want crawling your site?

         

ktd121

8:55 pm on Jun 28, 2003 (gmt 0)

10+ Year Member



What are some of the spiders you want to see crawl your site? Which ones are the important ones?

wilderness

9:55 pm on Jun 28, 2003 (gmt 0)

jdMorgan

6:15 am on Jun 29, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



ktd121,

Here are the ones I allow without much in the way of checking on them:

Ask Jeeves
ExactSeek
FAST-WebCrawler
FAST FirstPage retriever
Fluffy the spider
Gigabot
Googlebot
Googlebot-Image
ia_archiver
Libby
Lycos_Spider
MARTINI
Mercator
MSNBOT
NutchOrg
Openfind data gatherer
polybot
Pompos
Robozilla
Scooter
Scrubby
Slurp
surfsafely
Teoma
Teradex Mapper
THUNDERSTONE
Vagabondo
Zealbot
Zyborg

Which are most important depends on your market and location, but Googlebot, FAST, Slurp (Inktomi), Scooter (AltaVista), and Ask Jeeves/Teoma are the must-haves for many.

But if you ask 100 webmasters, you'll get a hundred different answers about "which do you want?"
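For anyone who wants to apply a list like this in a script, here is a minimal sketch in Python. It assumes that a case-insensitive substring match against the User-Agent header is good enough; the function name `is_allowed_spider`, the shortened token list, and the sample user-agent strings are illustrative only. Keep in mind that user-agent strings are easy to fake, so this only tells you who a visitor claims to be.

```python
# Hypothetical allowlist check; the tokens are a subset of the list above.
ALLOWED_TOKENS = (
    "googlebot",
    "slurp",
    "scooter",
    "fast-webcrawler",
    "ask jeeves",
    "teoma",
)

def is_allowed_spider(user_agent: str) -> bool:
    """Return True if the User-Agent contains any allowlisted token."""
    ua = user_agent.lower()
    return any(token in ua for token in ALLOWED_TOKENS)

# Sample strings for illustration; real user-agents vary by version.
print(is_allowed_spider("Googlebot/2.1 (+http://www.googlebot.com/bot.html)"))  # True
print(is_allowed_spider("larbin_2.6.3 (larbin@unspecified.mail)"))              # False
```

A substring match is deliberately loose so that version numbers and contact details in the string do not matter. Anything it rejects is only "not on the list", not necessarily hostile.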

Jim

ktd121

4:31 am on Jun 30, 2003 (gmt 0)

10+ Year Member



There are actually spiders you DON'T want crawling your site?

wilderness

5:21 am on Jun 30, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



> There are actually spiders you DON'T want crawling your site?

There are actually people you DON'T want in your home? ;)

ktd,
I realize you're relatively new here.
If you read the link I provided in the second post of this thread, "A Close to Perfect htaccess", you will see that there are more than a few spiders that many webmasters consider undesirable.

In addition, this thread offers some explanations:
[webmasterworld.com...]

As does an earlier reply of mine to sanuk in another thread tonight.

Don

ktd121

5:56 am on Jun 30, 2003 (gmt 0)

10+ Year Member



Why wouldn't you want these spiders crawling your site? What harm do they do? Can they actually hurt your rankings?

jdMorgan

6:22 am on Jun 30, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



ktd,

> Why wouldn't you want these spiders crawling your site? What harm do they do? Can they actually hurt your rankings?

Unless they place an extreme load on your server, no, they can't generally hurt your rankings. However, they might:

  • Burn up your bandwidth budget for no benefit. (How does 20 pages per second on a dynamic site with 2,500,000 possible pages sound? It would only take about 35 hours...)
  • Slow down your server for legitimate visitors, making them go elsewhere.
  • Fill your log files with errors while attempting to fetch non-existent or secured files.
  • Pollute your log file analysis.
  • Download all of your photos - not good if you're a professional photographer trying to sell them.
  • Download all your files and sell them to someone who competes with you... Who then uses them to hurt your rankings.
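The 35-hour figure in the first bullet is easy to check. Below is a small worked calculation in Python; the crawl rate and page count come from the post, while the average page size (`AVG_PAGE_KB`) is a made-up number used only to show how the bandwidth cost scales.

```python
def crawl_hours(total_pages: int, pages_per_second: float) -> float:
    """Hours needed to fetch every page once at a constant rate."""
    return total_pages / pages_per_second / 3600

hours = crawl_hours(2_500_000, 20)
print(f"{hours:.1f} hours")  # 34.7 hours

# Hypothetical average page size, only to illustrate the bandwidth cost.
AVG_PAGE_KB = 25
total_gb = 2_500_000 * AVG_PAGE_KB / 1_000_000  # decimal gigabytes
print(f"{total_gb:.1f} GB transferred")  # 62.5 GB
```

Plug in your own page count and average page size to see what one unwanted full crawl would cost on your site.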

There are hundreds of other potential problems. Some of the banned user-agents are *very* nasty - either by intent, or because they are very badly coded.

I allow Google, Slurp, and the few others listed above. Other than that, I want to know who they are and what they want before they come in.

However, it is an established tenet of this forum that you may do differently on your site if you wish. Your site may be in a completely different market segment than mine, so who am I to tell you what to do? The title of this thread was "Which spiders do you want...", though, and I answered for me.

HTH,
Jim

ktd121

7:14 am on Jun 30, 2003 (gmt 0)

10+ Year Member



LinkChecker
Alexa (IA Archiver)
Road Runner: The ImageScape Robot
Inktomi Slurp
IBM_Planetwide
Scooter (AltaVista)
MSIECrawler
Googlebot (Google)
Fast-Webcrawler (AllTheWeb)
Fluid Dynamics Search Engine robot
WISENutbot (Looksmart)
Unknown robot (identified by 'crawl')
BaiDuSpider
Jeeves
The World Wide Web Worm
larbin
Unknown robot (identified by 'robot')
The Python Robot

Those are the robots that my site stats show crawling my site. Do you guys see anything I should be worried about?

jdMorgan

5:32 pm on Jun 30, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



ktd121,

larbin is one I absolutely won't allow. Try the WebmasterWorld site search, using those user-agent names as your search phrase, or just look through the back threads on this forum. Some of those others are "iffy" depending on what kind of site you have.

Some of us are permissive, and some have an almost zero-tolerance policy for 'bots which do not identify themselves properly, or do not fetch and obey robots.txt. There is also a nice free script posted here for detecting and automatically blocking rogue 'bots - try a search for "Ban malicious visitors Perl Script."
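To see what "obeying robots.txt" means in practice, here is a minimal sketch using Python's standard `urllib.robotparser` module. The robots.txt content is a hypothetical example, not taken from any real site: it shuts out larbin entirely and keeps every other robot out of `/cgi-bin/`.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block larbin everywhere,
# keep all other robots out of /cgi-bin/.
ROBOTS_TXT = """\
User-agent: larbin
Disallow: /

User-agent: *
Disallow: /cgi-bin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(parser.can_fetch("larbin", "/index.html"))       # False
print(parser.can_fetch("Googlebot", "/index.html"))    # True
print(parser.can_fetch("Googlebot", "/cgi-bin/form"))  # False
```

This is the check a well-behaved robot runs before each fetch. A rogue robot simply skips it, which is why robots.txt alone does not stop the bad ones and server-side blocking is still needed for them.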

Jim