
Sitemaps, Meta Data, and robots.txt Forum

    
Do you have robot.txt on your site?
jbayabas
msg:4113700, 12:24 am on Apr 11, 2010 (gmt 0)

What's in it? I have this:

User-agent: Mediapartners-Google*
Disallow:


Do I need to put robot.txt on my site?
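For reference, the two lines above form a single group: `User-agent` names the crawler (Mediapartners-Google is Google's AdSense crawler) and an empty `Disallow:` blocks nothing. A quick way to sanity-check a file like this is Python's standard `urllib.robotparser`. The sketch below parses the rules from a string instead of fetching them, and the page paths are made up for illustration. One caveat: this particular parser compares crawler names by substring, so it reads the trailing `*` literally and the group does not match `Mediapartners-Google`; with an empty `Disallow`, the answer is "allowed" either way.

```python
from urllib.robotparser import RobotFileParser

# The rules from the post above, parsed from a string instead of fetched over HTTP.
ROBOTS_TXT = """\
User-agent: Mediapartners-Google*
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical paths; an empty Disallow value blocks nothing, so every check is True.
for agent in ("Mediapartners-Google", "Googlebot"):
    for path in ("/", "/articles/page.html"):
        print(agent, path, parser.can_fetch(agent, path))
```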

 

leadegroot
msg:4113819, 10:11 am on Apr 11, 2010 (gmt 0)

If your robots.txt file would be empty, you don't have to have one, but I tend to keep an empty one anyway, just to keep the error log clean.

If your only entry is 'this bot has full access', then you don't need one either.
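To see why those two cases are equivalent, here is a minimal sketch using Python's standard `urllib.robotparser`: an empty file and a file whose only group grants full access give the same answer for every crawler and path. This also matches how crawlers generally treat a missing file (a 404 for /robots.txt): no restrictions. The helper name `allowed` and the bot name `SomeOtherBot` are hypothetical, chosen only for the example.

```python
from urllib.robotparser import RobotFileParser

def allowed(robots_txt: str, agent: str, path: str) -> bool:
    """Return whether `agent` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)

EMPTY = ""
FULL_ACCESS = "User-agent: Mediapartners-Google\nDisallow:\n"

# Both files permit every crawler to fetch every path.
for agent in ("Mediapartners-Google", "SomeOtherBot"):
    print(agent, allowed(EMPTY, agent, "/page.html"), allowed(FULL_ACCESS, agent, "/page.html"))
```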

(Note that the file is called robots.txt, with an 's'. If you misname the file, the bots will not find it.)
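The name matters because crawlers request exactly one URL per host: the all-lowercase `/robots.txt` at the top level of the site. A misnamed file, or one placed in a subdirectory, is simply never requested. A small sketch, using only Python's `urllib.parse`, of how a crawler derives that URL from any page on the site; the `robots_url` helper and the example.com URLs are placeholders.

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url: str) -> str:
    """Return the robots.txt URL a crawler would request for page_url's host."""
    parts = urlsplit(page_url)
    # Keep scheme and host (including any port); replace path, query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_url("https://www.example.com/blog/post.html?id=3#top"))
print(robots_url("http://example.com:8080/a/b/"))
```

Note that the scheme, host and port are all kept, so each of those combinations has its own robots.txt.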

Edge
msg:4113852, 1:31 pm on Apr 11, 2010 (gmt 0)

User-agent: Googlebot
Disallow: /bunch-of-stuff
Allow: /

User-agent: Googlebot-Mobile
Disallow: /bunch-of-stuff
Allow: /


User-agent: Mediapartners-Google
Disallow: /bunch-of-stuff
Allow: /

User-agent: Googlebot-Image
Disallow: /

User-agent: asterias
Disallow: /

User-agent: aibot
Disallow: /

User-agent: Alexibot
Disallow: /

User-agent: asterias
Disallow: /

User-agent: BackDoorBot
Disallow: /

User-agent: BecomeBot
Disallow: /

User-agent: Bloodhound
Disallow: /

User-agent: BotALot
Disallow: /

User-agent: BuiltBotTough
Disallow: /

User-agent: Bullseye
Disallow: /

User-agent: BunnySlippers
Disallow: /

User-agent: CheeseBot
Disallow: /

User-agent: CherryPicker
Disallow: /

User-agent: CherryPickerSE
Disallow: /

User-agent: CherryPickerElite
Disallow: /

User-agent: cosmos
Disallow: /

User-agent: Crescent
Disallow: /

User-agent: Crescent Internet ToolPak
Disallow: /

User-agent: combine
Disallow: /

User-agent: Copernic
Disallow: /

User-agent: CopyRightCheck
Disallow: /

User-agent: DittoSpyder
Disallow: /

User-agent: Down2Web
Disallow: /

User-agent: dumbot
Disallow: /

User-agent: e-collector
Disallow: /

User-agent: Email
Disallow: /

User-agent: EmailCollector
Disallow: /

User-agent: EmailWolf
Disallow: /

User-agent: EmailSiphon
Disallow: /

User-agent: Enterprise_Search
Disallow: /

User-agent: es
Disallow: /

User-agent: EroCrawler
Disallow: /

User-agent: ExtractorPro
Disallow: /

User-agent: Exabot
Disallow: /

User-agent: FairAd Client
Disallow: /

User-agent: Flaming AttackBot
Disallow: /

User-agent: Foobot
Disallow: /

User-agent: Francis
Disallow: /

User-agent: FreeFind
Disallow: /

User-agent: Gaisbot
Disallow: /

User-agent: grub
Disallow: /

User-agent: grub-client
Disallow: /

User-agent: Googlebot
Disallow: /*.gif$

User-agent: Hatena Antenna
Disallow: /

User-agent: Harvest
Disallow: /

User-agent: Heritrix
Disallow: /


User-agent: hloader
Disallow: /

User-agent: htmlgobble
Disallow: /

User-agent: httplib
Disallow: /

User-agent: HTTrack
Disallow: /

User-agent: humanlinks
Disallow: /

User-agent: ia_archiver
Disallow: /

User-agent: InfoNaviRobot
Disallow: /

User-agent: JennyBot
Disallow: /

User-agent: JavaBee
Disallow: /

User-agent: JoBo
Disallow: /

User-agent: Java
Disallow: /

User-agent: Jetbot/
Disallow: /

User-agent: Jetbot
Disallow: /

User-agent: Kenjin Spider
Disallow: /

User-agent: Larbin
Disallow: /

User-agent: LexiBot
Disallow: /

User-agent: LinkextractorPro
Disallow: /

User-agent: LinkWalker
Disallow: /

User-agent: LNSpiderguy
Disallow: /

User-agent: lwp-trivial
Disallow: /

User-agent: Mata Hari
Disallow: /

User-agent: MIIxpc
Disallow: /

User-agent: Microsoft URL Control
Disallow: /

User-agent: moget
Disallow: /

User-agent: naver
Disallow: /

User-agent: NetAnts
Disallow: /

User-agent: NICErsPRO
Disallow: /

User-agent: Nutch
Disallow: /

User-agent: Offline
Disallow: /

User-agent: Offline Explorer
Disallow: /

User-agent: Openbot
Disallow: /

User-agent: Openfind data gathere
Disallow: /

User-agent: Openfind
Disallow: /

User-agent: PerMan
Disallow: /

User-agent: PentonMediabot
Disallow: /

User-agent: psbot
Disallow: /

User-agent: ProPowerBot
Disallow: /

User-agent: ProWebWalker
Disallow: /

User-agent: Robofox
Disallow: /

User-agent: SiteSnagger
Disallow: /

User-agent: SiteVigil
Disallow: /

User-agent: Sohu
Disallow: /

User-agent: tarspider
Disallow: /

User-agent: The Intraformant
Disallow: /

User-agent: Teleport
Disallow: /

User-agent: Teleport Pro
Disallow: /

User-agent: Telesoft
Disallow: /

User-agent: Twiceler
Disallow: /


User-agent: URL_Spider_Pro
Disallow: /

User-agent: w3mir
Disallow: /

User-agent: WebAuto
Disallow: /

User-agent: webbandit
Disallow: /

User-agent: WebCapture
Disallow: /

User-agent: WebCopier
Disallow: /

User-agent: webmirror
Disallow: /

User-agent: Website Quester
Disallow: /

User-agent: Webster
Disallow: /

User-agent: Web Downloader
Disallow: /

User-agent: WebFetcher
Disallow: /

User-agent: WebEnhancer
Disallow: /

User-agent: Webster Pro
Disallow: /

User-agent: Wget
Disallow: /

User-agent: WebSauger
Disallow: /

User-agent: WebStripper
Disallow: /

User-agent: WebWasher
Disallow: /

User-agent: webvac
Disallow: /

User-agent: WebZIP
Disallow: /

User-agent: WWW-Collector-E
Disallow: /

User-agent: Xenu's Link Sleuth
Disallow: /

User-agent: Xenu's
Disallow: /

User-agent: Zeus
Disallow: /

User-agent: Zeus Link Scout
Disallow: /
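
Two parts of the file above depend on behavior that not every parser shares: `Disallow: /*.gif$` uses the `*` and `$` wildcards, and Googlebot appears in two separate groups. Under RFC 9309 and Google's documentation, groups naming the same crawler are combined, and the longest matching path wins, with `Allow` winning a tie. Python's `urllib.robotparser`, by contrast, takes the first matching rule and compares paths literally, so it would not interpret the gif rule. Below is a minimal sketch of the longest-match rule, assuming the two Googlebot groups are merged. `rule_to_regex` and `is_allowed` are illustrative names, not a library API, and the sketch skips details such as percent-encoding normalization.

```python
import re

def rule_to_regex(path_pattern: str) -> re.Pattern:
    """Translate a robots.txt path ('*' = any characters, trailing '$' = end of URL) to a regex."""
    anchored = path_pattern.endswith("$")
    if anchored:
        path_pattern = path_pattern[:-1]
    body = ".*".join(re.escape(piece) for piece in path_pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_allowed(rules: list[tuple[str, str]], path: str) -> bool:
    """Longest matching rule wins; on equal length, Allow wins. No match means allowed."""
    best_len, best_allow = -1, True
    for directive, pattern in rules:
        if not pattern:
            continue  # an empty Allow/Disallow value matches nothing
        if rule_to_regex(pattern).match(path):  # match() anchors at the start of the path
            allow = directive == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and allow):
                best_len, best_allow = len(pattern), allow
    return best_allow

# Edge's two Googlebot groups, merged into one rule list.
GOOGLEBOT_RULES = [
    ("disallow", "/bunch-of-stuff"),
    ("allow", "/"),
    ("disallow", "/*.gif$"),
]

for path in ("/index.html", "/bunch-of-stuff/page.html", "/images/logo.gif", "/images/logo.gif?v=2"):
    print(path, is_allowed(GOOGLEBOT_RULES, path))
```

The last path shows what the `$` does: `/images/logo.gif?v=2` does not end in `.gif`, so only `Allow: /` matches and the URL stays crawlable.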

tangor
msg:4113929, 5:43 pm on Apr 11, 2010 (gmt 0)

Edge... in my experience, if Google is allowed in, they assume all the rest of their little bots are allowed too. So if you DON'T want Googlebot-Mobile (for example), you'd have to disallow that one explicitly.

I generally whitelist (allow) a handful of useful bots and disallow all others. It makes for a much shorter robots.txt!
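The whitelist approach can be sketched as a short generator: one empty-`Disallow` group per trusted crawler, then a catch-all `User-agent: *` group that blocks everything else. The bot names (`Googlebot`, `bingbot`) are only example choices and `whitelist_robots_txt` is a hypothetical helper. The check uses Python's `urllib.robotparser`, which applies a named group before falling back to the `*` group. Keep in mind that robots.txt only restrains crawlers that choose to obey it.

```python
from urllib.robotparser import RobotFileParser

def whitelist_robots_txt(allowed_agents: list[str]) -> str:
    """Build a robots.txt that admits only the named crawlers and blocks all others."""
    groups = [f"User-agent: {agent}\nDisallow:" for agent in allowed_agents]
    groups.append("User-agent: *\nDisallow: /")  # every crawler not named above
    return "\n\n".join(groups) + "\n"

text = whitelist_robots_txt(["Googlebot", "bingbot"])
print(text)

parser = RobotFileParser()
parser.parse(text.splitlines())
for agent in ("Googlebot", "bingbot", "HTTrack"):
    print(agent, parser.can_fetch(agent, "/page.html"))
```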

Edge
msg:4114502, 6:56 pm on Apr 12, 2010 (gmt 0)

Each Googlebot crawls different data/media. There are places I don't want a particular bot to go, and other places where it's OK.

Different rules for each Google bot.
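Per-crawler rules like these can be generated from a mapping of crawler name to blocked paths. The `/photos/` path and the `render_robots_txt` helper are hypothetical; `/bunch-of-stuff` comes from the file earlier in the thread. One ordering detail matters when checking the result with Python's `urllib.robotparser`: it uses the first group whose name appears inside the crawler's name, so `Googlebot-Image` is listed before `Googlebot`. Otherwise the Googlebot group would also capture Googlebot-Image. Google's own crawlers pick the most specific matching group regardless of order.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical per-crawler rules: crawler name -> list of Disallow paths.
# More specific names come first, because robotparser uses the first group
# whose name is a substring of the crawler name being checked.
RULES = {
    "Googlebot-Image": ["/photos/"],
    "Googlebot": ["/bunch-of-stuff"],
}

def render_robots_txt(rules: dict[str, list[str]]) -> str:
    """Render one User-agent group per crawler; an empty path list means full access."""
    groups = []
    for agent, paths in rules.items():
        lines = [f"User-agent: {agent}"]
        lines += [f"Disallow: {path}" for path in paths] or ["Disallow:"]
        groups.append("\n".join(lines))
    return "\n\n".join(groups) + "\n"

parser = RobotFileParser()
parser.parse(render_robots_txt(RULES).splitlines())

print(parser.can_fetch("Googlebot-Image", "/photos/cat.jpg"))   # image crawler kept out of /photos/
print(parser.can_fetch("Googlebot", "/photos/cat.jpg"))         # web crawler may fetch it
print(parser.can_fetch("Googlebot", "/bunch-of-stuff/x.html"))  # but not /bunch-of-stuff
```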

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
© Webmaster World 1996-2014 all rights reserved