Forum Library, Charter, Moderators: goodroi

Sitemaps, Meta Data, and robots.txt Forum

    
Do you have robot.txt on your site?
jbayabas

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4113942 posted 12:24 am on Apr 11, 2010 (gmt 0)

What's in it? I have this:

User-agent: Mediapartners-Google*
Disallow:


Do I need to put robot.txt on my site?

 

leadegroot

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 4113942 posted 10:11 am on Apr 11, 2010 (gmt 0)

If the robots.txt file is empty, you don't have to have it, but I tend to use an empty one just to keep the error log clean.

If your only entry is 'this bot has full access' then you don't need one.

(Note that the file is called robots.txt, with an 's'. If you misname the file, the bots will not find it.)
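
If you want to check this yourself, here is a minimal sketch using Python's standard urllib.robotparser module. The helper name `allowed` and the example.com URL are made up for illustration, and the rule is written without the trailing asterisk because this particular parser matches bot names by plain substring. It only shows how this one parser treats an empty file and an empty Disallow line; real crawlers may differ in detail.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Case 1: a completely empty robots.txt.
EMPTY = ""

# Case 2: a single group whose only rule is an empty Disallow,
# i.e. "this bot has full access".
FULL_ACCESS = "User-agent: Mediapartners-Google\nDisallow:\n"

# Hypothetical URL, used only for the demonstration.
url = "http://example.com/any/page.html"

print(allowed(EMPTY, "Mediapartners-Google", url))
print(allowed(FULL_ACCESS, "Mediapartners-Google", url))
```

Both calls should report that the page may be fetched, which is why a file containing only that entry adds nothing.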

Edge

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 4113942 posted 1:31 pm on Apr 11, 2010 (gmt 0)

User-agent: Googlebot
Disallow: /bunch-of-stuff
Allow: /

User-agent: Googlebot-Mobile
Disallow: /bunch-of-stuff
Allow: /


User-agent: Mediapartners-Google
Disallow: /bunch-of-stuff
Allow: /

User-agent: Googlebot-Image
Disallow: /

User-agent: asterias
Disallow: /

User-agent: aibot
Disallow: /

User-agent: Alexibot
Disallow: /

User-agent: BackDoorBot
Disallow: /

User-agent: BecomeBot
Disallow: /

User-agent: Bloodhound
Disallow: /

User-agent: BotALot
Disallow: /

User-agent: BuiltBotTough
Disallow: /

User-agent: Bullseye
Disallow: /

User-agent: BunnySlippers
Disallow: /

User-agent: CheeseBot
Disallow: /

User-agent: CherryPicker
Disallow: /

User-agent: CherryPickerSE
Disallow: /

User-agent: CherryPickerElite
Disallow: /

User-agent: cosmos
Disallow: /

User-agent: Crescent
Disallow: /

User-agent: Crescent Internet ToolPak
Disallow: /

User-agent: combine
Disallow: /

User-agent: Copernic
Disallow: /

User-agent: CopyRightCheck
Disallow: /

User-agent: DittoSpyder
Disallow: /

User-agent: Down2Web
Disallow: /

User-agent: dumbot
Disallow: /

User-agent: e-collector
Disallow: /

User-agent: Email
Disallow: /

User-agent: EmailCollector
Disallow: /

User-agent: EmailWolf
Disallow: /

User-agent: EmailSiphon
Disallow: /

User-agent: Enterprise_Search
Disallow: /

User-agent: es
Disallow: /

User-agent: EroCrawler
Disallow: /

User-agent: ExtractorPro
Disallow: /

User-agent: Exabot
Disallow: /

User-agent: FairAd Client
Disallow: /

User-agent: Flaming AttackBot
Disallow: /

User-agent: Foobot
Disallow: /

User-agent: Francis
Disallow: /

User-agent: FreeFind
Disallow: /

User-agent: Gaisbot
Disallow: /

User-agent: grub
Disallow: /

User-agent: grub-client
Disallow: /

User-agent: Googlebot
Disallow: /*.gif$

User-agent: Hatena Antenna
Disallow: /

User-agent: Harvest
Disallow: /

User-agent: Heritrix
Disallow: /


User-agent: hloader
Disallow: /

User-agent: htmlgobble
Disallow: /

User-agent: httplib
Disallow: /

User-agent: HTTrack
Disallow: /

User-agent: humanlinks
Disallow: /

User-agent: ia_archiver
Disallow: /

User-agent: InfoNaviRobot
Disallow: /

User-agent: JennyBot
Disallow: /

User-agent: JavaBee
Disallow: /

User-agent: JoBo
Disallow: /

User-agent: Java
Disallow: /

User-agent: Jetbot/
Disallow: /

User-agent: Jetbot
Disallow: /

User-agent: Kenjin Spider
Disallow: /

User-agent: Larbin
Disallow: /

User-agent: LexiBot
Disallow: /

User-agent: LinkextractorPro
Disallow: /

User-agent: LinkWalker
Disallow: /

User-agent: LNSpiderguy
Disallow: /

User-agent: lwp-trivial
Disallow: /

User-agent: Mata Hari
Disallow: /

User-agent: MIIxpc
Disallow: /

User-agent: Microsoft URL Control
Disallow: /

User-agent: moget
Disallow: /

User-agent: naver
Disallow: /

User-agent: NetAnts
Disallow: /

User-agent: NICErsPRO
Disallow: /

User-agent: Nutch
Disallow: /

User-agent: Offline
Disallow: /

User-agent: Offline Explorer
Disallow: /

User-agent: Openbot
Disallow: /

User-agent: Openfind data gathere
Disallow: /

User-agent: Openfind
Disallow: /

User-agent: PerMan
Disallow: /

User-agent: PentonMediabot
Disallow: /

User-agent: psbot
Disallow: /

User-agent: ProPowerBot
Disallow: /

User-agent: ProWebWalker
Disallow: /

User-agent: Robofox
Disallow: /

User-agent: SiteSnagger
Disallow: /

User-agent: SiteVigil
Disallow: /

User-agent: Sohu
Disallow: /

User-agent: tarspider
Disallow: /

User-agent: The Intraformant
Disallow: /

User-agent: Teleport
Disallow: /

User-agent: Teleport Pro
Disallow: /

User-agent: Telesoft
Disallow: /

User-agent: Twiceler
Disallow: /


User-agent: URL_Spider_Pro
Disallow: /

User-agent: w3mir
Disallow: /

User-agent: WebAuto
Disallow: /

User-agent: webbandit
Disallow: /

User-agent: WebCapture
Disallow: /

User-agent: WebCopier
Disallow: /

User-agent: webmirror
Disallow: /

User-agent: Website Quester
Disallow: /

User-agent: Webster
Disallow: /

User-agent: Web Downloader
Disallow: /

User-agent: WebFetcher
Disallow: /

User-agent: WebEnhancer
Disallow: /

User-agent: Webster Pro
Disallow: /

User-agent: Wget
Disallow: /

User-agent: WebSauger
Disallow: /

User-agent: WebStripper
Disallow: /

User-agent: WebWasher
Disallow: /

User-agent: webvac
Disallow: /

User-agent: WebZIP
Disallow: /

User-agent: WWW-Collector-E
Disallow: /

User-agent: Xenu's Link Sleuth
Disallow: /

User-agent: Xenu's
Disallow: /

User-agent: Zeus
Disallow: /

User-agent: Zeus Link Scout
Disallow: /
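
A quick way to sanity-check a file like this is to feed a short excerpt of it to Python's standard urllib.robotparser. This is only a sketch: the helper `allowed`, the bot name SomeOtherBot, and the example.com URLs are made up for illustration. Note that this parser applies the first matching rule within a group, which is not necessarily how every crawler resolves Allow against Disallow.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Two groups copied from the file above: one partial block, one total block.
EXCERPT = """\
User-agent: Googlebot
Disallow: /bunch-of-stuff
Allow: /

User-agent: Wget
Disallow: /
"""

# Hypothetical URLs on a placeholder host.
blocked_page = "http://example.com/bunch-of-stuff/page.html"
open_page = "http://example.com/index.html"

print(allowed(EXCERPT, "Googlebot", blocked_page))   # partial block applies
print(allowed(EXCERPT, "Googlebot", open_page))      # rest of the site is open
print(allowed(EXCERPT, "Wget", open_page))           # blocked everywhere
print(allowed(EXCERPT, "SomeOtherBot", open_page))   # unlisted, so no rule applies
```

The last call is the important one: a bot that is not named anywhere in the file, and for which there is no `User-agent: *` group, is not restricted at all.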

tangor

WebmasterWorld Senior Member, Top Contributor of All Time, 5+ Year Member, Top Contributor of the Month



 
Msg#: 4113942 posted 5:43 pm on Apr 11, 2010 (gmt 0)

Edge... it's been my experience that if Googlebot is allowed in, Google assumes all the rest of its little bots are allowed too. Hence, if you DON'T want Googlebot-Mobile (for example), you'd have to disallow that one explicitly.

I generally whitelist (allow) a handful of useful bots and disallow all others. Makes for a much shorter robots.txt!
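
For anyone who hasn't seen the whitelist style, here is a rough sketch checked with Python's standard urllib.robotparser. The two allowed bots are placeholders, not an actual recommended list, and the helper `allowed` and the example.com URL are likewise made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Whitelist layout: named bots get an empty Disallow (full access),
# and the catch-all "*" group shuts out everyone else.
# Googlebot and bingbot are placeholder choices for this sketch.
WHITELIST = """\
User-agent: Googlebot
Disallow:

User-agent: bingbot
Disallow:

User-agent: *
Disallow: /
"""

# Hypothetical URL on a placeholder host.
url = "http://example.com/index.html"

for agent in ("Googlebot", "bingbot", "Wget", "EmailCollector"):
    print(agent, allowed(WHITELIST, agent, url))
```

The whole file is three groups instead of a hundred-odd, and bots that were never listed by name are still covered by the catch-all. Of course this only affects bots that bother to read and obey robots.txt.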

Edge

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 4113942 posted 6:56 pm on Apr 12, 2010 (gmt 0)

Each Googlebot crawls different data/media. There are places where I don't want a particular bot and other places that are OK.

Different rules for each Google bot.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved