
Sitemaps, Meta Data, and robots.txt Forum

    
A list of American search engines.
Search engine list, American.
Michel Samuel
9:46 am on Jun 3, 2007 (gmt 0)

I need to generate a new robots.txt file and I pretty much need to deny American search engine traffic.

Does anyone have a list?

 

jbinbpt
11:04 am on Jun 3, 2007 (gmt 0)

A Google search brings up many lists. Could you explain your reasons for excluding American SEs?

goodroi
3:02 pm on Jun 3, 2007 (gmt 0)

Technically google.com is an American search engine, but it can also generate significant international traffic.

I think you might instead want to block American IPs so you don't get visitors from America. Let us know what you are trying to achieve so we can make sure we give the right suggestions.

Michel Samuel
10:18 am on Jun 4, 2007 (gmt 0)

Actually I have blocked the entire world save for my target market.

My reason for excluding the American search engines is to better define where my traffic comes from. To be honest, there are too many bots, and even the 404 hits on my site add up to too much traffic. I can't process American orders and I can't ship my product to the United States.

So I lower my bandwidth expenses and raise my overall profit margin.

I thought this would be a good opportunity to start a project: a master search engine list by country.

That way people could further define what traffic they want by where it comes from.

goodroi
7:31 pm on Jun 4, 2007 (gmt 0)

"That way people could further define what traffic they want by where it comes from."

I understand what you are saying, and it is helpful to have the option to block traffic from different geographic areas. Some products cannot be exported for a variety of reasons.

The unique issue with banning the American marketplace is that many search engines started in the US. Thus the .com domain is intended for generic searches and is not specific to a single country the way the .co.uk, .de, .fr etc. domains are. For example, google.com and yahoo.com serve mostly American users, but they also get significant international traffic since those are the generic domains.

My vote would be for blocking by IP. IMHO this would provide better geographic control.
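
For anyone following along, a minimal .htaccess sketch of IP-based blocking under Apache 2.x might look like the lines below; the address ranges are made-up placeholders, not a real list of American IPs:

# Allow everyone, then deny the unwanted ranges (placeholder ranges only)
Order Allow,Deny
Allow from all
Deny from 192.0.2.0/24
Deny from 198.51.100.0/24

Flipping this around (Order Deny,Allow with Deny from all plus Allow lines for your target market) gives the permit-style setup mentioned later in the thread.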

Michel Samuel
9:53 am on Jun 6, 2007 (gmt 0)

I have a GeoIP script that handles blocking for my sites.
It works on a permit basis instead of a deny basis.

(But it is not perfect.)

What I have done is locate the IP addresses of the bots that I do not want and deny them in my .htaccess file. In my custom 403 page (classic ASP, with Option Explicit) I set Response.Status so that a 404 is sent instead.
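
For reference, a rough sketch of that kind of setup, assuming Apache 2.x and a classic ASP error page; the file name, the IP range and the page body are illustrative guesses, not the poster's actual configuration.

.htaccess (deny an example range and route blocked requests to a custom page):

# Deny one unwanted bot range and hand 403s to blocked.asp
Deny from 203.0.113.0/24
ErrorDocument 403 /blocked.asp

blocked.asp (report a 404 instead of a 403 to the blocked client):

<%@ Language="VBScript" %>
<%
Option Explicit
' Override the status so the bot sees Not Found rather than Forbidden
Response.Status = "404 Not Found"
%>
Not Found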

The drop in bandwidth has been very noticeable.

I am currently only allowing the bots for these search engines: HenriLeRobotMirago, Seek and Voila.

Profit margin has increased.

Next project: I must compile a list of well-known proxy servers in Canada, France, Switzerland, Luxembourg and Belgium. Once I do that, I think I will be finished.

webdoctor
7:49 pm on Jun 7, 2007 (gmt 0)

"google.com and yahoo.com serve mostly American users, but they also get significant international traffic since those are the generic domains."

Does google.com really have *significant* international traffic?

I've tried visiting www.google.com when visiting various European countries, and without exception I've been redirected to the relevant www.google.TLD (.co.uk / .de / .fr / .it / .se / ...).

Do you have any statistics on non-US users of www.google.com?

goodroi
1:22 pm on Jun 8, 2007 (gmt 0)

Sorry webdoctor, I don't know of any detailed public stats on google.com users.

Just to clarify, when I said significant I did not mean a majority of traffic. Google.com and yahoo.com simply are not composed of 100% American traffic.

incrediBILL
12:49 am on Jun 22, 2007 (gmt 0)

"I need to generate a new robots.txt file and I pretty much need to deny American search engine traffic."

You really don't need a list to block.

After your list of allowed robots just add this and all the rest will be blocked:

User-agent: *
Disallow: /
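
To tie this back to the allowlist idea earlier in the thread, a complete robots.txt along these lines might look like the sketch below. The user-agent tokens are assumptions (check each engine's documentation for the exact string), and an empty Disallow line means that bot may crawl everything:

# Allow specific crawlers (tokens are assumptions)
User-agent: HenriLeRobotMirago
Disallow:

User-agent: VoilaBot
Disallow:

# Block everything else
User-agent: *
Disallow: /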
