Forum Library, Charter, Moderators: goodroi

Sitemaps, Meta Data, and robots.txt Forum

    
How to block these URLs
vreelmistee




msg:3690215
 6:46 am on Jul 4, 2008 (gmt 0)

Hi All,

I have around 1000+ pages of my site indexed in Google. An affiliate campaign is also running for the site, and a few of the affiliate URLs from this campaign have been indexed in Google.

I have already added this rule to my robots.txt:

-----------------------------------
Disallow: /?agent_camp=
-------------------------

but these affiliate URLs are still being crawled.
Can anyone suggest how to stop crawlers from crawling this type of URL?

http://www.example.com/?utm_source=example.com&utm_medium=cpc
http://www.example.com/index.html?agent_add=15975345&agent_bann=12458475
http://www.example.com/index.html?agent_add=15975345&agent_bann=12495751
http://www.example.com/index.html?agent_add=15975345&agent_bann=20423001
http://www.example.com/index.html?agent_add=16643001&agent_bann=213%2086001
http://www.example.com/index.html?agent_add=16643001&agent_bann=213%2087001
http://www.example.com/index.html?agent_add=16643001&agent_bann=213%2088001
http://www.example.com/index.html?agent_add=21562001&agent_bann=21584001
http://www.example.com/index.html?agent_add=21880001&agent_bann=21881001

[edited by: encyclo at 4:30 pm (utc) on July 5, 2008]
[edit reason] switched to example.com, fixed formatting [/edit]

 

Receptional Andy




msg:3691063
 1:59 pm on Jul 5, 2008 (gmt 0)

Robots exclusion is prefix matching, and the major engines support wildcards, so you can use lines like the ones below to block your affiliate URLs:

User-agent: *
Disallow: /*?agent_add
Disallow: /*?utm_source

Some would advise specifying the user-agent rather than blocking all spiders, since some don't support wildcards. I don't bother since the smaller engines are not significant and I believe most of them will treat the asterisk literally in any case.
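
To make the prefix-plus-wildcard matching concrete, here is a minimal Python sketch. It assumes the wildcard extensions the major engines document, where `*` matches any run of characters and a trailing `$` anchors the end of the URL. `rule_matches` is a made-up helper name, not part of any library, and real crawlers may differ in the details.

```python
import re
from urllib.parse import urlsplit


def rule_matches(pattern: str, url: str) -> bool:
    """Return True if a Disallow pattern matches the URL's path plus query.

    Assumption: '*' is a wildcard and a trailing '$' anchors the end,
    as in the extensions supported by the major engines.
    """
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query

    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    # Escape the literal pieces and join them with '.*' for each wildcard.
    regex = ".*".join(re.escape(piece) for piece in body.split("*"))
    if anchored:
        regex += "$"
    # re.match only anchors at the start, which gives prefix matching.
    return re.match(regex, target) is not None


affiliate = "http://www.example.com/index.html?agent_add=15975345&agent_bann=12458475"
tracking = "http://www.example.com/?utm_source=example.com&utm_medium=cpc"

print(rule_matches("/?agent_camp=", affiliate))   # False: the original rule misses it
print(rule_matches("/*?agent_add", affiliate))    # True
print(rule_matches("/*?utm_source", tracking))    # True
print(rule_matches("/*?agent_add", "http://www.example.com/index.html"))  # False
```

The original `Disallow: /?agent_camp=` fails because the match starts at the beginning of the path, and none of the listed URLs begin with `/?agent_camp=`.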

g1smd




msg:3696323
 2:26 pm on Jul 11, 2008 (gmt 0)

Your rule does not block the URLs you want to block.

You need to "match all from the left" until you have specified enough to cover all the URLs that need to be blocked, without still matching any URLs that need to be indexed.

See also: [webmasterworld.com...] (half way down the page).
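
A rough sketch of "matching from the left" without wildcards, for spiders that only do literal prefix matching. The two prefixes are an assumption about what would cover the URLs listed in the first post, and `is_blocked` is a hypothetical helper used only for illustration.

```python
from urllib.parse import urlsplit

# Assumed literal prefixes: long enough to cover the affiliate URLs,
# short enough not to catch the plain pages that should stay indexed.
DISALLOW_PREFIXES = ["/index.html?agent_add=", "/?utm_source="]


def is_blocked(url: str, prefixes=DISALLOW_PREFIXES) -> bool:
    """Return True if the URL's path plus query starts with any prefix."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return any(target.startswith(prefix) for prefix in prefixes)


print(is_blocked("http://www.example.com/index.html?agent_add=21880001&agent_bann=21881001"))  # True
print(is_blocked("http://www.example.com/?utm_source=example.com&utm_medium=cpc"))             # True
print(is_blocked("http://www.example.com/index.html"))                                         # False
```

Note that literal prefixes like these would not catch the same parameters on other pages, which is where the wildcard form is more convenient.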


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved