
Using robots.txt to Exclude Duplicate Database Pages

     
2:09 pm on Nov 11, 2006 (gmt 0) - Junior Member

I've noticed that many variations of the database pages are getting indexed and usually turn supplemental. Would it be a good idea to add Disallow rules for these URL types to robots.txt?

URLs like these:
Disallow: /*reply_to_ad.cfm*
Disallow: /*sort_by=*
Disallow: /*my_chk_list*
Disallow: /*&lst_start=*
Disallow: /*session_key*

These all end up as supplemental results and are probably a drag on the rankings of my most important pages. Good idea? Any others I should add?

11:51 pm on Nov 11, 2006 (gmt 0) - tedster (Senior Member)

I think this is a solid approach - and I use it on several websites. I would rather CHOOSE which URL Google indexes for a given bit of content, and in some cases even which sorted or filtered version of the data gets spidered. Why let Googlebot run in circles when you can help it go straight for the good stuff?
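
For example, here is a minimal sketch of that idea, using the sort_by, lst_start and session_key parameters from the first post and a made-up /widgets.cfm listing page standing in for the real one:

User-agent: Googlebot
# Block every sorted variation, e.g. /widgets.cfm?sort_by=price
Disallow: /*sort_by=
# Block every paging offset, e.g. /widgets.cfm?cat=5&lst_start=20
Disallow: /*&lst_start=
# Block session-tracked duplicates, e.g. /widgets.cfm?session_key=abc123
Disallow: /*session_key

The plain /widgets.cfm page matches none of those patterns, so it stays crawlable and Google is steered toward the one version you actually want indexed. (The trailing * in the patterns from the first post is harmless but unnecessary, since Disallow rules are prefix matches and anything may follow the matched part anyway.)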

12:25 am on Nov 12, 2006 (gmt 0) - g1smd (Senior Member)

Yes, do add those, but ONLY in the User-agent: Googlebot section - other bots do not yet understand wildcard patterns.

Also be aware that if you have a User-agent: Googlebot section, ALL instructions for Google must go in that section. The User-agent: * section is completely ignored by Google when a User-agent: Googlebot section is included.
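
A quick sketch of what that means in practice - the /admin/ rule here is just a placeholder for whatever ordinary rules the file already has:

User-agent: *
# Other bots are governed by this section only
Disallow: /admin/

User-agent: Googlebot
# Google ignores the section above once this one exists,
# so repeat every plain rule you still want applied to Google...
Disallow: /admin/
# ...and then add the Googlebot-only wildcard rules
Disallow: /*sort_by=
Disallow: /*session_key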

12:44 am on Nov 12, 2006 (gmt 0) - tedster (Senior Member)

Actually, Yahoo announced last Friday that Slurp now supports wildcards in robots.txt, too.

[webmasterworld.com...]

5:03 pm on Nov 12, 2006 (gmt 0) - g1smd (Senior Member)

OK, I can see the robots.txt file getting longer and longer with repetition per agent.
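
For example, once Slurp gets its own wildcard rules, the same block ends up repeated per agent - something like this, assuming both bots are meant to be treated identically:

User-agent: Googlebot
Disallow: /*sort_by=
Disallow: /*session_key

User-agent: Slurp
# Same rules again - Slurp, like Googlebot, reads only its own section
Disallow: /*sort_by=
Disallow: /*session_key

User-agent: *
# Bots without wildcard support get no pattern rules; an empty Disallow allows everything
Disallow: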

10:06 pm on Nov 12, 2006 (gmt 0) - Asia_Expat (Preferred Member)

I added some wildcard exclusions to my robots.txt file a week or two ago to manage my database-driven forum installation, and I am already seeing some very promising improvements in traffic. I agree with tedster... this appears to be a solid method. It is also compliant, white hat, and uncomplicated (once you've figured out exactly which URLs you should be indexing).

[edited by: Asia_Expat at 10:07 pm (utc) on Nov. 12, 2006]

10:15 pm on Nov 12, 2006 (gmt 0) - g1smd (Senior Member)

Read back a few months to see how I also managed to get a 40,000-thread forum (one that exposed more than ten URLs for every thread, as well as almost another half a million "you are not logged in" pages) reindexed as 40,000 threads with one URL per thread, plus a few thousand thread index pages.

It has taken just over a year for everything to fall into place. Previously it had about 750 000 indexed URLs, and very many were marked as Supplemental.
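
The exact rules aren't listed here, but for a typical database-driven forum the kind of wildcard exclusions being discussed would look roughly like this - every script and parameter name below is made up and will differ from package to package:

User-agent: Googlebot
# Hypothetical examples only - substitute your own forum's duplicate URL patterns
# "post a reply" and "quote" views of each thread
Disallow: /*action=reply
Disallow: /*action=quote
# printer-friendly copies of each thread
Disallow: /*printthread
# session-id duplicates
Disallow: /*sid=
# per-post highlight/anchor variants
Disallow: /*highlight=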

 
