Forum Moderators: goodroi

What's with the robots.txt of this site?

5:01 am on Mar 10, 2009 (gmt 0)

Preferred Member

5+ Year Member

joined:Aug 25, 2007
posts:531
votes: 0


A little amusing.

Also somehow managed to get 2,500,000 pages indexed despite having

User-agent: *
Disallow: /

[edited by: CWebguy at 5:03 am (utc) on Mar. 10, 2009]

11:52 am on Mar 11, 2009 (gmt 0)

Administrator from US 

WebmasterWorld Administrator goodroi is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:June 21, 2004
posts:3080
votes: 67


brett uses the robots.txt file for "multiple" purposes. the real secret why WebmasterWorld has over 2.5 million pages indexed is that it is heavily used and well-linked.
1:16 pm on Mar 11, 2009 (gmt 0)

Senior Member

joined:Jan 27, 2003
posts:2534
votes: 0


The notes explain the situation, and make the code available: [webmasterworld.com...]
5:25 pm on Mar 11, 2009 (gmt 0)

Preferred Member

5+ Year Member

joined:Aug 25, 2007
posts:531
votes: 0


Even with the disallow it still gets indexed? So

if (pagerank>5){bots do whatever they want}?
;)

[edited by: CWebguy at 5:28 pm (utc) on Mar. 11, 2009]

7:00 pm on Mar 11, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


I think you're missing the point: robots.txt is generated on the fly by a script here, and different user-agents see different robots.txt directives. If you are a genuine Googlebot from a valid IP address range, you don't see the "Disallow: /" at all.

If you are a browser, you get the "bot blog page."

Jim
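
In other words, the server branches on who is asking before it emits robots.txt. A minimal sketch of that idea (hypothetical function names, not WebmasterWorld's actual script), using the standard forward-confirmed reverse DNS check to verify Googlebot:

```python
import socket

def is_verified_googlebot(ip: str) -> bool:
    """Forward-confirmed reverse DNS: resolve the IP to a hostname,
    require a Google crawler domain, then resolve that hostname back
    and require it to map to the original IP."""
    try:
        host = socket.gethostbyaddr(ip)[0]
    except OSError:
        return False
    if not host.endswith((".googlebot.com", ".google.com")):
        return False
    try:
        return ip in socket.gethostbyname_ex(host)[2]
    except OSError:
        return False

def robots_txt_for(user_agent: str, ip: str) -> str:
    """Return different robots.txt content per requester (sketch).
    A verified Googlebot gets a permissive file; everyone else --
    browsers, unverified bots, spoofed user-agents -- gets the
    blanket disallow."""
    if "Googlebot" in user_agent and is_verified_googlebot(ip):
        return "User-agent: *\nDisallow: /cgi-bin/\n"
    return "User-agent: *\nDisallow: /\n"
```

A browser (or a bot faking Googlebot's user-agent from the wrong IP) would get `Disallow: /`, which is why the publicly visible robots.txt looks like it blocks everything while the pages still get indexed.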

7:02 pm on Mar 11, 2009 (gmt 0)

Preferred Member

5+ Year Member

joined:Aug 25, 2007
posts:531
votes: 0


Gotcha ;)