It also somehow managed to get 2,500,000 pages indexed despite having:
User-agent: *
Disallow: /
goodroi
11:52 am on Mar 11, 2009 (gmt 0)
Brett uses the robots.txt file for "multiple" purposes. The real secret why WebmasterWorld has over 2.5 million pages indexed is that it is heavily used and well-linked.
Receptional Andy
1:16 pm on Mar 11, 2009 (gmt 0)
The notes explain the situation, and make the code available: [webmasterworld.com...]
CWebguy
5:25 pm on Mar 11, 2009 (gmt 0)
Even with the disallow, it still gets indexed? So
if (pagerank>5){bots do whatever they want}?
;)
jdMorgan
7:00 pm on Mar 11, 2009 (gmt 0)
I think you're missing the point: robots.txt is generated on the fly by a script here, and different user-agents see different robots.txt directives. If you are a genuine Googlebot from a valid IP address range, you don't see the "Disallow: /" at all.
If you are a browser, you get the "bot blog page."
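For anyone who wants to picture the mechanism jdMorgan describes, here is a minimal sketch (not WebmasterWorld's actual script, which isn't shown in this thread) of a /robots.txt handler that serves a permissive file only to a verified Googlebot and "Disallow: /" to everyone else. It assumes a Flask app and uses the standard reverse-DNS-then-forward-confirm check; the hostnames and rules are illustrative.

import socket
from flask import Flask, request, Response

app = Flask(__name__)

# What the anonymous crowd (and fake bots) see.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"
# What a verified Googlebot sees.
ALLOW_GOOGLE = "User-agent: Googlebot\nDisallow:\n"

def is_real_googlebot(ip):
    """Verify a claimed Googlebot: reverse DNS must point to a Google
    hostname, and the forward lookup must resolve back to the same IP."""
    try:
        host = socket.gethostbyaddr(ip)[0]
        if not host.endswith((".googlebot.com", ".google.com")):
            return False
        return ip in socket.gethostbyname_ex(host)[2]
    except socket.error:
        return False

@app.route("/robots.txt")
def robots():
    ua = request.headers.get("User-Agent", "")
    if "Googlebot" in ua and is_real_googlebot(request.remote_addr):
        return Response(ALLOW_GOOGLE, mimetype="text/plain")
    # Browsers and unverified crawlers get the blanket disallow
    # (or, on the real site, a redirect to the "bot blog page").
    return Response(BLOCK_ALL, mimetype="text/plain")

The key point is the same one made above: the file a given visitor fetches at /robots.txt depends on who they appear to be, so what you see in your browser tells you nothing about what Googlebot is served.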