It also somehow managed to get 2,500,000 pages indexed despite having:
User-agent: *
Disallow: /
goodroi
11:52 am on Mar 11, 2009 (gmt 0)
Brett uses the robots.txt file for "multiple" purposes. The real secret why WebmasterWorld has over 2.5 million pages indexed is that it is heavily used and well-linked.
Receptional Andy
1:16 pm on Mar 11, 2009 (gmt 0)
The notes explain the situation, and make the code available: [webmasterworld.com...]
CWebguy
5:25 pm on Mar 11, 2009 (gmt 0)
Even with the disallow, it still gets indexed? So
if (pagerank>5){bots do whatever they want}?
;)
jdMorgan
7:00 pm on Mar 11, 2009 (gmt 0)
I think you're missing the point: robots.txt is generated on the fly by a script here, and different user-agents see different robots.txt directives. If you are a genuine Googlebot from a valid IP address range, you don't see the "Disallow: /" at all.
If you are a browser, you get the "bot blog page."
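For anyone who wants to picture the mechanism jdMorgan describes, here is a minimal sketch (not WebmasterWorld's actual script, which isn't shown in this thread) of a /robots.txt handler that serves a permissive file only to a verified Googlebot and "Disallow: /" to everyone else. It assumes a Flask app and uses the standard reverse-DNS-then-forward-confirm check; the hostnames and rules are illustrative.

import socket
from flask import Flask, request, Response

app = Flask(__name__)

# What the anonymous crowd (and fake bots) see.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"
# What a verified Googlebot sees.
ALLOW_GOOGLE = "User-agent: Googlebot\nDisallow:\n"

def is_real_googlebot(ip):
    """Verify a claimed Googlebot: reverse DNS must point to a Google
    hostname, and the forward lookup must resolve back to the same IP."""
    try:
        host = socket.gethostbyaddr(ip)[0]
        if not host.endswith((".googlebot.com", ".google.com")):
            return False
        return ip in socket.gethostbyname_ex(host)[2]
    except socket.error:
        return False

@app.route("/robots.txt")
def robots():
    ua = request.headers.get("User-Agent", "")
    if "Googlebot" in ua and is_real_googlebot(request.remote_addr):
        return Response(ALLOW_GOOGLE, mimetype="text/plain")
    # Browsers and unverified crawlers get the blanket disallow
    # (or, on the real site, a redirect to the "bot blog page").
    return Response(BLOCK_ALL, mimetype="text/plain")

The key point is the same one made above: the file a given visitor fetches at /robots.txt depends on who they appear to be, so what you see in your browser tells you nothing about what Googlebot is served.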