Convergence - 8:43 pm on Jun 30, 2013 (gmt 0)
you won't see the pages - only the urls.
:) Semantics, I meant URLs/links, search results. (it was 3AM when I posted)
and when you say "the pages are gone" are you sure they aren't filtered out?
try adding &filter=0 to the google search url and see if those urls reappear.
Next time we look for blocked directories in Google, we'll give &filter=0 a go.
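For anyone who wants to script that check, here is a minimal sketch of appending filter=0 to a search URL. The helper name with_filter_off, the example.com domain and the site:/inurl: query are assumptions for illustration, not something taken from the thread.

```python
from urllib.parse import urlencode, urlparse, parse_qs, urlunparse


def with_filter_off(search_url: str) -> str:
    """Return search_url with filter=0 set, replacing any existing filter value."""
    parts = urlparse(search_url)
    # parse_qs decodes the query string into {name: [values]}
    query = parse_qs(parts.query, keep_blank_values=True)
    query["filter"] = ["0"]
    # doseq=True re-encodes the value lists as repeated name=value pairs
    return urlunparse(parts._replace(query=urlencode(query, doseq=True)))


# Hypothetical query: URLs on example.com that contain "merchant"
url = with_filter_off(
    "https://www.google.com/search?q=site%3Aexample.com+inurl%3Amerchant"
)
print(url)
```

Calling the helper a second time on its own output leaves the URL unchanged, so it is safe to apply to URLs that may already carry the parameter.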
i don't think you want googlebot requesting robots.txt first for every resource requested.
iirc googlebot caches robots.txt for up to 24 hours.
No, we wouldn't. The point I was trying to make was that the Googlebot doesn't check robots.txt very often. It will go from internal link to internal link and so on...
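To picture why a crawler can walk many internal links without re-requesting robots.txt, here is a rough sketch of a per-host cache with a 24-hour lifetime. It only illustrates the idea mentioned above; it is not Google's implementation, and the class name RobotsCache, the fetch callback and the injectable clock are all assumptions.

```python
import time
from typing import Callable, Dict, Tuple


class RobotsCache:
    """Hypothetical per-host robots.txt cache with a time-to-live."""

    def __init__(self, fetch: Callable[[str], str],
                 ttl_seconds: float = 24 * 3600,
                 clock: Callable[[], float] = time.time):
        self._fetch = fetch        # downloads robots.txt for a host (assumed callback)
        self._ttl = ttl_seconds    # how long a cached copy is trusted
        self._clock = clock        # injectable so the behaviour can be tested
        self._store: Dict[str, Tuple[float, str]] = {}
        self.fetch_count = 0       # number of real robots.txt requests made

    def get(self, host: str) -> str:
        now = self._clock()
        cached = self._store.get(host)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]       # still fresh: no request to the site
        body = self._fetch(host)   # missing or expired: request it again
        self.fetch_count += 1
        self._store[host] = (now, body)
        return body
```

With a cache like this, any change to robots.txt is only noticed once the cached copy expires, which is why a newly blocked directory can keep getting crawled for a while.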
what was the elapsed time for those 147 requests of robots.txt?
From 1st of June until I posted.
the part i see missing is where you have verified that googlebot has actually requested a url in the /merchant/ directory and if so that you checked the IP of the visitor to verify that it is in fact googlebot and not a spoofed user agent.
Yes. Have verified. Watched it live. Saw it with my one good eye. Checked the referrer, and IP. It's the Googlebot.
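For anyone following along who wants to repeat that IP check in a script, the usual approach is a reverse DNS lookup on the visitor IP, a check that the hostname belongs to googlebot.com or google.com, and then a forward lookup to confirm the hostname resolves back to the same IP. A minimal sketch follows; the function name and the injectable reverse/forward resolvers are assumptions added so the logic can be tried without network access, and the default resolvers only cover IPv4.

```python
import socket

# Hostname suffixes accepted for a genuine Googlebot visit
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_googlebot(ip, reverse=None, forward=None):
    """Reverse-then-forward DNS check for a claimed Googlebot IP."""
    # Default resolvers use the system DNS; pass stubs to test offline.
    reverse = reverse or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward = forward or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        host = reverse(ip)                 # step 1: IP -> hostname
    except OSError:
        return False                       # no PTR record: cannot verify
    if not host.lower().rstrip(".").endswith(GOOGLE_SUFFIXES):
        return False                       # step 2: wrong domain, likely spoofed
    try:
        return ip in forward(host)         # step 3: hostname -> IPs must include ip
    except OSError:
        return False


# Offline illustration with stub resolvers (made-up lookup results)
genuine = is_verified_googlebot(
    "66.249.66.1",
    reverse=lambda addr: "crawl-66-249-66-1.googlebot.com",
    forward=lambda host: ["66.249.66.1"],
)
spoofed = is_verified_googlebot(
    "203.0.113.9",
    reverse=lambda addr: "host9.example.net",
    forward=lambda host: ["203.0.113.9"],
)
print(genuine, spoofed)
```

The user agent string alone proves nothing, since anyone can send it; the forward lookup matters because whoever controls an IP block also controls its reverse DNS record.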
it has been mentioned numerous times in this thread that the noindex directive is irrelevant when you have excluded googlebot from crawling that url.
You can't set a per-page quota. Well-behaved small robots pick up robots.txt at the start of each separate visit. Large robots-- and you can hardly get bigger than the googlebot-- read robots.txt, spread it around to their fellow googlebots, and hold it for up to 24 hours.
:) Yes, fully aware - see above response to phranque.
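To make the noindex point concrete: a crawler that obeys robots.txt never downloads a disallowed page, so it never gets to read a noindex tag on that page. Here is a small sketch using Python's standard urllib.robotparser. The robots.txt content, the example.com URLs and the helper name are assumptions for illustration, not the site's actual setup.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt that blocks the /merchant/ directory for all robots
ROBOTS_TXT = """\
User-agent: *
Disallow: /merchant/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def crawler_can_see_noindex(url: str, agent: str = "Googlebot") -> bool:
    """A noindex tag is only readable if the crawler may fetch the page."""
    return parser.can_fetch(agent, url)


blocked = crawler_can_see_noindex("https://example.com/merchant/page.html")
allowed = crawler_can_see_noindex("https://example.com/about.html")
print(blocked, allowed)
```

So a noindex on a page under /merchant/ does nothing while the Disallow is in place, and the bare URL can still turn up in results if something links to it.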
As I understand it, "nofollow" doesn't mean "pretend you haven't seen this link". It just means "I make no claims about the quality of the material I'm linking to".
That is one "definition". Also, "it's a paid link" or "don't pass on page juice", or "don't follow" - depending on what bot we're talking about.
From Matt Cutts:
How does Google handle nofollowed links?
In general, we don't follow them. This means that Google does not transfer PageRank or anchor text across these links. Essentially, using nofollow causes us to drop the target links from our overall graph of the web. However, the target pages may still appear in our index if other sites link to them without using nofollow, or if the URLs are submitted to Google in a Sitemap. Also, it's important to note that other search engines may handle nofollow in slightly different ways.
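As a rough illustration of "drop the target links from our overall graph", here is a sketch that splits the links on a page into followed and nofollowed sets based on the rel attribute. It is only a toy model of the idea in the quote, not how any search engine actually processes pages; the class name LinkSplitter and the sample HTML are made up.

```python
from html.parser import HTMLParser


class LinkSplitter(HTMLParser):
    """Collect <a href> targets, separating rel="nofollow" links from the rest."""

    def __init__(self):
        super().__init__()
        self.followed = []    # links that would stay in the link graph
        self.nofollowed = []  # links that would be dropped from it

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if not href:
            return
        # rel is a space-separated token list, e.g. rel="nofollow noopener"
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.nofollowed.append(href)
        else:
            self.followed.append(href)


# Made-up sample page
splitter = LinkSplitter()
splitter.feed(
    '<a href="/about.html">About</a> '
    '<a rel="nofollow" href="/merchant/deal.html">Deal</a> '
    '<a rel="NoFollow noopener" href="https://example.org/">Ad</a>'
)
print(splitter.followed, splitter.nofollowed)
```

The split is per link, not per target: if another page links to /merchant/deal.html without nofollow, that URL is back in the graph, which is the caveat in the second half of the quote.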