Google SERPs shows the 1 page excluded by robots.txt and no other!

Forum Moderators: Robert Charlton & goodroi

penders

3:18 pm on Jan 14, 2010 (gmt 0)

This is a new website with just a few pages. A Sitemap file was submitted (and accepted) listing all the pages that should be indexed. But the only page that so far appears in Google SERPs is the 1 page that should be excluded by robots.txt (the privacy policy), and which does not appear in the Sitemap.xml file! Webmaster Tools correctly shows that 1 page as being excluded by robots.txt.

That 1 page is not properly indexed by Google: the title and description are omitted, but the link is still present. It's made worse by the fact that none of the 'real' pages so far appear in the SERPs.

Why does that 1 page appear? What other measures should/could have been taken in order to prevent a page (or even its link) from appearing at all in the SERPs?

Thanks.

tedster

5:56 pm on Jan 14, 2010 (gmt 0)

You mean the home page isn't showing? That sounds like a bug somewhere. Of course a Google bug is always possible, but given what you are seeing with the privacy page, I wonder if your robots.txt syntax is sound, and whether it says what you intend it to say.

Webmaster Tools has a robots.txt checking utility - it can be quite helpful.
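
For a quick local sanity check alongside the Webmaster Tools utility, the rules can also be tested with Python's standard urllib.robotparser module. This is only a minimal sketch: the robots.txt content and the /privacy.html path below are assumptions made up for illustration (the real file is not shown in this thread), and the helper name is_crawl_allowed is hypothetical. Python's parser may also not interpret every rule exactly the way Googlebot does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the real file is not shown in this thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /privacy.html
"""


def is_crawl_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/", "/privacy.html"):
        url = "http://www.example.com" + path
        print(path, is_crawl_allowed(ROBOTS_TXT, "Googlebot", url))
```

If the privacy page comes back as allowed here, the Disallow line is probably not saying what was intended.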

penders

7:39 pm on Jan 14, 2010 (gmt 0)

You mean the home page isn't showing?

Well, it wasn't, but it is now! A few hours later and the home page is now at the top of the list (doing a regular search for the company name), and the privacy page has dropped off the SERPs! Good.

However, if I do a site search (i.e. site:www.example.com) and include omitted results, then the privacy page (or at least a link to it) is still there. No matter really in this case, but why is it there at all?

Webmaster Tools has a robots.txt checking utility...

Yes, Webmaster Tools reports that the privacy page/URL is restricted by robots.txt. However, that is not preventing the URL from appearing in Google SERPs. The title and description are not shown, and the page does not appear to have been spidered by Google (no record in my stats).

TheMadScientist

7:50 pm on Jan 14, 2010 (gmt 0)

why is it there at all?

Because it's excluded in robots.txt and there's a link to it somewhere on the web.

Disallow in robots.txt is NOT the same as noindex. Disallowing keeps them from spidering the page (in theory; there have been reports that they occasionally do anyway, but that could simply be a glitch or an incorrect robots.txt). But since there is a link to it, they consider it a page, so they show a URL-only link. It's not an error. It's something they clearly state they do. If you do not want the page to show in the index (results), then either use the URL removal tool, or allow them to crawl the page and put a noindex tag on it, and it will disappear.

(Note, however, that while Google won't crawl or index the content of pages blocked by robots.txt, we may still index the URLs if we find them on other pages on the web.)

Removing Your Own Content from Google [google.com]
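
To make the Disallow vs. noindex difference concrete, here is a rough sketch of the outcomes described above. It is a simplified model of the behaviour discussed in this thread, not Google's actual logic; the names RobotsMetaParser, has_noindex and expected_listing, and the three outcome labels, are made up for illustration. It uses Python's standard html.parser to look for a robots meta tag.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def has_noindex(html: str) -> bool:
    """Return True if the page carries a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


def expected_listing(disallowed: bool, linked: bool, html: str) -> str:
    """Simplified model (an assumption, not Google's algorithm)."""
    if disallowed:
        # The page is never fetched, so any noindex tag on it goes unseen;
        # a known link is enough for a URL-only entry.
        return "url-only" if linked else "not listed"
    if has_noindex(html):
        return "not listed"
    return "full listing"


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex"></head></html>'
    print(expected_listing(True, True, page))
    print(expected_listing(False, True, page))
```

The point of the sketch is the first branch: while the page is disallowed, the crawler never reads the HTML, so a noindex tag only takes effect once crawling is allowed.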

jdMorgan

8:13 pm on Jan 14, 2010 (gmt 0)

Unless you have a good number of links from other sites pointed to your domain, I'd say what you're seeing is Google's URL-only linking behavior described above, combined with insufficient time allowed for Google to find links and spider your site.

Instead of fretting about this for a month, work on content and back-links. Once Google finds a few links, things will sort themselves out.

Jim

penders

3:15 pm on Jan 15, 2010 (gmt 0)

Because it's excluded in robots.txt and there's a link to it somewhere on the web. ... Disallow in robots.txt is NOT the same as noindex. ...

No links elsewhere on the web (I'm pretty sure) - just quite a few internal ones. But yes, thanks for the explanation. It was that 1 URL getting listed before all the others that threw me initially.

Many thanks for the clarity - no fret. ;)

supafresh

3:56 pm on Jan 15, 2010 (gmt 0)

If Google follows a link to your website, it checks robots.txt, downloads your rules, and then spiders your site.

If you have enough links to the page, Google might show the URL as a result for a search query. Just remove it in Webmaster Tools.

TheMadScientist

5:47 pm on Jan 15, 2010 (gmt 0)

No links elsewhere on the web (I'm pretty sure) - just quite a few internal ones.

Just to overly clarify:
By somewhere on the web, I meant somewhere on the web, which includes your website...

It was that 1 URL getting listed before all the others that threw me initially.

My guess is it's because there's really not much to process, score, rank, organize, attempt to determine the meaning of, etc. since it's only a URL they're listing. No HTML to parse. No language to attempt to decipher. No 'similar pages' to compare to. IOW: It's just a URL, which means it takes less time to process than a page.