Blocked through robots.txt. Will the page ever be crawled?

Forum Moderators: goodroi

getxb

6:14 am on Aug 23, 2007 (gmt 0)

While exchanging links with a site, I found that one particular subcategory is completely blocked through robots.txt. The webmaster said the pages are blocked because they wanted to stop PR leakage (though I didn't find any external links on those pages through which PR might leak).

But he also assured me that he is submitting those pages to web directories and search engines to gain PR.

Now my questions are:

1. If the pages are blocked, then why is he submitting the pages to web directories and search engines?
2. If he places my link on one of those pages, will I get any benefit in the long run?

I am totally confused, because as far as I know he is a pretty experienced webmaster.

Please advise.

Regards,
getxb

new_seo

9:02 am on Aug 23, 2007 (gmt 0)

A blocked url (by robots.txt) can be crawlable from external links, and it will show in the search results when you search for site:domain.com in Google.
But no data will be available; only the URL will show in the search results.

My knowledge about this ends here. I don't have any idea whether Google will pass some value to that blocked URL or not. :)
But I don't think backlinks from these kinds of pages can be helpful.

goodroi

12:24 pm on Aug 23, 2007 (gmt 0)

1. If the pages are blocked, then why is he submitting the pages to web directories and search engines?

I am going to assume the pages are blocked to all bots. If that is the case, there is no good reason to submit those pages to search engines. In the past, fake metatags were used to confuse the competition, who blindly started copying them. Just because someone does something doesn't mean it is a good thing.

2. If he places my link on one of those pages, will I get any benefit in the long run?

The only benefit you will get is direct traffic from his visitors or branding for your domain name. Since the page with your link is blocked to search engines, they will never see your link on that page, and therefore you will get no link popularity love in the search engines.

This is a good way to cheat when doing reciprocal links. Many webmasters will not check your robots.txt file and will link to you simply because they can see your link back to them on the page. In the eyes of the search engine it is a one-way link to your site, since it cannot crawl the page on your site that carries the link back to the other site.
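Before agreeing to a link exchange, the robots.txt check described above can be automated. The following is a minimal sketch using Python's standard urllib.robotparser module. The robots.txt content, the example.com URLs, and the helper name is_blocked are all made up for illustration. It parses the rules from a string, so no network access is needed.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(robots_txt: str, page_url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt rules disallow user_agent from fetching page_url."""
    parser = RobotFileParser()
    # parse() accepts the robots.txt content as an iterable of lines.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, page_url)


# Hypothetical robots.txt of a link partner that hides its whole links section.
PARTNER_ROBOTS = """\
User-agent: *
Disallow: /links/
"""

if __name__ == "__main__":
    # The page that would carry the reciprocal link (hypothetical URL).
    print(is_blocked(PARTNER_ROBOTS, "http://www.example.com/links/partners.html"))
    # A page outside the blocked section (hypothetical URL).
    print(is_blocked(PARTNER_ROBOTS, "http://www.example.com/index.html"))
```

If the function reports the page as blocked for the major crawlers, well-behaved search engine bots will not fetch it, so they will not see any link placed there.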

g1smd

9:52 am on Aug 24, 2007 (gmt 0)

>> A blocked url (by robots.txt) can be crawlable <<

No. The robots.txt exclusion says to not fetch the page at all. Good bots should obey that (bad bots ignore it, and other methods are needed to stop them accessing the page).

However, if there are links to that page from some other site, then Google can still show that page as a URL-only entry in the SERPs.

Yahoo goes one further for URL-only entries and "invents" a title using the anchor text of one of the incoming links.

If you don't want even the URL in the SERPs, then you instead need a <meta name="robots" content="noindex"> tag on the page, so that the page is fetched and the bot is then told to not index it at all. In that case you must not exclude the URL in robots.txt.
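The interaction between the two mechanisms can be sketched in a few lines of Python. This is only an illustration of the reasoning in this post, not how any search engine is actually implemented. The class RobotsMetaParser, the function listing_outcome, and the sample HTML are hypothetical. The point is that a noindex directive can only take effect if the bot is allowed to fetch the page and read it.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Records whether a page carries <meta name="robots" content="...noindex...">."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            # The content attribute may hold several comma-separated directives.
            directives = [d.strip().lower() for d in attr.get("content", "").split(",")]
            if "noindex" in directives:
                self.noindex = True


def listing_outcome(blocked_by_robots_txt: bool, html: str) -> str:
    """Describe what a well-behaved bot could do with the page (simplified model)."""
    if blocked_by_robots_txt:
        # The page is never fetched, so any meta tag inside it is never seen.
        return "not fetched: URL-only entry still possible"
    meta = RobotsMetaParser()
    meta.feed(html)
    if meta.noindex:
        return "fetched: kept out of the index"
    return "fetched: indexable"


# Hypothetical page source carrying the noindex directive.
PAGE = '<html><head><meta name="robots" content="noindex, follow"></head><body>Links</body></html>'

if __name__ == "__main__":
    print(listing_outcome(True, PAGE))
    print(listing_outcome(False, PAGE))
```

In this model, the first call never looks at the HTML at all, which is why combining a robots.txt block with a noindex tag defeats the purpose of the tag.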

eimee

6:02 am on Aug 31, 2007 (gmt 0)

A robots.txt file provides restrictions to search engine robots (known as "bots") that crawl the web. These bots are automated, and before they access pages of a site, they check to see if a robots.txt file exists that prevents them from accessing certain pages.
You need a robots.txt file only if your site includes content that you don't want search engines to index. If you want search engines to index everything in your site, you don't need a robots.txt file.

If he is submitting a page that is blocked by robots.txt to search engines, it won't have any meaningful effect, because spiders can't see the page; only visitors can. You may get direct traffic from it, but no long-term benefit.
Let me know if anything is unclear.