

Sitemaps, Meta Data, and robots.txt Forum

    
Blocked through robots.txt: will the page ever be crawled?
getxb

5+ Year Member



 
Msg#: 3429641 posted 6:14 am on Aug 23, 2007 (gmt 0)

While exchanging links with a site, I found that a particular subcategory is completely blocked through robots.txt. The webmaster said the pages are blocked because they wanted to stop PR leakage (though I didn't find any external links on those pages through which PR might leak).

But he also assured me that he is submitting those pages to web directories and search engines to gain PR.

Now my questions are:

1. If the pages are blocked, then why is he submitting the pages to web directories and search engines?
2. If he places my link on one of those pages, will I get any benefit in the long run?

I am totally confused, because as far as I know he is a pretty experienced webmaster.

Please suggest.

Regards,
getxb

 

new_seo

5+ Year Member



 
Msg#: 3429641 posted 9:02 am on Aug 23, 2007 (gmt 0)

A URL blocked by robots.txt can be crawlable from external links, and it will show up in the search results when you search for site:domain.com in Google.
But no data will be available; only the URL will show in the search results.

My knowledge about this ends here; I don't have any idea whether Google will pass some value to that blocked URL or not. :)
But I don't think backlinks from these kinds of pages can be helpful.

goodroi

WebmasterWorld Administrator, Top Contributor of All Time, 10+ Year Member, Top Contributors Of The Month



 
Msg#: 3429641 posted 12:24 pm on Aug 23, 2007 (gmt 0)

1. If the pages are blocked, then why is he submitting the pages to web directories and search engines?
I am going to assume the pages are blocked to all bots. If that is the case, there is no good reason to submit those pages to search engines. In the past, fake meta tags were used to confuse the competition, who blindly started copying them. Just because someone does something doesn't mean it is a good thing.

2. If he places my link on one of those pages, will I get any benefit in the long run?
The only benefit you will get is direct traffic from his visitors or branding for your domain name. Since the page with your link is blocked to search engines, they will never see your link on that page, and therefore you will get no link popularity love in the search engines.

This is a good way to cheat when doing reciprocal links. Many webmasters will not check your robots.txt file and will link to you simply because they can see your link back to them. In the eyes of the search engine it is a one-way link to your site, since it cannot index the page on your site that links back to the other people.
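To check a link partner before agreeing to an exchange, something like the following sketch could be used. It relies on Python's standard `urllib.robotparser`; the robots.txt content, the `example.com` URLs, and the `is_blocked` helper name are made up for illustration and are not taken from the site discussed in this thread.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt of a link partner that blocks its links section.
PARTNER_ROBOTS_TXT = """\
User-agent: *
Disallow: /links/
"""


def is_blocked(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if robots_txt tells user_agent not to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# Hypothetical page where the partner placed the reciprocal link.
link_page = "http://www.example.com/links/partners.html"
home_page = "http://www.example.com/index.html"

print(link_page, "blocked" if is_blocked(PARTNER_ROBOTS_TXT, link_page) else "crawlable")
print(home_page, "blocked" if is_blocked(PARTNER_ROBOTS_TXT, home_page) else "crawlable")
```

For a live site, `RobotFileParser` also provides `set_url()` and `read()` to download the robots.txt file instead of parsing a string.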

g1smd

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 3429641 posted 9:52 am on Aug 24, 2007 (gmt 0)

>> A blocked url (by robots.txt) can be crawlable <<

No. The robots.txt exclusion says to not fetch the page at all. Good bots should obey that (bad bots ignore it, and other methods are needed to stop them accessing the page).

However, if there are links to that page from some other site, then Google can still show that page as a URL-only entry in the SERPs.

Yahoo goes one further for URL-only entries and "invents" a title using the anchor text of one of the incoming links.

If you don't want even the URL in the SERPs, then you instead need a <meta name="robots" content="noindex"> tag on the page, so that the page is fetched and the bot is then told not to index it at all. In that case you must not exclude the URL in robots.txt.
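As a rough sketch of how a bot could detect that tag once it has fetched the page, the snippet below uses Python's standard `html.parser`. The class name `RobotsMetaParser`, the helper `has_noindex`, and the sample HTML are hypothetical; real search engine crawlers will have their own, more involved parsing.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser already lowercases tag and attribute names.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").strip().lower() != "robots":
            return
        # The directives live in the content attribute, comma separated.
        for directive in attr.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def has_noindex(html: str) -> bool:
    """Return True if the page asks robots not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Hypothetical page content for illustration.
page = '<html><head><meta name="robots" content="noindex, follow"></head><body></body></html>'
print(has_noindex(page))
```

Note that this check can only run on a page the bot was allowed to fetch, which is why the URL must not also be disallowed in robots.txt.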

eimee

5+ Year Member



 
Msg#: 3429641 posted 6:02 am on Aug 31, 2007 (gmt 0)

A robots.txt file provides restrictions to search engine robots (known as "bots") that crawl the web. These bots are automated, and before they access pages of a site, they check to see if a robots.txt file exists that prevents them from accessing certain pages.
You need a robots.txt file only if your site includes content that you don't want search engines to index. If you want search engines to index everything in your site, you don't need a robots.txt file.
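A small sketch of that behaviour, again using Python's standard `urllib.robotparser`: with no rules at all every URL is allowed, and a rule can target a single bot. The bot name `ExampleBot`, the `allowed` helper, and the URL are invented for the example.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical URL used for both checks.
URL = "http://www.example.com/private/report.html"

# An empty robots.txt places no restrictions on any bot.
EMPTY_ROBOTS_TXT = ""

# Hypothetical rules that shut out one named bot and nobody else.
ONE_BOT_ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /
"""

print(allowed(EMPTY_ROBOTS_TXT, "Googlebot", URL))
print(allowed(ONE_BOT_ROBOTS_TXT, "ExampleBot", URL))
print(allowed(ONE_BOT_ROBOTS_TXT, "Googlebot", URL))
```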

If he is submitting a page that is blocked by robots.txt to search engines, it won't have any real effect, because spiders can't see the page; only visitors can. You may get direct traffic, but no long-term benefit.
Let me know if anything is unclear.

