Moderators: goodroi

Sitemaps, Meta Data, and robots.txt Forum

    
Robots.txt - Working or not?
glennk




msg:3820960
 12:43 pm on Jan 7, 2009 (gmt 0)

About a month ago I came to the forum to ask for advice on using robots.txt to disallow the wap2 versions of pages in my forum. The advice was that my whole robots.txt was set up wrongly.

I changed robots.txt as advised, but I am now seeing weird results in the Google SERPs. I would like to know whether the advice I was given is correct and whether this robots.txt will work properly for Google. I would also like to be sure that I have not disallowed any of the forum topics.

<snip>

Secondly, can anyone explain what is happening with my Google site: search results, as described below?

site:example.co.uk returns around 500 pages
site:example.co.uk/forum returns around 25,000 pages, including ones that Webmaster Tools says are disallowed

However, when I search for non-competitive keywords from any of these 25,000 pages, none of them show up in Google.

Before I changed robots.txt, about 1,000 forum pages used to rank well for non-competitive local keywords.

I'm pulling my hair out, so any help would really be appreciated.

[edited by: goodroi at 1:44 pm (utc) on Jan. 7, 2009]
[edit reason] please no specific urls [/edit]
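The actual robots.txt was snipped, so it can't be checked here. As a minimal sketch of how Google describes evaluating rules (`*` matches any run of characters, a trailing `$` anchors the end of the URL, the longest matching rule wins, and Allow wins a tie), the function below tests paths against a rule group. The `;wap2` URL shape and both rules are hypothetical assumptions, not the poster's real setup. Note that a wildcard rule such as `/forum/*wap2` would also block any normal topic whose URL happens to contain "wap2".

```python
import re

# Hypothetical rules for one user-agent group; not the poster's real robots.txt.
RULES = [
    ("Disallow", "/forum/*wap2"),
    ("Allow", "/forum/"),
]


def _pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern ('*' wildcard, trailing '$' anchor) to a regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(rules: list[tuple[str, str]], path: str) -> bool:
    """Longest matching pattern wins; on equal length, Allow wins."""
    best_len = -1
    allowed = True  # no matching rule means the path may be crawled
    for directive, pattern in rules:
        if not pattern:  # an empty Disallow blocks nothing
            continue
        if _pattern_to_regex(pattern).match(path):  # match() anchors at the start
            is_allow = directive.lower() == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len = len(pattern)
                allowed = is_allow
    return allowed


if __name__ == "__main__":
    for path in ("/forum/index.php?topic=123.0", "/forum/index.php?topic=123.0;wap2"):
        print(path, "allowed" if is_allowed(RULES, path) else "blocked")
```

Because precedence depends on pattern length rather than file order, the two rules above give the same result whichever one is listed first.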

 

phranque




msg:3820974
 1:04 pm on Jan 7, 2009 (gmt 0)

Do you have a Webmaster Tools account?
There is a tool there to test whether a URL is allowed by your robots.txt.
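Outside Webmaster Tools, a quick local check can be sketched with Python's standard-library `urllib.robotparser`. The robots.txt text and URLs below are hypothetical stand-ins for the snipped file. One caveat: this parser does plain prefix matching (no `*` or `$` wildcards) and applies the first matching rule in file order, so for wildcard rules its answer can differ from Google's.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the real file from the thread was removed by the moderator.
ROBOTS_TXT = """\
User-agent: *
Disallow: /forum/wap2/
Disallow: /forum/print/
"""


def check_urls(robots_txt: str, urls: list[str], agent: str = "Googlebot") -> dict[str, bool]:
    """Return {url: allowed?} using the stdlib parser (prefix rules only)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return {url: parser.can_fetch(agent, url) for url in urls}


if __name__ == "__main__":
    urls = [
        "https://example.co.uk/forum/topic-123/",
        "https://example.co.uk/forum/wap2/topic-123/",
    ]
    for url, allowed in check_urls(ROBOTS_TXT, urls).items():
        print("Allowed   " if allowed else "Disallowed", url)
```

Feeding it a list of known topic URLs is one way to confirm that no normal topics fall under a Disallow rule.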

bava_seo




msg:3820986
 1:14 pm on Jan 7, 2009 (gmt 0)

Check Webmaster Tools to see whether you have set up your robots.txt file correctly.

glennk




msg:3821148
 4:29 pm on Jan 7, 2009 (gmt 0)

Hi there,

Yes, I have Webmaster Tools, and it shows lists of pages that are disallowed. However, many of those disallowed pages appear among the 25,000 in the site:domain/forum check.

Would any of you be willing to cast an eye over my robots.txt, just to put my mind at rest that all is well and that I haven't blocked any of the site's topics by mistake?

Can you explain the discrepancy between site:domain, which shows far fewer than 1,000 pages, and site:domain/forum, which shows 25,000 pages?

goodroi




msg:3821170
 4:59 pm on Jan 7, 2009 (gmt 0)

Since you are talking about over 25,000 pages, it will probably take Googlebot some time to revisit (or rather, attempt to revisit) all of those pages, discover that they are now blocked, and drop them from its system. If you want to speed up that process, increase the number of links pointing to those pages. Also remember that Google's site: search is not always accurate. Google has said that the total result count is more of an estimate.

If you have used Google's robots.txt tool and it verified that your robots.txt is correctly allowing and disallowing what you want, then you just need to wait. While you are waiting, work on your internal and external links; that will help your search rankings and will probably also increase the speed at which Google indexes your pages.

glennk




msg:3821216
 5:38 pm on Jan 7, 2009 (gmt 0)

Hi goodroi,

Thanks for your reply. I am still a little confused, though. I'll try to explain why.

I closely monitor my site stats, Webmaster Tools, Analytics, and HitTail, and I regularly check how many pages I have listed in Google. In the 4 years my site has been running, the maximum number of pages I've had listed is 3,500, made up of around 500 pages from the main domain and 3,000 from the forum. More recently I had roughly 1,500 pages listed from the whole forum, and they all ranked very well for local, low-competition keyword searches.

What alarms me at the moment is the figure of 25,000 pages listed for the forum. Since I disallowed the wap2 pages, I expected them to be deindexed and replaced with the normal topic pages. The figure of 25,000 has really thrown me, and the fact that those pages are not being returned in searches concerns me even more.

Also, just to put my mind at rest, would you possibly take a few minutes to look at my robots.txt and check that there are no glaring errors?

g1smd




msg:3821315
 7:42 pm on Jan 7, 2009 (gmt 0)

I documented a load of duplicate content issues with forums back in 2004 or 2005. I think I may also have included some example robots.txt files to help counteract some of the problems.

phranque




msg:3821659
 5:27 am on Jan 8, 2009 (gmt 0)

You are probably looking at "URLs restricted by robots.txt" in the Web crawl Diagnostics tab.
Have you also checked the "Analyze robots.txt" option in the Tools menu?

The pages at those URLs are already indexed.
Someone might actually be linking to those URLs.
You have to decide what signal you really want to send the bot.
For example, you can tell the bot "go away" (Disallow), "there's nothing here" (404), "go somewhere else" (301/302), or "take this" (200).
There are other response options, but Disallow doesn't mean "forget about it": it only stops the bot from fetching the URL, so a page that is already indexed can stay in the index.
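Because a Disallowed URL is never fetched, Google cannot see a redirect or a 404 served at that URL. One alternative, sketched here and not something anyone in the thread confirmed doing, is to leave the wap2 URLs crawlable and answer them with a 301 to the normal topic page. The `;wap2` suffix and the `respond` function are hypothetical; a real forum would implement this in its own server or rewrite rules.

```python
from http import HTTPStatus
from typing import Optional, Tuple

WAP2_SUFFIX = ";wap2"  # hypothetical URL shape for the wap2 copies


def respond(path: str, topic_exists: bool = True) -> Tuple[HTTPStatus, Optional[str]]:
    """Return (status, Location header or None) for a forum request path."""
    if path.endswith(WAP2_SUFFIX):
        # "go somewhere else": permanent redirect to the normal topic URL
        # (if that topic no longer exists, the target itself will answer 404)
        return HTTPStatus.MOVED_PERMANENTLY, path[: -len(WAP2_SUFFIX)]
    if not topic_exists:
        # "there's nothing here"
        return HTTPStatus.NOT_FOUND, None
    # "take this"
    return HTTPStatus.OK, None


if __name__ == "__main__":
    print(respond("/forum/index.php?topic=123.0;wap2"))
    print(respond("/forum/index.php?topic=999.0", topic_exists=False))
```

For this to have any effect, the wap2 URLs must not also be Disallowed, otherwise the bot never requests them and never sees the 301.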

glennk




msg:3821828
 12:40 pm on Jan 8, 2009 (gmt 0)

Thanks for that, most helpful. I hadn't realised there was an analysis tool.

It appears my pages are allowed.

Just one more question with regard to specific forum pages: is it normal for forum pages to be treated as a directory?

The analysis tool says:

/forum/example1/example2/ Allowed

Detected as a directory; specific files may have different restrictions

[edited by: goodroi at 2:05 pm (utc) on Jan. 8, 2009]
[edit reason] examplified [/edit]
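The "Detected as a directory" note most likely reflects the trailing slash: the tool tested the path `/forum/example1/example2/` itself, and a narrower rule could still block a specific file beneath it. A small sketch with Python's `urllib.robotparser` shows the effect; the `printpage.html` rule is a hypothetical example, not taken from the poster's file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rule: the directory stays open, one file inside it does not.
ROBOTS_TXT = """\
User-agent: *
Disallow: /forum/example1/example2/printpage.html
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for url in (
    "https://example.co.uk/forum/example1/example2/",
    "https://example.co.uk/forum/example1/example2/printpage.html",
):
    # can_fetch() compares the URL path against each rule as a prefix
    print(parser.can_fetch("Googlebot", url), url)
```

The directory URL comes back allowed while the file under it is disallowed, which is the situation the tool's message warns about.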

bava_seo




msg:3822645
 12:15 pm on Jan 9, 2009 (gmt 0)

Hmm... valuable information.

© Webmaster World 1996-2014 all rights reserved