
Blocked URLs according to GWT
onlinesource

4:28 am on Jan 7, 2014 (gmt 0)

10+ Year Member Top Contributors Of The Month



According to GWT, my site as non-www has almost 20 blocked URLs, and the www version has 120 blocked URLs. I'm not sure what percentage of these are legitimate pages or posts, and which are pages I want filtered away, like admin content. Is there a tool to scan your site and see which pages Googlebot cannot access?

batface

7:05 am on Jan 7, 2014 (gmt 0)

10+ Year Member



Have you disallowed any directories in your robots.txt? You can check the sample of URLs in GWT against your robots.txt and see whether the numbers tally up.
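
A rough way to script that check is a Python sketch using the standard library's robotparser; example.com and the sample URLs below are placeholders to substitute with your own:

from urllib.robotparser import RobotFileParser

rp = RobotFileParser("http://example.com/robots.txt")  # placeholder domain
rp.read()  # fetches and parses the live robots.txt

# Placeholder URLs -- substitute the sample GWT reports as blocked.
urls = [
    "http://example.com/admin/settings",
    "http://example.com/blog/some-post",
]

for url in urls:
    if rp.can_fetch("Googlebot", url):
        print("allowed:", url)
    else:
        print("BLOCKED:", url)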

Otherwise the old-fashioned way is the best way to check. Create a list of all URLs indexed; create a list of all actual URLs using a third-party tool like Screaming Frog; compare the two lists in Excel, removing all duplicates from the Screaming Frog side.

The list of indexed URLs may also give you insight into URLs you don't want indexed but are.
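
A minimal Python sketch of that same comparison, assuming two plain-text files with one URL per line (both filenames are placeholders for your own exports):

# indexed.txt: URLs pulled from Google's index
# crawled.txt: the Screaming Frog (or similar) crawl export
with open("indexed.txt") as f:
    indexed = {line.strip() for line in f if line.strip()}
with open("crawled.txt") as f:
    crawled = {line.strip() for line in f if line.strip()}

print("Indexed but not found by the crawler:")
for url in sorted(indexed - crawled):
    print("  ", url)

print("Crawled but not in the index:")
for url in sorted(crawled - indexed):
    print("  ", url)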

phranque

12:38 pm on Jan 7, 2014 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



try "fetch as googlebot" in GWT

onlinesource

3:06 pm on Jan 7, 2014 (gmt 0)

10+ Year Member Top Contributors Of The Month



I've fetched the site several times in GWT. How would I see a list of all URLs that Google has scanned?

For the time being, I've copied text from specific pages and then searched for that text in Google. If the page appears, I assume that Googlebot is scanning the page and its content.

Still, the difference in missing URLs between www and non-www scares me (since it is technically the same site). Then again, these could be anything, but I just wish there were a way to download a complete list of the URLs that Google scans.

I have an online SEO software program that lets me scan a limited number of links and check them for Googlebot issues. It does not scan them all, but from what I can see, Googlebot is not restricted. Still, I would like to see WHAT is being blocked, so I can be 100% sure it is nothing important.

lucy24

7:08 pm on Jan 7, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



the difference in missing URLs between www and non-www scares me

You've just recently sorted out the with/without-www thing in GWT (different thread), right? Everything in GWT operates on some delay, varying from two or three days to a week or more. (Possibly much more, in the case of keywords.) So it's entirely possible that you don't have a problem at all; it's just the usual lag while GWT catches up.

phranque

1:49 am on Jan 8, 2014 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Use Xenu's Link Sleuth to crawl your site.

Check your server access logs to see what responses Googlebot got for various requests.
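
A rough Python sketch of that log check, assuming an Apache/nginx combined-format log; access.log is a placeholder path:

import re
from collections import Counter

# Matches the request path and status code in a combined-format line,
# e.g.: 1.2.3.4 - - [07/Jan/2014:04:28:00 +0000] "GET /page HTTP/1.1" 200 ...
pattern = re.compile(r'"(?:GET|POST|HEAD) (\S+) [^"]*" (\d{3})')
status_counts = Counter()

with open("access.log") as log:  # placeholder path
    for line in log:
        if "Googlebot" not in line:  # crude UA match; spoofable, so verify
            continue                 # with reverse DNS if it matters
        m = pattern.search(line)
        if not m:
            continue
        path, status = m.groups()
        status_counts[status] += 1
        if status[0] in "45":        # flag errors Googlebot ran into
            print(status, path)

print(status_counts)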