Forum Moderators: Robert Charlton & goodroi


Googlebot not scanning www site

         

onlinesource

5:57 pm on Jan 10, 2014 (gmt 0)

10+ Year Member Top Contributors Of The Month



I know I've opened several threads on my issues, but just so there is no confusion, I wanted to start another. So, as a recap... my SEO company noticed back in the summer that Googlebot wasn't crawling www.example.com for me. It was crawling example.com, which is how I had officially set my cart up and how it was set in Google Webmaster Tools.

What I did this week was officially change my domain to www.example.com. Before, if you tried to access www.example.com, the cart redirected you to example.com. Now it's the opposite: if you try to access example.com, it redirects you to www.example.com. So, the listed site is now: www.example.com.
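For anyone following along, the redirect behaviour described above can be checked from outside the cart with a short script. This is only a minimal sketch: `redirect_verdict` and `check_host` are hypothetical helper names, the hostnames are the exemplified ones from this thread, and it assumes the site answers plain HTTP HEAD requests.

```python
import http.client
from urllib.parse import urlsplit

REDIRECT_CODES = (301, 302, 303, 307, 308)


def redirect_verdict(status, location, canonical_host):
    """Classify one response received from the non-canonical hostname."""
    if status not in REDIRECT_CODES:
        return "no redirect"
    # A relative or missing Location has no hostname, so it is treated
    # as not pointing at the canonical host.
    target_host = urlsplit(location or "").hostname
    if target_host != canonical_host:
        return "redirects elsewhere"
    return "permanent" if status in (301, 308) else "temporary"


def check_host(host, canonical_host, path="/"):
    """Send one HEAD request to `host` and classify the answer (needs network)."""
    conn = http.client.HTTPConnection(host, timeout=10)
    try:
        conn.request("HEAD", path, headers={"User-Agent": "redirect-check"})
        resp = conn.getresponse()
        return redirect_verdict(resp.status, resp.getheader("Location"), canonical_host)
    finally:
        conn.close()
```

Calling `check_host("example.com", "www.example.com")` against the real domain should report "permanent" if the bare hostname answers with a 301 pointing at the www hostname; "temporary" would mean a 302-style redirect is in place instead.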

I then went to Google Webmaster Tools and added www.example.com to my list of domains. I then went into SITE SETTINGS for both and asked Google to view the site with www, regardless.

Now, about a week later, Googlebot has finally crawled www.example.com and I get this error message.


The requested URL /search?q=cache:6DoDVIBPTCkJ:www.example.com/+&cd=1&hl=en&ct=clnk&gl=us was not found on this server.


What is going on here? Googlebot had no issue scanning example.com but cannot scan www.example.com even when I take these necessary steps. I also went to Bing.com today and noticed that they have crawled www.example.com too with no errors. Yahoo? Same thing... crawled with no errors. Nobody is having any issues with me switching from example.com to www.example.com, just Google and its bot.

I need some help. When I go into Google Webmaster Tools, and visit Crawl > Blocked URLS for both the www.example.com and example.com settings, I noticed today that www.example.com has 147 blocked urls and example.com has 208. If this is the SAME site, why is there a 61 blocked url difference between sites? That seems strange.

This is what both sites show in GWT as the current robots.txt content (scanned as of 10 hours ago):


User-agent: *
Disallow: /addons/
Disallow: /cgi-bin/
Disallow: /blog/
Disallow: /controllers/
Disallow: /core/
Disallow: /info_pages/
Disallow: /install33
Disallow: /files/
Disallow: /js/
Disallow: /lib/
Disallow: /new/
Disallow: /old/
Disallow: /payments/
Disallow: /production
Disallow: /schemas/
Disallow: /shippings/
Disallow: /skins/
Disallow: /var/
Disallow: /admin1310.php
Disallow: /config.php
Disallow: /config.local.php
Disallow: /init.php
Disallow: /prepare.php
Disallow: /shippingkit
Disallow: /*?
Disallow: /store_closed.html
Sitemap: http://www.example.com/sitemap.xml
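A rough way to see which paths rules like these cover is to run them through a small matcher. The sketch below is an illustration only: it assumes Google-style matching (prefix match, `*` as a wildcard, a trailing `$` as an end anchor), uses just a handful of the rules listed above, and ignores Allow lines and per-agent groups. `rule_to_regex` and `is_disallowed` are hypothetical helper names, not part of any library.

```python
import re

# A subset of the Disallow rules from the robots.txt above.
RULES = ["/addons/", "/install33", "/config.php", "/*?", "/store_closed.html"]


def rule_to_regex(rule):
    """Turn one Disallow value into a regex anchored at the start of the path."""
    anchored = rule.endswith("$")
    body = rule[:-1] if anchored else rule
    # '*' stands for any run of characters; everything else is literal.
    pattern = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile("^" + pattern + ("$" if anchored else ""))


def is_disallowed(path_and_query, rules=RULES):
    """True if any rule matches the URL's path (plus query string, if any)."""
    return any(rule_to_regex(rule).match(path_and_query) for rule in rules)


if __name__ == "__main__":
    for url in ["/products/widget.html",
                "/index.php?dispatch=products.view",
                "/config.php"]:
        print(url, "->", "blocked" if is_disallowed(url) else "allowed")
```

Under these assumptions, the `Disallow: /*?` line covers every URL that carries a query string, so it accounts for far more URLs than the named files and folders do.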


I did try to fetch both sites again, and the response includes this header: Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0. Could that be a problem? Not sure.

Basically, it looks to me like Googlebot is not crawling the www version of my site. I don't know if that is because something in my shopping cart software (CS-Cart 3.0.6 PROFESSIONAL) is blocking Googlebot specifically (again, Yahoo and Bing are no issue) or whether it is a server issue. My server host claims that CS-Cart needs to be configured to serve the cart with or without www, which they say it is not. In a way it is, because Yahoo and Bing can crawl it both ways, you know? I'm starting to believe something is being blocked for Google in particular.

My shopping cart was listed for about two years as www.example.com. Then about four years ago, I changed it to example.com when I updated from version 2.0 to 3.0 and had to build a new cart. It may be related.

Thank you for any help that you can provide.

[edited by: phranque at 4:51 am (utc) on Jan 11, 2014]
[edit reason] exemplified domain [/edit]

phranque

9:46 pm on Jan 10, 2014 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Do you have a canonical hostname redirect?

netmeg

11:51 pm on Jan 10, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Why are you going back and forth between www and non-www? What's wrong with non-www if that's how Google had it indexed?

onlinesource

12:40 am on Jan 11, 2014 (gmt 0)

10+ Year Member Top Contributors Of The Month



I have set up canonical URLs for products, newsletter postings and categories. I was not aware of a hostname redirect. I will have to look into whether I have set that up or not.

aakk9999

1:33 am on Jan 11, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If you are talking about the same site as in this thread [webmasterworld.com...] then it would appear that you do have a canonical hostname redirect, because you say there:

My main site is now www. So if you try to type in my site without a www, it directs you to www.example.com.


But netmeg is right - why did you change it if it was indexed without www before?

lucy24

2:12 am on Jan 11, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



<ot>
Disallow: /admin1310.php
Disallow: /config.php
Disallow: /config.local.php
Disallow: /init.php
Disallow: /prepare.php

Why are you telling robots the names of your protected files?
</ot>

onlinesource

6:30 am on Jan 11, 2014 (gmt 0)

10+ Year Member Top Contributors Of The Month



<ot>
Disallow: /admin1310.php
Disallow: /config.php
Disallow: /config.local.php
Disallow: /init.php
Disallow: /prepare.php

Why are you telling robots the names of your protected files?
</ot>


I thought that by saying Disallow, I'm telling the bot to ignore these files and folders because I do not want them to appear in search results. Am I missing something?

netmeg

1:59 pm on Jan 11, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yep. Read up on the difference between crawling and indexing.

Personally, I try NOT to make Google think. I lay it out how I want it and then I minimize changes.