I know I've opened several threads about my issues, but just so there's no confusion, I wanted to start another. As a recap: my SEO company noticed back in the summer that Googlebot wasn't crawling www.example.com for me. It was crawling example.com, which is how I had officially set up my cart and how the site was set in Google Webmaster Tools.
What I did this week was officially change my domain to www.example.com. Before, if you tried to access www.example.com, the cart redirected you to example.com. Now it's the opposite: if you try to access example.com, it redirects you to www.example.com. So the listed site is now www.example.com.
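To double-check the redirect myself, I ran this quick Python sketch (my own check, nothing from CS-Cart; example.com stands in for my real domain). It refuses to follow the redirect so I can see whether the bare domain answers with a 301, which as far as I know is what Google wants to see for a permanent move:

import urllib.request, urllib.error

class NoRedirect(urllib.request.HTTPRedirectHandler):
    # Returning None makes urllib raise HTTPError instead of following the redirect.
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None

opener = urllib.request.build_opener(NoRedirect)
try:
    resp = opener.open("http://example.com/")  # stand-in for my real bare domain
    print("No redirect; got status", resp.getcode())
except urllib.error.HTTPError as e:
    # Hoping for a 301 with a Location pointing at the www host;
    # a 302 would tell Google the move is only temporary.
    print("Status:", e.code)
    print("Location:", e.headers.get("Location"))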
I then went to Google Webmaster Tools, added www.example.com to my list of sites, and under Site Settings for both versions set the preferred domain so Google displays the site with www, regardless of which version it finds.
Now, about a week later, Googlebot has finally crawled www.example.com, and I get this error message:
The requested URL /search?q=cache:6DoDVIBPTCkJ:www.example.com/+&cd=1&hl=en&ct=clnk&gl=us was not found on this server.
What is going on here? Googlebot had no issue crawling example.com but cannot crawl www.example.com even after I took these necessary steps. I also went to Bing.com today and saw that it has crawled www.example.com with no errors. Yahoo? Same thing: crawled with no errors. Nobody has any issue with me switching from example.com to www.example.com except Google and its bot.
I need some help. When I go into Google Webmaster Tools and visit Crawl > Blocked URLs for both the www.example.com and example.com profiles, I noticed today that www.example.com shows 147 blocked URLs and example.com shows 208. If this is the same site, why is there a difference of 61 blocked URLs between the two? That seems strange.
This is what both sites show in GWT as the current robots.txt content (fetched about 10 hours ago):
User-agent: *
Disallow: /addons/
Disallow: /cgi-bin/
Disallow: /blog/
Disallow: /controllers/
Disallow: /core/
Disallow: /info_pages/
Disallow: /install33
Disallow: /files/
Disallow: /js/
Disallow: /lib/
Disallow: /new/
Disallow: /old/
Disallow: /payments/
Disallow: /production
Disallow: /schemas/
Disallow: /shippings/
Disallow: /skins/
Disallow: /var/
Disallow: /admin1310.php
Disallow: /config.php
Disallow: /config.local.php
Disallow: /init.php
Disallow: /prepare.php
Disallow: /shippingkit
Disallow: /*?
Disallow: /store_closed.html
Sitemap: http://www.example.com/sitemap.xml
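The only pattern rule in there is Disallow: /*?. As far as I know, Google treats * as a wildcard, so that line blocks any URL containing a query string. To see what it catches, I translated the rule into a regex the way I understand Googlebot matches it (Python's standard robots parser doesn't handle the wildcard, so this is just my own rough sketch, and the product URL is a made-up CS-Cart-style example):

import re

def google_rule_to_regex(rule):
    # Escape the rule, then restore "*" as "match anything" and "$" as an
    # end anchor, which is how Google documents its wildcard matching.
    pattern = re.escape(rule).replace(r"\*", ".*").replace(r"\$", "$")
    return re.compile("^" + pattern)

blocked = google_rule_to_regex("/*?")
for path in ["/", "/widgets.html", "/index.php?dispatch=products.view&product_id=42"]:
    print(path, "=>", "BLOCKED" if blocked.search(path) else "allowed")

If Google simply knows about a different set of query-string URLs on each hostname, could that alone explain the 147 vs. 208 gap?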
I did try to fetch both sites again, and I am getting a response of: Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0. Could that be a problem? I'm not sure.
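For what it's worth, this is how I pulled those headers (again just my own standard-library sketch; urlopen follows the redirect, so both requests should end up on the www page):

import urllib.request

for url in ("http://example.com/", "http://www.example.com/"):  # stand-ins for my domain
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req) as resp:
        print(resp.geturl(), resp.getcode(), resp.headers.get("Cache-Control"))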
Basically, it looks to me like Googlebot is not crawling the www version of my site. I don't know whether that is because something in my shopping cart software (CS-Cart 3.0.6 Professional) is blocking Googlebot specifically (again, Yahoo and Bing have no issue) or whether it is a server problem. My server host claims CS-Cart needs to be configured to serve the cart both with and without www, and that mine is not. In a way it is, because Yahoo and Bing can crawl it both ways, you know? I'm starting to believe something is blocking Google in particular.
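To test the user-agent angle, I requested the home page pretending to be a regular browser and then pretending to be Googlebot (a rough check; if the block is by IP address rather than user-agent, this won't catch it):

import urllib.request, urllib.error

agents = {
    "browser": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36",
    "googlebot": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
}

for name, ua in agents.items():
    # www.example.com stands in for my real domain
    req = urllib.request.Request("http://www.example.com/", headers={"User-Agent": ua})
    try:
        with urllib.request.urlopen(req) as resp:
            print(name, "=>", resp.getcode())
    except urllib.error.HTTPError as e:
        print(name, "=>", e.code)

If Googlebot gets a different status code here, I figure that would point at the cart or server config rather than at Google.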
My shopping cart was listed as www.example.com for about two years. Then, about four years ago, I changed it to example.com when I upgraded from version 2.0 to 3.0 and had to build a new cart. That history may be related.
Thank you for any help that you can provide.
[edited by: phranque at 4:51 am (utc) on Jan 11, 2014]
[edit reason] exemplified domain [/edit]