Forum Moderators: Robert Charlton & goodroi
What can be the problem and what can be done so that the updated version may be visible?
1. The cached page and the actual content that Google uses to calculate rankings are not always in sync.
2. Right now, from what I can see, Google appears to be doing some significant restructuring; spidering is not following its normal pattern.
3. If you haven't already opened a Webmaster Tools account for your site, I'd suggest doing that. The feedback can be helpful, as can the verified channel for communicating with Google.
The message says that there are errors in accessing it. Is there a way to delete the old sitemap and create a new one all over again? I am not sure how it should be created again so that it functions properly. :(
The sitemap is hosted on your server. It was built by your company, so the answer is yes: just rebuild it and upload the new sitemap with the same name.
You don't have to do anything in Google Sitemaps, as the new one will be crawled and updated automatically.
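If the tool that originally built the sitemap is no longer available, a minimal sitemap can be regenerated with a short script. Here's a sketch in Python using only the standard library; the example.com URLs are placeholders for your own pages:

```python
import xml.etree.ElementTree as ET

def build_sitemap(urls):
    """Return a minimal sitemap.xml document listing the given page URLs."""
    urlset = ET.Element("urlset",
                        xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
    for page in urls:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            + ET.tostring(urlset, encoding="unicode"))

# Placeholder pages -- substitute the real URLs of your site
pages = ["http://www.example.com/", "http://www.example.com/about.html"]
sitemap_xml = build_sitemap(pages)

# Save under the same filename as the old sitemap, then upload it
with open("sitemap.xml", "w", encoding="utf-8") as f:
    f.write(sitemap_xml)
```

As noted above, overwriting the old file under the same name is enough; Google will re-fetch the same URL on its next visit.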
Often when the cache date drops back to one from many months ago, it indicates that the page of content is now being indexed under a different URL to the one you are looking at. Look for internal Duplicate Content issues.
Also, I do not have a robots.txt on my server. This question may be very elementary but I need to know -
1) Do all sites have a robots.txt file?
2) If yes, how does one get it and what should it include?
3) If not, why do some pages appear in the list of unreachable URLs with the following detail: robots.txt unreachable
Is there something I can do to ensure that all the pages are crawled?
Also, if I do not want some pages on my website to be crawled, I should not link to them from anywhere, right?
Any help will be appreciated!
I just checked the status of my sitemap and web crawl under Webmaster tools and noticed that quite a few pages are listed under "Unreachable URLs". What does this mean exactly?
I believe that relates to timeouts and internal server errors (500). Perhaps your server was just particularly slow or down when Googlebot tried to access it. Those URLs should just get respidered automatically.
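One way to rule out server-side problems is to spot-check those URLs yourself and see what status codes they return. A rough sketch in Python, standard library only (`check_url` is just an illustrative helper name, not anything from Webmaster Tools):

```python
import urllib.request
import urllib.error

def check_url(url, timeout=10):
    """Return the HTTP status code for a URL, or None if it is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as e:
        return e.code   # the server answered, but with an error, e.g. 404 or 500
    except urllib.error.URLError:
        return None     # timeout, DNS failure, or connection refused

# A status of 200 means the page is currently fine; 5xx or None suggests
# the server was down or too slow, which matches the "unreachable" report.
```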
[google.com...]
[edited by: Bones at 2:26 pm (utc) on Sep. 3, 2007]
Do all sites have a robots.txt file?
For more about just this part of your questions, check out the concluding post on this thread by Jim Morgan....
Google and having *no* robots.txt file
could this be hurting your site?
[webmasterworld.com...]
Note that many of the links on the thread are broken, but the basic information in Jim's post still applies, and I haven't seen it done better.
As I understand it (which may be imperfectly), if you use a custom 404 page and you don't have a robots.txt, Googlebot will still request robots.txt, waste a lot of server resources, and you'll get a lot of 404 error messages in your log files.
It's better to have at least an empty robots.txt file, allowing everything, than no file at all. For Apache, you must also use the path syntax for a custom 404 as indicated in Jim's post.
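For reference, an allow-all robots.txt is just two lines; an empty Disallow value means nothing is blocked:

```
User-agent: *
Disallow:
```

And the path syntax for a custom 404 under Apache (the filename here is only an example) goes in your server config or .htaccess:

```
ErrorDocument 404 /notfound.html
```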
There's a robots.txt forum on WebmasterWorld, which should have answers to your other questions in its library.
Also, if I do not want some pages on my website to be crawled, I should not link to them from anywhere, right?
That's not sufficient to block crawling. Here's a recent discussion on this....
Why is Google indexing my entire web server?
[webmasterworld.com...]
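To actively keep crawlers out of specific pages, the usual mechanism is a Disallow rule in robots.txt; the paths below are placeholders for your own directories:

```
User-agent: *
Disallow: /private/
Disallow: /drafts/
```

Note that robots.txt only blocks crawling; a URL can still show up in the index if other sites link to it. To keep a page out of the index entirely, a noindex robots meta tag on the page itself is the more reliable tool.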