Forum Moderators: open
I submitted the home.asp page to Google manually because I was worried about the Flash intro. That may be why it's being spidered while the rest aren't? I manually submitted 2 more pages this morning to test this theory, but obviously I've yet to see whether they'll be spidered.
Thanks.
[webmasterworld.com...]
Myself, I have several forms on my site, and the form pages (.php) have been added to the Google index despite being excluded in the robots.txt file. Just the URLs are showing for the scripted PHP pages, with no title or description (they don't have either).
I feel that Googlebot does rip URLs out of forms and JavaScript and sticks them in the index whether we want it to or not. GoogleGuy said recently that they will index pages that are linked to even if they are excluded by the robots.txt file, but they will not show the title and description.
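For anyone following along, the exclusion being described would look something like this in robots.txt (the paths here are made up for illustration; the point is that robots.txt blocks crawling, not indexing, which is why a bare URL can still show up):

```txt
# Hypothetical robots.txt sketch -- paths are examples only.
# Googlebot won't fetch these pages, but if other pages link
# to them, the bare URL can still appear in the index with
# no title or description, exactly as described above.
User-agent: *
Disallow: /forms/
Disallow: /contact.php
```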
This would be simpler if someone took a look at the site in question. Glenn Murray, please sticky me your web address and I'll have a look.
As for forms, it would make absolutely no sense for Google to follow a form. In most cases, you have to fill out the form correctly to get meaningful results.
Let's take a really simple form, here's one [google.com...]
The spider would have no way of knowing what to put in that field to get back usable information. The form returns different information for each thing that you enter, and if you enter nothing, many forms only return an error page. Not exactly useful information for a search engine to index.
If you have a form result page in the index, there is probably a link from someone to that specific result. Like if you put in a link to [google.com...]
I may be off base here, but if you do a search for your missing URL without the preceding "www.", do you get any results? If so, I think this may indicate a penalty.
I've discovered in some searches that there are often pages in the results which show only the URL without "www." before them. However, if you perform a search for the same URL with the "www.", you will get the same message, "Sorry, no information is available for the URL www....". The page is actually still in the index ... but may have been penalized for whatever reason.
I just noticed this as recently as yesterday for several of my own pages. Not sure if it's an actual penalty or not, though. I just changed my hosting company and the site was down for a while. (Not a smooth transition.) So I don't know whether that is what caused it, and there's no way to find out except to give it some time and see if the robots pick up the missing pages.
Perhaps GG or another Google rep can help with an answer here? No problems with any of the other engines on those missing pages, so it's a bit of a mystery.
I have 92 pages showing in the index, whereas all of the pages in the site were indexed a month ago. I'm as confused as you!
Thanks for the suggestion - I learned something from it. However (perhaps fortunately), a search without the www returns nothing, so it doesn't look like that's it. Mike tells me that it might be the fact that our URLs are dynamic and use "id=****" etc. to identify the topics. He suggested using "page=xxx" instead. We're changing this. He also suggested adding a site map, and we're doing that too.
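For what it's worth, when you switch from "id=" to "page=" URLs, it can help to redirect the old addresses so existing links and indexed entries still resolve. A sketch of that in an Apache .htaccess (file names made up, and this only applies if the site sits behind Apache, which an .asp site may well not):

```txt
# Hypothetical .htaccess sketch: 301-redirect old id= URLs
# to the new page= form so old links don't break.
RewriteEngine On
RewriteCond %{QUERY_STRING} ^id=([0-9]+)$
RewriteRule ^topic\.asp$ /topic.asp?page=%1 [R=301,L]
```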
Fingers crossed.
Cheers.