URLs not recognised by Google

         

Glenn Murray

12:02 am on Apr 5, 2004 (gmt 0)

10+ Year Member



Hi everyone. For all but 2 of my website's pages, if I copy the URL into Google and do a search, Google returns "Sorry, no information is available for the URL www...." This indicates to me that these pages haven't been spidered yet, but this doesn't make sense. The top level page has been spidered, and it has text links to the rest of these pages. It's not a new website, it's regularly updated, and we don't have any javascript in the source. Anyone have any idea what could be going wrong?

jpell

1:51 am on Apr 6, 2004 (gmt 0)

10+ Year Member



Welcome to WW Glenn,
Were these pages there when your site was crawled for the first time?
If so, then I can't offer an opinion. If not, just hang in there, they will be indexed eventually.
JPell

Glenn Murray

2:10 am on Apr 6, 2004 (gmt 0)

10+ Year Member



Hi and thanks for your response. Yep, the pages were there when the site was spidered. I'm beginning to think it has something to do with the splash page. The first page people go to (the highest level page) is a Flash animation. It has all the meta data before the flash in the code, it's being spidered ok, and it is PR4. This page links to the first real content page which is a home.asp, and it's being spidered ok too. These are the only 2 pages "recognised" by google at the moment.

I submitted the home.asp page to google manually because I was worried about the flash intro. This may be why it is being spidered and the rest aren't? I manually submitted 2 more pages this morning to test this theory, but obviously I'm yet to see if they'll be spidered.

Thanks.

abates

2:16 am on Apr 6, 2004 (gmt 0)

10+ Year Member



How long has the site been up?

Glenn Murray

2:39 am on Apr 6, 2004 (gmt 0)

10+ Year Member



The entire site has been up for a year or two. Most of the pages are at least a year old. I've recently edited a lot of them (approx 6 weeks ago). The top level page (the Flash intro) was submitted at least 8 weeks ago, probably longer. I submitted the home.asp to Google about 6 weeks ago.

Cheers.

rainborick

2:53 am on Apr 6, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Try entering:

site:www.****.com -asdf

which will bring up all of the pages from that URL that are in the index.

Glenn Murray

3:18 am on Apr 6, 2004 (gmt 0)

10+ Year Member



Thanks! This displays 7 results, 2 of which are the pages mentioned above. The rest are elements of these pages like menu structures, etc. None of the other pages are listed.

Appreciate your help.

BigDave

3:26 am on Apr 6, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Are your links to those other pages of the form <a href="www.example.com/page.html">?

Google will not follow javascript links or <form> links.
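
To make the distinction above concrete, here is a minimal sketch of a naive link extractor that only collects plain <a href> targets and skips javascript: links and <form> actions. It is an illustration under stated assumptions, not a description of how Googlebot actually works; the class name, function name, and sample HTML are all hypothetical.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collects href values from plain <a> tags only."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # Ignore everything except anchors; <form> actions are never collected.
        if tag != "a":
            return
        for name, value in attrs:
            # Skip javascript: pseudo-links, which need a script engine to resolve.
            if name == "href" and value and not value.lower().startswith("javascript:"):
                self.hrefs.append(value)


def crawlable_links(html):
    """Return the link targets a simple, script-free crawler could follow."""
    collector = AnchorCollector()
    collector.feed(html)
    return collector.hrefs


# Hypothetical page fragment mixing three kinds of navigation.
SAMPLE = """
<a href="http://www.example.com/page.html">Plain link</a>
<a href="javascript:openPage('other.html')">Script link</a>
<form action="/search.asp"><input name="q"></form>
"""

print(crawlable_links(SAMPLE))  # ['http://www.example.com/page.html']
```

Running something like this over the top level page is a quick way to check whether the inner pages are reachable through plain text links at all.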

MikeBeverley

10:28 am on Apr 6, 2004 (gmt 0)

10+ Year Member



BigDave, there were several threads recently about Google indexing URLs found in javascript, not to mention the information about Google crawling javascript to look for spam techniques:

[webmasterworld.com...]

Myself, I have several forms on my site and the form pages (.php) have been added to the Google index despite being excluded in the robots.txt file. Just the URLs are showing for the scripted php pages, no title or description (they don't have either).

I feel that Googlebot does rip URLs out of forms and javascripts and sticks them in the index whether we want them to or not. Googleguy said recently that they will index pages that are linked to even if they are excluded by the robots.txt file, but they will not show the title and description.
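
As a rough illustration of what a robots.txt exclusion does and does not cover, the sketch below uses Python's standard urllib.robotparser to ask whether a crawler may fetch a URL. The robots.txt content, the URLs, and the helper name may_fetch are assumptions made up for the example. The check only answers "may this page be fetched?"; it says nothing about whether the bare URL can still be listed, which fits the URL-only listings described above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that excludes a form-handler script.
ROBOTS_TXT = """\
User-agent: *
Disallow: /contact.php
"""


def may_fetch(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The excluded script may not be fetched, so no title or description can be read.
print(may_fetch(ROBOTS_TXT, "Googlebot", "http://www.example.com/contact.php"))  # False
# Pages that are not excluded may be fetched as usual.
print(may_fetch(ROBOTS_TXT, "Googlebot", "http://www.example.com/home.asp"))  # True
```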

This would be simpler if someone took a look at the site in question. Glenn Murray, please sticky me your web address and I'll have a look.

BigDave

4:23 pm on Apr 6, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yes, and I participated in those threads. Google will even pick up a URL from within text. . . but, those URLs will not get any PageRank, and will not be a high priority to crawl.

As for forms, it would make absolutely no sense for Google to follow a form. In most cases, you would have to fill out a form to get correct results.

Let's take a really simple form, here's one [google.com...]

The spider would have no way to know what to put in that field to get back useable information. It returns different information for each thing that you enter in the form. And if you enter nothing, many forms only return an error page. Not exactly useful information for a search engine to index.

If you have a form result page in the index, there is probably a link from someone to that specific result. Like if you put in a link to [google.com...]
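
A small sketch of the point above, assuming a GET form with a single text field: every value typed into the form produces a different result URL, and a link to one specific result is just an ordinary URL that someone can point at. The form action, the field name "q", and the helper name are hypothetical.

```python
from urllib.parse import urlencode


def form_result_url(action, fields):
    """Build the URL a GET form would request for the given field values."""
    return f"{action}?{urlencode(fields)}"


# Hypothetical search form with one text field named "q".
ACTION = "http://www.example.com/search"

# A spider cannot guess this value, but a person can link to the result directly.
print(form_result_url(ACTION, {"q": "blue widgets"}))
# http://www.example.com/search?q=blue+widgets

# Submitting the form empty yields a URL that many sites answer with an error page.
print(form_result_url(ACTION, {"q": ""}))
# http://www.example.com/search?q=
```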

Liane

10:49 pm on Apr 7, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Glenn Murray,

I may be off base here but if you do a search for your missing url without using the preceding "www." do you get any results? If so, I think this may indicate a penalty.

I've discovered in some searches that there are often pages in the results which show only the URL without "www." before them. However, if you perform a search for the same URL with the "www.", you will get the same message, "Sorry, no information is available for the URL www....". The page is actually still in the index ... but may have been penalized for whatever reason.

I just noticed this as recently as yesterday for several of my own pages. Not sure if it's an actual penalty or not though. I just changed my hosting company and the site was down for a while. (Not a smooth transition) So I don't know if that is what caused it or not and no way to find out except to give it some time to see if the robots pick up the missing pages.

Perhaps GG or another Google rep can help with an answer here? No problems with any of the other engines on those missing pages, so it's a bit of a mystery.

I have 92 pages showing in the index where all of the pages in the site were indexed a month ago. I'm as confused as you!

Glenn Murray

1:11 am on Apr 8, 2004 (gmt 0)

10+ Year Member



Hi Liane,

Thanks for the suggestion - I learned something from it. However (perhaps fortunately), a search without the www returns nothing, so it doesn't look like that's it. Mike tells me that it might be the fact that our URLs are dynamic and use "id=****" etc. to identify the topics. He suggested using "page=xxx" instead. We're changing this. He also suggested adding a site map and we're doing this too.
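
For anyone making the same change, here is a minimal sketch of the mechanical part: renaming the "id" query parameter to "page" in existing links. Whether the rename actually helps indexing is Mike's suggestion and not something this code can confirm. The sample URL and the helper name are hypothetical, and the server-side script would also need to accept the new parameter name.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def rename_query_param(url, old, new):
    """Return url with the query parameter `old` renamed to `new`."""
    parts = urlsplit(url)
    pairs = [
        (new if key == old else key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
    ]
    return urlunsplit(parts._replace(query=urlencode(pairs)))


# Hypothetical dynamic URL in the style described above.
print(rename_query_param("http://www.example.com/topic.asp?id=42&lang=en", "id", "page"))
# http://www.example.com/topic.asp?page=42&lang=en
```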

Fingers crossed.

Cheers.