Forum Moderators: open


pages in index without cache

does this mean a manual submit


needinfo

4:00 pm on Jun 16, 2003 (gmt 0)

10+ Year Member



I've just noticed that a lot of my pages which I did not want indexed are in the index, but no cache is shown for them. I seem to remember that this was a good indication that those pages were manually submitted via Google's "submit a site" link. Can anybody please confirm that for me?

Also, if I put a NOINDEX tag on the pages, will they be dropped during the next update?
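(For reference, the tag being discussed is the standard robots meta tag, placed in the page's head section:)

```html
<!-- Tells compliant crawlers not to include this page in their index -->
<head>
  <meta name="robots" content="noindex">
</head>
```

One caveat worth knowing: the crawler has to be able to fetch the page to see this tag, so a page that carries NOINDEX must not also be blocked in robots.txt.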

needinfo

7:32 pm on Jun 16, 2003 (gmt 0)

10+ Year Member



Is there anybody who can quickly confirm this? I'm sure I've seen it mentioned before, but I cannot find it anywhere with the site search.

jonrichd

7:34 pm on Jun 16, 2003 (gmt 0)

10+ Year Member



I believe this will occur when Googlebot finds a link to the pages, but for whatever reason hasn't been able to crawl them.

This could be because there wasn't enough time left in the crawl cycle, the server was down when the bot came, or robots.txt forbade crawling those pages.

Sounds like the latter may be the case for you.

needinfo

7:51 pm on Jun 16, 2003 (gmt 0)

10+ Year Member



jonrichd
I doubt that Google would have found these pages via links, as they are only reachable through a form, and as far as I'm aware Googlebot cannot follow a form.

jonrichd

8:00 pm on Jun 16, 2003 (gmt 0)

10+ Year Member



It's possible that the "Add URL" form was used to submit the links, but I don't think that the missing descriptions have anything to do with the fact that Add URL was used.

OTOH, I don't see why you would have used Add URL to submit a page that you didn't want indexed. Is there valuable content on these pages that someone else might have felt was worth linking to directly, thereby bypassing the form? I ran a site once where we wanted users to register before viewing certain content. Ultimately, though, the information showed up in Google's index.

There have been other reports in this forum in the past saying that Google had indexed pages that the owner had not meant to be public. There is speculation that it might be able to get pages from toolbar requests.

needinfo

8:05 pm on Jun 16, 2003 (gmt 0)

10+ Year Member



jon,
I didn't submit the pages; I am starting to think one of my competitors may have. They know I do not want that info public.
If I put these pages in a robots.txt file so they are NOT indexed, do you know whether they will be dropped from the index at the next crawl?

rfgdxm1

8:10 pm on Jun 16, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You might want to consider that these pages were discussed on a website bulletin board, etc. and Google found the link to them there. Using robots.txt may not work. That merely prevents Google from spidering.
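(For anyone following along, a minimal robots.txt along these lines blocks compliant crawlers from fetching the pages; the path here is made up:)

```
User-agent: *
Disallow: /private/
```

As noted above, this only prevents spidering. A URL Google discovers through an external link can still show up in the index as a bare, cache-less listing, since the bot never fetched the page itself.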

jonrichd

8:12 pm on Jun 16, 2003 (gmt 0)

10+ Year Member



I would definitely put them in robots.txt so they are not crawled. (Actually, I thought you had already done this.) What I don't know is whether putting them in robots.txt will prevent the pages from showing up in the index as they do now, without a description, because of inbound links.

If that's what happens, and you really want people to register before viewing the pages, you will have to implement some logic (perhaps something that checks the referer to determine the visitor isn't arriving from a search engine) to force them to the registration page. Of course, there are ways around this, too.
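A rough sketch of that referer check might look like the following. This is only an illustration: the function name, the registration URL, and the allowed host are all made up, and as noted the Referer header is easily spoofed or absent, so this is a deterrent rather than real access control.

```python
from urllib.parse import urlparse

REGISTER_URL = "/register"  # hypothetical registration page

# Hosts whose visitors we trust to have arrived through our own form.
ALLOWED_REFERER_HOSTS = {"www.example.com"}

def gate_request(referer):
    """Return what to serve: the content if the visitor came from our
    own site, otherwise the registration page. A missing or foreign
    referer (e.g. a search-engine results page) gets redirected."""
    host = urlparse(referer or "").hostname
    if host in ALLOWED_REFERER_HOSTS:
        return "content"
    return REGISTER_URL
```

For example, a visitor arriving from a Google results page would be sent to the registration URL, while one clicking through from the site's own form would see the content.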