My site came online about 6 weeks ago. I saw the PR on the homepage go from PR1 to PR4, I think on this crawl, with just my DMOZ link (it looks like it has not picked up my Yahoo or other links yet). My main brand sections have gone to PR3, but the individual model sections, well, most show a white bar with no PR, which tells me they didn't get crawled. Is Google unreliable with its deep crawls? Why would this happen? I have no fancy script or anything on my site to prevent it from crawling the entire site.
Google didn't deep crawl..
Google did deep crawl my site; it found the extra 1.6k pages this month.
So maybe Google crawled a day before you put up your new pages?
What extension do these files have? Not all files get indexed in Google even if they have backlinks. For example,
aaa.htmlt
will never be crawled by Google.
[edit: added line]
Could it be that the PR of the main section is only PR3? I don't see that being it, though, as I looked at one of my competitors' sites, which has a somewhat similar layout, and all his pages have been crawled and have some kind of PR rating. I have seen links off a PR3 page get crawled. I wonder what's up with this? I can't even check my logs to see if I was deep crawled, because it is a Yahoo store and I don't have access to my raw logs. Ugh, this can be frustrating sometimes!
Note: if you did this just yesterday, you will have to wait for the next Google crawl.
Note 1: PR3 is excellent for the purpose of just getting listed in the database. This month's crawl shows me that Google deep crawls PR3 backlinks too, because I added 1 link from a PR3 site to 1.6k new files linking to each other. They all show up in www2.google.com.
I think you're a few days late; the crawl was already done a few days ago.
[edit:added note]
Google only shows: .asp .htm .html .cgi .pdf and some more. [I put a link in my profile showing exactly which extensions Google crawls.]
But if you have an extension other than those in the list, Google does not crawl it, since it doesn't know what the file is.
[edited by: ikbenhet1 at 6:19 pm (utc) on Aug. 23, 2002]
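To make the claim above concrete, here is a rough Python sketch of an extension check. It is purely illustrative: the set holds only the five extensions named in the post ("and some more" is left out), the helper names `url_extension` and `in_claimed_list` are made up, and whether Google really filters this way is disputed further down the thread.

```python
import posixpath
from urllib.parse import urlparse

# Only the extensions explicitly named in the post; not an official list.
CLAIMED_EXTENSIONS = {".asp", ".htm", ".html", ".cgi", ".pdf"}


def url_extension(url: str) -> str:
    """Return the lowercased extension of the URL path, or "" if there is none."""
    path = urlparse(url).path
    return posixpath.splitext(path)[1].lower()


def in_claimed_list(url: str) -> bool:
    """True if the URL's extension is one of those the post says get crawled."""
    return url_extension(url) in CLAIMED_EXTENSIONS


if __name__ == "__main__":
    # example.com is a placeholder host; the file names come from the thread.
    for url in ("http://example.com/page.html", "http://example.com/aaa.htmlt"):
        print(url, in_claimed_list(url))
```

Note that a URL with no extension at all (a directory or a path like /join) yields an empty string here, so a check like this says nothing about such URLs either way.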
I still can't say yet for the new HTML files.
It will become clear somewhere around Monday, when the PR will be calculated.
Even if these get a minimal amount of PR, that will boost me up good, because each one of the 1.6k HTML files links to my 10 best domains, and each file has 20 links to other files within those 1.6k.
But I will know for sure what effect it has in a few days. For now it looks good.
Google only shows: .asp .htm .html .cgi .pdf and some more. [I put a link in my profile showing exactly which extensions Google crawls.]
Hmmm, that does not seem reasonable.
Can anybody else please confirm this?
I would think Googlebot uses the Content-Type field of the response header to detect the type of the document.
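As a sketch of what that would mean in practice, the Python below classifies a document from its Content-Type header and ignores the URL extension entirely. This only illustrates the poster's guess, not confirmed Googlebot behavior; the `INDEXABLE_TYPES` set and both function names are assumptions made up for the example.

```python
from typing import Mapping, Optional

# Assumed set of media types a crawler might accept; not taken from Google.
INDEXABLE_TYPES = {"text/html", "text/plain", "application/pdf"}


def media_type(content_type: Optional[str]) -> Optional[str]:
    """Strip parameters such as charset and return the bare type/subtype."""
    if not content_type:
        return None
    return content_type.split(";", 1)[0].strip().lower()


def looks_indexable(headers: Mapping[str, str]) -> bool:
    """Decide from the Content-Type header alone, whatever the URL ends with."""
    lowered = {name.lower(): value for name, value in headers.items()}
    return media_type(lowered.get("content-type")) in INDEXABLE_TYPES


if __name__ == "__main__":
    # A .mnsw URL served as text/html would pass under this model.
    print(looks_indexable({"Content-Type": "text/html; charset=ISO-8859-1"}))
    print(looks_indexable({"Content-Type": "application/octet-stream"}))
```

Under this model an unusual extension does no harm as long as the server sends a sensible Content-Type.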
I put an example page in my profile, which is crawled by Google, so we can check it out.
Now look at the navigation bar; it starts with the link 'whatsnew'.
You see it? OK.
This file ends with .mnsw and is therefore not indexed by Google.
(To check, type the URL into Google and see: it's not crawled.)
Now you also see in the navigation bar a 'become a member' link, ending with blablabla.com/join
(Type the URL into Google and see: it's crawled.)
This is proof for me.
This page is crawled, and all other links on this page ending in / or .htm or .html are crawled also, but the links ending in .mnsw and some others are not crawled.
Also, there are a lot more extensions that do not get crawled.
Returns the following headers:
Expires: Mon, 11 Jan 1999 01:23:45 GMT
Pragma: No-Cache
Cache-Control: no-cache
I would say that's more than enough not to index the page.
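For anyone wanting to run the same check on their own pages, here is a small Python sketch that flags the three headers quoted above. The function name `no_cache_signals` is made up, and the code only reports which headers ask clients not to cache; whether that actually stops a search engine from indexing the page is the poster's opinion, not something the code establishes.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import List, Mapping, Optional


def no_cache_signals(headers: Mapping[str, str],
                     now: Optional[datetime] = None) -> List[str]:
    """List the headers in a response that tell clients not to cache it."""
    now = now or datetime.now(timezone.utc)
    lowered = {name.lower(): value.strip() for name, value in headers.items()}
    signals = []

    expires = lowered.get("expires")
    if expires:
        try:
            when = parsedate_to_datetime(expires)
        except (TypeError, ValueError):
            when = None  # unparseable date, ignored in this sketch
        if when is not None:
            if when.tzinfo is None:
                when = when.replace(tzinfo=timezone.utc)
            if when < now:
                signals.append("Expires")

    if "no-cache" in lowered.get("pragma", "").lower():
        signals.append("Pragma")
    if "no-cache" in lowered.get("cache-control", "").lower():
        signals.append("Cache-Control")
    return signals


if __name__ == "__main__":
    # The header values quoted in the post.
    print(no_cache_signals({
        "Expires": "Mon, 11 Jan 1999 01:23:45 GMT",
        "Pragma": "No-Cache",
        "Cache-Control": "no-cache",
    }))
```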
Btw, did not find "become a member" link.
Also did not find any .htm or .html links on that page at all.
And that reference page with extensions lost any credibility once I ran across the line: "jsp - Java Script Page".