Forum Moderators: open
There are tons of links to the site, and I have painstakingly contacted webmasters from sites who listed the old url. They have updated their links, but it doesn't seem to matter.
Anyone have any ideas? I have tried submitting the site to DMOZ, but they have not listed it.
Thanks for any help. I am at my wit's end.
This is NOT a Google error; it's an error within the Sam Spade tool that occurs when you try to GET a web page using HTTP/1.0. I've been looking through all I could find on the subject, and it's not really documented. The built-in spider, however, also uses HTTP/1.0 (at least I found a reference to this somewhere), and that one is able to fetch the pages, so there's no real problem here; it's a Sam Spade bug.
/claus
HTTP/0.9
09/12/03 12:33:45 Browsing http*//appsci.queensu.ca/
Fetching http*//appsci.queensu.ca/ ...
GET /
HTTP/1.1 200 OK
Server: Microsoft-IIS/5.0
Date: Fri, 12 Sep 2003 16:36:51 GMT
MicrosoftOfficeWebServer: 5.0_Pub
Connection: Keep-Alive
Content-Length: 23638
Content-Type: text/html
Expires: Fri, 12 Sep 2003 16:36:51 GMT
Set-Cookie: ASPSESSIONIDSADQCQTC=JOBJNJPAFNJALKFHMKJBLAKC; path=/
Cache-control: private
HTTP/1.0
09/12/03 12:35:44 Browsing http*//appsci.queensu.ca/
Fetching http*//appsci.queensu.ca/ ...
GET /appsci.queensu.ca/ HTTP/1.0
User-Agent: Sam Spade 1.14
HTTP/1.1 404 Object Not Found
Server: Microsoft-IIS/5.0
Date: Fri, 12 Sep 2003 16:38:50 GMT
Content-Length: 4040
Content-Type: text/html
HTTP/1.1
09/12/03 12:36:38 Browsing http*//appsci.queensu.ca/
Fetching http*//appsci.queensu.ca/ ...
GET / HTTP/1.1
Host: appsci.queensu.ca
Connection: close
User-Agent: Sam Spade 1.14
HTTP/1.1 200 OK
Server: Microsoft-IIS/5.0
Date: Fri, 12 Sep 2003 16:39:43 GMT
MicrosoftOfficeWebServer: 5.0_Pub
Connection: close
Content-Length: 23638
Content-Type: text/html
Expires: Fri, 12 Sep 2003 16:39:43 GMT
Set-Cookie: ASPSESSIONIDSADQCQTC=CPBJNJPAFFPPPPKMGFKHMNAO; path=/
Cache-control: private
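The difference between the three fetches above can be reproduced with a few raw request strings. Here is a minimal sketch (Python; the hostname is taken from the transcripts, and the "buggy" builder reconstructs the malformed request line Sam Spade 1.14 appears to send, with the hostname pasted into the path):

```python
# HTTP/0.9: no version token, no headers at all -- just the request line.
def request_http09(path="/"):
    return "GET %s\r\n" % path

# Sam Spade's buggy HTTP/1.0 request: it glues the hostname into the
# path, so IIS looks for a resource literally named /appsci.queensu.ca/
# and answers 404 Object Not Found.
def request_buggy_http10(host, path="/"):
    return ("GET /%s%s HTTP/1.0\r\n"
            "User-Agent: Sam Spade 1.14\r\n"
            "\r\n" % (host, path))

# A correct HTTP/1.1 request: the hostname goes in the Host header,
# not in the request line, and the server answers 200 OK.
def request_http11(host, path="/"):
    return ("GET %s HTTP/1.1\r\n"
            "Host: %s\r\n"
            "Connection: close\r\n"
            "User-Agent: Sam Spade 1.14\r\n"
            "\r\n" % (path, host))

print(request_buggy_http10("appsci.queensu.ca").splitlines()[0])
# first line: GET /appsci.queensu.ca/ HTTP/1.0
```

Comparing the first line of each builder against the transcripts makes the bug visible: only the HTTP/1.0 fetch has the hostname in the request path.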
It does appear to be a bug in Sam Spade... now to try to find another safe (i.e. non-rendering) browser that allows for the different HTTP versions and client IDs :(
[edit] delinked urls [/edit]
I suggest you make a simple test.
Redesign the site map in this way:
Get rid of all CSS, stylesheets, onmouseovers and all that stuff. Make a screenshot of the header of your site map, turn it into a .jpg or .gif, and place it at the top of the page so it keeps the uniform design of the whole site.
List all those links to other pages within the site using only simple HTML, and use absolute links with "http://..."
No onmouseovers, no CSS, no other stuff, just simple HTML.
Get Freshbot to the site and watch the logs to see what happens. I am almost sure it will follow the links, and in that case you will know which path to choose for the site.
I appreciate the suggestions. What makes me feel that it is not the code, however, is:
a. I already tried your suggestion: straight HTML. No JavaScript at all, no CSS. No robots.txt either. A link to a site map. I left it like that for two months and watched the logs. Googlebot came to the index page almost every day but never went to another page.
b. The other 9 sites that are all using the same template are unaffected by this phenomenon.
Can anyone think of any server config issues that might cause this? We have tried to compare the server to the others and haven't found anything so far, but...
Cheers,
Crow
I recommend that when you talk to potential SEO companies, you keep the following in mind:
1) Make sure the people that you contact actually understand what you are trying to accomplish with your site. There is no one-size-fits-all system for search engine optimization.
2) Ask for references. Not just a list with a description of what they did for a client, but phone numbers so you can actually contact their clients and hear about the successes first-hand. Speaking to a reference should build a lot of credibility for an SEO firm.
3) Make sure they fully explain what they are going to be doing to your site and how they are going to accomplish the goals of the site. Be wary of any SEO firm that keeps you in the dark on their practices or claims to have proprietary knowledge. It's pretty cut and dried if you are an SEO, and there is really nothing to hide (no secrets).
Just some friendly advice...
BrAsS_mOnKeY
Take a look at the posts in:
1) Website Technology Issues: [webmasterworld.com...]
2) Webmaster General: [webmasterworld.com...]
You'll find plenty of threads with titles like "redirect...", "301 redirect...", "302...", "301...", "htaccess redirect" and the like.
Basically it's all about letting the server tell the browser (or whatever user-agent is visiting) that a certain file has moved from one location to another. There are two ways of doing it: a 301 way and a 302 way.
If the file is moved temporarily and will come back, you should use a 302; if the file is moved permanently and will not come back, you should use a 301. Both are usually controlled by means of a special file called ".htaccess" on Apache/*nix servers. The MS IIS server has other methods for doing the same thing.
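The two responses differ only in the status line. Here is a sketch of what the server sends back in each case (Python; the target URL is a placeholder):

```python
# Build the redirect response described above.  The Location header
# carries the new address; the status code says whether the move is
# permanent (301) or temporary (302).
def redirect_response(location, permanent):
    status = "301 Moved Permanently" if permanent else "302 Found"
    return ("HTTP/1.1 %s\r\n"
            "Location: %s\r\n"
            "\r\n" % (status, location))
```

On Apache, the usual .htaccess one-liner for a permanent move is `Redirect 301 /old.html http://www.example.com/new.html` (paths and domain here are made-up examples).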
Neither of these "status codes" has any relation to the problem Crow_Song is reporting, as the server on the problem site (IIS) returns a code 200, which means "no problems, here's the file you wanted".
/claus
I know the usual answer is that a dedicated IP is not needed to be listed in G, which implies that the host header is sent.
However, a site was recently moved to a dedicated IP on the same physical server with no other changes. It's down to one page in a site: -xxxx query. G had worked its way up to 140+ pages, while fast.no had and still has all 170+ pages. This also coincided with the implementation of 304 if-modified processing as per G's recent recommendation. The 304 response was checked using the tools at squid.org.
Is it possible that G notices that a specific IP is used for only one site and decides not to send a host header? This site depended on the host header to generate the right content. Otherwise, 404.
+++
Yes - always! Although GoogleBot makes an HTTP/1.0 [webmasterworld.com] request, it always sends the Host header. Using HTTP/1.0 instead of HTTP/1.1 doesn't mean that it's forbidden to send the Host header; it just means that HTTP/1.1-compliant clients MUST send it.
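The point about host-header-dependent sites can be sketched in a few lines. This is a toy model of name-based virtual hosting, not any particular server's logic, and the site name is a placeholder:

```python
# Name-based virtual hosting: the server picks which site's content to
# serve purely from the Host header of the request.
VHOSTS = {"www.example.com": "content for example.com"}

def serve(headers):
    host = headers.get("Host")
    if host in VHOSTS:
        return "200 OK", VHOSTS[host]
    # No Host header (or an unknown one): a host-dependent setup has no
    # way to pick the right site, so it answers 404 -- exactly the
    # failure mode described in the post above.
    return "404 Not Found", ""

# Nothing in HTTP/1.0 forbids sending Host, so a 1.0 client that does
# send it (as GoogleBot does) gets the right content:
status, body = serve({"Host": "www.example.com"})
```

A client that omits the header (legal under HTTP/1.0, forbidden under HTTP/1.1) would hit the 404 branch on such a setup.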
Also, I'm curious as to why links to the old name/old server still remain in Google's index. The old domain was retired almost a year ago, replaced by the new. The new hasn't been indexed, and links to the old are STILL kicking around. Why haven't they been removed by now? Would it be worthwhile to resurrect the old name and put redirects in place? And perhaps to monitor whether or not Google is still trying to visit the old site?
Thanks again guys, for all of the help