Forum Moderators: open
BTW, please follow this link and see the #3 listing.
[google.com...]
When a search returns over 2,000,000 pages, it is rarer to see URLs with a ? in them.
Yes, it's possible. But it doesn't prove that URLs with a ? won't rank well.
Please, feel free to give your comments on #2 listing: [google.com...]
Sorry, I can't give you the link because it must be a POST, not a GET. But a search on:
WebmasterWorld site search [searchengineworld.com] with the terms 'google dynamic' returns 193 results.
Consensus seemed to be that you're correct, Beyond. For a while, Google would pick up the first level of ? pages, but would not count them as links to other pages (if there were other links). You could also see that the number of & characters (i.e. parameters) was lower in highly indexed sites. All of this changes month to month... Google has picked up more dynamic sites, but so far I don't think anyone can recognize a definite pattern of how deep it will go, which pages count as linking pages, etc.
Maybe our buddy Googleguy would like to comment on file extensions and save us the time of making a bunch of test files up to fully verify this.
I'm sure Google has a very elegant way of handling this, but they can't just index the page once (i.e. exclude the query string when checking whether the page has already been spidered), because then almost all the dynamic content would be lost.
So I figure Google combines a per-site limit on pages with a penalty for every dynamic page, perhaps increasing every time Google follows another dynamic link from a dynamic page.
Just a guess + my $0.02,
William
Thanks for the thoughts so far. We are thinking of changing the URLs for our shop... It deserves to get more traffic!
As far as Altavista goes, every time it hits our servers we see a very high load... We have maybe 20,000 pages of content, and thinking about it, if it indexed the dynamic shop pages it might kill us completely...
But that is an infrastructure matter...
Does Google catalog these links? And which one: the friendly "www.foo.com/article/32646.html"
or the "www.foo.com/article.asp?id=32646" URL?
And the most important thing: can I get
penalized for the redirect thingie?
I'm not 100% sure, but I think this results in Google being returned a 404 for any request it makes on the "www.foo.com/article/32646.html" page. That probably isn't good, and I wouldn't be surprised if Google didn't index such responses at all.
Let me know if I'm wrong as this would be a great way for us ASP 3.0 guys to deal with this issue (short of writing our own APIs).
Josh
Function HTTPGet(strURL) 'As String
    Dim strReturn ' As String
    Dim objHTTP   ' As MSXML.XMLHTTPRequest
    If Len(strURL) Then
        Set objHTTP = Server.CreateObject("Microsoft.XMLHTTP")
        objHTTP.open "GET", strURL, False ' synchronous request
        objHTTP.send                      ' get it
        strReturn = objHTTP.responseText
        Set objHTTP = Nothing
    End If
    HTTPGet = strReturn
End Function
Dim theCorrectURL
' Here you put some parsing of the missing URL that is passed to the 404.asp page.
' Output is the URL of the ASP page you want to load (e.g. article.asp?id=62346&foo=73443)
theCorrectURL = [foo.com...]
Response.Write HTTPGet(theCorrectURL)
*TADAAA*
Any other ActiveX control that reads files via HTTP will do ;-)
Nice!
Brett_Tabke :
HTTP/1.1 200 OK
Server: Microsoft-IIS/4.0
Date: Fri, 25 Jan 2002 15:18:50 GMT
Content-Type: text/html
Set-Cookie: ASPSESSIONIDQGQGGHQK=LFMELLFDIHDMMKKFFFANGGOB; path=/
Cache-control: private
I hope that this is the info you asked for. I used [rexswain.com...]
Dim useragent, found, searchspiders, spider
' User-agent substrings of the spiders we want to serve content to directly
searchspiders = Array("Googlebot", "ArchitextSpider", "Scooter", "Ultraseek", _
                      "InfoSeek", "Lycos_Spider_(T-Rex)", "Gulliver", "FAST-WebCrawler")
found = False
useragent = Request.ServerVariables("HTTP_USER_AGENT")
For Each spider In searchspiders
    If InStr(useragent, spider) Then
        found = True
        Exit For
    End If
Next
If found Then
    ' Spiders get the page content directly, so they index the friendly URL
    Response.Write HTTPGet("http://www.foo.com/" & newQueryStr)
Else
    ' Ordinary visitors are redirected to the real dynamic URL
    Response.Redirect("http://www.foo.com/" & newQueryStr)
End If
Response.End
One question, though. For actual errors, i.e., requests for nonexistent pages that don't parse into a query, I need to return an error page AND a 404 status code to the spider. If I just branch to a Response.Write(HTTPGet(pagenotfound)), I still return a 200 to the client.
I think I need to include the line
Response.Status = "404 Object Not Found"
but where does this go? I've tried to put it in the error page just before I write the "not found" page, but I'm still returning a 200...
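A likely culprit is response buffering: once any output has been flushed, the headers (including the status line) have already gone out as 200, so setting Response.Status afterwards has no effect. A minimal sketch for the 404.asp branch, assuming Response.Buffer is on (the IIS 5 default) and reusing the HTTPGet function and the pagenotfound URL from above:

```vbscript
<%
' In 404.asp, for requests that don't parse into a valid query:
Response.Buffer = True              ' ensure nothing is flushed before the status is set
Response.Status = "404 Not Found"   ' must be set before the buffered response is sent
Response.Write HTTPGet(pagenotfound) ' pagenotfound = URL of your "not found" page
Response.End                        ' flush the buffered body with the 404 status
%>
```

If Response.Buffer is False, the headers are sent with the first Response.Write, which would explain a stubborn 200 even with the Response.Status line in place.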
On my two-part question above, I've narrowed things down a bit. I've managed to make the 404 code work for truly bogus pages, and I've traced the non-functioning error-referral mode to a server issue (I think). I'll repost in the script forum if I'm still stuck. Thanks.