Different PR for Capitalisation

     
2:49 pm on Mar 12, 2004 (gmt 0)

Junior Member

10+ Year Member

joined:Jan 21, 2004
posts:48
votes: 0


Just came across an ASP page which is listed in Google as
www.domain.com/pageId=Home&Product=TI

this is fine and appears to have a page rank of 4. However, all the internal links on the site point to

www.domain.com/default.asp?pageid=home&product=ti

Note that all letters in the URL are now lower case. This page now shows up as PR0 and shows no backlinks.

Do Google and the toolbar really see these as separate pages or is this just a glitch in the system?

Moff

3:09 pm on Mar 12, 2004 (gmt 0)

Full Member

10+ Year Member

joined:Mar 8, 2004
posts:311
votes: 0


Whilst domain names are case insensitive, the path part of the URL is case sensitive.

Also, strictly speaking, www.domain.com/ is a different page to www.domain.com/default.asp (or www.domain.com/index.html, or whatever). It's only due to the implementation/configuration of the webserver that /default.asp is returned for requests for /.
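
As a rough illustration of the point about case, here is a minimal Python sketch that lower-cases only the scheme and host of a URL and leaves the path and query string alone. The function name normalise_host_only and the example.com URLs are made up for the example; it says nothing about how Google actually normalises URLs.

```python
from urllib.parse import urlsplit, urlunsplit

def normalise_host_only(url):
    """Lower-case the scheme and host (both case insensitive) and
    leave the path, query string and fragment exactly as given."""
    parts = urlsplit(url)
    return urlunsplit((
        parts.scheme.lower(),
        parts.netloc.lower(),  # fine for a plain host name like this one
        parts.path,            # case sensitive: left untouched
        parts.query,           # left untouched as well
        parts.fragment,
    ))

# Hypothetical URLs that differ only in letter case.
a = normalise_host_only("http://WWW.Example.com/Default.asp?pageId=Home")
b = normalise_host_only("http://www.example.com/default.asp?pageid=home")

same_host = urlsplit(a).netloc == urlsplit(b).netloc
same_url = a == b
print(same_host, same_url)
```

After normalising, the two hosts match but the URLs as a whole still differ, which is why a crawler can reasonably treat them as two separate pages.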

Jon

3:31 pm on Mar 12, 2004 (gmt 0)

Junior Member

10+ Year Member

joined:Jan 21, 2004
posts:48
votes: 0



Does this mean that if a site has links to both

www.domain.com/pageId=Home&Product=TI

and

www.domain.com/default.asp?pageid=home&product=ti

then in theory you could end up tripping duplicate content filters?

I knew domain.com/ and domain.com/index.html could be seen as different pages but was under the impression that most of the time Googlebot was smart enough to recognise these as the same page and credit it as a single page.

Moff

5:32 pm on Mar 12, 2004 (gmt 0)

Junior Member

10+ Year Member

joined:Mar 8, 2004
posts:196
votes: 0


Google correctly treats them as different files because Unix filesystems are case sensitive. On Unix, home.html is a different file from Home.html. Since many sites run Apache on Unix, Google has to respect that distinction.
7:58 pm on Mar 12, 2004 (gmt 0)

Senior Member

WebmasterWorld Senior Member marcia is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Sept 29, 2000
posts:12095
votes: 0


I've had to deal with a few situations similar to this. And it does go further than just capitalization of filenames.

In one case the filenames were changed across the site when it switched to using SSI. All the interior pages linked back to the homepage using index.shtml, so no internal backlinks accrued to the homepage's PR or showed up as backlinks. The situation was remedied by linking to the homepage with an absolute URL throughout the site.

Just recently, I've had to deal with a site where the web designer linked back to the homepage 3 different ways, often from the same page. There were links to example.com, www.example.com and index.shtml, all within the site itself. The site still shows a PR3 when it should be a high PR4 based on the PR of the pages its inbound links are on. The same thing had to be done as in the first case, with all internal links to the homepage changed to www.example.com/ - and some people do prefer to include the forward slash.
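
The clean-up described above could be sketched roughly like this in Python. Everything here is an assumption for illustration: the helper name canonical_home_link, the http:// scheme, and the idea that index.shtml at the site root is the only index document in play.

```python
from urllib.parse import urlsplit

# Assumed preferred form of the homepage URL (trailing slash included).
CANONICAL_HOME = "http://www.example.com/"
# Assumed host names and index document that all mean "the homepage".
HOME_HOSTS = {"example.com", "www.example.com"}
INDEX_PATH = "/index.shtml"

def canonical_home_link(href):
    """Return CANONICAL_HOME if href is a known homepage variant,
    otherwise return href unchanged."""
    parts = urlsplit(href)
    host = parts.netloc.lower()  # host names are case insensitive
    if parts.query or parts.fragment:
        return href  # leave anything with parameters or anchors alone
    if host:
        # Absolute link: must be one of our hosts with a homepage path.
        is_home = host in HOME_HOSTS and parts.path in {"", "/", INDEX_PATH}
    else:
        # Root-relative link: only "/" or the index document count.
        is_home = parts.path in {"/", INDEX_PATH}
    return CANONICAL_HOME if is_home else href

links = [
    "http://example.com",
    "http://www.example.com/index.shtml",
    "/index.shtml",
    "http://www.example.com/products.shtml",
]
fixed = [canonical_home_link(link) for link in links]
print(fixed)
```

The path comparison is deliberately case sensitive, so a link to /Index.shtml would be left as it is rather than guessed at.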

What we see on the toolbar is for the most part outdated, and consequently says little about what the actual PR of pages should be at any given time. The exception is when backlinks and PR have just been freshly updated, which is when it's about as accurate as I believe we'll see it.

>>Since many sites run Apache/Unix, Google must recognize this

I don't think it's a matter of there not being the technology to handle the differences. They are technically different pages, and there are some who take advantage of that fact. At one time there was some site populating the local search for just about every major US city for a certain search category, as well as using expired domains mixed into their bundle, that was serving "circle jerks" you couldn't get out of. They used exactly that technique - a radically different number of backlinks to the domain with and without the www.

Question about this type of URL, because it's come up before, though in a different context:

>>www.domain.com/default.asp?pageid=home&product=ti

What shows up at Google when using allinurl: to see which of the site pages are included in the index? What bothers me about that kind of URL is wondering what happens when you can also have

www.domain.com/default.asp?pageid=home&product=tj
and
www.domain.com/default.asp?pageid=home&product=tk

I don't know how ASP pages are actually generated, but do the above all refer to the same default.asp page or are there multiple ones depending on the product code? We once had a member who was having serious problems getting his site properly indexed, and his URLs looked quite similar to that, with the pagename.asp done in the same way, with different parameters following in the URL.
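
For what it's worth, pulling URLs like these apart shows what differs between them. This is only a sketch: the http:// scheme is added so the parser can find the host, and it can't tell us how this particular site builds its pages. The usual arrangement with a URL of this shape is a single default.asp script that reads the parameters, but that is an assumption here.

```python
from urllib.parse import urlsplit, parse_qs

urls = [
    "http://www.domain.com/default.asp?pageid=home&product=ti",
    "http://www.domain.com/default.asp?pageid=home&product=tj",
    "http://www.domain.com/default.asp?pageid=home&product=tk",
]

paths = set()
products = []
for url in urls:
    parts = urlsplit(url)
    paths.add(parts.path)  # the script being requested
    # parse_qs returns a list of values for each parameter name
    products.append(parse_qs(parts.query)["product"][0])

distinct_urls = len(set(urls))
print(paths, products, distinct_urls)
```

All three requests name the same path and only the product value changes, yet as whole URLs they are three distinct strings, so each one is a separate candidate for indexing.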

>>Do Google and the toolbar really see these as separate pages or is this just a glitch in the system?

It isn't the toolbar that's seeing anything beyond what's being sent to it, and what the toolbar reflects may or may not be an actual representation. I think what we really need to take a closer look at is not so much the toolbar, but how the naming of files affects crawling and subsequent computations for scoring.