Home / Forums Index / Google / Google News Archive

Google News Archive Forum

    
Google penalty for index.html over index.php
... looking for reasons.
internetheaven




msg:201035
 12:29 pm on Sep 27, 2004 (gmt 0)

One of my sites has been going for a while now with plenty of inbound anchor text but has not had anywhere near the results of similar work I've done with other sites. In fact, I'm stuffed away at the bottom as if I've been penalised.

The only thing so far that has stuck out in a review of the site is that if you go to:

[example.com...]
it shows
[example.com...]

BUT - the inquiry forms on my site use:

[example.com...]
[example.com...]
[example.com...]

I know there were some problems Google had with removing sites that didn't have index.html as their main page file name. Is there a problem with having index.php AND index.html in the root?

 

Mr_Roberto




msg:201036
 4:35 pm on Sep 27, 2004 (gmt 0)

Google is likely treating each of the different variations you have as a unique page, and suppressing duplicate content.

i.e.
www.widgets.com
www.widgets.com/index.html
www.widgets.com/index.php
www.widgets.com/index.php?param=1
etc. etc.

I'd suggest doing a detailed review and making sure that all internal links to your home page just have the domain name without an index.html etc. When you are serving the page with query results you should consider including a robots meta tag that is noindex, so you won't get duplicate content issues.
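A minimal sketch of the two suggestions above, in Python. The function names and the list of index filenames are illustrative assumptions, not anything from the site in question: the first function collapses a parameterless link to a default index file down to the bare directory URL, and the second emits a noindex robots meta tag only for URLs that carry a query string. Whether noindex is appropriate depends on whether the query-string pages really duplicate other content.

```python
from urllib.parse import urlsplit

# Assumed list of default index filenames; adjust to match the server's setup.
INDEX_FILES = ("index.html", "index.php")


def canonical_home_link(url: str) -> str:
    """Collapse a parameterless link to an index file down to its directory URL."""
    parts = urlsplit(url)
    path = parts.path or "/"
    directory, _, filename = path.rpartition("/")
    # Only strip the filename when no query string is attached to it.
    if filename in INDEX_FILES and not parts.query:
        path = directory + "/"
    scheme = parts.scheme or "http"
    result = f"{scheme}://{parts.netloc}{path}"
    if parts.query:
        result += "?" + parts.query
    return result


def robots_meta(url: str) -> str:
    """Return a noindex robots meta tag for query-string URLs, else an empty string."""
    if urlsplit(url).query:
        return '<meta name="robots" content="noindex">'
    return ""
```

With this sketch, www.widgets.com and www.widgets.com/index.html both map to the same link target, while www.widgets.com/index.php?param=1 is left untouched and is the only one that receives the meta tag.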

internetheaven




msg:201037
 10:54 pm on Sep 27, 2004 (gmt 0)

Okay I guess I didn't explain that very well as that response was nowhere near the question I was asking. I'll try again:

As stated, I already use [example.com...] as the linking URL - this displays the information from the index.html file. Nowhere on the site links to either index.html or index.php

Why on earth would I NOT want Google to index the information on the dynamic pages?

[example.com...]
[example.com...]
[example.com...]

Google is not "treating" all these pages as unique - they ARE unique pages - I honestly can't see where that confusion came from. For Googlebot to index the index.php page itself without a parameter it would have to take a "wild guess" that it exists and try to visit it, which I'm not aware that it does.

So, to re-iterate the question - could there be a possible filter for there being both an index.html and index.php page in the root? (This is not a question on duplication, all the pages are 100% unique apart from headers and footers).

ncw164x




msg:201038
 11:00 pm on Sep 27, 2004 (gmt 0)

Nowhere on the site links to either index.html or index.php

Are you saying that you don't have a link back to your home page from every page on your site? And surely you have a site map for your site?

sabai




msg:201039
 11:02 pm on Sep 27, 2004 (gmt 0)

Nowhere on the site links to either index.html or index.php

How do you expect google to spider [example.com...] if there are no links to it?

internetheaven




msg:201040
 9:00 am on Sep 28, 2004 (gmt 0)

Are you saying that you don't have a link back to your home page from every page on your site? And surely you have a site map for your site?

For pete's sake - am I talking another language here?

I just said each page links to [example.com...] which displays the index.html information. That is standard practice and has been recognised by Google for the past hundred years!

How do you expect google to spider [example.com...] if there are no links to it?

I give up ...

There ARE links to index.php?form=query1, index.php?form=query2 etc. there are NO LINKS to INDEX.PHP on its own but that has nothing to do with it anyway. This isn't a question on spidering or "my pages aren't being indexed" complaint, it's about file names.

Has anyone understood the question, please, just one person to say they understand what the question is so that I can die happy ... you don't even have to know the answer, just say you understand the question ...

netnerd




msg:201041
 9:40 am on Sep 28, 2004 (gmt 0)

internetheaven

Perhaps if you could explain what is wrong more clearly, without becoming irate when people try to help, you would get a better response.

It sounds to me like your server is set up to redirect to the index.html page from www.example.com through a temporary redirect or something like that. But this is only one possibility.
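One way to check the redirect theory is to look at the HTTP status code the server returns for the bare domain. The following is a rough Python sketch using the standard library's http.client; the function names are made up for illustration, and the host you pass in is your own.

```python
import http.client
from typing import Optional, Tuple


def head_request(host: str, path: str = "/") -> Tuple[int, Optional[str]]:
    """Send a HEAD request and return (status code, Location header or None)."""
    conn = http.client.HTTPConnection(host, timeout=10)
    try:
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def describe(status: int, location: Optional[str]) -> str:
    """Turn a status code and Location header into a short plain-English summary."""
    if status in (301, 308):
        return f"permanent redirect to {location}"
    if status in (302, 303, 307):
        return f"temporary redirect to {location}"
    if status == 200:
        return "served directly, no redirect"
    return f"other response ({status})"
```

Calling describe(*head_request("www.example.com")) would then report whether the home page is served directly or via a redirect to some other URL.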

Marval




msg:201042
 10:02 am on Sep 28, 2004 (gmt 0)

internetheaven - to answer your original question - no, I have not seen this as a penalty or dampening effect - I have a similar setup, as do a lot of directory sites, and I don't see any of them having trouble with it either.

indianet




msg:201043
 10:56 am on Sep 28, 2004 (gmt 0)

As I understand your question, you have a problem with your web pages like:

[example.com...]
[example.com...]
[example.com...]

Google is not taking them as different web pages. You mean Google is treating them as the same web page.

To my knowledge, Google recently made some change to how it handles query-string web pages. I have seen many query-string-based web pages that earlier had PageRank and good search engine positions, but recently those pages have neither.

Certainly Google is changing its behavior towards query-string-based web pages.

Dayo_UK




msg:201044
 11:11 am on Sep 28, 2004 (gmt 0)

I think I understand internetheaven's question ;)

I guess it depends what Google thinks is your index page when it comes a-crawling. If it picks up index.html from your root, I can't see this being a problem. If it picks up index.php (assuming this is an empty database page when no variable has been passed, with different links than index.html), then you may not be getting the pages you want picked up and indexed?

If the Google cache shows index.html for http*://www.example.com then perhaps there are other reasons behind the lower rankings?

I would not think (who knows for sure) that just having an index.html and an index.php in the root (or same directory) is a problem - I have some sites where I used to have application_form.html and application_form.pdf, and Google definitely sees them as separate pages.

Lord Majestic




msg:201045
 11:33 am on Sep 28, 2004 (gmt 0)

http://www.example.com/index.php?form=query1
[example.com...]
[example.com...]

Google is not taking them as different web pages. You mean Google is treating them as the same web page.

I am just trying to understand here - technically speaking, the 3 URLs above are unique URLs, and they can generate totally different outputs, which makes it unjustifiable to pick just one and not the other 2.

Now if all 3 unique URLs generated similar data (like index.php or index.html), then it might be reasonable to group them together and treat them as one URL, requiring the end user to click the "Similar Pages" link to show the rest. The link with higher PR will probably take the top spot.
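To illustrate the grouping idea, here is a small Python sketch that buckets URLs by a hash of their whitespace-normalized output. This is a deliberately crude stand-in: it only catches exact duplicates, and nothing here is a claim about how Google actually detects similar pages. The function name and the sample data are hypothetical.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def group_by_content(pages: Dict[str, str]) -> List[List[str]]:
    """Group URLs whose page bodies are identical after collapsing whitespace."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for url, body in pages.items():
        normalized = " ".join(body.split())
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        groups[digest].append(url)
    # One sorted list of URLs per distinct body, in first-seen order.
    return [sorted(urls) for urls in groups.values()]
```

Under this sketch, three query-string URLs that return three different bodies end up in three separate groups, while a bare domain and /index.html that return the same body land in one group.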

Mr_Roberto




msg:201046
 4:19 pm on Sep 28, 2004 (gmt 0)

The only thing so far that has stuck out in a review of the site is that if you go to:

[example.com...]
it shows
[example.com...]

The way you phrased this, I took it to mean that your browser actually showed ("it shows") the filename index.html on the displayed page URL, which means that somehow there are links to or redirects to the file index.html.

If this is not what you meant then the question doesn't make much sense, because your server software will serve up the content of index.html transparently, and there is no way that Google would even know about this filename. Hence it could not confuse it with another file named index.php.
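A toy Python model of what "transparently" means here, loosely in the spirit of Apache's DirectoryIndex setting. The index order and the file set are assumptions for illustration, and this is not real server code: a request for a directory URL is answered with the first index file found, with a 200 status and no redirect, so the client never learns the filename.

```python
from typing import Optional, Set, Tuple

# Assumed lookup order for default documents; real servers make this configurable.
DIRECTORY_INDEX = ("index.html", "index.php")


def serve(path: str, files: Set[str]) -> Tuple[int, Optional[str]]:
    """Return (status, file whose content is sent) for a requested URL path.

    No redirect is ever issued, so the URL the client sees stays as requested.
    """
    if path.endswith("/"):
        # Directory request: try each default document in order.
        for name in DIRECTORY_INDEX:
            candidate = path + name
            if candidate in files:
                return 200, candidate
        return 404, None
    return (200, path) if path in files else (404, None)
```

With both files in the root, serve("/", files) picks index.html, while index.php is only reached when a link names it explicitly, for example with a query string attached.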

lbobke




msg:201047
 7:40 pm on Sep 28, 2004 (gmt 0)

As stated, I already use [example.com...] as the linking URL - this displays the information from the index.html file. Nowhere on the site links to either index.html or index.php

Are you sure that there are no external backlinks pointing to "index.html"?

Laurenz

steveb




msg:201048
 8:29 pm on Sep 28, 2004 (gmt 0)

"Is there a problem with having index.php AND index.html in the root?"

I would assume so. Or more accurately, it is an invitation to a problem. I thought that index.html problem was very old news now, and only affected a few sites. On the other hand, messing with Google's brain with duplicate or inconsistent or illogical index pages continues to cause problems, so if I were you I'd delete the useless index.html file right away.

internetheaven




msg:201049
 8:29 am on Sep 29, 2004 (gmt 0)

Marval, Dayo_UK - THANK YOU! I was beginning to lose all hope of getting an answer.

Given that other sites are using this okay, and that on checking, the cache was in fact of index.html, my mind is at rest and I can go on assuming that the problem lies elsewhere.

Perhaps if you could explain what is wrong more clearly, without becoming irate when people try to help, you would get a better response. It sounds to me like your server is set up to redirect to the index.html page from www.example.com through a temporary redirect or something like that. But this is only one possibility.

I don't think I could have explained it more clearly and obviously some people managed to "work it out". You, of course, weren't one of them.

As I understand your question, you have a problem with your web pages like:
[example.com...]

I have no idea how you managed to get that from my question? Can I just ask, all those who completely missed the mark on this - are you native English speakers or are you using translation software?

Mr_Roberto




msg:201050
 5:57 pm on Sep 29, 2004 (gmt 0)

I have no idea how you managed to get that from my question? Can I just ask, all those who completely missed the mark on this - are you native English speakers or are you using translation software?

In answering your original ambiguous question, we (wrongly) assumed you had a good understanding of how HTTP servers such as Apache or IIS actually process default index.html files for domain requests (the servers just send the content, they don't expose the filename to Google or anyone else).

It's tough for most people to write really clearly, so I wouldn't hold that against you. But as for your attitude - I for one won't be adding to your stress level by responding to any future questions you might have.

instinct




msg:201051
 8:43 pm on Sep 29, 2004 (gmt 0)

How DARE you all not understand internetheaven's question! What nerve!

;-)

internetheaven




msg:201052
 11:27 pm on Sep 29, 2004 (gmt 0)

In answering your original ambiguous question, we (wrongly) assumed you had a good understanding of how HTTP servers such as Apache or IIS actually process default index.html files for domain requests (the servers just send the content, they don't expose the filename to Google or anyone else).

No, what you did was make up a much more complicated version of the question in your head and try to answer that one. I stated that the browser "displayed the information from index.html" when the domain itself was typed in. I kinda figured that was simple enough but I'm Scottish so maybe we explain things differently here. Others got it so don't go questioning my understanding of servers just because you didn't get the question.

How DARE you all not understand internetheaven's question! What nerve!

Real funny. You must be a real blast at parties ... I'm allowed to get frustrated aren't I? Or should we all keep a humorous, cool exterior like you when things aren't going to plan?

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved