Google News Archive Forum

    
Google and PHP
Does PHP work in Google?
Kewe




msg:94066
 7:06 am on Sep 3, 2004 (gmt 0)

Hi,

I was reading the forum and found this link:
[google.com...]

And I saw the last sentence:

At Google, we are able to index most types of pages and files with very few exceptions. File types we are able to index include: pdf, asp, jsp, hdml, shtml, xml, cfm, doc, xls, ppt, rtf, wks, lwp, wri, swf.

My new site has 100 PHP pages, and .php is not included in this list. Could somebody tell me more about this?

I have never used PHP before.

Kewe.

 

tantalus




msg:94067
 11:21 am on Sep 3, 2004 (gmt 0)

It's interesting that PHP is omitted.

I don't use PHP; however, Google does index PHP pages. If it suddenly stopped, this is the first place we'd know about it.

Beware of PHP sessions: Google and other bots cannot index these.

Receptional Andy




msg:94068
 11:25 am on Sep 3, 2004 (gmt 0)

Google are only listing the more unusual filetypes in that list - they don't include html, htm or php for instance, but they certainly support all of those ;)

charlier




msg:94069
 11:48 am on Sep 3, 2004 (gmt 0)

PHP, ASP, etc. are all considered to be HTML file types, as that is the type of output they send to the browser (or spider), i.e. a MIME type of text/html.

Hugene




msg:94070
 2:41 pm on Sep 3, 2004 (gmt 0)

My site is all PHP and gets indexed, no problem.

I remember, over a year ago, reading on Google that their crawler might have trouble with URLs containing ? (as many PHP files do in order to pass variables). But my pages with ? get indexed (not all, though).

I just looked on Google for a few minutes and couldn't find anything about the ?. Maybe they fixed that.

trillianjedi




msg:94071
 2:48 pm on Sep 3, 2004 (gmt 0)

Receptional Andy points out:-

they don't include html, htm

So I think his conclusion that the list only contains the more unusual flavours of file extension is pretty accurate.

TJ

yangtao72




msg:94072
 2:57 pm on Sep 3, 2004 (gmt 0)

I'm interested in whether Google likes 'aspx'. A lot of sites would switch to .NET, and many .NET programs send a <form> to the client even when it's not really a typical form but just some .NET server-side controls.
I don't think PHP has a problem. There's an eCompare site that uses PHP and MySQL to build its site, and it has many affiliate sites as partners. Apart from the top of these pages being different, all the content is the same. So in addition to pages from www.thisecommercedomain.com being indexed by Google, tons of pages from affiliatessitesdomain.thisecommercedomain.com were also indexed by Google. So when you search for a related keyword on Google, wow, excellent rank of course. I have tracked this site's pages indexed by Google for years, and the number has increased from 0.4 million to 1.5 million in just the last 6 months. So you know, Google is very generous to some sites but very ? to others ....
It's not a '.php' issue!

Airportibo




msg:94073
 3:34 pm on Sep 3, 2004 (gmt 0)

Google definitely doesn't have a problem with PHP. But to be on the very, very safe side, you could just tell your webserver to use rewrite rules, so every .php can also be accessed as .html.
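
To illustrate what such a rewrite rule does, here is a minimal Python sketch of the mapping only. It is not real webserver configuration; on an actual server you would use that server's own rewrite facility. The function name rewrite_to_php and the example paths are made up for illustration.

```python
import re

# Assumed rule (hypothetical): a request for "<name>.html" is answered
# by the script "<name>.php" in the same directory.
_HTML_SUFFIX = re.compile(r"\.html$")


def rewrite_to_php(request_path: str) -> str:
    """Return the internal .php path for a public .html path.

    Paths that do not end in ".html" are returned unchanged.
    """
    return _HTML_SUFFIX.sub(".php", request_path)


if __name__ == "__main__":
    print(rewrite_to_php("/products/list.html"))  # /products/list.php
    print(rewrite_to_php("/style.css"))           # /style.css
```

The visitor (or spider) only ever sees the .html address; the .php name stays internal.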

Receptional Andy




msg:94074
 3:46 pm on Sep 3, 2004 (gmt 0)

I'm interested in whether Google likes 'aspx'

An easy way to check is just to see how many URLs Google has containing whichever file extension you're interested in.

[google.com...]

30,000,000 results with 'aspx' - I would guess that it isn't a problem ;)

Kewe




msg:94075
 7:53 am on Sep 4, 2004 (gmt 0)

Thanks for all the replies to my topic.
I'm new to this forum, but in the last few days I've learned a lot.

I think I'll use rewrite rules, so every .php can also be accessed as .html.
I want to test it as well, just in case.

Kewe.

Airportibo




msg:94076
 9:59 pm on Sep 4, 2004 (gmt 0)

...and when you're already implementing rewrite rules, just get rid of the CGI parameters as well (if you're using any). It'll be worth it :-)
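
One way to picture this is a URL scheme where the query string is folded into the path. The Python sketch below shows such a mapping under an invented scheme (name-value pairs become path segments); the function name to_static_looking_path and the example URL are hypothetical, and a real site would pair this with a matching rewrite rule on the server.

```python
import posixpath
from urllib.parse import parse_qsl, quote, urlsplit


def to_static_looking_path(url: str) -> str:
    """Fold query parameters into the path and end it with ".html".

    Hypothetical scheme: "/catalog.php?category=books&page=2"
    becomes "/catalog/category-books/page-2.html".
    """
    parts = urlsplit(url)
    # Drop the script extension: "/catalog.php" -> "/catalog"
    base, _ext = posixpath.splitext(parts.path)
    # Each "name=value" pair becomes one "name-value" path segment.
    segments = [
        quote(f"{name}-{value}", safe="")
        for name, value in parse_qsl(parts.query)
    ]
    return "/".join([base, *segments]) + ".html"


if __name__ == "__main__":
    print(to_static_looking_path("/catalog.php?category=books&page=2"))
```

The resulting address carries the same information but has no ? in it.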

cayleyv




msg:94077
 12:30 am on Sep 5, 2004 (gmt 0)

I promise you Google can index .php. Interestingly, they seem to believe it's so common that it's not in their extension list (.html and variants are missing too).

They also don't list .py (Python scripting), which some of their own site is written in. They can index their own site.

vincevincevince




msg:94078
 1:30 pm on Sep 5, 2004 (gmt 0)

As GoogleGuy said quite a while ago (lost the link), avoid arguments with ID in the name, as these are taken to indicate sessions. Otherwise you'll be fine.

i.e.

OUT:
test.foo/?PHP_SESSID=a56df1s
test.foo/?pageid=345

IN:
test.foo/?whoami=a56df1s
test.foo/?whichpage=345

Also, from my own observations:

Avoid ?search=, and give each page a unique <title> or <h1> at minimum.
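
If you want to check your own URLs against the OUT/IN examples above, a small Python sketch like this flags parameter names containing "id". The substring test is only my assumption about what might look like a session; it is not a documented crawler rule, and it will also flag harmless names such as "width". The function name session_like_params is made up.

```python
from typing import List
from urllib.parse import parse_qsl, urlsplit


def session_like_params(url: str) -> List[str]:
    """Return query parameter names that contain "id" (case-insensitive)."""
    query = urlsplit(url).query
    return [
        name
        for name, _value in parse_qsl(query, keep_blank_values=True)
        if "id" in name.lower()
    ]


if __name__ == "__main__":
    for url in (
        "http://test.foo/?PHP_SESSID=a56df1s",
        "http://test.foo/?pageid=345",
        "http://test.foo/?whoami=a56df1s",
        "http://test.foo/?whichpage=345",
    ):
        print(url, session_like_params(url))
```

The first two URLs come back with a flagged name, the last two with an empty list.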

ciml




msg:94079
 1:55 pm on Sep 5, 2004 (gmt 0)

A while ago Google changed behaviour; it will now include unknown extensions. So as well as new-ish common extensions such as .aspx, you can have .foo, .bar or whatever.

Also, you can have something/index.php, index.shtml, index.asp, etc, which can be different from something/

You can have something/Index.html but not something/index.html, something/index.HTML but not something/index.htm

rainborick




msg:94080
 2:52 pm on Sep 5, 2004 (gmt 0)

As long as the page is sent back to Google with the header information "Content-Type: text/html", Google will treat the page as HTML and index it normally. .php, .asp, .cgi, .jsp, etc. all typically generate this header so that browsers (and search engines) will accept the output regardless of the filename extension.
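
As a rough illustration of that idea, the Python sketch below decides whether a response counts as HTML by looking only at the Content-Type header value and ignoring the filename extension. The function name is_html_response is hypothetical, and this is a simplification, not a description of how Google's crawler is implemented.

```python
def is_html_response(content_type_header: str) -> bool:
    """True if the Content-Type header value declares text/html.

    Parameters such as "; charset=UTF-8" are ignored, and the
    comparison is case-insensitive.
    """
    media_type = content_type_header.split(";", 1)[0].strip().lower()
    return media_type == "text/html"


if __name__ == "__main__":
    # A .php, .asp or .jsp script typically sends a header like this one.
    print(is_html_response("text/html; charset=UTF-8"))  # True
    print(is_html_response("application/pdf"))           # False
```

Nothing in the check refers to the URL, which is why the extension does not matter.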

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved