|Google and PHP|
Does PHP work in Google?
I was reading the forum and found this link:
And I saw the last sentence:
At Google, we are able to index most types of pages and files with very few exceptions. File types we are able to index include: pdf, asp, jsp, hdml, shtml, xml, cfm, doc, xls, ppt, rtf, wks, lwp, wri, swf.
My new site has 100 PHP pages, and .php is not included in this text. Could somebody tell me more about this?
I have never used PHP before.
It's interesting that PHP is omitted.
I don't use PHP; however, Google does index PHP pages. If they suddenly stopped, this is the first place we'd know about it.
Beware of PHP sessions: Google and other bots cannot index these.
Google are only listing the more unusual filetypes in that list - they don't include html, htm or php for instance, but they certainly support all of those ;)
PHP, ASP, etc. are all considered to be HTML file types, as that is the type of output they send to the browser (or spider), i.e. a MIME type of text/html.
My site is all php and gets indexed no prob.
I remember reading on Google over a year ago that their crawler might have trouble with URLs containing ? (as many PHP files do in order to pass variables). But my pages with ? get indexed (not all, though).
I just looked on Google for a few minutes and couldn't find anything about the ?. Maybe they fixed that.
Receptional Andy points out:-
they don't include html, htm
So I think his conclusion that the list only contains the more unusual flavours of file extension is pretty accurate.....
I'm interested in whether Google likes 'aspx'. A lot of sites will switch to .NET, and many .NET programs send a <form> to the client even when it's not really a typical form but just some .NET server-side controls.
I don't think PHP is a problem. There's an eCompare site built with PHP and MySQL that has many affiliate sites as partners. Apart from the top of each page, which differs, all the content is the same. So in addition to the pages from www.thisecommercedomain.com being indexed by Google, tons of pages from affiliatessitesdomain.thisecommercedomain.com were also indexed. So when you search for a related keyword on Google, wow..., excellent rank of course. I have tracked this site's indexed pages for years; the number increased from 0.4 million to 1.5 million in just the last 6 months. So you know, Google is very generous to some sites but very ? to others ....
That goes beyond the '.php' issue!
Google definitely doesn't have a problem with PHP. But to be on the very, very safe side, you could just tell your webserver to use rewrite rules, so every .php page can also be accessed as .html.
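To make the idea concrete, here is a rough sketch of what such a rewrite rule does, written in Python purely for illustration. It is not web server configuration; on a real server this would be done in the server's own rewrite facility. The function name and the example paths are made up.

```python
import re

# Hypothetical rule: a request for /something.html is served
# internally by the script /something.php.
HTML_TO_PHP = re.compile(r"^(?P<base>.+)\.html$")


def rewrite_path(path: str) -> str:
    """Return the internal script path for a requested URL path."""
    match = HTML_TO_PHP.match(path)
    if match:
        return match.group("base") + ".php"
    # Anything that does not end in .html is passed through untouched.
    return path


if __name__ == "__main__":
    for requested in ("/products/list.html", "/about.php", "/img/logo.png"):
        print(requested, "->", rewrite_path(requested))
```

The visitor (or spider) only ever sees the .html address; the .php file behind it stays where it is.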
|I'm interested in if google like 'aspx' |
An easy way to check is just to see how many urls Google has containing whichever file extension you're interested in.
30,000,000 results with 'aspx' - I would guess that it isn't a problem ;)
Thanks for all the replies to my topic.
I'm new on this forum, but in the last few days I've learned a lot.
I think I'll use rewrite rules, so every .php can also be accessed as .html.
I want to test it as well, just in case.
...and when you're already implementing rewrite rules, just get rid of the cgi-parameters as well (if you're using any). It'll be worth it :-)
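As a rough illustration of getting rid of the parameters, the sketch below maps a parameter-free address onto a script plus query string. It is Python, used only to show the mapping, and everything in it is hypothetical: the /catalog/ layout, the catalog.php script and the parameter names category and item are assumptions, not anything from a real site.

```python
import re
from urllib.parse import urlencode

# Hypothetical clean URL layout: /catalog/<category>/<item>.html
CLEAN_URL = re.compile(r"^/catalog/(?P<category>[a-z0-9-]+)/(?P<item>\d+)\.html$")


def to_internal_url(path: str) -> str:
    """Translate a clean public path into the script URL that serves it."""
    match = CLEAN_URL.match(path)
    if match is None:
        # Paths that do not fit the layout are left alone.
        return path
    # The named groups become the query parameters of the real script.
    return "/catalog.php?" + urlencode(match.groupdict())


if __name__ == "__main__":
    print(to_internal_url("/catalog/shoes/42.html"))
```

The public link then contains no ? at all, while the script still receives the same variables.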
I promise you Google can index .php. Interestingly, they seem to believe it's so common that it's not in their extension list (.html and variants are missing too).
They also don't list .py, the Python scripting that some of their own site is written in. They can index their own site.
As google guy said quite a while ago (I've lost the link), avoid arguments with ID in the name, as these are taken to indicate sessions. Otherwise you'll be fine.
Also, from my own observations:
Avoid ?search=, and give each page a unique <title> or <h1> at minimum.
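If you want to check your own links against the advice above, something like this rough Python sketch would do it. The rule it applies (flag any parameter with "id" in its name, plus search) is only my reading of the posts above, not a documented Google rule, and the function name and example URLs are made up.

```python
from urllib.parse import parse_qsl, urlsplit


def risky_parameters(url: str) -> list:
    """Return the query parameter names the advice above says to avoid."""
    query = urlsplit(url).query
    names = [name for name, _ in parse_qsl(query, keep_blank_values=True)]
    # Crude substring check: it flags PHPSESSID and id, but it would
    # also flag harmless names such as "width".
    return [n for n in names if "id" in n.lower() or n.lower() == "search"]


if __name__ == "__main__":
    for link in (
        "http://example.com/page.php?PHPSESSID=abc123&page=2",
        "http://example.com/list.php?search=shoes",
        "http://example.com/page.php?page=2&sort=asc",
    ):
        print(link, "->", risky_parameters(link))
```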
A while ago Google changed behaviour; it will now include unknown extensions. So as well as new-ish common extensions such as .aspx, you can have .foo, .bar or whatever.
Also, you can have something/index.php, index.shtml, index.asp, etc, which can be different from something/
You can have something/Index.html but not something/index.html, something/index.HTML but not something/index.htm
As long as the page is sent back to Google with the header information "Content-Type: text/html", Google will treat the page as HTML and index it normally. .php, .asp, .cgi, .jsp, etc. all typically generate this header so that browsers (and search engines) will accept the output regardless of the filename extension.
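A small Python sketch of that point: the decision depends on the Content-Type header, not on the extension in the URL. This only illustrates the principle and says nothing about how Google's crawler is actually implemented; the function name and the sample URLs and headers are made up.

```python
from email.message import Message


def is_treated_as_html(content_type_header: str) -> bool:
    """True if the Content-Type header declares the body as text/html."""
    msg = Message()
    msg["Content-Type"] = content_type_header
    # get_content_type() drops parameters such as charset and lowercases
    # the type, so "Text/HTML; charset=UTF-8" becomes "text/html".
    return msg.get_content_type() == "text/html"


if __name__ == "__main__":
    # Hypothetical responses: URL -> Content-Type header sent by the server.
    responses = {
        "/index.php": "text/html; charset=UTF-8",
        "/page.foo": "text/html",
        "/report.pdf": "application/pdf",
    }
    for url, header in responses.items():
        print(url, "->", is_treated_as_html(header))
```

Here /index.php and /page.foo both count as HTML because of the header they send, whatever the filename ends in.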