I've been doing this sort of analysis manually, but the workload has increased to the point that it is no longer feasible. I want to create or obtain a script that does the same analysis for me and formats it into a nice report. Does anyone know of (or have) a ready-made script for such reporting, or am I going to have to create this myself?
Not really, DrDoc. It will supply some of what I'm looking for, but it won't evaluate most of what I need evaluated (which I neglected to mention), such as whether files have DOCTYPE declarations, whether they are HTML or XHTML, whether they use tables or CSS-2P, etc. Not to mention it returns a bunch of stuff that the client has no need to know.
BUT! It is a start and I think it does create an array with all of that information in it which (if so) I could read and take what I need from.
DOCTYPE declarations, HTML or XHTML, using tables or CSS-2P
Wow, that's a tall order and obviously phpinfo isn't much help there.
Nothing constructive to add, but I'm curious about a number of things.
How do you propose to decide whether or not it's HTML or XHTML? In other words, are you basing this on
- mime type
- doctype - so you trust that if it says it's XHTML, it actually is, regardless of mime type or underlying code
- code itself - in other words, writing a validator.
Similarly, how do you distinguish between
1) a site that has CSS for all layout, but still makes extensive use of tables for tabular data?
2) a site that uses some CSS-P, but also depends heavily on tables for layout?
I don't think you could do that automatically without some highly sophisticated AI.
Tom
If no DOCTYPE has been established then that'll be flagged and HTML (unknown version and flavor) will be assumed.
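A minimal sketch of that DOCTYPE rule, written in Python purely for illustration (the thread is about a PHP script, so treat this as a stand-in). The function name `classify_doctype` and the 2048-character search window are assumptions of this sketch. It trusts whatever the DOCTYPE claims and does not validate the markup.

```python
import re

# Captures the text inside a DOCTYPE declaration, e.g.
# <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "...">
DOCTYPE_RE = re.compile(r"<!DOCTYPE\s+([^>]*)>", re.IGNORECASE)
# Pulls the human-readable DTD label out of a W3C public identifier.
PUBLIC_ID_RE = re.compile(r'"-//W3C//DTD\s+([^/]+)//')


def classify_doctype(markup):
    """Return a rough (family, detail) pair based only on the DOCTYPE text."""
    # Assumption: a DOCTYPE, if present, sits near the top of the file.
    match = DOCTYPE_RE.search(markup[:2048])
    if match is None:
        # Flag the page and fall back to generic HTML, as described above.
        return ("HTML", "no DOCTYPE; version and flavor unknown")
    declaration = match.group(1)
    public_id = PUBLIC_ID_RE.search(declaration)
    detail = public_id.group(1).strip() if public_id else declaration.strip()
    family = "XHTML" if "xhtml" in declaration.lower() else "HTML"
    return (family, detail)
```

For a page that starts with an XHTML 1.0 Strict declaration this yields `("XHTML", "XHTML 1.0 Strict")`. A page with no declaration gets the flagged HTML fallback.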
I envision using cURL to open and read the file as delivered by the web server. I should be able to gather the file type as well. If they're using mod_rewrite and masking scripted pages as .html, then obviously I won't be able to pick that up. But if it's a .php or .asp file, then I can safely assume the script server they're using.
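The file-extension part of that idea could look roughly like this. It is a sketch in Python rather than PHP, and it only inspects the URL, so no request is made. The name `guess_scripting` and the lookup table are hypothetical; the table holds only the two extensions mentioned above and would need extending for anything else.

```python
import posixpath
from urllib.parse import urlparse

# Assumption: the extension reflects the server-side technology.
# Rewritten or masked URLs defeat this, as noted above.
EXTENSION_HINTS = {
    ".php": "PHP",
    ".asp": "ASP",
}


def guess_scripting(url):
    """Guess the server-side scripting language from the URL's file extension."""
    path = urlparse(url).path  # drops the query string and fragment
    extension = posixpath.splitext(path)[1].lower()
    return EXTENSION_HINTS.get(extension, "unknown")
```

Anything without a recognized extension, including plain .html pages and directory URLs, comes back as `"unknown"` rather than being guessed at.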
As for the CSS, it's simply a matter of looking for the various ways of including/linking an external css file and looking for inline css.
Distinguishing CSS-2P versus tables is easy. Look for tables and look for the CSS properties that are used for positioning.
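As a rough illustration of those two checks (how CSS is included, and tables versus positioning), here is a Python sketch that only counts signals. The name `layout_signals` and the particular patterns are assumptions of this example. The counts cannot tell a data table from a layout table, so they are hints for a human reviewer and not a verdict.

```python
import re

# Ways of pulling CSS into a page: <link rel="stylesheet">, @import,
# <style> blocks, and inline style="..." attributes.
LINK_CSS_RE = re.compile(r"<link\b[^>]*rel\s*=\s*[\"']?stylesheet[^>]*>", re.IGNORECASE)
IMPORT_RE = re.compile(r"@import\b", re.IGNORECASE)
STYLE_BLOCK_RE = re.compile(r"<style\b", re.IGNORECASE)
INLINE_STYLE_RE = re.compile(r"\sstyle\s*=\s*[\"']", re.IGNORECASE)
TABLE_RE = re.compile(r"<table\b", re.IGNORECASE)
# A small, non-exhaustive set of declarations commonly used for positioning.
POSITIONING_RE = re.compile(
    r"\b(?:position\s*:\s*(?:absolute|relative|fixed)|float\s*:\s*(?:left|right))",
    re.IGNORECASE,
)


def layout_signals(markup, css=""):
    """Count CSS-inclusion and layout hints in a page and its external CSS text."""
    combined = markup + "\n" + css
    return {
        "linked_stylesheets": len(LINK_CSS_RE.findall(markup)),
        "import_rules": len(IMPORT_RE.findall(combined)),
        "style_blocks": len(STYLE_BLOCK_RE.findall(markup)),
        "inline_styles": len(INLINE_STYLE_RE.findall(markup)),
        "tables": len(TABLE_RE.findall(markup)),
        "positioning_rules": len(POSITIONING_RE.findall(combined)),
    }
```

The optional `css` argument is meant for the text of any external stylesheets, fetched separately, so that positioning rules living outside the page are counted too.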
I'm really looking to get a flavor of the page's structural makeup and whether or not the previous web developer had a clue as to what they were doing.
As for the PHP information, I'll have to look into it, but I believe phpinfo returns an array that is used to build that information page. I can call that, suppress the output, and sift through the array for what I want. I can also use the script to query their server for DNS info and the other details I want to know.
Add on the number of images, page weight, percentage of code to visible text, whether or not there are <h1> (h2,h3,etc.) elements, what the title, meta kw, and meta desc are, plus a host of other tidbits and you can see what I'm after.
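Several of those tidbits can be collected in a single pass over the markup. The sketch below uses Python's standard `html.parser` module as a stand-in for whatever the PHP script would use. The class name `PageStats`, the helper `summarize`, and the definition of the text ratio (visible characters divided by total characters) are assumptions of this example.

```python
from html.parser import HTMLParser

HEADING_TAGS = ("h1", "h2", "h3", "h4", "h5", "h6")


class PageStats(HTMLParser):
    """Collects image count, heading counts, title, meta tags, and visible text length."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.images = 0
        self.headings = {}
        self.title = ""
        self.meta = {}
        self.visible_chars = 0
        self._in_title = False
        self._hidden_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "img":
            self.images += 1
        elif tag in HEADING_TAGS:
            self.headings[tag] = self.headings.get(tag, 0) + 1
        elif tag == "title":
            self._in_title = True
        elif tag == "meta":
            name = (attributes.get("name") or "").lower()
            if name in ("keywords", "description"):
                self.meta[name] = attributes.get("content") or ""
        elif tag in ("script", "style"):
            self._hidden_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._hidden_depth:
            self._hidden_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif not self._hidden_depth:
            self.visible_chars += len(data.strip())


def summarize(markup):
    """Return a dict of page statistics for one HTML document."""
    parser = PageStats()
    parser.feed(markup)
    parser.close()
    return {
        "page_bytes": len(markup.encode("utf-8")),  # markup only, excludes images
        "images": parser.images,
        "headings": parser.headings,
        "title": parser.title.strip(),
        "meta": parser.meta,
        "visible_text_chars": parser.visible_chars,
        "text_ratio": parser.visible_chars / len(markup) if markup else 0.0,
    }
```

Note that `page_bytes` here covers the HTML alone. A true page weight would also add the sizes of the images, stylesheets, and scripts the page references.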