Forum Moderators: bakedjake

Search function inhtml:

How to search web within tags or script?

         

fclark

7:25 pm on Nov 3, 2004 (gmt 0)

10+ Year Member



I would like to have a search engine return results based on the content of the page markup.

It does not seem possible on the major search engines, but maybe some of you who are familiar with the more specialized search engines could point me in the right direction.

I'd rather not have to send a bot to do this.

bakedjake

7:28 pm on Nov 3, 2004 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



fclark, curious as to why you'd like something like this. Wouldn't you just be searching standardized tags?

It's actually relatively simple to do with any of the widely available python/perl modules for parsing HTML.
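
As a rough illustration of that parsing approach, here is a minimal sketch using Python's standard-library html.parser. The names ScriptCollector and markup_mentions, and the sample marker in the usage line, are made up for illustration; fetching the pages is left out, and this is only one way to do it.

```python
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collects <script src=...> URLs and inline script text from a page."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.script_srcs = []
        self.script_text = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True
            src = dict(attrs).get("src")
            if src:
                self.script_srcs.append(src)

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        # Only keep text that sits inside a <script> element.
        if self.in_script:
            self.script_text.append(data)


def markup_mentions(html, term):
    """Return True if `term` appears in any script URL or inline script."""
    parser = ScriptCollector()
    parser.feed(html)
    parser.close()
    haystack = " ".join(parser.script_srcs + parser.script_text).lower()
    return term.lower() in haystack
```

Usage would look something like markup_mentions(page_source, "example-marker"), where page_source is the raw HTML of a page you have already downloaded. Visible page text is ignored, so a page that merely talks about the marker does not match.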

fclark

7:55 pm on Nov 3, 2004 (gmt 0)

10+ Year Member



It has to do with an adwords content optimization strategy I would like to test.

Search the web for pages that run adsense and are relevant to a campaign. Then optimize adgroups for particular pages.

Currently, search results return the "relevant" pages, but I'm not able to filter on sites that include certain javascript terms that would flag the existence of adsense on the page.

Parsing each page in the serps seems a bit redundant if somebody has already done it.

claus

8:08 pm on Nov 3, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



yeah, i have often missed this too, plus the ability to search in comment tags, javascript, and so on.

phaze

7:42 am on Nov 4, 2004 (gmt 0)

10+ Year Member



I agree with bakedjake. You're going to be searching a very limited dictionary. All HTML tags + all javascript methods and properties = probably a few thousand words. To put it in perspective, the Stanford version of Google had an initial dictionary of 14 million words.

Some other criterion is needed for ranking. And in this scenario, backlinks, size and position of words, and frequency of occurrence are irrelevant. What are you looking for? Let's invent something for ya!

mm.

fclark

2:38 pm on Nov 4, 2004 (gmt 0)

10+ Year Member



hi phaze,
This is a filter on those normal results to return pages meeting certain markup criteria. Something similar is already done in a more limited fashion with "inurl:", which says, "give me the ranked results AND filter on blet in the url." Well, basically, I'd like to expand inurl to include some filter phrase in other parts of the markup, so a query might look like this:

inhtml:blet widgets

where widgets is the term used to return ranked results and blet is the additional markup filter.
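
To make the idea concrete, here is a minimal sketch of how such a query could be split and applied. It assumes the engine already has ranked results and the raw HTML for each; parse_query and filter_results are hypothetical names, and the plain substring match is a stand-in for whatever a real engine would index.

```python
PREFIX = "inhtml:"


def parse_query(query):
    """Split a query into inhtml: markup filters and ordinary ranking terms."""
    filters, terms = [], []
    for token in query.split():
        if token.lower().startswith(PREFIX) and len(token) > len(PREFIX):
            filters.append(token[len(PREFIX):])
        else:
            terms.append(token)
    return filters, terms


def filter_results(ranked_results, filters):
    """Keep (url, raw_html) pairs whose raw markup contains every filter.

    The incoming order is preserved, so the ranking itself is untouched.
    """
    return [
        (url, html)
        for url, html in ranked_results
        if all(f.lower() in html.lower() for f in filters)
    ]
```

So parse_query("inhtml:blet widgets") gives the filter list ["blet"] and the ranking terms ["widgets"]; the ranking terms go to the normal search, and filter_results then drops any hit whose markup lacks blet.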

Any of the alternative search engine folks reading might be interested in easily creating a niche for themselves.

claus

8:05 pm on Nov 4, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>> You're going to be searching a very limited dictionary

Not really. Say:

- i want to search for a div with an ID of "my-string" to see where a specific html-snippet was copied.

- i want to search for a javascript function called myPatethicSolution() to see if others are using it and what modifications they've made.

- i want to search for a specific Dublin Core meta tag with specific attributes

- i want to search for CMS-specific comment tags embedded in pages to see how widespread the use of a specific CMS and/or CMS-plugin is

- i want to search for a specific hidden form field to see who's calling my script outside of my pages

- i want to search for a specific aff code, tracking script, counter, js-newsfeed, css-technique, popup ad, etc, etc.

... and i have plenty of other really useful examples that the current SE's just can't handle. If only we could get an option to search in raw html code (and JS/CSS/.inc/whatever files), a lot of options that are currently totally impossible would suddenly be available.
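
A few of the searches in the list above can be sketched against a single page's raw HTML with Python's standard-library html.parser. This is only a per-page check, not an index: MarkupFinder and find_markup are hypothetical names, and the field name and comment text passed in are whatever you happen to be hunting for.

```python
from html.parser import HTMLParser


class MarkupFinder(HTMLParser):
    """Flags a div id, a hidden form field name, and text inside comments."""

    def __init__(self, element_id, hidden_field, comment_text):
        super().__init__()
        self.element_id = element_id
        self.hidden_field = hidden_field
        self.comment_text = comment_text.lower()
        self.hits = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div" and attrs.get("id") == self.element_id:
            self.hits.add("div-id")
        is_hidden = (attrs.get("type") or "").lower() == "hidden"
        if tag == "input" and is_hidden and attrs.get("name") == self.hidden_field:
            self.hits.add("hidden-field")

    def handle_comment(self, data):
        # Called with the text between <!-- and -->.
        if self.comment_text in data.lower():
            self.hits.add("comment")


def find_markup(html, element_id, hidden_field, comment_text):
    """Return the set of markup features found in one page's raw HTML."""
    finder = MarkupFinder(element_id, hidden_field, comment_text)
    finder.feed(html)
    finder.close()
    return finder.hits
```

Calling find_markup(page_source, "my-string", "ref", "examplecms") returns a set naming which of the three features turned up. Text in the visible body does not count, which is the point of searching the markup rather than the rendered page.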

Yes, this is pure geek stuff, i admit, so i don't mind if it's under "advanced search" or something, as long as it's there.

fclark

8:17 pm on Nov 4, 2004 (gmt 0)

10+ Year Member



amen, claus.

phaze

11:12 pm on Nov 4, 2004 (gmt 0)

10+ Year Member



ok, totally agree - and it sounds like a really interesting project.

phaze

3:43 am on Nov 5, 2004 (gmt 0)

10+ Year Member



Was just reading someone in the paid forum chatting about a script designed to deceive search engines (gasp!). I'll bet google has an internal index they search for fraudulent HTML/CSS/Javascript. If they put a UI on that, you'd have your search engine.

claus

11:35 pm on Nov 7, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



lol phaze, of course they would be able to do such stuff internally, i've never questioned that part - i just want it on the outside as well ;)