Specifically, I want the search engine to tell me which websites have a reference like this:
<script language="javascript" src="http://www.mycompetitor.com/go.js"></script>
It's a way to find out how many websites are using your COMPETITOR'S web service :-)
Anyway, search Google for [code search] and you'll find a few code search engines. However, these usually don't search the source of ordinary web pages; Google's own "codesearch", in particular, does not.
It gives you programmatic access to lots of the data Alexa has crawled.
You could use that to write a program that goes through each page and looks for a pattern.
I don't know whether they give you access to the raw HTML, which you will need, or only to pre-parsed text.
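If the raw HTML is available, the pattern check itself is the easy part. Below is a minimal sketch using only Python's standard library, assuming you already have each page's HTML as a string; `ScriptSrcFinder` and `uses_script_from` are hypothetical names, and `mycompetitor.com` is just the example domain from the question. Parsing the tags is more forgiving than a plain string match, because attribute order, quoting and upper/lower case vary from page to page.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class ScriptSrcFinder(HTMLParser):
    """Collects the src attribute of every <script> tag on a page."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names, so <SCRIPT SRC=...> matches too.
        if tag == "script":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def uses_script_from(html, domain):
    """Return True if the page loads any script hosted on `domain` or a subdomain of it."""
    finder = ScriptSrcFinder()
    finder.feed(html)
    finder.close()
    domain = domain.lower()
    for src in finder.sources:
        # Browsers treat backslashes like forward slashes in http(s) URLs,
        # so normalise them before extracting the host name.
        host = urlparse(src.replace("\\", "/")).hostname or ""
        if host == domain or host.endswith("." + domain):
            return True
    return False
```

Usage: `uses_script_from(page_html, "mycompetitor.com")` returns `True` for a page containing `<script src="http://www.mycompetitor.com/go.js">`, and `False` for a page that merely links to the site with an `<a href>`.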
[docs.amazonwebservices.com...]
They have a "links" feature, but judging by the example they give, it "smells" like the normal anchor-link search, so I see no concrete reason to put effort into testing it. (Also, link search on Alexa did not give me what I wanted.)
Thanks. Any more ideas, anyone?
Your program could then search each page for your pattern and save the URL of each page that has it.
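A rough sketch of that loop, assuming the crawled data can be delivered as (url, html) pairs. A regular expression is used here for brevity; it is cruder than a real HTML parser (for example, it would also match a `data-src` attribute). `PATTERN`, `find_matching_urls` and `save_urls` are hypothetical; swap in the domain or script you are tracking.

```python
import re
from typing import Iterable, List, Tuple

# Hypothetical pattern: a <script> tag whose src mentions mycompetitor.com.
# re.IGNORECASE copes with <SCRIPT SRC=...>; [^>]* keeps the match inside one tag.
PATTERN = re.compile(
    r"<script\b[^>]*\bsrc\s*=\s*[\"']?[^\"'>]*mycompetitor\.com[^>]*>",
    re.IGNORECASE,
)


def find_matching_urls(pages: Iterable[Tuple[str, str]], pattern=PATTERN) -> List[str]:
    """Scan (url, html) pairs and return the URLs whose HTML matches `pattern`."""
    matches = []
    for url, html in pages:
        if pattern.search(html):
            matches.append(url)
    return matches


def save_urls(urls: Iterable[str], path: str) -> None:
    """Write one URL per line so the hits can be counted or reviewed later."""
    with open(path, "w", encoding="utf-8") as f:
        for url in urls:
            f.write(url + "\n")
```

Counting the lines in the saved file then gives a rough estimate of how many crawled sites embed the script.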
For example, this sample [alexa.com...] uses AWS and Ruby to access image headers to make them searchable.
More samples here: [alexa.com...]
What you can do is write your own program along those lines to create such a service.
The basic approach would be to run the page in a dummy browser up to the point where the onload handler has run, then parse the document object model (DOM). This would index the page as it appears in a JavaScript-enabled browser, which is useful for finding JavaScript ad links, even obfuscated ones.
This has been suggested by others (see "http://tadhg.com/wp/2007/01/30/walking-the-html-dom-without-a-browser/"), but nobody seems to have done it yet?
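Running the page up to onload needs a real JavaScript engine (for example a headless browser), which this sketch does not attempt. It only illustrates the second half of the idea: build a document tree from static HTML and walk it to produce index terms. It assumes plain HTML input, uses a simplified tree builder rather than the full browser parsing rules, and will not see markup that scripts write at load time. `TreeBuilder`, `index_terms` and the `tag@attr=value` term format are hypothetical.

```python
from html.parser import HTMLParser

# Elements that never have children or a closing tag in HTML.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class Node:
    def __init__(self, tag, attrs):
        self.tag = tag
        self.attrs = dict(attrs)
        self.children = []
        self.text = []


class TreeBuilder(HTMLParser):
    """Builds a simplified element tree: a rough stand-in for a browser DOM."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document", [])
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        if tag not in VOID:
            self.stack.append(node)

    def handle_startendtag(self, tag, attrs):
        # <br/>-style tags: add the node but never make it the current parent.
        self.stack[-1].children.append(Node(tag, attrs))

    def handle_endtag(self, tag):
        # Pop back to the matching open element; ignore stray closing tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        # Inline <script> bodies also arrive here, so they get indexed too.
        self.stack[-1].text.append(data)


def index_terms(html):
    """Walk the tree depth-first and emit 'tag', 'tag@attr=value' and text terms."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    terms = []
    pending = [builder.root]
    while pending:
        node = pending.pop()
        terms.append(node.tag)
        for name, value in node.attrs.items():
            # Boolean attributes such as <script async> arrive with value None.
            terms.append(f"{node.tag}@{name}={value or ''}")
        text = " ".join(t.strip() for t in node.text if t.strip())
        if text:
            terms.append(text)
        pending.extend(reversed(node.children))
    return terms
```

Feeding these terms into an ordinary inverted index would let you search for, say, `script@src=http://www.mycompetitor.com/go.js` directly.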
Say you are looking for a page that uses a certain free JavaScript script, to get a clue about how to implement it. With such a search engine you could enter part of the free script and immediately find a page with that bit in its source.
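A technique some code search engines use to make "paste a fragment, find the page" fast is a trigram index: every three-character substring points to the documents that contain it, so a query only needs to check pages that contain all of the fragment's trigrams. Below is a toy in-memory version, assuming the raw source of each page is already stored; `SourceIndex` is a hypothetical name, and a real engine would need compressed, disk-based posting lists.

```python
from collections import defaultdict


def trigrams(text):
    """All overlapping 3-character substrings of `text`."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


class SourceIndex:
    """Toy substring search over raw page source using a trigram inverted index."""

    def __init__(self):
        self.postings = defaultdict(set)  # trigram -> URLs whose source contains it
        self.sources = {}                 # URL -> raw source

    def add(self, url, source):
        self.sources[url] = source
        for gram in trigrams(source):
            self.postings[gram].add(url)

    def search(self, snippet):
        """Return the sorted URLs whose source contains `snippet` verbatim."""
        grams = trigrams(snippet)
        if grams:
            # Candidates must contain every trigram of the snippet...
            candidates = set.intersection(*(self.postings.get(g, set()) for g in grams))
        else:
            # Snippets shorter than 3 characters: fall back to checking every page.
            candidates = set(self.sources)
        # ...but sharing trigrams is not proof, so confirm with a real substring test.
        return sorted(url for url in candidates if snippet in self.sources[url])
```

The final substring check matters: a page can contain all the trigrams of `toggleMenu(` in different places without containing the fragment itself.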
Other examples: estimate the number of pages that run a specific script or ad programme, use "nofollow" on links, have the word "sex" in their meta description, or have an inline style class called "joe".
Really, there is an enormous amount of useful stuff you could do with a source search engine.
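Once raw source is available, estimates like these reduce to counting pages that pass simple per-page tests. A sketch under stated assumptions: pages arrive as raw HTML strings, "inline style class called joe" is interpreted as an element whose `class` attribute includes `joe`, the script test is a plain substring check on `src`, and `FeatureScanner`, `count_features` and the feature names are hypothetical.

```python
import re
from html.parser import HTMLParser

WORD_SEX = re.compile(r"\bsex\b", re.IGNORECASE)


class FeatureScanner(HTMLParser):
    """Flags a few per-page features of the kind listed above."""

    def __init__(self, script_host):
        super().__init__()
        self.script_host = script_host.lower()
        self.features = set()

    def handle_starttag(self, tag, attrs):
        # Boolean attributes come through as None; treat them as empty strings.
        a = {name: (value or "") for name, value in attrs}
        if tag == "script" and self.script_host in a.get("src", "").lower():
            self.features.add("uses_script")
        if tag == "a" and "nofollow" in a.get("rel", "").lower().split():
            self.features.add("nofollow_link")
        if (tag == "meta" and a.get("name", "").lower() == "description"
                and WORD_SEX.search(a.get("content", ""))):
            self.features.add("word_in_meta_description")
        if "joe" in a.get("class", "").split():
            self.features.add("class_joe")


def count_features(pages, script_host):
    """Count how many pages (raw HTML strings) have each feature."""
    counts = {}
    for html in pages:
        scanner = FeatureScanner(script_host)
        scanner.feed(html)
        scanner.close()
        for feature in scanner.features:
            counts[feature] = counts.get(feature, 0) + 1
    return counts
```

Self-closing tags such as `<meta ... />` are handled too, because `HTMLParser`'s default `handle_startendtag` calls `handle_starttag`.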