DaveN

msg:257736 | 1:20 pm on Nov 22, 2005 (gmt 0) |
Yes .. but I think you will need a good in-house programmer.
|
findtheneedle

msg:257737 | 3:01 pm on Nov 22, 2005 (gmt 0) |
OK, is there a program that you could recommend?
|
DaveN

msg:257738 | 3:08 pm on Nov 22, 2005 (gmt 0) |
You need to scrape the SERPs; you will need a program written for you. DaveN
|
inbound

msg:257739 | 3:15 pm on Nov 22, 2005 (gmt 0) |
Getting more than 1000 results on Google.co.uk:

Do searches for UK sites but limit results to .co.uk, e.g. widget site:co.uk
Then repeat for .com, .org.uk, .net etc.

If you need more than that, try also using a date operator to restrict results to a set period, and then get rid of the duplicates.

If you need more than that, you could try doing something similar with Yahoo and MSN and combining the results, OR use common terms with positive and negative operators to split the results, e.g.

widget site:co.uk -commonterm
widget site:co.uk commonterm

gives 2000 results. Be careful to use terms that are about as likely to appear as not, otherwise you will skew your result set.

Other ways include:

Restrict the file format to return Word or PDF etc.
Pay Gigablast for a commercial feed and get 10,000 at a time from their data :)
Use common first names or surnames as positive qualifiers, e.g. widget john site:co.uk
Use large town names as positive qualifiers (this will also allow you to filter out directory sites, as they will have many appearances in the geographical lists).

If you still need more then I think that buying Google would be the next step, certainly easier than combining all of that ;)
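
For anyone wanting to automate the splitting idea above, here is a rough Python sketch. It only builds the query strings and merges result lists; it does not contact any search engine. The names build_queries and merge_unique are made up for illustration, and the split terms are whatever common words you choose.

```python
from itertools import product


def build_queries(keyword, tlds, split_terms=()):
    """Return query strings that split one search into smaller ones.

    keyword, tlds and split_terms are illustrative inputs, e.g.
    "widget", ["co.uk", "com"], ["commonterm"].
    """
    queries = []
    for tld in tlds:
        base = f"{keyword} site:{tld}"
        if not split_terms:
            queries.append(base)
            continue
        # Each split term is either required ("") or excluded ("-"),
        # so the variants for one TLD should cover different slices.
        for signs in product(("", "-"), repeat=len(split_terms)):
            extras = " ".join(sign + term for sign, term in zip(signs, split_terms))
            queries.append(f"{base} {extras}")
    return queries


def merge_unique(result_lists):
    """Combine lists of result URLs, dropping duplicates, keeping first-seen order."""
    seen = set()
    merged = []
    for results in result_lists:
        for url in results:
            if url not in seen:
                seen.add(url)
                merged.append(url)
    return merged
```

Calling build_queries("widget", ["co.uk"], ["commonterm"]) gives the two queries from the example above. How you fetch the results for each query (API, paid feed, or otherwise) is left out on purpose.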
|
inbound

msg:257740 | 3:17 pm on Nov 22, 2005 (gmt 0) |
[google.com...] is the home of the legitimate way to query Google en masse; however, it's severely limited by the number of daily queries, so you may need to scrape as suggested.
|
DaveN

msg:257741 | 3:21 pm on Nov 22, 2005 (gmt 0) |
inbound, did you find that maxresult will only return 10 even if set to 100? DaveN
|
inbound

msg:257742 | 6:35 pm on Nov 22, 2005 (gmt 0) |
Yes, the Google Search API is a pain with its throttling; I just thought I had to include it to give the legitimate route to get the results. It's not so much of an issue if you have a program set up to churn away, but it severely restricts the applications you can build for real-time queries, which is kind of daft.
|
engine

msg:257743 | 7:22 pm on Nov 22, 2005 (gmt 0) |
Hmmm, I think you're looking at it from the wrong angle. The API key allows people to run their apps without overloading Google's system. Google could remove the facility, and then where'd we be?
|
DaveN

msg:257744 | 7:36 pm on Nov 22, 2005 (gmt 0) |
Nah .. what we mean, Engine, is that normal Google has a results setting like NUM=100, but the API's maxresults can be 10 or 9, 8, 7, 6, 5, 4, 3, 2, 1, never more than 10. So it's poor: even with multithreading you use up 10 requests to get 100 results back, and when you only have 1000 requests that soon adds up to API key query depletion. DaveN
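
To put numbers on that, a quick Python sketch of the arithmetic, using the figures quoted in this thread (10 results per request, 1000 requests per key per day). The function names are made up for illustration.

```python
import math


def requests_needed(results_wanted, max_per_request=10):
    """Requests required to page through results_wanted results."""
    return math.ceil(results_wanted / max_per_request)


def full_queries_per_day(results_wanted, daily_limit=1000, max_per_request=10):
    """How many searches of that depth fit in one key's daily allowance."""
    return daily_limit // requests_needed(results_wanted, max_per_request)


print(requests_needed(100))        # 10 requests for the top 100
print(full_queries_per_day(100))   # 100 such searches per day
print(full_queries_per_day(1000))  # 10 searches per day at full depth
```

So pulling the top 100 for a keyword costs 10 requests, and one key covers only 100 keywords a day at that depth.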
|
|