Forum Moderators: coopster

Message Too Old, No Replies

mySQL query for domain names only

domain name query

         

ethan

5:32 pm on Jun 4, 2004 (gmt 0)

10+ Year Member



Hi there,

I'm doing a hits counter... basically tracks the hit, the referer and date.

The problem is that I want to show results not only for urls, but for domains only, so group for example all referers from google.com and add the hits they all produced.

I can't imagine a way other than populating a list of results and then parse them to build a list of domains and add the hits.

Is there a way to construct a mySQL query to take only the domain names from the urls in database addind the hits? may be I do ask too much for a query hehehe..

thanks.

mykel79

11:10 am on Jun 5, 2004 (gmt 0)

10+ Year Member



You can use regular expressions.
For example:
SELECT * FROM tablename WHERE referrer RLIKE 'http:\/\/.*google\.[^\.]*\/.*';

This would match:
[google.com...]
[google.de...]

BUT NOT
[google.example.com...]

coopster

7:46 pm on Jun 5, 2004 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



Welcome to WebmasterWorld, ethan!

As, mykel79 stated, a regular expression is one option, another may be the MySQL SUBSTRING_INDEX [dev.mysql.com]function. Of course, don't forget to COUNT() [dev.mysql.com] your hits. I'm not sure how you are storing your information, but here is an example to get you started.

SELECT 
SUBSTRING_INDEX(referer, '.', -2) AS site_referer,
COUNT(*) AS hits
FROM table
GROUP BY SUBSTRING_INDEX(referer, '.', -2)
;

ethan

10:30 am on Jun 6, 2004 (gmt 0)

10+ Year Member



thanks coopster and mykel!
Both things are very suitable to get the refering hits from one domain, but I think I have not been enough clear. I'll try now to be more.

I have the following table

id, page_id, unique_hits, raw_hits, referer, hits_date

The script is called by an img with the pageid and referer as params and it adds the refering page to the database.

What I want to do is to be able to show the stats in two ways, one the normal with hits stats and refering url, and the other with just hits and domain names.

So that way I do show how many hits the page has received from each domain, with a sum of the hits of all pages in a domain.

That's the thing I do not see clear if possible, because the query would need to return a list of domains and hits... thinking now, this is way too much for a query...

The sollution I do think now is to fisrt do a normal query, and then get a list of domains from the url of results and then do a new query for each domain name to get the amount of hits for that domain. But this is perhaps a lot of queries... say 20 results per page = 1 + 20 queries.

Any other ideas about doing this more efficiently?

Thanks all!

mykel79

9:52 pm on Jun 6, 2004 (gmt 0)

10+ Year Member



coopster's query does what you (I think) want.
It will show a list of domains and how many referrals there were from each one. If there is more than one hit from a domain, it will show the domain only once and the number of hits along with it.

ethan

10:36 pm on Jun 7, 2004 (gmt 0)

10+ Year Member



I have tested the query... is an approach, but I don't get a solid list of domain names... lots of results are just 'com', 'co.jp',...

because some referersr are www.site.com and others just site.com etc...

but thanks, still trying ;)

coopster

6:24 pm on Jun 8, 2004 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



Yeah, that's what I figured. You'll have to play around with regular expressions, substrings, combinations of the two, or other string functions until you get what you want.

john_k

6:31 pm on Jun 8, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You might have better luck (or consistancy anyway) by creating a domain name* column and then splitting them apart when you write to the database. You can add them together to get your original URL.

* Your original question asked for a way to split out the domain name. But another post sounds like you want the server or host name. So, do you want to differentiate between ww1.example.com and ww2.example.com, or do you want hits from both referrers lumped together?

jatar_k

6:52 pm on Jun 8, 2004 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



for most of the tracking I do I break it down a little more.

I keep the full unchanged referer in it's own col then usually keep domain or ay other data I may want to analyze seperate as well.

I usually do it in multiple steps.

I start by storing raw referer data in a file or initial db.
I usually move it to permanent storage from there doing some frontline analysis that will speed up queries.

I do put some duplicate data in there but it is only for speed of queries. There will eventually be tons of data so I like to split it up a bit.

That leaves the front end tracking able to just dump the data and keep rolling for front end speed. The processing is usually done by cron which can take its time and do the number crunching and comparison. Then the reports can be run from the pre analyzed data.

You can break up the full referer into as many fields as you need maybe even into domain, tld, string or whatever else you need and store them for ease of reporting.

DevlshOne

7:38 pm on Jun 8, 2004 (gmt 0)

10+ Year Member



I would say you'd need to parse up the referer name before it gets inserted into the table. You'll have to use regular expressions to determine if you are getting back 'www.google.com' or 'google.com' or even 'translator.japan.google.com'. Explode the referer by ".", count the number of items in the resulting array and work with it from there. You may also need to build a string comparison function to be sure you get a valid domain type 'com','org','us', etc,.

Good luck!

ethan

8:52 am on Jun 13, 2004 (gmt 0)

10+ Year Member



This is a fact I was not thinking about...

Sometime name1.site.com/... is a totally different sites from name2.site.com/... whenever the domain is the same... This situation occurs in some free hosts.

Perhaps I would have a query to extract just domain name and another for the host name. That way hits comming from name1.site.com/page1.htm and name1.site.com/page2.htm will get counted into name1.site.com also have the other query for just the domain name.

Here is a nice query for the domain name I found searching but not tested yet...

[experts-exchange.com...]

Also, right, I should at least remove the http:// part of the referers, as all are to be same (but bookmark and weird referer headers from software)

thanks for all suggestions!