Need some advice and clarification

         

macas

6:50 pm on May 31, 2011 (gmt 0)

10+ Year Member



Hi everyone in this part of the forum.

First of all, I want to thank you all for the valuable information and good posts here.

I need some help and advice on how to identify UA names when I find them in my log files.

Most of the time it's something like "yty5qsectfnjfu94", or weird names which I have never seen or heard of anywhere.

Is anybody familiar with these UA's:
TzoGeoAgent 1.3.g
John's Background Switcher 4.4
Photo NEWSII
Utherverse Client (what does this mean?)
Sogou web spider/4.0
Hot%20Wallpapers/9.03.1 CFNetwork/485.13.8 Darwin/11.0.0


For example, I notice that TzoGeoAgent 1.3.g is hitting my robots.txt like crazy, while John's Background Switcher 4.4 is trying to download something.

As an example for clarification, what is the name of the UA here?
Hot%20Wallpapers/9.03.1 CFNetwork/485.13.8 Darwin/11.0.0

I'm almost sure that all of them are hijackers / web scrapers.
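
On the "what is the name of the UA here" question: a string in the slash style is a list of product/version tokens, and the first one is normally the application itself. CFNetwork and Darwin are, as far as I know, added by Apple's networking library and OS kernel, so the name here would be "Hot Wallpapers" (the %20 is a URL-encoded space). A rough Python sketch of pulling the tokens apart, where `split_products` is a made-up helper name and the parsing is deliberately simplistic:

```python
import re
from urllib.parse import unquote

# One product token: a name, optionally followed by "/version".
_PRODUCT = re.compile(r"([^\s/()]+)(?:/(\S+))?")

def split_products(user_agent):
    """Return (name, version) pairs for each product token in a UA string."""
    # Drop parenthesised comments such as "(compatible; MSIE 7.0; ...)".
    without_comments = re.sub(r"\([^)]*\)", " ", user_agent)
    return [(unquote(name), version or None)
            for name, version in _PRODUCT.findall(without_comments)]

ua = "Hot%20Wallpapers/9.03.1 CFNetwork/485.13.8 Darwin/11.0.0"
for name, version in split_products(ua):
    print(name, version)
```

This only helps with UAs that follow the product/version convention. Free-form names such as "John's Background Switcher 4.4" come out as separate words, and random strings like "yty5qsectfnjfu94" carry no information to parse.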

lucy24

10:34 pm on May 31, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



A couple of them look like potential hotlinkers to me. Many (but not all) UAs with "Darwin" in the name are homemade robots. Sogou is a Chinese spider.

I think most people would say it isn't worth spending too much time investigating, because UAs can be forged. (So can referrers. So can IPs. Happily I've yet to have a visit from a spider that forged all three.)

Putting my money where my mouth isn't, I looked up Utherverse. They're a "social network for adults only" which suggests they were looking for certain keywords on your site. Only you can say whether they found them ;)

macas

3:32 pm on Jun 1, 2011 (gmt 0)

10+ Year Member



Thank you, lucy24 :)

By the way, I found one more problem in my log files:
It seems that a valid Googlebot-Image/1.0 is being served the "hotlink" image instead of the real one on my host/website, and I don't understand why.

I saw a similar problem in this thread: [webmasterworld.com]

Should I whitelist Googlebot to resolve this glitch?

macas

8:35 pm on Jun 1, 2011 (gmt 0)

10+ Year Member



*Update to my last post:
In most cases I see these UAs blocked / sent to the 404 page:

Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)

OR

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; GTB6.6; InfoPath.2; .NET CLR 2.0.50727; .NET CLR 1.1.4322; .NET CLR 3.0.04506.30; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; AskTB5.6)

OR

Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; InfoPath.2; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)

OR

Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; GTB6.6; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; OfficeLiveConnector.1.3; OfficeLivePatch.0.0; BRI/2; InfoPath.2)

The visits came from Google Search. Can somebody explain to me whether this is okay, or whether something is wrong with my .htaccess file?

Thanks

lucy24

4:50 am on Jun 2, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



:: looking vaguely around for someone who speaks Apache ::

(Moderators? Would this question get a better range of answers if it were kicked over to the Apache forum?)

In any case, I think we're going to need a little more information. Without seeing your htaccess file, it's hard to say if there's something wrong with it. There shouldn't be any relationship between UA (or IP, or referrer, or any other variable) and a 404, unless you're pointing them to a page that isn't where you thought it was. If your intention is to block them, they should be getting a 403.
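
To see which status each UA is really getting (403 versus 404), it can help to tally the access log instead of eyeballing it. A minimal sketch, assuming the log is in Apache's combined format with the UA as the last quoted field; `status_by_agent` is a hypothetical helper name, and your host's log layout may differ:

```python
import re
from collections import Counter

# Tail of a combined-format line: "request" status bytes "referrer" "user-agent"
LINE = re.compile(r'"[^"]*" (\d{3}) \S+ "[^"]*" "([^"]*)"\s*$')

def status_by_agent(lines):
    """Count (user_agent, status) pairs over an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = LINE.search(line)
        if match:  # silently skip lines that are not in the assumed format
            counts[(match.group(2), match.group(1))] += 1
    return counts
```

Calling it on an open log file, for example `status_by_agent(open("access.log"))` with whatever path your host uses, gives a counter you can sort with `most_common()` to find the UAs that pile up 404s.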

What does your current hotlinking routine look like? I assume you're doing something other than a simple rewrite, or the Googlebot's real destination would never show up in your logs. (At least it doesn't in mine. Apache 2.something.)

Are you seeing
#1 googlebot-image goes to intended page, immediately followed by redirect to hotlink page
or
#2 googlebot-image goes only to hotlink page
or
#3 googlebot-image goes randomly to different pages including free-standing visits to hotlink page

Anyway, you should be able to identify it by a combination of UA (simply containing the element "Googlebot-Image") and IP. Googlebot doesn't use a whole lot of different IP ranges.
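
Since the UA on its own can be forged, the usual way to combine UA and IP is a reverse DNS lookup followed by a forward lookup that must return the same IP. A sketch in Python, assuming genuine Googlebot hosts resolve to names ending in googlebot.com or google.com; `is_real_googlebot` and its injectable resolver parameters are made up for illustration:

```python
import socket

GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")

def is_real_googlebot(ip, user_agent,
                      reverse=socket.gethostbyaddr,
                      forward=socket.gethostbyname_ex):
    """Check the UA, then confirm the IP with reverse and forward DNS."""
    if "googlebot" not in user_agent.lower():
        return False
    try:
        hostname = reverse(ip)[0]
        if not hostname.lower().endswith(GOOGLE_SUFFIXES):
            return False
        # Forward-confirm: the hostname must resolve back to the same IP.
        # Note: gethostbyname_ex only handles IPv4 addresses.
        return ip in forward(hostname)[2]
    except OSError:
        # Lookup failed (no PTR record, DNS error): treat as unverified.
        return False
```

DNS lookups are slow, so this is something to run offline against log entries, not on every request.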

Can't use my own htaccess hotlinking routine as an example, because I just do it by referrer, and I allow null referrers. That automatically admits all robots. Besides ::cough, cough:: there's something wrong with my own htaccess file too, because blocked robots always get blocked twice: first from the file they were aiming for, and then from the custom 403 page. Maybe I can blame it on my host and just ignore it.

After-the-fact edit: Oops. Remembered wrong. The custom 403 page doesn't show up in the actual logs as a second 403; it's only in the error log, which doesn't give numbers. So I don't know exactly what's happening at what level. Just "client denied by server configuration", same as when they're blocked.
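
The referrer-only approach described above (admit null referrers, admit your own pages, treat everything else as a hotlink) boils down to the following logic. This is a Python sketch of the decision, not an htaccess rule; the host names are placeholders and `is_hotlink` is a hypothetical name:

```python
from urllib.parse import urlsplit

ALLOWED_HOSTS = {"example.com", "www.example.com"}  # placeholder site names

def is_hotlink(referrer, allowed_hosts=ALLOWED_HOSTS):
    """True when an image request should be treated as a hotlink."""
    # A missing referrer (logged as "-") is allowed; this admits most robots.
    if not referrer or referrer == "-":
        return False
    host = (urlsplit(referrer).hostname or "").lower()
    return host not in allowed_hosts
```

Because robots such as Googlebot-Image normally send no referrer, a routine like this lets them through without any whitelist. If a crawler is still being handed the hotlink image, the rule in use is probably testing something other than the referrer.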

wilderness

12:44 pm on Jun 2, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



In any case, I think we're going to need a little more information. Without seeing your htaccess file, it's hard to say if there's something wrong with it.


FYI!
Portions of macas's htaccess (copied and pasted) are displayed in two previous threads (inquiries) in the same Apache forum that you suggested this correctly placed SSID inquiry be moved to.

All you need to do is view macas's profile and look at "Recent Messages" to locate these threads.

One has not even been replied to. It has been (at least in most instances) an unwritten policy of the Apache and SSID forums to avoid these large posted htaccess files like hot potatoes. In fact, the Apache forum charter even includes an explanation discouraging these types of posts (large htaccess files).

Don