
Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

    
ia archiver
keyplyr




msg:4471056
 7:59 pm on Jun 29, 2012 (gmt 0)


I'm seeing increasing hits from Chinese and even Japanese IP ranges.

Anyone have a list of valid IP ranges for ia_archiver?

 

incrediBILL




msg:4471071
 8:40 pm on Jun 29, 2012 (gmt 0)

You seriously let that thing access your site?

Letting anything archive your content should be avoided; it has more pitfalls than benefits.

I'd treat a request as valid only if it uses one of the first two user agents below and comes from AWS or archive.org IP ranges.

The good:

USER AGENT: "ia_archiver (+http://www.alexa.com/site/help/webmasters; crawler@alexa.com)"
IP: 174.129.228.67
IP: 204.236.235.245

USER AGENT: "ia_archiver(OS-Wayback)"
IP: 207.241.224.41
IP: 207.241.224.43
IP: 207.241.226.116
IP: 207.241.226.153
IP: 207.241.226.66
IP: 207.241.226.67
IP: 207.241.227.244

The bad and the ugly:

USER AGENT: "ia_archiver"
IP: 49.72.162.197
IP: 49.72.212.77
IP: 49.72.213.2
IP: 49.72.213.21
IP: 49.73.156.140
IP: 49.73.33.128
IP: 49.75.198.15
IP: 58.208.113.29
IP: 58.208.115.131
IP: 58.208.176.125
IP: 58.208.176.166
IP: 58.208.177.55
IP: 58.208.240.228
IP: 58.208.241.138
IP: 58.209.122.124
IP: 58.209.123.35
IP: 58.209.124.111
IP: 58.209.124.7
IP: 58.209.124.77
IP: 58.209.160.165
IP: 58.209.163.52
IP: 58.209.179.152
IP: 58.209.179.166
IP: 58.209.179.215
IP: 58.209.179.218
IP: 58.209.18.134
IP: 58.209.248.70
IP: 58.209.250.79
IP: 58.209.250.88
IP: 58.209.253.116
IP: 58.209.254.119
IP: 58.209.255.39
IP: 58.209.52.236
IP: 58.209.55.118
IP: 58.209.55.51
IP: 61.183.41.36
IP: 63.141.228.126
IP: 114.113.228.107
IP: 114.216.98.102
IP: 114.218.226.235
IP: 114.218.226.99
IP: 114.218.227.196
IP: 114.218.227.96
IP: 114.218.238.43
IP: 114.218.249.216
IP: 114.219.164.143
IP: 114.219.165.221
IP: 117.135.160.14
IP: 117.80.204.202
IP: 117.80.205.212
IP: 117.80.206.191
IP: 117.80.207.1
IP: 117.80.65.230
IP: 117.81.10.18
IP: 117.81.11.16
IP: 117.81.120.175
IP: 117.81.123.27
IP: 117.81.231.166
IP: 117.81.232.103
IP: 117.81.232.236
IP: 117.81.233.137
IP: 117.81.233.193
IP: 117.81.233.60
IP: 117.81.6.252
IP: 117.81.7.119
IP: 117.81.9.32
IP: 117.83.200.170
IP: 117.83.201.140
IP: 117.83.38.251
IP: 117.83.39.23
IP: 117.83.39.65
IP: 117.83.67.224
IP: 117.83.68.65
IP: 117.83.96.198
IP: 117.83.97.113
IP: 119.148.160.84
IP: 119.148.161.64
IP: 121.228.0.43
IP: 121.228.1.214
IP: 121.228.15.78
IP: 121.228.152.135
IP: 121.228.154.82
IP: 121.228.156.87
IP: 121.228.157.102
IP: 121.228.157.45
IP: 121.228.158.100
IP: 121.228.158.184
IP: 121.228.158.25
IP: 121.228.2.136
IP: 121.228.3.217
IP: 121.228.4.204
IP: 121.228.5.197
IP: 121.228.9.243
IP: 121.236.150.143
IP: 121.236.80.170
IP: 126.114.226.88
IP: 180.106.152.145
IP: 180.106.153.15
IP: 180.106.153.193
IP: 180.107.122.24
IP: 180.107.123.71
IP: 180.107.40.195
IP: 180.117.249.18
IP: 180.117.250.201
IP: 184.171.169.117
IP: 202.165.179.32
IP: 203.156.231.178
IP: 205.164.48.130
IP: 208.94.240.36
IP: 211.143.200.88
IP: 211.144.76.77
IP: 211.95.79.156
IP: 218.16.124.252
IP: 218.65.30.73
IP: 218.75.152.226
IP: 218.75.152.235
IP: 219.235.3.162
IP: 220.181.158.160
IP: 221.225.39.180
IP: 221.225.69.13
IP: 221.225.70.165
IP: 222.73.173.221
IP: 222.93.126.101
IP: 222.93.126.157
IP: 222.93.127.50
IP: 222.93.127.58
IP: 222.93.127.73
IP: 222.93.170.20
IP: 222.93.226.94
IP: 222.93.227.1
IP: 222.93.227.48
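
The rule of thumb above (accept only the two full user agents, and only from known Alexa/AWS or archive.org addresses) could be sketched in Python roughly like this. It is a minimal sketch, not something posted in the thread: the function name `classify_ia_archiver` and the `GOOD_AGENTS` table are hypothetical, and the table holds only the individual addresses listed above as single hosts, so the real AWS and archive.org ranges would have to be substituted.

```python
import ipaddress

# Hypothetical allowlist built only from the addresses quoted in this thread.
# These are single hosts (/32), NOT the real AWS or archive.org ranges.
GOOD_AGENTS = {
    "ia_archiver (+http://www.alexa.com/site/help/webmasters; crawler@alexa.com)": [
        "174.129.228.67/32",
        "204.236.235.245/32",
    ],
    "ia_archiver(OS-Wayback)": [
        "207.241.224.41/32",
        "207.241.224.43/32",
        "207.241.226.116/32",
        "207.241.226.153/32",
        "207.241.226.66/32",
        "207.241.226.67/32",
        "207.241.227.244/32",
    ],
}


def classify_ia_archiver(user_agent, ip):
    """Return 'not ia_archiver', 'good', or 'suspect' for one request."""
    if "ia_archiver" not in user_agent:
        return "not ia_archiver"
    addr = ipaddress.ip_address(ip)
    # The user agent must match exactly AND the address must be allowlisted.
    for cidr in GOOD_AGENTS.get(user_agent, []):
        if addr in ipaddress.ip_network(cidr):
            return "good"
    return "suspect"
```

With this table, a bare "ia_archiver" user agent is always classed as suspect, and so is a full Alexa or Wayback user agent arriving from an address that is not on the list.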

lucy24




msg:4471085
 9:22 pm on Jun 29, 2012 (gmt 0)

Astounding but true: If you e-mail them and ask, they will tell you if it's theirs. I can count on the fingers of one hand the number of times I've got an answer to a "Please identify your robot" query.

wilderness




msg:4471104
 10:44 pm on Jun 29, 2012 (gmt 0)

Letting anything archive your content should be avoided; it has more pitfalls than benefits.


Let's not be so vague here ;)

IMO that also applies to archiving by the major SEs and their "cache."

keyplyr




msg:4471130
 12:58 am on Jun 30, 2012 (gmt 0)

No, Bill, to answer your question: nothing archives my content. However, ia_archiver is also the bot Alexa uses to check your site and keep it in its index, which does have a positive effect when my advertisers check ranking before deciding whether to purchase one of my ad campaigns. Thanks for the IP list.

incrediBILL




msg:4471143
 2:35 am on Jun 30, 2012 (gmt 0)

You're very welcome.

I have never permitted ia_archiver on my site, yet it has always shown up in Alexa.

I also sell advertising, never had anyone ask about Alexa, ever :)

YMMV

keyplyr




msg:4471145
 2:46 am on Jun 30, 2012 (gmt 0)

Sorry to say, Alexa data displays on too many domain/traffic info sites, sometimes not identified as Alexa. I've only received feedback twice about Alexa stats during biz proposals, but I assume many more potential customers consider those numbers. Happily, I think the Alexa phenom has steadily decreased in importance over the last few years, replaced by the even more perplexing phenom of social media. Regardless, my strategy of covering all my bases remains my MO.

motorhaven




msg:4471606
 12:28 am on Jul 2, 2012 (gmt 0)

Alexa is to ranking what "MIPS" is to CPU benchmarks: a meaningless indicator of performance.

keyplyr




msg:4471650
 6:53 am on Jul 2, 2012 (gmt 0)


@ motorhaven - agreed, but countless users don't know that. I rarely get the chance to enlighten potential clients; they just make their money decisions with the data at hand, valid or not.

motorhaven




msg:4471763
 2:48 pm on Jul 2, 2012 (gmt 0)

I agree with that. I recently had someone approach me about doing a major overhaul on his site and he kept bragging about his Alexa ranking. When I gave him more accurate information, he was disappointed.

I'd put Alexa up for potential advertisers, but in my experience the benefits of blocking Alexa from my sites outweigh the potential advertising benefits.

Obviously this varies for all of us depending on niche, so it comes down to a pro/con business decision. Quite similar to Websense - sites which rely heavily on corporate traffic may not want to block it while others may find it beneficial.

wilderness




msg:4471770
 3:16 pm on Jul 2, 2012 (gmt 0)

Obviously this varies for all of us depending on niche, so it comes down to a pro/con business decision. Quite similar to Websense - sites which rely heavily on corporate traffic may not want to block it while others may find it beneficial.


Given today's market trends and the variety of devices (PCs, cell phones, handhelds), I'm seeing legitimate users simply switching devices.

Course that's not economical for the mass harvester, nor are the bandwidth restrictions of most mobile devices.

incrediBILL




msg:4475509
 9:20 pm on Jul 13, 2012 (gmt 0)

This obnoxious bot using the China IPs is now asking for the index page of one site almost 100 times a day.

Why do they keep coming back with such frequency?

Insanity.
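
A request rate like the one described above can be checked by counting hits per IP per day in the access log. The following is only a rough sketch: it assumes the Apache/nginx "combined" log format, and the function name `hits_per_ip_per_day` is hypothetical, so the pattern would need adjusting for any other log layout.

```python
import re
from collections import Counter

# Assumed log layout: Apache/nginx "combined" format.
# Adjust the pattern if your server logs differently.
LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<day>[^:]+):[^\]]*\] '
    r'"(?P<request>[^"]*)" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def hits_per_ip_per_day(lines, agent_substring="ia_archiver"):
    """Count requests per (day, ip) whose user agent contains agent_substring."""
    counts = Counter()
    for line in lines:
        match = LINE.match(line)
        if match and agent_substring in match.group("agent"):
            counts[(match.group("day"), match.group("ip"))] += 1
    return counts
```

Passing an open log file to `hits_per_ip_per_day` and calling `.most_common()` on the result lists the busiest day/address pairs first.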

keyplyr




msg:4475514
 9:38 pm on Jul 13, 2012 (gmt 0)


Today I started a trial period of blocking all variations of ia_archiver, Alexa, etc.

I'll be monitoring the number of advert requests in relation to any changes in Alexa ranking. Will post anything significant in a few weeks, thanks.

dstiles




msg:4475699
 10:07 pm on Jul 14, 2012 (gmt 0)

Real ia_archiver DOES respect robots.txt - haven't seen the real one in years.
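
For a crawler that does honor robots.txt, a rule aimed at the ia_archiver token can be sanity-checked with Python's standard `urllib.robotparser`. This is a minimal sketch: the robots.txt content and the helper name `allowed` are illustrative assumptions, not rules taken from anyone's site in this thread, and impostors sending the same user agent may simply ignore the file.

```python
from urllib import robotparser

# Illustrative robots.txt content (an assumption, not from the thread):
# it disallows everything for the ia_archiver token only.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""


def allowed(robots_txt, user_agent, path="/"):
    """Return True if robots_txt permits user_agent to fetch path."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)
```

Here `allowed(ROBOTS_TXT, "ia_archiver")` evaluates to False, while an agent with no matching rule, such as "Googlebot", is still permitted.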

motorhaven




msg:4475746
 1:35 am on Jul 15, 2012 (gmt 0)

It shouldn't impact Alexa rankings. Alexa builds ranking based on hits from browsers with the plug-in installed. I blocked it for years on a site in the top 10,000, and several other sites with no negative change in Alexa rank (and wouldn't care if it did!).


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved