Blog Scraping My Site

How to stop it?

         

amythepoet

11:41 pm on Oct 30, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Let's say I'm signed up to receive some Google alerts for widgets. Well, the other day, let's say I got one, and it's from a strange Google blog website that has some of my text on it.

What would you do?

bill

7:57 am on Oct 31, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I usually ignore them unless they are using a reputable hosting firm or well-known service that has a simple and clear-cut reporting mechanism in place.

amythepoet

10:51 pm on Oct 31, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



OK, thanks.

My web host is aware of it too, and I do not see any webhost on their blog.

trillianjedi

8:49 am on Nov 4, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Some great links to information from DigitalGhost:-

[webmasterworld.com...]

TJ

jtara

4:50 pm on Nov 4, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I do not see any webhost on their blog

It's useful to learn how to use whois, traceroute, nslookup/dig, etc. These are command-line tools, commonly available on Linux and other Unix-like systems, that will help you find out who hosts a website. If you do not have access to a Linux command prompt, there are equivalent online and Windows tools available.

It's usually pretty easy to figure out who hosts a given site.

amythepoet

7:40 pm on Nov 4, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thank you

amythepoet

12:53 pm on Nov 5, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I did a "whois" to try and find the person who has taken some of my descriptions, and I got the following:

"not a properly formatted domain name" when I put in the url of the offender.

What now?

jtara

3:37 pm on Nov 5, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



"not a properly formatted domain name" when I put in the url of the offender.

Make sure you are entering ONLY the domain name. Also, in most cases, you should leave off any "www", etc.

So, if the blog is at

www.example.com/blogs/dirtycheat

do

whois example.com

If the blog is on a subdomain of a blog hosting company, this is probably all you need: it should give you what you need to contact them and complain. (File a DMCA takedown request.) But presumably this isn't the case, as you probably would have just gone to the blog hosting company's home page and found out how to contact them.

What this will give you is the registration information for the blogger's domain, which probably won't do you much good. You MIGHT get their real name and phone number from this, but you'll probably get the name of a privacy service, or false information. So, let's move on...
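For anyone who would rather script the "domain name only" step, here is a minimal Python sketch. The function name whois_target is made up for illustration, and it assumes the registrable domain is simply the last two labels of the host name, which is wrong for suffixes such as "co.uk".

```python
from urllib.parse import urlsplit


def whois_target(url: str) -> str:
    """Guess the domain to pass to whois (naive: last two labels of the host)."""
    # urlsplit only recognises the host part when a scheme or "//" is present.
    if "//" not in url:
        url = "//" + url
    host = urlsplit(url).hostname or ""
    labels = host.rstrip(".").split(".")
    # Assumption: the registrable domain is the last two labels.
    # This does not hold for multi-part suffixes such as "co.uk".
    return ".".join(labels[-2:])


print(whois_target("www.example.com/blogs/dirtycheat"))  # example.com
```

The result is what you would type after "whois" at the command prompt.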

You should also determine the IP address of the web server, and look that up. You can determine the IP address with ping. This time, INCLUDE any part before the base domain name (e.g. "www") but leave off any other part of the URL. e.g.

ping www.example.com

You don't care about the actual ping replies. Look at this part, though:

PING www.example.com (192.0.34.166) 56(84) bytes of data.

Write down the IP address part (192.0.34.166 in this example).
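If you prefer not to read the address off the screen, a rough Python sketch of the same step follows. The helper names are made up for illustration, and the pattern assumes the Linux-style "PING host (address)" header shown above; Windows ping prints a differently worded header. The second helper asks the resolver directly and needs a working network connection.

```python
import re
import socket

# Matches the Linux-style header line: "PING <host> (<a.b.c.d>) ..."
PING_HEADER = re.compile(r"^PING\s+(\S+)\s+\((\d{1,3}(?:\.\d{1,3}){3})\)")


def ip_from_ping_header(line: str):
    """Return the IPv4 address from a ping header line, or None if absent."""
    match = PING_HEADER.match(line)
    return match.group(2) if match else None


def resolve(host: str) -> str:
    """Look the host name up directly (requires network access and DNS)."""
    return socket.gethostbyname(host)


header = "PING www.example.com (192.0.34.166) 56(84) bytes of data."
print(ip_from_ping_header(header))  # 192.0.34.166
```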

Now, do this:

whois 166.34.0.192.in-addr.arpa

REVERSE the order of the four numbers in the IP address (but do NOT reverse the order of the digits in each number!), and add ".in-addr.arpa" to the end of it.

Most likely, you will get an answer saying that there was "no match". Follow the above example, and that is indeed what you will get.

Do it again, only leave off the first number. (LAST number in the IP address, but of course you are entering it reversed...)

whois 34.0.192.in-addr.arpa

Ah, an answer!

OrgName: Internet Assigned Numbers Authority
OrgID: IANA
Address: 4676 Admiralty Way, Suite 330
City: Marina del Rey
StateProv: CA
PostalCode: 90292-6695
Country: US

NetRange: 0.0.0.0 - 0.255.255.255
CIDR: 0.0.0.0/8
NetName: RESERVED-1
NetHandle: NET-0-0-0-0-1
Parent:
NetType: IANA Special Use
Comment: Please see RFC 3330 for additional information.
RegDate:
Updated: 2002-10-14

As you can see, the IP address belonging to "example.com" was issued to the IANA - which happens to be the organization that issues IP addresses! This is because example.com is a special reserved domain name. (That's why the moderators aren't all over me for posting this example! :) )
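If you run a lot of these lookups, the "Key: value" layout of an answer like the one above is easy to pick apart in a script. This is only a sketch: parse_whois_record is a made-up name, it keeps the first value seen for each key, and other registries format their answers differently.

```python
def parse_whois_record(text: str) -> dict:
    """Turn 'Key: value' lines into a dict, keeping the first value per key."""
    record = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep or not key.strip():
            continue  # skip blank lines and lines without a "Key:" prefix
        record.setdefault(key.strip(), value.strip())
    return record


sample = """OrgName: Internet Assigned Numbers Authority
OrgID: IANA

NetRange: 0.0.0.0 - 0.255.255.255
Parent:
"""
info = parse_whois_record(sample)
print(info["OrgName"])  # Internet Assigned Numbers Authority
```

The OrgName field is usually the one that tells you who to contact.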

Most likely you will get an answer on the second try. If not, chop off one more set of numbers:

whois 0.192.in-addr.arpa

But it's unlikely you will have to go that far.
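The reverse-and-chop steps above can be sketched in a few lines of Python. The function name reverse_lookup_names is made up for illustration; it only builds the names to try, in the order described, and does not run whois itself.

```python
from typing import List


def reverse_lookup_names(ip: str) -> List[str]:
    """Build the in-addr.arpa names to try, most specific first."""
    octets = ip.split(".")
    if len(octets) != 4 or not all(o.isdigit() and int(o) <= 255 for o in octets):
        raise ValueError(f"not a dotted-quad IPv4 address: {ip!r}")
    # Reverse the order of the four numbers, not the digits inside them.
    reversed_octets = octets[::-1]
    # Each retry drops the first label, i.e. the LAST number of the original IP.
    return [".".join(reversed_octets[i:]) + ".in-addr.arpa" for i in range(3)]


for name in reverse_lookup_names("192.0.34.166"):
    print("whois", name)
# whois 166.34.0.192.in-addr.arpa
# whois 34.0.192.in-addr.arpa
# whois 0.192.in-addr.arpa
```

Many whois clients will also accept the plain IP address (for example, whois 192.0.34.166), which may save you the reversing.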

What you have now is the owner of the IP address that the server is on - which is generally distinct from the owner of the domain name that the server is on. This will almost always be a hosting company, and you can now send a DMCA takedown request to the hosting company.

If this seems like a pain, it is. But, fortunately, there are web pages and commercial and free tools that make it easier. Do a search for "reverse whois".

amythepoet

4:01 pm on Nov 5, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Oh, thank you so much. I'm on it right now!

I will let you know the scoop.

amythepoet

4:12 pm on Nov 5, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



OK, I used the whois command again and did what you said, and still got "not a properly formed domain".

The thing is the blog looks like this:

www.example.blogs.dirthcheat

There are no slashes.

And when I click on it, there are all sorts of other little descriptions in there, from different sites.

On one of them I see he has some Google AdSense stuff on the right. It's all a mess.

amythepoet

4:15 pm on Nov 5, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I also tried the ping command and got nothing, help!

jtara

4:50 pm on Nov 5, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



www.example.blogs.dirthcheat

That's impossible. That's.... an improperly-formed domain name. :)

There's gotta be a .com, .net, .org, etc. (a TLD or "top-level domain") in there somewhere.
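A rough Python sketch of that sanity check follows. The three-entry TLD set is a deliberately tiny, hypothetical sample (the real list is maintained by IANA and is much longer), and looks_like_domain is a made-up name.

```python
import re

# Hypothetical sample only; the full list of TLDs is maintained by IANA.
SAMPLE_TLDS = {"com", "net", "org"}

# One DNS label: letters, digits and hyphens, not starting or ending with a hyphen.
LABEL = re.compile(r"^[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?$")


def looks_like_domain(name: str, tlds=SAMPLE_TLDS) -> bool:
    """Return True if name has valid labels and ends in a known TLD."""
    labels = name.lower().rstrip(".").split(".")
    if len(labels) < 2:
        return False
    if not all(LABEL.match(label) for label in labels):
        return False
    return labels[-1] in tlds


print(looks_like_domain("www.example.blogs.dirthcheat"))  # False
print(looks_like_domain("dirtycheat.example.org"))        # True
```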

If you want, stickymail me the domain name, and I'll see what you are doing wrong.

jtara

6:32 pm on Nov 5, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



OK, cleared some things up. Amy was omitting the .org from the domain name, which wasn't helping things. :)

It wasn't a blog. Just because a site has a subdomain doesn't mean it's a blog - although that is the most common use of subdomains nowadays.

It's just a straight-up scraper site that happens to have subdomains for its various categories of stolen content...

The registrant helpfully provided what is presumably his full name, address, and phone number - in Zimbabwe... Keep in mind that the DMCA is a U.S. law, and won't help with sites that are hosted overseas.

However, turns out the site is hosted in the U.S., and hopefully I was able to help Amy identify the hosting company.

amythepoet

7:18 pm on Nov 5, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi again,

Just wanted to say that this whole thing started because I was getting google blog alerts and it appeared to me that the first site was a blog site,

It's still confusing to me.

jtara

8:19 pm on Nov 5, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Just wanted to say that this whole thing started because I was getting google blog alerts and it appeared to me that the first site was a blog site,

Just proves that Google isn't perfect. :)

Is there some way for blogs to register with Google so that they will be crawled for inclusion in their blog results? If so, I imagine it wouldn't be uncommon for scrapers to submit themselves as "blogs", and set-up a domain structure that makes them look like a blog.

It's obvious that Google isn't manually checking these, and their algorithm is easy to fool.

As long as this is the case, it looks like Google blog alerts are at least a good way to get notified about scraper sites copying your material.

Anyway, exactly what is a "blog"? It's a fuzzy definition, and who is Google to argue with somebody if they say they have a blog? It shows how fatally flawed the search situation still is.

leadegroot

8:38 pm on Nov 5, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



And quite often they look like a blog because they are using blog software to post the scraped crap - scraper technical skills can be low and many blog tools are free. :(