
Search Engine Spider and User Agent Identification Forum

    
PediaSearch.com Crawler
keyplyr
msg:3025009
10:35 pm on Jul 27, 2006 (gmt 0)

While I do get traffic from Wikipedia articles that reference my webpage content, this has to be a violation of my copyright. From what I understand, this bot follows these links, scrapes your site's content, creates a book and sells it to the user.

85.214.51.184 - - [27/Jul/2006:10:53:39 -0400] "GET / HTTP/1.0" 200 10956 "-" "PediaSearch.com Crawler"

 

bull
msg:3027343
6:13 pm on Jul 29, 2006 (gmt 0)

and it does not fetch robots.txt

deny from 85.214.
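
For reference, a minimal sketch of how that deny might look in an Apache 2.2-style .htaccess, assuming mod_authz_host is available (the 85.214 prefix is taken from the log line earlier in this thread; Apache matches partial IPs on whole octets):

# block the 85.214.x.x range seen in the access log
Order Allow,Deny
Allow from all
Deny from 85.214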

Mokita
msg:3027533
10:01 pm on Jul 29, 2006 (gmt 0)

It did ask for robots.txt when it visited one of our sites, but since I'd never heard of it before, it remains to be seen whether it honours it.

pediapress.com - - [24/Jul/2006:03:45:48 +1000] "GET /robots.txt HTTP/1.0" 200 1930 "-" "PediaSearch.com Crawler"
pediapress.com - - [24/Jul/2006:03:45:50 +1000] "GET / HTTP/1.0" 200 9489 "-" "PediaSearch.com Crawler"
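
Since it does fetch robots.txt here, a robots.txt entry may be worth trying before blocking it outright. A minimal sketch, assuming the bot keys on its UA string as shown in the logs (which token, if any, it actually honours is unknown):

User-agent: PediaSearch.com Crawler
Disallow: /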

wilderness
msg:3027578
10:57 pm on Jul 29, 2006 (gmt 0)

and it does not fetch robots.txt

deny from 85.214.

Yo Jan!
Get with the program!

deny from 85. ;)
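
If denying a whole first octet is too broad, matching on the user-agent string is another option. A rough sketch for Apache with mod_setenvif, assuming the UA stays "PediaSearch.com Crawler":

# flag requests whose UA contains "pediasearch", then deny them
SetEnvIfNoCase User-Agent "pediasearch" bad_bot
Order Allow,Deny
Allow from all
Deny from env=bad_bot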

Have you changed locales yet, and did you purchase that "dinghy" I suggested for local transportation?

Best Don

zCat
msg:3027580
10:59 pm on Jul 29, 2006 (gmt 0)

According to the website, only genuine Wikipedia articles are printed, which is perfectly legal provided they follow the GFDL. However, it does say "PediaPress suggests further articles based on what's in your book" - maybe they scrape the external links from Wikipedia articles?

The company behind it is located in Germany.

wilderness
msg:3027597
11:20 pm on Jul 29, 2006 (gmt 0)

only genuine Wikipedia articles are printed

Isn't that an oxymoron?
It's a user-based forum.

I've seen some of these pages that provide "not so accurate" information.

You or I, or anybody, may go to any page and add inaccuracies should we desire, and there are no controls to prevent such a thing (with the exception of another user updating our change later).

zCat
msg:3027606
11:47 pm on Jul 29, 2006 (gmt 0)

Wilderness, I mean "genuine" in the sense that the content printed is (according to the bot's site) genuine, original and legal Wikipedia content.

Whether that content is always accurate is another question; I think there's an ongoing thread elsewhere on WW about the rights and wrongs of Wikipedia.

wilderness
msg:3027657
1:47 am on Jul 30, 2006 (gmt 0)

zCat,
I realized what you meant and was just yanking your chain.

Don

zCat
msg:3027763
7:19 am on Jul 30, 2006 (gmt 0)

Ah, the forum software must have swallowed your smileys ;-).

I used to be quite active on Wikipedia, but I think it's getting out of control.
