Regex geared towards Wikipedia bot development

ocon

2:14 pm on Nov 9, 2009 (gmt 0)

I'm trying to create a bot for Wikipedia, but I'm having the hardest time with some regex. I was hoping somebody might be able to help me with my coding, or better yet, might know of a website with some regex geared towards Wikipedia, considering the many bots written for it.

What I'm looking to do with my regex pattern is to convert:

[[Link]] => Link
[[Complex link|Link]] => Link
[[A more complex link!|Link]] => Link
[[Link]]s => Links
[[File:Image.jpg|thumb|An example with a [[link]]]] => [[File:Image.jpg|thumb|An example with a link]]
[[Image:Image.jpg|thumb|An example with a [[link]]]] => [[Image:Image.jpg|thumb|An example with a link]]

And do nothing to:

[[File:Image.jpg|thumb]]
[[Category:Examples]]
[[fr:Le link]]
[[be-x-old:Linko]]

What I have is:

$contents=$input;
$contents=preg_replace("/\[\[[a-z]{2,7}:[^\|\.]+\]\]/iU","",$contents);
$contents=preg_replace("/\[\[[^:\|]+\|([^:\{\}\|]+)\]\][^\[\]]/i","$1",$contents);
$contents=preg_replace("/\[\[([^:]+)\]\]/iU","$1",$contents);
$output=$contents;

And while it does an OK job, I'm noticing it's a little loosely written: it sometimes takes out large chunks of code, leaves behind random brackets, and removes spaces between words.

Any help is greatly appreciated.

dreamcatcher

7:02 pm on Dec 30, 2009 (gmt 0)

Looks like this has everyone confused, ocon. Did you sort this out yourself?

dc
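
For what it's worth, two of the symptoms described above seem to follow from the second pattern: its trailing [^\[\]] consumes the character after the link (often a space) without putting it back, and its character classes do not exclude brackets, so one match can run from one link into a later one. One possible way around both is a single pass that matches only innermost links (a [[...]] body with no brackets inside) and decides in a callback what to keep. This is a minimal sketch, not a tested bot component. The function name strip_wikilinks is made up for illustration, and it assumes that any colon in the link target marks a namespace or interwiki prefix, which also leaves alone ordinary article titles that happen to contain a colon.

```php
<?php
// Hypothetical helper (not from the original post): unwrap plain wikilinks,
// leaving File:/Image:/Category:/interwiki links untouched.
function strip_wikilinks(string $wikitext): string
{
    $result = preg_replace_callback(
        // Innermost links only: the body may not contain [ or ].
        '/\[\[([^\[\]]+)\]\]/',
        function (array $m): string {
            $parts = explode('|', $m[1]);
            // Assumption: a colon in the target means a namespace or
            // interwiki prefix, so the link is returned unchanged.
            if (strpos($parts[0], ':') !== false) {
                return $m[0];
            }
            // Piped link: keep the label. Plain link: keep the target.
            return end($parts);
        },
        $wikitext
    );

    // preg_replace_callback returns null on a PCRE error; fall back to the input.
    return $result ?? $wikitext;
}
```

Under those assumptions, strip_wikilinks('[[Link]]s') gives Links, and the nested [[link]] inside a File: caption is unwrapped while the outer [[File:...]] stays in place, because the outer link contains brackets and is never matched as a whole. Nothing after the closing ]] is consumed, so spacing between words is preserved.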