
Regex, A Love-Hate Relationship.
09.03.13
18:21:50


So I've been working on my blog obsessively lately, and I realize I'm really going to want to put some code up here, since I write a lot of little scripts and tend to spend a lot of time in the CLIs of switches, appliances, and my Linux boxes. In my last post I put together a little code for presenting code on the blog, starting with the code I wrote to pull my unauthenticated Twitter feed. The need for this sent me down a long, dark hole of regex, which, even though it's frustrating, is still good, because I do use it from time to time at work. That Twitter code looks like this:

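$user  = "4d4mdotnet";
$count = 5;
// Build the request URL for the (old v1) user_timeline endpoint.
$apiurl = "http://api.twitter.com/1/statuses/user_timeline.json?screen_name=" . urlencode($user) . "&count=" . $count;

// Fetch the JSON; CURLOPT_RETURNTRANSFER makes curl_exec() return the body instead of printing it.
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $apiurl);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$output = curl_exec($ch);
curl_close($ch);

echo "Twitter Feed\n";

// Decode into stdClass objects; an error response (e.g. rate limiting) has an "error" field.
$tweet = json_decode($output);

if (isset($tweet->error)) {
    echo "Rate Limited by Twitter.\n";
} elseif (is_array($tweet) && isset($tweet[0]->text)) {
    // Print however many tweets actually came back (at most $count).
    foreach (array_slice($tweet, 0, $count) as $t) {
        echo $t->text . "\n";
    }
} else {
    echo "Something else went wrong.\n";
}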


That worked out really well until I realized that scrubbing my data to be SQL safe and then returning it to its HTML format was putting <br />s at the end of every line. So I had to write a function to remove those tags from the code blocks. I started with PHP's preg_replace() function but got lost in the weeds on the regex, so I called in a buddy of mine who's a super genius about this stuff, and he gave me this:

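function html2xmp($cc_str) {
    // Capture the opening <xmp> tag ($1), the text inside it ($2) and the closing tag ($3).
    // The /e modifier evaluates the replacement as PHP, so the inner preg_replace()
    // strips <br /> tags from $2 only. (/e was removed in PHP 7.0.)
    $cc_str = preg_replace('|(<xmp[^>]*>)(.*)(</xmp>)|seiU', '\'$1\' . preg_replace(\'|<br />|\', \'\', \'$2\') . \'$3\'', $cc_str);
    return $cc_str;
}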


There are a couple of things I'll point out here that were tripping me up:

1) As you can see, I am embedding a preg_replace() inside a preg_replace(): the outer regex finds the text between the XMP tags, and the inner preg_replace() removes the <br /> tags. I went through tons of iterations of this, failing every time. One of the reasons was that I was only capturing the text between the tags and not capturing the tags themselves. In the pattern above, each tag has parentheses around it, so it gets captured and passed to the next step instead of being lost like it was before.

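2) The second problem was that I didn't have the proper pattern modifiers on the first expression I wrote. The idea is that I need to grab three strings: the opening tag, the text between the tags, and the closing tag. Then I need to pass them into the second part of the function so that I can run an embedded preg_replace() against the second string, $2. The modifiers on the pattern are 's' (so '.' also matches newlines and a match can span several lines), 'i' (case-insensitive), 'U' (ungreedy, so each pair of tags is matched on its own instead of one match running from the first opening tag to the last closing tag), and 'e'. When I got close before, I kept running into 'PHP Fatal error: preg_replace(): Failed evaluating code:' and would give up and try something else. The 'e' modifier tells preg_replace() to evaluate the replacement string as PHP code after substituting $1, $2 and $3, which is what lets the inner preg_replace() run at all; that error is what PHP reports when the evaluated string isn't valid PHP.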
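Since the 'e' modifier was deprecated in PHP 5.5.0 and removed in PHP 7.0.0, the function above won't run on a current PHP install. Here is a minimal sketch of the same three-capture-group idea using preg_replace_callback(), which hands the captures to a function instead of evaluating a code string. The name html2xmp_callback() is hypothetical, and the sketch assumes the stray tags are written exactly as <br />.

```php
<?php
// Hypothetical modern port of html2xmp(): same three capture groups,
// but a callback replaces the removed /e modifier (PHP 7.0+).
function html2xmp_callback(string $cc_str): string
{
    // s: "." also matches newlines; i: case-insensitive tags;
    // U: ungreedy, so each <xmp>...</xmp> pair is matched on its own.
    $pattern = '|(<xmp[^>]*>)(.*)(</xmp>)|siU';

    $result = preg_replace_callback($pattern, function (array $m): string {
        // $m[1] = opening tag, $m[2] = text between the tags, $m[3] = closing tag.
        // Only the middle group has its <br /> tags stripped.
        return $m[1] . preg_replace('|<br />|', '', $m[2]) . $m[3];
    }, $cc_str);

    // preg_replace_callback() returns null on a PCRE error; fall back to the input.
    return $result ?? $cc_str;
}

// Two xmp blocks plus a <br /> outside them that should be left alone.
$html = "<p>a<br />b</p><xmp>x = 1;<br />\ny = 2;<br /></xmp><xmp>z<br /></xmp>";
echo html2xmp_callback($html), "\n";
```

Because of 'U', each xmp block is cleaned separately and the <br /> in the paragraph survives. A side benefit of the callback is that the captured text is used as-is: with /e, PHP backslash-escaped quotes in the substituted $1, $2 and $3 before evaluating the code, so a double quote inside a code block could come out as \".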

So there it is: the simple function I created, with a little help, to remove <br />s from between two tags. I'm still really confused when it comes to regex, so I might have some of my explanation wrong, but this code works now, so if you need something similar, feel free to use it. I'm always happy to get e-mails or comments about it. I hope this helps, because nowhere could I find anyone else doing this type of replacement.

P.S. I threw that Twitter code in there because I wanted to see if my function would work with two sets of XMP tags. /crossfingers

UPDATE:

So I ended up having to rewrite my function a bit to address some issues, and I believe it's a lot better now. The big change was going from xmp tags (even though I liked that tag) to a pre tag, because the xmp tag is deprecated as far as I can tell. I also added the htmlentities() function so tags inside the code don't get messed up, for example if I include the /pre tag in the code. Here is what it looks like now:

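function html2xmp($cc_str) {
    // Same approach, now matching <pre> blocks. After the <br /> tags are stripped from $2,
    // htmlentities(..., ENT_QUOTES) escapes the contents so markup in the code displays as text.
    $cc_str = preg_replace('|(<pre[^>]*>)(.*)(</pre>)|seiU', '\'$1\' . htmlentities(preg_replace(\'|<br />|\', \'\', \'$2\'), ENT_QUOTES) . \'$3\'', $cc_str);
    return $cc_str;
}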
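The updated function can be ported to preg_replace_callback() the same way for PHP 7 and later. This is a sketch under a hypothetical name, html2pre_callback(): it matches pre blocks (keeping any attributes on the opening tag), strips <br /> tags from the contents, and then runs htmlentities() with ENT_QUOTES so both double and single quotes are encoded along with < and >.

```php
<?php
// Hypothetical port of the updated html2xmp(): match <pre> blocks, strip <br /> tags
// from the contents, then escape them with htmlentities(..., ENT_QUOTES).
// A callback is used because the /e modifier was removed in PHP 7.0.
function html2pre_callback(string $cc_str): string
{
    // Same modifiers as before, minus 'e'.
    $pattern = '|(<pre[^>]*>)(.*)(</pre>)|siU';

    $result = preg_replace_callback($pattern, function (array $m): string {
        $body = preg_replace('|<br />|', '', $m[2]);
        // ENT_QUOTES converts both " and ' so quotes in code survive display.
        return $m[1] . htmlentities($body, ENT_QUOTES) . $m[3];
    }, $cc_str);

    // preg_replace_callback() returns null on a PCRE error; fall back to the input.
    return $result ?? $cc_str;
}

echo html2pre_callback('<pre class="php">if ($a < $b) { echo "hi"; }<br /></pre>'), "\n";
```

Only the text between the tags is encoded; the pre tags themselves are passed through untouched, so the browser still treats them as markup.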


Hope that helps.


Copyright © 2005 - 2024 4d4m.net