Web Hosting Talk

regular expression help


larwilliams
05-31-2009, 02:14 PM
Hi,

I am making a small currency conversion system for a friend and need some basic regular expression help. I need to parse the currency names and rates out of a string that will always follow the same format.

The format is as follows:

1 CAD = USD (0.912324)

I need to extract four pieces of information:

1) amount in base currency (in this case, 1)
2) base currency name (in this case, CAD)
3) converted currency name (in this case, USD)
4) conversion rate (in this case, 0.912324)

My regexp skills are a bit rusty, so any assistance would be appreciated.

Steve_Arm
05-31-2009, 02:35 PM
Of course,

$part = '1 CAD = USD (0.912324)';
preg_match('/^([0-9]+)\s([A-Z]{3})\s?=\s?([A-Z]{3})\s(\(?[0-9\.]+\)?)$/', $part, $matches);
print_r($matches);
Or, to capture the rate without the parentheses: '/^([0-9]+)\s([A-Z]{3})\s?=\s?([A-Z]{3})\s\(?([0-9\.]+)\)?$/'
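For anyone following along in Python rather than PHP, here is a rough equivalent of the second pattern using Python's re module. PHP's $matches[1] to $matches[4] correspond to match.group(1) to match.group(4) here. This is a sketch; PATTERN is just an illustrative name:

```python
import re

# Same pattern as the second PHP version: the rate group sits inside the
# optional literal \( \) so the captured rate excludes the parentheses.
PATTERN = re.compile(r'^([0-9]+)\s([A-Z]{3})\s?=\s?([A-Z]{3})\s\(?([0-9.]+)\)?$')

line = '1 CAD = USD (0.912324)'
match = PATTERN.match(line)
if match:
    amount, base, target, rate = match.groups()
    print(amount, base, target, rate)  # 1 CAD USD 0.912324
```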

larwilliams
05-31-2009, 02:49 PM
Thanks. I cooked up my own solution in the meantime and came up with something a bit more generic than yours :D


preg_match("/(.*?) (.*?) \= (.*?) \((.*?)\)/i", $currencyxml, $matches);
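A rough Python equivalent of this lazier pattern, as a sketch (LAZY is an illustrative name). Since the pattern contains no letters, the /i flag (re.IGNORECASE here) has no effect, and "\=" is just a literal "=". The trade-off of being generic is that it also accepts lines whose fields are not currency codes or numbers, as the second, made-up input shows:

```python
import re

# Lazy "anything" groups separated by the literal spaces, "=" and parentheses.
LAZY = re.compile(r'(.*?) (.*?) = (.*?) \((.*?)\)', re.IGNORECASE)

good = LAZY.search('1 CAD = USD (0.912324)')
print(good.groups())   # ('1', 'CAD', 'USD', '0.912324')

# Hypothetical malformed input: it still "matches", so validation is on you.
loose = LAZY.search('one canadian = USD (n/a)')
print(loose.groups())  # ('one', 'canadian', 'USD', 'n/a')
```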

mwatkins
05-31-2009, 04:54 PM
On a sunny Vancouver afternoon I'm a bit too lazy to dig into the PHP docs to build an equivalent of the following no-regex solution, so I'll just throw out this Python example in case it's worth translating to PHP. I like regexes more than most, but sometimes avoiding them is a nice thing. Assume you are populating "line" in a loop over the input data. Splitting the line on whitespace gives you:

>>> line.split()
['1', 'CAD', '=', 'USD', '(0.912324)']
# in Python we can do multiple assignments as so
>>> base_amount, base_currency, op, conv_currency, rate = line.split()

And here's a full solution for decoding one line. It's a little longer because I am casting the values (which a regex still returns to you as a string) to usable values in the right type (for base_amount and rate):

$ python
>>> import decimal
>>> line = '1 CAD = USD (0.912324)'
>>> base_amount, base_currency, op, conv_currency, rate = line.split()
>>> rate = rate.replace('(', '').replace(')','')
>>> base_amount, rate = int(base_amount), decimal.Decimal(rate)

What are those variables containing now?

>>> base_amount, base_currency, conv_currency, rate
(1, 'CAD', 'USD', Decimal('0.912324'))
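If this runs over many input lines, a malformed line would either fail with a generic unpacking error or slip through if it happens to have five fields. Here is a sketch of the same split approach wrapped in a function with some basic checks; parse_rate_line is a hypothetical name, and the checks assume every line has exactly the five whitespace-separated fields shown above:

```python
from decimal import Decimal

def parse_rate_line(line):
    """Split a line like '1 CAD = USD (0.912324)' on whitespace.

    Raises ValueError if the line does not have five fields, an '='
    separator, and a parenthesised rate.
    """
    fields = line.split()
    if len(fields) != 5:
        raise ValueError(f'expected 5 fields, got {len(fields)}: {line!r}')
    base_amount, base_currency, op, conv_currency, rate = fields
    if op != '=' or not (rate.startswith('(') and rate.endswith(')')):
        raise ValueError(f'unexpected format: {line!r}')
    # Strip the parentheses; Decimal avoids binary floating-point rounding.
    return int(base_amount), base_currency, conv_currency, Decimal(rate[1:-1])

print(parse_rate_line('1 CAD = USD (0.912324)'))
# (1, 'CAD', 'USD', Decimal('0.912324'))
```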

mwatkins
05-31-2009, 05:04 PM
The other, probably more likely, approach I would take is to use our trusty regex tool to do the splitting:

This gets us almost all the way there:

$ python
>>> import re
>>> line = '1 CAD = USD (0.912324)'
>>> re.split(r'\W', line)
['1', 'CAD', '', '', 'USD', '', '0', '912324', '']

This gets us closer yet:

>>> [v for v in re.split(r'\W', line) if v]
['1', 'CAD', 'USD', '0', '912324']

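Putting it all together. Note that splitting on \W also breaks the rate at its decimal point, and the '=' disappears along with the empty strings, so split on runs of anything that is not a word character or '.' instead:

$ python
>>> import decimal
>>> import re
>>> line = '1 CAD = USD (0.912324)'
>>> base_amount, base_currency, conv_currency, rate = [
...     v for v in re.split(r'[^\w.]+', line) if v]
>>> base_amount, rate = int(base_amount), decimal.Decimal(rate)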
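Since the point of the thread is converting amounts, here is a sketch of feeding each input line through a regex split and then doing the conversion with Decimal. Assumptions: the input is a list of lines in the format above (only the sample line from the thread is used), rates and convert are hypothetical names, and results are rounded half-up to two decimal places, which may not match the rounding rules a real system needs. The pattern r'[^\w.]+' splits on runs of characters that are neither word characters nor '.', so the rate stays in one piece:

```python
import re
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical input; only the sample line from the thread.
lines = ['1 CAD = USD (0.912324)']

rates = {}
for line in lines:
    # '(0.912324)' comes through as the single field '0.912324'.
    base_amount, base_currency, conv_currency, rate = [
        v for v in re.split(r'[^\w.]+', line) if v]
    # Store the rate for one unit of the base currency.
    rates[(base_currency, conv_currency)] = Decimal(rate) / int(base_amount)

def convert(amount, base, target):
    """Convert a Decimal amount and round it to two decimal places."""
    converted = amount * rates[(base, target)]
    return converted.quantize(Decimal('0.01'), rounding=ROUND_HALF_UP)

print(convert(Decimal('250'), 'CAD', 'USD'))  # 228.08
```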

PHP has preg_split(), which may be worth a look:

http://ca3.php.net/manual/en/function.preg-split.php

Cmafai
06-01-2009, 07:13 PM
If you want to be lazy, and the regex is bugging you for some reason, an alternative would be to simply use a function like substr() to get each part.


$input = '1 CAD = USD (0.912324)';
$part[] = substr($input, 0, 1); // amount: '1'
$part[] = substr($input, 2, 3); // base currency: 'CAD'
// etc...

Of course, that is completely pointless and stupid if the format is going to change at all: the offsets are hard-coded, so even a two-digit amount shifts every field. I really don't recommend this approach unless something weird is preventing you from using one of the other solutions. Just thought I'd throw in an alternative :)
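To make the fragility concrete, here is a Python sketch of the same fixed-offset idea: PHP's substr($s, start, length) corresponds to the slice s[start:start + length]. slice_fields is a hypothetical name, and the second input line is hypothetical, with a two-digit amount:

```python
def slice_fields(line):
    # Offsets chosen so the sample line yields the amount and base currency.
    return line[0:0 + 1], line[2:2 + 3]

print(slice_fields('1 CAD = USD (0.912324)'))  # ('1', 'CAD')
print(slice_fields('10 CAD = USD (9.12324)'))  # ('1', ' CA'), silently wrong
```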