| View previous topic :: View next topic |
| Author |
Message |
JEKelly Newbie

Joined: 10 Nov 2009 Posts: 1
|
Posted: Tue Nov 10, 2009 8:14 am Post subject: how to return result of regular expression (regex) |
|
|
hello.
i am trying to extract (by using formula's in calc) a regular expression that basically looks for a letter followed by a variable amount of digits, e.g. R4322, X616, T88993, etc.
I have successfully used SEARCH with a regex to locate the first character of the string. The problem is that i don't know how long that string is, nor can i return the result of the found string (as far as i know).
i have also tried searching for the character following the regular expression by searching from the beginning of the found expression and looking for a space. But, i run into a problem when the regex string appears at the end of the string being searched, (thereby ending up with one character short).
example:
string, B2: "Intel Xeon X5492 @ 3.40GHz"
searching for "X5492"
formula: SEARCH("([:alpha:]?[:digit:]{3,5}[:space:]?)";B2)
this gives me the the position of the string "X5492", but how do i know how long it is so that i can do a MID statement to extract it?
I know that you can refer to the result of the searched string in the FIND/REPLACE dialog box, but how to do this in a cell's formula?
any ideas?
thanks, in advance.
J |
|
| Back to top |
|
 |
Robert Tucker Moderator


Joined: 16 Aug 2004 Posts: 3367 Location: Manchester UK
|
Posted: Tue Nov 10, 2009 10:27 am Post subject: |
|
|
Can you not:
=SEARCH(("[:alpha:][0-9]+");B2)
in one cell,
=REPLACE(B2;1;SEARCH(("[:alpha:][0-9]+");B2);"")
in another cell, then search for the first space in the second cell etc etc... _________________ LibreOffice 3.6.6 on Fedora 18, LibreOffice 4.0.2 on Ubuntu 13.04 (Double Boot) |
|
| Back to top |
|
 |
keme Moderator


Joined: 30 Aug 2004 Posts: 2732 Location: Egersund, Norway
|
Posted: Tue Nov 10, 2009 4:16 pm Post subject: |
|
|
First, like Robert Tucker indicates without explicitly stating it, forget the question marks. They make the initial letter and the trailing word delimiter optional, but I believe they're required.
Second: I believe you need multiple cells to do this reliably. (You can make it a one-liner, but that would lead to unnecessary repeated calculations and a long and convoluted formula.)
I guess your strings could have spaces or any kind of punctuation character on either side of the code you're after, or nothing (the string starts or ends with the code). Looks like you're limiting to 3, 4 or 5 digits after the letter. So you want a letter starting a "word", immediately followed by 3-5 digits finishing a "word". In OOo, regex for word start/end are \< and \>. So you want to find \<[:alpha:][:digit:]{3,5}\>
In C2, enter =SEARCH("\<[:alpha:][:digit:]{3,5}\>";B2)
Next, you want to find the position of the last character. As the transition from letters to numbers is taken as a word limit, you need to start that search within the row of digits. The digits starts at the next position, C2+1, so we need to find a "word end" after C2+1.
In D2, enter =SEARCH(".\>";B2;C2+1)
Now we know where the code starts and ends, so we can extract it:
In E2, enter =MID(B2;C2;D2-C2+1)
If there's no matching code, C2 will return an error. You may want to catch that in E2, with something like =IF(ISERROR(C2;"** no match **";MID(B2;C2;D2-C2+1)) instead.
If you can have multiple codes inside a string, you need to repeat those 3 cells, starting the search after the point where the previous code ended. |
|
| Back to top |
|
 |
porg Newbie

Joined: 26 Mar 2010 Posts: 1
|
Posted: Fri Mar 26, 2010 10:14 am Post subject: Also want to know how to retrieve the RESULTS of a RegEx! |
|
|
My application:
I have a list of domains and want to extract their domain ending with a formula,
so that I can then easily use this retrieved data in the column "TLD" for sorting!
| Code: |
TLD DOMAIN MORE ATTRIBUTES...
.org jon.org
.net baby.net
.at komm.at
.co.at denk.co.at
.de dietrich.de
.com health-insurance.com
.travel dubai.travel
|
My RegEx used in the cells of column "TLD" would be either
Approach 1: A list of all known TLDs
| Code: | | \.(com|net|org|at|de|co\.uk|etc)$ |
Approach 2: A blunt generic pattern. May result in a few false positives, but may satisfy my purpose, assumed that I don't have any exotic domain names or endings.
| Code: | | (\.[a-z]{2,6}){1,2}$ |
My workaround:
I will simple export as delimiter seperated values, then do the RegEx modification in a text editor, and copy the results back.
But of course this is only works efficient if I finished editing the domain data.
As soon as I add new rows or edit old ones, I would need to run this whole process again.
Therefore, spreadsheets which formulas, which thus automatically update the data live, would be way more comfortable. |
|
| Back to top |
|
 |
mclaudt General User

Joined: 11 May 2010 Posts: 9
|
|
| Back to top |
|
 |
mclaudt General User

Joined: 11 May 2010 Posts: 9
|
Posted: Wed May 12, 2010 9:37 am Post subject: |
|
|
Please vote Issue 106099 for adding regexp functionality to SUBSTITUTE function.
It can solve all discussed problems. |
|
| Back to top |
|
 |
mclaudt General User

Joined: 11 May 2010 Posts: 9
|
Posted: Wed May 12, 2010 2:33 pm Post subject: |
|
|
| Solved |
|
| Back to top |
|
 |
|
|
You cannot post new topics in this forum You cannot reply to topics in this forum You cannot edit your posts in this forum You cannot delete your posts in this forum You cannot vote in polls in this forum
|
Powered by phpBB © 2001, 2005 phpBB Group
|