OpenOffice.org Forum at OOoForum.orgThe OpenOffice.org Forum
 
 [Home]   [FAQ]   [Search]   [Memberlist]   [Usergroups]   [Register
 [Profile]   [Log in to check your private messages]   [Log in

how to return result of regular expression (regex)

 
Post new topic   Reply to topic    OOoForum.org Forum Index -> OpenOffice.org Calc
View previous topic :: View next topic  
Author Message
JEKelly
Newbie
Newbie


Joined: 10 Nov 2009
Posts: 1

PostPosted: Tue Nov 10, 2009 8:14 am    Post subject: how to return result of regular expression (regex) Reply with quote

hello.

i am trying to extract (by using formula's in calc) a regular expression that basically looks for a letter followed by a variable amount of digits, e.g. R4322, X616, T88993, etc.

I have successfully used SEARCH with a regex to locate the first character of the string. The problem is that i don't know how long that string is, nor can i return the result of the found string (as far as i know).

i have also tried searching for the character following the regular expression by searching from the beginning of the found expression and looking for a space. But, i run into a problem when the regex string appears at the end of the string being searched, (thereby ending up with one character short).

example:
string, B2: "Intel Xeon X5492 @ 3.40GHz"
searching for "X5492"
formula: SEARCH("([:alpha:]?[:digit:]{3,5}[:space:]?)";B2)

this gives me the the position of the string "X5492", but how do i know how long it is so that i can do a MID statement to extract it?

I know that you can refer to the result of the searched string in the FIND/REPLACE dialog box, but how to do this in a cell's formula?

any ideas?

thanks, in advance.

J
Back to top
View user's profile Send private message
Robert Tucker
Moderator
Moderator


Joined: 16 Aug 2004
Posts: 3407
Location: Manchester UK

PostPosted: Tue Nov 10, 2009 10:27 am    Post subject: Reply with quote

Can you not:

=SEARCH(("[:alpha:][0-9]+");B2)

in one cell,

=REPLACE(B2;1;SEARCH(("[:alpha:][0-9]+");B2);"")

in another cell, then search for the first space in the second cell etc etc...
_________________
OpenOffice 4.0.0 and LibreOffice 4.x.x on Fedora 20, Ubuntu 13.10, Windows 8.1 Preview (Triple Boot)
Back to top
View user's profile Send private message
keme
Moderator
Moderator


Joined: 30 Aug 2004
Posts: 2910
Location: Egersund, Norway

PostPosted: Tue Nov 10, 2009 4:16 pm    Post subject: Reply with quote

First, like Robert Tucker indicates without explicitly stating it, forget the question marks. They make the initial letter and the trailing word delimiter optional, but I believe they're required.

Second: I believe you need multiple cells to do this reliably. (You can make it a one-liner, but that would lead to unnecessary repeated calculations and a long and convoluted formula.)

I guess your strings could have spaces or any kind of punctuation character on either side of the code you're after, or nothing (the string starts or ends with the code). Looks like you're limiting to 3, 4 or 5 digits after the letter. So you want a letter starting a "word", immediately followed by 3-5 digits finishing a "word". In OOo, regex for word start/end are \< and \>. So you want to find \<[:alpha:][:digit:]{3,5}\>

In C2, enter =SEARCH("\<[:alpha:][:digit:]{3,5}\>";B2)

Next, you want to find the position of the last character. As the transition from letters to numbers is taken as a word limit, you need to start that search within the row of digits. The digits starts at the next position, C2+1, so we need to find a "word end" after C2+1.

In D2, enter =SEARCH(".\>";B2;C2+1)

Now we know where the code starts and ends, so we can extract it:

In E2, enter =MID(B2;C2;D2-C2+1)

If there's no matching code, C2 will return an error. You may want to catch that in E2, with something like =IF(ISERROR(C2;"** no match **";MID(B2;C2;D2-C2+1)) instead.

If you can have multiple codes inside a string, you need to repeat those 3 cells, starting the search after the point where the previous code ended.
Back to top
View user's profile Send private message
porg
Newbie
Newbie


Joined: 26 Mar 2010
Posts: 1

PostPosted: Fri Mar 26, 2010 10:14 am    Post subject: Also want to know how to retrieve the RESULTS of a RegEx! Reply with quote

My application:
I have a list of domains and want to extract their domain ending with a formula,
so that I can then easily use this retrieved data in the column "TLD" for sorting!

Code:

TLD      DOMAIN                  MORE ATTRIBUTES...
.org     jon.org             
.net     baby.net             
.at      komm.at             
.co.at   denk.co.at           
.de      dietrich.de         
.com     health-insurance.com
.travel  dubai.travel         


My RegEx used in the cells of column "TLD" would be either
Approach 1: A list of all known TLDs
Code:
\.(com|net|org|at|de|co\.uk|etc)$


Approach 2: A blunt generic pattern. May result in a few false positives, but may satisfy my purpose, assumed that I don't have any exotic domain names or endings.
Code:
(\.[a-z]{2,6}){1,2}$


My workaround:

I will simple export as delimiter seperated values, then do the RegEx modification in a text editor, and copy the results back.

But of course this is only works efficient if I finished editing the domain data.
As soon as I add new rows or edit old ones, I would need to run this whole process again.

Therefore, spreadsheets which formulas, which thus automatically update the data live, would be way more comfortable.
Back to top
View user's profile Send private message
mclaudt
General User
General User


Joined: 11 May 2010
Posts: 9

PostPosted: Wed May 12, 2010 8:11 am    Post subject: Reply with quote

Same issue here, but any direct solution yet.

http://user.services.openoffice.org/en/forum/viewtopic.php?t=30502
http://www.oooforum.org/forum/viewtopic.phtml?p=373845
Back to top
View user's profile Send private message
mclaudt
General User
General User


Joined: 11 May 2010
Posts: 9

PostPosted: Wed May 12, 2010 9:37 am    Post subject: Reply with quote

Please vote Issue 106099 for adding regexp functionality to SUBSTITUTE function.

It can solve all discussed problems.
Back to top
View user's profile Send private message
mclaudt
General User
General User


Joined: 11 May 2010
Posts: 9

PostPosted: Wed May 12, 2010 2:33 pm    Post subject: Reply with quote

Solved
Back to top
View user's profile Send private message
Display posts from previous:   
Post new topic   Reply to topic    OOoForum.org Forum Index -> OpenOffice.org Calc All times are GMT - 8 Hours
Page 1 of 1

 
Jump to:  
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum


Powered by phpBB © 2001, 2005 phpBB Group