OpenOffice.org Forum at OOoForum.orgThe OpenOffice.org Forum
 
 [Home]   [FAQ]   [Search]   [Memberlist]   [Usergroups]   [Register
 [Profile]   [Log in to check your private messages]   [Log in

Extract number / value from alphanumeric string

 
Post new topic   Reply to topic    OOoForum.org Forum Index -> OpenOffice.org Calc
View previous topic :: View next topic  
Author Message
Romisek
Newbie
Newbie


Joined: 23 Oct 2010
Posts: 2

PostPosted: Sat Oct 23, 2010 3:34 am    Post subject: Extract number / value from alphanumeric string Reply with quote

Hi, pls could anyone help me with this task?

I need to extract number / value from a short alphanumeric string as following example (a number always appears at the beginning and can be of different length decimal):

8.5hr
6rm
6.83hr
4rm

To get:
8.5
6
6.83
4


Can I use any Reg Exp function (but without a java script)?

I would also need it to function ideally on both Calc and Excel spreadsheets.


Many thx
Back to top
View user's profile Send private message
keme
Moderator
Moderator


Joined: 30 Aug 2004
Posts: 2910
Location: Egersund, Norway

PostPosted: Sat Oct 23, 2010 4:43 am    Post subject: Reply with quote

A few examples that should be easy to understand (but may fail if the string is very much different from the examples given):

=VALUE(LEFT(<text>;SEARCH("([:alpha:])";<text>)-1)) should work for your examples.

A slightly more elaborate regex will catch a few other "suffixes". Searching for the first transition from number to text:
=VALUE(LEFT(<text>;SEARCH("([:digit:]([:space:]|[:alpha:]))";<text>)))

Note that this search term includes the last digit of your number in the match, so the "-1" adjustment is not needed.

AFAIK those formulas will not work in Excel. (I don't think this is easily done in Excel without installing some regex addon, but I have limited experience with recent Excel versions so I may be wrong.)
Back to top
View user's profile Send private message
ken johnson
Super User
Super User


Joined: 23 Apr 2009
Posts: 2032
Location: Sydney, Australia

PostPosted: Sat Oct 23, 2010 7:26 am    Post subject: Reply with quote

This array formula starts from the left side of such a numeric/alphabetic string and looks at progressively longer substrings, checking for an error returned by VALUE(substring).
If the error is detected the array formula returns a blank, otherwise it returns the length of the substring.
The maximum value in the resulting array is the number of characters making up the number contained in the string.
This maximum value is then used to extract the numerical characters using the LEFT function.
The VALUE function then converts the substring to a number.
With the numeric/alphabetic strings in column A, starting in A1, this formula in B1 worked in Calc...
Code:
VALUE(LEFT(A1;MAX(IF(ISERROR(VALUE(LEFT(A1;ROW(INDIRECT("$B$1:$B$"&LEN(A1))))));"";ROW(INDIRECT("$B$1:$B$"&LEN(A1)))))))

Unfortunately, Excel could not cope. This part of the formula returned the number of characters in the numeric substring...
Code:
MAX(IF(ISERROR(VALUE(LEFT(A1,ROW(INDIRECT("$B$1:$B"&LEN(A1)))))),"",ROW(INDIRECT("$B$1:$B"&LEN(A1)))))
but it refused to then use this value with the LEFT and VALUE functions.
To get Excel working I've had to do away with the ROW(INDIRECT("$B$1:$B$"&LEN(A1))) part and use instead ROW($B$1:$B$30). Here I have assumed that 30 would be a reasonable maximum number of characters in the longest numeric/alphabetic string in column A.
So, if the assumption that "no column A string is longer than 30 characters" is correct, and if the first string is in A1, then in Calc try...
Code:
VALUE(LEFT(A1;MAX(IF(ISERROR(VALUE(LEFT(A1;ROW($B$1:$B$30))));"";ROW($B$1:$B$30)))))
entered into B1 using the Ctrl+Shift+Enter key combination, then hold down the Ctrl key while dragging the fill handle to copy the array formula into the rows below.
In Excel try...
Code:
VALUE(LEFT(A1,MAX(IF(ISERROR(VALUE(LEFT(A1,ROW($C$1:$C$30)))),"",ROW($C$1:$C$30)))))
entered into B1 using the Ctrl+Shift+Enter key combination, then filled down to the rows below.

Ken Johnson
_________________
If your problem has been solved please add "[Solved]" to the beginning of your first post title (edit button).
Back to top
View user's profile Send private message
Jens S
Power User
Power User


Joined: 26 May 2008
Posts: 87
Location: Denmark

PostPosted: Sat Oct 23, 2010 11:34 am    Post subject: Reply with quote

You can use Regex in Search & Replace (ctrl+F) in Calc:

Search for: [^0-9\.]
Replace with: (nothing)
x Selected cells
x Regular expressions

Replace all

VBA can handle Regex in Excel, try this udf. Just type in B1: =Freenum(A1)
Function FreeNum(s As String) As String
Dim re As Object
Set re = CreateObject("vbscript.regexp")
With re
.Pattern = "[^0-9\.]"
.Global = True
.MultiLine = True
End With
FreeNum = re.Replace(s, "")
Set re = Nothing
End Function

Jens
Back to top
View user's profile Send private message
Romisek
Newbie
Newbie


Joined: 23 Oct 2010
Posts: 2

PostPosted: Sat Oct 30, 2010 3:24 am    Post subject: Reply with quote

Thank you all very much! I did not expect the answer so quickly.
Seeing the formulas, wow, such a basic task, retrieve a number from a string, my mobile phone can handle this, why it is difficult? There is no formula function for RE? Why? Because of compatibility with Excel?

Anyway, the VB macro seems to be the easiest way after all. Despite I wanted to stay away from "programming" it seems to be the easiest.
Any drawbacks using VB in Calc?
Is it compatible with Excel? I guess yes.
By using Star Basic I guess I would break it wouldn't I?

P.S.
Where I can put the FreeNum() macro please? What is UDF?

Thank you all again, you all helped a lot.
Back to top
View user's profile Send private message
Villeroy
Super User
Super User


Joined: 04 Oct 2004
Posts: 10106
Location: Germany

PostPosted: Sat Oct 30, 2010 5:22 am    Post subject: Reply with quote

Quote:
my mobile phone can handle this, why it is difficult?

Look at the programming code of your phone software (just in case it is open source).

A spreadsheet can be seen as a simplified, visual programming language with 3 different types of data: Number, text and error. In Excel boolean /true/false) constitutes a 4th type.

You to come to terms with this programming language when you master the following polarities:
Data vs. Formulas (later: array formulas)
Values vs. Formatting
Text vs. Number (including booleans, dates, times)
Absolute vs. Relative referencing

It can not be simplier without losing the versatility of a programming language. The spreadsheet language is not made for string manipulation in the first place. Virtually all scripting languages can extract patterns from strings more easily.
No, VB, VBA, StarBasic are different, much more importantly Excel and Calc are totally different applciations. You may use the same language but you have to tell different things.
Excel does not support regular expressions at all, making any kind of pattern matching much more complicated. VB/VBA has some loadable Regex Module. All this depends on Windows.
OOo comes with a Python runtime with a full set of libraries as a more professional alternative. You can also use JavaScript with Regexes.
_________________
Rest in peace, oooforum.org
Get help on https://forum.openoffice.org
Back to top
View user's profile Send private message
Display posts from previous:   
Post new topic   Reply to topic    OOoForum.org Forum Index -> OpenOffice.org Calc All times are GMT - 8 Hours
Page 1 of 1

 
Jump to:  
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum


Powered by phpBB © 2001, 2005 phpBB Group