Villeroy Super User


Joined: 04 Oct 2004 Posts: 7649 Location: Germany
Posted: Tue Sep 05, 2006 8:31 am
Excel:
Wild card * means "zero or more of any char"
Wild card ? means "exactly one of any char"
~ is Excel's escape char: ~** finds a literal * followed by zero or more of anything.
Regex:
The "any char" in regex is a dot.
* means "zero or more of the preceding item"
? means "zero or one of the preceding item"
So Excel's * equals regex .* and Excel's ? equals a single dot (use .? only if you really mean "zero or one of any char")
The escape char is \
\*.* finds a literal * followed by zero or more of anything.
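To make the mapping concrete, here is a minimal sketch in Python. It uses Python's re module, not the OOo engine, and excel_wildcard_to_regex is a made-up helper name. It assumes Excel's ? stands for exactly one char and that ~ escapes whatever char follows it.

```python
import re

def excel_wildcard_to_regex(pattern: str) -> str:
    """Translate an Excel-style wildcard pattern into a regex (hypothetical helper)."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "~" and i + 1 < len(pattern):
            # ~ escapes the next char, so emit that char as a literal
            out.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if ch == "*":
            out.append(".*")   # zero or more of any char
        elif ch == "?":
            out.append(".")    # exactly one of any char
        else:
            out.append(re.escape(ch))  # everything else is taken literally
        i += 1
    return "".join(out)

print(excel_wildcard_to_regex("~**"))
print(excel_wildcard_to_regex("a?c*"))
```

The first call should give the same \*.* shown above. Other regex flavours may need different escaping for literals, so treat the output as a starting point only.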
You may replace quantifiers * and ? with
+ at least one of the preceding item
{2} exactly 2 of the preceding item
{2,} at least two of the preceding item
{1,5} 1 to 5 of the preceding item
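If you want to see the quantifiers side by side, this small sketch tries each of them against a few strings of the letter a. It runs in Python's re module, which I assume behaves like OOo for these basic quantifiers; the sample strings are arbitrary.

```python
import re

samples = ["", "a", "aa", "aaaaa", "aaaaaa"]
patterns = ["a*", "a?", "a+", "a{2}", "a{2,}", "a{1,5}"]

def full_matches(pattern):
    # keep only the samples that the pattern matches from start to end
    return [s for s in samples if re.fullmatch(pattern, s) is not None]

table = {p: full_matches(p) for p in patterns}
for p, hits in table.items():
    print(f"{p:7} {hits}")
```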
You may replace the dot with
[0-9] or [:digit:] one digit
[a-z] or [:lower:] one lowercase letter, where [:lower:] also includes äöüâ and the like
[:space:] one char of any kind of space
[AaBb] one char of any kind of a or b
[^AaBb] one char of anything but a or b
[^0-9] one char of anything but digit
Unfortunately, named character classes [:name:] are buggy:
[^:digit:]+ matches a digit although it should not.
[:digit:] at the end of a regex does not match a single digit, although it should. As a workaround, give a trailing [:digit:] the quantifier {1}, i.e. [:digit:]{1}
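The explicit bracket classes do not depend on the named ones, so they are a safe fallback. Here is a short sketch with Python's re module. Python does not understand a bare [:digit:] at all, which is one more reason to spell the class out; the sample text is invented.

```python
import re

text = "Order 66 shipped to Bay 7b"

digit_runs = re.findall(r"[0-9]+", text)        # runs of digits
non_digit_runs = re.findall(r"[^0-9]+", text)   # runs of anything but digits
a_or_b = re.findall(r"[AaBb]", text)            # single chars: any kind of a or b

# a single digit at the very end, written without a named class
ends_with_digit = re.search(r"[0-9]$", "room 12") is not None

print(digit_runs, non_digit_runs, a_or_b, ends_with_digit)
```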
^ matches the start of a line (the position before the first char)
$ matches the end of a line (the position after the last char)
In Writer, $ is the position between the last char and the paragraph-break, UNLESS the regular expression consists of $ only. In this case $ matches the paragraph-break itself.
Two consecutive paragraph-breaks form a blank line, which can be found with the regex ^$ (start and end with nothing in between).
\n matches a line break, \t a tab.
AFAIK the only expressions allowed in the replace-box of find/replace are &, \t and \n. In the replace box, \n means paragraph break and & means "all that was matched by the search-regex".
Search: ^[:digit:]+\n
Replace: &\n
replaces a line of digits 12345 followed by a line-break with 12345 and a paragraph-break.
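The anchors and the "whole match" idea can be tried outside OOo too. This is only a rough sketch in Python's re module: \g<0> plays the role of & there, and the MULTILINE flag makes ^ and $ work per line. Python strings have no separate paragraph-break, so Writer's line-break versus paragraph-break distinction is not modelled; the sample text is made up.

```python
import re

text = "12345\nabc\n\n678"

# wrap every line that consists of digits only; \g<0> is the whole match (like &)
wrapped = re.sub(r"^[0-9]+$", r"<\g<0>>", text, flags=re.MULTILINE)

# ^$ finds blank lines: start and end with nothing in between
blank_lines = re.findall(r"^$", text, flags=re.MULTILINE)

print(wrapped)
print(len(blank_lines))
```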
Well, this is almost all I know about regexes in OOo, so I'll stop writing here. It's saddening that OOo regexes are buggy and limited although there are so many (bug-)free and unlimited implementations out there.
I have posted this short version several times:
Small regex-howto for all those readers wanting a simple MS-style pattern-matching with * and ?
Short regex-howto:
- Use .* instead of *
- Use . instead of ?
- ^ means start
- $ means end
- if you search for a literal * escape it with \*
- same with any kind of braces and any of ^$.*?\&|
- when in doubt, escape anything but <, > and alphanumerics, since \<, \>, \n, \t and \C have a special meaning in regexes.
- avoid named character-classes as they are described in the help (e.g. [:digit:])
Support of named classes is buggy and will change in the future.
http://www.openoffice.org/issues/show_bug.cgi?id=64368
_________________
XUbuntu 9.04, OOo 3.1.1(Sun), Sun Java 1.5.0_06