teste General User

Joined: 23 May 2010 Posts: 7
Posted: Wed May 26, 2010 2:53 pm Post subject: How to change the structure of this specific text.
Hi people, I have a large block of text copied from a PDF file, with this structure:
103119556, ABELAR VIEIRA ROSA NETO, 67.00, 1319; 103150727, ABELARDO MELO GOMES,
61.00, 2962; 103110766, ABIGAIL DA SILVA JOSE, 66.00, 1461; 103108055, ABNER FERREIRA
SANTOS DE SOUZA, 60.00, 3532; 103126985, ADAIL SOARES SIQUEIRA JUNIOR, 64.00, 2045;
103126151, ADALA MICHELINE GALVAO RUELA FELICIANO, 70.00, 845;
I would like to know how to rearrange it so that each record is on its own line, like this:
103119556, ABELAR VIEIRA ROSA NETO, 67.00, 1319;
103150727, ABELARDO MELO GOMES, 61.00, 2962;
103110766, ABIGAIL DA SILVA JOSE, 66.00, 1461;
....
What are the steps to do this?
Could someone help me?
I look forward to an answer. Thanks.
Taolin Newbie

Joined: 26 May 2010 Posts: 3 Location: Texas
Posted: Wed May 26, 2010 6:50 pm
I followed your discussion of this in the OpenOffice.org IRC channel and now, reading the above, I see why none of the advice you were given could work.
All of the suggestions were aimed at turning each ";" into a CR. The problem is that you *already* have a bunch of CRs in your data, so the lines cannot come out with the same number of fields in each.
The "\n" or "\r" business, whether it is done with SED or in OO itself, will correctly replace every ";" with a CR, but it won't do what you are asking, which is to leave the ";" at the end of each line and start each new line with the next character after a ";". You first have to get rid of the CRs that were added by the paste/save (whatever) from the PDF.
103119556, ABELAR VIEIRA ROSA NETO, 67.00, 1319;
will break correctly (although it will lose the ";") but then you also have an *existing* break after
MELO GOMES,
so that winds up on a line by itself, then the next line in the input file breaks into two lines at
66.00, 1461; 103108055, ABNER
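To make the failure concrete, here is a small Python sketch (not something from the IRC discussion; the sample is a shortened copy of the data above and the variable names are made up for the illustration). It replaces every ";" with a line break while leaving the existing breaks in place, then counts the comma-separated fields on each resulting line.

```python
# A hard-wrapped sample in the same shape as the pasted PDF text.
wrapped = (
    "103119556, ABELAR VIEIRA ROSA NETO, 67.00, 1319; 103150727, ABELARDO MELO GOMES,\n"
    "61.00, 2962; 103110766, ABIGAIL DA SILVA JOSE, 66.00, 1461; 103108055, ABNER FERREIRA\n"
    "SANTOS DE SOUZA, 60.00, 3532;\n"
)

# Naive approach: turn every ';' into a line break and leave the old breaks alone.
naive_lines = [ln.strip() for ln in wrapped.replace(";", "\n").split("\n") if ln.strip()]

# Count the non-empty comma-separated fields on each resulting line.
field_counts = [len([f for f in ln.split(",") if f.strip()]) for ln in naive_lines]

for ln, n in zip(naive_lines, field_counts):
    print(n, "|", ln)
```

Only the lines that happened not to be wrapped come out with all four fields; the rest are fragments of records, which is the behaviour described above.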
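You will have to remove all of the existing CRs (CR/LF, probably) before the OO Find/Replace or SED methods will work. SED on its own reads the file one line at a time and never sees the line break itself, so a plain s/\r// will not join the lines; tr should be able to do it with something like
tr -d '\r' < originalfile.txt | tr '\n' ' ' > joined.txt
then run the SED command that you were given in the IRC channel against joined.txt
(sed -e 's/;/\r/' originalfile.txt > result.txt [if I recall correctly]). Even then, I would not kill off the ";", and I would add the g flag so that every ";" is handled rather than only the first one on the line: sed -e 's/;/;\n/g' joined.txt > result.txt (the \n in the replacement assumes GNU sed).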
That may leave you a few blanks to clean up, but that should be worst-case.
I am no SED expert, but I *am* sure of the cause of the failure of the advice you received on IRC.
Please feel welcome to contact me directly on IRC and I can help more.
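For anyone who would rather script it, the same two-stage idea (undo the existing breaks first, then break after each ";") can be sketched in Python. This is only an illustration under my own assumptions: the function name is hypothetical, the old line breaks are replaced by a single space so that wrapped names stay separated, and every record is assumed to end in ";".

```python
import re


def one_record_per_line(text: str) -> str:
    """Rejoin hard-wrapped text, then start a new line after every ';'."""
    # Stage 1: collapse every run of whitespace (CR, LF, CR/LF, spaces) into
    # one space, which undoes the breaks introduced by the PDF copy/paste.
    flat = re.sub(r"\s+", " ", text).strip()
    # Stage 2: split on ';' and put the ';' back at the end of each record.
    # Assumption: every record ends in ';', so any trailing text without one
    # would also get a ';' appended here.
    records = [part.strip() + ";" for part in flat.split(";") if part.strip()]
    return "\n".join(records)


sample = (
    "103119556, ABELAR VIEIRA ROSA NETO, 67.00, 1319; 103150727, ABELARDO MELO GOMES,\r\n"
    "61.00, 2962; 103110766, ABIGAIL DA SILVA JOSE, 66.00, 1461; 103108055, ABNER FERREIRA\r\n"
    "SANTOS DE SOUZA, 60.00, 3532;\r\n"
)

print(one_record_per_line(sample))
```

Because the whitespace is normalised before splitting, there are no stray blanks left to clean up afterwards.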
JohnV Administrator

Joined: 07 Mar 2003 Posts: 8982 Location: Lexington, Kentucky, USA
Posted: Wed May 26, 2010 8:20 pm
Use Find & Replace with Regular Expressions checked. The 1st step may not be needed if you have no line breaks.
1. Search = \n (find line breaks)
   Replace = \n (replace with paragraph breaks; looks strange but it is correct)
2. Search = <spacebar>$ (find a space followed by a paragraph break)
   Replace = nothing (the space is removed but the paragraph breaks will remain)
3. Search = $ (find paragraph breaks)
   Replace = nothing (the paragraph breaks will be deleted)
4. Search = ;<spacebar> (find ; plus a space)
   Replace = \n (replace with paragraph breaks)
You will have to fix the last line yourself because there is no space after the ;.
teste General User

Joined: 23 May 2010 Posts: 7
Posted: Fri May 28, 2010 4:45 am
Hi JohnV, your suggestion works.
Thanks.