clemens General User
Joined: 23 Oct 2008 Posts: 21
|
Posted: Mon Sep 27, 2010 8:28 am Post subject: How to convert unicode snippets to actual glyphs? |
|
|
Hello,
I have a file in Shift JIS that contains Unicode code snippets (in this format: 〈★&#x9471;〉; the numbers vary) for the many glyphs that Shift JIS does not include (one could ask why they didn't do everything in Unicode from the beginning; I have the same question, but no answer). I converted the file to UTF-8, but of course the code snippets stayed as they were, since they were treated as plain text. Now I have a Unicode text file that contains Unicode code snippets for individual glyphs.
How can I convert the snippets to the glyphs automatically?
How do I make a macro to do this?
Cheers
p.s. I have been using OO for quite a while, but never tried macros.
Last edited by clemens on Tue Sep 28, 2010 2:29 am; edited 1 time in total |
|
Robert Tucker Moderator
Joined: 16 Aug 2004 Posts: 3367 Location: Manchester UK
|
Posted: Mon Sep 27, 2010 10:42 am Post subject: |
|
|
How about simple search and replace? _________________ LibreOffice 3.6.6 on Fedora 18, LibreOffice 4.0.2 on Ubuntu 13.04 (Double Boot) |
|
clemens General User
Joined: 23 Oct 2008 Posts: 21
|
Posted: Tue Sep 28, 2010 2:27 am Post subject: |
|
|
I don't understand how that would be possible: I would have to find each code snippet (each with a slightly different number, hundreds altogether), look up the related glyph online in the Unicode search engine, and then paste it in place of the code.
Perhaps I wasn't clear enough in my initial post. Anyhow I just edited it a bit adding more information.
All the best |
|
Robert Tucker Moderator
Joined: 16 Aug 2004 Posts: 3367 Location: Manchester UK
|
Posted: Tue Sep 28, 2010 3:26 am Post subject: |
|
|
Perhaps you can use regex. Search for something like: &#([a-z0-9]*); and replace with something like \x$1; except that substituting with a Unicode number does not seem to work in OpenOffice Writer, so you may want to look at (command-line) Perl or sed, or indeed a macro.
| Quote: |
\xXXXX
Represents a special character based on its four-digit hexadecimal code (XXXX).
|
OpenOffice Help files |
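The sticking point above is that a plain search-and-replace cannot compute a character from the number it captured. A regex engine that accepts a replacement function can. Here is a minimal Python sketch, assuming the snippets are hexadecimal references of the form &#xNNNN; (the name decode_hex_refs is made up for illustration):

```python
import re

# Matches hexadecimal character references such as &#x9471;
HEX_REF = re.compile(r"&#x([0-9A-Fa-f]+);")

def decode_hex_refs(text: str) -> str:
    """Replace every &#xNNNN; reference with the character it names."""
    # The callback receives each match and turns the captured hex digits
    # into the corresponding character.
    return HEX_REF.sub(lambda m: chr(int(m.group(1), 16)), text)

print(decode_hex_refs("A&#x42;C"))  # prints ABC
```

Anything around the reference (brackets, the star) is left alone by this pattern.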
|
clemens General User
Joined: 23 Oct 2008 Posts: 21
|
Posted: Wed Sep 29, 2010 12:22 am Post subject: |
|
|
Thank you for your reply.
I don't really know how to use regex and macros well, this was the first reason why I posted here, in the hope that someone could guide me through the process of making the actual macro.
As far as programming languages go, I have studied Python a little bit, but I have never actually done anything with it yet, so I might as well say I am a complete beginner.
My idea is to make a macro that recognizes the code snippet (after all, they are all enclosed between <>, with the added ★&#x that always stays the same), gets the number, finds the related glyph, and then replaces the whole thing with the glyph itself.
All the best |
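The steps described in this post (recognize the snippet, get the number, find the glyph, replace the whole thing) can be sketched in Python rather than as a Writer macro. This is only a sketch under assumptions: the snippets look exactly like 〈★&#xNNNN;〉 with full-width angle brackets, the input file is Shift JIS, and the function names are hypothetical.

```python
import re

# Assumed snippet shape: 〈★&#xNNNN;〉 with NNNN a hexadecimal code point.
SNIPPET = re.compile(r"〈★&#x([0-9A-Fa-f]+);〉")

def snippets_to_glyphs(text: str) -> str:
    """Replace each whole snippet, brackets and star included, with its glyph."""
    return SNIPPET.sub(lambda m: chr(int(m.group(1), 16)), text)

def convert_file(src: str, dst: str) -> None:
    """Read a Shift JIS file, expand the snippets, write the result as UTF-8."""
    with open(src, encoding="shift_jis") as f:
        text = f.read()
    with open(dst, "w", encoding="utf-8") as f:
        f.write(snippets_to_glyphs(text))
```

Calling convert_file with the path of the original Shift JIS file and a new output path would do the encoding conversion and the replacement in one pass. If the file uses plain ASCII < and > instead, the pattern would need adjusting.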
|
Robert Tucker Moderator
Joined: 16 Aug 2004 Posts: 3367 Location: Manchester UK
|
Posted: Wed Sep 29, 2010 4:02 am Post subject: |
|
|
Looking at it again, I don't think my suggestion will work: it seems one can't use back-references to input a Unicode hex code. Can you not convert your file into HTML, put:
<html>
<head></head>
<body>
at the top and:
</body>
</html>
at the bottom, search and replace all paragraph breaks with <br><br>, then open the file (having given it an .html extension) in your browser and copy and paste the result out?
It might be useful if you state your operating system.
[Admittedly I'm not a macro writer!] |
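The browser route works because an HTML parser turns character references into characters. Python's standard library offers the same decoding through html.unescape, so the text could probably be decoded without building an HTML page at all. A small sketch (the function name decode_like_browser is an assumption for illustration):

```python
import html

def decode_like_browser(text: str) -> str:
    """Decode HTML character references the way an HTML parser would."""
    # html.unescape handles hexadecimal (&#x9471;), decimal (&#38017;)
    # and named (&amp;) references alike.
    return html.unescape(text)

sample = "〈★&#x9471;〉"  # hypothetical line from the file
print(decode_like_browser(sample) == "〈★\u9471〉")  # prints True
```

Note that the brackets and the star are not part of the reference, so they survive the decoding and would still have to be removed separately. Named references such as &amp; get decoded too, which may or may not be wanted.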
|
clemens General User
Joined: 23 Oct 2008 Posts: 21
|
Posted: Wed Sep 29, 2010 10:24 am Post subject: |
|
|
Actually, the HTML trick worked! Thank you very much. The star glyphs remained, but I just needed to delete them all and copy and paste the result into a new file, and the trick was done.
However, I still have one problem: I wasn't able to replace the paragraph ends with the <br> code.
I tried to search for ^p and replace all with <br><br>, but Writer doesn't find any ^p, which is strange because the original is composed of many lines of text.
What should I do? |
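One way to sidestep the search problem in Writer is to build the HTML page with a short script instead. The following Python sketch follows the skeleton suggested earlier in the thread; the function name to_html is hypothetical, and the text is deliberately not HTML-escaped so that the &#x...; references stay intact for the browser to decode.

```python
def to_html(text: str) -> str:
    """Wrap plain text in a bare HTML page, with <br><br> at each line break."""
    # splitlines() drops the line endings; they are re-inserted as <br><br>.
    body = "<br><br>\n".join(text.splitlines())
    return "<html>\n<head></head>\n<body>\n" + body + "\n</body>\n</html>\n"

print(to_html("first line\nsecond line"))
```

The returned string can be saved with an .html extension and opened in a browser. Because nothing is escaped, any literal < or & in the text would be read as markup.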
|
Robert Tucker Moderator
Joined: 16 Aug 2004 Posts: 3367 Location: Manchester UK
|
Posted: Wed Sep 29, 2010 11:51 am Post subject: |
|
|
If you install the OpenOffice extension AltSearch, you can search for \p and replace with <br><br>.
However, are you sure the glyph your browser is inserting is the correct one? If you have gone back to using the Shift-JIS file (not the Unicode one), possibly it is. Anyway, if you can read Japanese, I guess you will know. |
|
|