OpenOffice.org Forum at OOoForum.org

How to convert unicode snippets to actual glyphs?

 
clemens
General User


Joined: 23 Oct 2008
Posts: 21

Posted: Mon Sep 27, 2010 8:28 am    Post subject: How to convert unicode snippets to actual glyphs?

Hello,

I have a file in Shift JIS that contains Unicode code snippets (in this format: 〈★&#x9471;〉, where the number varies) for the many glyphs not included in Shift JIS (one could ask why they didn't do everything in Unicode from the beginning; I have the same question, but no answer). I converted the file to UTF-8, but of course the code snippets remained as they were, since they were treated as plain text. Now I have a Unicode text file that contains Unicode code snippets for individual glyphs.
How can I convert the snippets automatically to the glyphs?
How do I make a macro to do this?
Cheers

p.s. I have been using OO for quite a while, but never tried macros.


Last edited by clemens on Tue Sep 28, 2010 2:29 am; edited 1 time in total
Robert Tucker
Moderator


Joined: 16 Aug 2004
Posts: 3407
Location: Manchester UK

Posted: Mon Sep 27, 2010 10:42 am

How about simple search and replace?
_________________
OpenOffice 4.0.0 and LibreOffice 4.x.x on Fedora 20, Ubuntu 13.10, Windows 8.1 Preview (Triple Boot)
clemens
General User


Joined: 23 Oct 2008
Posts: 21

Posted: Tue Sep 28, 2010 2:27 am

I don't understand how that would be possible: I would have to find the code snippets (each with a slightly different number, hundreds altogether), look up the related glyph online in the Unicode search engine, and then paste it in place of the code.
Perhaps I wasn't clear enough in my initial post. Anyhow, I just edited it a bit, adding more information.
All the best
Robert Tucker
Moderator


Joined: 16 Aug 2004
Posts: 3407
Location: Manchester UK

Posted: Tue Sep 28, 2010 3:26 am

Perhaps you can use regex. Search for something like: &#([a-z0-9]*); and replace with something like \x$1; however, substituting with a Unicode number this way does not seem to work in OpenOffice Writer, so you may want to look at (command line) Perl or sed, or indeed a macro.

Quote:
\xXXXX
Represents a special character based on its four-digit hexadecimal code (XXXX).

OpenOffice Help files
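
For anyone taking the scripting route instead, here is a minimal Python sketch (not from this thread) of the same search-and-replace idea. A fixed replacement string can only re-insert the captured digits as text, whereas a replacement function can turn them into the actual character. The names REFERENCE, decode_hex_references and literal_replace are made up for illustration, and the pattern assumes the snippets are hexadecimal references of the form &#xHHHH;. Note that the pattern quoted above would also capture the leading "x", so the sketch matches it separately.

```python
import re

# Assumption: references look like &#x9471; (hexadecimal, with a trailing semicolon).
REFERENCE = re.compile(r"&#x([0-9A-Fa-f]+);")


def decode_hex_references(text):
    """Replace every &#xHHHH; with the character whose code point is HHHH."""
    # The callback receives each match and returns the replacement text.
    return REFERENCE.sub(lambda m: chr(int(m.group(1), 16)), text)


def literal_replace(text):
    """For contrast: a fixed replacement string only yields the text \\xHHHH."""
    return REFERENCE.sub(r"\\x\1", text)


if __name__ == "__main__":
    print(decode_hex_references("glyph: &#x9471;"))  # the character U+9471
    print(literal_replace("glyph: &#x9471;"))        # the six characters \x9471
```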
clemens
General User


Joined: 23 Oct 2008
Posts: 21

Posted: Wed Sep 29, 2010 12:22 am

Thank you for your reply.
I don't really know how to use regex and macros well; this was the first reason why I posted here, in the hope that someone could guide me through the process of making the actual macro.
As far as programming languages go, I have studied Python a little bit, but I have never actually done anything with it yet, so I might as well say I am a complete beginner.
My idea is to make a macro that recognizes the code snippet (after all, they are all enclosed between <>, with the added ★&#x that always stays the same), gets the number, finds the related glyph, and then replaces the whole thing with the glyph itself.
All the best
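
Since Python came up, the steps described above (recognize the snippet, get the number, find the glyph, replace the whole thing) could be sketched outside OpenOffice roughly as follows. This is only an illustration under assumptions: every snippet is taken to look exactly like 〈★&#x9471;〉 (angle brackets and star as in the first post, then &#x, hex digits and a semicolon), the brackets and star are dropped along with the code, and the names SNIPPET, replace_snippets and convert_file, as well as any file names passed in, are hypothetical.

```python
import re

# Assumed snippet shape: 〈★&#xHHHH;〉 with a varying hexadecimal number.
SNIPPET = re.compile(r"〈★&#x([0-9A-Fa-f]+);〉")


def replace_snippets(text):
    """Replace each whole snippet, brackets and star included, with U+HHHH."""
    return SNIPPET.sub(lambda m: chr(int(m.group(1), 16)), text)


def convert_file(src, dst):
    """Read a Shift JIS file, expand the snippets, and write the result as UTF-8."""
    with open(src, encoding="shift_jis") as f:
        text = f.read()
    with open(dst, "w", encoding="utf-8") as f:
        f.write(replace_snippets(text))
```

Working from the original Shift JIS file avoids a separate conversion step, because the snippets themselves are plain text that survives decoding unchanged. If the brackets should be kept, they can be moved outside the pattern's match by returning "〈" + glyph + "〉" from the callback instead.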
Robert Tucker
Moderator


Joined: 16 Aug 2004
Posts: 3407
Location: Manchester UK

Posted: Wed Sep 29, 2010 4:02 am

Looking at it again I don't think my suggestion will work – one can't use back-references to input Unicode hex code it seems. Can you not convert your file into html, put:

<html>
<head></head>
<body>

at the top and:

</body>
</html>

at the bottom and search and replace all paragraph breaks with <br><br>, then open the file (having given it an .html extension) in your browser and copy paste the result out?

It might be useful if you state your operating system.

[Admittedly I'm not a macro writer!]
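
The manual steps in this post (add the wrapper lines, turn every line break into <br><br>) could also be scripted. The following Python sketch is one possible way, not something taken from the thread; wrap_as_html is a made-up name, and it assumes each line of the text file is one paragraph. The text is deliberately not HTML-escaped, because the browser has to see the &#x...; references as they are in order to render them as glyphs.

```python
def wrap_as_html(text):
    """Wrap plain text in the minimal HTML page described above.

    Each line break in the input becomes <br><br>, so the paragraphs
    stay apart when the page is opened in a browser.
    """
    body = "<br><br>\n".join(text.splitlines())
    return "<html>\n<head></head>\n<body>\n" + body + "\n</body>\n</html>\n"


# Usage sketch (file names are placeholders):
#   with open("input.txt", encoding="utf-8") as f:
#       page = wrap_as_html(f.read())
#   with open("output.html", "w", encoding="utf-8") as f:
#       f.write(page)
```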
Robert Tucker
Moderator


Joined: 16 Aug 2004
Posts: 3407
Location: Manchester UK

Posted: Wed Sep 29, 2010 6:02 am

In fact looking more closely it seems the conversion may be quite difficult. The 9471 in your example may not be a Unicode hex number but still a Shift-JIS hex number, see:

http://software.hixie.ch/utilities/unix/encoding-tools/MAPPINGS/SHIFTJIS.TXT
http://www.isthisthingon.org/unicode/index.phtml

You may need quite a special program (or macro) to do the conversion.
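
One way to check which reading is right is to decode the same hex number both ways and compare the results against a passage whose correct character is known. The Python sketch below is only an illustration of that check; interpretations is a made-up name, and it relies on Python's built-in shift_jis codec, whose mapping may differ in detail from the table linked above.

```python
def interpretations(hex_digits):
    """Return one hex number read as a Unicode code point and as Shift JIS bytes."""
    as_unicode = chr(int(hex_digits, 16))
    try:
        # Treat the digits as raw bytes, e.g. "9471" -> 0x94 0x71.
        as_shift_jis = bytes.fromhex(hex_digits).decode("shift_jis")
    except (UnicodeDecodeError, ValueError):
        as_shift_jis = None  # not a valid Shift JIS byte sequence
    return {"unicode": as_unicode, "shift_jis": as_shift_jis}


if __name__ == "__main__":
    for name, value in interpretations("9471").items():
        print(name, value)
```

Whichever of the two characters fits the surrounding Japanese text tells you how the numbers in the file are meant to be read.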
clemens
General User


Joined: 23 Oct 2008
Posts: 21

Posted: Wed Sep 29, 2010 10:24 am

Actually the html trick worked! Thank you very much. The star glyphs remained, but I just needed to delete them all, then copy and paste into a new file, and the trick was done.
Anyhow, I still have one problem: I wasn't able to replace the paragraph ends with the br code.
I tried a search and replace of all ^p with <br><br>, but Writer doesn't find any ^p, which is strange because the original is actually composed of many lines of text.
What should I do?
Robert Tucker
Moderator


Joined: 16 Aug 2004
Posts: 3407
Location: Manchester UK

Posted: Wed Sep 29, 2010 11:51 am

If you install the OpenOffice extension AltSearch you can search for \p and replace with <br><br>.

However are you sure the glyph your browser is inserting is the correct one? If you have gone back to using the Shift-JIS file (not the Unicode one) possibly it is – anyway if you can read Japanese I guess you will know.
All times are GMT - 8 Hours