OpenOffice.org Forum at OOoForum.org
 

Macro or built-in function to convert document encoding?

 
OOoForum.org Forum Index -> OpenOffice.org Writer
charlener0
General User


Joined: 12 Nov 2008
Posts: 5

Posted: Wed Nov 12, 2008 9:14 pm    Post subject: Macro or built-in function to convert document encoding?

Hi,

Is there a way to convert a document's encoding/charset on the fly when loading a file, or after loading it? I'm working in a country where many people still don't use Unicode, and the files I get (in Windows-1251) are gobbledygook. I use this online conversion utility - http://badaa.mngl.net/convert/con2uni.htm - for some shorter items, but with a more complex document containing tables and layouts it tends to lose the formatting, and I have to convert it section by section.

Thoughts?

Thanks,
Charlene
B Marcelly
Super User


Joined: 12 May 2004
Posts: 1453
Location: France

Posted: Wed Nov 12, 2008 11:32 pm

Hi,
I suppose these files are "pure ASCII" text documents.

1 - In OpenOffice: menu File > Open
2 - In the dialog window, open the "File type" drop-down and choose Text encoded (*.txt)
3 - If the document has an extension other than .txt, type * as the file name and click Open to list all files
4 - Choose the document and click Open
5 - The ASCII filter options dialog appears. Choose the correct encoding.
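For a genuinely plain-text file, the import above amounts to reading the bytes as Windows-1251 and keeping the result as Unicode. A minimal Python sketch of the same conversion outside OpenOffice, assuming the source really is Windows-1251 (Python's codec name `cp1251`); `recode_text_file` is a hypothetical helper name:

```python
from pathlib import Path


def recode_text_file(src: str, dst: str, src_encoding: str = "cp1251") -> str:
    """Hypothetical helper: read src as src_encoding, write dst as UTF-8.

    Returns the decoded text. Bytes are written back unchanged apart from
    the encoding, so line endings are preserved exactly.
    """
    # errors="strict" makes a wrong encoding guess fail loudly instead of
    # silently producing more gobbledygook
    text = Path(src).read_bytes().decode(src_encoding, errors="strict")
    Path(dst).write_bytes(text.encode("utf-8"))
    return text
```

Usage: `recode_text_file("letter.txt", "letter-utf8.txt")`. This only helps with plain text; it does nothing for Word, Excel or PowerPoint files.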

______
Bernard
charlener0
General User


Joined: 12 Nov 2008
Posts: 5

Posted: Wed Nov 12, 2008 11:49 pm

Hi,

I wish it were that easy, but unfortunately not. These are Word documents, Excel spreadsheets, PowerPoint presentations, etc. The ones I'm most concerned with at the moment are Word documents, but I come across the others as well.

Unfortunately, this is mostly because Mongolian has extra Cyrillic letters that don't fit a typical Russian layout. Someone made a utility (MonKey) for typing in Windows-1251 with special fonts, and even though modern systems support Unicode, some people are still using the older software...
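The garbling can be reproduced in a few lines. This assumes the scheme implied by the replacement table in the macro later in this thread: each letter is stored as the Western (Windows-1252) character with the same byte value the letter has in Windows-1251, and the Mongolian-only letters Ө, ө, Ү, ү borrow the slots that Windows-1251 uses for Є, є, Ї, ї. The "Mon" fonts draw Cyrillic glyphs for those Western characters, so the text only looks wrong once the font is missing or replaced. `MON_SLOTS` and `mon_garble` are hypothetical names:

```python
# Hypothetical table: byte slots the Mon fonts appear to use for the
# Mongolian-only letters (assumption taken from the macro's table;
# standard Windows-1251 has Є/є/Ї/ї at these positions)
MON_SLOTS = {"Ө": 0xAA, "ө": 0xBA, "Ү": 0xAF, "ү": 0xBF}


def mon_garble(text: str) -> str:
    """Hypothetical helper: show how Mongolian text typed with a Mon font
    looks once it is displayed in an ordinary Western font."""
    out = bytearray()
    for ch in text:
        if ch in MON_SLOTS:
            out.append(MON_SLOTS[ch])
        else:
            out += ch.encode("cp1251")  # raises for characters outside Windows-1251
    # Reinterpret the same bytes as Windows-1252 (Western)
    return out.decode("cp1252")


print(mon_garble("Монгол"))  # Ìîíãîë
print(mon_garble("Өндөр"))   # ªíäºð
```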
charlener0
General User


Joined: 12 Nov 2008
Posts: 5

Posted: Thu Nov 27, 2008 1:45 am    Post subject: any other ideas?

Hey all, I'm still looking. I've been studying this source code to see how the conversion could be done in other languages, but my main language now is PHP and I couldn't get it to work there...

http://badaa.mngl.net/convert/convert.js

Thoughts?

Charlene
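Whatever language it is ported to, the core of such a converter is a lookup table from the Western look-alike characters back to Cyrillic. Instead of typing the 70 pairs by hand, the table can be derived from the two code pages. A minimal Python sketch, not a translation of convert.js, assuming the same Mon-font scheme as the macro's table further down (letters at their Windows-1251 byte values, Ө/ө/Ү/ү in the Є/є/Ї/ї slots); `build_mon_table` and `mon_to_unicode` are hypothetical names:

```python
def build_mon_table() -> dict:
    """Hypothetical helper: map each Western (Windows-1252) look-alike
    character (as a code point) to the Cyrillic letter it stands for."""
    # Cyrillic letters in Windows-1251: А..я at 0xC0-0xFF, Ё at 0xA8, ё at 0xB8
    slots = list(range(0xC0, 0x100)) + [0xA8, 0xB8]
    table = {
        ord(bytes([b]).decode("cp1252")): bytes([b]).decode("cp1251")
        for b in slots
    }
    # Mongolian letters reuse the Ukrainian slots (assumption from the macro)
    for b, letter in {0xAA: "Ө", 0xBA: "ө", 0xAF: "Ү", 0xBF: "ү"}.items():
        table[ord(bytes([b]).decode("cp1252"))] = letter
    return table


MON_TABLE = build_mon_table()


def mon_to_unicode(text: str) -> str:
    """Hypothetical helper: convert Mon-font text to Unicode Cyrillic.
    Characters outside the table (ASCII, digits, real Cyrillic) pass through."""
    return text.translate(MON_TABLE)
```

Like the macro's replaceAll loop, this also turns any genuine Western accented letters (French é, for example) into Cyrillic, so it should only be run on text known to be in a Mon font.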
bataak
Newbie


Joined: 04 Feb 2009
Posts: 3

Posted: Wed Feb 04, 2009 11:04 am

Hi, try this macro file http://mn-spell.googlecode.com/files/OOoUnicodeConverter-1.0.0.oxt
charlener0
General User


Joined: 12 Nov 2008
Posts: 5

Posted: Sun Feb 08, 2009 3:43 am

Hey, that worked pretty well! I read up, looked at examples, and modified your code to be a little more "one size fits all"; it now also converts spreadsheet sheet names. However, my attempt to make it work for presentations failed: it seems to make Impress hang, and then I have to kill it to use OOo again. Any ideas?

Code:

REM  *****  BASIC  *****
REM This macro file is based on Dmitry G. Mastrukov, A. Novodroskii's Recode from cp1252 to cp1251 for Excel/Word files without language set. GPL license
REM Modified by Dorjgotov Batumongke 2009
REM Modified more by Charlene Barina

function fnWhichComponent(oDoc) as string
' Identify the document type from the services oDoc supports.
' (The original tested thisComponent here instead of the oDoc argument.)
if HasUnoInterfaces(oDoc, "com.sun.star.lang.XServiceInfo") then
   if oDoc.supportsService("com.sun.star.text.GenericTextDocument") then
      fnWhichComponent = "Text"
   elseif oDoc.supportsService("com.sun.star.sheet.SpreadsheetDocument") then
      fnWhichComponent = "Spreadsheet"
   elseif oDoc.supportsService("com.sun.star.presentation.PresentationDocument") then
      fnWhichComponent = "Presentation"
   elseif oDoc.supportsService("com.sun.star.drawing.GenericDrawingDocument") then
      fnWhichComponent = "Drawing"
   else
      fnWhichComponent = "Oops, the current document is something else"
   end if
else
   fnWhichComponent = "Not a document"
end if
End function


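Sub Mon1251toUnicode
  Dim I As Long, m As Long
  Dim oDoc As Object
  Dim oReplace As Object
  Dim Cursor As Object, Sheet As Object, Slide As Object
  Dim Proceed As Boolean
  Dim sheetName As String
  ' mUTF8 holds the Western look-alike characters the Mon-font text is stored as;
  ' mCP1251 holds the Cyrillic letters they stand for (same index = same letter).
  Dim mUTF8(70) As String
  Dim mCP1251(70) As String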

  mUTF8() = Array  ("à", "á", "â", "ã", "ä", "å", "¸", "æ", "ç", "è", "é", "ê", "ë", _
                 "ì", "í", "î", "º", "ï", "ð", "ñ", "ò", "ó", "¿", "ô", "õ", "ö", _
                 "÷", "ø", "ù", "û", "ý", "þ", "ÿ", "À", "Á", "Â", "Ã", "Ä", "Å", _
                 "¨", "Æ", "Ç", "È", "É", "Ê", "Ë", "Ì", "Í", "Î", "ª", "Ï", "Ð", _
                 "Ñ", "Ò", "Ó", "¯", "Ô", "Õ", "Ö", "×", "Ø", "Ù", "Û", "Ý", "Þ", _
                 "ß","ü","ú","Ü","Ú")
  mCP1251() = Array("а", "б", "в", "г", "д", "е", "ё", "ж", "з", "и", "й", "к", "л", _
                 "м", "н", "о", "ө", "п", "р", "с", "т", "у", "ү", "ф", "х", "ц", _
                 "ч", "ш", "щ", "ы", "э", "ю", "я", "А", "Б", "В", "Г", "Д", "Е", _
                 "Ё", "Ж", "З", "И", "Й", "К", "Л", "М", "Н", "О", "Ө", "П", "Р", _
                 "С", "Т", "У", "Ү", "Ф", "Х", "Ц", "Ч", "Ш", "Щ", "Ы", "Э", "Ю", _
                 "Я","ь","ъ","Ь","Ъ")

  oDoc = ThisComponent
  Dim sKind As String
  sKind = fnWhichComponent(oDoc)

  'Still need to switch fonts from Mon to not-Mon
  if sKind = "Text" then
     oReplace = oDoc.createReplaceDescriptor
     oReplace.SearchAll = True
     oReplace.SearchCaseSensitive = True   ' "à" and "À" stand for different letters
     For I = LBound(mUTF8()) To UBound(mUTF8())
        oReplace.SearchString = mUTF8(I)
        oReplace.ReplaceString = mCP1251(I)
        oDoc.replaceAll(oReplace)
     Next I

     ' Change the font of each body-text paragraph from the Mon font to the
     ' regular one. Headers and footers are not reached by this cursor.
     Cursor = oDoc.Text.createTextCursor
     Cursor.gotoStart(False)
     Do
        Cursor.gotoEndOfParagraph(True)
        if Cursor.CharFontName = "Times New Roman Mon" then
           Cursor.CharFontName = "Times New Roman"
        elseif Cursor.CharFontName = "Arial Mon" then
           Cursor.CharFontName = "Arial"
        end if
        Proceed = Cursor.gotoNextParagraph(False)
     Loop While Proceed
     msgbox("Дууслаа!")   ' "Done!"

  elseif sKind = "Spreadsheet" then
     For m = 0 To oDoc.Sheets.Count - 1
        Sheet = oDoc.Sheets(m)
        oReplace = Sheet.createReplaceDescriptor
        oReplace.SearchCaseSensitive = True
        sheetName = Sheet.getName
        For I = LBound(mUTF8()) To UBound(mUTF8())
           oReplace.SearchString = mUTF8(I)
           oReplace.ReplaceString = mCP1251(I)
           Sheet.replaceAll(oReplace)
           ' replaceAll only searches cell contents, so convert the sheet name too.
           ' Basic's Replace() ignores case by default; the last argument (False)
           ' makes it case-sensitive so "À" is not turned into lower-case "а".
           sheetName = Replace(sheetName, mUTF8(I), mCP1251(I), 1, -1, False)
        Next I
        Sheet.setName(sheetName)
        ' Not as pretty, but changes the font of every cell on the sheet
        Sheet.CharFontName = "Times New Roman"
     Next m
     msgbox("Дууслаа!")   ' "Done!"

  ' Currently broken: Impress hangs and has to be killed
  elseif sKind = "Presentation" then
     Slide = oDoc.DrawPages(0)
     oReplace = Slide.createReplaceDescriptor()
     oReplace.SearchCaseSensitive = True
     For m = 0 To oDoc.DrawPages.Count - 1
        Slide = oDoc.DrawPages(m)
        For I = LBound(mUTF8()) To UBound(mUTF8())
           oReplace.SearchString = mUTF8(I)
           oReplace.ReplaceString = mCP1251(I)
           Slide.replaceAll(oReplace)
        Next I
        ' TODO: change the font to Arial
     Next m
     msgbox("Дууслаа!")   ' "Done!"

  else
     ' "Sorry. This cannot be used on this document."
     msgbox "Уучлаарай. Үүнийг энд баримтын дээр хэрэглэх чадахгүй.", 16, "Error"
     exit sub
  end if
End Sub
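The font loop above only knows two fonts. A more general rule, assuming every Mon font is named after its regular counterpart plus a trailing " Mon" (true of the two fonts the macro handles, but only an assumption for any others), is to strip that suffix. Sketched in Python with a hypothetical `unmon_font_name` helper; in Basic the same check can be written with `Right()`, `Left()` and `Len()`:

```python
MON_SUFFIX = " Mon"  # assumed naming convention of the Mon fonts


def unmon_font_name(name: str) -> str:
    """Hypothetical helper: return the regular font name for a "... Mon"
    font, and leave every other font name unchanged."""
    if name.endswith(MON_SUFFIX):
        return name[: -len(MON_SUFFIX)]
    return name
```

Matching only the suffix (with its leading space) keeps names such as "Monotype Corsiva" untouched.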
bataak
Newbie


Joined: 04 Feb 2009
Posts: 3

Posted: Wed Feb 11, 2009 5:55 pm

Hi Charlene! Nice code. Maybe it needs to use the reload function in Impress.
charlener0
General User


Joined: 12 Nov 2008
Posts: 5

Posted: Sat Feb 14, 2009 8:12 pm

Hmm. I tried a reload, but it didn't work. When you run the macro, Impress (and OpenOffice in general) just freezes, so you have to end the process.

Any other ideas?
bataak
Newbie


Joined: 04 Feb 2009
Posts: 3

Posted: Thu Feb 19, 2009 9:24 pm

Now it works: http://mn-spell.googlecode.com/files/OOoUnicodeConverter-1.2.3.oxt

Changing the font name is an important part of the problem, but it doesn't work in headers and footers. How can that be fixed?
davesiberia
Newbie


Joined: 08 Aug 2008
Posts: 3
Location: Krasnoyarsk, Russia

Posted: Tue May 12, 2009 9:31 pm    Post subject: Other macro

I am embarrassed to offer this, but maybe it will help in your situation.

This is what I came up with to address the same problem of transcoding correctly into Cyrillic.

What I would like to add (to either my macro or the one above) is a final step that selects the modified text and sets its language to Russian.

I would also like to know how to delete bookmarks from a macro.

Dave.

Code:

sub LATtoCYR
rem --------------------------------------------------------------------
rem define variables
dim document   as object
dim dispatcher as object
rem ----------------------------------------------------------------------
rem get access to the document
document   = ThisComponent.CurrentController.Frame
dispatcher = createUnoService("com.sun.star.frame.DispatchHelper")

rem --- mark the current place to return to after replace -------
dim args1(0) as new com.sun.star.beans.PropertyValue
args1(0).Name = "Bookmark"
args1(0).Value = "kjsdkjbsjbkjbds2342kjb"

dispatcher.executeDispatch(document, ".uno:InsertBookmark", "", 0, args1())


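rem ----------------------------------------------------------------------
rem The recorded ".uno:Cut" would delete any text that happens to be selected
rem (probably a leftover from recording the macro), so it is disabled:
rem dispatcher.executeDispatch(document, ".uno:Cut", "", 0, Array())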

rem -----------------Set up arguments for search and replace--------------
dim args2(18) as new com.sun.star.beans.PropertyValue
args2(0).Name = "SearchItem.StyleFamily"
args2(0).Value = 2
args2(1).Name = "SearchItem.CellType"
args2(1).Value = 0
args2(2).Name = "SearchItem.RowDirection"
args2(2).Value = true
args2(3).Name = "SearchItem.AllTables"
args2(3).Value = false
args2(4).Name = "SearchItem.Backward"
args2(4).Value = false
args2(5).Name = "SearchItem.Pattern"
args2(5).Value = false
args2(6).Name = "SearchItem.Content"
args2(6).Value = false
args2(7).Name = "SearchItem.AsianOptions"
args2(7).Value = false
args2(8).Name = "SearchItem.AlgorithmType"
args2(8).Value = 0
args2(9).Name = "SearchItem.SearchFlags"
args2(9).Value = 65536
args2(10).Name = "SearchItem.SearchString"
args2(10).Value = "à"
args2(11).Name = "SearchItem.ReplaceString"
args2(11).Value = "а"
args2(12).Name = "SearchItem.Locale"
args2(12).Value = 255
args2(13).Name = "SearchItem.ChangedChars"
args2(13).Value = 2
args2(14).Name = "SearchItem.DeletedChars"
args2(14).Value = 2
args2(15).Name = "SearchItem.InsertedChars"
args2(15).Value = 2
args2(16).Name = "SearchItem.TransliterateFlags"
args2(16).Value = 1024
args2(17).Name = "SearchItem.Command"
args2(17).Value = 3
args2(18).Name = "Quiet"
args2(18).Value = true

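rem ----------- а..я and А..Я sit at consecutive positions, so the garbled
rem ----------- character and the correct letter differ by a fixed offset.
rem ----------- For each of these 32 letters, search for args2(10) and replace
rem ----------- it with the correct character args2(11): first lower case,
rem ----------- then upper case. The loops stop at 31; a 33rd step runs past я.
FOR i = 0 to 31

dispatcher.executeDispatch(document, ".uno:GoToStartOfDoc", "", 0, Array())
args2(10).Value = chr$ ( ASC ("à") + i )
args2(11).Value = chr$ ( ASC ("а") + i )
dispatcher.executeDispatch(document, ".uno:ExecuteSearch", "", 0, args2())

next i

FOR i = 0 to 31

dispatcher.executeDispatch(document, ".uno:GoToStartOfDoc", "", 0, Array())
args2(10).Value = chr$ ( ASC ("À") + i )
args2(11).Value = chr$ ( ASC ("А") + i )
dispatcher.executeDispatch(document, ".uno:ExecuteSearch", "", 0, args2())

next i

rem ----------- ё and Ё (the 33rd letter) lie outside that range: ¸ -> ё, ¨ -> Ё
args2(10).Value = "¸"
args2(11).Value = "ё"
dispatcher.executeDispatch(document, ".uno:ExecuteSearch", "", 0, args2())
args2(10).Value = "¨"
args2(11).Value = "Ё"
dispatcher.executeDispatch(document, ".uno:ExecuteSearch", "", 0, args2())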

rem ---- Return to bookmark -----
dispatcher.executeDispatch(document, ".uno:JumpToNextBookmark", "", 0, Array())

end sub
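Dave's loop relies on а..я (and А..Я) sitting at consecutive positions in both code pages, so the garbled character and the correct letter differ by a fixed code-point offset (U+00E0 à versus U+0430 а, a difference of 0x350). A small Python sketch, with a hypothetical `shift_letters` helper, checks that arithmetic against the real Windows-1251 codec and shows why the loop must stop at 31: the 33rd step leaves the letter range, and ё/Ё (stored as ¸/¨) fall outside it altogether. The offset rule is not Mongolian-aware either; the Ө/Ү slots are also outside it.

```python
OFFSET = ord("а") - ord("à")  # 0x430 - 0xE0 = 0x350; same for upper case


def shift_letters(text: str) -> str:
    """Hypothetical helper: apply the Chr$(Asc(c) + offset) rule to the
    characters À..ÿ (U+00C0..U+00FF) only; everything else is unchanged."""
    out = []
    for ch in text:
        if "À" <= ch <= "ÿ":
            out.append(chr(ord(ch) + OFFSET))
        else:
            out.append(ch)
    return "".join(out)


# The rule agrees with the Windows-1251 codec for all 64 letters
# (Dave's two loops with i = 0 to 31)
for b in range(0xC0, 0x100):
    assert shift_letters(chr(b)) == bytes([b]).decode("cp1251")

print(hex(ord("à") + 32))   # 0x100: i = 32 has already left the letter range
print(shift_letters("¸¨"))  # ¸¨ (ё and Ё are not covered by the offset rule)
```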
All times are GMT - 8 Hours