charlener0 General User

Joined: 12 Nov 2008 Posts: 5
|
Posted: Wed Nov 12, 2008 9:14 pm Post subject: Macro or built-in function to convert document encoding? |
|
|
Hi,
Is there a way to convert a document's encoding/charset on the fly when loading a file, or after loading? I'm working in a country where many people still don't use Unicode, and the files I get (using Windows-1251) come out as gobbledygook. I use this online conversion utility - http://badaa.mngl.net/convert/con2uni.htm - for shorter items, but with a more complex document that has tables and layouts it tends to lose formatting, and I have to convert it section by section.
Thoughts?
Thanks,
Charlene |
|
B Marcelly Super User

Joined: 12 May 2004 Posts: 1414 Location: France
|
Posted: Wed Nov 12, 2008 11:32 pm Post subject: |
|
|
Hi,
I suppose these files are plain-text ("pure ASCII") documents.
1 - In OpenOffice: menu File > Open
2 - In the dialog window, click the "File type" drop-down and choose Text encoded (*.txt)
3 - If the document has an extension other than .txt, type * as the file name and click Open to list all documents
4 - Choose the document, click Open
5 - The ASCII filter options dialog appears. Choose the correct encoding.
______
Bernard |
|
charlener0 General User

Joined: 12 Nov 2008 Posts: 5
|
Posted: Wed Nov 12, 2008 11:49 pm Post subject: |
|
|
Hi,
I wish it were that easy, but unfortunately not. These are Word documents, Excel spreadsheets, PowerPoint presentations, etc. The ones I'm particularly concerned with at the moment are Word docs, but I come across the others as well.
This is mostly because Mongolian has extra Cyrillic characters that don't work with a typical Russian layout. Someone wrote a utility (MonKey) for typing in 1251 with some special fonts, and even though Unicode is supported on modern systems, there are people who still use the older software... |
|
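To make the mechanism concrete, here is a minimal sketch in OpenOffice Basic of the character mapping involved. It assumes the text arrives as Windows-1251 bytes that are being displayed as Latin-1/Windows-1252, and that the MonKey fonts reuse the four code points shown in the comments for the extra Mongolian letters (this matches the mapping arrays in the macro posted later in this thread, but has not been checked against MonKey itself). FixMon1251 is a hypothetical name, not part of OpenOffice.

```basic
REM Sketch only: FixMon1251 is a hypothetical helper, not an OpenOffice API.
REM It converts one string, character by character, from "CP1251 shown as
REM Latin-1" to Unicode Cyrillic. Characters outside the table pass through.
Function FixMon1251(sIn As String) As String
    Dim sOut As String
    Dim i As Long
    Dim nCode As Long
    sOut = ""
    For i = 1 To Len(sIn)
        nCode = Asc(Mid(sIn, i, 1))
        Select Case nCode
            Case 192 To 255      ' Latin-1 0xC0..0xFF -> U+0410..U+044F (А..я)
                nCode = nCode + 848
            Case 168             ' 0xA8 -> U+0401 (Ё)
                nCode = 1025
            Case 184             ' 0xB8 -> U+0451 (ё)
                nCode = 1105
            Case 170             ' 0xAA -> U+04E8 (Ө), assumed MonKey slot
                nCode = 1256
            Case 186             ' 0xBA -> U+04E9 (ө), assumed MonKey slot
                nCode = 1257
            Case 175             ' 0xAF -> U+04AE (Ү), assumed MonKey slot
                nCode = 1198
            Case 191             ' 0xBF -> U+04AF (ү), assumed MonKey slot
                nCode = 1199
        End Select
        sOut = sOut & Chr(nCode)
    Next i
    FixMon1251 = sOut
End Function
```

Calling MsgBox FixMon1251("Ìîíãîë") from another Sub is a quick way to try it on a garbled word. Because the mapping is a fixed offset plus six special cases, the same logic should port to PHP or any other language with little change.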
charlener0 General User

Joined: 12 Nov 2008 Posts: 5
|
Posted: Thu Nov 27, 2008 1:45 am Post subject: any other ideas? |
|
|
Hey all, I'm still looking. I've been looking at this source code to see how to do it in other languages, but my base language now is mostly PHP and I couldn't get it to behave in there...
http://badaa.mngl.net/convert/convert.js
Thoughts?
Charlene |
|
bataak Newbie

Joined: 04 Feb 2009 Posts: 3
|
|
charlener0 General User

Joined: 12 Nov 2008 Posts: 5
|
Posted: Sun Feb 08, 2009 3:43 am Post subject: |
|
|
Hey, that worked pretty well! I read up, looked at examples, and modified your code to be a little more "one size fits all"; it now checks spreadsheet sheet names too. However, my attempt to get it working for presentations failed: it seems to make Impress hang, and then I have to kill it to get back into OO. Any ideas?
| Code: |
REM ***** BASIC *****
REM This macro file is based on Dmitry G. Mastrukov, A. Novodroskii's Recode from cp1252 to cp1251 for Excel/Word files without language set. GPL license
REM Modified by Dorjgotov Batumongke 2009
REM Modified more by Charlene Barina
REM Returns a short name for the kind of document passed in as oDoc.
Function fnWhichComponent(oDoc) As String
    If HasUnoInterfaces(oDoc, "com.sun.star.lang.XServiceInfo") Then
        REM Test the document that was passed in, not ThisComponent
        If oDoc.supportsService("com.sun.star.text.GenericTextDocument") Then
            fnWhichComponent = "Text"
        ElseIf oDoc.supportsService("com.sun.star.sheet.SpreadsheetDocument") Then
            fnWhichComponent = "Spreadsheet"
        ElseIf oDoc.supportsService("com.sun.star.presentation.PresentationDocument") Then
            fnWhichComponent = "Presentation"
        ElseIf oDoc.supportsService("com.sun.star.drawing.GenericDrawingDocument") Then
            fnWhichComponent = "Drawing"
        Else
            fnWhichComponent = "Oops current document something else"
        End If
    Else
        fnWhichComponent = "Not a document"
    End If
End Function
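Sub Mon1251toUnicode
    Dim I As Long
    Dim oDoc As Object
    Dim oReplace As Object
    REM Despite the names, mUTF8() holds the characters as they appear when the
    REM CP1251 bytes are displayed as Latin-1 (CP1252), and mCP1251() holds the
    REM Unicode Cyrillic characters that should replace them.
    Dim mUTF8(70) As String
    Dim mCP1251(70) As String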
mUTF8() = Array ("à", "á", "â", "ã", "ä", "å", "¸", "æ", "ç", "è", "é", "ê", "ë", _
"ì", "í", "î", "º", "ï", "ð", "ñ", "ò", "ó", "¿", "ô", "õ", "ö", _
"÷", "ø", "ù", "û", "ý", "þ", "ÿ", "À", "Á", "Â", "Ã", "Ä", "Å", _
"¨", "Æ", "Ç", "È", "É", "Ê", "Ë", "Ì", "Í", "Î", "ª", "Ï", "Ð", _
"Ñ", "Ò", "Ó", "¯", "Ô", "Õ", "Ö", "×", "Ø", "Ù", "Û", "Ý", "Þ", _
"ß","ü","ú","Ü","Ú")
mCP1251() = Array("а", "б", "в", "г", "д", "е", "ё", "ж", "з", "и", "й", "к", "л", _
"м", "н", "о", "ө", "п", "р", "с", "т", "у", "ү", "ф", "х", "ц", _
"ч", "ш", "щ", "ы", "э", "ю", "я", "А", "Б", "В", "Г", "Д", "Е", _
"Ё", "Ж", "З", "И", "Й", "К", "Л", "М", "Н", "О", "Ө", "П", "Р", _
"С", "Т", "У", "Ү", "Ф", "Х", "Ц", "Ч", "Ш", "Щ", "Ы", "Э", "Ю", _
"Я","ь","ъ","Ь","Ъ")
    oDoc = ThisComponent
    'Still need to switch fonts from Mon to not-Mon
    If fnWhichComponent(oDoc) = "Text" Then
        oReplace = oDoc.createReplaceDescriptor
        oReplace.SearchAll = True
        oReplace.SearchCaseSensitive = True
        For I = LBound(mUTF8()) To UBound(mUTF8())
            oReplace.SearchString = mUTF8(I)
            oReplace.ReplaceString = mCP1251(I)
            oDoc.replaceAll(oReplace)
        Next I
        'Changes font, one paragraph at a time
        Cursor = oDoc.Text.createTextCursor
        Cursor.gotoStart(False)
        Do
            Cursor.gotoEndOfParagraph(True)
            If Cursor.CharFontName = "Times New Roman Mon" Then
                Cursor.CharFontName = "Times New Roman"
            ElseIf Cursor.CharFontName = "Arial Mon" Then
                Cursor.CharFontName = "Arial"
            End If
            Proceed = Cursor.gotoNextParagraph(False)
        Loop While Proceed
        'Message is Mongolian for "Finished!"
        MsgBox("Дууслаа!")
    ElseIf fnWhichComponent(oDoc) = "Spreadsheet" Then
        For m = 0 To oDoc.Sheets.Count - 1
            Sheet = oDoc.Sheets(m)
            oReplace = Sheet.createReplaceDescriptor
            oReplace.SearchCaseSensitive = True
            sheetName = Sheet.getName
            For I = LBound(mUTF8()) To UBound(mUTF8())
                oReplace.SearchString = mUTF8(I)
                oReplace.ReplaceString = mCP1251(I)
                Sheet.replaceAll(oReplace)
                'Convert the sheet name as well as the cell contents
                sheetName = Replace(sheetName, mUTF8(I), mCP1251(I))
            Next I
            Sheet.setName(sheetName)
            'not as pretty, but font change (applies to the whole sheet)
            Sheet.CharFontName = "Times New Roman"
        Next m
        'Message is Mongolian for "Finished!"
        MsgBox("Дууслаа!")
    'currently broken: this branch makes Impress hang
    ElseIf fnWhichComponent(oDoc) = "Presentation" Then
        Slide = oDoc.DrawPages(0)
        oReplace = Slide.createReplaceDescriptor()
        oReplace.SearchCaseSensitive = True
        For m = 0 To oDoc.DrawPages.Count - 1
            Slide = oDoc.DrawPages(m)
            For I = LBound(mUTF8()) To UBound(mUTF8())
                oReplace.SearchString = mUTF8(I)
                oReplace.ReplaceString = mCP1251(I)
                Slide.replaceAll(oReplace)
            Next I
            'font change to arial
        Next m
        'Message is Mongolian for "Finished!"
        MsgBox("Дууслаа!")
    Else
        'Message is Mongolian, roughly: "Sorry. This cannot be used on this document."
        MsgBox "Уучлаарай. Үүнийг энд баримтын дээр хэрэглэх чадахгүй.", 16, "Error"
        Exit Sub
    End If
End Sub
|
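One possible workaround for the Impress branch, offered as a sketch only and not tested against the hang described in the post: skip Slide.replaceAll entirely and rewrite the text of each shape directly. MapChars and ConvertSlides are hypothetical names introduced here. The sketch assumes top-level text shapes only; grouped shapes, notes pages and master pages are not visited. Setting a shape's whole string may also flatten character formatting that varies inside one shape.

```basic
REM Sketch only. MapChars and ConvertSlides are hypothetical helper names.
REM MapChars returns sIn with every character found in mFrom() swapped for
REM the character at the same index in mTo(). The "=" test is case-sensitive.
Function MapChars(sIn As String, mFrom(), mTo()) As String
    Dim sOut As String
    Dim sCh As String
    Dim i As Long
    Dim k As Long
    sOut = ""
    For i = 1 To Len(sIn)
        sCh = Mid(sIn, i, 1)
        For k = LBound(mFrom()) To UBound(mFrom())
            If sCh = mFrom(k) Then
                sCh = mTo(k)
                Exit For
            End If
        Next k
        sOut = sOut & sCh
    Next i
    MapChars = sOut
End Function

REM Walks every draw page and every top-level shape, converting shape text.
Sub ConvertSlides(oDoc As Object, mFrom(), mTo())
    Dim m As Long
    Dim n As Long
    Dim oPage As Object
    Dim oShape As Object
    Dim sOld As String
    Dim sNew As String
    For m = 0 To oDoc.DrawPages.getCount() - 1
        oPage = oDoc.DrawPages.getByIndex(m)
        For n = 0 To oPage.getCount() - 1
            oShape = oPage.getByIndex(n)
            REM Only shapes that can carry text expose XText
            If HasUnoInterfaces(oShape, "com.sun.star.text.XText") Then
                sOld = oShape.getString()
                sNew = MapChars(sOld, mFrom(), mTo())
                REM Leave untouched shapes alone to limit formatting loss
                If sNew <> sOld Then oShape.setString(sNew)
            End If
        Next n
    Next m
End Sub
```

Inside the Presentation branch of the macro, the call would be ConvertSlides(oDoc, mUTF8(), mCP1251()) in place of the replaceAll loop.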
|
|
bataak Newbie

Joined: 04 Feb 2009 Posts: 3
|
Posted: Wed Feb 11, 2009 5:55 pm Post subject: |
|
|
| Hi Charlene! Nice code. Maybe it needs to call a reload in Impress. |
|
charlener0 General User

Joined: 12 Nov 2008 Posts: 5
|
Posted: Sat Feb 14, 2009 8:12 pm Post subject: |
|
|
Hmm. I tried a reload but it didn't work. When you run the macro, Impress (and OpenOffice in general) just freezes, so you have to end the process.
Any other ideas? |
|
bataak Newbie

Joined: 04 Feb 2009 Posts: 3
|
|
davesiberia Newbie

Joined: 08 Aug 2008 Posts: 3 Location: Krasnoyarsk, Russia
|
Posted: Tue May 12, 2009 9:31 pm Post subject: Other macro |
|
|
I am embarrassed to offer this, but maybe it will help in your situation.
This is what I came up with to address the same problem of transcoding correctly into Cyrillic.
What I would like to add (either to my macro or the one above) is a step at the end of the process that selects the modified text and changes its language to Russian.
I would also like to know how to delete bookmarks from a macro.
Dave.
| Code: |
sub LATtoCYR
rem --------------------------------------------------------------------
rem define variables
dim document as object
dim dispatcher as object
rem ----------------------------------------------------------------------
rem get access to the document
document = ThisComponent.CurrentController.Frame
dispatcher = createUnoService("com.sun.star.frame.DispatchHelper")
rem --- mark the current place to return to after replace -------
dim args1(0) as new com.sun.star.beans.PropertyValue
args1(0).Name = "Bookmark"
args1(0).Value = "kjsdkjbsjbkjbds2342kjb"
dispatcher.executeDispatch(document, ".uno:InsertBookmark", "", 0, args1())
rem ----------------------------------------------------------------------
dispatcher.executeDispatch(document, ".uno:Cut", "", 0, Array())
rem -----------------Set up arguments for search and replace--------------
dim args2(18) as new com.sun.star.beans.PropertyValue
args2(0).Name = "SearchItem.StyleFamily"
args2(0).Value = 2
args2(1).Name = "SearchItem.CellType"
args2(1).Value = 0
args2(2).Name = "SearchItem.RowDirection"
args2(2).Value = true
args2(3).Name = "SearchItem.AllTables"
args2(3).Value = false
args2(4).Name = "SearchItem.Backward"
args2(4).Value = false
args2(5).Name = "SearchItem.Pattern"
args2(5).Value = false
args2(6).Name = "SearchItem.Content"
args2(6).Value = false
args2(7).Name = "SearchItem.AsianOptions"
args2(7).Value = false
args2(8).Name = "SearchItem.AlgorithmType"
args2(8).Value = 0
args2(9).Name = "SearchItem.SearchFlags"
args2(9).Value = 65536
args2(10).Name = "SearchItem.SearchString"
args2(10).Value = "à"
args2(11).Name = "SearchItem.ReplaceString"
args2(11).Value = "а"
args2(12).Name = "SearchItem.Locale"
args2(12).Value = 255
args2(13).Name = "SearchItem.ChangedChars"
args2(13).Value = 2
args2(14).Name = "SearchItem.DeletedChars"
args2(14).Value = 2
args2(15).Name = "SearchItem.InsertedChars"
args2(15).Value = 2
args2(16).Name = "SearchItem.TransliterateFlags"
args2(16).Value = 1024
args2(17).Name = "SearchItem.Command"
args2(17).Value = 3
args2(18).Name = "Quiet"
args2(18).Value = true
rem ----------- for each of the 32 consecutive letters а..я (ё sits outside this range and is not handled) ---
rem ----------- search for args2(10) and replace with the correct encoding args2(11) ---------
rem ----------- first lower case, then upper case
rem ----------- 0 to 31 covers exactly à..ÿ and À..ß; going to 32 would run one character past each range
FOR i = 0 to 31
dispatcher.executeDispatch(document, ".uno:GoToStartOfDoc", "", 0, Array())
args2(10).Value = chr$ ( ASC ("à") + i )
args2(11).Value = chr$ ( ASC ("а") + i )
dispatcher.executeDispatch(document, ".uno:ExecuteSearch", "", 0, args2())
next i
FOR i = 0 to 31
dispatcher.executeDispatch(document, ".uno:GoToStartOfDoc", "", 0, Array())
args2(10).Value = chr$ ( ASC ("À") + i )
args2(11).Value = chr$ ( ASC ("А") + i )
dispatcher.executeDispatch(document, ".uno:ExecuteSearch", "", 0, args2())
next i
rem ---- Return to bookmark -----
dispatcher.executeDispatch(document, ".uno:JumpToNextBookmark", "", 0, Array())
end sub
|
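On the two open questions in this post (setting the language to Russian and deleting a bookmark), a minimal sketch for a Writer document might look like the following. It is an approximation rather than a finished answer: it marks the whole main text flow as Russian instead of only the text that was modified, and text inside tables or frames may not be covered. MarkRussianAndDropBookmark is a hypothetical name; the bookmark name is the one used in the macro above.

```basic
REM Sketch only: MarkRussianAndDropBookmark is a hypothetical name.
Sub MarkRussianAndDropBookmark
    Dim oDoc As Object
    Dim oCursor As Object
    Dim oBookmarks As Object
    Dim oMark As Object
    Dim aLocale As New com.sun.star.lang.Locale
    Const MARK_NAME = "kjsdkjbsjbkjbds2342kjb"

    oDoc = ThisComponent

    REM Set the character language of the main text flow to Russian
    aLocale.Language = "ru"
    aLocale.Country = "RU"
    oCursor = oDoc.Text.createTextCursor()
    oCursor.gotoStart(False)
    oCursor.gotoEnd(True)
    oCursor.CharLocale = aLocale

    REM Remove the temporary bookmark if it exists
    oBookmarks = oDoc.getBookmarks()
    If oBookmarks.hasByName(MARK_NAME) Then
        oMark = oBookmarks.getByName(MARK_NAME)
        oMark.getAnchor().getText().removeTextContent(oMark)
    End If
End Sub
```

Running this after the search-and-replace loops would leave the document tagged as Russian for spell checking and remove the marker bookmark, so repeated runs do not pile up bookmarks with the same base name.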
|
|