The OpenOffice.org Forum at OOoForum.org

Word and character count in selection
OOoForum.org Forum Index -> OpenOffice.org Code Snippets

JohnV
Administrator

Joined: 07 Mar 2003
Posts: 9183
Location: Lexington, Kentucky, USA

Posted: Sun Feb 15, 2004 4:06 pm    Post subject: Word and character count in selection

EDITED 2/24/04. The code below originally contained the line:
sExcludeFromCharacterCount$ = " " where the quotes contained a space and a tab
but when this was copied and pasted the tab was converted to several spaces. I have taken a different approach to avoid this problem. END EDIT

The subject of word and character counting has come up often lately, and I have seen many referrals to Andrew Brown's (AB) word count macro. This macro is not accurate, as AB notes in the code comments. This can be seen by inserting OO's dummy text (dt + F3) and running the macro: the OO program count for this text is 292 words and 1540 characters, while the macro returns 323 words and 1542 characters.

AB's macro is an effort to increase the speed of Daniel Vogelheim's (DV) original word count macro and also have it count words in footnotes. I believe that I first found the original DV macro in one of AB's earlier macro documents but I can no longer find it though I'm sure it exists somewhere.

Last year I played with DV's macro as part of my own learning experience with OO Basic. I have not previously posted my code, but I think it is important that a macro is available that provides an accurate word/character count of a text selection. DV's macro appeared accurate and I believe my own version is also. What my macro does and doesn't do is noted in its introductory comments. What code is mine and what is DV's is also noted. (I believe that DV was, and he still may be, a Sun employee. His name appears in the OO credits.)
Code:
' Based on the original* "dvwc" macro by Daniel Vogelheim   *see end of doc
' Displays a message box with number of words & characters
' in the document and the current selection.
' John Vigor edited this in 2003 to provide both a character count
' and a character count with exclusions. DV's had one of these.
' Does not normally count in frames, headers, footers or footnotes
' although these items may be individually selected. Selected text
' cannot exceed 64K of characters. (About 18 dense single spaced pages
' using Times New Roman size 12 and 1 inch margins all around. This
' size takes about 20 seconds on a 770MHz machine, so go get a cup
' of coffee or just be patient.)
' OO's word count, as of OO1.1 rc4, does not count in fields but does
' count in the other areas mentioned above. OO's character count
' counts a line break (Shift+Enter) as a character (Issue filed #16918).   

Sub SelectionCount
'DEFINE CHARACTER COUNT BELOW. The default exclusions from character count are
'spaces & tabs, i.e., one of each of these is contained in the definition of e$.
'You can add characters between the quotes and/or delete the space and/or tab. 
e$ = Chr(32) + Chr(9)  ' Chr(32) is a space, Chr(9) is a tab. Valid replacements would
'be e$ = Chr(32) or e$ = Chr(9) or e$ = "" with the latter being no exclusions. If you
'did not change e$ and if the line below read:
' sExcludeFromCharCount$ = e$ + "a" then spaces, tabs and the letter "a" would not
'be counted.   
sExcludeFromCharCount$ = e$ +  ""
'DEFINE WORD SEPARATORS BELOW. The default word separators are spaces and
'hyphens (true hyphenated words like "half-dollar" will be counted as two words
'instead of one). You can add separators between the quotes and/or delete the hyphen.
'Examples: "/" to count "and/or" as two words. "&" to count "Johnson & Johnson" as
'two words instead of three. A period is not normally needed but you can add one
'to count "www.website.com" as three words instead of one.   
sWordSeps = " -"
' This section is basically all DV's code with small modifications needed by JV
sWordSeps = sWordSeps + chr(9) + chr(10) + chr(13)'a tab, line break and paragraph break
sNeverCountChars = chr(10) & chr(13)'never include line or paragraph breaks in char count
oDocument = thisComponent
oSelection = oDocument.getCurrentSelection()
nSelCount = oSelection.getCount()
' access the program's document statistics
nAllChars = oDocument.CharacterCount
nAllWords = oDocument.WordCount
' initialize counts
nSelWords = 0 : nSelChars = 0 : nSelCharEx = 0 : nSel = 0
' iterate over multiple selections
Do
sText = oSelection.getByIndex(nSel).getString()
' count word in sText by scanning the selected text character for character
nCount = Len(sText)
bLastWasSeparator = true
bWord = false
'first letter starts a word
i = 1
Do ' DV used different logic for this section and there was nothing wrong
   ' with it. A programming exercise for JV and it better fit his needs.
sChr = Mid(sText,i,1)
If instr(sWordSeps, sChr) = 0 then 'if true then it's part of a word
  bWord = true
  GoSub CountIt 'count this character?
 Elseif bWord = True then 'is a separator and at end of word.
  nSelWords = nSelWords + 1
  bWord = false
  GoSub CountIt
 Else
  GoSub CountIt 'is a separator but not at the end of a word.
EndIf
' End of JV's logic.
i = i + 1
Loop Until i > nCount 'get the next character in the string
' Count a word that runs to the end of this selection. This is done per
' selection because bWord is reset at the top of the outer loop.
if bWord then nSelWords = nSelWords + 1
nSel = nSel + 1
Loop while nSel < nSelCount
' Begin JV stuff
sExclude$ = ""
if Len(sExcludeFromCharCount$) = 0 then
 sExclude$ = "* No exclusions."
 else sExclude$ = Build_sExclude(sExcludeFromCharCount$)
endif
' JV altered DV's message box.
sT = chr(9): sP = chr(13) 'a Tab and Paragraph Break
a$ = "Program Document Count" + sP + sT & " All words:  " + nAllWords + sP
b$ = sT & " All chars:   " + nAllChars + sP + "Macro Selection Count" + sP
sMsg = a$ & b$
If nSelChars > 0 then
  a$ = sT & " Words:  " + nSelWords + sP + sT & " Chars:   " + nSelChars + sP + sT
  b$ = "   *  Chars:   " + nSelCharEx + sP & sExclude$
  sMsg = sMsg + a$ + b$
 Else a$ = "No text was selected or the selection" & sP & sT & "exceeded 64K characters."
  sMsg = sMsg + sT & a$
EndIf
msgbox sMsg
Exit Sub
CountIt: 'Going to count this character/excluded char count?
Select Case instr(sNeverCountChars,sChr)
 case = 0
         If instr(sExcludeFromCharCount$, sChr) = 0 then
          nSelCharEx = nSelCharEx + 1 : nSelChars = nSelChars + 1
         Else nSelChars = nSelChars + 1
        Endif          
End Select
Return
End Sub

' This JV function constructs the string that shows
' the excluded characters for the character count.
Function Build_sExclude(sExcludeFromCharCount$)
sExclude$ = "* Excluding "
sOthers = sExcludeFromCharCount$
iPos = instr(sOthers," ")
If iPos > 0 then
 Mid(sOthers,iPos,1,"")
 select Case len(sOthers)
  case 0 : sExclude$ = sExclude$ & "spaces."
  case > 0: If instr(sOthers,chr(9)) = 0 then
             sExclude$ = sExclude$ & "spaces and "
            Else sExclude$ = sExclude$ & "spaces"
            EndIf   
 end select
EndIf
iPos = instr(sOthers,chr(9))
If iPos > 0 then
 Mid(sOthers,iPos,1,"")
 Select Case len(sOthers)
  Case 0 : If len(sExclude$) < 13 then
            sExclude$ = sExclude$ & "tabs."
           Else sExclude$ = sExclude$ & " and tabs."
           EndIf   
  Case > 0: If len(sExclude$) < 13 then
             sExclude$ = sExclude$ & "tabs and "
            Else sExclude$ = sExclude$ & ", tabs and "
            EndIf    
 End Select 
EndIf
Build_sExclude = sExclude$ & sOthers
End Function
'* I can no longer find DV's original version although a faster
' modified version by Andrew Brown (version 2.0.2, Sept. 3, 2003)
' is currently available in the downloadable macro installer at:
' http://www.darwinwars.com/lunatic/bugs/oo_macros.html
' However, my tests do not indicate the counts are very accurate.
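
For readers who want to check the counting rules outside the IDE, here is a minimal Python sketch of the same character scan. It is not a port of the macro: the function name `selection_count` and its parameters are made up for illustration, and it works on a plain string rather than on a document selection.

```python
def selection_count(text, word_seps=" -", exclude=" \t"):
    """Return (words, chars, chars_without_excluded) for one string.

    Hypothetical helper that mirrors the macro's rules: tabs, line breaks
    and paragraph breaks always separate words, and the two break
    characters are never counted as characters.
    """
    seps = word_seps + "\t\n\r"
    never_count = "\n\r"
    words = chars = chars_ex = 0
    in_word = False
    for ch in text:
        if ch not in seps:
            in_word = True            # part of a word
        elif in_word:
            words += 1                # separator that ends a word
            in_word = False
        if ch not in never_count:
            chars += 1
            if ch not in exclude:
                chars_ex += 1
    if in_word:                       # text ended in the middle of a word
        words += 1
    return words, chars, chars_ex


print(selection_count("half-dollar and/or"))         # (3, 18, 17)
print(selection_count("half-dollar and/or", " -/"))  # (4, 18, 17)
```

The second call adds "/" to the separators, so "and/or" counts as two words, as described in the macro comments.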

schelle
General User

Joined: 30 Oct 2003
Posts: 45
Location: Australia

Posted: Sun Apr 11, 2004 7:13 pm

Oh Dear... sorry to be a nuisance, but it seems once again the cut and paste process has changed something, and I can't work out what!!!

When I try and run this macro I get the message "Basic runtime error. Argument is not optional."

The text underlined below is highlighted as the problem.

Quote:
' This JV function constructs the string that shows
' the excluded characters for the character count.
Function Build_sExclude(sExcludeFromCharCount$)
sExclude$ = "* Excluding "
sOthers = sExcludeFromCharCount$
iPos = instr(sOthers," ")
If iPos > 0 then
Mid(sOthers,iPos,1,"")
select Case len(sOthers)


What am I missing this time?
_________________
"Poetry is the journal of a sea animal living on land, wanting to fly in the air."
(Carl Sandberg)

JohnV
Administrator

Joined: 07 Mar 2003
Posts: 9183
Location: Lexington, Kentucky, USA

Posted: Sun Apr 11, 2004 7:37 pm

Not sure what to tell you. I just copied and pasted the macro from here and got no error.

The function in which the error occurred must receive the argument, "sExcludeFromCharCount$" which is passed to it by the line, "else sExclude$ = Build_sExclude(sExcludeFromCharCount$)" so check this code line. It's about 30 lines up from where the error occurred.

If this doesn't fix it try to copy & paste the macro again. Do not copy to an intervening document, instead copy & paste directly to the OO IDE.

Let me know how you fare.

schelle
General User

Joined: 30 Oct 2003
Posts: 45
Location: Australia

Posted: Sun Apr 11, 2004 10:53 pm

Whoops... another trap for macro newbies... I had the shortcut key assigned to the wrong macro in the module (is this the right way to say that?)

In the new module which I created and named 'wordcount' there appeared the macros 'Build_sExclude' and 'SelectionCount'. Since there was no obvious 'Main', I assigned the shortcut to the first item in the list, which was 'Build_sExclude'. This didn't work.

Following your suggestion I recopied, but with no success. I tried checking the code but could find nothing different. Then I pointed the shortcut to 'SelectionCount' and everything works perfectly... obvious really... sorry to waste your time : )
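
As an analogy only, the same trap can be reproduced in a few lines of Python. The names `build_exclude` and `selection_count` are hypothetical stand-ins for Build_sExclude and SelectionCount: calling the helper directly, with nothing passed in, fails in the same way the misassigned shortcut did.

```python
def build_exclude(exclude_chars):
    """Hypothetical stand-in for Build_sExclude: needs one argument."""
    return "* Excluding " + repr(exclude_chars)


def selection_count():
    """Hypothetical stand-in for SelectionCount: the real entry point."""
    return build_exclude(" \t")


try:
    build_exclude()          # like running the helper from a shortcut key
except TypeError as err:
    message = str(err)       # Python's version of "Argument is not optional"

print(message)
print(selection_count())     # the entry point supplies the argument itself
```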

accabrown
Power User

Joined: 21 Apr 2004
Posts: 75
Location: England

Posted: Wed Apr 21, 2004 4:03 am    Post subject: the AB word count inaccuracy

The reason I don't use the DV (Daniel Vogelheim) code or its derivatives in my macro is that it is excruciatingly slow on large selections. In the end I compromised in a wholly arbitrary fashion and decided that anything less than about 200 characters would be counted char by char, which is slow but pretty accurate, and anything longer would be counted with the quick one.

If I knew how to write embedded components in Python, I could easily slurp the selection into a string and count it very quickly indeed. But I don't.

Andrew Brown
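
To illustrate the "slurp it into a string" idea, here is a rough Python sketch that counts words in a plain string two ways: a character-by-character scan and a whole-string split. It says nothing about how AB's quick method works, and it leaves out the UNO plumbing needed to fetch a selection. The function names and the separator set are assumptions made for the example.

```python
def count_words_scan(text, seps):
    """Character-by-character scan: the slow but exact approach."""
    words, in_word = 0, False
    for ch in text:
        if ch in seps:
            if in_word:
                words += 1
            in_word = False
        else:
            in_word = True
    return words + (1 if in_word else 0)


def count_words_split(text, seps):
    """Whole-string approach: fold every separator into one, then split."""
    if not seps:
        return 1 if text else 0
    sep = seps[0]
    table = str.maketrans({ch: sep for ch in seps})
    return sum(1 for piece in text.translate(table).split(sep) if piece)


SEPS = " -\t\n\r"   # assumed separators: space, hyphen, tab, line breaks
sample = "The quick brown-fox\tjumps\nover  the lazy dog. " * 1000
print(count_words_scan(sample, SEPS), count_words_split(sample, SEPS))
```

Both functions apply the same separator rule, so they should agree on any input. The split version hands the per-character work to built-in string methods.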

JohnV
Administrator

Joined: 07 Mar 2003
Posts: 9183
Location: Lexington, Kentucky, USA

Posted: Wed Apr 21, 2004 6:47 pm

Andrew,

Nice to see you join us.

I fully understand your desire to speed up the word count and that a compromise was necessary to achieve it. On the other hand, there are users who are paid by the exact word and/or character count, and I think they should have a tool available for that purpose.

I think what someone uses in this case is a matter of what he needs. Fast and close or slow and (hopefully) exact.

Your material resides on my machine and I have certainly benefited from it.

Cheers,
JohnV

Guest

Posted: Sat Jun 12, 2004 2:57 am

Add en dash and em dash to the sWordSeps line in the macro for consistency with the program word count?

Code:
sWordSeps = " -" + Chr(8211) + Chr(8212) ' Chr(8211) is an en dash, Chr(8212) is an em dash


James Naughton
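
A small Python sketch shows the effect of treating the two dashes as word separators. The helper `count_words` and the sample sentence are invented for illustration. Whether the program's own word count splits on these dashes is the suggestion made above, not something this sketch verifies.

```python
from itertools import groupby

EN_DASH = "\u2013"
EM_DASH = "\u2014"


def count_words(text, word_seps):
    """Count runs of non-separator characters (hypothetical helper)."""
    seps = word_seps + "\t\n\r"
    runs = groupby(text, key=lambda ch: ch in seps)
    return sum(1 for is_sep, _ in runs if not is_sep)


sample = "1914" + EN_DASH + "1918 was long" + EM_DASH + "very long"
print(count_words(sample, " -"))                      # 4
print(count_words(sample, " -" + EN_DASH + EM_DASH))  # 6
```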

JohnV
Administrator

Joined: 07 Mar 2003
Posts: 9183
Location: Lexington, Kentucky, USA

Posted: Sun Jun 13, 2004 10:25 am

Sounds like a reasonable request. I'm away from my machine for a while but will try to remember when I return next week.

thelusiv
General User

Joined: 26 Jul 2004
Posts: 5

Posted: Mon Jul 26, 2004 6:09 pm

I want to use this with a spreadsheet. What do I need to change? If I try to run it while selecting a cell, or any amount of text in a cell, it says "BASIC runtime error. Property or method not found." Does this need to be changed for Calc?
Code:
oSelection = oDocument.getCurrentSelection()

SergeM
Super User

Joined: 09 Sep 2003
Posts: 3211
Location: Troyes France

Posted: Mon Jul 26, 2004 10:19 pm

Change this code line to:
Code:

oSelection = oDocument.CurrentSelection

_________________
Linux & Windows OOo3.0
UNO & C++ : WIKI
http://wiki.services.openoffice.org/wiki/Using_Cpp_with_the_OOo_SDK
In French
http://wiki.services.openoffice.org/wiki/Documentation/FR/Cpp_Guide

SergeM
Super User

Joined: 09 Sep 2003
Posts: 3211
Location: Troyes France

Posted: Mon Jul 26, 2004 10:28 pm

Actually, your problem is not there: getCurrentSelection() works fine too.

I think your problem is elsewhere: the current selection of a cell or a group of cells is not a text selection, even if the cells contain only strings...

thelusiv
General User

Joined: 26 Jul 2004
Posts: 5

Posted: Tue Jul 27, 2004 5:29 am

OK, well let me give you a little background. My wife does database entry and she uses OOo to make xls files, which she sends to her manager, who imports them into a Notes database. Something screwy about the Excel -> Notes filter requires one of the fields to be no longer than 32 characters. She would like to be able to quickly count the characters in just one field to see if they are above or below this limit. So she does not really need to be able to select multiple cells, in fact she could highlight the text itself after selecting a single cell. How could this be made to work? I am not unfamiliar with BASIC but have not yet had a chance to look at the OOo API. I will check that out soon. Thanks

SergeM
Super User

Joined: 09 Sep 2003
Posts: 3211
Location: Troyes France

Posted: Tue Jul 27, 2004 8:43 am

And what do you want to do when the string length is longer than 32 characters?

thelusiv
General User

Joined: 26 Jul 2004
Posts: 5

Posted: Tue Jul 27, 2004 9:48 am

I just want it to report the length. She can adjust the field herself if it is too long, or maybe I could write another macro to do that...
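
The reporting step itself is small. As a sketch only, in Python and on a plain string, it might look like the block below. The helper name `report_length` is hypothetical, the 32-character limit is the one mentioned above, and reading the text out of the selected Calc cell is not shown.

```python
MAX_FIELD_LEN = 32   # limit quoted above for the Excel -> Notes import


def report_length(cell_text, limit=MAX_FIELD_LEN):
    """Describe a field's length relative to the limit (hypothetical helper)."""
    n = len(cell_text)
    if n > limit:
        return f"{n} characters ({n - limit} over the {limit}-character limit)"
    return f"{n} characters (within the {limit}-character limit)"


print(report_length("a" * 32))   # 32 characters (within the 32-character limit)
print(report_length("a" * 35))   # 35 characters (3 over the 32-character limit)
```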

SergeM
Super User

Joined: 09 Sep 2003
Posts: 3211
Location: Troyes France

Posted: Tue Jul 27, 2004 9:56 am

I will try, but I am too tired for today (I worked on UNO/C++ all day long).
I am not at home tomorrow.
If you can wait, reply to this thread to mark it as unread, so I will not forget it.

All times are GMT - 8 Hours
Page 1 of 2