| Author |
Message |
JohnV Administrator

Joined: 07 Mar 2003 Posts: 8979 Location: Lexinton, Kentucky, USA
|
Posted: Sun Feb 15, 2004 4:06 pm Post subject: Word and character count in selection |
|
|
EDITED 2/24/04. The code below originally contained the line:
sExcludeFromCharacterCount$ = " " where the quotes contained a space and a tab
but when this was copied and pasted the tab was converted to several spaces. I have taken a different approach to avoid this problem. END EDIT
The subject of word and character counting has come up often lately and I have seen many referrals to Andrew Brown's (AB) word count macro. This macro is not accurate as noted by AB in the code comments. This can be seen by inserting OO's dummy text (dt + F3) and running the macro which shows that the OO program count for this is 292 Words and 1540 Characters while the macro returns values of 323 Words and 1542 Characters.
AB's macro is an effort to increase the speed of Daniel Vogelheim's (DV) original word count macro and also have it count words in footnotes. I believe that I first found the original DV macro in one of AB's earlier macro documents but I can no longer find it though I'm sure it exists somewhere.
Last year I played with DV's macro as part of my own learning experience with OO Basic. I have not previously posted my code but I think it is important that a macro is available that provides an accurate word/character count of a text selection. DV's macro appeared accurate and I believe my own version is also. What my macro does and doesn't do are noted in its introductory comments. What code is mine and what is DV's is also noted. (I believe that DV was, and he still may be, a Sun employee. His name appears in the OO credits.)
| Code: | ' Based on the original* "dvwc" macro by Daniel Vogelheim *see end of doc
' Displays a message box with number of words & characters
' in the document and the current selection.
' John Vigor edited this in 2003 to provide both a character count
' and a character count with exclusions. DV's had one of these.
' Does not normally count in frames, headers, footers or footnotes
' although these items may be individually selected. Selected text
' cannot exceed 64K of characters. (About 18 dense single spaced pages
' using Times New Roman size 12 and 1 inch margins all around. This
' size takes about 20 seconds on a 770MHz machine, so go get a cup
' of coffee or just be patient.)
' OO's word count, as of OO1.1 rc4, does not count in fields but does
' count in the other areas mentioned above. OO's character count
' counts a line break (Shift+Enter) as a character (Issue filed #16918).
Sub SelectionCount
'DEFINE CHARACTER COUNT BELOW. The default exclusions from character count are
'spaces & tabs, i.e., one of each of these is contained in the definition of e$.
'You can add characters between the quotes and/or delete the space and/or tab.
e$ = Chr(32) + Chr(9) ' Chr(32) is a space, Chr(9) is a tab. Valid replacements would
'be e$ = Chr(32) or e$ = Chr(9) or e$ = "" with the latter being no exclusions. If you
'did not change e$ and if the line below read:
' sExcludeFromCharCount$ = e$ + "a" then spaces, tabs and the letter "a" would not
'be counted.
sExcludeFromCharCount$ = e$ + ""
'DEFINE WORD SEPARATORS BELOW. The default word separators are spaces and
'hyphens (true hyphenated words like "half-dollar" will be counted as two words
'instead of one). You can add separators between the quotes and/or delete the hyphen.
'Examples: "/" to count "and/or" as two words. "&" to count "Johnson & Johnson" as
'two words instead of three. A period is not normally needed but you can add one
'to count "www.website.com" as three words instead of one.
sWordSeps = " -"
' This section is basically all DV's code with small modifications needed by JV
sWordSeps = sWordSeps + chr(9) + chr(10) + chr(13)'a tab, line break and paragraph break
sNeverCountChars = chr(10) & chr(13)'never include line or paragraph breaks in char count
oDocument = thisComponent
oSelection = oDocument.getCurrentSelection()
nSelCount = oSelection.getCount()
' access the program's document statistics
nAllChars = oDocument.CharacterCount
nAllWords = oDocument.WordCount
' initialize counts
nSelWords = 0 : nSelChars = 0 : nSelCharEx = 0
' iterate over multiple selections
Do
sText = oSelection.getByIndex(nSel).getString()
' count words in sText by scanning the selected text character by character
nCount = Len(sText)
bLastWasSeparator = true
bWord = false
'first letter starts a word
i = 1
Do ' DV used different logic for this section and there was nothing wrong
' with it. A programming exercise for JV and it better fit his needs.
sChr = Mid(sText,i,1)
If instr(sWordSeps, sChr) = 0 then 'if true then it's part of a word
bWord = true
GoSub CountIt 'count this character?
Elseif bWord = True then 'is a separator and at end of word.
nSelWords = nSelWords + 1
bWord = false
GoSub CountIt
Else
GoSub CountIt 'is a separator but not at the end of a word.
EndIf
' End of JV's logic.
i = i + 1
Loop Until i > nCount 'get the next character in the string
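' Count a word still open at the end of this selection. bWord is reset for
' the next selection, so with multiple selections it would otherwise be lost.
if bWord then nSelWords = nSelWords + 1
nSel = nSel + 1
Loop while nSel < nSelCount
' Begin JV stuff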
sExclude$ = ""
if Len(sExcludeFromCharCount$) = 0 then
sExclude$ = "* No exclusions."
else sExclude$ = Build_sExclude(sExcludeFromCharCount$)
endif
' JV altered DV's message box.
sT = chr(9): sP = chr(13) 'a Tab and Paragraph Break
a$ = "Program Document Count" + sP + sT & " All words: " + nAllWords + sP
b$ = sT & " All chars: " + nAllChars + sP + "Macro Selection Count" + sP
sMsg = a$ & b$
If nSelChars > 0 then
a$ = sT & " Words: " + nSelWords + sP + sT & " Chars: " + nSelChars + sP + sT
b$ = " * Chars: " + nSelCharEx + sP & sExclude$
sMsg = sMsg + a$ + b$
Else a$ = "No text was selected or the selection" & sP & sT & "exceeded 64K characters."
sMsg = sMsg + sT & a$
EndIf
msgbox sMsg
Exit Sub
CountIt: 'Going to count this character/excluded char count?
Select Case instr(sNeverCountChars,sChr)
case = 0
If instr(sExcludeFromCharCount$, sChr) = 0 then
nSelCharEx = nSelCharEx + 1 : nSelChars = nSelChars + 1
Else nSelChars = nSelChars + 1
Endif
End Select
Return
End Sub
' This JV function constructs the string that shows
' the excluded characters for the character count.
Function Build_sExclude(sExcludeFromCharCount$)
sExclude$ = "* Excluding "
sOthers = sExcludeFromCharCount$
iPos = instr(sOthers," ")
If iPos > 0 then
Mid(sOthers,iPos,1,"")
select Case len(sOthers)
case 0 : sExclude$ = sExclude$ & "spaces."
case > 0: If instr(sOthers,chr(9)) = 0 then
sExclude$ = sExclude$ & "spaces and "
Else sExclude$ = sExclude$ & "spaces"
EndIf
end select
EndIf
iPos = instr(sOthers,chr(9))
If iPos > 0 then
Mid(sOthers,iPos,1,"")
Select Case len(sOthers)
Case 0 : If len(sExclude$) < 13 then
sExclude$ = sExclude$ & "tabs."
Else sExclude$ = sExclude$ & " and tabs."
EndIf
Case > 0: If len(sExclude$) < 13 then
sExclude$ = sExclude$ & "tabs and "
Else sExclude$ = sExclude$ & ", tabs and "
EndIf
End Select
EndIf
Build_sExclude = sExclude$ & sOthers
End Function
'* I can no longer find DV's original version although a faster
' modified version by Andrew Brown (version 2.0.2, Sept. 3, 2003)
' is currently available in the downloadable macro installer at:
' http://www.darwinwars.com/lunatic/bugs/oo_macros.html
' However, my tests do not indicate the counts are very accurate. |
|
|
|
 |
schelle General User


Joined: 30 Oct 2003 Posts: 45 Location: Australia
|
Posted: Sun Apr 11, 2004 7:13 pm Post subject: |
|
|
Oh Dear... sorry to be a nuisance, but it seems once again the cut and paste process has changed something, and I can't work out what!!!
When I try and run this macro I get the message "Basic runtime error. Argument is not optional."
The text underlined below is highlighted as the problem.
| Quote: | ' This JV function constructs the string that shows
' the excluded characters for the character count.
Function Build_sExclude(sExcludeFromCharCount$)
sExclude$ = "* Excluding "
sOthers = sExcludeFromCharCount$
iPos = instr(sOthers," ")
If iPos > 0 then
Mid(sOthers,iPos,1,"")
select Case len(sOthers) |
What am I missing this time? _________________ "Poetry is the journal of a sea animal living on land, wanting to fly in the air."
(Carl Sandberg) |
|
|
 |
JohnV Administrator

Joined: 07 Mar 2003 Posts: 8979 Location: Lexinton, Kentucky, USA
|
Posted: Sun Apr 11, 2004 7:37 pm Post subject: |
|
|
Not sure what to tell you. I just copied and pasted the macro from here and got no error.
The function in which the error occurred must receive the argument, "sExcludeFromCharCount$" which is passed to it by the line, "else sExclude$ = Build_sExclude(sExcludeFromCharCount$)" so check this code line. It's about 30 lines up from where the error occurred.
If this doesn't fix it try to copy & paste the macro again. Do not copy to an intervening document, instead copy & paste directly to the OO IDE.
Let me know how you fare. |
|
|
 |
schelle General User


Joined: 30 Oct 2003 Posts: 45 Location: Australia
|
Posted: Sun Apr 11, 2004 10:53 pm Post subject: |
|
|
Whoops... another trap for macro newbies... I had the shortcut key assigned to the wrong macro in the module (is this the right way to say that?)
In the new module which I created and named 'wordcount' there appeared the macros 'build_sExclude' and 'selection count'. Since there was no obvious 'Main', I assigned the shortcut to the first item in the list, which was 'build_sExclude'. This didn't work.
Following your suggestion I recopied, but with no success. I tried checking the code but could find nothing different, then pointed the shortcut to 'selection count' and everything works perfectly... obvious really... sorry to waste your time : ) _________________ "Poetry is the journal of a sea animal living on land, wanting to fly in the air."
(Carl Sandberg) |
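For anyone who hits the same error: Build_sExclude is a helper that expects an argument, so running it directly from a shortcut gives "Argument is not optional". The shortcut has to point at SelectionCount. If an obvious entry point is wanted, a small wrapper like the one below could be added to the same module. This is only a sketch; the name Main is an assumption, not part of the original macro.

```basic
' Hypothetical wrapper, assumed to sit in the same module as the macro above.
' It only gives the module an obvious entry point for a shortcut key.
Sub Main
    SelectionCount   ' the Sub from the first post; it takes no arguments
End Sub
```

The shortcut can then be assigned to either Main or SelectionCount, but not to Build_sExclude.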
|
|
 |
accabrown Power User

Joined: 21 Apr 2004 Posts: 75 Location: England
|
Posted: Wed Apr 21, 2004 4:03 am Post subject: the AB word count inaccuracy |
|
|
The reason I don't use the DV (Daniel Vogelheim) code or its derivatives in my macro is that it is excruciatingly slow on large selections. In the end I compromised in a wholly arbitrary fashion and decided that anything less than about 200 characters would be counted char by char, which is slow but pretty accurate, and anything longer would be counted with the quick one.
If I knew how to write embedded components in Python, I could easily slurp the selection into a string and count it very quickly indeed. But I don't.
Andrew Brown |
|
|
 |
JohnV Administrator

Joined: 07 Mar 2003 Posts: 8979 Location: Lexinton, Kentucky, USA
|
Posted: Wed Apr 21, 2004 6:47 pm Post subject: |
|
|
Andrew,
Nice to see you join us.
I fully understand your desire to speed up the word count and that a compromise was necessary to achieve it. On the other hand there are users who are paid by the exact word and/or character count and I think they should have a tool available for that purpose.
I think what someone uses in this case is a matter of what he needs. Fast and close or slow and (hopefully) exact.
Your material resides on my machine and I have certainly benefited from it.
Cheers,
JohnV |
|
|
 |
Guest
|
Posted: Sat Jun 12, 2004 2:57 am Post subject: |
|
|
Add en dash and em dash to the sWordSeps line in the macro for consistency with the program word count?
James Naughton |
|
|
 |
JohnV Administrator

Joined: 07 Mar 2003 Posts: 8979 Location: Lexinton, Kentucky, USA
|
Posted: Sun Jun 13, 2004 10:25 am Post subject: |
|
|
| Sounds like a reasonable request. I'm away from my machine for a while but will try to remember when I return next week. |
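A minimal sketch of the change suggested above, assuming that Chr() in OO Basic accepts 16 bit Unicode values (8211 is the en dash, U+2013, and 8212 is the em dash, U+2014). The Sub name is made up for the demonstration; in the macro itself only the line that extends sWordSeps would be needed, placed after the existing sWordSeps = " -" line.

```basic
' Demonstration only: builds the separator list with both dashes added
' and checks that an em dash is now found in it.
Sub DashSeparatorDemo
    Dim sWordSeps As String
    sWordSeps = " -"
    ' The line to add to SelectionCount (assumes Chr() handles Unicode values):
    sWordSeps = sWordSeps + Chr(8211) + Chr(8212)
    ' InStr returns a non-zero position when the character is a separator
    If InStr(sWordSeps, Chr(8212)) > 0 Then
        MsgBox "Em dash is treated as a word separator."
    Else
        MsgBox "Em dash was not found in the separator list."
    End If
End Sub
```

Whether this matches the program's own word count exactly has not been checked here, so compare the two counts on a sample text before relying on it.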
|
|
 |
thelusiv General User

Joined: 26 Jul 2004 Posts: 5
|
Posted: Mon Jul 26, 2004 6:09 pm Post subject: |
|
|
I want to use this with a spreadsheet. What do I need to change? If I try to run it while selecting a cell, or any amount of text in a cell, it says "BASIC runtime error. Property or method not found." Does this need to be changed for Calc?
| Code: | oSelection = oDocument.getCurrentSelection() |
|
|
|
 |
SergeM Super User

Joined: 09 Sep 2003 Posts: 3211 Location: Troyes France
|
|
|
 |
SergeM Super User

Joined: 09 Sep 2003 Posts: 3211 Location: Troyes France
|
|
|
 |
thelusiv General User

Joined: 26 Jul 2004 Posts: 5
|
Posted: Tue Jul 27, 2004 5:29 am Post subject: |
|
|
| OK, well let me give you a little background. My wife does database entry and she uses OOo to make xls files, which she sends to her manager, who imports them into a Notes database. Something screwy about the Excel -> Notes filter requires one of the fields to be no longer than 32 characters. She would like to be able to quickly count the characters in just one field to see if they are above or below this limit. So she does not really need to be able to select multiple cells, in fact she could highlight the text itself after selecting a single cell. How could this be made to work? I am not unfamiliar with BASIC but have not yet had a chance to look at the OOo API. I will check that out soon. Thanks |
|
|
 |
SergeM Super User

Joined: 09 Sep 2003 Posts: 3211 Location: Troyes France
|
|
|
 |
thelusiv General User

Joined: 26 Jul 2004 Posts: 5
|
Posted: Tue Jul 27, 2004 9:48 am Post subject: |
|
|
| I just want it to report the length. She can adjust the field herself if it is too long, or maybe I could write another macro to do that... |
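For a report-only check in Calc, something along these lines might do. It is a sketch under assumptions, not a tested port of the macro above: the Writer macro relies on text-document members (the selection's getCount() and the document's CharacterCount and WordCount), which is presumably where "Property or method not found" comes from in a spreadsheet. The Sub name CellCharCount is hypothetical and nLimit = 32 is simply the field limit mentioned earlier in the thread. It expects one selected cell, not text highlighted inside the cell.

```basic
' Sketch only: reports the length of the text in the selected Calc cell.
' CellCharCount is a hypothetical name; nLimit comes from the 32 character
' field limit mentioned in the thread.
Sub CellCharCount
    Dim oSel As Object
    Dim sText As String
    Dim nLen As Long
    Dim nLimit As Long
    nLimit = 32
    oSel = ThisComponent.getCurrentSelection()
    ' A single selected cell supports the SheetCell service; a range does not.
    If Not oSel.supportsService("com.sun.star.sheet.SheetCell") Then
        MsgBox "Please select a single cell."
        Exit Sub
    End If
    sText = oSel.getString()   ' the cell content as text
    nLen = Len(sText)
    If nLen > nLimit Then
        MsgBox "Chars: " & nLen & " (over the limit of " & nLimit & ")"
    Else
        MsgBox "Chars: " & nLen & " (within the limit of " & nLimit & ")"
    End If
End Sub
```

Unlike the Writer macro, this counts every character in the cell, including spaces and tabs, since the limit described applies to the whole field.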
|
|
 |
SergeM Super User

Joined: 09 Sep 2003 Posts: 3211 Location: Troyes France
|
|
|
 |
|