OpenOffice.org Forum at OOoForum.orgThe OpenOffice.org Forum
 
 [Home]   [FAQ]   [Search]   [Memberlist]   [Usergroups]   [Register
 [Profile]   [Log in to check your private messages]   [Log in

Speed comparison between VB and OOo Basic - Some results

 
Post new topic   Reply to topic    OOoForum.org Forum Index -> OpenOffice.org Macros and API
View previous topic :: View next topic  
Author Message
nom
OOo Enthusiast
OOo Enthusiast


Joined: 17 Dec 2003
Posts: 153
Location: NSW, Australia

PostPosted: Mon Apr 19, 2004 5:29 pm    Post subject: Speed comparison between VB and OOo Basic - Some results Reply with quote

I didn't do this for the sake of comparison, but rather was looking for the fastest way to accomplish a task. Having done this I now know which one is fastest, and to my suprise it is OOo Basic !

I first started off with OOo Basic. The task is to look at the content of a cell in a Calc sheet and find out if it contains characters that have ASCII numbers > 127.

I need to process more than 28,000 cells with each cell containing strings on average 6 characters long.

This takes about 40 seconds in OOBasic.

Because I thought this was too slow I wanted to do the same thing in VB since I assumed it would be faster. To my suprise it took 70 seconds to complete!

****
Update: After I finished writing the subs I found a better way of performing the search (after reading a tutorial by Bernard Marcelly). I will try this when I get the time and post back with the results. The idea being that instead of checking each cell at a time (This is actually where I think VB looses time), load all the data into memory as an array (right at the beggining) and loop through the array. So perhas VB will be faster in this case.... will see.

nom
Back to top
View user's profile Send private message
DannyB
Moderator
Moderator


Joined: 02 Apr 2003
Posts: 3991
Location: Lawrence, Kansas, USA

PostPosted: Tue Apr 20, 2004 9:26 am    Post subject: Reply with quote

I suspect that each UNO operation has a fixed constant overhead. And I think it may be in the low milliseconds. Ouch! Oowee! (I think I read this fact somewhere on the www.openoffice.org website in a uno developer area. Also, naturally a concept like "milliseconds" of overhead is purely dependant on how fast your computer runs! But I'm just repeating what I read.)

What this means is that the performance is going to be directly related to the number of UNO calls you make.

Some uno bridges may have a higher per-function-call overhead than others -- especially if they communicate over TCP/IP.

If you don't understand the API, you may be using "psuedo properties" (for lack of an official term) thinking there is no overhead, because you cannot see the function calls.

Imagine a loop like....
Code:
oDoc = ThisComponent
For i = 1 To 1000
   oSheet = oDoc.Sheets( 1 )
   ..... do something ....
Next


The idea that the document has a "Sheets" property is purely an illusion. The above code is actually doing this....
Code:
oDoc = ThisComponent
For i = 1 To 1000
   oSheet = oDoc.getSheets().getByIndex( 1 )
   ..... do something ....
Next


The first example masks the fact that there are two function calls happening. getSheets() and getByIndex(). Understanding this may help you in writing more efficient code.

A natural optimization that any compiler junkie would think of is called "invariant code motion". Since the statement is NOT affected in any way by the loop, and does not affect the loop in any way, it is therefore "invariant" over the loop. You can move it outside the loop like this....
Code:
oDoc = ThisComponent
oSheet = oDoc.getSheets().getByIndex( 1 )
For i = 1 To 1000
   ..... do something ....
Next


Other optimizations, such as "loop unwinding" may not help in OOo macros, since the vast majority of overhead is in the uno bridge.
_________________
Want to make OOo Drawings like the colored flower design to the left?
Back to top
View user's profile Send private message
dfrench
Moderator
Moderator


Joined: 03 Mar 2003
Posts: 1605
Location: Wellington, New Zealand

PostPosted: Tue Apr 20, 2004 12:12 pm    Post subject: Reply with quote

Quote:
The idea being that instead of checking each cell at a time (This is actually where I think VB looses time), load all the data into memory as an array (right at the beggining) and loop through the array.
Quite right, for the reasons stated by Danny it is always faster to work with a range rather than a cell at a time ... have a look at the posts about getdataarray and setdataarray though, there are traps for the unwary.
Back to top
View user's profile Send private message
nom
OOo Enthusiast
OOo Enthusiast


Joined: 17 Dec 2003
Posts: 153
Location: NSW, Australia

PostPosted: Fri May 14, 2004 6:38 am    Post subject: Reply with quote

I made some time and did a speed comparison between VB and StarBasic (SB), but this time loaded the data into memory.

For some insight this is what I use to do (OK I know it is poor Embarassed ):

Code:

For iCellCol = 0 To iCols - 1         

   ' Always start from the first row
   iCellRow = 0
   
   ' Define current cell
   oCell = oSheet.getCellByPosition(iCellCol, iCellRow)
   
   ' Locate the cell below the current selection
   oNext = oSheet.getCellByPosition(iCellCol,iCellRow + 1 )
   
   For iRow = 0 to 28596-1
   
      ' Get the sting in the cell
      sCell = oCell.getString()
      
      ' Now move the current cell to the one beneath it
      oCell = oNext
      
      iCellRow = iCellrow + 1 ' Update location of the current row
      
      ' Define the oNext cell
      'THIS IS WHERE THE TIME IS WASTED
      oNext = oSheet.getCellByPosition(iCellCol,iCellRow + 1 )
   
   Next
   
   iCellCol = iCellCol + 1
      
Next iCellCol


Now this is what I do (Inspired by Bernard Marcelly) (very clean)):

Code:

oArea = oSheet.getCellRangeByName("A1:A28595")

' THIS HOW THE DATA IS LOADED INTO MEMORY
arrValues = oArea.getDataArray

for i = LBound(arrValues) to UBound(arrValues)
   nCol = arrValues(i) ' Pick the i th column
   for j = LBound(nCol) to UBound(nCol) ' Iterate through the rows
      sCell = nCol(j) ' This will give us the content of the cell
   next
arrValues(i) = nCol
next

'PUT THE VALUES BACK INTO THE CELLS (This is just to show that it can be done)
oArea.setDataArray(arrValues)


Now the results. In SB the old code would take 31 secs. Now the new code takes 19 secs

In VB the old code would take 73 secs and the new code (hang onto your seats) takes 1 sec (yes, ONE second).

However there ias a problem. I couldn't find a way to to get this code to work in VB
Code:

'PUT THE VALUES BACK INTO THE CELLS (This is just to show that it can be done)
oArea.setDataArray(arrValues)


(any takers?)

IMPORTANT CLARIFICATION
The reason why I wanted to use this code was essentially to find characteres whose ASCII numbers where greater than 126 and get rid of them. In this case the best code to use in SB is (given by JohnV):

Code:

Set oFind = oSheet.createReplaceDescriptor

With oFind
 .SearchString = "[^ -~]" ' Find all ascii characters > 126
 .ReplaceString = "*"
 .SearchRegularExpression = True
End With

nFound = oSheet.replaceAll(oFind)



This takes less than ONE second., and searches the WHOLE spreadsheet. (the previous code only searched through one column, whereas another 3 needed to be checked. In VB it takes 3 secs to check all 4 columns)

Notice that for simplicity I removed the search and replace functionality in the first two code snippets, however this last code does the whole lot!!

Nom
Back to top
View user's profile Send private message
nom
OOo Enthusiast
OOo Enthusiast


Joined: 17 Dec 2003
Posts: 153
Location: NSW, Australia

PostPosted: Fri May 14, 2004 7:04 am    Post subject: Reply with quote

While I was studying Bernard's code I thouhgt of another way of doing the search.

As a reminder this is how he does it:
Code:

oArea = oSheet.getCellRangeByName("A1:A28595")

arrValues = oArea.getDataArray

for i = LBound(arrValues) to UBound(arrValues)
   nCol = arrValues(i) ' Pick the i th column
   for j = LBound(nCol) to UBound(nCol) ' Iterate through the rows
      sCell = nCol(j) ' This will give us the content of the cell
   next
arrValues(i) = nCol
next



Rather than extracting one Column at a time and then looping through all its rows to get the relevant data, I wanted to access it directly like this:

Code:

oArea = oSheet.getCellRangeByName("A1:A28595")

nColumns = oArea.columns.count
nRows = oArea.rows.count

arrValues = oArea.getDataArray

for i = 0 to nColumns-1
   for j = 0 to nRows-1
      temp = arrValues(j,i)    ' Looks like this returns a one element array
      sCell = temp(0)            ' workaround to extract the desired data
   Next j
Next i


EDIT: WRONG
This code is a little bit faster than the first example. However the intersting thing to note is, when accesing a 2D array directly, the data must be passed to a temporary (temp) ONE element array first, and then accessed using temp(0). This is actually deceiving. If you have a nx1 vector this will work, but not for a mxn matrix! please see note below. For some reason only the values of the FIRST column (or row I can't remember exactly) are returned even if you do something like arrvalues(2,3).
This is somewhat strange. (could it be a bug?)

The ".getDataArray" does NOT return a "real" array. Please see this post for the correct implementation http://www.oooforum.org/forum/viewtopic.php?t=9117&highlight=
Basically one has to use Bernard's code first to create a new array that can then be accessed by its coordinates.... so there wouldn't be a speed improvement.


Nom


Last edited by nom on Wed Jun 23, 2004 6:07 pm; edited 3 times in total
Back to top
View user's profile Send private message
DannyB
Moderator
Moderator


Joined: 02 Apr 2003
Posts: 3991
Location: Lawrence, Kansas, USA

PostPosted: Fri May 14, 2004 7:28 am    Post subject: Reply with quote

Your new results are initially surprising, but not after you analyze where the time is being spent.

Here is an outline of how I think the time was spent in old vs. new....


Old way (roughly)....
1. For loop
1.1. Get/manipulate cell values (uno calls -- inside loop)

New way (roughly)....
1. Get array of values (uno call -- outside loop)
2. For loop
3. Put array of values (uno call -- outside loop)

Now instead of looking at old vs. new, let's look at OOoBasic vs. VB.

In OOoBasic, you have in the old way a loop with a uno call inside it. In the new way you have moved the uno calls outside the loop. Shaved off a significant fraction of the runtime. (31 sec vs. 19 sec.)

So in VB, you do mostly the same. Move the inside the loop uno calls outside the loop. But you get a speed difference of 73 sec. vs. 1 sec. Why? Because VB compiles to native machine code! OOo Basic must still be interpreted at some level. (This makes sense. I can't imagine the OOo developers having to implement a native compiler for every platform OOo runs on.) In VB, most of that 73 second runtime is spent in UNO calls, plus the overhead of the COM to UNO bridge from Windows Automation over to UNO method calls. By having an uno call before and after the loop, you only have two expensive COM-UNO calls instead of a bunch of them inside the loop. Then the native code is just manipulating local array data. At lightning speed. In fact, it probably takes less than 1 second. And the runtime may actually be dominated by the uno call before the loop!

nom wrote:
(any takers?)

I'm not a VB expert by any means. I only use OOo Basic in order to get results, but I'm looking for a way to abandon OOo Basic ASAP.

That being said, when passing an array from VB, don't you need to include the parenthesis after the array to signal that it is an array?

oArea.setDataArray( arrValues() )
_________________
Want to make OOo Drawings like the colored flower design to the left?
Back to top
View user's profile Send private message
DannyB
Moderator
Moderator


Joined: 02 Apr 2003
Posts: 3991
Location: Lawrence, Kansas, USA

PostPosted: Fri May 14, 2004 7:34 am    Post subject: Reply with quote

nom wrote:
the intersting thing to note is, when accesing a 2D array directly, the data must be passed to a temporary (temp) ONE element array first, and then accessed using temp(0).
This is somewhat strange. (could it be a bug?)


That seems perfectly natural and intuitive. Not a bug.

It looks like getDataArray() returns a two dimensional array of single-element arrays. The two dimensional array is zero based. So are the single-element arrays within the two dimensional array.
_________________
Want to make OOo Drawings like the colored flower design to the left?
Back to top
View user's profile Send private message
nom
OOo Enthusiast
OOo Enthusiast


Joined: 17 Dec 2003
Posts: 153
Location: NSW, Australia

PostPosted: Fri May 14, 2004 8:35 am    Post subject: Reply with quote

Quote:
That being said, when passing an array from VB, don't you need to include the parenthesis after the array to signal that it is an array?

oArea.setDataArray( arrValues() )


Unfortunately it makes not difference.

In the first case I get: com.sun.star.uno.runtime Exception

With the brackets I get : subscript out of range.
Back to top
View user's profile Send private message
DannyB
Moderator
Moderator


Joined: 02 Apr 2003
Posts: 3991
Location: Lawrence, Kansas, USA

PostPosted: Fri May 14, 2004 8:42 am    Post subject: Reply with quote

There have been various VB examples of passing arrays of com.sun.star.beans.PropertyValue to both loadComponentFromURL and storeToURL.

In those cases, an array is passed, with the extra parens. Also of note is that the arrays are always expressly dimensioned.

So what if you tried this....

Dimension a new array, whose size is exactly the size of the dynamic array which was returned from the spreadsheet. In a nested For loop, copy, element by element, the values from the dynamic array to the dimensioned array.

Then pass the dimensioned array name, using parens as the argument.

Any luck with that?
_________________
Want to make OOo Drawings like the colored flower design to the left?
Back to top
View user's profile Send private message
nom
OOo Enthusiast
OOo Enthusiast


Joined: 17 Dec 2003
Posts: 153
Location: NSW, Australia

PostPosted: Fri May 14, 2004 9:55 pm    Post subject: Reply with quote

Got it working! Thanks for the hint Danny.
Actually I don't need a temporary array, I just had to define the dimension of the array to start of with. The trick in VB is to use the ReDim definition and extract the number of rows and column from the selected range. After that everything works nicely!

The working code is here:


Code:

iProb = 0

Set oArea = oSheet.getCellRangeByName("A1:D28595")

nRows = oArea.rows.Count ' Get number of rows
nCols = oArea.Columns.Count ' Get number of columns

' Define the array we will use to store the data. THIS IS THE IMPORTANT BIT
ReDim arrValues(0 To nRows, o To nCols) As Variant

'Get data
arrValues = oArea.getDataArray

' Put data back into cells
oArea.setDataArray (arrValues())  ' This now works!


Interestingly the last line of code does not need the brackets at the end. Both arrValues (without the brackets) and arrValues() work.

Nom
Back to top
View user's profile Send private message
Display posts from previous:   
Post new topic   Reply to topic    OOoForum.org Forum Index -> OpenOffice.org Macros and API All times are GMT - 8 Hours
Page 1 of 1

 
Jump to:  
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum


Powered by phpBB © 2001, 2005 phpBB Group