OpenOffice.org Forum at OOoForum.orgThe OpenOffice.org Forum
 
 [Home]   [FAQ]   [Search]   [Memberlist]   [Usergroups]   [Register
 [Profile]   [Log in to check your private messages]   [Log in

RFC on VBA vs OOo Basic Performance

 
Post new topic   Reply to topic    OOoForum.org Forum Index -> OpenOffice.org Macros and API
View previous topic :: View next topic  
Author Message
TerryE
Super User
Super User


Joined: 16 Jul 2006
Posts: 550
Location: UK

PostPosted: Sat Jul 29, 2006 6:27 pm    Post subject: RFC on VBA vs OOo Basic Performance Reply with quote

As some you may know, I an currently working on a HOWTO on VBA to OOB migration. As part of this I have been doing a benchmark comparison on their relative performance. The preliminary results are:
Code:
Run Time in Secs On 1.2Ghz P3M Laptop     VBA    OOB   Ratio
ShellSort(Intensive Basic)               0.29  19.24   1:66
Test Harness (Lots of UNO / COM Calls)   1.11  19.66   1:18

The sample one-sigma was typically 3% or so over a batch of runs giving a broad comparison 1 Sec VBA = 1 Min OOB for pure code. On my Laptop OOB runs at about 50 KIPS which is fine for non-looping algo's, but it could take a VBA migrator by nasty surprise when migrating a heavy one..

This was run under XP/SP2 agains MSOffice 2003. Also note that from PERFMON I could see that the Calc and Basic RTS are running one-task in SOFFICE.BIN. This was all user compute. It just looks like the design/performance of the Basic interpreter/RTS means that it runs like a total dog. FYI VBscript runs at about a 1:2 ratio -- that is 30 times faster then OOB on the sort routine at least. I haven't done the timing on using VBscript over the Automation Bridge yet.

Any comments from the forum members?

BTW With the exception of toggling one True/False this runs unchanged on both VBA and OOB. //Terry
Code:
   Option Explicit
' Module ShellSort
'
Public Function ShellSort(ByRef A() As Variant) As String
'
' Complements to Donald Knuth, "Art of Computer Programming
'                               Vol 3 - Sorting and Searching"
'
' There are various ShellSort variants.  All use a process of sorting
' a set of collapsing subsequences.  If you think of a bubble sort, the worse
' runtime is O(N*Nb) where Nb is the largest distanc of an entry in an
' unsorted position from its sorted position.  The trick that the Shell
' algortihm exploits is to do the initial passes to do large swap bounds,
' decreasing these as the sort progresses, so the journey of an individual
' element from its start position look quite like a binary chop.
'
' The exchange sort variant below runs in about half the time of a bubble sort
' variant.  You can also play bounds with the nesting strategy. This
' implementation uses a Sedgewick Sequence. RTFM for more info.
'
' The average sort time is O(N^(7/6)) and worst case O(N^(4/3)). For comparison
' The average sort time is O(N*Log N) and worst case O(N^2) for Quicksort.

   Dim Alb As Variant, Aub As Variant, Aspan As Variant
   Dim nSedgewick As Variant, nShell As Long, inc As Long
   Dim i As Long, j As Long, tmp As Variant
   Dim nMoves As Long, sDebug As String
   sDebug = ""

   Alb = LBound(A)
   Aub = UBound(A)
   Aspan = Aub - Alb + 1
   
   nSedgewick = Array(1, 5, 19, 41, 109, 209, 505, 929, 2161, 3905, 8929, 16001, _
                       36289, 64769, 146305, 260609, 587521, 1045505, 2354689, _
                       4188161, 9427969, 16764929, 37730305, 67084289, _
                       150958081, 268386305, 999999999999#)
   nShell = 0
   Do While nSedgewick(nShell + 1) < Aspan: nShell = nShell + 1: Loop
   
   Do While nShell >= 0:
      ' sort by insertion in increments of h
      inc = nSedgewick(nShell)
'DBG
      nMoves = 0
      For i = Alb + inc To Aub
         tmp = A(i)
         For j = i - inc To Alb Step -inc
'DBG
            nMoves = nMoves + 1
            If A(j) <= tmp Then Exit For
            A(j + inc) = A(j)
         Next j
         A(j + inc) = tmp
      Next i
'DBG
      sDebug = sDebug & Chr(10) & inc & ", " & nMoves & ", " & (Aub - Alb - inc)
      nShell = nShell - 1
   Loop

 ' The real reason that this is a function is to avoid the variant syntax
 ' of sub calls in VBA vs. OOB
 
   ShellSort = sDebug
End Function

Code:
Option Explicit
' Module TestHarness
'
#Const OOB = True     ' Set this True or False depending on whether
                       ' your code is in Calc or Excel
Const nArray = 20000

Sub test()
Dim x(), timer1, timer2, sDebug As String
ReDim x(nArray - 1)

' Disable screen updating and AutoCalculate

#If OOB Then
Msgbox "def OOB"
   ThisComponent.addActionLock
   ThisComponent.lockControllers
   ThisComponent.enableAutomaticCalculation (False)
#Else
   Application.ScreenUpdating = False
   Application.Calculation = xlCalculationManual
#End If

' timer1 times the load and fetch overhad (UNO/COM intensive)
timer1 = getTimeMS()
LoadArray x, 1

' timer1 times the sort (pure Basic runtime no UNO/COM)
timer2 = getTimeMS()
sDebug = ShellSort(x())

timer2 = getTimeMS() - timer2

StoreArray x, 2
timer1 = getTimeMS() - timer1 - timer2

' Now post the debug back to the sheet

StoreArray Array(sDebug, _
                 "Sort Time = " & (timer2 / 1000!) & " sec" & Chr(10) & _
                 "Load/Store Time = " & (timer1 / 1000!) & " sec" _
                 ), 3
                 
' Put Screen Updating and AutoCalculate back to defaults
#If OOB Then
   ThisComponent.enableAutomaticCalculation (True)
   ThisComponent.unLockControllers
   ThisComponent.removeActionLock
#Else
   Application.ScreenUpdating = True
   Application.Calculation = xlCalculationManual
#End If
End Sub

Sub FillSheetColumn()
   Dim vArray(nArray), i, y
   For i = 0 To UBound(vArray)
      vArray(i) = Chr(Int(25 * Rnd() + Asc("A"))) & CStr(Int(10000000 * Rnd()))
   Next i
   StoreArray vArray, 1
End Sub

Private Sub LoadArray(x, ByVal nCol)
   Dim i
   #If OOB Then
      With ThisComponent.Sheets(0)
         For i = 0 To UBound(x)
            x(i) = .getCellbyPosition(nCol - 1, i).String
         Next i
      End With
   #Else
      With ActiveWorkbook.Sheets(1)
         For i = 0 To UBound(x)
            x(i) = .Cells(i + 1, 1).Value
         Next i
      End With
   #End If
End Sub

Private Sub StoreArray(x, ByVal nCol)
   Dim i
   #If OOB Then
      With ThisComponent.Sheets(0)
         For i = 0 To UBound(x)
            .getCellbyPosition(nCol - 1, i).String = x(i)
         Next i
      End With
   #Else
      With ActiveWorkbook.Sheets(1)
         For i = 0 To UBound(x)
            .Cells(i + 1, nCol).Value = x(i)
         Next i
      End With
   #End If
End Sub

Function getTimeMS() As Long
   #If OOB Then
      getTimeMS = CLng(CSng(GetSystemTicks) / 0.98)
   #Else
      getTimeMS = CLng(Timer() * 1000)
   #End If
End Function
Back to top
View user's profile Send private message Visit poster's website
pitonyak
Administrator
Administrator


Joined: 09 Mar 2004
Posts: 3655
Location: Columbus, Ohio, USA

PostPosted: Sat Jul 29, 2006 7:17 pm    Post subject: Reply with quote

Very interesting. I would be interested to see if the Linux versions ran faster or slower than the Windows versions.

I looked at this only breifly. I was able to cut the load time in half by obtaining all of the data in one call using getDataArray() rather than making a separate call to get the cell each time. For example:
Code:
      Dim oData
      Dim oRow
      With ThisComponent.Sheets(0)
         oData = .getCellRangeByPosition(nCol-1, 0, nCol-1, UBound(x)).getDataArray()
         For i = 0 To UBound(x)
            oRow = oData(i)
            x(i) = oRow(0)
            'x(i) = .getCellbyPosition(nCol - 1, i).String
         Next i
      End With


No time to look at more...
_________________
--
Andrew Pitonyak
http://www.pitonyak.org/oo.php
Back to top
View user's profile Send private message Send e-mail Visit poster's website AIM Address
TerryE
Super User
Super User


Joined: 16 Jul 2006
Posts: 550
Location: UK

PostPosted: Sun Jul 30, 2006 1:55 am    Post subject: Reply with quote

I did the same Andrew, but didn't post this variant. I was wondering who would suggest it. Smile The only trouble is that if you do that then you also need to use the VBA vectored equivalent to maintain the like-for-like and this speeds up the VBA code even more (% wise)! But you are right in indirectly pointing out that there are two separate issues here:
  • There are efficient and inefficient ways to write code whatever the Basic variant. You should always tend to the efficient, such as the CellRange.DataArray pseudoProperty in OOB and Range.Value in VBA especially if it is a runtime hotspot. Both collapse the UNO / COM call overhead.

  • The OOB implementation is particularly weak compared to VBA. Looking at the source, I can see why and it would be a failry straight forward OOo project to speed it up considerably.

//Terry
Back to top
View user's profile Send private message Visit poster's website
Villeroy
Super User
Super User


Joined: 04 Oct 2004
Posts: 10106
Location: Germany

PostPosted: Sun Jul 30, 2006 9:18 am    Post subject: Reply with quote

Calc does not use lists of lists internally. It uses multidimensional arrays, based on lBound 1.
Short demo:
Code:

REM call as array function from a cell or a preselected range (if auto-calc is off)
REM =COPYVALUES(A1:B5) Ctrl+Shift+Enter
Function COPYVALUES(byval arr)
print lBound(Arr(),1),uBound(arr(),1),lBound(Arr(),2),uBound(arr(),2)
COPYVALUES = arr
End Function

_________________
Rest in peace, oooforum.org
Get help on https://forum.openoffice.org
Back to top
View user's profile Send private message
TerryE
Super User
Super User


Joined: 16 Jul 2006
Posts: 550
Location: UK

PostPosted: Sun Jul 30, 2006 1:00 pm    Post subject: Reply with quote

Thanks Villeroy, exactly the sort of feedback that I was looking for -- only you are right and wrong. You are right in the case that you are talking about the Calc engine does seem to pass [in] parameter ranges as multidiemsional arrays which if you reference them in the watch window show up -- in the case you quote as Variant (1 to 2, 1 to 5). You and inspect them and arr(1,2) addressing works. The problem is that getDataArray returns a sequence< sequence< any > > which in OOB maps into a variant LoL. This is shown in the watch window as a Null Variant/Empty even though its not. (A bit of an incovenient bug here.) Hence Andrew's example would barf at runtime if he used oData(i,0). In VBA you can use the addressing syntax oData(i) (0), and it would be really nice to have this convenience in OOB, but unfortunately not.

In this game my motto is always trust but verify //Terry
Back to top
View user's profile Send private message Visit poster's website
Villeroy
Super User
Super User


Joined: 04 Oct 2004
Posts: 10106
Location: Germany

PostPosted: Sun Jul 30, 2006 1:43 pm    Post subject: Reply with quote

Quote:
Looking at the source, I can see why and it would be a failry straight forward OOo project to speed it up considerably.

Awestruck, I noticed that you are digging in the source and I wanted to annotate my observation, that calc uses "real" arrays while punishing us with lists of lists.
Yes, getDataArray() returns a LoL and I hate it, in particular when I need the numeric data of columns.
As a workaround I use to calculate spreadsheet-data like this:
oRange.getCellByPosition(0,0).setFormula("=SUM(myResults)*"& cStr(dblFactor)
oRange.fillSeries(com.sun.star.sheet.FillDirection.TO_BOTTOM, com.sun.star.sheet.FillMode.SIMPLE,0,0,0)
If I don't like to keep the formulas (I keep them in most cases):
aData() = oRange.getDataArray()
oRange.setDataArray(aData())
No loops, just a few API-calls, letting the application do the main work.
But the API has other bugs than LoL-data from ranges and those bugs are documented badly, if they are documented at all.
Just in case you want to fill a range with fast array-functions or change their dimensions:
http://www.openoffice.org/issues/show_bug.cgi?id=61807
It's possible, but I spent half an hour of testing until I found some undocumented "features".
_________________
Rest in peace, oooforum.org
Get help on https://forum.openoffice.org
Back to top
View user's profile Send private message
TerryE
Super User
Super User


Joined: 16 Jul 2006
Posts: 550
Location: UK

PostPosted: Sun Jul 30, 2006 3:41 pm    Post subject: Reply with quote

I agree with everything you say and just to prove it see How To: paste a formula to a Column in Calc -- great minds think alike. Te He. See also OOo qa Issue 67774 which is related to this.

I am working with the OOo QA team and Jim Thompson on a rework of the VBA to Calc Porting guide. You might want to take a look at OOo qa Issue 67776 but ignore the text and look at the attachments. These topics are just a sanity check of some of my observatinos for peer review, to make sure that what I am saying is sound. However, give it another few days because I will be posting the next draft version soon. Any comments or feedback really appreciated.

BTW I haven't printed off the code (and I'm old enough to prefer paper than screen flicking and 20K lines of code is only 150 A4 pages double sided), but on a quick scan the main reason that OOB runs like a dog is SbiRuntime::FindElement, and I suspect that profiling the RTS will throw up that you are spending 95% of your time in this function. Most of this stuff should be done at compile time and even if you don't want to fundamentally change the bytecode architecture it is fairly trivial to introduce a translation look aside cache so that you only need to do this stuff once per variable fetch per macro run. Even if you purge the cache on a host of events to maintain coherency this would still mean that any subsequent execution of any line of code would run 10-20 times faster than the first. When I've finished the guide, maybe I'll make it my next project Smile

Love the English naming convention with German comments. //Terry
Back to top
View user's profile Send private message Visit poster's website
TerryE
Super User
Super User


Joined: 16 Jul 2006
Posts: 550
Location: UK

PostPosted: Wed Aug 02, 2006 8:13 pm    Post subject: Reply with quote

Out of interest, I also recoded this routine in Perl to see how the Perl RTS copes with this algorithm. I just focussed on the sort timings and didn't bother to get the automation bridge bit going, but here are the timings:
Code:

RTS  Time (sec)  Ratio
OOB    19.24
VBA     0.29     1:66
Perl    0.86     1:22

So Perl is of the same order as VBscript and these set the benchmark for what this sorts of byte-code interpreter could achieve. I spend a couple of hours going through the 66K lines of compiler and RTS code so now I understand why OOB runs like such a dog. To really bottom the fine details would need to do an execution code walk, but that involves a lot of work to set up a test environment.

The nub of the problem is that compiler has a very simple code generation strategy and just about leaves everything to the byte-code interpreter in the RTS. The storing of variables, execution of functions all work off the stack and the code seems reasonable lean. The problem comes when you want to access the variables. So a = b+c gets translated to
Code:

  FIND b, type1
  GET
  FIND c, type2
  GET
  PLUS
  FIND d, type3
  SET

The problem is that the FIND has to do a lot of work, and this really hits you hard when you have something like:
Code:

   For i = 0 To 20000
      ThisComponent.Sheets(0).CellbyPosition(nCol - 1, i).String = x(i)
   Next i

This will call this find function (or one of its variants) 160,000 times. The work involved for a local variant which binds to primitive data type is reasonable lightweight (60,000 calls), array element access slightly worse (another 20,000), but the killer is the object and method accesses (the other 80,000) where the RTS does all the necessary reflection / inspection on the objects and methods / properties. 99.99% of this is duplicated and entirely redundant.

What is worse here is that you think you might help by using the With construct, but is just syntactic sugar which is substituted at compile-time, so the following is equivalent to the above:
Code:

   With ThisComponent.Sheets(0)   
      For i = 0 To 20000
          .CellbyPosition(nCol - 1, i).String = x(i)
      Next i
   End With

However recoding this as:
Code:

   k = ThisComponent.Sheets(0)
   nm1 = nCol - 1
   For i = 0 To j
      k.getCellbyPosition(nm1, i).setString(x(i))
   Next i
manually hoists unnecessary work out the loop reducing the total runtime of this piece of code by by nearly 60%!

What is very clear is that the Basic compiler and RTS ignore two essential tenants of good environment design
  • Never put off to runtime what you can do effectively at compile time
  • If you have to do it at runtime, the cache invariant work so that you only do it once not thousands of times.
Now the first of these would require a major piece of re-architecture so couldn't be taken lightly, but I reckon that you could implement a look-aside cache to bypass 90% of this repeated find work with perhaps a few weeks work. Isn't this a worthy project if we have a 10x speed up of the Basic performance as a result?
Back to top
View user's profile Send private message Visit poster's website
B Marcelly
Super User
Super User


Joined: 12 May 2004
Posts: 1453
Location: France

PostPosted: Thu Aug 03, 2006 5:19 am    Post subject: Reply with quote

TerryE wrote:
However recoding this as:
Code:

   k = ThisComponent.Sheets(0)
   nm1 = nCol - 1
   For i = 0 To j
      k.getCellbyPosition(nm1, i).setString(x(i))
   Next i
manually hoists unnecessary work out the loop reducing the total runtime of this piece of code by by nearly 60%!

Of course this improvement only appears if you run with freezed ui (lock and unlockControllers).
Personally I always avoid repeating those long dotted constructions.
Code:
' instead of this:
ThisComponent.Sheets(0).getCellbyPosition(nCol - 1, i).String = x(i)
' I  prefer to do this, even without loops:
myDoc = ThisComponent
mySheet = myDoc.Sheets(0)
mySheet.getCellbyPosition((nCol - 1, i).String = x(i)

This has several advantages:
- as you said, loops run quicker
- it's easier to read
- if you make a mistake ("Property or method not found") you find easily where is the problem.

Setting constant values outside a loop, with an appropriate variable, is old habit for me (I worked on assembly code for a long time). But I understand that today programmers rely on good optimizing compilers (perhaps they are even not aware of this), and it may be a problem when passing from VBA to OOo Basic.

The slow execution of Basic gets however unnoticed in most macros.
Anyway, thanks for your inspection of the source code (I don't understand all these C++ modules). Perhaps can you offer your time and experience to modify the source ? See Contributing page.
_________________
Bernard

OpenOffice.org 1.1.5 fr / Apache OpenOffice 4.0.1 / LibreOffice 4.1.0
MS-Windows 7 Home SP1
This forum is spammed, use instead Apache OpenOffice forums
Back to top
View user's profile Send private message Visit poster's website
TerryE
Super User
Super User


Joined: 16 Jul 2006
Posts: 550
Location: UK

PostPosted: Thu Aug 03, 2006 7:41 am    Post subject: Reply with quote

I agree that most simple action macros should be written for clarity -- there is absolutely no value in anything other than Keep It Simple Stupid if you are falling through the code once. But there are times when people do tend to write For ... Next code particularly in Calc appilications, and this forum is littered with them asking for help. If you are migrating such a macro from VBA then you will get a shock. For an enterprise this sort of issue might be enough to put a red flag over the option of full migration to OOo.

I agree that you can usually find an efficient way. On another posting a guy was crying for help because he was processing some big "database" sheets and it was taking over 90 minutes on his high performance PC. I has a play and optimised the algo down to 45secs on my laptop, but I has to be clever in a way that only professional programmers would think of. If the RTS was 10 plus times faster then that 90 mins would have fallen to a more acceptable 9 mins and KISS would have been good enough.

As to the code improvement I don't have the few weeks to spare, but at least I can frame up the proposition and feed it into the system. //Terry
Back to top
View user's profile Send private message Visit poster's website
pitonyak
Administrator
Administrator


Joined: 09 Mar 2004
Posts: 3655
Location: Columbus, Ohio, USA

PostPosted: Thu Aug 03, 2006 12:12 pm    Post subject: Reply with quote

My guess is that Andreas Bregas knows the most about this. Andreas is a programmer for Sun who does most of the Basic programming from what I have seen. I am interested in what he has to say... It might be worth posting to the dev mailling list to see if we can get him to bite!
_________________
--
Andrew Pitonyak
http://www.pitonyak.org/oo.php
Back to top
View user's profile Send private message Send e-mail Visit poster's website AIM Address
arxytas
Newbie
Newbie


Joined: 07 Aug 2006
Posts: 1

PostPosted: Mon Aug 07, 2006 11:11 am    Post subject: Reply with quote

Ok, it's clear that OOoBasic is way too slow....
So this has to change in order to become more friendly to users. I tried the following code:

Code:

Dim A(1024,1024) as Integer

Sub ButtonClicked
   Dim i as integer, j as integer
   
   For i=1 to 1024
      For j=1 to 1024
         A(i,j)=i+j
      Next j
   Next i
End Sub

On my machine (AthlonXP@1800) this takes 15 seconds and about 110MB extra memory when running!!!!!!
How could this situation be rewritten to speed it up? I'm not going to write how long it took in excel.....
Back to top
View user's profile Send private message
TerryE
Super User
Super User


Joined: 16 Jul 2006
Posts: 550
Location: UK

PostPosted: Mon Aug 07, 2006 1:33 pm    Post subject: Reply with quote

Yup you will be running at ~100KIPs for a typical mix (Next is particularly light-weight). Yes this needs fixing, but very few of us will need to do a lot of VBA algo in calc or whatever, but that being said 3 out of my current 4 projectlettes are eavy on the looping.!!

I suppose they'll say the answer is to wait for a 4-way Core 2 PC !!!!
_________________
Terry
WinXPSP3, OOo 2.4.1, Ubunto 8.04 for development
Also try the Official OOo Community Forum where I mainly post now.
Back to top
View user's profile Send private message Visit poster's website
arthurkhachikyan
Newbie
Newbie


Joined: 04 Feb 2007
Posts: 3
Location: Armenia

PostPosted: Sun Feb 18, 2007 2:34 am    Post subject: Re: RFC on VBA vs OOo Basic Performance Reply with quote

ThisComponent.(Un)LockControllers do not work in OO 2.0.4. Please, who knows why?
Back to top
View user's profile Send private message MSN Messenger
TerryE
Super User
Super User


Joined: 16 Jul 2006
Posts: 550
Location: UK

PostPosted: Sun Feb 18, 2007 8:39 am    Post subject: Reply with quote

No, I don't, sorry. Why not upgrade to OOo 2.1 ?
_________________
Terry
WinXPSP3, OOo 2.4.1, Ubunto 8.04 for development
Also try the Official OOo Community Forum where I mainly post now.
Back to top
View user's profile Send private message Visit poster's website
Display posts from previous:   
Post new topic   Reply to topic    OOoForum.org Forum Index -> OpenOffice.org Macros and API All times are GMT - 8 Hours
Page 1 of 1

 
Jump to:  
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum


Powered by phpBB © 2001, 2005 phpBB Group