OpenOffice.org Forum at OOoForum.orgThe OpenOffice.org Forum
 
 [Home]   [FAQ]   [Search]   [Memberlist]   [Usergroups]   [Register
 [Profile]   [Log in to check your private messages]   [Log in

[Solved]Work around for Median in SQL Query

 
Post new topic   Reply to topic    OOoForum.org Forum Index -> OpenOffice.org Base
View previous topic :: View next topic  
Author Message
enjoying28
General User
General User


Joined: 17 Dec 2010
Posts: 13

PostPosted: Fri Dec 17, 2010 2:18 pm    Post subject: [Solved]Work around for Median in SQL Query Reply with quote

Hay I need some help. I have a 42k record table that I need the median price for a grouped set of records. Grouped by year then month and then the median (not the avg or mean) of them. I know base does not have a function for median does anyone have code for a macro or something that i could use as a work around?
Code:
SELECT Median( "Price" ) AS "Price", "LstArea" AS "LstArea", YEAR( "CloseDate" ) AS "Year", MONTH( "CloseDate" ) AS "Month" FROM "main" "main" WHERE ( "LstStatus" = 'Sold' ) GROUP BY "LstArea", "Year", "Month"


thanks for any info.


Last edited by enjoying28 on Tue Dec 21, 2010 12:14 pm; edited 1 time in total
Back to top
View user's profile Send private message
Sliderule
Super User
Super User


Joined: 29 May 2004
Posts: 2499
Location: 3rd Rock From The Sun

PostPosted: Fri Dec 17, 2010 7:22 pm    Post subject: Reply with quote

enjoying28:

Just as an FYI, I did see your similar post on the other forum:

http://user.services.openoffice.org/en/forum/viewtopic.php?f=13&t=36695

and no responses.

I suspect, that is because there may not be a 'simple' answer.

Now, if it were me ( infamous last words Smile ), this is how I would accomplish the task. For a quick explanation in English . . . before the 'technical' procedure . . .
  1. I would run / save a Query that produces the results "Price" for each column . . . that is . . . each Year and Month.

  2. Copy and Paste the results from the Query into a Calc Spreadsheet . . . using F4 to get the Query results to your Calc Spreadsheet

  3. Using Regular Expressions, replace all values of 0 to nothing . . . that is . . . make sure Regular Expression is selected in the FIND . . . and . . . change ^0 to nothing . . . the reason is to make the entries of 0 ( NULL ) to nothing ( blank ) in Spreadsheet speak.

  4. Use the Calc Function Median . . . for example . . . MEDIAN(A2:A3272) . . . this assumes, just an example, your Query returned 3271 rows ( in addition to the 1 for Column Name )

  5. The above Calc formula can be dragged, so only enter it once, and just drag the formula to each additional column.

Now, the SQL you might use . . . assuming ( I just made this up ) you wanted to see data from your Base file is from 2008-01 to 2010-12. You could merely copy and paste each line as needed to include / exclude more Years / Months Smile

Important Note: The SQL below, I am using a CASE WHEN ELSE END statement . . . therefore . . . it MUST be Run Directly ( using the Toolbar . . . click on the icon with SQL and green check mark ).

Code:
-- Bring Back Data from 2008-01 to 2010-12 For MEDIAN calculation in Calc Spreadsheet
-- Calc Function might look like =MEDIAN(A2:A3272)
-- This Query with OpenOffice 3.2.1 must be Run in Direct Mode
-- In Calc, use Find and Replace . . . Find ^0 and replace with nothing

SELECT
   CASE WHEN YEAR("CloseDate") = 2008 AND MONTH("CloseDate") = 1  THEN "Price" ELSE NULL END as "2008-01",
   CASE WHEN YEAR("CloseDate") = 2008 AND MONTH("CloseDate") = 2  THEN "Price" ELSE NULL END as "2008-02",
   CASE WHEN YEAR("CloseDate") = 2008 AND MONTH("CloseDate") = 3  THEN "Price" ELSE NULL END as "2008-03",
   CASE WHEN YEAR("CloseDate") = 2008 AND MONTH("CloseDate") = 4  THEN "Price" ELSE NULL END as "2008-04",
   CASE WHEN YEAR("CloseDate") = 2008 AND MONTH("CloseDate") = 5  THEN "Price" ELSE NULL END as "2008-05",
   CASE WHEN YEAR("CloseDate") = 2008 AND MONTH("CloseDate") = 6  THEN "Price" ELSE NULL END as "2008-06",
   CASE WHEN YEAR("CloseDate") = 2008 AND MONTH("CloseDate") = 7  THEN "Price" ELSE NULL END as "2008-07",
   CASE WHEN YEAR("CloseDate") = 2008 AND MONTH("CloseDate") = 8  THEN "Price" ELSE NULL END as "2008-08",
   CASE WHEN YEAR("CloseDate") = 2008 AND MONTH("CloseDate") = 9  THEN "Price" ELSE NULL END as "2008-09",
   CASE WHEN YEAR("CloseDate") = 2008 AND MONTH("CloseDate") = 10 THEN "Price" ELSE NULL END as "2008-10",
   CASE WHEN YEAR("CloseDate") = 2008 AND MONTH("CloseDate") = 11 THEN "Price" ELSE NULL END as "2008-11",
   CASE WHEN YEAR("CloseDate") = 2008 AND MONTH("CloseDate") = 12 THEN "Price" ELSE NULL END as "2008-12",
   CASE WHEN YEAR("CloseDate") = 2009 AND MONTH("CloseDate") = 1  THEN "Price" ELSE NULL END as "2009-01",
   CASE WHEN YEAR("CloseDate") = 2009 AND MONTH("CloseDate") = 2  THEN "Price" ELSE NULL END as "2009-02",
   CASE WHEN YEAR("CloseDate") = 2009 AND MONTH("CloseDate") = 3  THEN "Price" ELSE NULL END as "2009-03",
   CASE WHEN YEAR("CloseDate") = 2009 AND MONTH("CloseDate") = 4  THEN "Price" ELSE NULL END as "2009-04",
   CASE WHEN YEAR("CloseDate") = 2009 AND MONTH("CloseDate") = 5  THEN "Price" ELSE NULL END as "2009-05",
   CASE WHEN YEAR("CloseDate") = 2009 AND MONTH("CloseDate") = 6  THEN "Price" ELSE NULL END as "2009-06",
   CASE WHEN YEAR("CloseDate") = 2009 AND MONTH("CloseDate") = 7  THEN "Price" ELSE NULL END as "2009-07",
   CASE WHEN YEAR("CloseDate") = 2009 AND MONTH("CloseDate") = 8  THEN "Price" ELSE NULL END as "2009-08",
   CASE WHEN YEAR("CloseDate") = 2009 AND MONTH("CloseDate") = 9  THEN "Price" ELSE NULL END as "2009-09",
   CASE WHEN YEAR("CloseDate") = 2009 AND MONTH("CloseDate") = 10 THEN "Price" ELSE NULL END as "2009-10",
   CASE WHEN YEAR("CloseDate") = 2009 AND MONTH("CloseDate") = 11 THEN "Price" ELSE NULL END as "2009-11",
   CASE WHEN YEAR("CloseDate") = 2009 AND MONTH("CloseDate") = 12 THEN "Price" ELSE NULL END as "2009-12",
   CASE WHEN YEAR("CloseDate") = 2010 AND MONTH("CloseDate") = 1  THEN "Price" ELSE NULL END as "2010-01",
   CASE WHEN YEAR("CloseDate") = 2010 AND MONTH("CloseDate") = 2  THEN "Price" ELSE NULL END as "2010-02",
   CASE WHEN YEAR("CloseDate") = 2010 AND MONTH("CloseDate") = 3  THEN "Price" ELSE NULL END as "2010-03",
   CASE WHEN YEAR("CloseDate") = 2010 AND MONTH("CloseDate") = 4  THEN "Price" ELSE NULL END as "2010-04",
   CASE WHEN YEAR("CloseDate") = 2010 AND MONTH("CloseDate") = 5  THEN "Price" ELSE NULL END as "2010-05",
   CASE WHEN YEAR("CloseDate") = 2010 AND MONTH("CloseDate") = 6  THEN "Price" ELSE NULL END as "2010-06",
   CASE WHEN YEAR("CloseDate") = 2010 AND MONTH("CloseDate") = 7  THEN "Price" ELSE NULL END as "2010-07",
   CASE WHEN YEAR("CloseDate") = 2010 AND MONTH("CloseDate") = 8  THEN "Price" ELSE NULL END as "2010-08",
   CASE WHEN YEAR("CloseDate") = 2010 AND MONTH("CloseDate") = 9  THEN "Price" ELSE NULL END as "2010-09",
   CASE WHEN YEAR("CloseDate") = 2010 AND MONTH("CloseDate") = 10 THEN "Price" ELSE NULL END as "2010-10",
   CASE WHEN YEAR("CloseDate") = 2010 AND MONTH("CloseDate") = 11 THEN "Price" ELSE NULL END as "2010-11",
   CASE WHEN YEAR("CloseDate") = 2010 AND MONTH("CloseDate") = 12 THEN "Price" ELSE NULL END as "2010-12"

FROM "main" as "main"

WHERE "LstStatus" = 'Sold'


IMPORTANT NOTE: The above SQL Query assumes your database engine is HSQL ( you can confirm this by looking at the Status Line at the bottom ) after opening your OpenOffice Base file ( *.odb ).

I recognise that this might not the 'convoluted' approach ( database pun inteded Smile ) you were hoping for, but, it is the best I can suggest to accomplish your desired results.

I hope this helps, please be sure to let me / us know.

Sliderule

Thanks to add [Solved] in your first post Title ( edit button ) if your issue has been fixed / resolved.
Back to top
View user's profile Send private message
enjoying28
General User
General User


Joined: 17 Dec 2010
Posts: 13

PostPosted: Sat Dec 18, 2010 12:29 am    Post subject: Reply with quote

Thanks for the try I like the idea but there are a few problems when I expanded on your query to take in my three times long time line and two other groups. I endup with a over 500 column query 42K records long and it will not copy and paste into a spreadsheet. I keep trying but there is just two many groups I need. Is it possible to write code to create median just like avg. Because when I run my Avg query on the same data I get four columns aNd just over 600 records. I many have to just back your query up so that I make 40 different one to get the info but there has got to be a better way.
Back to top
View user's profile Send private message
Sliderule
Super User
Super User


Joined: 29 May 2004
Posts: 2499
Location: 3rd Rock From The Sun

PostPosted: Sat Dec 18, 2010 10:36 am    Post subject: Reply with quote

enjoying28:

I am not sure I understand what you are saying / asking in your post above.

With your permission, a few questions / comments:
  1. What version of OpenOffice are you using?

  2. Which database driver are you using -- put another way, are you using HSQL as the default 'embedded engine' ( version 1.8 ), HSQL version 2.0, MySQL, Microsoft Access, PostGRE, SQLite, some other database driver?

  3. After starting your OpenOffice Base file ( *.odb ), what does the status line say at the bottom?

  4. What I was suggesting should NOT return 42K rows, since, it will have multiple columns and your Where clause is limiting it to "LstStatus" = 'Sold' unless ALL your records have a "LstStatus" of 'Sold'. Furthermore, the number of rows returned should be the greatest number in any one month.

  5. In Calc, by using the F4 key OR from the Calc menu: View -> Data Source

    you can drag the Query results to Calc, and have Calc perform the MEDIAN on each column. That way, Calc will be responsible for doing the individual Column sorting and correct calculation of the "middle" / "median" value, whether an even number of values, or, odd. Smile
Sliderule
Back to top
View user's profile Send private message
enjoying28
General User
General User


Joined: 17 Dec 2010
Posts: 13

PostPosted: Sat Dec 18, 2010 1:33 pm    Post subject: Reply with quote

Thanks Silderule
Okay to answer your questions
1. Openoffice 3.2.1 on windows 7 ultimate 32bit
2. HSQL Embedded
3. HSQL Database engine
4. ok my table has more then 42k records about 3 times more but 42K 'LstStatus' sold records my date range is 6/2005 to 11/2010 I have a need to break that up into groups by 'LstArea' five categories and then again by 'LstType' seven categories. At the lowest division there is about 150 to 250 records in any given month. I don't know what you expected it to return but from my count it returns enough rows for every record within the whole time frame. Which sounds right to me because it places the value in the correct column but does not combined records.
5. I was doing this then copying and pasting like you said.

Now after feeling brain died for not knowing I can drag to import the data it works.
I just have to add where that make 35 different query with about 2000 to 5000 records each.
Thanks for the workable solution. I will mark this as solved but one last question. where can I suggest adding Median to the feature set of Base?
Enjoying28
Back to top
View user's profile Send private message
Sliderule
Super User
Super User


Joined: 29 May 2004
Posts: 2499
Location: 3rd Rock From The Sun

PostPosted: Sat Dec 18, 2010 2:12 pm    Post subject: Reply with quote

You said asked:

enjoying28 wrote:
but one last question. where can I suggest adding Median to the feature set of Base?


Forgive me for answering your question by giving you more information then you may want . . . but here goes.

HSQL, the database endgine, per your comments above, is probably version 1.8.

The HSQL folks have a new version that I use within OpenOffice . . . it is version 2.0.1 . . . but . . . that version will NOT be included with OpenOffice 3.3 ( to be released shortly ) . . . and . . . reportedly not until OpenOffice 3.4 .

Now, HSQL 2.0.1 DOES have an ability for a user to BUILD their own functions . . . for example, I have built a RandBetween function ( to return a Random Number between two given input values ). Also, HSQL 2.0.1 does support a GROUP_CONCAT function.

I, like you, would like to build / create a usable MEDIAN AGGREGATE function that can be user defined function. As of this minute, I have not been successful in doing it, but, I am trying. Please see the link below:

http://sourceforge.net/projects/hsqldb/forums/forum/73674/topic/4019158/

One last final comment . . . based on your description of the 'size' of your database . . . I strongly encourage you to make frequent backups of your OpenOffice Base file ( *.odb ) . . . but . . . more importantly, if it were me . . . I would NOT run this as an 'embedded database'. Keeping all the database as a separate file(s), rather than as a part of the OpenOffice Base zipped ( compressed ) file may likely cause total loss of data. In my experience, the very detailed instructions found in this form by dacm are wonderful. This way, you could run OpenOffice Base as your database BackEnd, and, the power of HSQL 2.0.1 as the backend, either in SERVER MODE ( to allow multiple folks to access the data concurrently ) OR "in file mode" . . . one person at a time, but, no need to have a server application running in the background. Smile

One more comment . . . to see examples of using 'simple' MEDIAN calculations using SQL on just ONE column . . . look at the link below:

http://scorreiait.wordpress.com/2008/10/28/how-to-compute-a-median-in-sql

however, big difference on doing it 'for one column of data' vs doing it for many columns concurrently.

Sliderule
Back to top
View user's profile Send private message
enjoying28
General User
General User


Joined: 17 Dec 2010
Posts: 13

PostPosted: Tue Dec 21, 2010 12:15 pm    Post subject: Reply with quote

Thanks
Back to top
View user's profile Send private message
dacm
Super User
Super User


Joined: 07 Jan 2010
Posts: 769

PostPosted: Tue Dec 21, 2010 4:20 pm    Post subject: Reply with quote

Here's a link to the tutorial for migration of your database to file/server mode, and then upgrade steps to HSQLDB 2.0.x. This requires some steps as you'll find in the tutorial links, but as a Windows user the setup process can be more automated using the available installers.
_________________
Soli Deo gloria
Tutorial: avoiding data loss with Base + Splitting 'Embedded databases'
Back to top
View user's profile Send private message
Sliderule
Super User
Super User


Joined: 29 May 2004
Posts: 2499
Location: 3rd Rock From The Sun

PostPosted: Thu Dec 23, 2010 5:51 pm    Post subject: Reply with quote

enjoying28:

Just so you know . . . perhaps some good news regarding the use of MEDIAN as an AGGREGATE function using HSQL as your database engine.

Per the posts below . . . the Beta Version of HSQL ( as of today ) includes the aggregate function MEDIAN . BUT, it is important to know that this is with HSQL 2.1.0 . . . not the default OpenOffice file as of OpenOffice 3.2.1 nor OpenOffice 3.3 ( when it is available ).

http://sourceforge.net/projects/hsqldb/forums/forum/73674/topic/4019158

The download for the HSQL file . . . hsqldb.jar is found in the first paragraphfound at:

http://www.hsqldb.org/support/index.html

Additionally, the link above from dacm ( thanks so much or your post here ) and at http://user.services.openoffice.org/en/forum/viewtopic.php?f=83&t=17567 can perhaps help you to migrate your current OpenOffice file to HSQL 2.1.0 if you desire.

Sliderule
Back to top
View user's profile Send private message
Display posts from previous:   
Post new topic   Reply to topic    OOoForum.org Forum Index -> OpenOffice.org Base All times are GMT - 8 Hours
Page 1 of 1

 
Jump to:  
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum


Powered by phpBB © 2001, 2005 phpBB Group