| View previous topic :: View next topic |
| Author |
Message |
Hemo2 General User

Joined: 20 Oct 2005 Posts: 6
|
Posted: Thu Oct 20, 2005 8:37 pm Post subject: Import CSV files into Base tables? |
|
|
I apologize for posting what I believe has been discussed somewhat already. I'm new to Open Office and can't figure out something that I'm hoping will have an easy answer. I'm a novice at this stuff, but I've got a baseball stats database in MS Access. With the release of Base in Open Office 2, I thought it might be interesting to see if I could get a similiar database setup in Base. The data for the baseball database came in the form of several CSV files. In MS Access, I created a table for each of these files setting up the fields accordingly for each table and then simply imported the CSV files into the appropriate table. However, in Base I have created the first table and setup the fields the way they should be but I see no way to import the CSV files into this table. I feel like an idiot that I can't figure out how to do this in Base so I figure I better ask for help. Is there a way to simply import the info from these CSV files into the blank tables I've created in Base?
I've searched on the forum for some answers and have seen reference to linking to text files, but this isn't what I would want to do. I also saw some info on putting the info into Calc and then dragging it to a table but this didn't seem to work for me on the one I tried plus some of the other CSV files have too many rows in them and Calc fails to open them so I don't think I could do it through this route.
Many thanks
Keith |
|
| Back to top |
|
 |
DrewJensen Super User


Joined: 06 Jul 2005 Posts: 2599 Location: Cumberland, MD
|
Posted: Thu Oct 20, 2005 8:52 pm Post subject: |
|
|
Hemo2,
OK...Base just doesn't have a general purpose import export facility at the moment. So if you have your data in CSV files and you want it in a "native" Base format it is going to be a two step process.
You can choose which two steps..but it is two.
Way one. Create a "native" Base database (HSQLDB Embedded engine). In this database you would have to use a DDL command to create a TEXT table. Really not that hard, the documentation at www.hsqldb.org is quite clear on how to do this. If you go this route and have questions ask I can help. Anyway, when you do this then you can just use a Select INTO command to move the data from the TEXT table into a HSQLDB native table. It really is the easiest way.
The second way is to create two databases, one where the database type is TEXT. (This maps to a directory and all the CSV files will come up as tables) The other will be a standard Base database. Now you need to create a ooBasic procedure that opens both databases and again moves the data from the text database to the 'native' databse.
A third way is one you mentioned. Open the CSV file in Calc, and then drag the data over to a Base database in table view. You can not drop the data into an existing table, rather you must import it to a new table. When you do this a wizard will open allowing you to map column types and the like. There are plenty of threads on the board covering transfering data from Calc to Base. Look especially at ones that cover dates ,as this can be a hassel.
Personally, I can tell you that the first way is actually the best. A little more work getting it set up, but it pays off with the control you get. Then the final step is just one simple select statement.
I HTH
Drew Jensen
Boy you gotta love those Astros... I say they do it in....4 |
|
| Back to top |
|
 |
Hemo2 General User

Joined: 20 Oct 2005 Posts: 6
|
Posted: Fri Oct 21, 2005 10:06 am Post subject: |
|
|
Thank you for the reply Drew. It's good to know the Base can't do an import of a CSV file so I can stop scratching my head why I couldn't figure it out!
I think I understand what you're saying on what needs to be done to accomplish this, but I don't know that I have a good handle on how to do it yet. I'll take a look at the www.hsqldb.org website and see if the helps me out.
Yes, the Astros surprised me by bouncing back that well after loosing game 5 the way they did. It could be an interesting series. Both teams have very good pitching so it's hard to tell what way that series will go. If you are into baseball stats, etc. the database I'm referring to is available for anyone to download and use. It's quite a nice project done by some very dedicated baseball fans. They're constantly working through various things trying to improve it. It can be found at:
http://www.baseball-databank.org/
They have CSV versions of all the tables along with the table definitions. They also have a mySQL version of it available for download. I'm 'slightly' able to do things in my MS Access 2000 so I imported the CSV version of all the tables into Access and managed to create various queries, reports, and forms for me to use for browsing & viewing the baseball stats throughout baseball history. It's nothing special, but it's functional. I thought it might be kind of cool to attempt something in Open Office Base since it is free and anyone that doesn't have MS Access would have a route available to them to enjoy this project. I admit I'm a novice at a lot of this and know nothing of the capabilities of Base so I'm not sure even if I get things into Base that I'll be able to get a similiar setup in Base like I do in Access and may have to drop this idea. (It's certainly not the end of the world if I can't get it done.) But, one step at a time and step one is to get the data into Base.
Keith |
|
| Back to top |
|
 |
DrewJensen Super User


Joined: 06 Jul 2005 Posts: 2599 Location: Cumberland, MD
|
Posted: Fri Oct 21, 2005 10:54 am Post subject: |
|
|
Hemo2,
I'll see if I can't find a few minutes to grab the csv files from the site and put together an example of how to do the import.
Drew |
|
| Back to top |
|
 |
Keith Silva General User

Joined: 17 Jun 2005 Posts: 20 Location: San Francisco, California
|
Posted: Fri Oct 21, 2005 4:00 pm Post subject: |
|
|
An evil fourth way is to open the text file in MS Excel. Excel is very good about converting such a file to the Excel file format. Once converted, it can be opened and used in calc. I'm also trying to import a csv file into Base and haven't quite got there yet.
Regards, The other Keith |
|
| Back to top |
|
 |
DrewJensen Super User


Joined: 06 Jul 2005 Posts: 2599 Location: Cumberland, MD
|
Posted: Fri Oct 21, 2005 4:25 pm Post subject: |
|
|
I downloaded the files for 'the american passtime stats', I'll put together a litle demo of bringing them into a base file - sans calc this weekend, and post it to this thread.
Drew |
|
| Back to top |
|
 |
Hemo2 General User

Joined: 20 Oct 2005 Posts: 6
|
Posted: Fri Oct 21, 2005 7:12 pm Post subject: |
|
|
Many thanks Drew. As I recall, the 'batting' and 'fielding' tables have many more rows than Base (or Excel) allows in a single sheet. I hope it's not too much of a bother for you to look into this because I'm still not sure I'll be able to accomplish what I want in Base. But I enjoy doing these types of things and am always open to learning more.
Keith |
|
| Back to top |
|
 |
DrewJensen Super User


Joined: 06 Jul 2005 Posts: 2599 Location: Cumberland, MD
|
Posted: Tue Oct 25, 2005 8:18 am Post subject: |
|
|
Hemo,
Here is a little help on this.
Download the zip file with...suppose you have already done this.
Now create a standard Base database, be sure to save it in the same directory as the *.sql and *.txt files from the zip file.
OK, because they supplied the table definitions - albeit in MySQL dialect, we can alter this to HSQLDB dialect and create an import script.
| Code: |
CREATE TEXT TABLE "Allstar_txt" (
"idxAllstar" identity,
"yearID" smallint default 0 NOT NULL ,
"lgID" char(2) default '' NOT NULL ,
"idxLahman" int default 0 NOT NULL ,
PRIMARY KEY ("idxAllstar"));
SET TABLE "Allstar_txt" SOURCE "Allstar.txt";
CREATE TABLE "Allstar" (
"idxAllstar" identity,
"yearID" smallint default 0 NOT NULL ,
"lgID" char(2) default '' NOT NULL ,
"idxLahman" int default 0 NOT NULL ,
PRIMARY KEY ("idxAllstar"));
insert into "Allstar" select * from "Allstar_txt";
drop table "Allstar_txt";
CREATE TEXT TABLE "AwardsVotes_txt" (
"idxAwardsVotes" IDENTITY,
"awardID" varchar(25) default '' NOT NULL ,
"yearID" smallint default 0 NOT NULL ,
"lgID" char(2) default '' NOT NULL,
"idxLahman" int default 0 NOT NULL,
"pointsWon" int default NULL,
"pointsMax" int default NULL,
"votesFirst" int default NULL,
PRIMARY KEY ("idxAwardsVotes")
);
set table "AwardsVotes_txt" source "AwardsVotes.txt";
CREATE TABLE "AwardsVotes" (
"idxAwardsVotes" IDENTITY,
"awardID" varchar(25) default '' NOT NULL ,
"yearID" smallint default 0 NOT NULL ,
"lgID" char(2) default '' NOT NULL,
"idxLahman" int default 0 NOT NULL,
"pointsWon" int default NULL,
"pointsMax" int default NULL,
"votesFirst" int default NULL,
PRIMARY KEY ("idxAwardsVotes")
);
insert into "AwardsVotes" SELECT * from "AwardsVotes_txt";
drop table "AwardsVotes_txt";
CREATE TEXT TABLE "AwardsWinners_TXT" (
"idxAwardsWinners" IDENTITY,
"awardID" varchar(25) default '' NOT NULL,
"yearID" smallint default 0 NOT NULL,
"lgID" char(2) default '' NOT NULL,
"idxLahman" int default 0 NOT NULL,
"tie" char(1) default NULL,
"notes" varchar(100) default NULL,
PRIMARY KEY ("idxAwardsWinners")
);
SET TABLE "AwardsWinners_TXT" SOURCE "AwardsWinners.TXT";
CREATE TABLE "AwardsWinners" (
"idxAwardsWinners" IDENTITY,
"awardID" varchar(25) default '' NOT NULL,
"yearID" smallint default 0 NOT NULL,
"lgID" char(2) default '' NOT NULL,
"idxLahman" int default 0 NOT NULL,
"tie" char(1) default NULL,
"notes" varchar(100) default NULL,
PRIMARY KEY ("idxAwardsWinners")
);
CREATE INDEX "AwardsWinners_IDX1" ON "AwardsWinners" ("idxLahman");
CREATE INDEX "AwardsWinners_IDX2" ON "AwardsWinners" ("awardID");
INSERT INTO "AwardsWinners" SELECT * FROM "AwardsWinners_TXT";
DROP TABLE "AwardsWinners_TXT";
CREATE TEXT TABLE "ManagersHalf_TXT" (
"idxManagersHalf" IDENTITY,
"idxManagers" int default 0 NOT NULL,
"Half" smallint default 1 NOT NULL,
"DivWin" BOOLEAN default NULL,
"G" smallint default 0 NOT NULL,
"W" smallint default 0 NOT NULL,
"L" smallint default 0 NOT NULL
);
SET TABLE "ManagersHalf_TXT" SOURCE "ManagersHalf.TXT";
CREATE TABLE "ManagersHalf" (
"idxManagersHalf" IDENTITY,
"idxManagers" int default 0 NOT NULL,
"Half" smallint default 1 NOT NULL,
"DivWin" BOOLEAN default NULL,
"G" smallint default 0 NOT NULL,
"W" smallint default 0 NOT NULL,
"L" smallint default 0 NOT NULL
);
CREATE INDEX "MngrsHlf_idxManagers" ON "ManagersHalf_TXT"
("idxManagers","idxManagersHalf");
INSERT INTO "ManagersHalf" SELECT * FROM "ManagersHalf_TXT";
DROP TABLE "ManagersHalf_TXT";
CREATE TEXT TABLE "Master_TXT" (
"idxLahman" IDENTITY,
"birthYear" int default NULL,
"birthMonth" int default NULL,
"birthDay" int default NULL,
"birthCountry" varchar(50) default NULL,
"birthState" char(2) default NULL,
"birthCity" varchar(50) default NULL,
"deathYear" int default NULL,
"deathMonth" int default NULL,
"deathDay" int default NULL,
"deathCountry" varchar(50) default NULL,
"deathState" char(2) default NULL,
"deathCity" varchar(50) default NULL,
"nameFirst" varchar(50) default NULL,
"nameLast" varchar(50) default ''NOT NULL ,
"nameNote" varchar(255) default NULL,
"nameGiven" varchar(255) default NULL,
"nameNick" varchar(255) default NULL,
"weight" int default NULL,
"height" double default NULL,
"bats" CHAR(1) default NULL,
"throws" CHAR(1) default NULL,
"debut" date default NULL,
"finalGame" date default NULL
);
SET TABLE "Master_TXT" SOURCE "Master.TXT";
CREATE TABLE "Master" (
"idxLahman" IDENTITY,
"birthYear" int default NULL,
"birthMonth" int default NULL,
"birthDay" int default NULL,
"birthCountry" varchar(50) default NULL,
"birthState" char(2) default NULL,
"birthCity" varchar(50) default NULL,
"deathYear" int default NULL,
"deathMonth" int default NULL,
"deathDay" int default NULL,
"deathCountry" varchar(50) default NULL,
"deathState" char(2) default NULL,
"deathCity" varchar(50) default NULL,
"nameFirst" varchar(50) default NULL,
"nameLast" varchar(50) default ''NOT NULL ,
"nameNote" varchar(255) default NULL,
"nameGiven" varchar(255) default NULL,
"nameNick" varchar(255) default NULL,
"weight" int default NULL,
"height" double default NULL,
"bats" CHAR(1) default NULL,
"throws" CHAR(1) default NULL,
"debut" date default NULL,
"finalGame" date default NULL
);
CREATE INDEX "Mstr_idxLahman" ON "Master"
("idxLahman","nameLast","nameFirst");
CREATE INDEX "Mstr_idxLahman_2" ON "Master" ("idxLahman","debut");
CREATE INDEX "Mstr_idxLahman_3" ON "Master"
("idxLahman","birthYear","birthMonth","birthDay");
INSERT INTO "Master" SELECT * FROM "Master_TXT" ;
DROP TABLE "Master_TXT";
CREATE TEXT TABLE "Pitching_TXT" (
"idxPitching" IDENTITY,
"idxLahman" int default 0 NOT NULL,
"stint" int default 0 NOT NULL,
"idxTeams" int default 0 NOT NULL,
"W" smallint default NULL,
"L" smallint default NULL,
"G" smallint default NULL,
"GS" smallint default NULL,
"CG" smallint default NULL,
"SHO" smallint default NULL,
"SV" smallint default NULL,
"IPouts" int default NULL,
"H" smallint default NULL,
"ER" smallint default NULL,
"HR" smallint default NULL,
"BB" smallint default NULL,
"SO" smallint default NULL,
"BAopp" decimal(5,3) default NULL,
"ERA" decimal(5,2) default NULL,
"IBB" smallint default NULL,
"WP" smallint default NULL,
"HBP" smallint default NULL,
"BK" smallint default NULL,
"BFP" smallint default NULL,
"GF" smallint default NULL,
"R" smallint default NULL
);
SET TABLE "Pitching_TXT" SOURCE "Pitching.TXT"
CREATE TABLE "Pitching" (
"idxPitching" IDENTITY,
"idxLahman" int default 0 NOT NULL,
"stint" int default 0 NOT NULL,
"idxTeams" int default 0 NOT NULL,
"W" smallint default NULL,
"L" smallint default NULL,
"G" smallint default NULL,
"GS" smallint default NULL,
"CG" smallint default NULL,
"SHO" smallint default NULL,
"SV" smallint default NULL,
"IPouts" int default NULL,
"H" smallint default NULL,
"ER" smallint default NULL,
"HR" smallint default NULL,
"BB" smallint default NULL,
"SO" smallint default NULL,
"BAopp" decimal(5,3) default NULL,
"ERA" decimal(5,2) default NULL,
"IBB" smallint default NULL,
"WP" smallint default NULL,
"HBP" smallint default NULL,
"BK" smallint default NULL,
"BFP" smallint default NULL,
"GF" smallint default NULL,
"R" smallint default NULL
);
CREATE INDEX "Ptch_idxPitching" ON "Pitching"
("idxPitching","idxLahman");
CREATE INDEX "Ptch_idxLahman" ON "Pitching" ("idxLahman"
,"stint","idxTeams");
CREATE INDEX "Ptch_idxTeams" ON "Pitching" ("idxTeams","idxLahman");
INSERT INTO "Pitching" SELECT * FROM "Pitching_TXT";
DROP TABLE "Pitching_TXT";
CREATE TEXT TABLE "Salaries_TXT" (
"idxSalaries"IDENTITY,
"idxLahman" int default 0 NOT NULL,
"idxTeams" int default 0 NOT NULL,
"salary" double default 0.00 NOT NULL
);
SET TABLE "Salaries_TXT" SOURCE "Salaries.TXT"
CREATE TABLE "Salaries" (
"idxSalaries"IDENTITY,
"idxLahman" int default 0 NOT NULL,
"idxTeams" int default 0 NOT NULL,
"salary" double default 0.00 NOT NULL
);
CREATE INDEX "SAL_idxLahman" ON "Salaries" ("idxLahman","idxTeams");
CREATE INDEX "SAL_idxTeams" ON "Salaries" ("idxTeams","idxLahman");
INSERT INTO "Salaries" SELECT * FROM "Salaries_TXT";
DROP TABLE "Salaries_TXT";
CREATE TEXT TABLE "Teams_TXT" (
"idxTeams" IDENTITY,
"yearID" smallint default 0 NOT NULL,
"lgID" char(2) default '' NOT NULL,
"idxTeamsFranchises" int default NULL,
"divID" char(1) default NULL,
"Rank" smallint default 0 NOT NULL,
"G" smallint default NULL,
"Ghome" int default NULL,
"W" smallint default NULL,
"L" smallint default NULL,
"DivWin" BOOLEAN default NULL,
"WCWin" BOOLEAN default NULL,
"LgWin" BOOLEAN default NULL,
"WSWin" BOOLEAN default NULL,
"R" smallint default NULL,
"AB" smallint default NULL,
"H" smallint default NULL,
"2B" smallint default NULL,
"3B" smallint default NULL,
"HR" smallint default NULL,
"BB" smallint default NULL,
"SO" smallint default NULL,
"SB" smallint default NULL,
"CS" smallint default NULL,
"HBP" smallint default NULL,
"SF" smallint default NULL,
"RA" smallint default NULL,
"ER" smallint default NULL,
"ERA" decimal(4,2) default NULL,
"CG" smallint default NULL,
"SHO" smallint default NULL,
"SV" smallint default NULL,
"IPouts" int default NULL,
"HA" smallint default NULL,
"HRA" smallint default NULL,
"BBA" smallint default NULL,
"SOA" smallint default NULL,
"E" int default NULL,
"DP" int default NULL,
"FP" decimal(5,3) default NULL,
"name" varchar(50) default '' NOT NULL,
"park" varchar(255) default NULL,
"attendance" int default NULL,
"BPF" int default NULL,
"PPF" int default NULL
);
SET TABLE "Teams_TXT" SOURCE "Teams.TXT";
CREATE TABLE "Teams" (
"idxTeams" IDENTITY,
"yearID" smallint default 0 NOT NULL,
"lgID" char(2) default '' NOT NULL,
"idxTeamsFranchises" int default NULL,
"divID" char(1) default NULL,
"Rank" smallint default 0 NOT NULL,
"G" smallint default NULL,
"Ghome" int default NULL,
"W" smallint default NULL,
"L" smallint default NULL,
"DivWin" BOOLEAN default NULL,
"WCWin" BOOLEAN default NULL,
"LgWin" BOOLEAN default NULL,
"WSWin" BOOLEAN default NULL,
"R" smallint default NULL,
"AB" smallint default NULL,
"H" smallint default NULL,
"2B" smallint default NULL,
"3B" smallint default NULL,
"HR" smallint default NULL,
"BB" smallint default NULL,
"SO" smallint default NULL,
"SB" smallint default NULL,
"CS" smallint default NULL,
"HBP" smallint default NULL,
"SF" smallint default NULL,
"RA" smallint default NULL,
"ER" smallint default NULL,
"ERA" decimal(4,2) default NULL,
"CG" smallint default NULL,
"SHO" smallint default NULL,
"SV" smallint default NULL,
"IPouts" int default NULL,
"HA" smallint default NULL,
"HRA" smallint default NULL,
"BBA" smallint default NULL,
"SOA" smallint default NULL,
"E" int default NULL,
"DP" int default NULL,
"FP" decimal(5,3) default NULL,
"name" varchar(50) default '' NOT NULL,
"park" varchar(255) default NULL,
"attendance" int default NULL,
"BPF" int default NULL,
"PPF" int default NULL
);
INSERT INTO "Teams" SELECT * FROM "Teams_TXT";
DROP TABLE "Teams_TXT";
CREATE TEXT TABLE "TeamsFranchises_TXT" (
"idxTeamsFranchises" IDENTITY,
"franchID" char(3) default ''NOT NULL,
"franchName" varchar(50) default '' NOT NULL,
"active" CHAR(2) default 'Y' NOT NULL,
"NAassoc" char(3) default NULL
);
SET TABLE "TeamsFranchises_TXT" SOURCE "TeamsFranchises.TXT";
CREATE TABLE "TeamsFranchises" (
"idxTeamsFranchises" IDENTITY,
"franchID" char(3) default ''NOT NULL,
"franchName" varchar(50) default '' NOT NULL,
"active" CHAR(2) default 'Y' NOT NULL,
"NAassoc" char(3) default NULL
);
INSERT INTO "TeamsFranchises" SELECT * FROM "TeamsFranchises_TXT";
DROP TABLE "TeamsFranchises_TXT";
CREATE TEXT TABLE "TeamsHalf_TXT" (
"idxTeamsHalf" IDENTITY,
"idxTeams" int default 0 NOT NULL,
"Half" CHAR(1) default '' NOT NULL,
"DivWin" BOOLEAN default 'N' NOT NULL,
"Rank" smallint default 0 NOT NULL,
"G" smallint default NULL,
"W" smallint default NULL,
"L" smallint default NULL
);
SET TABLE "TeamsHalf_TXT" SOURCE "TeamsHalf.TXT";
CREATE TABLE "TeamsHalf" (
"idxTeamsHalf" IDENTITY,
"idxTeams" int default 0 NOT NULL,
"Half" CHAR(1) default '' NOT NULL,
"DivWin" BOOLEAN default 'N' NOT NULL,
"Rank" smallint default 0 NOT NULL,
"G" smallint default NULL,
"W" smallint default NULL,
"L" smallint default NULL
);
INSERT INTO "TeamsHalf" SELECT * FROM "TeamsHalf_TXT";
DROP TABLE "TeamsHalf_TXT";
---- now let's clean up the database file
shutdown compact;
|
Open your new databae and the SQL window. Tools>SQL. Copy and paste the above script into the window and execute it.
When the script is done runining close the database window, since I shutdown the engine and reopen it. All but three tables are now in Base tables. (batting, Fielding, Transactions the first two are easy, the last has data problems..darn MySQL lax data constraints anyway)
For the other three tables you need to proceed slightly differently because of some default setting on Base...
I will put up the steps and code these in the next message...
Drew |
|
| Back to top |
|
 |
DrewJensen Super User


Joined: 06 Jul 2005 Posts: 2599 Location: Cumberland, MD
|
Posted: Tue Oct 25, 2005 11:15 am Post subject: |
|
|
Ok, so let's look at these last three tables.
But before we do -
I see an error - I forgot to move the FieldingOF table in last time - so copy this into the sql window and exectue it
| Code: |
CREATE TEXT TABLE "FieldingOF_TXT" (
"idxFieldingOF" IDENTITY,
"idxFielding" int default 0 NOT NULL,
"Glf" int default NULL,
"Gcf" int default NULL,
"Grf" int default NULL
);
SET TABLE "FieldingOF_TXT" SOURCE "FieldingOF.txt";
CREATE TABLE "FieldingOF" (
"idxFieldingOF" IDENTITY,
"idxFielding" int default 0 NOT NULL,
"Glf" int default NULL,
"Gcf" int default NULL,
"Grf" int default NULL,
PRIMARY KEY ("idxFieldingOF")
);
CREATE INDEX "FLDOF_idxFielding" ON "FieldingOF" ("idxFielding","idxFieldingOF");
INSERT INTO "FieldingOF" SELECT * FROM "FieldingOF_TXT";
DROP TABLE "FieldingOF_TXT";
|
Now onto the larger tables.
Batting and Fielding first.
The problem here is that with the default settings that Base uses for HSQLDB there is not enough cache space to open the text table - because the whole table must fit in memory, and then do an insert into command against the native table format table. Since the insert statement does not issue a commit until it is finished all of the data again must fit in memory, along with the roll back table.
Therefore we take a slightly different approach.
Open the database you created in the last message and again open the SQL window. Copy and paste these command into it and execute them:
| Code: |
--- create the batting table
CREATE TABLE "Batting" (
"idxBatting" IDENTITY,
"idxLahman" int default 0 NOT NULL,
"stint" int default 0 NOT NULL,
"idxTeams" int default 0 NOT NULL,
"G" smallint default NULL,
"AB" smallint default 0,
"R" smallint default NULL,
"H" smallint default NULL,
"2B" smallint default NULL,
"3B" smallint default NULL,
"HR" smallint default 0 NOT NULL,
"RBI" smallint default NULL,
"SB" smallint default NULL,
"CS" smallint default NULL,
"BB" smallint default NULL,
"SO" smallint default NULL,
"IBB" smallint default NULL,
"HBP" smallint default NULL,
"SH" smallint default NULL,
"SF" smallint default NULL,
"GIDP" smallint default NULL,
PRIMARY KEY ("idxBatting")
);
CREATE INDEX "Batting_idxBatting" ON "Batting" ("idxBatting","idxLahman");
CREATE INDEX "Batting_idxTeams" ON "Batting" ("idxTeams","idxLahman");
CREATE INDEX "Batting_idxLahman" ON "Batting" ("idxLahman","stint","idxTeams");
-- fielding table
CREATE TABLE "Fielding" (
"idxFielding" IDENTITY,
"idxLahman" int default 0 NOT NULL,
"stint" int default 0 NOT NULL,
"idxTeams" int default 0 NOT NULL,
"POS" char(2) default '' NOT NULL,
"G" smallint default NULL,
"GS" int default NULL,
"Innouts" int default NULL,
"PO" smallint default NULL,
"A" smallint default NULL,
"E" smallint default NULL,
"DP" smallint default NULL,
"PB" int default NULL,
"ZR" double default NULL,
PRIMARY KEY ("idxFielding")
);
CREATE INDEX "FLD_idxFielding" ON "Fielding" ("idxFielding","idxLahman","stint","idxTeams");
CREATE INDEX "FLD_idxLahman" ON "Fielding" ("idxLahman","stint","idxTeams");
CREATE INDEX "FLD_idxTeams" ON "Fielding" ("idxTeams","idxLahman");
CREATE INDEX "FLD_idxLahman_2" ON "Fielding" ("idxLahman","POS");
CREATE INDEX "FLD_idxTeams_2" ON "Fielding" ("idxTeams","POS");
-- and our real problem table transactions
CREATE TABLE "Transactions" (
"tranID" IDENTITY,
"date" date default '0000-00-00' NOT NULL,
"transTime" varchar(55) default NULL,
"approxIndic" char(2) default NULL,
"retroTranID" int default 0 NOT NULL,
"retroID" varchar(255) default '' NOT NULL,
"type" varchar(5) default '' NOT NULL,
"teamIDfrom" varchar(55) default NULL,
"lgIDfrom" varchar(55) default NULL,
"teamIDto" varchar(55) default NULL,
"lgIDto" varchar(55) default NULL,
"draftType" varchar(5) default '' NOT NULL,
"draftRound" int default 0 NOT NULL,
"pickNumber" int default 0 NOT NULL,
"info" varchar(255) default NULL
);
CREATE INDEX "Trans_teamFrom" ON "Transactions" ("teamIDfrom","lgIDfrom");
CREATE INDEX "Trans_teamTo" ON "Transactions" ("teamIDto","lgIDto");
CREATE INDEX "Trans_date" ON "Transactions" ("date","teamIDto");
CREATE INDEX "Trans_date2" ON "Transactions" ("date","teamIDfrom");
|
If you don't see the tables in the tables list just refresh your view, they are there.
Create a second database in Base. This time however on the fist screen of the database wizard select;
Connect to exisiting database:
In the drop down box below this line select a databse type of TEXT
Go to the next screen
On the next screen which you can see here
http://www.paintedfrogceramics.com/OpenOffice/baseball/baseball2.png
you will set the location to the driectory where your .txt files are located.
Set the other properties - Field seperator, etc as needed.
Now save the database with any name you like - DON"T register it if you like.
Here is my screen with both database files open, side by side
http://www.paintedfrogceramics.com/OpenOffice/baseball/baseball3.png
The database on the left STATS is the Base HSQLDB databae the databae Baseball1 is the TEXT database. Notice the the tables are almost the same, I have a few txt files in the directory holding some intermediar data I was working on.
Anyway, now all you need to do is grad the table from the database on the right and drag it to the database on the left. Doing the batting table first. When you drop the table object onto the table icon in the target database the"Copy table' wizard will open up.
Since the two tables have the same name leave that property alone. But in the radio buttons choose "Attach data". (another misnomer I think - it really should read append data - but so be it)
http://www.paintedfrogceramics.com/OpenOffice/baseball/baseball4.png
Now on the next page of the wizard:
http://www.paintedfrogceramics.com/OpenOffice/baseball/baseball5.png
you see all the columns from or TEXT table - because the data files did not include headers the columns are just named COL1...COLx
On the right side is the columns in our target table.
We could if we wanted to at this point assign column transfers in different orders, or say if the input file was being split into multiple tables in our target database we could uncheck those not needed.
For our purposes just leave the defaults and Click create.
Patience...remember I created indexes so it takes a few minutes to finish on my machine..
OK, now all the data has been transfered.
Actually we could have done this for all our tables, I just decided to do the smaller ones in a script.
Now do the same for the pitching table.
If you want to check that everthing came over you can do so by review both databases, or juse running select statement with a count. In the case of the two tables there should be
For exampe we have 85978 records for Batting in both databases, so pretty sure our records came over OK..well, sure enough for now.
Now try the same thing with the Transactions table...
OOPS...doesn't work..if you tell it to continue after the warning about a data error you will get I believe 689 records...meaning you missed 56,999 records....ouch...
/rant on
Would of been nice in this situation to bring in the other records it could read, rather then stoping on the first bad record...
/rant off
OK..so in the next post we talk about the transaction table..
Then I will go over why I did it all this way...and maybe other ways we could of gone.
Drew |
|
| Back to top |
|
 |
Hemo2 General User

Joined: 20 Oct 2005 Posts: 6
|
Posted: Tue Oct 25, 2005 7:18 pm Post subject: |
|
|
Wow Drew! You've gone above and beyond with your reply! I'm going through it all now and even though I don't necessarily understand some of the statements or syntax, I understand enough to see what you are doing and why. I understand what you said about the larger tables and not being able to use the script to bring them in. I imagine there isn't a 'magic size' for this and a person has to simply try it with the scripting code first and if it fails, then do it the manual way you explain.
You've put a lot of time into replying to me and I can't thank you enough! Your reply is very detailed and I "think" I can follow your example of how to do this for future reference by using scripting. If not, I think I can at least get it done the manual way. Again, thank you for helping with this.
I did notice a couple of things that I see that look different to me and since I don't know any better, I thought I'd ask about them.
In the tables "AllStar", "AwardsVotes", "AwardsWinners" where you are creating the text & non-text tables for each, you have a statement of: "PRIMARY KEY ("fieldname") but this statement is missing from the other tables getting created. I'm guessing, but Is this Primary Key statement supposed to be there for these 3 tables?
For the "ManagersHalf" table, when you get to your CREATE INDEX statement, it is being done ON "ManagersHalf_TXT". The CREATE INDEX statement for all the other tables appear to not get done on the 'text' version of the table. Should the CREATE INDEX statement for the ManagersHalf table be done ON "ManagersHalf"?
The other thing I noticed was the table "Managers" didn't get created in your example code. (You've probably gotten carpal tunnel syndrome by now with typing as much as you have in your reply to me so I can understand how this table was over looked!)
I've tried to follow your example and have come up with the below for the statements to bring in the Managers table. Does this look correct?
| Code: | CREATE TEXT TABLE Managers_TXT (
"idxManagers" IDENTITY,
"idxLahman" int default 0 NOT NULL,
"idxTeams" int default 0 NOT NULL,
"inseason" smallint default 1 NOT NULL,
"G" smallint default 0 NOT NULL,
"W" smallint default 0 NOT NULL,
"L" smallint default 0 NOT NULL,
"rank" smallint default 0 NOT NULL,
"plyrMgr" BOOLEAN default NULL,
);
SET TABLE "Managers_TXT" SOURCE "Managers.TXT";
CREATE TABLE Managers (
"idxManagers" IDENTITY,
"idxLahman" int default 0 NOT NULL,
"idxTeams" int default 0 NOT NULL,
"inseason" smallint default 1 NOT NULL,
"G" smallint default 0 NOT NULL,
"W" smallint default 0 NOT NULL,
"L" smallint default 0 NOT NULL,
"rank" smallint default 0 NOT NULL,
"plyrMgr" BOOLEAN default NULL,
);
CREATE INDEX "Mngrs_idxTeams" ON "Managers"
("idxTeams","inseason");
CREATE INDEX "Mngrs_idxLahman" ON "Managers"
("idxLahman","idxTeams","inseason");
INSERT INTO "Managers" SELECT * FROM "Managers_TXT";
DROP TABLE "Managers_TXT"; |
I took a quick look at the "Transactions.txt" file at lines 689 and 690. Assuming these are the records that gave the error, I don't see why that would happen as they seem to be the same as some of other lines in this file. I haven't tried bringing this in yet though.
Keith |
|
| Back to top |
|
 |
DrewJensen Super User


Joined: 06 Jul 2005 Posts: 2599 Location: Cumberland, MD
|
Posted: Wed Oct 26, 2005 2:25 am Post subject: |
|
|
Before we talk about the data...the important stuff..
Darn - I said the Astros would do it in 4, just didn't think they would 'do it' this way...
I am really getting old...around 1:30 AM I stretched out on the couche...told myslef..don't do this..your gonna miss the win..and yup..missed the whole 14th inning..
You are right on both accounts, ManagersHalf was a typo (the index should of been on MangerHalf not ManagerHalf_TXT) and Manager import was a miss. Did on my machine, and somehow cut it our of the script...
As for the transactions table...here is the problem:
The second column is a date field. In HSQSLDB (and many other DB - not MySQL apparaently - although it could of been a problem in the extract routine) a valid date must have a valid year, month and day. But in the import data there are 4,100 and change records that are of the format:
1899-00-00 or 1934-04-00
I can understand that given the nature of the subject perhaps sometimes they don;t know the month or the actual day of a trade...but zeros is not acceptable to this data engine.
Anyway - I started a script that pulls these out and then imports just the 49,000+ valid records..when I managed to cause the first ever Microsoft Viisuall C++ runtime error message I have seen in OpenOffice. Which was just about the time the game was starting...so...a guy has to have priorities...
Drew |
|
| Back to top |
|
 |
Hemo2 General User

Joined: 20 Oct 2005 Posts: 6
|
Posted: Wed Oct 26, 2005 8:49 am Post subject: |
|
|
| Thanks for the explanation on the transactions table Drew. It makes perfect sense now. I think for my purposes I could maybe just have that transaction date field created as a text style field and not a date field. It would be nice if it were formatted correctly as a date, but with what I would use this field for would be nothing more than to display it for information purposes so a text type field would probably work for me. |
|
| Back to top |
|
 |
DrewJensen Super User


Joined: 06 Jul 2005 Posts: 2599 Location: Cumberland, MD
|
Posted: Wed Oct 26, 2005 9:47 am Post subject: |
|
|
Hemo,
No problem. Truth is I have been looking for a dataset to use for some timing test scripts. This turned out to be just the kind of thing I was looking for.
I also wanted to get a handle on importing issues - again a good data set.
First thing I'm thinking of doing is making a 'bullet' proof conversion script that automates all the steps needed and then posting this up the board the data is found on. There may be others that would want this in Base.
Drew |
|
| Back to top |
|
 |
DrewJensen Super User


Joined: 06 Jul 2005 Posts: 2599 Location: Cumberland, MD
|
Posted: Wed Oct 26, 2005 2:11 pm Post subject: |
|
|
Hemo,
Don't know if you have had any luck with the Transaction table just making the data column a varchar(10). I had the same idea, but the table is still too big (and wide) to bring in with one gulp - Base runs out of memory.
So either chop the file into smaller peices..or use a few lines of code to bring in chuncks of records. I am going to do the latter later tonight..
For now I am into the evening chill (freezing and snow here in the Allegany mtns today) with my nephew to watch a Halloween parade...hope they have hot apple cider for the crowd..
Drew |
|
| Back to top |
|
 |
Hemo2 General User

Joined: 20 Oct 2005 Posts: 6
|
Posted: Wed Oct 26, 2005 6:00 pm Post subject: |
|
|
| DrewJensen wrote: | Hemo,
.
First thing I'm thinking of doing is making a 'bullet' proof conversion script that automates all the steps needed and then posting this up the board the data is found on. There may be others that would want this in Base.
Drew |
That's a good idea Drew. I can't imagine you or I are the only baseball fans around that would be interested in playing with this data in Base. The amount of data from this project is a goldmine for baseball stats fans. I know the folks in charge of this project may change some of the structure for the next release, but they've yet to make any decisions. I hope to provide a free MS Access version of this data with some basic queries, reports, and forms but for those that don't have MS Access, getting it in a free program like Base would seem to be an excellent option.
Keith |
|
| Back to top |
|
 |
|
|
You cannot post new topics in this forum You cannot reply to topics in this forum You cannot edit your posts in this forum You cannot delete your posts in this forum You cannot vote in polls in this forum
|
Powered by phpBB © 2001, 2005 phpBB Group
|