I'm migrating a whole bunch of legacy data into mysql for a client. I export from their system, use sed to patch the nonconforming unparseable CSV it outputs, convert from macroman to UTF8, then tell MySQL to load it all into a series of tables from which the data will later be integrated into the standard system.
Today I realised the error of my ways: the column that I was using as a primary key isn't a primary key at all (at least not in the modern sense - of having a unique value - that we've become accustomed to since Codd invented the relational model in 1969).
I could have realised this earlier by just declaring it a primary key and watching MySQL spit at me, and therefore have fixed the problem sooner ... but no, who needs tests and constraints when assumptions are so much faster?
Enough about me ... the problem now is to assign each row in the exported
data a unique id, and the easiest, obviousest way to do that is to include
an id integer primary key auto_increment
column in each table
definition, let LOAD DATA INFILE etc etc
shove the data into
the table, and the primary key takes care of itself.
But it wasn't obvious how to make this work without warnings about mismatched
columns. Here's the clever bit: what I did in the end was to add the id
column after loading the data. In other words, I imported the data
into a table with no id
primary key column; and added that
column afterwards. It's easy to script, it does exactly what I want, and
it produces no spurious warnings. Perfecto!
Here's the concept in code:
> create table legacy_stuff(leg_col_1 varchar(255), leg_col_2 varchar(255), et_cetera_1 varchar(255), et_cetera_2 varchar(255));
> load data infile 'legacy_stuff.csv' into table legacy_stuff; -- plus all your favourite options;
> show warnings;
> alter table legacy_stuff add column id integer primary key auto_increment;
Good luck...
Update: for another way to look at this, see the
stackoverflow
page on this topic. It turns out you can use load data infile
with the list of columns you want to import to, in which case MySQL will
match the columns of your CSV to your specified column list. This way you
can let your auto-increment primary key column simply manage itself. This
solution does not suit my particular situation as I'm importing tables with
hundreds of columns (they didn't believe in normalisation where this data
comes from), and I don't want to have to maintain the column list in multiple
places.
No comments:
Post a Comment
Note: Only a member of this blog may post a comment.