How to migrate oozie from derby to mysql — an easy way

Апр 07 2014 Published by under BigData

Some time ago, I faced the problem with our oozie installation running on top of DerbyDB — it becomes too slow and had lots of errors with DB locks. Fast search shows that derby (default oozie backend DB) isn’t good for relatively large systems, so migration to mysql is required. The switch itself is quite straightforward, but if you have existing data in derby, it can be painful to move them into mysql. But, if you know exact steps, the whole process is not hard. Ok, let’s go.

Preparation

First, you need mysql installed with ability to connect from host with oozie master. I won’t cover it here — there are plenty of tutorials available.

Oozie uses jdbc, so, you’ll need proper connector for mysql. Grab it here. From archive, you’ll need only jar file. Place it in libext folder of oozie and rebuild war file with (you need to stop oozie first)

bin/oozie-setup.sh prepare-war

Then modify config file (but make copy of old config file, just in case). There are four options related to DB connectivity:

  1. oozie.service.JPAService.jdbc.driver, set it to com.mysql.jdbc.Driver
  2. oozie.service.JPAService.jdbc.url set it to jdbc:mysql://MYSQL_HOST/DB_NAME (for example «jdbc:mysql://localhost/oozie»)
  3. oozie.service.JPAService.jdbc.username, set it to mysql user name
  4. oozie.service.JPAService.jdbc.password for password

After that, you need to create oozie schema in mysql (and check that it can connect to DB) by issuing the command:

bin/oozie-setup.sh db create -run

Now, you almost done — everything you need is to migrate your data form Derby. But if it’s ok to start with clean oozie db, you can just skip this and start oozie.

Data dump

Ok, we get to part which caused me to write all this :). Oozie stores a lot of data in DB (too much from my point of view), so, most of DB migration tools available just don’t work. After lots of experiments, I found one which worked for me with just one small hack: JdbcTool. Grab it and unpack in a server with oozie master — it’ll need local access to derby database.

You also need to put derby jars in jdbctool’s lib directory (which allows it to use derby). To get them, download latest version of derby tarball.

After that, you can open old oozie DB with jdbctool (which provides CLI for SQL):

bin/jdbctool -u SA jdbc:derby:/path/to/oozie-db

To check that it is working, you can count rows we need to migrate:

derby:/ext/oozie-db/> select count(*) from WF_ACTIONS
----------
| 1      |
----------
| 225592 |
----------

Ok, now we dump whole oozie data into sql script with command:

bin/jdbcdump -u SA jdbc:derby:/path/to/oozie-db > oozie.sql

Be patient, it can take LOTS of time and output can be large. Our DB dump is 5GB large o_O.

Data load

Now, we just need to load data we just dumped into mysql db, and we’re done. But there is one small problem here — for some reason, jdbcdump don’t put quotes around primary key, so we need to fix this with sed:
cat oozie.sql | sed -r "/^INSERT INTO/s/VALUES \(([^,]+)/VALUES ('\1'/g" | mysql --show-warnings -uhadoop  oozie

After that, you have your data migrated. It’s time to start oozie.

No responses yet

Добавить комментарий