for purpose of reference: migrate a MediaWiki

Posted on March 10, 2009. Filed under: Debian, HowTo, sys admin

Recently, I had to move a MediaWiki from one server to another. Additionally, the target server had to run a different MediaWiki version than the source server, because the source ran Debian Etch while the target ran Debian Lenny.

Switching either box to the other one's Debian release, just to work around the diverging version numbers, was not possible without risk. Of course, it might have been possible to go to the trouble of installing MediaWiki from scratch on either box, but that is not the Debian way. I wanted to do the migration without messing up either Debian system.

This post is intended as a reference to come back to whenever you need to migrate a MediaWiki under similar circumstances: from one box to another, from one version to a different one, or both. From what I have read, I suppose it would even be possible to go downward, i.e. from a newer MediaWiki version to an older one.

There are several 'the easy way' descriptions around on how to do the migration smoothly, but none of them fully worked for me. In particular, the detailed instructions linked from the official MediaWiki "Moving a wiki" manual page produced a corrupted MediaWiki database on the target system, so I had to start over. The descriptions I found through the official manual also often relied on mysqldump, which I am unfamiliar with, so I had to look up instructions for that too.

In the end, I found a good guideline that relies on MediaWiki's built-in functionality, such as exporting and importing pages via the MediaWiki web front end. Surprisingly, I found this guideline through Google, not through the MediaWiki manual. Someone might want to fix that.

But that guideline still didn't get me all the way from start to finish; I still had to tinker. In the following instructions I stick closely to the guideline as long as it works, then switch to what I figured out myself. To keep the HowTo readable, I did not mark which parts are quoted and which are mine. A simple diff can show you that, though.

One last note: to make the post look fancier, WordPress may have converted the quotation marks and double hyphens I used throughout the HowTo into typographic characters. If a command fails, retype them as ordinary single and double quotes and double hyphens.
 

So, here’s my solution:

  1. Export The Source Wiki Pages
  2. Import The Wiki Pages To The Target Wiki
  3. Export The Source Wiki’s Images And Other Binary Data
  4. Import The Binary Data To The Target Wiki

 

Originally, the installed, content-filled MediaWiki lived on the first box, a rented remote virtual host. Throughout this article I call that box 'source', and the box I migrated to 'target'.

Export The Source Wiki Pages

  1. On the target box, install the current Debian MediaWiki package. I described how to install MediaWiki in detail a while ago, so I skip that here.
  2. Log in to both wikis as administrator. The administrator account is usually named WikiSysop rather than root or admin.
  3. In the source wiki, direct your browser to Special pages > All pages.
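  4. Next, copy the page titles listed there into a text file. Open the text editor of your choice, paste in the titles from the All pages wiki page, and save them to a file, say source_wiki_pages.txt. Note that All pages shows one namespace at a time (Main by default). If you also want templates or categories, for example, switch the namespace there. Make sure those titles carry their namespace prefix (for example Template:).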
  5. The saved text file will likely contain tabs and perhaps empty lines, but we need exactly one page title per line and no empty lines. The original guideline suggests a vi command for this find-and-replace but offers no further help on using vi, so people not used to vi may feel lost. For that reason, I came up with a workaround that can be done in a shell and has its own beauty:
    sed 's/\t/\n/g' source_wiki_pages.txt > source_wiki_pages.seded1
    followed by
    sed '/^$/d' source_wiki_pages.seded1 > source_wiki_pages.seded2.txt
    This uses the sed stream editor, which is installed on virtually every Debian system.
    The relevant file we get in the end is source_wiki_pages.seded2.txt. Re-open that in your editor.
  6. Next, in the source wiki, direct your browser to Special pages > Export pages. You'll get a page with a large text box in it.
  7. Copy and paste the contents of source_wiki_pages.seded2.txt into that text box. If you hit the export pages button now, you would get the current version of every listed wiki page. Before you do, though, read on.
    • First, you can uncheck the export only the current version of the page(s) checkbox below the text box. That gives you the full version history of every exported article. I prefer this, so that in the future I will never have to guess why someone changed an article: the revision history will tell.
      So: uncheck that box.
    • Second, you can leave out individual pages by deleting their titles from the text box. I prefer exporting all pages anyway, since that keeps all options open, and individual articles can still be deleted later through the usual wiki administration. Hence, I leave the list untouched.
  8. Hit the export pages button below the text box. This outputs the entire wiki text (including the article histories) to your browser as XML. Depending on the size of your wiki, this may be several megabytes, so use a browser that can handle such large 'pages'.
  9. Save that XML page to an XML file, say source_wiki.xml. Make sure you can access that file from the box your target wiki is running on. Let's say you put it at /home/me/source_wiki.xml, so we have a common handle for it below.
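The two sed passes from step 5 can also be combined into one pipeline. The following is a minimal sketch, assuming GNU tr and sed as shipped with Debian. The function name clean_title_list is made up for illustration. Beyond the original two passes, it also drops carriage returns and leading or trailing blanks that a browser copy-and-paste may leave behind; that is my assumption about what such a copy can contain.

```shell
# clean_title_list INPUT OUTPUT
# Hypothetical helper: turns a copy-pasted "All pages" listing into
# one page title per line, with no empty lines.
#   1. tr '\t' '\n'   -> tabs become newlines (like: sed 's/\t/\n/g')
#   2. tr -d '\r'     -> drop carriage returns from the copy
#   3. sed            -> strip leading/trailing blanks, then delete
#                        empty lines (like: sed '/^$/d')
# Blanks *inside* a title (e.g. "Main Page") are kept.
clean_title_list() {
    in="$1"
    out="$2"
    tr '\t' '\n' < "$in" \
        | tr -d '\r' \
        | sed -e 's/^[[:blank:]]*//' \
              -e 's/[[:blank:]]*$//' \
              -e '/^$/d' > "$out"
}
```

Source it into your shell and run clean_title_list source_wiki_pages.txt source_wiki_pages.seded2.txt. Afterwards, wc -l source_wiki_pages.seded2.txt should roughly match the number of pages listed on All pages.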

Now, you’re done with exporting from the source wiki. Let’s continue with importing its contents to the target one.

Originally, this point (switching over to actually importing the wiki on the target box) was where I got stuck. The guideline suggests directing the browser to Special pages > Import pages in the target wiki and simply uploading the file we have just exported. That may work for small wikis, or when exporting without version history. For my wiki of a few dozen pages including version history, it simply didn't work: the file of a few megabytes hit the maximum upload limit. (And yes, I double-checked the preset maximum upload limit, which should have been sufficient, but it still didn't work.)

Therefore, what follows is what I figured out myself. (That is also why I insisted on making the exported pages file accessible from the target box rather than putting it somewhere else.)
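If you want to know beforehand whether the web upload route has a chance at all, you can compare the export's size with PHP's upload limits. This is a sketch under several assumptions:

- The web server's PHP reads /etc/php5/apache2/php.ini (the usual location on Debian Lenny with Apache; command-line PHP reads a different file).
- Only upload_max_filesize and post_max_size are checked.
- MediaWiki or the web server may impose further limits that this check does not see.

The function names are made up for illustration.

```shell
# to_bytes VALUE -- convert php.ini shorthand (e.g. 8M, 512K, 1G) to bytes
to_bytes() {
    v=$(printf '%s' "$1" | tr -d '[:space:]')
    case "$v" in
        *[Gg]) echo $(( ${v%?} * 1024 * 1024 * 1024 )) ;;
        *[Mm]) echo $(( ${v%?} * 1024 * 1024 )) ;;
        *[Kk]) echo $(( ${v%?} * 1024 )) ;;
        *)     echo "$v" ;;
    esac
}

# ini_value INI KEY -- print the value of the last uncommented KEY line
# (lines starting with ';' are comments in php.ini and do not match)
ini_value() {
    sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*//p" "$1" | tail -n 1
}

# fits_upload_limits FILE [INI] -- return 0 if FILE is not larger than
# upload_max_filesize and post_max_size; INI defaults to the assumed
# Debian Lenny Apache path.
fits_upload_limits() {
    ini="${2:-/etc/php5/apache2/php.ini}"
    size=$(wc -c < "$1")
    for key in upload_max_filesize post_max_size; do
        raw=$(ini_value "$ini" "$key")
        if [ -z "$raw" ]; then
            echo "$key not set in $ini, skipping" >&2
            continue
        fi
        limit=$(to_bytes "$raw")
        if [ "$size" -gt "$limit" ]; then
            echo "$1: $size bytes exceeds $key ($limit bytes)" >&2
            return 1
        fi
    done
    return 0
}
```

fits_upload_limits /home/me/source_wiki.xml prints a message and returns non-zero if the file is too large. Multipart form encoding adds some overhead, so a file just below post_max_size may still be rejected.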

Import The Wiki Pages To The Target Wiki

The following steps require root privileges on both boxes, or at least running the commands via sudo. I presume you know the risks of working as root in a shell, such as typos and other mishaps.

  1. Log in to your target box (via SSH or locally), open a command-line shell and become root.
  2. Run
    php /usr/share/mediawiki/maintenance/importDump.php /home/me/source_wiki.xml
    This imports the contents exported from the source wiki, but it may take some time. You might want to get yourself a fresh hot coffee or see what's new on Hacker News in the meantime.
  3. The output of importDump.php suggests running rebuildrecentchanges.php, so let's do just that:
    php /usr/share/mediawiki/maintenance/rebuildrecentchanges.php
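Steps 2 and 3 can be wrapped in a small function so that the import stops early if the dump file is missing. This is a sketch assuming the Debian package layout used above (/usr/share/mediawiki/maintenance). MW_MAINT and DRY_RUN are hypothetical variables of this sketch, not MediaWiki settings. With DRY_RUN=1, the function only prints the commands instead of running them.

```shell
# Maintenance directory of the Debian MediaWiki package (assumed path).
MW_MAINT="${MW_MAINT:-/usr/share/mediawiki/maintenance}"

# run CMD... -- execute CMD, or just print it when DRY_RUN=1
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# import_pages DUMP.xml -- import the XML export, then rebuild
# the recent changes list as importDump.php's output suggests
import_pages() {
    dump="$1"
    if [ ! -r "$dump" ]; then
        echo "cannot read dump file: $dump" >&2
        return 1
    fi
    run php "$MW_MAINT/importDump.php" "$dump" || return 1
    run php "$MW_MAINT/rebuildrecentchanges.php"
}
```

Running DRY_RUN=1 import_pages /home/me/source_wiki.xml first shows the two commands. Without DRY_RUN, run it as root, as described above.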

Next, we need to get the images, mathematical formulae and other binary files we might have uploaded to our source wiki:

Export The Source Wiki’s Images And Other Binary Data

  1. Just as with the target box before, log in to your source box, open a command-line shell and become root.
  2. Build a compressed archive of the binary uploads from the source wiki:
    tar cjf /home/me/mediawiki-files.tar.bz2 /var/lib/mediawiki1.7/upload/
    Note: on MediaWiki 1.7 this directory was .../upload/, while on MediaWiki 1.12 I found it to be .../images/. Running ls -l /var/lib/mediawiki1.7/ shows which one applies to your installation.
  3. Copy that tar file from the source box to the target box, for example like this:
    scp source.box:/home/me/mediawiki-files.tar.bz2 target.box:/home/me/mediawiki-files.tar.bz2
    When doing so, make sure no firewall is in the way and SSH works on all involved machines (including the one you issue the command from).
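The upload directory's name differs between versions (upload/ on 1.7, images/ on 1.12), so a small helper can pick whichever exists before packing. This is a sketch assuming the Debian layout /var/lib/mediawikiX.Y/. The function names are hypothetical. The archive stores paths relative to /, just like the tar cjf command in step 2, so it unpacks the same way later on.

```shell
# find_upload_dir BASE -- print BASE/upload or BASE/images,
# whichever exists (checked in that order)
find_upload_dir() {
    for d in "$1/upload" "$1/images"; do
        if [ -d "$d" ]; then
            echo "$d"
            return 0
        fi
    done
    echo "no upload/ or images/ directory below $1" >&2
    return 1
}

# pack_uploads BASE ARCHIVE -- bzip2-compressed tar of the upload
# directory; "-C /" plus a path without the leading slash yields
# members like var/lib/mediawiki1.7/upload/...
pack_uploads() {
    dir=$(find_upload_dir "$1") || return 1
    tar -cjf "$2" -C / "${dir#/}"
}
```

For example, as root on the source box: pack_uploads /var/lib/mediawiki1.7 /home/me/mediawiki-files.tar.bz2. Then copy the archive with scp as shown in step 3.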

Now, we need to import those data to the target box:


Import The Binary Data To The Target Wiki

Go back to the target box shell.

  1. Go to /tmp/, untar the binary files archive there:
    cd /tmp/
    tar xjf /home/me/mediawiki-files.tar.bz2
    You’ll get a sub-tree named /tmp/var/ where all the binary files are scattered around.
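  2. Gather all the files from the extracted upload directory into /tmp/var/ and remove the now-empty subtree:
    find /tmp/var/lib/mediawiki1.7/upload/ -type f -exec mv -t /tmp/var/ {} + && rm -rf /tmp/var/lib
    (find -exec hands the file names to mv directly. Unlike a backtick command substitution, it cannot trip over unusual file names or an overlong argument list. mv -t is a GNU coreutils option, which Debian provides. If your source wiki used .../images/ rather than .../upload/, adjust the path.)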
  3. Now, still on the target box's command line, tell MediaWiki to import all those files:
    php /usr/share/mediawiki/maintenance/importImages.php --overwrite --user=WikiSysop /tmp/var/
    Compared to importing the wiki text, this should be relatively quick.
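The gathering in step 2 can be made a bit more selective. The sketch below goes beyond the steps above in one respect. It skips the thumb/, archive/ and temp/ subdirectories of the upload directory, on the assumption that in a standard MediaWiki setup they hold generated thumbnails, older file versions and temporary files. Otherwise importImages.php would import a thumbnail such as 120px-Foo.png as a wiki file of its own. Skipping archive/ means older versions of uploaded files are not carried over. gather_uploads and the directory /tmp/mw-import are hypothetical names.

```shell
# gather_uploads SRC DEST -- move every uploaded file below SRC into
# the flat directory DEST (which must not lie inside SRC), skipping
#   */thumb/*   generated thumbnails (assumed layout)
#   */archive/* older versions of files (assumed layout)
#   */temp/*    temporary uploads (assumed layout)
#   .*          dot files such as .htaccess, if present
# Note: the path patterns match anywhere in the full path, so SRC
# itself must not contain e.g. a /temp/ component.
gather_uploads() {
    src="$1"
    dest="$2"
    mkdir -p "$dest" || return 1
    # "-exec ... {} +" makes find exit non-zero if mv fails (GNU mv -t)
    find "$src" -type f \
        ! -path '*/thumb/*' ! -path '*/archive/*' ! -path '*/temp/*' \
        ! -name '.*' \
        -exec mv -t "$dest" {} +
}
```

Usage on the target box, as root:
    cd /tmp && tar xjf /home/me/mediawiki-files.tar.bz2
    gather_uploads /tmp/var/lib/mediawiki1.7/upload /tmp/mw-import
    php /usr/share/mediawiki/maintenance/importImages.php --overwrite --user=WikiSysop /tmp/mw-import/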

And that’s it. You’re done.

Phew!

Now clean up after yourself. Remove all the temporary files we created along the way; don't forget the /tmp/var/ tree we created as root. Close the shells and windows you no longer need, and make sure you are logged in to the wikis as a regular user again rather than as WikiSysop.
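The clean-up can be scripted as well. This is a minimal sketch with hypothetical function names. The guard in cleanup_migration only refuses an empty argument or a bare /, so double-check the paths you pass. fix_upload_owner applies the ownership change suggested in the reader comment on this post. Its default owner, www-data:www-data, is the Debian Apache user and group.

```shell
# cleanup_migration PATH... -- remove temporary files and directories
# left over from the migration. Minimal guard: refuses "" and "/" only.
cleanup_migration() {
    for p in "$@"; do
        case "$p" in
            "" | /)
                echo "refusing to remove '$p'" >&2
                return 1
                ;;
        esac
        rm -rf -- "$p"
    done
}

# fix_upload_owner DIR [OWNER] -- give the upload directory to the web
# server user (default www-data:www-data) so MediaWiki can write
# thumbnails there
fix_upload_owner() {
    chown -R -- "${2:-www-data:www-data}" "$1"
}
```

For example, as root on the target box: cleanup_migration /tmp/var /home/me/source_wiki.xml /home/me/mediawiki-files.tar.bz2, followed by fix_upload_owner /var/lib/mediawiki/images.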

Enjoy your achievement by pointing your browser at your newly set up target wiki. Better still, open a page there that contains binary content, say a picture, to convince yourself.

Have fun! :) Enjoy the day!
 

_____
NB: I hope this was helpful for you.

One Response to “for purpose of reference: migrate a MediaWiki”

After importing the images, you might need to run "chown -R www-data:www-data /var/lib/mediawiki/images" as root so the wiki can create image previews and thumbnails.

