Migrating your blog from any BlogML-based platform to WordPress

Published on : Sep 12, 2012

Category : General



First, I need to apologise to my readers for a couple of things. One, you might have noticed a lot of updates with old posts in your feed readers recently; that’s mainly due to the migration process. Two, I had to delete hundreds of comments that had been left on the blog over the past 7 years. The main reason for simply trashing all the comments is the amount of spam I had in my blog. When I exported the blog as a BlogML file, the size of the file was around 35MB. That’s mainly due to over 60,000 comments from spammers all over the world. It was not practical to go through all of them and filter out the good ones, and I didn’t want to carry those spam comments over to my new platform. Clearing the comments brought the file size down to a mere 2MB.

Why WordPress?

Technical people like me tend to choose the platform we are familiar with (in my case Microsoft based, ASP.NET) rather than the platform that’s most mature. I moved to BlogEngine.NET some 4 years ago from BlogSpot, mainly because it was one of the familiar ASP.NET based blogging platforms available at that time. It gives you some level of comfort to work in a known environment like IIS, virtual directories etc. But honestly, I never changed a single line of code apart from the occasional HTML theme file changes.

Moving to WordPress is a whole new world with PHP and MySQL. But with my 2 weeks of experience, I’m already feeling comfortable with the rich plugin ecosystem and have managed to do everything I wanted to do. I even hired a designer and got a really cool custom theme within 4 days.

Unfortunately the migration process was not very straightforward, and none of the articles I found through search helped me achieve what I wanted. Every step resulted in my writing some small utility, so I combined all of them into a simple console application (BlogML.Helper.exe) with command line parameters to help people going through this process. At a high level, we are going to convert BlogML into a WRX file (WordPress eXtended RSS, the import file format that WordPress itself abbreviates as WXR) and import it into WordPress. I took a portion of the source code from this open source blog migrator project. Warning: I’m providing this source code and process AS-IS without any warranty :-). You can download the complete BlogEngine.NET to WordPress migration tool source code.

Do you want to keep the comments?

The very first step is to log in to your blog admin console and export your content as BlogML. In BlogEngine.NET, you go to Settings > Import/Export. Once it has downloaded, you need to decide whether to keep the existing comments or not. As I explained earlier, if you have used BlogEngine.NET there is a high chance your blog is flooded with thousands of spam comments. This is mainly due to a link building loophole in the platform that spammers exploited. This is your chance to rectify it. The tool (BlogML.Helper.exe) supports the following actions:
  • RemoveComments
  • ExportToWRX
  • QATarget
  • QASource
  • NewWRXWithOnlyFailedPosts
The first action to run is RemoveComments. Specify the action and the name of your exported file:

BlogML.Helper.exe /Action:RemoveComments /BlogMLFile:<your exported file name>

The results will show the name of each post and the number of comments that have been removed. The tool will also update the supplied file.
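If you would rather script this step yourself, here is a minimal Python sketch of the same idea. It is not the code inside BlogML.Helper.exe. It assumes the export uses the BlogML 2.0 namespace shown below and nests comments as post > comments > comment, so check both against your own file first.

```python
import xml.etree.ElementTree as ET

# Assumption: BlogML 2.0 namespace. Compare with the xmlns on your file's root element.
BLOGML_NS = "http://www.blogml.com/2006/09/BlogML"


def remove_comments(root):
    """Remove the <comments> block from every post.

    Returns {post title: number of comments removed} for posts that had any.
    """
    ns = {"b": BLOGML_NS}
    removed = {}
    for post in root.findall(".//b:post", ns):
        title = post.findtext("b:title", default="(untitled)", namespaces=ns)
        for comments in post.findall("b:comments", ns):
            count = len(comments.findall("b:comment", ns))
            removed[title] = removed.get(title, 0) + count
            post.remove(comments)
    return removed
```

To apply it to a file, call ET.register_namespace("", BLOGML_NS) so the default namespace survives, parse the export with ET.parse, pass tree.getroot() to remove_comments, and save with tree.write. Work on a copy of the export, since this throws the comments away for good.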

Correct the categories manually

There is a slight variation in the way categories are handled between BlogML and WRX, so open the BlogML file in your XML editor and manually replace the categories with a simple “find and replace”. You need to convert each GUID based category id to a text based one. Example: convert “9bdfceee-7814-4d4d-b77d-deaf893e402e” to biztalk, “1550e058-44fa-42c3-a666-5826f4c50874” to biztalk-azure, etc. You can also use this opportunity to consolidate your categories. Ex: if there are only 1 or 2 posts in a category, move those posts to a different category and delete the nearly empty one. In my opinion a blog should not have more than 10 categories. For fine grained categorisation you can use tags.
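With more than a handful of categories, the find and replace can be scripted. The sketch below is one way to do it in Python, using the two GUIDs from the example above. The function name and the mapping table are made up for illustration, and the remaining entries have to come from your own export.

```python
# The two GUIDs are the examples from this post; add one entry per category.
CATEGORY_SLUGS = {
    "9bdfceee-7814-4d4d-b77d-deaf893e402e": "biztalk",
    "1550e058-44fa-42c3-a666-5826f4c50874": "biztalk-azure",
}


def replace_category_ids(blogml_text, slugs=CATEGORY_SLUGS):
    """Plain find and replace: swap every GUID category id for its text name."""
    for guid, slug in slugs.items():
        blogml_text = blogml_text.replace(guid, slug)
    return blogml_text
```

Mapping two GUIDs to the same name is a simple way to merge small categories while you are at it.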

Upload all the images and files to WordPress

In BlogEngine.NET, all your uploaded blog images and files reside under the folder blog\App_Data\Files. Copy them across and upload them to the WordPress upload folder blog\wp-content\uploads\files.

Correct all the internal links in the blog posts

It’s much easier to correct all your internal links at this stage, since you are dealing with one single XML file. There are several kinds of links you need to worry about. Spend as much time as you need on this step to make sure you correct all of them. It can be a really time-consuming process depending on the size of your blog, but it’s much quicker to do it here than after the import. If your blog is really big, look into updating the source code to automate this task.
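Before fixing anything, it helps to see what is there. The following Python sketch lists every href and src value in the BlogML file that still points at the old blog, with a count for each. It is only an aid for the manual review, not part of the tool. It assumes post content is stored as (possibly HTML-escaped) text rather than base64, and it only catches absolute links that contain the old address.

```python
import html
import re
from collections import Counter

# Captures the value of any href="..." or src="..." attribute.
LINK_ATTR = re.compile(r'''(?:href|src)\s*=\s*["']([^"']+)["']''', re.IGNORECASE)


def internal_links(blogml_text, old_host):
    """Count each distinct link that contains the old blog address."""
    # Post bodies are often stored escaped (&lt;a href=&quot;...), so unescape first.
    unescaped = html.unescape(blogml_text)
    return Counter(link for link in LINK_ATTR.findall(unescaped) if old_host in link)
```

Calling internal_links(text, "blogs.digitaldeposit.net/saravana") on the contents of the export gives a checklist of links to work through. Relative links will not show up and need a separate search.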

Convert BlogML to WRX format

At this stage you should have a healthy BlogML file with all the links corrected and all the categories consolidated and corrected. You simply run the tool with the following command:

BlogML.Helper.exe /Action:ExportToWRX /BlogMLFile:BlogML.xml /SourceUrl:blogs.digitaldeposit.net/saravana /TargetUrl:blogs.biztalk360.com

The above command should create 4 files:
  • BlogML.WRX.xml is the WordPress extensible RSS file, ready to import.
  • BlogML.WRX.TargetQA.txt contains all the URLs on the new blog, for testing.
  • BlogML.WRX.SourceQA.txt contains all the current URLs, to check for 301 redirects.
  • BlogML.WRX.Redirect.txt contains the redirect statements for each post.

Import WRX file in WordPress

Log in to your WordPress admin page and select “Tools > Import” in the left hand navigation. The Import page comes up with a list of supported import options. Pick “WordPress” from the list. If this is your first visit, WordPress will ask you to install the plugin and activate it. Once it’s ready, it will show you the Import page. Navigate to your BlogML.WRX.xml file and click “Upload file and import”. Once the file is uploaded, you will be asked to choose the user under which you want to import the posts. You can choose an existing user or create a new one, then click import.

Dealing with Import issues

In my case the import always resulted in only a partial import of the posts. From the BlogML.WRX.TargetQA.txt file generated earlier (look at the line numbers), you will know the total number of posts present in your blog. If that doesn’t match the number of posts imported, you need to follow these steps to get all of them in. This is a bit of a time-consuming process. Run the following command:

BlogML.Helper.exe /Action:QATarget /QATargetFile:BlogML.WRX.TargetQA.txt

The tool will check each URL (your blog posts) and record the status as either “OK” or “Protocol Error”. At the end of the execution it will produce a file called BlogML.WRX.TargetQA.Report.txt. Now run the following command:

BlogML.Helper.exe /Action:NewWRXWithOnlyFailedPosts /WRXFile:BlogML.WRX.xml /QAReportFile:BlogML.WRX.TargetQA.Report.txt

This produces a new WRX file called BlogML.WRX.OnlyFailed.xml containing only the posts that resulted in “Protocol Error” (in other words, the posts that didn’t import properly). Now log in to the WordPress admin and follow the import steps using the new file BlogML.WRX.OnlyFailed.xml. You may need to repeat these steps a few times until all the posts are imported.
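For readers who want to see what such a QA pass boils down to, here is a rough Python sketch. It is not the tool’s implementation, and the function names are invented. It assumes the QA file holds one URL per line and, unlike the tool, labels every kind of failure (HTTP error, timeout, bad URL) as “Protocol Error”.

```python
import http.client
import urllib.request


def fetch_status(url, timeout=10):
    """Return "OK" if the URL can be fetched, "Protocol Error" for any failure."""
    try:
        # urlopen raises HTTPError (an OSError subclass) for 4xx/5xx responses.
        with urllib.request.urlopen(url, timeout=timeout):
            return "OK"
    except (OSError, ValueError, http.client.HTTPException):
        return "Protocol Error"


def qa_report(lines, check=fetch_status):
    """Check every non-blank line as a URL; return (url, status) pairs in order."""
    urls = (line.strip() for line in lines)
    return [(url, check(url)) for url in urls if url]


def failed_urls(report):
    """URLs whose status is anything other than "OK"."""
    return [url for url, status in report if status != "OK"]
```

Passing an open BlogML.WRX.TargetQA.txt file to qa_report checks it line by line, and failed_urls then lists the posts that still need importing. The check parameter is there so the logic can be tried with a stand-in function before hitting the live site.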

Don’t forget the 301 Redirects

It’s very important to make sure you redirect your original blog links to the new ones. Otherwise you risk losing all the Google rankings you have acquired over the years. I’m not going to explain how to do it in this article, but if you remember, the tool automatically creates a BlogML.WRX.Redirect.txt file to assist you in setting up the redirects.