How I exported my WordPress posts to Ghost

In a previous post, I explained why I chose Ghost as my new blog engine.

My first task was to export all the content I had written on my previous blog.
I didn't want to start from scratch.

My previous blog ran on the WordPress engine and was maintained by Microsoft Israel.

Export with Plugin

Browsing the web, I found that WordPress actually has a plugin that converts WordPress posts for import into Ghost.

You can find the complete instructions in Ghost for Beginners.

Unfortunately, I couldn't use it: it wasn't installed on the MS community blogs, and I didn't have permission to add plugins myself.

I had to find a different solution.

If you build it, he will come

If you can't find a solution, build one yourself. In my case, that meant a crawler.

First, I investigated the structure of the WordPress posts. Because I had used a WYSIWYG editor with a crappy code plugin, the HTML generated for code sections was a mess. Apart from that, the structure was clear, with a few exceptions such as the sharing and Facebook sections, which were unstructured.
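Stripping blocks like the sharing and Facebook sections is a typical crawler cleanup step. The crawler itself is written in C#, so the following is only a JavaScript sketch of the idea, not its actual code: it removes every div carrying a given class, together with everything nested inside it, by counting nested div tags to find the matching close. The function name and the "sharing" class in the usage are hypothetical; use whatever class names your theme's widgets actually emit.

```javascript
// Hypothetical sketch: remove every <div> whose class list contains
// `className`, including all of its nested content. A regex alone cannot
// find the matching </div> when divs are nested, so we count depth instead.
// `className` is assumed to be a plain class name (letters, digits, hyphens).
// The \b match is loose: "sharing" also matches "sd-sharing".
function removeDivsWithClass(html, className) {
  const opener = new RegExp(
    `<div\\b[^>]*\\bclass="[^"]*\\b${className}\\b[^"]*"[^>]*>`,
    "i"
  );
  let result = html;
  let start;
  while ((start = result.search(opener)) !== -1) {
    // Scan div tags from the opener onwards, tracking nesting depth
    const tags = /<\/?div\b[^>]*>/gi;
    tags.lastIndex = start;
    let depth = 0;
    let end = result.length; // unbalanced HTML: drop everything to the end
    let tag;
    while ((tag = tags.exec(result)) !== null) {
      depth += tag[0].startsWith("</") ? -1 : 1;
      if (depth === 0) {
        end = tag.index + tag[0].length;
        break;
      }
    }
    result = result.slice(0, start) + result.slice(end);
  }
  return result;
}

// Usage with a made-up sharing block
console.log(
  removeDivsWithClass(
    '<p>Post</p><div class="sharing"><div class="fb"></div></div><p>End</p>',
    "sharing"
  )
);
```

Counting depth keeps the sketch dependency-free; a real HTML parser would be more robust against malformed markup.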

Second, I had to figure out how to import the HTML into Ghost. Ghost has import/export options in its admin section: you can reach them by browsing to the /ghost/debug URL of your Ghost blog and logging in with your admin account.
I wrote some dummy posts, exported them, and found that the export was a fairly simple JSON structure containing a list of posts with the data for each post. One of the fields could also take plain HTML.
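To make that structure concrete, here is a minimal JavaScript sketch that builds an import file from a list of scraped posts. It assumes the layout of a Ghost 0.x export (a "meta" object plus a "data" object with a "posts" array). The field names, the "003" version string, millisecond timestamps and placing the raw HTML in the "markdown" field are all assumptions, so compare them with an export from your own Ghost instance before importing.

```javascript
// Sketch of a Ghost import file, ASSUMING the layout of a Ghost 0.x export.
// Field names, the version string and the timestamp format are assumptions:
// copy them from an export of your own Ghost instance.
function toGhostImport(posts, exportedOn = Date.now()) {
  return {
    meta: { exported_on: exportedOn, version: "003" },
    data: {
      posts: posts.map((post, index) => ({
        id: index + 1,
        title: post.title,
        slug: post.slug,
        // Assumption: raw HTML is valid Markdown, so the old post body can
        // go straight into the "markdown" field and render as-is.
        markdown: post.html,
        status: "published",
        published_at: post.publishedAt, // assumed: milliseconds since epoch
      })),
      tags: [],
      posts_tags: [],
    },
  };
}

// Usage: serialize to the JSON file you upload through the import option
const importFile = JSON.stringify(
  toGhostImport([
    { title: "Hello Ghost", slug: "hello-ghost", html: "<p>Hi!</p>", publishedAt: 1388534400000 },
  ]),
  null,
  2
);
console.log(importFile);
```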

The result of this research was a crawler: you give it the address of your WordPress blog, and it outputs a JSON file containing the data from your posts.
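A crawler like this starts by discovering the post URLs. As a rough JavaScript illustration (the real crawler is C#), here is a sketch that collects post permalinks from the HTML of a listing page. The /YYYY/MM/DD/slug/ permalink pattern and the findPostLinks name are hypothetical, and fetching the pages is left out so the sketch stays self-contained.

```javascript
// Hypothetical sketch of one crawler step: collect the links on a listing
// page that look like post permalinks. The /YYYY/MM/DD/slug/ pattern is an
// assumption; adjust it to the permalink structure your blog actually uses.
const PERMALINK = /^\/\d{4}\/\d{2}\/\d{2}\/([^/]+)\/?$/;

function findPostLinks(listingHtml, baseUrl) {
  const base = new URL(baseUrl);
  const found = new Map(); // absolute URL -> slug, so duplicates collapse
  const hrefPattern = /href="([^"]+)"/g;
  let match;
  while ((match = hrefPattern.exec(listingHtml)) !== null) {
    const url = new URL(match[1], base); // resolves relative links
    if (url.host !== base.host) continue; // skip external links
    const permalink = PERMALINK.exec(url.pathname);
    if (permalink) {
      url.hash = ""; // "#comments" links point at the same post
      url.search = "";
      found.set(url.href, permalink[1]);
    }
  }
  return [...found].map(([href, slug]) => ({ href, slug }));
}

// Usage with a made-up listing page
const sample = '<a href="/2014/05/01/hello-world/">Hello</a>';
console.log(findPostLinks(sample, "https://blog.example.com/"));
```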

You can find it on my GitHub: Wordpress2Ghost.

It also downloads the images in each post, but note that you will need to upload and relink the images yourself.
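Downloading the images starts with knowing which ones a post references. A minimal JavaScript sketch (not the crawler's C# code) that lists the image URLs in a post's HTML and maps each one to a local path; the collectImages name and the "images/" folder layout, which mirrors the original URL path to avoid name collisions, are hypothetical choices.

```javascript
// Sketch: list the images a post references so they can be downloaded.
// The local layout ("images" + original URL path) is a hypothetical choice
// that keeps files with the same name in different folders apart.
function collectImages(postHtml, postUrl) {
  const images = [];
  const imgPattern = /<img\b[^>]*\bsrc="([^"]+)"/gi;
  let match;
  while ((match = imgPattern.exec(postHtml)) !== null) {
    const url = new URL(match[1], postUrl); // resolves relative src values
    images.push({
      url: url.href,
      localPath: "images" + decodeURIComponent(url.pathname),
    });
  }
  return images;
}

// Usage with a made-up post
console.log(
  collectImages(
    '<p><img src="/wp-content/uploads/cat.png" alt=""></p>',
    "https://blog.example.com/2014/05/01/post/"
  )
);
```

The actual download, upload to Ghost and relinking still have to happen separately.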

You can then upload the JSON to Ghost through the import option in the same admin panel.
Note that the import never overwrites or updates existing posts; it only adds new ones.
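Because the import only adds posts, running it twice with the same file can leave you with duplicates. One simple guard, sketched here in JavaScript under the assumption that slugs identify posts and that you can get the list of slugs already on the blog (for example from a fresh export), is to drop the posts that already exist before building the next import file. The onlyNewPosts name is hypothetical.

```javascript
// Sketch: keep only the posts whose slug is not already on the blog.
// Assumes slugs uniquely identify posts; existingSlugs could come from
// a fresh export of the Ghost blog (hypothetical workflow).
function onlyNewPosts(posts, existingSlugs) {
  const existing = new Set(existingSlugs);
  return posts.filter((post) => !existing.has(post.slug));
}

// Usage
console.log(onlyNewPosts([{ slug: "hello-ghost" }, { slug: "second-post" }], ["hello-ghost"]));
```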

HTML to Markdown

After importing my posts, I found that the crappy HTML made the old posts unpleasant to read.
So I started editing the posts myself, and along the way proofread and patched them up, both in language and in meaning.

After a couple of days of editing with no end in sight, I started to notice patterns in my work, so I searched a bit and found to-markdown, a JavaScript-based HTML-to-Markdown converter.

But in order to use it, I had to port it to C# (since the crawler was written in C#).

The result: https://github.com/ysa23/HtmlToMarkdown

I combined it with the crawler, which made converting and editing the posts a bit easier.
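The core idea of such a converter is a set of rules, one per HTML element, each mapping the element to its Markdown equivalent. The toy JavaScript sketch below illustrates that idea with regular expressions; it is not how to-markdown or my C# port work internally, and it only copes with flat, well-formed HTML.

```javascript
// Toy HTML-to-Markdown converter: one regular-expression rule per element.
// NOT to-markdown's implementation; it handles flat, well-formed HTML only.
const RULES = [
  // Headings: <h2>Title</h2> -> ## Title
  [/<h([1-6])[^>]*>([\s\S]*?)<\/h\1>/gi, (_m, level, text) => `\n\n${"#".repeat(Number(level))} ${text}\n\n`],
  // Bold and italic (tags without attributes)
  [/<(strong|b)>([\s\S]*?)<\/\1>/gi, (_m, _tag, text) => `**${text}**`],
  [/<(em|i)>([\s\S]*?)<\/\1>/gi, (_m, _tag, text) => `_${text}_`],
  // Links: <a href="url">text</a> -> [text](url)
  [/<a\s[^>]*href="([^"]*)"[^>]*>([\s\S]*?)<\/a>/gi, (_m, href, text) => `[${text}](${href})`],
  // Line breaks become Markdown hard breaks (two trailing spaces)
  [/<br\s*\/?>/gi, () => "  \n"],
  // Paragraphs; the (?:\s...) group keeps this from matching <pre>
  [/<p(?:\s[^>]*)?>([\s\S]*?)<\/p>/gi, (_m, text) => `\n\n${text}\n\n`],
];

function htmlToMarkdown(html) {
  let markdown = html;
  for (const [pattern, replacement] of RULES) {
    markdown = markdown.replace(pattern, replacement);
  }
  // Collapse the extra blank lines produced by adjacent block elements
  return markdown.replace(/\n{3,}/g, "\n\n").trim();
}

// Usage
console.log(htmlToMarkdown('<h2>Title</h2><p>Some <strong>bold</strong> text.</p>'));
```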

Code sections

My posts contain a lot of code of all kinds: C#, JS, HTML and even XAML.
As I mentioned, the code plugin I used in WordPress generated crappy, bloated HTML with inline styles and no structure whatsoever.

I started searching for a better tool, and Johnny recommended PrismJS.

I found it very useful and elegant. It is simple to use in Markdown, and the resulting HTML is clean.

var test = 3;  

The result was the following HTML:

<pre class=" language-javascript">  
    <code class=" language-javascript">
        <span class="token keyword">var</span> test <span class="token operator">=</span> <span class="token number">3</span><span class="token punctuation">;</span>  
    </code>
</pre>  

It also lets you select only the languages you actually use, which produces smaller and more efficient JS and CSS files.
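To decide which languages to select, one option is to scan your rendered posts for the "language-..." classes that appear in the markup above. A minimal JavaScript sketch; the languagesInUse name is hypothetical, and it assumes the classes sit in double-quoted class attributes.

```javascript
// Sketch: list the distinct "language-<name>" classes used across posts,
// i.e. the languages worth selecting when building a custom Prism bundle.
// Assumes double-quoted class attributes, as in the markup shown above.
function languagesInUse(postsHtml) {
  const languages = new Set();
  for (const html of postsHtml) {
    const classAttr = /class="([^"]*)"/g;
    let match;
    while ((match = classAttr.exec(html)) !== null) {
      // A class attribute may hold several space-separated names
      for (const name of match[1].split(/\s+/)) {
        if (name.startsWith("language-")) {
          languages.add(name.slice("language-".length));
        }
      }
    }
  }
  return [...languages].sort();
}

// Usage
console.log(languagesInUse(['<pre class=" language-javascript"><code class=" language-javascript">var test = 3;</code></pre>']));
```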

You can set it up using the instructions here.

Conclusion

It's important to mention that although the tools described here did a lot of the heavy lifting, the hard labor was editing and fixing the internal links, re-embedding the images, and fixing the grammar and syntax of the posts.
So whichever way you export your data, the process is not completely automatic.
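Some of the link fixing can still be scripted. As a hypothetical JavaScript sketch, assuming the old posts used /YYYY/MM/DD/slug/ permalinks, that each imported post kept its old slug, and that Ghost serves posts at /slug/ (its default permalink setting), old internal links could be rewritten like this. If your old blog lived under a sub-path, add it to the pattern.

```javascript
// Hypothetical sketch: point links at the old WordPress blog to the new
// Ghost blog. Assumes old permalinks look like /YYYY/MM/DD/slug/, slugs
// were kept on import, and Ghost serves posts at /slug/.
function relinkPosts(html, oldHost, newHost) {
  // Escape regex metacharacters in the host name (e.g. the dots)
  const escapedHost = oldHost.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const oldPermalink = new RegExp(
    `https?://${escapedHost}/\\d{4}/\\d{2}/\\d{2}/([^/"]+)/?`,
    "g"
  );
  return html.replace(oldPermalink, (_m, slug) => `https://${newHost}/${slug}/`);
}

// Usage with made-up host names
console.log(
  relinkPosts(
    '<a href="http://blogs.example.com/2013/01/02/my-post/">an older post</a>',
    "blogs.example.com",
    "blog.example.org"
  )
);
```

Links that do not match the pattern are left untouched, so anything unusual still needs a manual pass.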

I'd be happy to hear what you think about these projects and how I can make them better.

Next, I'll explain the architecture I used to set up Ghost for production use.

Yossi Shmueli

Keeping it green since 1995
