Tuesday, November 2, 2010

Converting our books to Kindle - work in progress

This is an as-it-happens log of my efforts to convert our print book We're in the Mountains, Not Over the Hill to Kindle. In the process I hope to clarify my own thinking about how to do this.

The book has about 20 b&w photos and a lot of carefully crafted layout: several fonts, bold, italics and multiple paragraph formats, all to make the book look well done to the reader. The intent is for the layout to help communicate the book's message. Our print version was created with PageMaker, which produced a pdf file that was sent to the printer. The book-sized Kindles require an Amazon proprietary format which is a subset of html. You can give Amazon a Word, PDF or HTML file, and they will convert it to their format automatically (free). You have the most control over the resulting appearance if you give them html, and the least if you give them a pdf.

There are Kindle readers on all sorts of devices: iPhones, iPads, etc. The Kindle itself is b&w and has only a couple of fonts; the other devices have color and a larger range of fonts. I plan to include color photos and a variety of fonts, which the Kindle will ignore but the other devices will use.

The possible Kindle readers come in a variety of sizes, so the text must reflow, and any formatting based on a fixed page size will not work. This requires a total redesign of the book so that it looks good on any size screen.

I plan to work mostly with the html file, but first I have to get the book out of PageMaker format. PageMaker will export to html, but the generated html was messy and didn't carry over the style info. PM has a story mode which does have the styles, so I copied and pasted the story mode text into Word 2010. That kept the styles as Word styles, and is clearly the best way to get out of PM. (Note: the 2nd book, Camino Chronicle, is in InDesign CS. More on it at the end.)
I've made a "mini book" from the full Word file I got out of PM. I've deleted most of the text, but kept excerpts that include examples of all the styles and anything I think might be tricky to get onto Kindle: lists, chapter heads, poem segments, table of contents segments, index segments, etc. I'll work with the mini book all the way through to putting it on Kindle, just to get familiar with the process and to have something small to test with when problems come up.

I took the mini book (MB from now on) and saved it as filtered html, just to see what the generated html looked like. It looked fairly clean, so I think all I will do in Word is consolidate the styles as much as possible. I will also note all the styles that use bold and/or italics and then temporarily remove the bold and italics from those styles. The bold and/or italics left in the text are the author's and need to stay unchanged. Update: a note on styles. Don't go overboard on consolidation. For example, if your normal "left" style is for left-justified paragraphs with no indent, you might still want something like "leftlist" for a list that needs the indent. Have a separate style for any format that is used frequently, say more than twenty times, where you would otherwise have to change all twenty instances by hand.

Hint: once you have it in html, open it with your browser and resize the browser window until it is the size of your device screen. It was quite useful for seeing which headings, etc. definitely needed work.

Turned off bold and italic in all the styles. Found and noted the few bold italic spots. Made 3 character styles: csitalic, csbold and csitalicbold. Found all the italics and changed them to csitalic, and did the same for bold. Did the bold italics manually.
At this point I did the filtered html save, ran that into Mobipocket Creator, and loaded it into my Kindle, just to see how it looked. Not bad, but the justified paragraphs looked a little strange: no hyphenation, and sometimes fairly large spaces between words. Neither Kindle nor html does automatic hyphenation. I tested full justification against left justification, and full justification is better. The Kindle justifies better than it appears in your html editor.

My guidebook on this process is mostly Kindle Formatting by Joshua Tallent. He describes a couple of approaches. One is to work mostly in Word, cleaning up and simplifying your book for Kindle. The other is to save your Word file as filtered html and do most of your editing in the resulting html file. You can tell that he prefers working in html, but my testing so far shows that it is not difficult to make a good-looking Kindle book just from the Word file.

UPDATE: I used Word's save as filtered html to get my html file, but Natasha (see the update link below) suggests emailing your Word file to Gmail, viewing the Word attachment as html, then viewing the source, and saving the result as your working html file.

However, working with the html file gives me some needed training in html, css, etc., so I am doing some of both: minimizing the styles in Word, and then doing additional cleanup in html. He talks a lot about regular expressions in the html chapter, a topic I have seen but avoided up to now. Regular expressions are a very powerful way of doing massive search-and-replace operations on a file. He also has a sample Perl script for doing some html cleanup, another topic I dabbled in years ago.
So I found my Perl for Dummies book, updated my old Perl to the latest version, and tried his script. First I had to find the sample programs from the Dummies book: they were not at the link listed in the book, but I finally found them at Perl for Dummies downloads. The script actually worked, dropping the unneeded line ends generated by Word's save as filtered html.
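Tallent's Perl script isn't reproduced here. To give an idea of what this kind of line-end cleanup does, here is a minimal Python sketch (swapped in for Perl), assuming that Word's filtered html wraps long paragraphs with plain newlines, and that any newline not directly after a tag's ">" or before a tag's "<" is one of those wraps. The name join_wrapped_lines is hypothetical.

```python
import re

# A newline that does not follow a tag's ">" and is not followed (after optional
# indentation) by a tag's "<" is treated as a wrap inside running text.
WRAPPED_NEWLINE = re.compile(r"(?<!>)\n(?![ \t]*<)[ \t]*")


def join_wrapped_lines(html: str) -> str:
    """Join hard line wraps inside paragraphs into single spaces (hypothetical helper)."""
    html = html.replace("\r\n", "\n")  # normalise Windows line endings first
    return WRAPPED_NEWLINE.sub(" ", html)
```

Newlines between tags are left alone, so each block element keeps its own line. Since a browser treats a newline inside text as a space anyway, this should not change how the page renders, except inside pre blocks.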
UPDATE - just found another excellent blog on Kindle Formatting - more about the design side - how to make it look good: http://www.natashafondren.com/writing/category/kindle-formatting/
A day later, still dabbling. I took the cover of our book and made several test images. The correct image size for a full screen is 520 x 622 pixels, but the Kindle will adjust if the image is over or under. I tried an rgb color image of the right size, one with the largest side 1000 pixels, one of the right size but grayscale instead of color, and one of just the red channel converted to monochrome. Then I tried a cmyk version. All looked the same on the Kindle. The Kindle zoom would not enlarge the big image (or any of the others).
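The arithmetic of fitting an image to the screen is simple enough to sketch. This Python function (hypothetical, not part of any Kindle tool) scales an image's dimensions to the largest size that fits inside a box while keeping the aspect ratio, defaulting to the 520 x 622 full-screen size above. It only computes the target size; the actual resampling would still be done in an image editor.

```python
def fit_within(width: int, height: int, max_w: int = 520, max_h: int = 622) -> tuple[int, int]:
    """Largest (width, height) with the same aspect ratio that fits inside max_w x max_h."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    # The tighter of the two ratios decides the scale, so neither side overflows.
    scale = min(max_w / width, max_h / height)
    return round(width * scale), round(height * scale)
```

For example, a 1000 x 1500 cover comes out at 415 x 622. Note that the function scales small images up as well as large ones down.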

The process of getting the book & images to the Kindle is fairly simple, now that I've done it a few times. This is on Windows XP. I have a single folder where I do all the testing, called for example MiniBook. Inside that folder is another, initially empty folder with the same name, MiniBook. The outer folder also holds the Word file for my mini book, called minibook1. Whenever I make changes in the Word file, I save it as filtered html into the same outer folder. Most of my tests consist of making changes to the html and then doing the steps to get it into my Kindle. You can edit the html with Notepad, but the author of Kindle Formatting suggests Notepad++, a free download, and I've been using that.

I make a change in the html, for example changing the justification, and save it. Then I open Mobipocket Creator. It has several functions; at this point I choose Convert html. It will also convert a Word file, but the results are much worse than using Word's filtered html option. The result of Convert html is a minibook1.html file that goes into the inner MiniBook folder. If I want, I can use Notepad++ to make further changes to this file.

When ready to look at it on Kindle, I use the Build option of Mobipocket Creator. It writes several additional files into that inner MiniBook folder. One of them is minibook1.prc - the final input for the Kindle.

At that point I plug the Kindle into the computer's USB port. There is a boing as the computer recognizes the Kindle, and a Windows Explorer window pops up showing the Kindle as a drive, along with the folders on the Kindle. In my case the Kindle shows up as the F: drive. One of its folders is the documents folder. All I have to do now is drag my minibook1.prc file from my inner MiniBook folder to the documents folder on the F: drive. I then right-click the F: drive, eject the Kindle, and unplug the USB cable, and the book is there on the Kindle, ready to read.
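If you do this often, the drag-and-drop step can be scripted. A minimal Python sketch, assuming the Kindle mounts as a drive with a documents folder at its root, as described above (F: on my machine; the drive letter on yours may differ). copy_to_kindle is a hypothetical helper name. It does not eject the Kindle, so still do that from Windows Explorer before unplugging.

```python
import shutil
from pathlib import Path
from typing import Union


def copy_to_kindle(prc_file: Union[str, Path], kindle_root: Union[str, Path] = "F:/") -> Path:
    """Copy a built .prc file into the documents folder of a mounted Kindle."""
    src = Path(prc_file)
    if src.suffix.lower() != ".prc":
        raise ValueError(f"expected a .prc file, got {src.name}")
    documents = Path(kindle_root) / "documents"
    if not documents.is_dir():
        raise FileNotFoundError(f"no documents folder under {kindle_root}; is the Kindle plugged in?")
    # copy2 copies the file and also preserves its modification time.
    return Path(shutil.copy2(src, documents / src.name))
```

Calling copy_to_kindle("MiniBook/MiniBook/minibook1.prc") would copy the built book onto the F: drive.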
The images need to be inside that inner MiniBook folder; this took me several tries to figure out. Unrelated note: the maximum image size is slightly smaller than the built-in Kindle screensaver images. When you use Mobipocket Creator there is a separate step to load the cover image. Mine was 1600 pixels on the long side and looked good. When specified as a cover, it fills the entire screen and comes up the first time the reader opens the book. My cover doesn't have the same aspect ratio as the Kindle screen, so white shows on both sides. I think for the real cover I will make a special image that fills the Kindle screen.

By now I know that there are certain changes I will make in the full book, so I have set up a Kindlepending folder. It collects things I want to save for the full book: modified styles, notes to myself, etc. I paste anything I want to keep into Notepad++ and save it into my Kindlepending folder.
My current task is to get the Table of Contents working. The TOC is set up by putting a bookmark at each point referenced by the TOC, and then changing each TOC entry into a link to the proper bookmark. I have done this, and the links work on the Kindle, but I should also be able to use the Kindle's Go To TOC function, and that's not happening. There is also a related way to put navigation points across the bottom of the Kindle screen, and that still needs doing.
OK, some progress on the TOC and start location. You need to set a bookmark for the TOC and also one for the location where you want the reader to start once they push the page forward button on the Kindle. In Mobipocket Creator there is a Guide button. Click it and add two entries, one of type toc and one of type start. Each entry needs a description, a type and a location. The location must be your html filename with the bookmark appended, for example Mtnsminitest1.html#TOC. The TOC bookmark, by the way, has to be in caps. Once I got the Guide part right and did a build, the Kindle showed the TOC and went to the proper start.
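For anyone building the TOC in the html rather than in Word, here is a minimal Python sketch of the bookmark-plus-link pattern. It assumes bookmarks are written as named anchors (a name=...), which is how Word's filtered html typically represents them; the helper names, the heading level and the ch1, ch2 bookmark names are hypothetical.

```python
from html import escape


def bookmark_heading(bookmark: str, title: str) -> str:
    """A chapter heading carrying a named bookmark that the TOC can link to."""
    return f'<h2><a name="{bookmark}"></a>{escape(title)}</h2>'


def toc_html(chapters: list[tuple[str, str]]) -> str:
    """An html table of contents with one link per (bookmark, title) pair."""
    lines = ['<a name="TOC"></a>', "<h2>Contents</h2>"]  # the TOC's own bookmark, in caps
    for bookmark, title in chapters:
        # Each entry links to "#" + the bookmark placed at that chapter's heading.
        lines.append(f'<p><a href="#{bookmark}">{escape(title)}</a></p>')
    return "\n".join(lines)
```

toc_html([("ch1", "Getting Started")]) gives a Contents heading followed by one linked paragraph, and bookmark_heading("ch1", "Getting Started") gives the matching target.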
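Mobipocket Creator's Guide dialog records these entries in the book's opf file. As a rough picture of what they amount to, here is a Python sketch that builds a guide element with a toc and a start reference. It assumes the opf stores them as reference elements with type, title and href attributes (the usual opf layout); the titles and the start bookmark name are hypothetical, and other tools may expect a different type value for the start location.

```python
import xml.etree.ElementTree as ET


def guide_xml(html_file: str, toc_anchor: str = "TOC", start_anchor: str = "start") -> str:
    """Build a <guide> element holding a toc entry and a start entry."""
    guide = ET.Element("guide")
    entries = [
        ("toc", "Table of Contents", toc_anchor),
        ("start", "Start", start_anchor),  # hypothetical title and bookmark name
    ]
    for ref_type, title, anchor in entries:
        # Location = html filename + "#" + bookmark, e.g. Mtnsminitest1.html#TOC
        ET.SubElement(guide, "reference", type=ref_type, title=title, href=f"{html_file}#{anchor}")
    return ET.tostring(guide, encoding="unicode")
```

guide_xml("Mtnsminitest1.html") produces one reference pointing at Mtnsminitest1.html#TOC and another at Mtnsminitest1.html#start.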
Now for the little white bar across the bottom of the Kindle screen. If you look closely at it, you will see little vertical black bars at each navigation point you specify, such as chapter starts. The entire bar across the bottom of the screen represents 100% of the book. This requires building a toc.ncx file with the navigation points and adding related info to the opf file. The contents are similar to the html table of contents in your book, but the file must be in a specific format which is a little tricky. The best way to start is by copying someone's example toc.ncx file and altering it to your needs. The kindleformatting.com site has an example, or you can get a sample and a more detailed explanation from spontaneousderivation.com. There is also an example in Amazon's Kindle Publishing Guide (which you should read).
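To show what that format looks like, here is a Python sketch that writes a minimal toc.ncx with one navPoint per chapter. The element names follow the NCX 2005-1 layout used in sample files like the ones mentioned above; the function name and the title, uid and bookmark values are hypothetical. The uid is generally expected to match the book's identifier in the opf file, and the ncx still has to be added to the opf manifest and spine separately.

```python
from xml.sax.saxutils import escape, quoteattr

NCX_HEADER = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE ncx PUBLIC "-//NISO//DTD ncx 2005-1//EN" "http://www.daisy.org/z3986/2005/ncx-2005-1.dtd">
<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1" xml:lang="en-US">
"""


def build_ncx(title: str, uid: str, html_file: str, entries: list[tuple[str, str]]) -> str:
    """Return toc.ncx text with one navPoint per (label, bookmark) entry, in reading order."""
    parts = [
        NCX_HEADER,
        "  <head>\n",
        f'    <meta name="dtb:uid" content={quoteattr(uid)}/>\n',
        '    <meta name="dtb:depth" content="1"/>\n',
        '    <meta name="dtb:totalPageCount" content="0"/>\n',
        '    <meta name="dtb:maxPageNumber" content="0"/>\n',
        "  </head>\n",
        f"  <docTitle><text>{escape(title)}</text></docTitle>\n",
        "  <navMap>\n",
    ]
    for order, (label, bookmark) in enumerate(entries, start=1):
        # playOrder counts up 1, 2, 3... in the order the reader meets each point.
        parts.append(
            f'    <navPoint id="navpoint-{order}" playOrder="{order}">\n'
            f"      <navLabel><text>{escape(label)}</text></navLabel>\n"
            f"      <content src={quoteattr(html_file + '#' + bookmark)}/>\n"
            "    </navPoint>\n"
        )
    parts.append("  </navMap>\n</ncx>\n")
    return "".join(parts)
```

Each navPoint becomes one of the little black bars, so the entries list is usually just the chapter starts.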

I copied in one of these sample ncx files, changed the title to my test book, added some of the test book's TOC items, and then put that toc.ncx file in my inner Mtnstestbook1 folder with the images. I altered my opf file to list the toc in the manifest and spine just like the examples. Then I did the build. The first time it failed totally. Mobipocket said to check the detailed error messages, but there were none. [Updated note: if you maximize the window, you see the display errors button.] I looked back through toc.ncx and Mtnstestbook1.opf and found I had written Testbook1... instead of Mtnstestbook1... in several places, i.e. bad links. I fixed that, did a build, and everything worked. The Kindle showed bars along the bottom corresponding to my toc.ncx.
This is going slowly, as I am doing multiple other tasks and also following the Giants through the playoffs and now the World Series, so I'm going to post this now in case anyone else is just starting, and I will continue to update it as I progress.
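A small check would have caught my Testbook1 mistake before the build. Here is a Python sketch (a hypothetical helper, not a Mobipocket feature) that reads an opf or ncx file and lists every href or src whose file does not exist in the same folder. It only checks the file part of each link, not the #bookmark after it.

```python
from pathlib import Path
from typing import Union
from urllib.parse import unquote, urldefrag
import xml.etree.ElementTree as ET


def missing_targets(xml_file: Union[str, Path]) -> list[str]:
    """List href/src references in an opf or ncx file whose target file is missing."""
    xml_path = Path(xml_file)
    folder = xml_path.parent
    missing = []
    for elem in ET.parse(xml_path).getroot().iter():
        for attr in ("href", "src"):
            ref = elem.get(attr)
            if not ref or "://" in ref:
                continue  # no reference here, or an absolute web link
            # Drop the "#bookmark" part and decode %20-style escapes before checking.
            target = unquote(urldefrag(ref).url)
            if target and not (folder / target).exists():
                missing.append(ref)
    return missing
```

Running missing_targets("Mtnstestbook1/toc.ncx") and then the same on the opf file before each build would list any bad links.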

Several Day Gap
We had used some Zapf Dingbats characters in the book as topic dividers. These characters don't exist on the Kindle, so I tested the Unicode characters listed for the Kindle to find a substitute. So far the bullet • seems the best. I'm now ready to start on the real book, and will jump back to the mini book just to test solutions for problems I run into in formatting. I created a folder called MtnsRealTry1 and put a copy of the original Word file there.
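The substitution itself is a simple search and replace in Notepad++ or Word. As a Python sketch: it assumes the dividers came through as characters from Unicode's Dingbats block (U+2700 to U+27BF); if your export used other code points, such as private-use characters from a symbol font, pass them in extra_chars. It replaces every dingbat in the text, so check for any you meant to keep. The function name is hypothetical.

```python
import re

BULLET = "\u2022"  # the bullet chosen as the substitute
# Unicode's Dingbats block (U+2700-U+27BF), e.g. U+2766 FLORAL HEART.
DINGBATS = re.compile("[\u2700-\u27bf]")


def replace_dingbats(text: str, extra_chars: str = "") -> str:
    """Swap dingbat section dividers for bullets."""
    text = DINGBATS.sub(BULLET, text)
    # Any other code points the export used for dividers, listed by the caller.
    for ch in extra_chars:
        text = text.replace(ch, BULLET)
    return text
```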

Step 1 - skipped. I already have a list of all the Word styles and their attributes. I was going to change each style with a bold, italic or bold italic attribute to remove that attribute, so only the manually applied bold, italic and bold italic would be left, and then assign those to character styles. I am not going to do that. In the minitests I found that adding the character styles added a bunch of extra html (spans, etc.), and now I don't see the need for it. The original reason was to avoid accidentally losing the manually applied attributes.

Step 2 - simplify my styles. I have about 60 current styles - 12 sounds like a better number.
One Day Later
I've gotten the original down to 14 styles. I have replaced the fancy Zapf Dingbats section separator characters with bullets, and converted all the table-like things to simple or numbered lists. I am researching the copyright pages different authors have used, looking at the format and placement of that information, and checking the format of poems and lists.
Again several days later.
I am continuing to work in Word, having found that I can set bookmarks and hyperlinks there and they carry through. I am also doing the index. Very tedious. I had to manually add a bookmark to represent each page in the print version, and then, in the index, link each page reference to the proper bookmark, i.e. p21, p45, etc.
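Linking the index by hand is tedious, so here is a Python sketch of automating the link step for one index entry. It assumes the page bookmarks are named p plus the print page number (p21, p45), as above, and that each entry ends in a comma-separated list of plain page numbers; page ranges like 21-23 are not handled. The function name and the sample entry are hypothetical.

```python
import re

# The trailing run of page numbers in an entry, e.g. ", 21, 45" in "Altitude sickness, 21, 45".
PAGE_LIST = re.compile(r"((?:,\s*\d+)+)\s*$")


def link_index_entry(entry: str, prefix: str = "p") -> str:
    """Turn each trailing page number into a link to its page bookmark (21 -> #p21)."""
    match = PAGE_LIST.search(entry)
    if match is None:
        return entry  # no page numbers, e.g. a "see also" line
    linked = re.sub(
        r"\d+",
        lambda m: f'<a href="#{prefix}{m.group(0)}">{m.group(0)}</a>',
        match.group(1),
    )
    return entry[: match.start(1)] + linked
```

Because only the trailing run of numbers is linked, a number inside the entry text itself (Highway 99, say) is left alone.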

On interior images (revised 12/4): I've done quite a bit of experimenting with full-sized photos, and have found that 600 x 755 pixels leaves room for two caption lines below if the user chooses a moderate text size. It is also big enough that a Kindle DX user can zoom and still get a good image.

Multiple days pass... Someone at Thanksgiving asked why there wasn't more activity in this post. While doing the initial work, I realized that Susan was going to want to make a fair number of updates to the text of the book. She could make them in the html file, but Word is easier to use for that purpose. That means I have to wait for Susan to finish in Word before I can work on the final html file. That will still be a couple of weeks, so I am taking a detour to learn Perl. I can use it to do a lot of the edits the Word-generated html requires. I could do those edits faster by hand than by writing a program, but my hope is that the program will make the 2nd book conversion much faster.

Feb 3, 2011 - Susan has finally finished her editing, so I will start up again. Learning Perl was sort of fun, and allowed me to clean up some of the Word html automatically. However, I suggest that once you have the Word styles cleaned up, you just work with the html file from there on. Will keep you posted.

Feb 5 - Mobipocket Creator requires a price and a DRM (digital rights management) decision. I'm setting $9.99, as that is the maximum I can set and still get 70% from Amazon. On DRM I'm still undecided. However, I have learned that the Mobipocket Creator DRM choice should always be set to NO; if you want DRM, Amazon will give you the choice later.
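The 70% cutoff is easy to see with a little arithmetic. This Python sketch encodes my understanding of Amazon's royalty terms at the time: 70% for list prices from $2.99 to $9.99, and 35% otherwise. It ignores the delivery fee Amazon subtracts on the 70% option and the other conditions attached to it, so treat the numbers as rough. The function name is hypothetical.

```python
def kindle_royalty(list_price: float) -> float:
    """Rough per-sale royalty in dollars, before any delivery fee (assumed 2011 terms)."""
    # Assumption: the 70% rate applies only inside the $2.99-$9.99 band.
    rate = 0.70 if 2.99 <= list_price <= 9.99 else 0.35
    return round(list_price * rate, 2)
```

At $9.99 that is about $6.99 a copy before the delivery fee; at $10.99 the rate drops to 35%, about $3.85.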

In my cleanup of the html, I am finding that huge amounts of the html put out by Word can simply be dropped, particularly spans assigning fonts, etc. A lot of the margin info can also be dropped with no change to the appearance on the Kindle. The Notepad++ program mentioned earlier is very powerful for doing these repetitive deletes.
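Here is roughly what those repetitive deletes look like as a Python sketch using regular expressions. It assumes every span can simply be unwrapped (true for font-assignment spans, but check for any that carry formatting you want), and it removes only margin declarations whose value is zero, so a non-zero margin-left used for indented lists survives. The patterns and the function name are my own, not Word's or Notepad++'s.

```python
import re

# Any opening or closing <span> tag, whatever attributes it carries.
SPAN_TAG = re.compile(r"</?span\b[^>]*>", re.IGNORECASE)
# A margin declaration whose value is zero, e.g. "margin-left:0in;" or "margin:0pt".
ZERO_MARGIN = re.compile(
    r"margin(?:-(?:top|right|bottom|left))?\s*:\s*0(?:\.0+)?(?:in|pt|cm|px|em)?\s*(?:;|(?=['\"]))",
    re.IGNORECASE,
)
# A style attribute left empty after the removals above.
EMPTY_STYLE = re.compile(r"\s+style\s*=\s*(['\"])\s*\1", re.IGNORECASE)


def strip_word_clutter(html: str) -> str:
    """Unwrap spans, drop zero margins, then drop any style attribute left empty."""
    html = SPAN_TAG.sub("", html)
    html = ZERO_MARGIN.sub("", html)
    return EMPTY_STYLE.sub("", html)
```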

The left margin is tricky. Supposedly the margin-left and margin-right styles have no effect, but I have found that any non-zero margin-left value creates about a quarter-inch left margin, similar to what a blockquote does. This is useful for indented lists.

If you are going to specify font-size in absolute values, 12 is a different size than 12.0pt. For spacing values such as width they give the same space, but not for fonts.

Update: During a recent vacation, I used the Kindle for my own entertainment for the first time, buying some light fiction for the plane ride. I discovered that I preferred using the Kindle in landscape mode, and as I walked through the plane, I noticed that the two other Kindle users I saw were also reading in landscape mode. I don't know of any stats on this, but would like to. It has implications particularly for any images in the book, and possibly for other layout issues.

Kindle Links: While searching for Kindle orientation stats (didn't find any), I found Stephen Windwalker, who writes widely on Kindle and other matters and founded the Kindle Nation blog. He put out a useful list of Kindle links.

PDF Conversion notes: While waiting for Susan to do some text revisions, I started on Camino Chronicle. Its source is InDesign CS, which has no epub conversion and no Word conversion, so I started from the pdf that went to the publisher, using Adobe Acrobat 6.0 to convert from pdf to Word 2010. After working quite a while on editing the Word file, I noticed a number of errors in the converted text that were not in the original. I had put in too much effort to go back and look for a better pdf converter, so I worked with what I had. The first problem that popped up was ligature conversion. Ligatures such as ff, fi and fl in InDesign were converted as ligatures, i.e. a single character, followed by a space. For example, in InDesign and the pdf, "first" is 4 characters: the fi ligature character followed by "rst". It converted to "fi rst". Word's spell check caught most of these, but it took a while to find them all, and I had to learn how to do Unicode searches in Word. The other problem has been randomly inserted spaces. Sometimes spell check or grammar check finds them, sometimes not, so I am reading through the text word by word looking for errors. Another minor problem is that the running headers and footers are converted too, and are usually no longer appropriate.
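The ligature fix can be automated. A minimal Python sketch, assuming each ligature came through as one of the Unicode presentation-form characters U+FB00 to U+FB06 followed by a stray space: it closes the gap when the next character is a lowercase letter, then expands each ligature into plain letters with NFKC normalization. A word that genuinely ends in a ligature (stuff followed by and, say) would be wrongly joined, so a spell-check pass is still needed afterwards. The function name is hypothetical.

```python
import re
import unicodedata

# Latin presentation-form ligatures U+FB00-U+FB06: ff, fi, fl, ffi, ffl and two "st" forms.
LIGATURE = "[\ufb00-\ufb06]"
# A ligature followed by a stray space and then a lowercase letter, as in the converted "fi rst".
LIGATURE_GAP = re.compile(f"({LIGATURE}) (?=[a-z])")


def fix_ligatures(text: str) -> str:
    """Close the gap after each ligature, then expand it to plain letters (fi -> f + i)."""
    text = LIGATURE_GAP.sub(r"\1", text)
    # NFKC compatibility normalisation maps each ligature to its letter sequence.
    return re.sub(LIGATURE, lambda m: unicodedata.normalize("NFKC", m.group(0)), text)
```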

April 21, 2011 - finally have We're in the Mountains on Amazon Kindle, and linked with the paperback edition. It took a long time, mainly because every time we looked at the text, we found another minor thing we wanted to change. The Kindle version is not a new edition, but it has many changes, some just typos, others reflecting the passage of time - new links, some of the people have died, etc.

Re html editing: I've been using Notepad++ for the most part, and like it. However, when Susan was doing her final edits, she needed something a little more WYSIWYG. We have Expression Web but don't use it much; we plan to when we finally convert backpack45.com from FrontPage to something supported. Anyway, we used Expression Web to do the final Kindle edits, since you can switch easily between design view and code view. That edit process turned up a number of html errors that didn't bother the Kindle but were clearly wrong, and could very well cause problems for Kindle readers on other devices. The errors were either missing or extra tags, which Expression Web highlights in yellow.
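A rough stand-in for Expression Web's yellow highlighting is a tag-balance check. This Python sketch uses the standard library's html.parser to report closing tags with no matching opener and opening tags that are never closed. It is stricter than html requires (html allows an unclosed p, for example), and it ignores void elements such as br and img; the names check_tags and TagBalanceChecker are hypothetical.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag in html.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "param", "source", "track", "wbr"}


class TagBalanceChecker(HTMLParser):
    """Tracks open tags on a stack and records mismatches as it parses."""

    def __init__(self) -> None:
        super().__init__()
        self.stack: list[tuple[str, int]] = []  # (tag, line where it opened)
        self.problems: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append((tag, self.getpos()[0]))

    def handle_startendtag(self, tag, attrs):
        pass  # self-closed tags such as <img .../> need no partner

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if not any(open_tag == tag for open_tag, _ in self.stack):
            self.problems.append(f"line {self.getpos()[0]}: stray </{tag}>")
            return
        # Pop back to the matching opener; anything popped on the way was never closed.
        while self.stack:
            open_tag, line = self.stack.pop()
            if open_tag == tag:
                break
            self.problems.append(f"line {line}: <{open_tag}> never closed")


def check_tags(html: str) -> list[str]:
    """Return a list of tag problems found in an html string."""
    checker = TagBalanceChecker()
    checker.feed(html)
    checker.close()
    leftovers = [f"line {line}: <{tag}> never closed" for tag, line in checker.stack]
    return checker.problems + leftovers
```

An empty list means every tag was matched; otherwise each message gives the line number to look at in Notepad++.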

1 comment:

  1. Hello Ralph

    It is interesting reading your first accounts of converting books into ebooks.

    This is something we have considered for our book An Italian Odyssey. I even started the process back in August, converting the file from InDesign to ePub for the iPad and html for Kindle. I had high hopes that we could include color photos and still maintain the level and quality of formatting that appears in the paperback in the ebook version, at least for the iPad.

    However, I cannot say that I was impressed with the results; disappointed might be a better description.

    So I look forward to hearing more about your experiences. It might motivate me to try once again.


    Neville Tencer
    An Italian Odyssey: One Couple's Culinary and Cultural Pilgrimage