Import from Microsoft Word
Importing from Microsoft Word is complex because Word is not a structured format, so the results depend heavily on the quality of the input. The most important factor is usually whether Word styles have been applied consistently.
There are three options for importing from Microsoft Word:
-
Use the Import Wizard to import the MS Word content directly. It can handle most well-structured Word documents.
-
Convert to DocBook using the (optional) Oxygen XML Editor with the Paligo plugin.
-
Convert to XML with a purchased customization from support. This is used if your content is not easily imported directly. For very large documents or large numbers of documents, this method is sometimes preferable.
The simplest method is to import the MS Word files directly by using the Import Wizard. Paligo can handle most well-structured Word documents.
-
Zip each individual Word file. Do not include multiple Word files in one zip file.
Tip
To find out about zipping files in Word, refer to the Microsoft Word help.
-
Use the Import Wizard to import the zip files.
Select Word (.docx) as the type of file to import.
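If you have many Word files, the zipping step above can be scripted. The following is a minimal Python sketch, not a Paligo tool: the function name zip_each_docx and the assumption that all files sit in one folder are illustrative only. It creates one zip file per Word file, so that no zip file contains more than one document.

```python
from pathlib import Path
from typing import List
import zipfile


def zip_each_docx(source_dir: str) -> List[Path]:
    """Create one .zip next to each .docx in source_dir and return the zip paths."""
    created = []
    for docx in sorted(Path(source_dir).glob("*.docx")):
        # Skip Word's temporary lock files, which are named like "~$report.docx".
        if docx.name.startswith("~$"):
            continue
        zip_path = docx.with_suffix(".zip")
        with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
            # Store only the file name so the archive contains no folder structure.
            archive.write(docx, arcname=docx.name)
        created.append(zip_path)
    return created
```

Calling zip_each_docx with the path of the folder that holds your Word files returns the list of zip files it created, which you can then upload one at a time in the Import Wizard.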
By using Oxygen XML Editor, you can convert the content to a DocBook document before importing it.
-
In Oxygen, create a DocBook 5.1 Article by selecting File > New and then selecting the appropriate DocBook template.
-
Remove the first "sect1" element.
-
Insert a new section element by pressing Enter on your keyboard and selecting Section in the element list. This element will not really be used, and we'll remove it at the end, but it's needed because of a quirk in Oxygen.
-
Place the cursor inside the section element.
-
Save the document with any name you choose.
It is important, however, to save it with a name; otherwise images will not be saved properly.
-
Copy the text from your Word document. You should leave out any Table of Contents or similar, since it won't be needed.
-
Paste the content into the section element in Oxygen. You will get a warning saying that the content needs to be placed inside the closest Article element. Go ahead and accept that.
-
When you paste the content, it will be automatically converted to the proper XML elements.
-
Remove the empty section tag. It will be at the bottom or at the top of the document.
-
Save the document again.
-
Use the Import Wizard to import the resulting DocBook document.
The third method involves a preconversion to XML using a script package that requires a purchased customization. It is slightly more complex but gives good results. It is especially suitable for very large documents or large numbers of documents, because it is very fast and can be tuned further to suit your content.
Note
Contact Paligo support if you would like to use this method instead of the above procedures.
Depending on the complexity of your content and of mapping its structure, there may be a charge for this conversion.
If you are experiencing problems with the MS Word import, you may find useful information about the most common issues in the following sections.
Note
If you are experiencing other types of problems with your MS Word import, please contact customer support for help.
If the import process does not complete, make sure that you have prepared the document as described in Prepare MS Word Document for Import. As a minimum, the document needs to have a title and use headings correctly.
Note
If you continue to experience problems, contact customer support for help.
If Paligo's Word import process completes successfully but the import folders contain no topics, or some topics are missing, it is likely due to incorrect formatting.
Make sure that your Word document has a title and uses headings correctly, as described in Prepare MS Word Document for Import.
If you prepare your MS Word document correctly, the import should create a topic for each section of content that has a heading. If you continue to experience problems, contact customer support for help.
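Before importing, you can get a rough idea of whether a document has a title and headings by inspecting its paragraph styles. The sketch below is only an illustration and is not part of Paligo. It assumes the default English style IDs "Title" and "Heading1", "Heading2" and so on; documents created from other language versions of Word or from custom templates may use different IDs. The function names are hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace in the {uri} form that ElementTree expects.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def paragraph_styles(docx_path):
    """Return the style ID of every paragraph in the document (None if unstyled)."""
    with zipfile.ZipFile(docx_path) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    styles = []
    for para in root.iter(W + "p"):
        style = para.find(W + "pPr/" + W + "pStyle")
        styles.append(style.get(W + "val") if style is not None else None)
    return styles


def check_structure(docx_path):
    """Report whether a Title style is used and how many heading paragraphs exist."""
    styles = paragraph_styles(docx_path)
    return {
        # Assumption: the title uses the built-in style with the ID "Title".
        "has_title": "Title" in styles,
        # Assumption: heading style IDs start with "Heading" (English defaults).
        "heading_count": sum(1 for s in styles if s and s.startswith("Heading")),
    }
```

A result with has_title set to False or a heading_count of 0 suggests that the document may import without topics, because the import relies on headings to decide where topics begin.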
If you imported a table from MS Word and the table extends beyond the boundaries of the Paligo editor, this is because the table is too wide. You can still edit it in Paligo: use the scroll bar at the bottom of the Paligo editor to scroll horizontally to the additional cells.
For HTML outputs, Paligo will add a scrollbar feature to the web page so that your readers can scroll to cells beyond the display area.
For PDF outputs, the table will go off the edge of the page. For this reason, you should consider setting the table to display as landscape rather than portrait; see Rotate a Table (PDF). You may also need to redesign the table if it is too large for the page size. For example, you may need to create several smaller tables instead.
In MS Word, there are two ways to add content between steps:
-
Make two lists appear as one list
To achieve this, you create a list and when you want to add content between steps, you add the content after a list item. After the added content, you start a new list. You set the second list to "Continue numbering" so that the two lists appear as one continuous list.
1. List item 1
2. List item 2
3. List item 3
Content between steps
4. List item 1 for second list, set to Continue Numbering so becomes list item 4.
5. List item 2 for second list, follows numbering from previous step.
-
Inline content with manual line break
With this approach, you create one list. When you get to a position where there needs to be content between list items, you press Shift and Enter after the list item above the content. This creates a line break. You then add the content inline and press Enter to create the next list item. This creates one list with the added content inside a list item, like this:
1. List item 1
2. List item 2
3. List item 3
   Content between steps
4. List item 4.
5. List item 5.
If you import MS Word content into Paligo, lists with inline content will usually import correctly. With some imports, however, the Shift and Enter line break is interpreted as a literallayout element instead of a regular para element.
If you import MS Word content with lists that use Continue numbering, Paligo creates one list. It ignores the "two lists made to look like one" structure. The content between list items is nested inside the list item, like this:
1. List item 1
2. List item 2
3. List item 3
   Content between steps
4. List item 4.
5. List item 5.
Note
If your MS Word content has nested lists (lists inside lists), Paligo will not restructure them in the same way and they may appear with incorrect numbering and structure.
You can fix the list structure manually in Paligo, for example, by dragging and dropping list items in the XML tree view. Alternatively, Paligo professional services can set up an import customization that disables the list reassembly during imports. This may provide better results for imports of multi-level lists. Please contact customer support to ask for an import customization.
If your lists in MS Word use a soft return (Shift + Enter) to place content on the next line, the content will be imported into Paligo inside a literallayout element. You can see it in the Paligo editor as there is a shaded box around the content and, if you select it, you can see the literallayout element in the Element Structure Menu.
While this is valid, it is not the way text and images are usually added to a list item in Paligo.
The more commonly used structure for this content is:
<orderedlist>
  <listitem>
    <para>This is step one.</para>
    <para>In MS Word, this line was created by using a soft return at the end of the previous line.</para>
  </listitem>
  <listitem>
    <para>This is step two.</para>
  </listitem>
</orderedlist>
So the difference is that the extra line of text in the step is inside an additional para element rather than a literallayout element. If the content in the list was an image, it would be inside a mediaobject structure instead. But in both cases, they should be inside the list item.
To correct your content you can either:
-
Edit the lists in MS Word
Remove the soft returns and then add a new list item for the text or image. This will create an extra step that you do not want, but that is expected at this stage. Next, position the cursor at the start of the new list item and press Backspace. This will make the text or image an indented part of the previous list item. In this form, it will import into Paligo cleanly, with no literallayout element.
For details on adding content inside a list item, see Prepare MS Word Document for Import.
-
Edit the lists in Paligo
For text between lists, add a para element inside the listitem element and then add your text content to that.
For images between lists, either insert an image inside the listitem element or use the XML tree to move the mediaobject element into the listitem.
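If you prefer to fix the lists in MS Word, it can help to find the affected list items first. The following Python sketch is an illustration only and is not a Paligo feature. It reads the document XML inside a .docx file and reports numbered or bulleted paragraphs that contain a manual line break. It assumes that list paragraphs carry direct numbering properties (a w:numPr element); lists that get their numbering only through a paragraph style will not be detected. The function name is hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace in the {uri} form that ElementTree expects.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def soft_returns_in_lists(docx_path):
    """Return (paragraph number, text) for list paragraphs containing a soft return."""
    with zipfile.ZipFile(docx_path) as package:
        root = ET.fromstring(package.read("word/document.xml"))

    findings = []
    for number, para in enumerate(root.iter(W + "p"), start=1):
        # Assumption: only paragraphs with direct numbering count as list items.
        if para.find(W + "pPr/" + W + "numPr") is None:
            continue
        parts = []
        has_soft_return = False
        for node in para.iter():
            if node.tag == W + "t":
                parts.append(node.text or "")
            elif node.tag == W + "br" and node.get(W + "type") in (None, "textWrapping"):
                # A w:br with no type (or type "textWrapping") is a manual line break;
                # page and column breaks are ignored.
                has_soft_return = True
                parts.append(" / ")
        if has_soft_return:
            findings.append((number, "".join(parts)))
    return findings
```

Each result shows the position of the paragraph in the document and its text, with " / " marking where the soft return occurs, so you can locate the list item in Word and restructure it as described above.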
Paligo imports Microsoft Word numbered lists as ordered lists. If you want them to be imported as procedures, it may be possible to do that as a customization project (there is usually a fee for customization projects). Please contact customer support for details.
The Paligo editor always shows tables as portrait. For HTML outputs, wide tables are given a scroll bar so that users can access all of the data in the table. For PDF outputs, you can set them to display in landscape mode instead; see Rotate a Table (PDF).