How to edit long texts for HTML

Danish Book: "Dansk landbrug i det politisk styrede marked"

The document is the dissertation that I wrote in 1987. I have chosen to publish it for two reasons:

  • The layout is a proposal for how to set up a non-fiction text for HTML purposes. The order of the sections has been changed compared to the printed version: I moved the preface to the end of the document, to section 5.1, because the beginning of a web text should not be disturbed by a preface.
  • The content is still interesting to read – for those who understand Danish and are interested in the political economics of how to manage the market for agricultural commodities.

Hyperlink connections:

Inline Frames Example

The best way to present a long text is with inline frames; see the example.

Benefits:

  • Good overview of the document
  • Line length not too long: with an ordinary screen and a full-screen browser window, the lines will be approximately 60 to 80 characters long.
  • The URL of each individual file is shown, so it is easy to bookmark

Concerns:

  • Inline frames are not displayed in Netscape 4.x, even though they are part of the HTML 4 standard
  • Manual corrections are needed in all text files, because WordToWeb cannot create inline frames directly.

Corrections after Conversion

  • The "0" number in the first heading is deleted in the files teori.html and contents.html.
  • Added TARGET="_parent" to make the Home button return to the proper home page
  • In all text files, a table was inserted spanning from the </HEAD> tag to the </BODY> tag, with the left column pointing at contents.html. Nearly all links in contents.html and w2windex.html were given TARGET="_top".
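
The TARGET corrections above were made by hand. As a rough sketch of how the same correction could be scripted, the Python snippet below adds TARGET="_top" to every link that does not already name a target. The function name add_target and the regular expressions are illustrative assumptions, not part of WordToWeb, and a regular expression is only adequate for the simple, regular markup a converter produces.

```python
import re

# Matches an opening <a ...> tag that carries an href attribute. The search is
# case-insensitive because converters of that era often wrote upper-case tags.
ANCHOR_TAG = re.compile(r"<a\b[^>]*\bhref\b[^>]*>", re.IGNORECASE)
HAS_TARGET = re.compile(r"\btarget\s*=", re.IGNORECASE)


def add_target(html: str, target: str = "_top") -> str:
    """Add TARGET="..." to every link that does not already set a target."""

    def patch(match):
        tag = match.group(0)
        if HAS_TARGET.search(tag):
            return tag  # keep links that already name a target, e.g. the Home button
        return tag[:-1] + f' TARGET="{target}">'

    return ANCHOR_TAG.sub(patch, html)


sample = '<A HREF="teori.html">Teori</A> <A HREF="index.html" TARGET="_parent">Home</A>'
print(add_target(sample))
```

Only the first link in the sample is changed; the Home link keeps its TARGET="_parent".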

Ordinary Frames Example

Benefits:

  • Good overview of the document
  • Line length not too long: with an ordinary screen and a full-screen browser window, the lines will be approximately 60 to 80 characters long.
  • Shown in nearly all browsers

Concerns:

  • The browser cannot respond to Arrow Down, Page Down, etc. until you click in the left or right part of the screen
  • The browser does not know by itself which part of the screen you want to print until you click in the left or right part of the screen
  • It is impossible to bookmark a particular HTML text file. The URL shown will always be "index.html", etc.

Corrections after Conversion

  • The "0" number in the first heading is deleted in the files teori.html and contents.html.
  • Added TARGET="_parent" to make the Home button return to the proper home page

Page without Frame

Benefits:

  • This layout is displayed and printed without problems in any browser.
  • The URL of each individual file is shown, so it is easy to bookmark

Concerns:

  • You can either read the text or look at the table of contents, not both at once.
  • When using an ordinary screen and a full-screen browser window, the lines will be much longer than the recommended 60-80 characters per line. This will slow down reading and reduce understanding.

Corrections after Conversion

  • The "0" number in the first heading is deleted in the files teori.html and contents.html.
  • Added TARGET="_parent" to make the Home button return to the proper home page

File Sizes

The conversion time is around 9 minutes for all of the examples.

| Danish Book: Dansk landbrug | Inline Frames | Ordinary Frames |
|---|---|---|
| HTML text file size | 345 kB (13 files) | 340 kB (13 files) |
| HTML text + figures file size | 463 kB | 458 kB |
| HTML / doc relative file size | 463 / 1,061 = 44% | 458 / 1,061 = 44% |
| Size of (1) index.html and (2) contents.html | 1: none; 2: 6 kB | 1: 1 kB; 2: 8 kB |
| Average text file size | 345 / 13 = 27 kB | 340 / 13 = 26 kB |
| Average text file size with figures | 463 / 13 = 36 kB | 458 / 13 = 35 kB |
| Average need to download | 0 + 6 + 36 = 42 kB | 1 + 8 + 35 = 44 kB |

File sizes and download needs are complicated to calculate when using frames, because the index.html and contents.html files must always be downloaded as well. The file index.html is invisible; its only task is to arrange the table of contents and the text.
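
The "average need to download" figures above are simply the frameset file plus the table of contents plus one average text file. A minimal Python sketch of that arithmetic, using the Danish-book numbers, might look like this (the function name average_download_kb is an assumption made for illustration):

```python
def average_download_kb(frameset_kb, toc_kb, total_kb, n_files):
    """Frameset file + table of contents + one average text file, in whole kB."""
    average_text_kb = round(total_kb / n_files)
    return round(frameset_kb + toc_kb + average_text_kb)


# Text + figures for the Danish book, 13 files in each layout.
print(average_download_kb(0, 6, 463, 13))  # inline frames   -> 42
print(average_download_kb(1, 8, 458, 13))  # ordinary frames -> 44
```

The averages are rounded before they are added, which is how the figures above were derived.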

English Book: Downes and Mui: "Unleashing the Killer App"

Larry Downes and Chunka Mui published the book "Unleashing the Killer App – Digital Strategies for Market Dominance" in May 1998. The book deals with the possibilities and risks of the internet and promotes the advantages of free downloads, among other things. Later, the authors took their own medicine: they published the work for free reading at http://www.killer-apps.com/.

Unfortunately, the layout is very difficult to read:

  • The home page contains a lot of black graphics, blue headlines on a black background, etc.
  • The text is shown in a tiny window (40 percent of the screen size of an 800 x 600 screen, 55 percent of a 1024 x 768 screen)

I thought: "I could do that better", and here is the result:

| English Book: Killer App | Inline Frames | Ordinary Frames |
|---|---|---|
| HTML text file size | 462 kB (15 files) | 459 kB (15 files) |
| HTML / doc relative file size | 462 / 658 = 70% | 459 / 658 = 70% |
| Size of (1) index.html and (2) TOC.html | 1: none; 2: 16 kB | 1: 1 kB; 2: 16 kB |
| Average text file size | 462 / 15 = 31 kB | 459 / 15 = 31 kB |
| Average need to download | 0 + 16 + 31 = 47 kB | 1 + 16 + 31 = 48 kB |

Set-up in MS Word

Headlines were converted to lowercase as in the book, and the numbering is shown as in the online version. I omitted the numbering of the principles, so the numbering runs as follows: 4.2: Outsource to the Customer, 4.3: Cannibalize your Markets, etc.

Section 3.7: Caution: Value Chains under Extreme Pressure: This portion of the book is featured in the May 15th issue of CIO Enterprise Magazine only. Five paragraphs are missing in the online version.

I added an index (firms only) and external URL links, and I added hyperlinks between parts and chapters.

References to table 3.1, etc. are kept, even though there are no tables in the online version. References to figure 2.A, etc. are kept likewise.

Corrections after Conversion

  • Deleted the section numberings "Part 0: ", "1.2: ", "1.3: ", and "Chapter 11: "
  • Headline font changed from Arial, Helvetica to Courier New, Courier
  • Added a "Buy this book" GIF in the TOC.html files and in all noframes files
  • Inline frame files were changed as described for example 1 above.

How to Edit for HTML Publishing

Any text meant for internet use must be well edited:

  • Easily readable, using as few words as possible to express the meaning
  • With informative headings based on the Word styles "Heading 1", "Heading 2", etc.
  • Concise, with sparing use of irony, etc.

Read on at useit.com:

Frames or no Frames

Frames are a good thing and a bad thing. When using frames, you have the following benefits:

  • A good overview of a document of 100 pages
  • Easily readable lines of fewer than 80 characters
  • Easy generation of Table of Contents (TOC) and index

On the other hand there are concerns as well:

  • The browser cannot respond to Arrow Down, Page Down, etc. until you click in the left or right part of the screen
  • The browser does not know which part of the screen to print until you click in the left or right part of the screen
  • Some browsers cannot show frames, and in Navigator 2 the Back button does not work

See Jakob Nielsen's Alertbox December 1996: Frames Suck Most of the Time.

The fearless reader may try Dr. Bandwidth's Seven Deadly Sins of the Internet, especially the very detailed "Any kind of frames"

When not using frames, there are the following benefits:

  • The user knows where he is, and the navigation keys are always working (Back, Down, Page Down, etc.)
  • The file can be shown on all screen sizes

Concerns when not using frames:

  • A size limit of approximately 10 pages, see below

File Sizes

There is a limit of approximately 10 printed pages (3,000 words) per HTML file. Larger files must be shortened or divided; otherwise:

  • The user will lose his orientation, and
  • The file size will grow beyond 30 kB, resulting in too long a download time over an ordinary modem
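
To illustrate the 3,000-word limit, here is a small Python sketch that groups paragraphs into files so that each file stays within the limit. It is only an assumption about how one might divide a text by hand or by script; a conversion tool would more likely split at headings. The name split_into_files is hypothetical.

```python
def split_into_files(paragraphs, max_words=3000):
    """Group paragraphs greedily so that each group stays within max_words.

    A single paragraph longer than max_words is kept whole in its own group.
    """
    files = []
    current = []
    count = 0
    for paragraph in paragraphs:
        words = len(paragraph.split())
        # Start a new file when this paragraph would push the current one over the limit.
        if current and count + words > max_words:
            files.append(current)
            current, count = [], 0
        current.append(paragraph)
        count += words
    if current:
        files.append(current)
    return files


# Five paragraphs of 1,200 words each end up in three files (2 + 2 + 1 paragraphs).
paragraphs = ["word " * 1200] * 5
print([len(group) for group in split_into_files(paragraphs)])
```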

Creation of HTML Links

This section is based on the Solutionsoft text "Creating Links within your Publications"

Links Created in Word Processors

  • Internal hyperlinks as created in a word processor. The link may be created either manually or automatically.
  • MS Word offers good possibilities for inserting hyperlinks and "mail to:"

In MS Word, an index may be created automatically from a text file with the command Insert / Index and Tables / Automark / Open Automark File – and afterwards generated with the command Insert / Index and Tables / Enter.

Links Created in the Conversion Tool

  • WordToWeb creates a table of contents in the HTML file (or in the left frame) based on the Word styles "Heading 1", "Heading 2", etc.
  • WordToWeb offers the possibility of creating cross references defined as "this link will lead you to another page containing one of the indexed words from this page". This is not very useful, because you don't know which of the indexed words it is.
  • WordToWeb can create references to another HTML file, that is, find certain words and create links to an explanation of those words. This depends on proper input.
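
The idea of building a table of contents from the headings can be sketched in a few lines of Python. This is not how WordToWeb works internally (that is not documented here); it only shows the principle on an HTML file whose headings are already marked up as H1, H2, etc. The anchor names sec1, sec2, ... are hypothetical and would have to exist in the target file.

```python
from html import escape
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collect (level, text) pairs for every <h1>..<h6> element."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._level = 0   # 0 means "not inside a heading"
        self._text = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag names, so <H1> arrives as "h1".
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self._level = int(tag[1])
            self._text = []

    def handle_data(self, data):
        if self._level:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._level and tag == f"h{self._level}":
            self.headings.append((self._level, "".join(self._text).strip()))
            self._level = 0


def build_toc(html, filename):
    """Return one link line per heading, indented by heading level."""
    collector = HeadingCollector()
    collector.feed(html)
    collector.close()
    lines = []
    for number, (level, text) in enumerate(collector.headings, start=1):
        indent = "&nbsp;" * (2 * (level - 1))
        lines.append(
            f'{indent}<a href="{filename}#sec{number}" target="_top">{escape(text)}</a><br>'
        )
    return "\n".join(lines)


print(build_toc("<H1>Part 1</H1><P>text</P><H2>1.1 Intro</H2>", "text.html"))
```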

Links Created in an HTML Editor

  • All HTML editors offer the possibility of inserting links.

All external links on this page are created in MS Word. All internal links have been created automatically by WordToWeb.

Choosing Tools for Conversion

The various tools are compared by two main criteria:

  • HTML file size around the 30 kB limit
  • Provides navigation tools in the HTML file(s)

This is not meant as a WordToWeb advertisement. Of course there are other tools than MS Word, WordToWeb, and Adobe Acrobat, and some of them may offer easier download of the HTML files and better navigation tools. If you know another conversion tool that is both:

  • Easy to use, and
  • Requires no editing after conversion

– then please let me know!

Note on calculation: The "HTML figures file size" is the sum for all jpg and gif files.

I have run many HTML conversions on three documents. The "ordinary document" is the Danish example used throughout the site.

| Three MS Word Documents Used for Analysis | Simple Document | Ordinary Document | With a lot of figures |
|---|---|---|---|
| Number of pages | 87 pages | 88 pages | 64 pages |
| doc-file size | 587 kB | 1,061 kB | 555 kB |
| Number of figures | 0 | 6 | 46 |

MS Word 2000 Stand-alone

MS Word 2000 can work and save in HTML format and standard ".doc" format. Then why bother with any other tools?

Because:

  • Word 2000 saves one doc-file as one HTML file, no matter how long it is. This is one reason why an HTML file can become very large. There are no navigation tools, so the user has to download one huge file and can only browse with Arrow Down, Arrow Up, Page Down, and Page Up!
  • Word 2000 tries to make the HTML file look like the printed doc-file, even when you don't want it to. The program does this by saving a lot of information in the HTML file header, which makes the file larger and slower to download.
  • Word 2000 carries the Table of Contents (TOC) over from the doc-file, but its entries do not work as hyperlinks. Instead, the HTML TOC shows the doc-file's page numbers, which is not useful. The doc-file's index has the same problem.

| Word 2000 HTML File Sizes | Simple Document | Ordinary Document (910 kB) | With a lot of figures |
|---|---|---|---|
| Conversion time | < 1 minute | < 1 minute | 1 minute |
| HTML text file size | 1,048 kB (1 file) | 910 kB (1 file) | 1,154 kB (1 file) |
| HTML text + figures file size | 1,048 kB | 1,028 kB | 1,464 kB |
| HTML / doc relative file size | 1,048 / 587 = 178% | 1,028 / 1,061 = 97% | 1,464 / 555 = 264% |

In the figure below the HTML text file size is compared to the 30 kB limit.
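
As a rough numerical companion to that comparison, the Python sketch below relates the Word 2000 HTML text file sizes to the 30 kB limit and estimates an idealised download time. The modem speed of 28,800 bit/s is an assumption chosen for illustration, not a figure from this page, and protocol overhead is ignored.

```python
LIMIT_KB = 30               # per-file limit used on this page
MODEM_BITS_PER_S = 28_800   # assumed modem speed, not taken from the text


def times_over_limit(size_kb, limit_kb=LIMIT_KB):
    """How many times larger than the limit a file is."""
    return size_kb / limit_kb


def download_seconds(size_kb, bits_per_s=MODEM_BITS_PER_S):
    """Idealised transfer time: 1 kB = 1,024 bytes, no protocol overhead."""
    return size_kb * 1024 * 8 / bits_per_s


# HTML text file sizes produced by Word 2000 for the three test documents.
word_2000_html_kb = {"simple": 1048, "ordinary": 910, "many figures": 1154}
for name, size in word_2000_html_kb.items():
    print(f"{name}: {times_over_limit(size):.0f} times the limit, "
          f"about {download_seconds(size) / 60:.1f} minutes")
```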

MS Word + WordToWeb

The WordToWeb tool is made by Solutionsoft, www.solutionsoft.com. It is written in Visual Basic and offers many options.

Here are extracts from an online manual on WordToWeb – created with WordToWeb only:

| WordToWeb HTML File Sizes | Simple Document | Ordinary Document | With a lot of figures |
|---|---|---|---|
| Conversion time | 3 minutes | 9 minutes | 4 minutes |
| HTML text file size | 441 kB (47 files) | 375 kB (14 files) | 213 kB (10 files) |
| HTML text + figures file size | 441 kB | 475 kB | 590 kB |
| HTML / doc relative file size | 441 / 587 = 75% | 475 / 1,061 = 45% | 590 / 555 = 106% |
| Average text file size | 9 kB | 27 kB | 21 kB |
| Average text file size with figures | 9 kB | 34 kB | 59 kB |

These examples are created without frames. The average text file sizes are all below the 30 kB limit.

Word 97 or Word 2000?

Solutionsoft recommends Word 97 as the basis for conversion instead of Word 2000. Word 2000's HTML support is designed to provide a very high level of fidelity to the original Word document. This is great if you want your HTML publication to look exactly like the Word document, but it can be a problem if you want simple HTML formatting.

Here are some of the major changes in Word 2000 (as opposed to the HTML support in Word 97):

  • HTML is now a "native" Word format. This means that you can save a Word document in HTML format, modify it and then later resave it in .DOC format without losing any information. The obvious implication of this is that the HTML files created by Word need to include all the information contained in a .DOC file. Word documents are very complex. So as a result, the HTML files created by Word 2000 are also very complex. Much of the information required to maintain compatibility with the .DOC format is encoded in an XML header at the top of the page.
  • The HTML files created by Word 2000 rely heavily on cascading style sheets (CSS). Unfortunately, Microsoft has not provided an option to save in a simpler HTML format without style sheets. This means that some HTML files created with Word 2000 may not display properly in older browsers. The style sheet support varies between Microsoft Internet Explorer and Netscape, even in the later versions.

If you cannot get acceptable results with Word 2000, you may wish to consider reverting to Word 97, which uses a much simpler HTML converter. (Parts of this text stem from the help text for WordToWeb's Word 2000 patch.)

Adobe Acrobat for Print-out

Adobe Acrobat is used to create PDF files, which show a document the way a word processor does in Print Preview mode.

There are two main advantages of using PDF files instead of offering download of Word files:

  • The file size is smaller
  • The user cannot change the file by accident

You can open the file with the Adobe Acrobat Reader program, which can be downloaded from http://www.adobe.com/products/acrobat/readstep.html (5.6 MB)

| Adobe PDF File Sizes | Simple Document | Ordinary Document (377 kB) | With a lot of figures |
|---|---|---|---|
| Conversion time | 1 minute | 1 minute | 1 minute |
| PDF file size | 245 kB | 377 kB | 3,418 kB (sic!) |
| PDF / doc relative file size | 245 / 587 = 42% | 377 / 1,061 = 36% | 3,418 / 555 = 616% |
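
The relative file sizes on this page are the converted size divided by the size of the original doc-file. As a small sketch of that calculation, the Python lines below put the three results for the ordinary document (1,061 kB) side by side; the function name relative_size_percent is an illustrative assumption.

```python
def relative_size_percent(converted_kb, doc_kb):
    """Converted file size as a whole percentage of the original doc-file size."""
    return round(100 * converted_kb / doc_kb)


DOC_KB = 1061  # the ordinary document
results_kb = {
    "Word 2000 (text + figures)": 1028,
    "WordToWeb (text + figures)": 475,
    "Adobe PDF": 377,
}
for tool, size_kb in results_kb.items():
    print(f"{tool}: {relative_size_percent(size_kb, DOC_KB)}%")
# Word 2000 (text + figures): 97%
# WordToWeb (text + figures): 45%
# Adobe PDF: 36%
```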