It’s all very well to state that the text buffer code needs to be aware of logical structure beyond the character, but that still leaves the task of deciding which structures to recognize and implement. Characters, words, and paragraphs are the three obvious candidates.
Characters are tricky. On the one hand, I need them around so I know what to draw. On the other hand, they are something of an obstacle to a more coherent structural model. Which of the possible styles, if any, can legitimately be applied to characters? Certainly not indentation or justification.
You could make a case for the type settings, such as size, font and color, but varying any of those mid-word is likely to result in an unreadable mess. I could disallow it on the basis of aesthetics, but fortunately the Glk API guarantees that within the context of any single output call, all text will share the same style.
This is manifestly not true when dealing with HTML TADS, which dumps a set of tags into the buffer every time. But there the tags rather than the function calls provide the style boundaries. As long as I never mix characters from different calls or different tags into the same stylistic unit, I can safely ignore attributes at the character level.
On to words. Although indentation and justification still do not apply, all of the type settings do. It’s also important for the line breaking code to take words into account, as well as non-breaking characters and hyphens. Those can be treated as parts of words and word boundaries, respectively.
Whitespace – the stuff between words – deserves some special attention. Gargoyle features whitespace trimming. Spaces after a full stop will be tweaked. Runs of consecutive whitespace will generally be collapsed, unless the attributes suggest some formatting significance beyond mere spacing. My thought is to treat formatted spaces as words, in whole or part, and otherwise to eliminate whitespace altogether, handling it when deciding how to lay out a given line.
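A rough sketch of that splitting rule in Python, not Gargoyle's actual code: `split_words`, its `formatted` flag, and the regular expression are hypothetical names invented for illustration. It assumes paragraph breaks have already been removed from the text, treats only spaces and tabs as breakable whitespace, keeps the non-breaking space inside its word, and ends a word after each hyphen.

```python
import re

# Assumption: only space and tab count as breakable whitespace.
# The non-breaking space (U+00A0) is deliberately left out, so it
# stays inside whatever word contains it.
_TOKEN = re.compile(r"[ \t]+|[^ \t\-]*-|[^ \t\-]+")


def split_words(text, formatted=False):
    """Split the text of one output call into words.

    A hyphen ends the word it belongs to, so "well-known" yields
    "well-" and "known". Whitespace runs are dropped unless
    `formatted` is true, in which case each run is kept as a word
    of its own.
    """
    words = []
    for match in _TOKEN.finditer(text):
        token = match.group()
        if token[0] in " \t":
            if formatted:
                words.append(token)  # spacing carries formatting meaning
            continue                 # otherwise whitespace is discarded
        words.append(token)
    return words
```

With ordinary text the spaces vanish and are synthesized again at layout time; with `formatted=True` a run such as the gap in a column layout survives as a word and can be measured like any other.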
Font size presents a special headache. For height calculations, a line needs to be at least as high as the font size of its largest word. Paragraph height becomes a function of the number of lines; lines are constrained by the horizontal space available, and I need enough of them to display all the words in the paragraph. If words within the paragraph can have different heights, I have to add each line height separately. I may also have to adjust the line metrics above and below to make room for ascenders and descenders: an ugly business.
It’s tempting to decree that all words within a paragraph shall have the same height, as determined by the font size in effect when the first word is added to the paragraph. So tempting that this is in fact what I will do, absent some compelling reason to reconsider. The FONT tag may force my hand, but as long as the rest of the foundation is solid, it should be straightforward to revisit the decision.
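To make the trade-off concrete, here is a minimal Python sketch of both height calculations. Everything in it is an assumption made for illustration: words are reduced to hypothetical (width, height) pairs in pixels, lines are filled by a simple greedy first-fit breaker, and the ascender and descender adjustments are ignored entirely.

```python
def break_lines(words, max_width, space_width):
    """Greedy first-fit line breaking over (width, height) pairs."""
    lines, current, used = [], [], 0
    for width, height in words:
        # Width of the line if this word were appended to it.
        needed = width if not current else used + space_width + width
        if current and needed > max_width:
            lines.append(current)            # close the full line
            current, used = [(width, height)], width
        else:
            current.append((width, height))  # an overlong word still gets a line
            used = needed
    if current:
        lines.append(current)
    return lines


def mixed_height(lines):
    """Mixed sizes: each line is as tall as its tallest word."""
    return sum(max(height for _, height in line) for line in lines)


def uniform_height(lines, font_height):
    """One size per paragraph: height is just line count times font height."""
    return len(lines) * font_height
```

For three 40-pixel words of heights 12, 18 and 12 in a 100-pixel column with 10-pixel spaces, the breaker produces two lines; `mixed_height` gives 18 + 12 = 30, while `uniform_height` at size 12 gives 24 without ever inspecting the words.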
Otherwise, words take on the style attributes in effect at the time I demarcate them, as part of the process of printing text into the buffer. Words will never be combined across function calls or between HTML tags; once I’ve parsed a given round of output, I’ve got a complete set of words in hand. The characters making up those words will not change but the attributes might; both the Z-Machine and HTML TADS can retroactively apply colors to existing text. Something to bear in mind; not something to explore in great detail at this juncture.
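The shape of that arrangement can be sketched in a few lines of Python. The `Style` and `Word` records, the `add_output` and `recolor` helpers, and the default attribute values are all hypothetical; the splitting is a plain whitespace split, standing in for whatever the real parser does.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Style:
    # Hypothetical type settings; the defaults are placeholders.
    font: str = "serif"
    size: int = 12
    color: str = "black"


@dataclass
class Word:
    text: str     # characters are never altered after creation
    style: Style  # attributes may be swapped out later


def add_output(buffer, text, style):
    """Append the words from one output call, all sharing one style.

    Each call is split on its own, so no word ever spans two calls.
    """
    buffer.extend(Word(token, style) for token in text.split())


def recolor(buffer, start, end, color):
    """Retroactively apply a color to the words in buffer[start:end]."""
    for word in buffer[start:end]:
        word.style = replace(word.style, color=color)
```

Keeping `Style` immutable and replacing it wholesale means a retroactive color change touches only the words named, even when many words were created from the same style object.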
Last but not least: the paragraph. Whatever the formal definition, I use the term to mean “all the words between two newline characters.” Here the indentation and justification styles become relevant, and the orphaned font size finds a home at last.
Paragraph indentation specifies the whitespace before the first word on the first line, while plain indentation covers the whitespace before the first word on every line. Spacing between words depends on the justification style. For center, flush left, and flush right, whitespace is synthesized between words: zero spaces after a hyphen; one after an ordinary word; a configured number after a full stop. For justified text, additional spaces are distributed throughout the line to align it with the left and right margins.
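The spacing rules can be sketched as follows in Python. This is an illustration under stated assumptions rather than the eventual layout code: `base_gaps` and `layout_line` are hypothetical names, widths are counted in character cells as if the font were monospaced, indentation is left out, and the justified case hands out extra spaces round-robin to the gaps that already hold a space.

```python
def base_gaps(words, stop_spaces=2):
    """Spaces to synthesize after each word except the last."""
    gaps = []
    for word in words[:-1]:
        if word.endswith("-"):
            gaps.append(0)            # nothing after a hyphen
        elif word.endswith("."):
            gaps.append(stop_spaces)  # configured count after a full stop
        else:
            gaps.append(1)            # one after an ordinary word
    return gaps


def layout_line(words, width, mode="left", stop_spaces=2):
    """Render one line as a string in the given justification mode."""
    gaps = base_gaps(words, stop_spaces)
    slack = max(width - sum(map(len, words)) - sum(gaps), 0)
    if mode == "justify":
        # Only gaps that already hold space may grow; hyphen joins stay tight.
        stretch = [i for i, gap in enumerate(gaps) if gap > 0]
        if stretch:
            for n in range(slack):
                gaps[stretch[n % len(stretch)]] += 1
        slack = 0
    # Leftover slack becomes leading padding for right and center modes.
    pad = {"right": slack, "center": slack // 2}.get(mode, 0)
    return " " * pad + "".join(w + " " * g for w, g in zip(words, gaps + [0]))
```

A real justifier would presumably leave the last line of a paragraph alone and work in pixels rather than cells, but the division of labor is the same: the words carry no whitespace, and the line decides how much to put between them.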
If Glk CSS adds DIV and SPAN to the mix, these will constitute paragraph style entities. HTML tables do as well, though they pose a unique layout challenge. Depending on how it breaks down, I may need a structural element above the paragraph level to keep the code from getting bogged down in special cases.
Tomorrow I will work on the layout algorithm, and see if it still makes sense as high level code.