Mellel HTML export — experimental XSLT stylesheet

Post by Stephen Still »

Rather than continue to lead the main XML exporter thread down the garden path, I'm opening up a new thread on my HTML export XSLT for Mellel.

For those who don't know, I've been playing around with learning XSLT by creating an HTML exporter for Mellel. It is by no means perfect yet, nor will it ever be, given the limitations of CSS 2.1 and XHTML 1, but it's getting to the point of Not Being So Very Bad.™

You can find it here.

To transform a Mellel 2.1+ document to HTML:

Fire up the included Mellel HTML exporter script, then:
(1) Choose a Mellel file;
(2) Choose the mellel2html.xsl transformation;
(3) Choose a save location.

Or, in the Terminal:

xsltproc mellel2html.xsl [file to convert/main.xml] > [savefile]

Comments, bug reports and patches would be most appreciated. If you can accompany bug reports with sample documents for me to test the rendering on, that would be excellent.

Since the last posted version, I've done a lot of work on:

- tables: should now support multicolumn and multirow table cells, and will preserve more of the size of the cell as set up in Mellel;
- background colours;
- "exotic" numbering sets: i.e. numbering styles not supported natively by HTML or xsltproc, including arabic-indic, greek-academy, persian, and hebrew. Support for these may not be perfect, because I'm not an expert in any of the languages associated with these;
- lists are a bit less broken than they were;
- Autotitles should now have their formatting (numbers, full stops, etc.) respected, with the exception of document variables in the autotitles;
- more support for list and reference number styling ([], 1., etc.);
- note streams are now separated out from each other, and numbered independently of one another.
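
As an illustration of the simplest of these numbering cases: arabic-indic and persian numbers use the same positional decimal system as Western digits, so a number can be converted by swapping each digit for its counterpart. The shell sketch below shows that mapping with sed. It is not taken from mellel2html.xsl, and the function names are made up for the example. Hebrew and Greek numerals are additive rather than positional, so they need more than a digit swap.

```shell
# Hypothetical helpers, not part of mellel2html.xsl.
# Swap each ASCII digit for its Arabic-Indic counterpart (U+0660 to U+0669).
to_arabic_indic() {
    printf '%s\n' "$1" |
        sed 's/0/٠/g; s/1/١/g; s/2/٢/g; s/3/٣/g; s/4/٤/g; s/5/٥/g; s/6/٦/g; s/7/٧/g; s/8/٨/g; s/9/٩/g'
}

# Same idea for Persian (Extended Arabic-Indic) digits (U+06F0 to U+06F9).
to_persian() {
    printf '%s\n' "$1" |
        sed 's/0/۰/g; s/1/۱/g; s/2/۲/g; s/3/۳/g; s/4/۴/g; s/5/۵/g; s/6/۶/g; s/7/۷/g; s/8/۸/g; s/9/۹/g'
}
```

Characters that are not ASCII digits pass through unchanged, so a label such as "p. 7" keeps its prefix.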

Images are still broken. I need to add support for pre- and post-processing to my XSLT runner before I can get this to work. The HTML is also a little invalid in places. Rest assured that I'm aware of and working on this.

Post by donb »

I've tried your HTML app and had varying results from it.
Unfortunately, since I do not have your email address, I cannot send you screen shots to show the problems.

Don Broadribb

donbroadribb@optusnet.com.au

Post by Stephen Still »

Hi Don,

I've emailed you. If anybody else feels like emailing me, my email address is firstname@lastname.id.au (replace firstname and lastname, obviously).

-S

Post by nvalvo »

Interesting. Thanks for working on this. I learned a fair amount poking through your script and xsl thingy, too.

Post by Stephen Still »

nvalvo wrote: Interesting. Thanks for working on this. I learned a fair amount poking through your script and xsl thingy, too.

I learned a fair amount writing it - it's my first XSLT experience! So if I were you, I would _definitely_ not assume that any solution I've used is necessarily the best one for any particular problem. I'm quite proud of my numbering support though, especially persian/arabic-indic, which I think I solved quite elegantly. :)

Post by SteveH »

Thanks for that, it was very useful.

I have a Mellel document that I always wanted to convert to HTML to avoid the need to use PDF files.

Once it was converted, I just needed to extract the image files from the Mellel package, convert them, and then do a little bit of hacking within the HTML to have the finished doc. Easy.

Interestingly, the finished HTML document does not print, or rather, it prints as a single blank document.

Post by Stephen Still »

Well, the CSS is a bit messy, and parts of it use bits of the CSS spec which aren't implemented in all browsers, but it should print... I'm not sure why it wouldn't. Possibly I could hack around it by adding an @media directive to enforce a simple print CSS.

Hmm. I've been meaning to do some more work on this. I may get around to testing printing.

I hope you didn't find the generated HTML too painful!

Post by esland »

Hi Stephen, many thanks for your HTML exporter. Please allow me to make a few comments.

The thing I like about your exporter is that it produces web-ready documents in a flash. Great stuff. The problem I have with the files your exporter outputs, though, is that the CSS contains a lot of stuff I don't need. Also, the spans surrounding each paragraph make the document difficult to reformat. Considering this, it struck me that what your translator does, essentially, is translate a document that looks nice in Mellel into a document that looks as similar as possible in HTML.

Now I applaud the result you have achieved, but here is why I, and maybe others as well, do not appreciate it as much as I would like: I am not aiming to convert the look of a document in Mellel to a similar look in HTML. I would like to write text in Mellel, and then transfer that text to HTML, where I want to style it to match the rest of my website rather than its appearance in Mellel. I have set up Mellel to be easy to write in, with a font that looks beautiful on my Mac at 126% in full screen but which just doesn't render well in a browser on a Windows PC.

I think there is a marked difference in having a nice and comfortable writing environment in Mellel on a Mac and the way we have to format CSS to ensure that our text renders well on a Windows PC.

Now I understand that the way to overcome this would be by formatting my writing environment in Mellel to match the target formatting, but one of the joys of working on a Mac is that we can ignore Times and Georgia and Arial. Therefore, I do not want to limit my options in Mellel to the restrictions of Windows.

So I guess what I would look for is an exporter that simply gives us paragraphs in p tags with class="p1" (and differently formatted paragraphs with classes p2 and p3), bold in strong tags, italic in em tags, and headings in h1 and h2 tags with matching classes (h1 class="h1", h2 class="h2"). Then it would be simple to write a CSS stylesheet to match the output to an existing website. Ideally, I would like to write text in Mellel, then export it so that it retains bold, italic, underline, headline and paragraph identifiers (but no styling), and copy and paste the result into a field in a MySQL database. This would allow maximum flexibility to style the text according to any website it appears on.

Is there any chance you may consider this approach? Or could you maybe explain what parts of the XML translator to delete to get rid of the spans around paragraphs and to do away with the font styling in the spans around bold and italic?

Many thanks.
Robert

Post by joewiz »

Seeing this thread bumped up by the last post made me take another look at Stephen's XSLT translator. It's a really nice piece of work, and particularly it's very faithful to the original Mellel document. I just have two comments:

1. If the Mellel document itself contains HTML or XML tags, they aren't preserved as such in the XSLT's output. I ran a document I had through the translator; the document was a simple introduction to XML for non-techies, and the XML tags were missing in the HTML version. Of course they were a part of the HTML document, but they were ignored. Potentially this could even cause problems, right? Is there a workaround? Mellel itself has figured out a way to deal with 'faux tags' in Mellel documents.

2. Images aren't automatically moved. This isn't a big deal, but it's probably something that a script could handle pretty easily.

Post by Stephen Still »

Hi guys,

Good to see that people are actually using this :)

On your points:

(1) Faithfulness vs. clean markup: you're right, the initial intention was to make the document as faithful to the Mellel original as possible. This was largely because I was using the translator as a vehicle for improving my CSS and XSLT knowledge. It would be really good to have an option to just output clean, CSS-free markup, and probably not all that hard - I could just have a command-line switch which turns off most of the markup-producing stuff. I have been tempted to do this on a number of occasions, but I just haven't gotten around to it. On the subject of spans, they're inserted into the document as a proxy for the character-style tag, because of Mellel's split between character and paragraph styling. If we accept that we are not going to use any of the Mellel styling, and only style things on a paragraph level, they can be dispensed with. Unfortunately, if you've been using character styles to do italics and bold, they would then disappear.

(2) Images: Yep, this could be done with a relatively simple script (probably about 4 lines, in fact). I just haven't had the time to get around to doing it. :)

(3) Embedded XML tags: can you send me a text document? I must admit that I've overlooked this as an issue…
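
For point (2), a minimal sketch of such a script is below. It assumes the Mellel package is an ordinary directory (as the main.xml path used with xsltproc suggests) and that images can be recognised by file extension. The function name and the extension list are guesses, not something taken from Mellel's documentation. It only copies files: it does not convert formats or rewrite the image references in the generated HTML.

```shell
# Hypothetical helper: copy image files out of a Mellel package directory.
# Usage: copy_mellel_images PACKAGE DEST
copy_mellel_images() {
    # Create the destination folder if it does not exist yet.
    mkdir -p "$2" || return 1
    # Match files by extension, case-insensitively; the list is an assumption.
    find "$1" -type f \( -iname '*.png' -o -iname '*.jpg' -o -iname '*.jpeg' \
        -o -iname '*.gif' -o -iname '*.tif' -o -iname '*.tiff' -o -iname '*.pdf' \) \
        -exec cp {} "$2"/ \;
}
```

A call such as copy_mellel_images MyDoc.mellel images (MyDoc.mellel being a placeholder name) would leave the copies in an images folder next to the exported HTML.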

Thanks for your input guys!

-S

Post by esland »

Hello Stephen,

The option to produce CSS-free markup is exactly what I meant and would be terrific! As regards the italics and bold, I understand they would be lost if they had been formatted as character styles. But would there be a way to retain just the b and i tags if I had inserted them using Command-b and Command-i during writing? Also, would header indicators be possible from Mellel's outline?

Anyway, an option to have markup from Mellel without the spans for the styling is an exciting prospect. Thank you for considering it.

Best regards,
Robert

Post by Stephen Still »

esland wrote: But would there be a way to retain just the b and i tags if I had inserted them using Command-b and Command-i during writing? Also, would header indicators be possible from Mellel's outline?

Mellel implements Cmd-B and Cmd-I as character-override elements, with the content of the override defined in the document header. So b, i, and a b nested within an i would be three different character-override classes.

What you could do is, when you hit a character-override element, sniff for whether that override includes b or i, and then write them out as strong or em tags respectively.

esland wrote: Anyway, an option to have markup from Mellel without the spans for the styling is an exciting prospect. Thank you for considering it.

It was definitely something I wanted to do from day 1 - I just got a bit obsessed with fidelity as my first goal. :)

-S

Post by esland »

Hi Stephen,

I've just looked long and hard at your mellel2html.xsl file with the intention of maybe getting started on editing out the spans, but I found it hopelessly difficult to understand all the tags you've used in there. So I've decided to dedicate myself to cheering you on while you are working behind the scenes to achieve what you had wanted to do from day 1. :)

Thanks,

Robert

Post by Stephen Still »

esland wrote: Hi Stephen, I've just looked long and hard at your mellel2html.xsl file with the intention of maybe getting started on editing out the spans, but I found it hopelessly difficult to understand all the tags you've used in there. So I've decided to dedicate myself to cheering you on while you are working behind the scenes to achieve what you had wanted to do from day 1. :)

Yep, it's a pretty scary XSLT. It was over 1600 lines last time I checked. :)

Turning off the styling won't be hard. Off the top of my head, you should be able to do it in a quick and dirty way by deleting lines 28-36 (the style tag in the header, and all its contents) and deleting lines 368-394 (underneath the comment which says "Character spans" - inside the template match="p/c") and replacing them with a line saying <xsl:apply-templates/>. (Note: I haven't tested that, so I could be wrong!) That won't remove any of the various class attributes scattered around the place, by the way, or the "character overrides".
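
Taken literally, the quick-and-dirty edit described above could be scripted as follows. The line ranges are the ones quoted in the post and only apply to the revision of mellel2html.xsl being discussed, so check them against your own copy first. The function name is invented for the example, and like the original suggestion this has not been tried on the real stylesheet.

```shell
# Hypothetical helper: write a copy of the stylesheet with the style block
# (lines 28-36) removed and the character-span code (lines 368-394)
# replaced by a single <xsl:apply-templates/> line.
# Usage: strip_styling mellel2html.xsl mellel2html-plain.xsl
strip_styling() {
    awk '
        NR >= 28 && NR <= 36 { next }                  # drop the style tag and its contents
        NR == 368 { print "<xsl:apply-templates/>" }   # emit the replacement line once
        NR >= 368 && NR <= 394 { next }                # drop the character-span lines
        { print }                                      # keep everything else unchanged
    ' "$1" > "$2"
}
```

Because it writes to a second file, the original stylesheet stays untouched and the two can be compared with diff.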

I'm particularly proud of lines 1399-1430.

Post by Stephen Still »

OK, I've done it. I've added a command-line parameter, StyleOff. If you set it to anything other than 0, the vast majority of (if not all) style information will be stripped.

So, to use from the command line, here is what I would do on my test file:

Meretrix:~ stephen$ xsltproc --param StyleOff 1 ~/workingsource/mellel2html/mellel2html.xsl /Users/stephen/Desktop/Completely\ finished\ Latin\ thesis\ copy.mellel/main.xml > test.html;open test.html
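
The same invocation can be wrapped in a small shell function so the switch is easier to toggle. This is only a sketch: the function name and argument order are invented here, while --param, StyleOff and main.xml come from the command above.

```shell
# Hypothetical wrapper around the xsltproc call shown above.
# Usage: mellel2html STYLESHEET PACKAGE OUTPUT [STYLEOFF]
# STYLEOFF defaults to 0 (keep styling); any other number strips it.
mellel2html() {
    xsltproc --param StyleOff "${4:-0}" "$1" "$2/main.xml" > "$3"
}
```

With that defined, mellel2html mellel2html.xsl thesis.mellel test.html 1 (thesis.mellel being a placeholder name) would produce the unstyled version, and leaving off the final 1 keeps the styling.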

I've added it to Subversion (see http://cinaedulus.org/svn/mellel2html/) for the impatient; I'll package it up sometime tomorrow and upload it.

A diff of rev 59 against 60 is instructive. This actually added about 60 lines to the stylesheet.