XML expert wanted to write converter

jpavel
New to all this
Posts: 4
Joined: Tue Aug 29, 2006 7:53 pm

Re: XML format docs/notes

Post by jpavel » Tue Sep 05, 2006 2:48 am

Thanks for the comments.
I'll read through a few of my documents to get a feel for the format, but wait for the official docs before attempting a cross-reference-er.
~Jesse

Stephen Still
Knows everything, can prove it
Posts: 113
Joined: Thu Oct 20, 2005 2:59 am

Post by Stephen Still » Fri Sep 08, 2006 3:57 am

Well, I'm at the stage where I'd like some feedback, so I've whacked together an AppleScript to make running an XSLT a little easier, and zipped it up with the latest version of my HTML exporter.

They can be found here.

If you run the AppleScript, it will prompt you in turn for three things:

(1) The file you want to transform;

(2) The transformation you want to apply;

(3) The save location.

It can handle both compressed and non-compressed files.

Let me know if you have any problems with either the transformation or the AppleScript. Both probably require 10.3.9 at a minimum.
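
For anyone who would rather script this from the command line, here is a minimal Python sketch of the same three steps. It assumes that "compressed" means gzip (the post does not say which compression is used) and that xsltproc is on the PATH; the function names are made up for illustration.

```python
import gzip
import subprocess
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream


def read_maybe_gzipped(data: bytes) -> bytes:
    """Return the XML bytes, decompressing first if the data looks gzipped.

    Assumption: compressed documents are plain gzip streams.
    """
    if data[:2] == GZIP_MAGIC:
        return gzip.decompress(data)
    return data


def transform(source_path: str, stylesheet_path: str, output_path: str) -> None:
    """Apply an XSLT stylesheet to a (possibly compressed) XML file via xsltproc."""
    with open(source_path, "rb") as f:
        xml = read_maybe_gzipped(f.read())
    # xsltproc reads from a file, so write the decompressed XML to a temp file.
    with tempfile.NamedTemporaryFile(suffix=".xml") as tmp:
        tmp.write(xml)
        tmp.flush()
        subprocess.run(
            ["xsltproc", "--output", output_path, stylesheet_path, tmp.name],
            check=True,
        )
```

Usage would be something like transform("main.xml", "mellel2html.xsl", "out.html"), where the first file name is only a placeholder.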

Things I haven't done yet:

- separate note streams (they're all lumped into one at the moment)
- image support
- a great deal of the list styling
- headers and footers (which will probably never be done, as I don't think they make sense in an HTML document)
- various other things.

I'm particularly interested to hear how people go with bidi text.

You should note that I'm attempting to go for full-fidelity reproduction, but using as much CSS as humanly possible. For obvious reasons, this means the CSS is a little bit… hefty.

Post by Stephen Still » Fri Sep 08, 2006 6:59 am

Maria wrote: Steven,

great effort, thanks.

I tried it with a table-laden document. It works fine, though some cells are not rendered correctly and pictures lose their path. But really good. I hope you go on that way...

Maria
It's good to hear someone likes it. :) I certainly intend to keep going with it.

Re. cell rendering, what exactly are the problems? Diagonals won't currently be rendered, because there's no equivalent functionality in HTML. I also haven't implemented the various line types (dotted, dashed etc). Was there anything else? If there was, would you mind giving me a reduced Mellel file (i.e. a file with just the table in it)?

Re. images, my problem at the moment is that I have no way of working out the valid extension for the image from the XML — it simply says nothing about it, as I'm sure you've noticed. I think I'm going to have to post-process the file via Perl or similar to get images working. Or maybe I could pass the image names in as a parameter at run-time. Hmm. Any ideas would be appreciated. :)
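
One way the post-processing step could work, sketched in Python rather than Perl: read the first few bytes of each image file and guess the extension from its signature. This assumes the image data is reachable as separate files, which the XML itself does not tell us, and the function names are invented for the example.

```python
from typing import Optional

# Well-known leading byte signatures for common image formats.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", ".png"),
    (b"\xff\xd8\xff", ".jpg"),
    (b"GIF87a", ".gif"),
    (b"GIF89a", ".gif"),
    (b"II*\x00", ".tif"),  # little-endian TIFF
    (b"MM\x00*", ".tif"),  # big-endian TIFF
    (b"%PDF", ".pdf"),
]


def guess_extension(header: bytes) -> Optional[str]:
    """Return a file extension for the given leading bytes, or None if unknown."""
    for magic, ext in SIGNATURES:
        if header.startswith(magic):
            return ext
    return None


def guess_extension_for_file(path: str) -> Optional[str]:
    """Sniff the first bytes of a file on disk."""
    with open(path, "rb") as f:
        return guess_extension(f.read(16))
```

The result could then be passed to the stylesheet as a run-time parameter, as suggested above.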

Post by Stephen Still » Fri Sep 08, 2006 9:37 am

Hopefully the Redlers will integrate something like that into their Export option (though I hope for a separate CSS file). As long as they do not publish a Schema or so, they will be the ones who know the format the best.
Indeed. A schema/DTD/whatever would be nice... I'm also hoping for a nice extensible import/export package system like the Omni apps have. Something like:

HTML.mellexport/
    export.plist    -- name, version etc. of the exporter, plus the name of the script to run and the parameters it can take
    resources/      -- scripts/stylesheets for the transform
        mellel2html.xsl
        export.pl
        etc.
    images/         -- images needed for export support
    etc.
So that export plugins have nice names, versioning etc, and can have pre- and post- processing scripts. Obviously this would need to be carefully documented.
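
To make the idea a bit more concrete, here is a small Python sketch of how a host application might read such an export.plist. None of this exists: the key names (Name, Version, Script, Parameters) and the function name are purely hypothetical, chosen to match the description above.

```python
import plistlib

# A hypothetical manifest, serialised the way it might sit in export.plist.
EXAMPLE_PLIST = plistlib.dumps({
    "Name": "HTML",
    "Version": "0.1",
    "Script": "resources/export.pl",
    "Parameters": ["separate-css"],
})

REQUIRED_KEYS = ("Name", "Version", "Script")


def load_manifest(data: bytes) -> dict:
    """Parse plist bytes and check that the (assumed) required keys are present."""
    manifest = plistlib.loads(data)
    missing = [key for key in REQUIRED_KEYS if key not in manifest]
    if missing:
        raise ValueError("export.plist is missing: " + ", ".join(missing))
    # Parameters are optional in this sketch; default to none.
    manifest.setdefault("Parameters", [])
    return manifest
```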

On the subject of separate CSS files — that's certainly doable, I just haven't done it yet for the sake of simplicity.

nicka
Knows everything, can prove it
Posts: 673
Joined: Thu Oct 20, 2005 2:55 pm
Location: Oslo

Post by nicka » Fri Sep 08, 2006 10:55 am

This is fantastic, really. Thank you!

The exported files look absolutely great.

It's really cool that auto-title level 1 gets mapped to h1, level 2 to h2 etc.
A question: in a test I did, auto-titles at equation and figure level come out as h1. Is this better than making them plain text?

It might be useful to have another version at some stage that produces really stripped-down html -- just heading levels, paragraphing, tables and numbered/bulleted lists -- and just throws away most formatting information.

Post by Stephen Still » Fri Sep 08, 2006 12:04 pm

nicka wrote: This is fantastic, really. Thank you!

The exported files look absolutely great.

It's really cool that auto-title level 1 gets mapped to h1, level 2 to h2 etc.
I thought it was pretty neat (semantic even!), although I haven't implemented numbering for them yet…
A question: in a test I did, auto-titles at equation and figure level come out as h1. Is this better than making them plain text?
Actually, I never use either function, so it hadn't even occurred to me.

I could just output them as normal paragraphs, I suppose. What do people think?
It might be useful to have another version at some stage that produces really stripped-down html -- just heading levels, paragraphing, tables and numbered/bulleted lists -- and just throws away most formatting information.
This is true. It would also be pretty simple to achieve.

Having said that, the vast majority of the styling information lives in the CSS declaration in the head, so you can just chop it out if you like. The exceptions are inline direction and character overrides, which have to go in inline style attributes, unfortunately. If IE6 supported multiple classes on each element, I could avoid that, alas.

Post by nicka » Fri Sep 08, 2006 12:32 pm

I could just output them as normal paragraphs, I suppose. What do people think?
I wonder what you are 'supposed' to do in HTML. Apparently there's a caption tag for tables, but not for images. You can use figure auto-titles with either in Mellel, or just floating freely in the text, for that matter, so the caption tag is probably useless here.

I guess we want a principled, semantic solution. How about <p class=figure>, <p class=equation> and so on? That way they will just look like ordinary paragraphs unless someone wants to style them with CSS, in which case they can.

Post by Stephen Still » Fri Sep 08, 2006 12:55 pm

nicka wrote:
I could just output them as normal paragraphs, I suppose. What do people think?
I wonder what you are 'supposed' to do in HTML. Apparently there's a caption tag for tables, but not for images.
You're quite right. I had forgotten about that.
You can use figure auto-titles with either in Mellel, or just floating freely in the text, for that matter, so the caption tag is probably useless here.
Not really, because I can at least try to make an intelligent guess as to what the figure relates to based on the elements surrounding it (e.g. if it comes right after a table, or right after an image). This won't work 100% of the time, but it might be good enough.
I guess we want a principled, semantic solution.
I agree, generally speaking.
How about <p class=figure>, <p class=equation> and so on? That way they will just look like ordinary paragraphs unless someone wants to style them with css, in which case they can.
That could work, although generally I would have applied the paragraph class of the parent paragraph to the autotitle (so that it got the correct line-spacing and margins). Otherwise I would have to add the paragraph attributes through an inline style attribute, which isn't really the best solution.

We could go down the multiple-class route, however, and say <p class="figure ps-0"> so that it gets the style information as well as some semantic content. In a totally bare, stripped-down HTML we could just go for class="figure". IE6 will ignore the first class, but that's all right, since it won't actually have any CSS selectors associated with it in the first instance.
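
To show what I mean, here is a rough Python sketch of both ideas: guessing what an auto-title belongs to from the element just before it, and emitting the multiple-class paragraph. The real thing would live in the XSLT; the element kind names ("table", "image", "para") and the function names here are made up for illustration.

```python
import html
from typing import List, Optional


def caption_target(kinds: List[str], index: int) -> Optional[str]:
    """Guess what the auto-title at kinds[index] describes.

    Heuristic only: if the element immediately before it is a table or an
    image, assume the auto-title is its caption; otherwise give up.
    """
    if index > 0 and kinds[index - 1] in ("table", "image"):
        return kinds[index - 1]
    return None


def autotitle_html(text: str, kind: str, para_class: Optional[str] = None) -> str:
    """Emit <p class="kind para_class">, or just class="kind" for bare output."""
    classes = kind if para_class is None else f"{kind} {para_class}"
    return f'<p class="{html.escape(classes)}">{html.escape(text)}</p>'
```

As noted, the guess won't be right 100% of the time, so anything that returns None would simply fall back to the plain paragraph.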

d_h
Got the auto-title mojo working
Posts: 23
Joined: Fri Oct 21, 2005 2:33 am

Post by d_h » Thu Sep 14, 2006 3:41 am

Reiner wrote: Please don't laugh, but has anyone already written an XSL file for the conversion from Mellel to WordPerfect 5.1 for DOS or 6.0 for Windows? The important thing for me would be that styles stay in place.
The other day I noticed the WordPerfect-to-HTML utility that is part of
http://libwpd.sourceforge.net/index.html

This suggests that going from WordPerfect to Mellel may be easier than it seemed.

BHD
Got the styles thing figured out
Posts: 5
Joined: Wed Nov 02, 2005 5:47 pm

Post by BHD » Sat Sep 16, 2006 5:07 pm

aechallu wrote: I was just thinking about a sequence to have some sort of bibtex support taking advantage of the new xml format.

1) The user adds all citations as \cite{key} statements.

2) A utility extracts the \cite statements and pastes them into a regular tex file structured like this:

TEXCitationSection
key \cite{key}
...
TEXBibliographySection
\bibliography{file}

3) The utility compiles the tex files (four times, of course :-) and transforms the result to HTML.

4) The utility transforms the HTML formatting to MeXML formatting (for instance, all the <i></i> into Character Variation 3...). This could be messy, because I think that the LaTeX2HTML exporter doesn't produce very clean HTML.

5) Using the key (the first word in each paragraph of the citation section), the program looks up each \cite{key} statement in the Mellel file and replaces it with \nocite{key} and the formatted citation.

6) The utility appends the TEXBibliographySection (reformatted into MeXML) at the end of the file.
Yikes; you have this all backwards. Way more complicated than it needs to be.

I'm not sure how Mellel stores citations, either internally or in the XML, but you would expect it to have a dedicated citation structure that fills the same role as the BibTeX cite command. (And BTW, any Apple user who cares about this should ask Apple to standardize citation encoding in Cocoa so that we can have interop across applications here.)

So there's really no need for BibTeX. Just use Bookends if you want a ready-made solution, or wait just a little bit for the new (free) Firefox extension Zotero, and write a little script to hook it up to Mellel. They have a new citation formatting engine, written in JavaScript, that uses a new XML citation style language (http://xbiblio.sf.net/csl/).
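
That said, if you did want to try the quoted sequence anyway, step 2 is the easy part. Here is a minimal Python sketch that pulls the keys out of \cite{key} statements and lays them out in the quoted TEXCitationSection form. It assumes you already have the document as plain text, which glosses over the XML entirely, and the function names are made up.

```python
import re

# Matches \cite{...} but not \nocite{...}, since "cite" must directly follow
# the backslash.
CITE_RE = re.compile(r"\\cite\{([^{}]+)\}")


def extract_cite_keys(text):
    """Return the citation keys in order of first appearance, without duplicates."""
    seen = []
    for match in CITE_RE.finditer(text):
        # A single \cite may hold several comma-separated keys.
        for key in match.group(1).split(","):
            key = key.strip()
            if key and key not in seen:
                seen.append(key)
    return seen


def citation_section(keys):
    """Build the 'key \\cite{key}' lines from the quoted proposal."""
    lines = ["TEXCitationSection"]
    lines += [f"{key} \\cite{{{key}}}" for key in keys]
    return "\n".join(lines)
```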

Post by Stephen Still » Sun Sep 17, 2006 4:03 am

So, I'm still working away on my Mellel -> HTML XSLT. At the moment, I'm working on adding support for numbering styles in notes and note references which aren't natively supported by xsltproc, namely: asterisk, symbol, hebrew, arabic-indic, persian, greek, and greek-academy. Arabic-indic and Greek-academy are under control, but I do have some questions about some of the other numbering types:

(1) Looking at Hebrew numbering, I've been working off this page. I note that there are exceptions to the normal numbering scheme for the numbers 15 and 16. Are there any other exceptions that I should be aware of? Also, does Mellel use the newer symbols for 500-900?

(2) Greek (non-academy) numbering. How is this different to Greek Academy?

TIA,

Stephen.
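
For reference, here is a rough Python sketch of the additive Hebrew scheme as I understand it, covering 1 to 999. It writes 15 and 16 as 9+6 and 9+7, applies that substitution whenever the last two digits are 15 or 16, and builds 500 to 900 from combinations of tav rather than the final-letter forms. Whether Mellel does the same is exactly the open question above, so treat all three choices as assumptions. The function name is made up, and the letters are written as Unicode escapes to avoid bidi display trouble in the listing.

```python
# Hebrew letters by numeric value, written as escapes (alef = U+05D0, etc.).
UNITS = ["", "\u05d0", "\u05d1", "\u05d2", "\u05d3", "\u05d4",
         "\u05d5", "\u05d6", "\u05d7", "\u05d8"]            # 1..9: alef..tet
TENS = ["", "\u05d9", "\u05db", "\u05dc", "\u05de", "\u05e0",
        "\u05e1", "\u05e2", "\u05e4", "\u05e6"]             # 10..90: yod..tsadi
HUNDREDS = ["", "\u05e7", "\u05e8", "\u05e9"]               # 100..300: qof, resh, shin
TAV = "\u05ea"                                              # 400


def hebrew_numeral(n: int) -> str:
    """Return n (1..999) as additive Hebrew letters, without punctuation marks."""
    if not 1 <= n <= 999:
        raise ValueError("only 1..999 are handled in this sketch")
    hundreds, rest = divmod(n, 100)
    # 400 is tav; 500..900 are built as tav plus the remainder (assumption).
    out = TAV * (hundreds // 4) + HUNDREDS[hundreds % 4]
    if rest == 15:
        out += UNITS[9] + UNITS[6]   # tet-vav instead of yod-he
    elif rest == 16:
        out += UNITS[9] + UNITS[7]   # tet-zayin instead of yod-vav
    else:
        tens, units = divmod(rest, 10)
        out += TENS[tens] + UNITS[units]
    return out
```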

matthias
Knows everything, can prove it
Posts: 131
Joined: Thu Oct 27, 2005 4:12 pm
Location: Berlin, Germany

Post by matthias » Sun Sep 17, 2006 9:18 pm

Stephen Still wrote: (2) Greek (non-academy) numbering. How is this different to Greek Academy?
Greek numbering has an accent on the letters. Academic numbering does not; it is just plain Greek letters. You can see the difference in the pop-up list for Mellel's auto-flow numbering scheme.

Matthias
