Feature Request: True multi-lingual spell check.

Feature requests, and in-depth discussions of features and the way Mellel works

Moderators: Eyal Redler, redlers, Ori Redler

What do we want? Multi-lingual spellcheck! When do we want it?

Now!: 23 votes (42%)
Soon!: 13 votes (24%)
Sometime, maybe!: 17 votes (31%)
Huh?: 2 votes (4%)

Total votes: 55

TLS
Read the guide!
Posts: 42
Joined: Sat Oct 22, 2005 1:48 pm

Post by TLS »

rpcameron wrote:I feel that OpenType language settings should be linked to a language meta-style if Mellel proceeds in that direction.
I strongly agree. There are a host of similar issues related to language that OpenType is responsible for handling depending on the language and the script, just not in English.

One thing I do not want to see is the mandatory use of a given language's typographically appropriate quotes. Just because I have a sentence in French, and want proper French hyphenation, I do not want to be forced to use «angle quotes» with it.
transalpin
Read the guide!
Posts: 44
Joined: Tue May 30, 2006 1:42 pm

Post by transalpin »

TLS wrote:There are a host of similar issues related to language that OpenType is responsible for handling depending on the language and the script, just not in English.

One thing I do not want to see is the mandatory use of a given language's typographically appropriate quotes. Just because I have a sentence in French, and want proper French hyphenation, I do not want to be forced to use «angle quotes» with it.
Neither do I, as I prefer manual input of (typographer’s) quotation marks.
Probably none of the four language-related features should be mandatory.
I guess the language palette mockup posted earlier shows quite well how this could be solved:
Image

Select the Typographer’s Quotes you like or set to “none”. New schemes can be added in the Preferences.

Select the language for OpenType features or “none”.
(edit: updated img url)
Last edited by transalpin on Tue Feb 03, 2009 5:34 am, edited 1 time in total.
Mart°n
Knows everything, can prove it
Posts: 672
Joined: Fri Oct 21, 2005 2:09 am
Location: Germany

Re: What does a "Language Style" actually affect?

Post by Mart°n »

joewiz wrote: But I lost you on this last point. If Language comes between Character and Paragraph, how could you "attach" it to a Section Style? Are there language-specific aspects of section definitions? (I can see how there would be for List Styles.)
The reasons I thought of attaching the Language Style to a Section or Paragraph Style are:

• Section Style

If you change the Section Style and therefore change the number of columns, you may also change the hyphenation settings, defined in the Language Style. While specific hyphenation settings could be used in most cases, you may adjust them as your number of columns changes.
You could, for example, create a setup that seldom hyphenates a word (large hyphenation zone and low hyphenation limit) and therefore doesn’t create many hyphens (-) at the end of the line. This could enhance the readability of your text if you use wide columns or a single column.
But if you switch to a 5 column layout, you have to create a hyphenation setup that hyphenates as much as possible (low hyphenation zone and high hyphenation limit) to achieve a decent layout.
So because a specific Section Style may require its own hyphenation settings, it may be a good idea to attach a specific language style to the Section Style in order to change the hyphenation settings automatically while you change the Section Style.

• Paragraph Style:

The reason to attach a Language Style to a Paragraph Style is simply that you could define a default language for your work or most paragraphs of it. A “default Language Style” inside Mellel’s preferences would do the same, so this isn’t a necessary feature either. I initially thought of this because the current hyphenation settings are inside the Paragraph Style, and this may have been done for a specific reason. If you could link a Language Style to a Paragraph Style, you would have the same possibility of changing the hyphenation settings by changing a Paragraph Style as you have now.

In summary, there are no language-specific settings that affect sections or paragraphs; it’s only the hyphenation settings that may be changed along with a specific section or paragraph.
joewiz wrote: Second, let's assume that Language Style determines 'typographers quote' styles. If I apply a new Language Style to a region of text, will the quotes already in that section be 'updated'?
That’s a really good question. I think it may be possible to change the typographer’s quotes in text that you’ve written with the typographer’s quotes option already set up and turned on. In that case, you (or Mellel) have inserted a defined character for “open quotation mark” and “close quotation mark”. Mellel could then change the character used if you change the Language Style. As already discussed in another lengthy topic, the same would hardly be possible with text that you’ve copied from somewhere else if that text contains straight quotes. In this case the same glyph is used for the open and close quotation mark, and it is not possible to convert those and guarantee 100% perfect results at the same time.

This leaves the question of whether such a thing should be built in (making those happy who write every single word themselves) or left as it is (avoiding confusion for all those people who copy a lot of text from other sources and wonder why only some of their quotation marks change when they switch to another Language Style and then cry out: “it’s a bug!”).
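The asymmetry between typed and pasted quotes can be sketched in a few lines of Python. This is only an illustration, not how Mellel works: the `SCHEMES` table and the `convert_quotes` function are made-up names, and the two schemes are simply the “curly” and «angle» marks mentioned in this thread. Marks that already encode open versus close map one-to-one; a straight quote does not encode its role, so the sketch leaves it untouched rather than guessing.

```python
# Hypothetical quote schemes: each maps a role ("open"/"close") to a character.
SCHEMES = {
    "curly": {"open": "\u201c", "close": "\u201d"},  # “ ”
    "angle": {"open": "\u00ab", "close": "\u00bb"},  # « »
}


def convert_quotes(text: str, source: str, target: str) -> str:
    """Replace the source scheme's marks with the target scheme's marks.

    This works only because typographic marks carry their role. The straight
    quote (") is the same glyph for opening and closing, so it is not part of
    any scheme here and passes through unchanged.
    """
    src, dst = SCHEMES[source], SCHEMES[target]
    table = {ord(src["open"]): dst["open"], ord(src["close"]): dst["close"]}
    return text.translate(table)


typed = "He said \u201cbonjour\u201d and left."  # typed with curly quotes on
pasted = 'He said "bonjour" and left.'           # pasted with straight quotes

print(convert_quotes(typed, "curly", "angle"))   # curly marks become angle marks
print(convert_quotes(pasted, "curly", "angle"))  # straight quotes stay as they are
```

A document that mixes both kinds of text would therefore end up only partly converted, which is exactly the “it’s a bug!” situation.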
joewiz wrote:
It seems like that could be tricky.
Indeed.
Mart°n
Knows everything, can prove it
Posts: 672
Joined: Fri Oct 21, 2005 2:09 am
Location: Germany

Post by Mart°n »

transalpin wrote:Keep it simple and stupid!
My thought was to keep it a working and hassle-free solution. If it is stupid, you have to do work that the computer is made for, and that’s not what I like to do. On the other hand, I think it is simple and elegant to use, even if one may not get the idea at first (which does not mean that it really is simple or that I’m right).
transalpin wrote: A new language style between Paragraph (= block) and Character (= inline) Level is likely to cause confusion among users.
I think the user won’t notice that the Language Style is between the Paragraph and Character Level as it is not necessarily there. You simply select a language setting and your text is tagged with that, no matter where the Language Layer is positioned in the hierarchy.
transalpin wrote: Where in the document would one find the physical unit of such a “language” level? Is it a “word level”?
What about CJK languages which don’t separate words with white-spaces? A Japanese word could very well appear in a Chinese text!
That’s a very good point I hadn’t thought of. Thanks for bringing this ball into the game. Unfortunately I don’t speak or write any of those languages, so I don’t have the insight to post any useful comment about this, except for one thing. I’ve looked at the Mac OS spelling dictionaries and Mellel’s hyphenation dictionaries and haven’t found any of the CJK languages there, so my assumption is that neither spell checking nor hyphenation is used for those languages (which is clear to me for traditional Chinese with its picture-based writing system, but I don’t know about the others). So I think that the whole point of this discussion (true multi-lingual spell checking) doesn’t affect those languages, and therefore one doesn’t have to think about them in this case.
As said above, I don’t know enough about those languages, so I would be happy to be enlightened by someone more knowledgeable.
transalpin wrote: Let’s assume, you want to use a number of French terms or expressions in an otherwise German essay. You might want to assign the French spelling and hyphenation dictionaries to these parts of the text. But beyond this functional markup you would also want the presentational distinction of an italic font and French OpenType features.
If you intersperse some non-latin words now, it would become really complicated.
Now it’s getting interesting. You might want to highlight those French or non-Latin words in the German essay, and you could do that by changing the character style or variation. In my case, I usually don’t want to highlight foreign words, but I would like the spell checker to check them with the appropriate dictionary.
If you put the language settings into the Character Style or variations, you mix up two levels of markup that really shouldn’t be blended. A Character Style defines how a character looks, nothing less and nothing more. A Language Style defines a non-visible attribute of a word. This is a completely different semantic level.
If you mix those two together, a user might get the idea that a word from a different language also has to look different, but this is not the case.

If you would like to make those words look different, you could assign a Character Style to a Language Style, so every time you change the language, the visual appearance also changes. But this is a “can”, not a “must”.
transalpin wrote: I think these examples above make quite clear that the character level is the place where the language should be defined, because language is strongly related to font, size, OpenType features and other means of character representation.
I don’t think so. Let’s look at the topic again: true multi-lingual spell checking. What most people (a guess) would like to see is a spell checker that uses the right dictionary for a given text. During the discussion, other needs have been brought into the game, mostly hyphenation and typographer’s quotation marks. All those wishes operate on a word. The spell checker looks at a word, looks it up in its dictionary and marks the word as wrong or right. The hyphenation option does (nearly) the same, but also uses some general rules to hyphenate the word at the right position.
If you put the language settings at the character level, you could easily create weird words:

red = English | green = German

wonderful

by accidentally switching to another dictionary or by some copy | cut | paste operations where you cut out a word or a part of a word (or a sentence) at the position where the correct language change had been.

Maybe you’ve also seen the effect that, if you work with heavily styled text, you can’t be sure what result you’ll get if you insert a word after a bold one. Will the bold style be applied, or the regular style of the following word?
With such a “wonderful” word as shown above, both the spell checker and the hyphenation option would give wrong results. If you use typographer’s quotes, you may get a result like “wonderful« which looks terrible.

As a language is a non-visible style, you couldn’t easily see which language has been applied to a character. Would you really like to check a document by jumping from one character to the next via the arrow keys and looking at the Language Palette, only to find out whether you have one of those mixed words in your document that will be hyphenated with the wrong rule?
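To make the “mixed word” problem concrete, here is a small Python sketch of the check one would otherwise do by hand with the arrow keys. Everything in it is an assumption for illustration: language is stored as one tag per character, words are runs of non-whitespace characters (which would not hold for scripts written without spaces), and the point where “wonderful” switches language is picked arbitrarily. It does not describe Mellel’s internals.

```python
import re
from typing import List, Tuple


def find_mixed_words(text: str, langs: List[str]) -> List[Tuple[str, List[str]]]:
    """Return (word, languages) for every word tagged with more than one language.

    `langs` holds one hypothetical language tag per character of `text`.
    """
    if len(text) != len(langs):
        raise ValueError("need exactly one language tag per character")
    mixed = []
    for match in re.finditer(r"\S+", text):
        # Distinct tags inside this word, in order of first appearance.
        seen = list(dict.fromkeys(langs[match.start():match.end()]))
        if len(seen) > 1:
            mixed.append((match.group(), seen))
    return mixed


text = "a wonderful day"
# "wonder" tagged English, "ful" tagged German, everything else English.
langs = ["en"] * 8 + ["de"] * 3 + ["en"] * 4

print(find_mixed_words(text, langs))
```

With language applied per word instead of per character, such a list would be empty by construction, which is the argument being made here.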

I think each of the language options (spell checker, hyphenation, quotation marks, OpenType settings) only makes sense if it can be applied to a whole word. That’s what I think is simple and useful. In my (maybe limited) point of view, it only has benefits.

To make clear what I think of, I’ve created two more pictures:

Image

Instead of adjusting the spelling dictionary, hyphenation settings and other things every time you switch the language via single drop-down menus, I think it would be easier to simply click on a pre-defined language setup that sets all the options you like to use for a specific word. With this palette you could not only see which language you’ve set for the word your cursor is at, but you could also easily switch between them with a single click (or via a keyboard shortcut). I think that’s simple; maybe I’m wrong here.
You could create different setups of one language (as done with French in the picture), so it would be easy to use different quotation marks, OpenType options, dictionaries… if one likes to. You could also switch off the flag image, because some may find that the spot of color grabs too much attention.

If you edit one of the Language Styles, you may edit all the details in the following window:

Image

That’s not so simple anymore but very powerful.

Within the first area, “Language”, you could set:
• the Name of the Style (shown only to indicate that you could rename it, as with all other styles)
• the OpenType Language
• the Icon (or none) that will be shown in the Palette
• a Keyboard setup (the keyboard will change if you change the Language Style, and the Language Style will change if you change the keyboard via Command+Space)
• a Keyboard Shortcut to activate the Language Style

The second area “Hyphenation” shows the hyphenation settings that are currently available within the Paragraph Style.

The third area, “OpenType Options”, shows exactly those.

On the right side, you could set “Dictionaries” for spell checking and (currently not available) thesaurus

The next section “Typographer’s Quotes” offers the options you could find in Mellel’s preferences at the moment

The same is true for the “Decimal Tab”. I think it should also change with the current language, as different countries use different decimal separators.

The last section “Style Options” allows you to associate a Character Style and a Variation that would automatically be used if you select this style via the Palette above.

One last thing (that is not shown in the picture) is a background highlight color. You could set a color (red) that you could show via the Show menu at the bottom border of the Mellel window. If you use a different background color for every language and display those colors, you could easily find any foreign word in your document (a screen only option).

That’s what I have in mind, now you could lacerate me.
Mart°n
Knows everything, can prove it
Posts: 672
Joined: Fri Oct 21, 2005 2:09 am
Location: Germany

Post by Mart°n »

TLS wrote: One thing I do not want to see is the mandatory use of a given language's typographically appropriate quotes. Just because I have a sentence in French, and want proper French hyphenation, I do not want to be forced to use «angle quotes» with it.
As shown in my last post, you could edit a Language Style to your specific needs. You could create a “French” Style with “curly” or «angle» quotation marks as you like it.
transalpin
Read the guide!
Posts: 44
Joined: Tue May 30, 2006 1:42 pm

Post by transalpin »

To me, the facility of ad-hoc changes via the language palette is indispensable!
I don’t want to be forced to add a new style for each and every modification, neither would I want to apply them to existing styles.
This is something I miss a lot in the current implementation of hyphenation.
Ori Redler
One of the boys
Posts: 342
Joined: Wed Oct 19, 2005 11:45 pm

Post by Ori Redler »

From the various suggestions here it seems obvious that most of you agree that the language options should be somehow part of the character style. I would ignore here some of the minor considerations (e.g., master-setting all language options) and would like to comment on each item here:

A. Hyphenation: that would mean moving this out of the paragraph style and into the character style. The main problem I see here is that Hyphenation is, properly, a paragraph setting, as it deals with how text is laid out in a paragraph, rather than how a character or a word is laid out. This can be solved, of course.

B. Typographer's Quotes: This is now an on-the-fly setting that happens while you type, and the suggestion is to make it a character style option. This is also problematic, and for similar reasons, but less so. Practically, that would mean that the type of typographer's quote (", <<, “, etc.) would be selected based on the character style, while the mechanism in general (i.e., determining whether we should have open or close quotation marks) will not be part of the style but will be determined based on the existing setting.

C. Spelling: means that the spelling dictionary will change depending on the current character style.

The problem with doing this with variations too is that it's, well, very problematic. For example, if you have in a word two variations, you would not be able to hyphenate or spell it unless you determine arbitrarily to which variation this belongs.

Setting the language option to a new "level" between paragraph and character, as Mart°n suggests, is confusing. Obviously, there is no such level in terms of hierarchy, so the setting will ultimately be a "fake" one -- posing as a separate level, but actually belonging to either the paragraph level setting (which none here wants) or the character level setting.
Ori Redler from RedleX
transalpin
Read the guide!
Posts: 44
Joined: Tue May 30, 2006 1:42 pm

Post by transalpin »

Thank you, Ori!
Ori Redler wrote:Hyphenation is, properly, a paragraph setting, as it deals with how text is laid out in a paragraph, rather than how a character or a word is laid out. This can be solved, of course.
I’m reading a hilarious novel about an Englishman in Paris. Such a book could never be written in Mellel: It is bilingual and uses hyphenation.
rpcameron
Knows everything, can prove it
Posts: 980
Joined: Wed Oct 26, 2005 12:48 am
Location: IE, CA, USA

Post by rpcameron »

transalpin wrote:Thank you, Ori!
Ori Redler wrote:Hyphenation is, properly, a paragraph setting, as it deals with how text is laid out in a paragraph, rather than how a character or a word is laid out. This can be solved, of course.
I’m reading a hilarious novel about an Englishman in Paris. Such a book could never be written in Mellel: It is bilingual and uses hyphenation.
Not to be pedantic, but you could have written the book in Mellel, just without automatic hyphenation; it would have to be hyphenated by hand. Also, most publishing houses have people who proof a manuscript to correct hyphenation, as well as index, &c.
— Robert Cameron
joewiz
Knows everything, can prove it
Posts: 199
Joined: Sun Oct 23, 2005 9:42 pm

Language: A skewer through the onion (of Styles)

Post by joewiz »

Ori Redler wrote:Setting the language option to a new "level" between paragraph and character, as Mart°n suggests, is confusing. Obviously, there is no such level in terms of hierarchy, so the setting will ultimately be a "fake" one -- posing as a separate level, but actually belonging to either the paragraph level setting (which none here wants) or the character level setting.
If we return to the central issue here of multilingual spell-checking, it is a fundamentally word-based operation. It is not characters and paragraphs that are spell-checked, but words. If this 'word-ness' is unique to spelling (and doesn't apply as well to hyphenation, opentype, quotes, etc), then let's call this Spelling Style. Otherwise, let's use a broader term like Language Style.

Regarding your point about hierarchy: If words can't fit into a hierarchy between character and paragraph, they're not alone, right? Lists, Auto-titles, and Notes, all combine paragraph and character settings but are not part of (or above/below/in-between) the character-paragraph hierarchy.

So why shouldn't the following solution be pursued: 'Spelling/Language Style' should be granted a meta-level status -- outside the 'hierarchy' if it can't be there, but still accessible to other settings that need to associate with it. For example, a character style could have an 'Associated Spelling/Language Style' which can be set or left undefined. 'Spelling/Language Style' itself should be set-able on an ad-hoc basis. (If hyphenation is 'properly' a paragraph-level function but is not exclusive to a language, per se, then hyphenation should be made its own 'style' which paragraphs associate with.) Still, let's keep our eyes on the multilingual spell-checking prize.

(Again, this isn't a huge thing for me, but I can see how this spell checking situation can make life very hard for some people.)
Mart°n
Knows everything, can prove it
Posts: 672
Joined: Fri Oct 21, 2005 2:09 am
Location: Germany

Post by Mart°n »

transalpin wrote:To me, the facility of ad-hoc changes via the language palette is indispensable!
I don’t want to be forced to add a new style for each and every modification, neither would I want to apply them to existing styles.
This is something I miss a lot in the current implementation of hyphenation.
The suggestions I’ve made don’t exclude an ad-hoc palette. As there is an ad-hoc palette for everything in the current version of Mellel, it would only make sense to add one for language.
To get an idea of how others use or would like to use language settings, the things I’d like to know are:

• What exactly do you plan to modify if you have a language style? In which situation do you need to change a given language setting?

• You miss the possibility to change hyphenation settings ad-hoc. In which case would you like to modify them, and why?
Mart°n
Knows everything, can prove it
Posts: 672
Joined: Fri Oct 21, 2005 2:09 am
Location: Germany

Post by Mart°n »

Abadsterjoted3 wrote:Shakira is having sex!
Great!

Ori, have you thought about improving the spam protection (more complicated fuzzy code images, moderated subscription, confirmation mails that contain a question or a small math task)?

I visit more forums than just this one, but I think this one is the most spam-infected.
Mart°n
Knows everything, can prove it
Posts: 672
Joined: Fri Oct 21, 2005 2:09 am
Location: Germany

Post by Mart°n »

Ori Redler wrote:From the various suggestions here it seems obvious that most of you agree that the language options should be somehow part of the character style.
I don’t agree here. It may be o.k. to put the language options on a character level, but they shouldn’t be put into the character style. Not only is the number of variations very limited today for some of us, but maintaining or changing a minor detail of a style would become a time-intensive and error-prone task if I have to repeat the setting not only for 3 of my normal text styles but 9 times (3 styles multiplied by 3 languages).
Ori Redler wrote: A. Hyphenation: The main problem I see here is that Hyphenation is, properly, a paragraph setting, as it deals with how text is laid out in a paragraph, rather than how a character or a word is laid out.
It’s a three-headed monster. Some settings of the hyphenation options belong to the paragraph, some to the section and some to the chosen language (but none to the character itself). To split up the settings and scatter them across 3 setup windows may be technically correct, but I think this would drive users mad.
Ori Redler wrote: B. Typographer's Quotes: This is now an on-the-fly setting, that happens while-you-type, and the suggestion is to make it a character style option.
I don’t agree here either. If I take my style example from above (9 styles) and would like to have 2 kinds of quotation marks for each setting (for example German quotes if I use a single English word in a German sentence, and English quotes if I cite a sentence or a paragraph), I now have to use 18 styles in total (only for the main text, notes and headlines not counted), which would be a mess.
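The arithmetic behind this objection can be spelled out in a short Python sketch. The style, language and quote names are placeholders invented for the example; only the counts (3 text styles, 3 languages, 2 quote schemes) come from the post.

```python
from itertools import product

text_styles = ["Body", "Quote", "Caption"]           # hypothetical style names
languages = ["German", "English", "French"]          # hypothetical languages
quote_schemes = ["German quotes", "English quotes"]  # two schemes per language

# Everything folded into the character style: one style per combination.
combined = list(product(text_styles, languages, quote_schemes))

# Changing one visual detail of "Body" means editing every combined style
# that was derived from it.
edits_if_combined = sum(1 for style, _, _ in combined if style == "Body")

# With language kept as an independent setting, "Body" exists exactly once.
edits_if_separate = 1

print(len(combined))      # number of character styles to maintain
print(edits_if_combined)  # styles touched by one change to "Body"
print(edits_if_separate)
```

The combined list has 3 × 3 × 2 = 18 entries, and a single change to the body text has to be repeated in 6 of them.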
Ori Redler wrote: C. Spelling: means that the spelling dictionary will change depending on the current character style.
As a “character” can’t be spell checked (to which language does the character “b” belong?), I don’t find the character style the right place for this.
Ori Redler wrote: The problem with doing this with variations too is that it's, well, very problematic. For example, if you have in a word two variations, you would not be able to hyphenate or spell it unless you determine arbitrarily to which variation this belongs.

That is just one of the problems I tried to avoid with the thing called “Word Style”.
Ori Redler wrote: Setting the language option to a new "level" between paragraph and character, as Mart°n suggests, is confusing. Obviously, there is no such level in terms of hierarchy, so the setting will ultimately be a "fake" one -- posing as a separate level, but actually belonging to either the paragraph level setting (which none here wants) or the character level setting.
An interesting point of view, but one I can’t follow. Both paragraph settings and character settings affect the appearance of the written text. Every change you make in one of those styles can be seen (not live, unfortunately) after you click the “Save” or “O.K.” button. The language of a word isn’t something you can see. It’s a semantic style that belongs to neither the paragraph nor the character setting.
I can’t see what is confusing about such a level (I wouldn’t call it a style). You select a word, a sentence or a paragraph and apply a language to this selection. That’s it. Problems like words that are not hyphenated or spell checked, or the red|green “wonderful” example from above, couldn’t happen, and you could even enable the highlight color to show which language a word uses. I can’t see a more transparent solution (which doesn’t mean that there isn't one).
On the other side, there is the maintenance of a large number of styles, which you probably could no longer select with the currently available shortcuts (the F-keys are limited), plus the problems of non-hyphenated, non-spell-checked and bilingual words. Those are the things that sound confusing and may cause some headaches.

As I only use latin languages, I may not have the proper insight if the language-level causes confusion in other languages and therefore would make such a level useless or impossible.

If language settings will be tied to the character level, I really would like to see them as a setting that could be changed independent from the visual character style (not a word level but a level on top of character styles) to be able to tame the beast.

I would also be happy to hear the needs of the other 21 people who have voted for multi-lingual spell checking but haven’t said a word.
rpcameron
Knows everything, can prove it
Posts: 980
Joined: Wed Oct 26, 2005 12:48 am
Location: IE, CA, USA

Post by rpcameron »

Mart°n wrote:I would also be happy to hear the need of the other 21 people that have voted for multi-lingual spell checking but haven’t said a word.
This definitely is a difficult topic to wrangle. Ori is correct about the "fake" nature of a "word" meta-level for style and/or language. However, I can see the language settings being applied on the character basis. There are a few reasons for that:
  • The current OpenType language setting for font and language specific features is at the character level.
  • It is already quite easy to employ multiple languages within a paragraph, so tying language to paragraph would be a bit odd. The next logical step down from paragraph is character.
What I imagine is a new style type of language. This is what I see the new style including:
  • It will be implemented on the character level, and allow for options of hyphenation, spelling, quotation marks, OpenType features and text direction.
  • Hyphenation and other repeated settings will be moved from the paragraph style into the language style. The paragraph style will instead include a "Language" setting that references the default or majority language for that style.
  • The language style will also include a base character style. This is the basis for the settings of the language style, and modification to the OpenType features, style variations, &c. will be set in the language style.
  • While language is on the character level, it will be a "fake" word–level style. The language style will also have settings for word boundaries. This includes whitespace and/or punctuation to let Mellel differentiate between one word and another of that language. In the case that there are two or more language styles within the same "word", then the rules for the language that's set as the paragraph default take precedence.
I know it's not quite refined, but that's the closest approximation I can see for the implementation of such a language setting/feature. I know that this will involve a lot of consideration on the part of the Redlers, but I am confident that they will find an elegant way to distill this difficult topic into one that seems intuitive and beautiful.
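A rough Python sketch of the precedence rule in the last bullet, under stated assumptions: `LanguageStyle`, its fields and `resolve_word_language` are hypothetical names made up for this example, the dictionary identifiers are placeholders, and a “word” is taken as already split into character runs. It is not a description of anything Mellel implements.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LanguageStyle:
    """Hypothetical bundle of the per-language options listed above."""
    name: str
    spelling_dictionary: str
    hyphenation: bool = True
    quotes: str = "none"


def resolve_word_language(
    run_styles: List[LanguageStyle], paragraph_default: LanguageStyle
) -> LanguageStyle:
    """Pick the language style whose rules apply to one word.

    `run_styles` lists the style of each character run inside the word.
    If all runs agree, that style is used; if the word mixes styles, the
    paragraph default takes precedence, as proposed in the list above.
    """
    if len(set(run_styles)) == 1:
        return run_styles[0]
    return paragraph_default


german = LanguageStyle("German", "de-placeholder", quotes="curly")
french = LanguageStyle("French", "fr-placeholder", quotes="angle")

# A wholly French word inside a German paragraph keeps French rules.
print(resolve_word_language([french], german).name)
# A word that mixes French and German runs falls back to the paragraph default.
print(resolve_word_language([french, german], german).name)
```

One open question the sketch makes visible: when a word mixes two languages and neither is the paragraph default, this rule applies a third language to it.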
— Robert Cameron
transalpin
Read the guide!
Posts: 44
Joined: Tue May 30, 2006 1:42 pm

Post by transalpin »

rpcameron wrote:The language style will also have settings for word boundaries. This includes whitespace and/or punctuation to let Mellel differentiate between one word and another of that language.
That’s the point! Why should the application decide for me whether the question mark belongs to the Hebrew or to the English part? Based on which rules? (see also the objections with regard to Chinese and Japanese)
In the case that there are two or more language styles within the same "word", then the rules for the language that's set as the paragraph default take precedence.
Why should anyone want to assign only half a word to a foreign language? Do we really need those ludicrous “security measures”?

Maybe we should follow the less-is-more principle and let the Redlers do their job. I’m sure they’ll come up with a usable, user-friendly solution.