Hey community,
I spent some time trying to make PDF export work for CJK (Chinese,
Japanese, Korean) characters, and managed to get it working quite well.
Searching for some good open source fonts, I finally decided on the
following:
- CJK Unifonts (Linux re-packaging of the Arphic fonts)
- IPAGothic
- Baekmuk
The first one comes in two variants, serif (a.k.a. ming or song) and
script (regular script, kai), and has good support for Chinese, with
good, but not complete, support for Japanese, and no support for Korean.
It looks very good in both variants, but we should decide on one of
them. I uploaded samples to http://jira.xwiki.org/browse/XWIKI-7106 so
you can see how they look.
***
Q1: Should Kai or Ming be used as the default export font for Chinese?
***
I'm far from being an expert here, but my opinion is that the Kai
variant, with its handwritten look, is better suited for printed
material. Still, PDFs are also used on screen, be that a large computer
monitor or a handheld device, and on screen the legibility of the Ming
variant is better. One option that I like is to use Kai for normal text
and Ming for tt/code elements, as a kind of monospace.
The second font, IPAGothic, is centered on Japanese, so it has good
support for Japanese, some support for Chinese, and no support for
Korean. It is a sans-serif variant.
The third font, Baekmuk, brings support for Korean (lacking from the
other two fonts), along with limited support for Chinese and Japanese
characters. This one comes in more variants, but only two are complete
enough to be considered: Batang as the serif equivalent, and Gulim as
the sans-serif equivalent.
***
Q2: Should Batang or Gulim be used for Korean?
***
My opinion is that the serif variant looks better in print, although
it is less readable. Still, I've seen Gulim used much more often in
practice.
I attached two samples for this as well to the Jira issue.
***
Q3: Should the current FreeSerif font be used for non-CJK characters, or
the font face defined in the font specific to each language?
***
While I prefer FreeSerif for all English text, I've seen in practice
that the preferred solution is to use a bulkier font for numbers and
Latin characters.
***
Q4: Does italics/oblique make sense for CJK characters?
***
The concept of italics is defined only for Latin-like scripts, and no
font provides italic CJK glyphs. Still, Firefox does render slanted
characters for CJK text inside <em>. FOP, the rendering engine used for
generating PDFs, does not support automatically slanting fonts that
don't provide an italic variant, and will insist on choosing a font
that comes in an italic variant. This means that by default any text
that is emphasized in the wiki will not be displayed correctly in the
PDF (the characters would appear as #). There is a simple solution,
which is to alter the font file so that it says that both the regular
and the italic version of the font are in the file. Another option is
to actually provide an oblique version of the font, which FontForge
seems to be able to do quickly and with good results. Still, this will
double the size of the fonts, so I'd rather not provide italic fonts if
they don't actually make much sense for native CJK users.
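The effect of advertising the same file as both regular and italic can
be pictured with a toy lookup table. This is only a sketch of the idea,
not FOP's actual code or API: the registry, the function names, the
file name and the fallback font are all made up for illustration, and
the fallback is assumed to have no CJK glyphs.

```python
# Hypothetical fallback font, assumed to contain no CJK glyphs, so any
# CJK text set in it would show up as '#' in the PDF.
DEFAULT_FONT = "default-latin-font"


def register(registry, family, style, font_file):
    """Record that font_file provides the given family in the given style."""
    registry[(family, style)] = font_file


def lookup(registry, family, style):
    """Return the file registered for (family, style), else the fallback."""
    return registry.get((family, style), DEFAULT_FONT)


registry = {}
register(registry, "cjk-kai", "normal", "kai.ttf")

# Emphasized text asks for the italic style, which is not registered yet,
# so the lookup falls through to the fallback font.
before = lookup(registry, "cjk-kai", "italic")

# Declaring the same file for the italic style as well keeps the CJK
# font in use; the glyphs are simply not slanted.
register(registry, "cjk-kai", "italic", "kai.ttf")
after = lookup(registry, "cjk-kai", "italic")
```

In this model the first lookup lands on the fallback font, while the
second one returns the CJK font again. Providing a real oblique file
would amount to registering a second, different file for the italic
style, which is where the doubled size comes from.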
Some other fonts that I looked at were:
* the Droid font used in Android devices, which is a sans-serif font IMO
not suited for print; its advantage would be that it provides a uniform
look for all CJK languages, less good looking, but more legible
* the Hanazono font, which has impressive support for all the characters
in the CJK Unicode sets, but was created in a wiki way, so IMO it's not
very consistent throughout the whole spectrum, and not as aesthetically
pleasing as the others
***
Q5: Should a less good looking, but smaller and more consistent font be
used? If yes, which one?
***
The Droid font is actually quite small compared to the others, and at
smaller font sizes it is more readable.
I would really appreciate some feedback on this topic.
--
Sergiu Dumitriu
http://purl.org/net/sergiu/