Right-to-Left (RTL) support for Hebrew and Arabic #219

boustanihani · 2014-04-05T18:09:09Z

Please add Right-to-Left (RTL) support for languages like Hebrew and Arabic...

Something like:

doc.rtl(true);

doc.text('...', {rtl: true});

devongovett · 2014-04-05T20:38:31Z

I don't know much about RTL languages, but it seems to me that you could reverse the string and align to the right to get this working (I'm probably wrong here). However, if the text contains a combination of LTR and RTL text, then we'll need an implementation of the Unicode Bidi Algorithm. Those who know more than I do, please fill me in. I'd love to see this implemented so PDFKit is more widely usable.

devongovett · 2014-04-05T20:40:12Z

A separate issue is vertical text support (e.g. Japanese), which I'd also like to see and which has its own challenges.

boustanihani · 2014-04-05T21:46:41Z

Arabic also has its own challenges because letters get a different shape depending on their position in a word (beginning, middle, end) so this is anything but easy :)

devongovett · 2014-04-06T16:39:47Z

Interesting, I assume there is some sort of algorithm out there to determine this? Starting to sound like a lot of work.

etodanik · 2014-08-05T16:15:29Z

I'm just parachuting in, but isn't this something like what you need:
https://github.com/mathiasbynens/node-unicode-data

Why re-implement unicode algorithms?

EDIT: Wait a moment, this seems to be quite far from what's needed, my bad. But isn't there a ready implementation?

devongovett · 2014-08-05T16:17:40Z

No, that's just unicode character metadata, not any actual algorithms. RTL support will require an implementation of the Unicode Bidi Algorithm. Shaping of Arabic text with contextual substitutions is a separate problem to solve.

devongovett · 2014-08-05T16:19:58Z

Yeah, this library from Twitter might work but I haven't tried it.

etodanik · 2014-08-05T16:21:04Z

I'll go ahead and fork pdfkit, and see what i can come up with. Any pointers for where to start and how you'd approach it?

devongovett · 2014-08-05T16:24:53Z

I'd try Twitter's library and see if it produces the results you expect. Sorry for being so ignorant on this, but does it work to run the text through that library, then send the result to the PDFKit doc.text method?

etodanik · 2014-08-05T16:31:39Z

I found something that might be even more to the point:
https://github.com/cscott/node-icu-bidi

devongovett · 2014-08-05T17:00:01Z

Yeah, the problem is that node-icu-bidi is a node C++ module, but PDFKit also works in the browser, so everything must be pure JavaScript. If it works for your needs, feel free to use it, but PDFKit won't take on a non-JS dependency.

etodanik · 2014-08-05T17:05:31Z

I understand, so an acceptable solution would be to extract the BIDI algorithm from the twitter library, correct?

yelouafi · 2015-08-24T22:09:27Z

maybe i'm late to the party; just wanted to mention i've implemented (a long time ago) a similar solution in DOS (with the old fashioned 16x16 bitmap fonts); but i think the same approach can be applied here

1- reorder the input string using the Bidi algorithm
2- reshape by applying single glyph substitution depending on the context (beginning, middle, end of the word or standalone glyph).
3- ligatures
4- inverse the alignment (possibly using a RTL flag); if this is supported then a more appropriate naming of alignment options should be : leading/trailing instead of right/left

1 and 4 are the 'easy parts'; for 2 and 3 it's another story: for the OpenType fonts i think there is a GSUB table that can be used for this; but for other font types the only option i think is to implement the specific algorithm for each script (as you said this is a lot of work)
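
Step 2 above (contextual single-glyph substitution) can be sketched at the string level with the Unicode Arabic Presentation Forms-B block. This is only an illustration, not what PDFKit or fontkit does: the table covers just five letters, the name `shapeArabic` is made up for this example, and ligatures (step 3, e.g. lam-alef) are ignored.

```javascript
// Presentation forms per letter, in the order [isolated, final, initial, medial].
// Right-joining letters such as alef only have isolated and final forms.
// Hypothetical mini-table: a real shaper needs every letter of the script.
const FORMS = new Map([
  ['\u0627', ['\uFE8D', '\uFE8E']],                     // alef
  ['\u0628', ['\uFE8F', '\uFE90', '\uFE91', '\uFE92']], // beh
  ['\u0633', ['\uFEB1', '\uFEB2', '\uFEB3', '\uFEB4']], // seen
  ['\u0644', ['\uFEDD', '\uFEDE', '\uFEDF', '\uFEE0']], // lam
  ['\u0645', ['\uFEE1', '\uFEE2', '\uFEE3', '\uFEE4']], // meem
]);

// A letter can connect to the letter after it only if it has all four forms.
const isDualJoining = (ch) => FORMS.has(ch) && FORMS.get(ch).length === 4;

function shapeArabic(text) {
  const chars = Array.from(text);
  return chars.map((ch, i) => {
    const forms = FORMS.get(ch);
    if (!forms) return ch; // not in the table: leave untouched

    const prev = chars[i - 1];
    const next = chars[i + 1];
    // Connected to the previous letter if that letter joins forward.
    const joinsPrev = prev !== undefined && isDualJoining(prev);
    // Connected to the next letter if this one joins forward and the next is a known letter.
    const joinsNext = isDualJoining(ch) && next !== undefined && FORMS.has(next);

    if (joinsPrev && joinsNext) return forms[3]; // medial
    if (joinsPrev) return forms[1];              // final
    if (joinsNext) return forms[2];              // initial
    return forms[0];                             // isolated
  }).join('');
}
```

For the word beh-alef-beh this picks the initial beh, the final alef, and an isolated beh, because alef never connects to the letter that follows it.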

yelouafi · 2015-08-25T13:30:58Z

it seems another solution to Arabic shaping is the use of 'Text based Shaping', which transforms the characters at the string level rather than at the glyph level (further details are there). And it seems there is already an implementation of this kind in JavaScript by the ibm-js team. From the sources it appears that the text engine performs a bunch of operations at the character level:

1- Bidi reordering
2- Text shaping (AFAIK applies only to Arabic scripts)
3- symmetrical swapping (replace [, ( ... with their mirrored RTL counterparts ], ) ...)
4- Number shaping (replace 'Western Arabic' numerals 0, 1, 2 ... with their Eastern Arabic counterparts ٠, ١, ٢ ...)

This can also be a possible fallback for non-OpenType fonts, which don't have a GSUB table
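
Symmetrical swapping and number shaping from the list above are plain character mappings, so they are easy to sketch. The helper names below are invented for this illustration and are not taken from the ibm-js code. The mirroring table only covers four ASCII pairs, and a real engine would swap only characters that end up in a right-to-left run.

```javascript
// Hypothetical mirroring table: four ASCII bracket pairs only.
const MIRROR = new Map([
  ['(', ')'], [')', '('],
  ['[', ']'], [']', '['],
  ['{', '}'], ['}', '{'],
  ['<', '>'], ['>', '<'],
]);

// Replace each bracket with its mirrored counterpart; other characters pass through.
function swapMirrored(text) {
  return Array.from(text, (ch) => MIRROR.get(ch) || ch).join('');
}

// Map ASCII digits 0-9 to Arabic-Indic digits U+0660..U+0669.
function toArabicIndicDigits(text) {
  return text.replace(/[0-9]/g, (d) => String.fromCharCode(0x0660 + Number(d)));
}
```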

devongovett · 2016-08-27T14:57:53Z

Getting closer. With v0.8.0 the font engine changed to fontkit, which supports an Arabic shaper (e.g. @yelouafi's steps 2 and 3). Still need to implement the bidi algorithm for mixed script text.

soryy708 · 2017-03-16T17:35:08Z

If you prioritize Bidi reordering, and symmetrical swapping, it's enough for Hebrew support.
While technically Hebrew has characters that look different when they're at the end of a word, you shouldn't care about it because Unicode defines them as separate characters.
Text & number shaping can be added later for Arabic support.

StigP1337 · 2017-03-17T11:57:03Z

I found the following infos related to this topic. Python Arabic Reshaper is a library which can be used in cases when native Arabic support is not available. The readme contains a good explanation of the issue and the solution. This library has been ported to Javascript.

On the BIDI topic I found this test program written in Javascript.

sayamqazi · 2017-08-12T09:34:26Z

There are GSUB (Glyph substitution) tables in font files for Complex languages.
This link explains those tables with example.
https://www.microsoft.com/typography/otfntdev/arabicot/features.aspx

mohanagy · 2017-10-01T21:31:36Z

PDFKIT still has a problem with RTL
any updates? @devongovett

setpixel · 2017-11-22T20:07:46Z

Hi! @devongovett any update on RTL support? Question 2, is this project dead?

setpixel · 2018-03-27T16:56:53Z

Please don't be dead :(

aminify · 2018-05-04T19:54:49Z

@setpixel I needed this too, but since it doesn't sound like they have added this feature, I want to let you know that I found jsPDF really useful. They support Arabic now.

ninbit · 2019-03-18T19:57:47Z

pdfkit has more functionality than jsPDF. jsPDF doesn't have full unicode support but pdfkit does. The project and its committers deserve the praise. For RTL, right-aligned text works very well. However, when we want to use columns, things change. The need is just to start from right-most column through the left most column. @devongovett we don't need anything except this I think because the RTL text has its RTL way, no need to reverse the strings. (same for LTR inside RTL)
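
The column ordering described here (start at the right-most column and move left) comes down to a little arithmetic. This is a rough sketch, assuming equal-width columns; `columnOrigins` and its parameters are hypothetical names for this example, not a PDFKit API.

```javascript
// Compute the x origin of each column, in the order the columns should be filled.
// left:    x coordinate of the left edge of the text area
// width:   total width of the text area
// columns: number of columns
// gap:     horizontal space between columns
// rtl:     when true, the first column returned is the right-most one
function columnOrigins({ left, width, columns, gap = 0, rtl = false }) {
  const columnWidth = (width - gap * (columns - 1)) / columns;
  const xs = Array.from({ length: columns }, (_, i) => left + i * (columnWidth + gap));
  return { columnWidth, xs: rtl ? xs.reverse() : xs };
}
```

Each column could then be drawn separately, for example with `doc.text(str, x, y, { width: columnWidth, align: 'right' })`, which only handles placement and does nothing about bidi reordering.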

etodanik · 2019-03-19T06:06:10Z

RTL is much more than right aligned text. There’s the issue of comma and dot positions, and what happens when LTR stuff like numbers and English text are mixed in a sentence.

mayassalman · 2019-07-10T10:37:50Z

weera-tech · 2019-12-12T00:12:43Z

Simply reversing the text before it goes to pdfkit seems to work for both Hebrew and Arabic (I'm just eyeballing the text however since I speak neither)

const isHebrew = (text) => {
  return text.search(/[\u0590-\u05FF]/) >= 0;
};

const isArabic = (text) => {
  return text.search(/[\u0600-\u06FF]/) >= 0;
};


const rightToLeftText = (text) => {
  if (isHebrew(text) || isArabic(text)) {
    return text.split(' ').reverse().join(' ');
  } else {
    return text;
  }
};

rightToLeftText('أنا أتحدث اللغة العربية');
rightToLeftText('אני מדברת עברית');

This is exactly what I am looking for. Just a small improvement:
For RTL languages like Persian (which I use), add a space to the end of the string:
text.split(' ').reverse().join(' ') + ' ';
This works like a charm!!!
Remember that if your string has special characters (e.g. ":") at the end, put them before the added white space.

andreialecu · 2019-12-12T11:06:00Z

Just a note for whoever is still stuck on this that reversing the text is not a good idea. It will reverse things like numbers and various other things that should not be reversed. 123456 might result in being reversed to 654321

Use a library meant for this, like TwitterCldr, see #219 (comment)
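
To make the problem concrete, here is a small sketch comparing a plain character reversal with a slightly less naive version that reverses the whole line and then restores runs of Latin letters and digits. Neither of these is the Unicode Bidi Algorithm. The function names and the `LTR_RUN` pattern are assumptions made for this example, and combining marks, brackets and nested directions are not handled.

```javascript
// Reverse a string by code point (so surrogate pairs are not split).
function reverseChars(text) {
  return Array.from(text).reverse().join('');
}

// Runs of ASCII letters/digits, optionally joined by spaces or simple punctuation.
// Hypothetical approximation of "left-to-right content" for this sketch only.
const LTR_RUN = /[A-Za-z0-9]+(?:[ .,:\-]+[A-Za-z0-9]+)*/g;

// Reverse the whole line for right-to-left display, then flip each
// Latin/digit run back so it still reads left to right.
function restoreLtrRuns(logicalText) {
  return reverseChars(logicalText).replace(LTR_RUN, (run) => reverseChars(run));
}

// Plain reversal damages numbers:
console.log(reverseChars('123456'));        // 654321
// The two-pass version keeps the digits intact:
console.log(restoreLtrRuns('abc 123456'));  // abc 123456
```

The second function only shows why numbers and embedded English need separate treatment; a real implementation resolves an embedding level for every character instead of pattern matching.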

weera-tech · 2019-12-12T13:29:05Z

Note: We are reversing an array of words, not an array of characters!!!
I am trying TwitterCLDR and the problem still persists. In my case, the problem isn't character ordering, it is white space. If you are using Linux, as I am, just install a suitable language package; this resolves character ordering and it will not be a problem anymore. TwitterCLDR is good for white-space ordering, but it reorders characters at the same time, which is not good. For me, the best manipulation is reverse().

andreialecu · 2019-12-12T13:40:47Z

@weera-tech the actual letters need to be reversed too. Not just the word order is supposed to be reversed in rtl writing.

weera-tech · 2019-12-12T13:43:54Z

You are right, but as I said, first install a suitable language package; then, for the RTL direction, you have to set align to right. That conflicts with TwitterCLDR's character ordering. Simple: -1 * -1 = 1 :)

andreialecu · 2019-12-12T13:56:23Z

I'm not sure what sort of mechanism would actually reverse characters for you, but not words, considering pdfkit has no rtl support whatsoever. Perhaps something weird is happening on Linux. I'm using pdfkit in the browser with webpack.

In my experience, and I have a production app using this approach with TwitterCLDR and pdfkit, simply reversing words resulted in support tickets being issued for exactly this problem. Words were in the correct order, but letters were in the wrong order.

weera-tech · 2019-12-12T14:12:00Z

Ooops!!!
You are using it in client-side? I am using server-side. Probably this is our difference.

devongovett · 2019-12-12T15:26:04Z

The only correct implementation will be the Unicode bidi algorithm. Anything else, especially reverse(), will be incorrect.

andreialecu · 2019-12-12T16:21:33Z

There is a recent WASM build of the HarfBuzz engine, which is a text shaping engine used by Firefox, Chrome, and others.

https://github.com/harfbuzz/harfbuzzjs

It does support Unicode bidi algorithms among other things. I believe it could be integrated with pdfkit to solve RTL once and for all.

There is a demo here: https://harfbuzz.github.io/harfbuzzjs/

Some discussion about it being used to solve RTL issues for Photopea, which is a very popular online image editor: harfbuzz/harfbuzzjs#10

Unfortunately I'm not familiar at all with pdfkit's text rendering, but perhaps someone could look into it.

AlexeiLevinzon · 2020-01-16T10:51:30Z

Hey,

Any news with RTL support?

andreialecu · 2020-01-16T13:36:17Z

@devongovett from my limited understanding of fontkit it seems that it does indeed support rtl.

I found this site and I was able to see rtl text being rendered properly.
https://fontkit-demo.now.sh/

Also from what I understand, pdfkit is based on fontkit so what is stopping this from working?

alex-enchi · 2020-01-16T14:23:53Z

@andreialecu because RTL support is more than glyph rendering

The only proper way to render rtl language is

1- determine flow of the paragraph (rtl or ltr)
2- run text through unicode bidi
3- render text, start position is determined by whether the paragraph is rtl or ltr
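
The first and last of these steps can be sketched in a few lines. The Unicode Bidi Algorithm picks the paragraph direction from the first strongly directional character; the version below is a simplification that only treats basic Latin, Hebrew and Arabic letters as strong. `paragraphDirection` and `startX` are hypothetical helper names, and the bidi reordering step itself is not shown.

```javascript
// Simplified "strong" character classes (an assumption of this sketch):
// Hebrew letters, Arabic letters, and ASCII letters only.
const STRONG_RTL = /[\u05D0-\u05EA\u0620-\u064A\u0671-\u06D3]/;
const STRONG_LTR = /[A-Za-z]/;

// Return 'rtl' or 'ltr' based on the first strong character found.
// Digits, spaces and punctuation are skipped; fallback applies if none is found.
function paragraphDirection(text, fallback = 'ltr') {
  for (const ch of text) {
    if (STRONG_RTL.test(ch)) return 'rtl';
    if (STRONG_LTR.test(ch)) return 'ltr';
  }
  return fallback;
}

// Left x coordinate at which a line of the given width should be drawn.
// RTL paragraphs hug the right edge, LTR paragraphs the left edge.
function startX(direction, leftEdge, rightEdge, lineWidth) {
  return direction === 'rtl' ? rightEdge - lineWidth : leftEdge;
}
```

With these, a line that starts with digits followed by Hebrew is still classified as right-to-left, since digits are not strong characters.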

amitm02 · 2020-03-31T05:29:19Z

I too would love to have an RTL support (Hebrew).

afsheen1 · 2020-09-30T05:31:23Z

+1 for rtl support

mayassalman · 2020-09-30T07:28:02Z

Think out of the box
use puppeteer

RMS21 · 2021-02-02T08:54:36Z

I was able to use Persian font like this, I used this link
http://pdfkit.org/docs/text.html#fonts

doc.font("your language font here")
   .text("text");

in my case, I used a Persian font you can use the font you need

rodrigonzalz · 2021-05-23T06:11:52Z

How is this still not supported?

pubmikeb · 2021-06-30T20:33:39Z

Wow, 7 years and still no full RTL-support out of box?…

NadavRosenberg · 2021-08-26T09:50:17Z

So I tried pretty much everything but nothing works.
I tried twitter-cldr-js like this:

const bidiText = TwitterCldr.Bidi.from_string('hello שלום world', { direction: "RTL" });
bidiText.reorder_visually();
return bidiText.toString();

but it gets rendered like this: world םולשhello.
Trying icu-bidi results in:

PS C:\Users\...> npm i icu-bidi
npm WARN EBADENGINE Unsupported engine {
npm WARN EBADENGINE   package: 'salt@0.5.5',
npm WARN EBADENGINE   required: { node: '>=0.6.x <=0.11.x' },
npm WARN EBADENGINE   current: { node: 'v14.17.0', npm: '7.20.6' }
npm WARN EBADENGINE }
npm ERR! code 1
npm ERR! path ...\icu-bidi
npm ERR! command failed
npm ERR! command ...k-to-build
npm ERR! 'node-pre-gyp' is not recognized as an internal or external command,
npm ERR! operable program or batch file.

npm ERR! A complete log of this run can be found in:
npm ERR!     C:\Users\...ebug.log

The "solution":

const textWithDoubleSpaces = '!world ,שלום'.replace(' ', '  ');
return textWithDoubleSpaces.split(' ').reverse().join('  ');

will handle Hebrew but not a combination of RTL and LTR (it results in world! ,שלום).
unicode-bidirectional gives me the following error:

Any working suggestions? 🙏

ghost · 2021-11-07T13:27:20Z

How come this superior library isn't supporting RTL languages?!!
That's ridiculous:)
Though the package has implemented dozens of great functionalities, it's utterly incapable of supporting RTL text.
7 years and still no support:| That's a complete shame for the core developers!

naizapp · 2021-12-06T14:18:34Z

For me I get all the arabic letters parsed correctly on { rtl: true }, but only the numbers are in reverse direction. So I wrote a function, pass the string into it before adding it to the text() function of PdfKit

Before

مروحة (002 - 001 م)

Code

// Reverse each run of consecutive digits in place; all other characters stay untouched.
const revNumsInString = (s) =>
    s.replace(/\d+/g, (run) => run.split("").reverse().join(""));

Result

مروحة (200 - 100 م)

advance512 · 2021-12-06T18:04:20Z

@AmirABody Kinda wondering, why would say this is a superior library, then?

devongovett · 2021-12-06T18:15:30Z

It requires a higher level layout algorithm than what pdfkit offers, for example https://github.com/foliojs/textkit. React PDF uses it under the hood: https://github.com/diegomura/react-pdf. Not sure if it supports bidi yet but the architecture is there to support it. Personally I think pdfkit is too low level for advanced text layout, and that it belongs in a higher level library like React PDF or pdfmake, but I also don't work on pdfkit much anymore.

r4wand · 2023-11-22T12:41:09Z

still an issue 9 years later.

DirkSW · 2024-01-17T12:53:01Z

to my understanding there are 2 challenges:

1- bi-directional text rendering (to support RTL and LTR and mixed) --> the words must be in the right order
2- layout of the document
   PDF with locale: e.g. ar (arabic) shall be rendered from right to left
   PDF with locale: e.g. en (english) shall be rendered from left to right

regarding point 1. which was discussed above
i think the solution might be to use from opentype specification ...
https://learn.microsoft.com/en-us/typography/opentype/spec/featurelist
the feature rtla
this works with pdfkit already since long time...

please test something like

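const fs = require('fs')
const PDFDocument = require('pdfkit')

var doc = new PDFDocument({})
// output path is a placeholder, pick any file name
doc.pipe(fs.createWriteStream('./rtl-test.pdf'))
const customFont = fs.readFileSync('./NotoSansArabic-Regular.ttf')
// registerFont expects the font name as a string
doc.registerFont('Regular', customFont)
doc.fontSize(15)
doc.font('Regular').fillColor("black").text("مرحبا كيف حالك")
doc.font('Regular').fillColor("black").text("مرحبا كيف حالك" , {features: ['rtla']})
doc.font('Regular').fillColor("black").text("مرحبا كيف حالك" , {features: ['']})
doc.end()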
additionally you can mix arabic and non arabic texts and it shall render correctly

or am i wrong ?

devongovett added word wrapping fonts labels Aug 27, 2016

moravcik mentioned this issue Sep 20, 2016

Arabic support bpampuch/pdfmake#315

Closed

liborm85 mentioned this issue Dec 27, 2016

Right to left language (RTL) bpampuch/pdfmake#184

Open

liborm85 mentioned this issue Apr 29, 2019

LTR rendering of RTL " arabic " words #905

Closed

liborm85 mentioned this issue Nov 16, 2019

Metadata #1043

Closed

liborm85 mentioned this issue Dec 6, 2019

PDF creation for Hebrew #1068

Closed

NathanaelA mentioned this issue Apr 11, 2020

Support Multi Language (Arabic) NathanaelA/fluentreports#56

Open

liborm85 mentioned this issue Jul 11, 2021

is it support urdu/arabic fornt #640

Closed

moshfeu mentioned this issue Feb 15, 2024

Allow to set text features to support rtl natancabral/pdfkit-table#89

Open
