Right-to-Left (RTL) support for Hebrew and Arabic #219

Open
boustanihani opened this issue Apr 5, 2014 · 60 comments

Comments

@boustanihani

Please add Right-to-Left (RTL) support for languages like Hebrew and Arabic...

Something like:

doc.rtl(true);

doc.text('...', {rtl: true});
@devongovett
Member

I don't know much about RTL languages, but it seems to me that you could reverse the string and align to the right to get this working (I'm probably wrong here). However, if the text contains a combination of LTR and RTL text, then we'll need an implementation of the Unicode Bidi Algorithm. Those who know more than I do, please fill me in. I'd love to see this implemented so PDFKit is more widely usable.

@devongovett
Member

A separate issue is vertical text support (e.g. Japanese), which I'd also like to see and which has its own challenges.

@boustanihani
Author

Arabic also has its own challenges because letters get a different shape depending on their position in a word (beginning, middle, end) so this is anything but easy :)

@devongovett
Member

Interesting, I assume there is some sort of algorithm out there to determine this? Starting to sound like a lot of work.

@etodanik

etodanik commented Aug 5, 2014

I'm just parachuting in, but isn't this something like what you need:
https://github.com/mathiasbynens/node-unicode-data

Why re-implement unicode algorithms?

EDIT: Wait a moment, this seems to be quite far from what's needed, my bad. But isn't there a ready implementation?

@devongovett
Member

No, that's just unicode character metadata, not any actual algorithms. RTL support will require an implementation of the Unicode Bidi Algorithm. Shaping of Arabic text with contextual substitutions is a separate problem to solve.

@devongovett
Member

Yeah, this library from Twitter might work but I haven't tried it.

@etodanik

etodanik commented Aug 5, 2014

I'll go ahead and fork PDFKit and see what I can come up with. Any pointers on where to start and how you'd approach it?

@devongovett
Member

I'd try Twitter's library and see if it produces the results you expect. Sorry for being so ignorant on this, but does it work to run the text through that library, then send the result to the PDFKit doc.text method?

@etodanik

etodanik commented Aug 5, 2014

I found something that might be even more to the point:
https://github.com/cscott/node-icu-bidi

@devongovett
Member

Yeah, the problem is that node-icu-bidi is a node C++ module, but PDFKit also works in the browser, so everything must be pure JavaScript. If it works for your needs, feel free to use it, but PDFKit won't take on a non-JS dependency.

@etodanik

etodanik commented Aug 5, 2014

I understand, so an acceptable solution would be to extract the bidi algorithm from the Twitter library, correct?

@yelouafi
Contributor

Maybe I'm late to the party, but I wanted to mention that a long time ago I implemented a similar solution in DOS (with old-fashioned 16x16 bitmap fonts). I think the same approach can be applied here:

1- reorder the input string using the bidi algorithm
2- reshape by applying single-glyph substitution depending on the context (beginning, middle, or end of the word, or a standalone glyph)
3- ligatures
4- invert the alignment (possibly using an RTL flag); if this is supported, more appropriate names for the alignment options would be leading/trailing instead of left/right

1 and 4 are the 'easy parts'; 2 and 3 are another story. For OpenType fonts I think there is a GSUB table that can be used for this, but for other font types I think the only option is to implement a specific algorithm for each script (as you said, this is a lot of work).
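For step 2, here is a minimal sketch of position-dependent form selection done at the character level, using Unicode's Arabic Presentation Forms-B code points instead of GSUB lookups. The table covers only four letters and is purely illustrative; `FORMS` and `shapeArabic` are hypothetical names, and the lam-alef ligature (step 3) and combining marks are not handled.

```javascript
// Hypothetical, heavily reduced table (illustration only): four Arabic letters and their
// Arabic Presentation Forms-B code points. "dual" letters connect on both sides,
// "right" letters (like alef) only connect to the letter before them.
const FORMS = {
  '\u0627': { joining: 'right', isolated: 0xfe8d, final: 0xfe8e }, // ALEF
  '\u0628': { joining: 'dual', isolated: 0xfe8f, final: 0xfe90, initial: 0xfe91, medial: 0xfe92 }, // BEH
  '\u0644': { joining: 'dual', isolated: 0xfedd, final: 0xfede, initial: 0xfedf, medial: 0xfee0 }, // LAM
  '\u0645': { joining: 'dual', isolated: 0xfee1, final: 0xfee2, initial: 0xfee3, medial: 0xfee4 }, // MEEM
};

// Pick the isolated/initial/medial/final form of each letter from its neighbours.
// Input is in logical order; shaping happens before visual (bidi) reordering.
function shapeArabic(text) {
  const chars = [...text];
  return chars
    .map((ch, i) => {
      const entry = FORMS[ch];
      if (!entry) return ch; // not in the table (spaces, Latin, other letters): unchanged
      const prev = FORMS[chars[i - 1]];
      const next = FORMS[chars[i + 1]];
      // Join backwards only if the previous letter can connect forwards (dual-joining).
      const joinsPrev = prev !== undefined && prev.joining === 'dual';
      // Join forwards only if this letter is dual-joining and a joining letter follows.
      const joinsNext = entry.joining === 'dual' && next !== undefined;
      let form = 'isolated';
      if (joinsPrev && joinsNext) form = 'medial';
      else if (joinsPrev) form = 'final';
      else if (joinsNext) form = 'initial';
      return String.fromCodePoint(entry[form]);
    })
    .join('');
}
```

For example, beh-alef-beh comes out as initial beh, final alef, and isolated beh, because alef never connects to the letter after it.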

@yelouafi
Contributor

It seems another solution to Arabic shaping is 'text-based shaping', which transforms the characters at the string level rather than at the glyph level (further details are there). There also appears to be a JavaScript implementation of this kind by the ibm-js team. From the sources, the text engine performs several operations at the character level:

1- Bidi reordering
2- Text shaping (AFAIK this applies only to Arabic scripts)
3- Symmetric swapping (replace characters such as ( and [ with their mirrored counterparts ) and ] in RTL text)
4- Number shaping (replace the Western Arabic numerals 0, 1, 2, ... with their Eastern Arabic counterparts ٠, ١, ٢, ...)

This could also be a fallback for non-OpenType fonts, which don't have a GSUB table.
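Steps 3 and 4 of that list are simple character mappings. A rough sketch, assuming mirroring is only applied to text the bidi algorithm has already marked as RTL; `MIRROR_PAIRS`, `mirrorBrackets` and `toArabicIndicDigits` are hypothetical helpers, and the bracket table is only a small subset of Unicode's BidiMirroring.txt.

```javascript
// Small subset of the mirrored pairs listed in Unicode's BidiMirroring.txt.
const MIRROR_PAIRS = { '(': ')', ')': '(', '[': ']', ']': '[', '{': '}', '}': '{', '<': '>', '>': '<' };

// Swap each paired bracket for its mirror image. In a real pipeline this should only be
// applied to characters that the bidi algorithm placed at an RTL (odd) level.
function mirrorBrackets(text) {
  return text.replace(/[()[\]{}<>]/g, (ch) => MIRROR_PAIRS[ch]);
}

// Replace ASCII digits 0-9 with ARABIC-INDIC DIGIT ZERO..NINE (U+0660..U+0669).
// Persian and Urdu typically use the EXTENDED ARABIC-INDIC digits (U+06F0..U+06F9) instead.
function toArabicIndicDigits(text) {
  return text.replace(/[0-9]/g, (d) => String.fromCharCode(0x0660 + Number(d)));
}
```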

@devongovett
Member

devongovett commented Aug 27, 2016

Getting closer. With v0.8.0 the font engine changed to fontkit, which supports an Arabic shaper (e.g. @yelouafi's steps 2 and 3). Still need to implement the bidi algorithm for mixed script text.

@soryy708

If you prioritize bidi reordering and symmetric swapping, that's enough for Hebrew support.
While Hebrew technically has letters that look different at the end of a word, you don't need to handle them, because Unicode encodes those final forms as separate characters.
Text and number shaping can be added later for Arabic support.

@StigP1337

StigP1337 commented Mar 17, 2017

I found the following infos related to this topic. Python Arabic Reshaper is a library which can be used in cases when native Arabic support is not available. The readme contains a good explanation of the issue and the solution. This library has been ported to Javascript.

On the BIDI topic I found this test program written in Javascript.

@sayamqazi

There are GSUB (glyph substitution) tables in font files for complex scripts.
This link explains those tables with examples:
https://www.microsoft.com/typography/otfntdev/arabicot/features.aspx

@mohanagy

mohanagy commented Oct 1, 2017

PDFKit still has a problem with RTL.
Any updates? @devongovett

@setpixel

Hi! @devongovett, any update on RTL support? Second question: is this project dead?

@setpixel

Please don't be dead :(

@aminify

aminify commented May 4, 2018

@setpixel I needed this too, but since it doesn't sound like this feature has been added, I want to let you know that I found jsPDF really useful. It supports Arabic now.

@ninbit

ninbit commented Mar 18, 2019

PDFKit has more functionality than jsPDF: jsPDF doesn't have full Unicode support, but PDFKit does. The project and its committers deserve the praise. For RTL, right-aligned text works very well. However, things change when we want to use columns: we just need to start from the right-most column and continue to the left-most one. @devongovett I think we don't need anything except this, because RTL text already comes in its RTL order, so there's no need to reverse the strings (the same goes for LTR inside RTL).
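The right-to-left column ordering described here mostly comes down to coordinate arithmetic. A sketch with a hypothetical `columnXPositions` helper (not a PDFKit API) that returns the left x coordinate of each column in reading order, right-most first when `rtl` is set:

```javascript
// Hypothetical layout helper (not part of PDFKit): compute the column width and the
// left x coordinate of each column in reading order. For RTL the first column is the right-most.
function columnXPositions({ pageWidth, marginLeft, marginRight, columns, gap, rtl }) {
  const usable = pageWidth - marginLeft - marginRight;
  const colWidth = (usable - gap * (columns - 1)) / columns;
  const xs = [];
  for (let i = 0; i < columns; i++) {
    xs.push(marginLeft + i * (colWidth + gap)); // physical left-to-right positions
  }
  return { colWidth, xs: rtl ? xs.reverse() : xs }; // reading order
}
```

With a 612pt-wide page, 72pt margins, two columns and an 18pt gap, each column is 225pt wide and the RTL reading order is x = 315 then x = 72. Each column could then be drawn with something like `doc.text(columnText, xs[i], top, { width: colWidth, align: 'right' })` instead of relying on the built-in `columns` option.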

@etodanik

etodanik commented Mar 19, 2019 via email

@mayassalman

@weera-tech

Simply reversing the text before it goes to pdfkit seems to work for both Hebrew and Arabic (I'm just eyeballing the text however since I speak neither)

const isHebrew = (text) => {
  return text.search(/[\u0590-\u05FF]/) >= 0;
};

const isArabic = (text) => {
  return text.search(/[\u0600-\u06FF]/) >= 0;
};


const rightToLeftText = (text) => {
  if (isHebrew(text) || isArabic(text)) {
    return text.split(' ').reverse().join(' ');
  } else {
    return text;
  }
};

rightToLeftText('أنا أتحدث اللغة العربية');
rightToLeftText('אני מדברת עברית');

This is exactly what I was looking for. Just a small improvement:
For RTL languages like Persian (which I use), add a space to the end of the string:
text.split(' ').reverse().join(' ') + ' ';
This works like a charm!
If your string has special characters (e.g. ":") at the end, put them before the added space.

@andreialecu

Just a note for whoever is still stuck on this that reversing the text is not a good idea. It will reverse things like numbers and various other things that should not be reversed. 123456 might result in being reversed to 654321

Use a library meant for this, like TwitterCldr, see #219 (comment)

@weera-tech

weera-tech commented Dec 12, 2019

Note: we are reversing an array of words, not an array of characters!
I tried TwitterCLDR and the problem persists. In my case the problem isn't character ordering, it's whitespace. If you are on Linux, like me, just install the appropriate language package; that resolves character ordering, so it is no longer a problem. TwitterCLDR handles whitespace ordering well, but it reorders the characters at the same time, which is not what I want. For me, reverse() is the best approach.

@andreialecu

@weera-tech the actual letters need to be reversed too. Not just the word order is supposed to be reversed in rtl writing.

@weera-tech

You are right, but as I said, first install the appropriate language package; for RTL direction you also have to set the alignment to right. That conflicts with TwitterCLDR's character reordering. Simple: -1 * -1 = 1 :)

@andreialecu

I'm not sure what sort of mechanism would actually reverse characters for you, but not words, considering pdfkit has no rtl support whatsoever. Perhaps something weird is happening on Linux. I'm using pdfkit in the browser with webpack.

In my experience (I have a production app using this approach with TwitterCLDR and PDFKit), simply reversing words resulted in support tickets for exactly this problem: words were in the correct order, but letters were in the wrong order.

@weera-tech

Oops!
You are using it client-side? I am using it server-side. That is probably the difference.

@devongovett
Member

The only correct implementation will be the Unicode bidi algorithm. Anything else, especially reverse(), will be incorrect.
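For reference, the final reordering step of the Unicode Bidirectional Algorithm (UAX #9, rule L2) is small once the embedding levels are known. The sketch below implements only L2; resolving the levels (the much larger part of the algorithm) is assumed to happen elsewhere, and `reorderLine` is a hypothetical name. Uppercase letters stand in for RTL characters, following the convention of the UAX #9 examples.

```javascript
// Sketch of UAX #9 rule L2 only. The embedding levels are assumed to be already resolved
// (rules P, X, W, N and I are NOT implemented here); levels[i] is the level of chars[i],
// and odd levels mean right-to-left.
function reorderLine(chars, levels) {
  const order = chars.map((_, i) => i); // visual position -> logical index
  const highest = Math.max(...levels);
  // Lowest odd level on the line (Infinity if there is none, so nothing is reversed).
  const lowestOdd = Math.min(...levels.filter((l) => l % 2 === 1));
  // From the highest level down to the lowest odd level, reverse every maximal run
  // of characters whose level is at least the current level.
  for (let level = highest; level >= lowestOdd; level--) {
    let i = 0;
    while (i < order.length) {
      if (levels[order[i]] >= level) {
        let j = i;
        while (j < order.length && levels[order[j]] >= level) j++;
        order.splice(i, j - i, ...order.slice(i, j).reverse());
        i = j;
      } else {
        i++;
      }
    }
  }
  return order.map((i) => chars[i]).join('');
}
```

With 'car means CAR.' at level 0 except 'CAR' at level 1, this yields 'car means RAC.'. In an RTL paragraph, 'ABC 123' with the digits at level 2 and everything else at level 1 comes out as '123 CBA', so the number keeps its digit order, which plain reverse() cannot do.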

@andreialecu

There is a recent WASM build of HarfBuzz, the text shaping engine used by Firefox, Chrome, and others.

https://github.com/harfbuzz/harfbuzzjs

It does support Unicode bidi algorithms among other things. I believe it could be integrated with pdfkit to solve RTL once and for all.

There is a demo here: https://harfbuzz.github.io/harfbuzzjs/

Some discussion about it being used to solve RTL issues for Photopea, which is a very popular online image editor: harfbuzz/harfbuzzjs#10

Unfortunately I'm not familiar at all with pdfkit's text rendering, but perhaps someone could look into it.

@AlexeiLevinzon

Hey,

Any news with RTL support?

@andreialecu

@devongovett from my limited understanding of fontkit it seems that it does indeed support rtl.

I found this site and I was able to see rtl text being rendered properly.
https://fontkit-demo.now.sh/

Also from what I understand, pdfkit is based on fontkit so what is stopping this from working?

@alex-enchi

@andreialecu because RTL support is more than glyph rendering.

RTL is tricky.

The only proper way to render an RTL language is to

  1. determine the direction of the paragraph (RTL or LTR)
  2. run the text through the Unicode bidi algorithm
  3. render the text, with the start position determined by whether the paragraph is RTL or LTR
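Step 1 corresponds to rules P2/P3 of UAX #9: the paragraph direction is taken from the first strong character. JavaScript regexes can't test the Bidi_Class property, so this hypothetical `paragraphDirection` helper approximates "strong RTL" as letters from the Hebrew, Arabic, Syriac and Thaana scripts (an incomplete list) and "strong LTR" as any other letter. It also ignores the isolate handling in P2.

```javascript
// Approximation only: letters from these scripts are treated as strong RTL.
const STRONG_RTL = /[\p{Script=Hebrew}\p{Script=Arabic}\p{Script=Syriac}\p{Script=Thaana}]/u;

// First-strong-character rule (UAX #9 P2/P3, simplified).
function paragraphDirection(text) {
  for (const ch of text) {
    // Digits, spaces and punctuation are not strong: keep scanning.
    if (!/\p{L}/u.test(ch)) continue;
    return STRONG_RTL.test(ch) ? 'rtl' : 'ltr';
  }
  return 'ltr'; // no strong character at all: default to LTR, as in P3
}
```

The result could then drive step 3, e.g. `doc.text(text, { align: paragraphDirection(text) === 'rtl' ? 'right' : 'left' })`; the characters themselves still need bidi reordering.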

@amitm02

amitm02 commented Mar 31, 2020

I too would love to have an RTL support (Hebrew).

@afsheen1

+1 for rtl support

@mayassalman

mayassalman commented Sep 30, 2020

Think outside the box:
use Puppeteer.

@RMS21

RMS21 commented Feb 2, 2021

I was able to use a Persian font like this, following this link:
http://pdfkit.org/docs/text.html#fonts

doc.font("your language font here")
   .text("text");

In my case I used a Persian font; you can use whichever font you need.

@rodrigonzalz

How is this still not supported?

@pubmikeb

Wow, 7 years and still no full RTL-support out of box?…

@NadavRosenberg

NadavRosenberg commented Aug 26, 2021

So I tried pretty much everything but nothing works.
I tried twitter-cldr-js like this:

const bidiText = TwitterCldr.Bidi.from_string('hello שלום world', { direction: "RTL" });
bidiText.reorder_visually();
return bidiText.toString();

but it gets rendered like this: world םולשhello.
Trying icu-bidi results in:

PS C:\Users\...> npm i icu-bidi
npm WARN EBADENGINE Unsupported engine {
npm WARN EBADENGINE   package: 'salt@0.5.5',
npm WARN EBADENGINE   required: { node: '>=0.6.x <=0.11.x' },
npm WARN EBADENGINE   current: { node: 'v14.17.0', npm: '7.20.6' }
npm WARN EBADENGINE }
npm ERR! code 1
npm ERR! path ...
\icu-bidi
npm ERR! command failed
npm ERR! command ...
k-to-build
npm ERR! 'node-pre-gyp' is not recognized as an internal or external command,
npm ERR! operable program or batch file.

npm ERR! A complete log of this run can be found in:
npm ERR!     C:\Users\...
ebug.log

The "solution":

const textWithDoubleSpaces = '!world ,שלום'.replace(' ', '  ');
return textWithDoubleSpaces.split(' ').reverse().join('  ');

handles Hebrew but not a combination of RTL and LTR (it results in world! ,שלום).
unicode-bidirectional gives me an error (posted as a screenshot).

Any working suggestions? 🙏

@ghost

ghost commented Nov 7, 2021

How come this superior library doesn't support RTL languages?!
That's ridiculous :)
Though the package implements dozens of great features, it's utterly incapable of supporting RTL text.
7 years and still no support :| That's a shame for the core developers!

@naizapp

naizapp commented Dec 6, 2021

For me, all the Arabic letters render correctly with { rtl: true }, but the numbers come out in reverse order. So I wrote a function and pass the string through it before handing it to PDFKit's text() function.

Before

مروحة (002 - 001 م)

Code

// Reverse every run of consecutive ASCII digits in place; all other characters are left untouched.
const revNumsInString = (s) => s.replace(/\d+/g, (run) => run.split('').reverse().join(''));

Result

مروحة (200 - 100 م)

@advance512

@AmirABody I'm kind of wondering why you would say this is a superior library, then?

@devongovett
Member

It requires a higher level layout algorithm than what pdfkit offers, for example https://github.com/foliojs/textkit. React PDF uses it under the hood: https://github.com/diegomura/react-pdf. Not sure if it supports bidi yet but the architecture is there to support it. Personally I think pdfkit is too low level for advanced text layout, and that it belongs in a higher level library like React PDF or pdfmake, but I also don't work on pdfkit much anymore.

@r4wand

r4wand commented Nov 22, 2023

still an issue 9 years later.

@DirkSW

DirkSW commented Jan 17, 2024

To my understanding there are 2 challenges:

  1. bidirectional text rendering (to support RTL, LTR, and mixed text): the words must be in the right order
  2. layout of the document
  • a PDF with a locale such as ar (Arabic) should be laid out from right to left
  • a PDF with a locale such as en (English) should be laid out from left to right

Regarding point 1, which was discussed above,
I think the solution might be to use a feature from the OpenType specification:
https://learn.microsoft.com/en-us/typography/opentype/spec/featurelist
namely the rtla feature.
This has worked with PDFKit for a long time already...

Please test something like

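const PDFDocument = require('pdfkit')
const fs = require('fs')

var doc = new PDFDocument({})
doc.pipe(fs.createWriteStream('./rtl-test.pdf'))  // output path is arbitrary
const customFont = fs.readFileSync('./NotoSansArabic-Regular.ttf')
doc.registerFont('Regular', customFont)  // the font name must be a string
doc.fontSize(15)
doc.font('Regular').fillColor("black").text("مرحبا كيف حالك")  // default features
doc.font('Regular').fillColor("black").text("مرحبا كيف حالك", {features: ['rtla']})  // request rtla explicitly
doc.font('Regular').fillColor("black").text("مرحبا كيف حالك", {features: ['']})
doc.end()  // finish the document so the file is written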

Additionally, you can mix Arabic and non-Arabic text and it should render correctly.

Or am I wrong?
