Limit how low the table cell width can go while exporting to PDF

Marius Dumitru Florea on 18/Nov/25 10:06

I managed to set up the proxy inside the headless Chrome Docker container. The approach I chose is this:

fetch the standard image, femtopixel/google-chrome-headless
create a container based on it
start the container

install the socat proxy in the running container:

 
    docker exec -it -u root <containerId> bash -c "apt update && apt install -y socat"

start the socat proxy:

 
    socat TCP-LISTEN:9222,fork,reuseaddr TCP:127.0.0.1:9223

The socat proxy is lightweight, and we set it up only once, so the additional time spent on this is not significant.

With this I was able to connect to the headless Chrome running inside the Docker container on the bridge network, but WebSocket connections were still refused. It seems that although Chrome removed the --remote-debugging-address flag, it kept the --remote-allow-origins flag, which limits access to the remote debugging WebSocket endpoint. Adding this flag back allowed me to fully connect to the headless Chrome.
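As a rough illustration of the setup described above, this Python sketch builds the command-line arguments for the headless Chrome instance. The helper name chrome_debug_args is hypothetical, port 9223 is inferred from the socat target, and allowing every origin with * is only an assumption for a test environment; the arguments we actually pass may differ.

```python
def chrome_debug_args(port, allowed_origins):
    """Build Chrome arguments for remote debugging (illustrative helper)."""
    return [
        "--headless",
        # Chrome listens on the loopback interface; socat exposes it on 9222.
        f"--remote-debugging-port={port}",
        # Comma-separated origins allowed to open the debugging WebSocket.
        "--remote-allow-origins=" + ",".join(allowed_origins),
    ]


# Assumed values: 9223 is the socat target port, "*" allows any origin.
print(" ".join(chrome_debug_args(9223, ["*"])))
```

Restricting the origin list to the hosts that really need access is probably safer than * outside of a test setup.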

With this in place, the next problem I hit was that creating a new incognito Chrome tab failed. The JSON reply was this:

 
    {"code":-32000,"message":"Failed to open new tab - no browser is open"}

We use the https://github.com/kklisura/chrome-devtools-java-client library to communicate with Chrome. It is basically a client for the Chrome DevTools Protocol (CDT). The last release of this library is from 2021, but the protocol has changed since then. Fortunately, https://github.com/kklisura/chrome-devtools-java-client documents the steps to update the protocol, which I followed, and after some fixes I published https://maven.xwiki.org/externals/com/github/kklisura/cdt/cdt-java-client/5.0.0-xwiki/ . Next I had to update our code that uses the CDT client, because some calls and parameters have changed. Take for instance https://chromium-review.googlesource.com/c/chromium/src/+/5952980/2/chrome/test/chromedriver/chrome/chrome_impl.cc#243 : when communicating with a headless Chrome instance, newWindow now needs to be null rather than false, as it was before.
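To make the newWindow change concrete, here is a minimal Python sketch of the Target.createTarget request as it might be sent over the DevTools WebSocket. It is not the code of the Java client: the function name and the example values are made up, and it assumes that a null parameter is simply left out of the request.

```python
import json


def create_target_message(message_id, url, browser_context_id=None, new_window=None):
    """Serialize a Target.createTarget request (illustrative helper)."""
    params = {
        "url": url,
        "browserContextId": browser_context_id,
        "newWindow": new_window,
    }
    # Assumption: parameters left as None (null) are omitted from the request.
    params = {name: value for name, value in params.items() if value is not None}
    return json.dumps({"id": message_id, "method": "Target.createTarget", "params": params})


# "CTX-1" stands in for an id returned by Target.createBrowserContext.
print(create_target_message(1, "about:blank", browser_context_id="CTX-1"))
```

With new_window left at None the request carries no newWindow key at all, whereas passing False would send "newWindow": false, which is the case that stopped working for us.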

With all this in place I managed to run our integration tests and got a few failures. Most of them only required updating the test code. Only one was a real problem, largeExcelImport, and it was related to the usage of colspan and rowspan, which paged.js doesn't support very well. I debugged the paged.js code a bit to see if there is an easy way to work around this problem, but unfortunately there isn't. The root problem is that when a table row with col/rowspan overflows the print page, paged.js looks for the most recent row that has all columns and uses that as the split point. In the best case this leads to duplicated rows in the generated PDF. In the worst case it leads to an infinite rendering loop, which paged.js catches, but doing so blocks the rendering of the next pages, so you end up with partial content in the generated PDF.

Knowing that paged.js has refactored this code in the 0.5.x branch, I tried to upgrade to the 0.5.0-beta.2 version to see if it behaves better. It didn't work out of the box, because some new code in paged.js conflicts with Prototype.js (Prototype.js overwrites standard JavaScript types like Array). I managed to fix this and run our integration tests. The upgrade slightly improved the rendering of tables with col/rowspan, but it also introduced some regressions.

One of the most important regressions cuts images at the end of a print page, and also leads to missing text at the end of the print page when there is an image there. We have a test for this, and it was failing. I debugged the paged.js code a bit, but I couldn't find a simple workaround, so in the end I decided to go back to the last stable version, 0.4.3, and try to fix the rendering of tables with col/rowspan.

The temporary workaround I implemented for tables with col/rowspan is to replace the colspan and rowspan attributes with actual cells. E.g. if a table cell spans 2 columns, I insert a new empty cell after it and remove the colspan attribute. Similarly, if a table cell spans 3 rows, I remove the rowspan attribute and insert 2 empty cells below it. This of course changes the table layout, but it's the compromise we have to make to get all the content into the generated PDF. I hope we'll be able to drop this hack soon, when the final paged.js 0.5.0 version is released.
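The idea behind the workaround can be sketched in Python on a simplified table model. This is only an illustration, not the code I committed: the real workaround operates on the DOM in the browser, and the Cell type and expand_spans function below are hypothetical names.

```python
from __future__ import annotations

from typing import NamedTuple


class Cell(NamedTuple):
    text: str
    colspan: int = 1
    rowspan: int = 1


def expand_spans(rows: list[list[Cell]]) -> list[list[str]]:
    """Replace every spanning cell with itself plus empty filler cells."""
    grid: dict[tuple[int, int], str] = {}
    for r, row in enumerate(rows):
        c = 0
        for cell in row:
            # Skip positions already taken by a rowspan from a row above.
            while (r, c) in grid:
                c += 1
            # Do not let a rowspan run past the last row of the table.
            rowspan = min(cell.rowspan, len(rows) - r)
            for dr in range(rowspan):
                for dc in range(cell.colspan):
                    grid[(r + dr, c + dc)] = ""  # filler for the spanned area
            grid[(r, c)] = cell.text  # the content stays in the top-left slot
            c += cell.colspan
    width = max((col for _, col in grid), default=-1) + 1
    return [[grid.get((r, col), "") for col in range(width)] for r in range(len(rows))]


table = [
    [Cell("A", colspan=2), Cell("B")],
    [Cell("C", rowspan=2), Cell("D"), Cell("E")],
    [Cell("F"), Cell("G")],
]
print(expand_spans(table))
# prints [['A', '', 'B'], ['C', 'D', 'E'], ['', 'F', 'G']]
```

Every row ends up with the same number of plain cells, so any row can serve as a split point, at the cost of the changed layout mentioned above.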
