Update to WordPress Posts to PDF – Code Blocks

At the back end of last year I released a script that allowed you to export your WordPress posts to a nicely formatted PDF file. In the last update, I said that I wanted to add proper formatting for code blocks and today I have done that. You can find it on the GitHub page.

This proved to be more difficult than I had anticipated. The code uses the FPDF package and builds on the “Links and flowing text” example it provides. I knew I needed to display all <pre></pre> blocks in a fixed-width font, so I amended fpdfexts.php to look for PRE tags, switching the font to Courier 10pt on the opening tag and back to Arial 12pt on the closing tag. This was straightforward enough and worked, but the code block was displayed without any line breaks.
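The font switch described above can be sketched in isolation. This is only an illustration under stated assumptions, not the code in fpdfexts.php: font_runs() is a hypothetical helper that splits the HTML on tags and records which font each run of text would use. In a class that extends FPDF, the two assignments would presumably become calls to SetFont(), for example SetFont('Courier', '', 10).

```php
<?php
// Hypothetical helper for illustration; it is not part of FPDF or fpdfexts.php.
// Returns a list of [font family, size in pt, text] runs for the given HTML.
function font_runs(string $html): array
{
    $font = ['Arial', 12];   // body text font
    $runs = [];

    // Even-indexed parts are text, odd-indexed parts are the contents of a tag.
    $parts = preg_split('/<(.*)>/U', $html, -1, PREG_SPLIT_DELIM_CAPTURE);

    foreach ($parts as $i => $part) {
        if ($i % 2 === 0) {
            if ($part !== '') {
                $runs[] = [$font[0], $font[1], $part];
            }
            continue;
        }

        // Tag name without attributes, e.g. "pre class=x" becomes "PRE".
        $tag = strtoupper(explode(' ', $part, 2)[0]);
        if ($tag === 'PRE') {
            $font = ['Courier', 10];   // fixed-width font for code
        } elseif ($tag === '/PRE') {
            $font = ['Arial', 12];     // back to the body font
        }
    }

    return $runs;
}

print_r(font_runs('Intro <pre>$a = 1;</pre> outro'));
```

Note that nothing here treats a newline inside the PRE block specially, which is consistent with the code block coming out as one unbroken run of text.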

What I needed was to replace all the line breaks with break tags (<br>) but only in between the PRE tags so I got ChatGPT to write me a function to do just that:

function html_entity_decode_exclude_pre($html) {
    // Use the same decoding flags everywhere so behaviour does not depend on PHP's defaults
    $flags = ENT_QUOTES | ENT_HTML401;

    // Step 1: Find every <pre>...</pre> block, including its tags
    $pre_pattern = '/<pre\b[^>]*>.*?<\/pre>/is';
    preg_match_all($pre_pattern, $html, $pre_matches);

    // Step 2: Swap each block for a placeholder, converting its line breaks to <br>
    $placeholders = [];
    foreach ($pre_matches[0] as $index => $pre_block) {
        $placeholder = "%%%PRE_PLACEHOLDER_{$index}%%%";
        // Handle Windows (\r\n), old Mac (\r) and Unix (\n) line endings
        $with_breaks = str_replace(["\r\n", "\r", "\n"], "<br>", $pre_block);
        $placeholders[$placeholder] = html_entity_decode($with_breaks, $flags, 'UTF-8');
        $html = str_replace($pre_block, $placeholder, $html);
    }

    // Step 3: Decode HTML entities in the remaining content
    $html = html_entity_decode($html, $flags, 'UTF-8');

    // Step 4: Put the modified <pre> blocks back in place of the placeholders
    foreach ($placeholders as $placeholder => $pre_block) {
        $html = str_replace($placeholder, $pre_block, $html);
    }

    return $html;
}

This seemed to do the trick and now we have code blocks in the output. Obviously, there’s no syntax highlighting, but it is a great improvement on what was there before. The next challenge is to see if I can get inline images to be displayed. Watch this space.
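For anyone adapting the function above, the same idea can be sketched without placeholders by splitting the HTML with preg_split() and PREG_SPLIT_DELIM_CAPTURE, so that the PRE blocks and the surrounding text arrive as separate pieces. This is a minimal sketch and not the version in the script; the function name decode_entities_keep_pre_breaks() is made up for the example.

```php
<?php
// Illustrative alternative with a hypothetical name; not the function used in the script.
function decode_entities_keep_pre_breaks(string $html): string
{
    $flags = ENT_QUOTES | ENT_HTML401;

    // Odd-indexed parts are whole <pre>...</pre> blocks, even-indexed parts are everything else.
    $parts = preg_split('/(<pre\b[^>]*>.*?<\/pre>)/is', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
    if ($parts === false) {
        // The regex engine failed (e.g. backtrack limit); fall back to plain decoding.
        return html_entity_decode($html, $flags, 'UTF-8');
    }

    foreach ($parts as $i => $part) {
        if ($i % 2 === 1) {
            // Inside a code block: turn every kind of line ending into a <br> tag.
            $part = str_replace(["\r\n", "\r", "\n"], '<br>', $part);
        }
        $parts[$i] = html_entity_decode($part, $flags, 'UTF-8');
    }

    return implode('', $parts);
}

echo decode_entities_keep_pre_breaks("a &amp; b\n<pre>x &lt; y\r\nz</pre>\nc");
```

Line breaks outside the PRE tags are left alone, so only the code blocks gain break tags.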
