Feb 29, 2024

Kotahi Unveils AI PDF Designer: A New Era for PDF Production

Today, we introduce the Kotahi AI PDF Designer, a tool that re-imagines PDF production from the ground up. We believe this tool is the first of its kind.

The Kotahi AI PDF Designer transforms PDF design into a straightforward, interactive process. Click on any part of your article, such as text or an image, and simply tell the AI what you'd like to change, for example, "make this text green and bigger." The AI interprets the request and applies your design choices, letting you see the effects right away. You can choose to render the PDF after each change or on demand. This means you can quickly adjust and refine the look of your PDF, ensuring it's both beautiful and easy to read, without needing to know complex design software.

AI PDF Designer Demo

The possibilities extend far beyond the basics. For instance, you can use the tool to inquire about the current CSS attributes and values in use, providing a deeper insight into the CSS framework at play (more on this below). This feature empowers users not only to apply styles but also to gain a comprehensive understanding of the CSS architecture underlying their projects, enhancing both their design and technical skills.

Using Paged.js functions for page control

Capabilities

It is early days for this tool, and we are discovering many interesting possibilities as well as work we need to do to overcome some initial challenges. At present, the tool's capabilities include:

Conduct Design Conversations: The system is designed to facilitate discussions centered around article design, allowing users to specify and achieve the desired aesthetic and functional outcomes.
Multiple Ways to Style Elements: This can be accomplished in various ways, such as specifying color changes for distinct parts of the text, e.g., "Color the 4th paragraph green." Clicking on elements directly, or using common names like 'title' to apply styles, e.g., "Make the title sans serif," enables precise customization. It is also possible to apply styles with one prompt across the entire document using the article as the context.
Style Based on Reference: Users can command the system to "Make it look like a chemistry article," guiding the overall design to match a referenced style, ensuring a cohesive look throughout the document.
Page Element Styling: The system supports the application of page-based rules to style page elements such as running headers, footers, and page numbering, including specifications for bleed, e.g., "Put the title in the header."
Context Preservation: The AI system is designed to maintain a separate chat history for each specific context within a document. This means if you make multiple adjustments to a paragraph and later modify the title style, upon returning to the original paragraph, the AI will present the chat history relevant to that paragraph. This feature ensures that users can track the evolution of their design choices for each element individually.
Feedback on Changes: After style changes are applied, the AI assistant gives detailed feedback on which styles were altered, helping users understand the impact of their requests and confirming that the changes took effect. For example, if the title color is changed, the AI will respond with something like "the title color has been changed to [color]".
Clarification Requests for Unclear Queries: To ensure the accuracy and relevance of style changes, the AI assistant may ask for more details in response to ambiguous requests. For example, if a user says "I want the background to look like my office carpet", the AI will respond with something like "what is the color of your office carpet?"
Inquire About Existing Style Details: Users can inquire about the current styling details, from expert-level questions like "What is the inner page margin?" to more basic queries such as "Why is there space on each page?" The assistant is equipped to provide this information.
Enable/Disable Live Preview: Live preview can be toggled on or off, providing users with the flexibility to instantly see the effects of their design choices or to work without real-time visual feedback.
Save as PDF: Documents can be easily saved in PDF format, allowing for secure sharing and printing.
Hide/Show Chat: Users have the option to hide or show the chat interface, depending on their need for assistance or preference for a cleaner workspace.
Multiple Views: The platform supports various viewing modes to suit different stages of the design process and different user preferences. It is possible, for example, to use the live preview rendering of the PDF alongside the prompt, so that styles are applied to the PDF in 'realtime'.
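
To make the Context Preservation item above more concrete, here is a minimal sketch of how a separate chat history per context could be kept. It is an illustration only, not Kotahi's implementation: the class name `DesignChat` and the context keys `paragraph-4` and `title` are hypothetical.

```python
from collections import defaultdict


class DesignChat:
    """Keeps one message list per selected context (hypothetical structure)."""

    def __init__(self):
        # Maps a context key (e.g. an element id or "title") to its messages.
        self._histories = defaultdict(list)

    def add(self, context, role, text):
        """Record a message against the context it was written in."""
        self._histories[context].append((role, text))

    def history(self, context):
        """Return a copy so callers cannot mutate the stored history."""
        return list(self._histories[context])


chat = DesignChat()
chat.add("paragraph-4", "user", "Color this paragraph green")
chat.add("paragraph-4", "assistant", "The paragraph color has been changed to green.")
chat.add("title", "user", "Make the title sans serif")

# Returning to the paragraph shows only the paragraph's conversation.
for role, text in chat.history("paragraph-4"):
    print(f"{role}: {text}")
```

Keying the history on the selected element is what lets a user leave a paragraph, restyle the title, and come back to find the paragraph's conversation intact.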

The AI PDF Designer is brand new. We expect to bring it to more Coko products soon, including book production (Ketty) and document creation (CokoDocs). My thanks to Mathias Romeo for his work on the designer, and to Julien Taquet for the Paged.js help.

Currently Tailored for Teams without Professional Production Staff

In its current state, the Kotahi AI PDF Designer will probably not be of interest to dedicated production teams. The tool's current sweet spot is teams and individuals who want to refine content generated by templates without needing professional production expertise. Because AI takes a probabilistic approach to interpreting natural language commands, the tool offers this group a level of flexibility and accessibility that traditional production tools like Adobe InDesign cannot. However, until natural language interfaces provide the exactness that expert production people want, they are unlikely to find this tooling enticing. Right now, I believe the tool is an ideal solution for:

Teams Utilizing Templates for Quick Adjustments: For those who rely on templates but need to make specific tweaks to their documents, the Kotahi AI PDF Designer offers an intuitive way to make those adjustments through simple commands. This is especially useful for scenarios where rapid customization is needed, without deep technical knowledge of design software.
Diamond Open Access (OA) Publishing: A prime example of the tool's utility is in the realm of Diamond OA publishing, where resources for professional typesetting and design may be limited. The Kotahi AI PDF Designer enables these publishers to produce well-designed PDFs that elevate the presentation of their content, making it more accessible and engaging for readers and (especially important for academia) helping readers take the content 'more seriously'.
Print on Demand (POD) Scenarios: For books and other materials that benefit from POD models, the tool facilitates the generation and fine-tuning of PDFs using Paged.js templates. This capability is particularly valuable for self-publishers who want to create visually appealing, properly formatted materials for print on demand services themselves.
Supporting Researcher-Led Publishing: The primary target of tools like Overleaf has been to simplify the creation of visually appealing preprint PDFs. It's a large market segment (relative to academic literature). Kotahi is also targeting this segment by integrating a new workflow (follow via the Kotahi releases) that simplifies the process for researchers and labs to create and publish articles themselves. In scenarios where labs may lack dedicated production staff, the Kotahi AI PDF Designer can significantly streamline the production of high-quality PDFs suitable for preprint servers or other dissemination channels.

As the tool evolves and improves, its potential applications will expand. Future enhancements aimed at better training the AI to follow style guidelines for specific journals or to adjust rules for page flow decisions based on PDF analysis will make the Kotahi AI PDF Designer even more compelling for production teams. Initially, however, its greatest impact will be felt among teams and individuals without professional production staff, addressing a significant and widespread need within the publishing / researcher ecosystem.

Innovating for All Publishers

Traditionally, PDF design requires significant technical expertise and painstaking effort. Considering the volumes published each year, an immense amount of time and money is spent on PDF production work across scholarly publishing. Additionally, in many open access models, the costs of manual PDF production are passed on to researchers in the form of Article Processing Charges. As a result, authors not only have to pay to make their work openly accessible, but also fund the resource-intensive overhead of generating basic formats.

Unfortunately, there has not been much effort in the open infrastructure domain to solve this problem. For example, most OJS journals produce PDFs by exporting them from MS Word. The side effect is that such PDFs, including much Diamond OA content, look raw and inaccessible. Well-designed content, including PDFs, is vital for effective scholarly communication. Like many, we believe quality presentation demonstrates respect for readers and subjects alike.

At Coko, we have made addressing production challenges a core part of our mission from the very start. In 2022 we added an automated production system to Kotahi enabling anyone to generate beautiful PDFs, valid JATS, and semantic HTML. No manual typesetting expertise is required.

Now the Kotahi AI PDF Designer takes automated PDF production one step further. Rather than demanding users tweak code, we have focused on even more intuitive natural language controls. Users give plain English prompts like “make all headers bold”, and our system automatically translates those instructions into precise style manipulations.

The goal is to make professional design achievable for publishers of all sizes and budgets - from independent diamond open access journals through to major mega-journals. Kotahi’s AI PDF Designer lets all levels create production-ready designs quickly, affordably, and without deep desktop publishing expertise. This tool has potential not just to save costs and resources industry-wide, but to empower emerging publishing models evolving out of the preprint ecosystem. New researcher-led models can leverage automation to easily produce beautiful, accessible PDFs optimized for both print and web without extensive production overhead.

Notably, although Kotahi is an end-to-end scholarly communications platform, it could also be used as a production-only tool (foregoing the submission/review workflows) to manage PDF production queues - complementing any existing technology choices.

Now, a bit on how this works under the hood:

Community Led Innovation: Paged.js and the HTML-first Philosophy

At the heart of the Kotahi AI PDF Designer is an HTML-first approach.

"HTML-first" prioritizes using HTML as the primary format for creating and manipulating digital content, contrasting with XML-first approaches or systems relying on document attachments like MS Word files. This strategy focuses on leveraging web standards for content structure, presentation, and behavior from the outset, facilitating a more seamless integration with web technologies and digital publishing workflows. It offers greater flexibility, accessibility, and efficiency in content creation and distribution compared to legacy scholarly publishing methods.

In a pivotal 2018 workshop organised by Coko (outlined on PagedMedia.org), we brought together a critical mass of innovators and experts from the open infrastructure domain dedicated to advancing HTML-first publishing technologies.

Arthur Attwell and other luminaries at the Boston meeting.

In the true spirit of community, inspired individuals and organizations came together not only to talk, but to contribute. Coko, seeing the immense potential in this united effort, provided vital financial and organizational backing that enabled the next leap forward. This collaborative environment set the stage over time for the development of Paged.js, an entirely open-source typesetting engine that adheres to the W3C Paged Media standard.

For more information on the history of Paged.js and a very interesting discussion on why it is an important approach, please see Julie Blanc's recently published thesis on the topic.

Paged.js

Paged.js is a 100% open source typesetting engine designed for creating print-ready PDFs using web technologies. It allows users to apply complex document layouts and styles through CSS, directly in the browser, transforming HTML content into professionally formatted print documents. This innovative approach bridges web publishing and traditional print production, offering flexibility and efficiency in generating high-quality printed materials.
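
As a rough illustration of what "layouts and styles through CSS" means here, the sketch below builds a small paged-media stylesheet of the kind Paged.js renders: a page size, margins, bleed, a running header taken from the title, and a page number. The helper `page_css`, the string name `doc-title`, and all values are assumptions chosen for the example, not Kotahi's actual templates.

```python
def page_css(size="A4", margin="20mm", bleed="3mm", header_from="h1"):
    """Build a paged-media stylesheet as a string (illustrative values only)."""
    return f"""\
@page {{
  size: {size};
  margin: {margin};
  bleed: {bleed};
  /* Running header: show the text captured by string-set below. */
  @top-center {{ content: string(doc-title); }}
  /* Page number in the footer. */
  @bottom-center {{ content: counter(page); }}
}}
/* Capture the text of the chosen element for use in the header. */
{header_from} {{ string-set: doc-title content(text); }}
"""


print(page_css())
```

A prompt such as "Put the title in the header" would plausibly resolve to rules of this shape, with the title's text captured by `string-set` and placed in a page margin box.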

Unlike any of its predecessors, Paged.js introduces a 'preview' mode within the browser, allowing for real-time adjustments and instant visualization of the final PDF. For detailed insights, visit the Paged.js documentation.

Paged.js's impact extends far beyond its initial conception; it is now used in a variety of products and projects, including RStudio/PageDown, PubPub, Pandoc, PanWriter, Hederis, OpenQuire (Getty), Lulu (coming shortly), the Louvre, Kotahi, Ketty, and CokoDocs, among many others. I have also written about how we have been using Paged.js to create highly structured, beautiful textbooks with the Open Education Network.

We've also successfully automated PDF production for 80% of eLife-reviewed preprints using Paged.js, striving for 100%. Challenges like identifying specific image types for appropriate treatment remain, but we're actively working to resolve these and further streamline the process.

Example eLife PDF automated with Paged.js

Paged.js' 100% open-source nature ensures that it remains accessible to all, fostering innovation and collaboration across the publishing industry.

How the AI PDF Designer Works

For many years Paged.js PDF design was done by manually writing CSS. Indeed, the Kotahi PDF designer interfaces enable just this.

Now the Kotahi AI PDF Designer (beta) allows us to interpret natural language commands to style PDFs.

Here is how it works. In the Kotahi AI PDF Designer, we use Wax, our advanced HTML word processor, to display articles in the left pane (without the editor toolbar). Users can click on elements as if the document were a static article, while behind the scenes Wax inserts element IDs into the document. These IDs are essential for identifying elements for targeted CSS changes.
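
The idea of inserting element IDs so that each element can be targeted can be sketched as follows. This is not Wax's code: the function `add_element_ids`, the `el-` prefix, and the use of Python's standard XML parser on a well-formed fragment are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def add_element_ids(xhtml, prefix="el"):
    """Give every element without an id a generated one; return new markup.

    Assumes the input is a single well-formed (XML-parseable) fragment.
    """
    root = ET.fromstring(xhtml)
    counter = 0
    for element in root.iter():
        if "id" not in element.attrib:
            counter += 1
            element.set("id", f"{prefix}-{counter}")
    return ET.tostring(root, encoding="unicode")


article = (
    "<article><h1>Title</h1><p>First paragraph.</p>"
    "<p id='intro'>Second.</p></article>"
)
print(add_element_ids(article))
```

Once every element carries a unique ID, a click on the second paragraph can be turned into a selector such as `#el-3`, which is what makes a targeted CSS change possible.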

We then use AI to process these user selections together with natural language commands provided by user prompts. The commands are applied directly to the CSS that Paged.js uses to render PDF.
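
A minimal sketch of the last step, applying a change to the CSS, might look like this. The model call is deliberately left out and replaced by a hard-coded dictionary; the function names, the `#el-3` selector, and the dictionary-based stylesheet are hypothetical and only show one way a selection plus a set of declarations could be merged into the stylesheet.

```python
def apply_style_change(stylesheet, element_id, declarations):
    """Merge declarations into the rule for #element_id; return a new dict."""
    selector = f"#{element_id}"
    # Copy so the caller's stylesheet is left untouched.
    updated = {sel: dict(props) for sel, props in stylesheet.items()}
    updated.setdefault(selector, {}).update(declarations)
    return updated


def to_css(stylesheet):
    """Serialize {selector: {property: value}} to CSS text."""
    rules = []
    for selector, props in stylesheet.items():
        body = " ".join(f"{prop}: {value};" for prop, value in props.items())
        rules.append(f"{selector} {{ {body} }}")
    return "\n".join(rules)


# Stand-in for what a model might return for "make this text green and bigger".
model_output = {"color": "green", "font-size": "1.2em"}

styles = {"#el-3": {"color": "black"}}
styles = apply_style_change(styles, "el-3", model_output)
print(to_css(styles))
```

Keeping the model's output as plain property/value pairs, rather than free text, is one way to make the change easy to validate before it reaches the stylesheet.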

PDF previews can be generated after each change or on demand, providing a flexible and user-friendly experience in document design. This process ensures that users can immediately see the impact of their design choices, greatly enhancing the document creation workflow. The PDF can then be printed directly utilizing the browser print engine.
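
The difference between rendering after each change and rendering on demand can be sketched with a small controller. The class `PreviewController` and its `render` callback are hypothetical; in a real setup the callback would hand the CSS to the Paged.js preview, which is stubbed here with a list.

```python
class PreviewController:
    """Decides when CSS changes are passed on to the preview renderer."""

    def __init__(self, render, live=True):
        self._render = render  # callable that takes the CSS text
        self.live = live       # True: render after each change
        self._pending = None   # latest CSS not yet rendered

    def on_css_changed(self, css):
        if self.live:
            self._render(css)
            self._pending = None
        else:
            # Remember only the latest state; render it when asked.
            self._pending = css

    def refresh(self):
        """Render on demand if there are unrendered changes."""
        if self._pending is not None:
            self._render(self._pending)
            self._pending = None


rendered = []  # stands in for the preview pane
controller = PreviewController(rendered.append, live=False)
controller.on_css_changed("h1 { color: green; }")
controller.on_css_changed("h1 { color: green; font-weight: bold; }")
controller.refresh()
print(rendered)
```

With live preview off, several prompts can be applied in a row and the comparatively expensive PDF rendering happens once, on the final state.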

We realized early on that an HTML-first approach represented the future, even if it wasn't a popular direction yet within scholarly publishing. The real technical innovations were happening on the web, not in aging formats like XML or Word. By embracing modern web standards, we could position ourselves to capitalize on emerging capabilities like AI to drive new methods of document creation. Now, we are able to harness those web-powered advancements to bring unprecedented automation to journal layout and production through tools like the Kotahi AI PDF Designer. Had we clung to legacy systems, we would have missed this opportunity. Our investment in web technologies has positioned us advantageously to capitalize on the advancements and opportunities they bring.

Next on the Roadmap

Now that we have the designer running, we're committed to enhancing it with a roadmap full of exciting features. We have a lot of work ahead of us to improve the basic beta functionality and make it bulletproof, but we have also started designing additional features. We are, for example, working out how to merge editing capabilities with the AI design functions, offering production teams a unified interface for document editing and design. We are also thinking about how the AI designer could automate the last 20% of the eLife PDF production pipeline mentioned above (which would be of great value to every journal publisher). Working out how to integrate the tool with AI vision capabilities for 'auto adjusting' output is of obvious interest. This is just the beginning; we have a wealth of other ideas we're eager to explore and implement.

Where This Could Go: An Open Invitation

We built Kotahi guided by the real needs of emergent new researcher-led publishing models, publishing service providers, and journals. It serves as an evolving nexus – exchanging insights daily as developers, publishers, and partners collaborate. This open collaboration stays aligned with the shared values of democratizing access to publishing tools and putting into practice the much-discussed vision of shared infrastructure for disseminating scholarship.

By sharing ideas (like this article) and code freely, we welcome fellow pioneers to build alongside us in pushing publishing forward. Kotahi’s true test is scaling access to a best-in-class tool, not just to elite circles but across the entire scholarly communications spectrum, and we want to do it with you.

If you would like to innovate with us, then please don't hesitate to reach out for a friendly chat (adam@coko.foundation).

The development is supported by NGI Zero. We are very grateful to Michiel Leenaars, NLnet and NGI Zero.

The demo article used in the videos is here: https://doi.org/10.7554/eLife.89465.3