Structured XML in the JATS (Journal Article Tag Suite) format has long been a challenge for the scholarly publishing industry. The process of tagging content to meet the JATS specification has typically been expensive, requiring either dedicated internal staff at publishers or outsourcing the work to external vendors. Both options are slow and costly.
Kotahi takes a different approach to JATS production that brings the cost down significantly for all publishers. No expertise is required to produce publication-ready JATS using Kotahi's system. The approach can be summarized in three key steps:
- Metadata Capture
Customizable submission forms allow article metadata to be captured, with each form element assigned a unique identifier (e.g. submission.title). This facilitates structured data collection.
- Manuscript Markup
Manuscripts are submitted in HTML format. Editors can then visually mark up the document, tagging sections as 'abstract', 'keywords', etc. using simple tools.
- Automated JATS Population
The enriched manuscript content is automatically populated into a JATS XML template containing the required structure. Data elements from submission forms and manual markup fill the relevant places in the template.
This approach delivers baseline JATS output easily. Additional manuscript markup can further enhance the JATS as needed, and the custom forms allow nested metadata to be integrated into the structured output as well.
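To make the three steps concrete, here is a minimal sketch in Python of how form data and tagged manuscript sections might be dropped into a JATS skeleton. It is an illustration only, not Kotahi's actual implementation: the field identifier submission.journal, the shape of the tagged_sections dictionary, and the build_jats function are all assumptions made for this example, and the output is a small fragment rather than a complete, valid JATS article.

```python
import xml.etree.ElementTree as ET

# Hypothetical form data, keyed by field identifier (in the style of "submission.title").
form_data = {
    "submission.title": "A Study of Things",
    "submission.journal": "Journal of Examples",
}

# Hypothetical result of the editor's visual markup: tag label -> tagged content.
tagged_sections = {
    "abstract": "We study things.",
    "keywords": ["things", "stuff"],
}


def build_jats(form_data, tagged_sections):
    """Fill a small JATS <front> skeleton from form fields and tagged sections."""
    article = ET.Element("article")
    front = ET.SubElement(article, "front")

    # Journal-level metadata comes from the submission form.
    journal_meta = ET.SubElement(front, "journal-meta")
    title_group = ET.SubElement(journal_meta, "journal-title-group")
    ET.SubElement(title_group, "journal-title").text = form_data["submission.journal"]

    # Article-level metadata mixes form fields and manuscript markup.
    article_meta = ET.SubElement(front, "article-meta")
    article_title_group = ET.SubElement(article_meta, "title-group")
    ET.SubElement(article_title_group, "article-title").text = form_data["submission.title"]

    abstract = ET.SubElement(article_meta, "abstract")
    ET.SubElement(abstract, "p").text = tagged_sections["abstract"]

    kwd_group = ET.SubElement(article_meta, "kwd-group")
    for keyword in tagged_sections["keywords"]:
        ET.SubElement(kwd_group, "kwd").text = keyword

    return article


xml_text = ET.tostring(build_jats(form_data, tagged_sections), encoding="unicode")
print(xml_text)
```

The point of the sketch is the division of labour: the form supplies values by identifier, the markup supplies content by tag, and the template decides where each one lives in the XML.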
We are delighted to see that OJS has recently implemented similar functionality for automated JATS exports, inspired by the innovations in this area pioneered on Kotahi. Open sharing of ideas across open source publishing platforms allows breakthroughs to spread widely, benefiting the whole community. While the Kotahi approach includes more advanced features (at the time of writing, the OJS features are in beta and do not include document ingestion or semantic markup tools), it is wonderful to see the core concepts around simplifying JATS production being propagated.
The pioneering work on Kotahi has unlocked automated structured XML workflows, eliminating historical bottlenecks around JATS production. Automated output of publication-ready, valid JATS XML is now practical for any open source publishing platform and any publisher. We are delighted to bring this innovation to the market, and to see the approach being adopted by other platforms.
The entire codebase for this is open source. Further, it is possible to use Kotahi purely as a production workflow engine, forgoing the other features.
How We Got There: An Innovation on Top of Innovations
This end-to-end JATS automation is not the result of a single breakthrough, but rather the combination of several pioneering innovations we have been focused on for years. Delivering this new approach required building all the component parts first, each involving substantive new developments:
- HTML-first manuscript ingestion.
We needed to build systems to ingest author manuscripts as HTML converted from MS Word. This enables downstream editing and semantic enrichment. Our DOCX to HTML converter, XSweet, interpolates document structure for scholarly content far better than popular generic conversion tools. Maintaining the integrity of semantic intent through conversions is crucial for publisher workflows.
- In-browser HTML editor.
We had to build a customisable scholarly word processor. Wax enables visual editing and formatting of HTML articles directly in the browser. This facilitates both authoring and content enrichment post-submission.
- Drag and drop markup tools.
We extended Wax with new drag and drop tools for visually highlighting semantic elements in manuscripts, like figures, tables, abstracts, etc. This simplifies the process of tagging content (requiring no JATS expertise). Further, these 'tagged areas' needed to be addressable as metadata inserts into the JATS templates. In other words, we effectively had to develop an API for manuscript content.
- Customizable submission system.
We needed to build a sophisticated submission form builder that enables unique identifiers to be assigned to all metadata fields (some special components had to be developed for nested data, e.g. author information). The captured submission data can then be propagated across the system (e.g. into JATS templates).
- Validation and exports engine.
Finally, we developed an automated system for injecting content from submission forms and visually tagged manuscripts into JATS templates that validate on export. This delivers structured XML free of errors.
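The idea of tagged areas being "addressable" can be sketched as follows. This is a hypothetical illustration in Python, not how Wax or Kotahi actually store markup: it assumes tagged regions carry a data-tag attribute (an invented convention for this example) and shows how their text could be pulled out by label, ready to be inserted into a template. The class and function names are likewise made up.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting.
VOID_ELEMENTS = {"br", "img", "hr", "input", "meta", "link"}


class TaggedAreaExtractor(HTMLParser):
    """Collects text inside elements carrying a (hypothetical) data-tag attribute."""

    def __init__(self):
        super().__init__()
        self.areas = {}    # tag label -> list of text chunks
        self._stack = []   # one entry per open element: its label, or None

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self._stack.append(dict(attrs).get("data-tag"))

    def handle_endtag(self, tag):
        if tag not in VOID_ELEMENTS and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        # Attribute the text to the nearest enclosing tagged element, if any.
        for label in reversed(self._stack):
            if label:
                self.areas.setdefault(label, []).append(data)
                break


def extract_tagged_areas(html):
    """Return a mapping of tag label -> text content for one manuscript."""
    parser = TaggedAreaExtractor()
    parser.feed(html)
    parser.close()
    return {label: "".join(chunks).strip() for label, chunks in parser.areas.items()}


manuscript_html = (
    '<section data-tag="abstract"><p>We study <em>things</em>.</p></section>'
    "<p>Body text.</p>"
    '<p data-tag="keywords">things, stuff</p>'
)
areas = extract_tagged_areas(manuscript_html)

# A very rough pre-export check: which expected areas has the editor not tagged yet?
missing = [label for label in ("abstract", "keywords") if label not in areas]
```

Note that the final check only reports untagged areas. Validating the exported XML itself would mean checking it against the JATS DTD or schema, which is outside the scope of this sketch.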
And of course... the entire system had to be designed and engineered to be an HTML-first platform (Kotahi is the first HTML-first scholarly publishing platform of its kind)... otherwise all this work would have 'nowhere to go'.
By combining separate innovations like high fidelity HTML manuscript ingestion, simplified markup tools, and configurable data models (submission forms), and building this on top of a modern scholarly publishing platform, we have shown how the entire JATS production process can be revolutionized. We are excited that this eight-year-long (largely unfunded) journey has resulted in such an impactful outcome for publishers and editors (of course, if it had been funded we could have done it a lot quicker!).
What Innovation Really Feels Like
It’s easy to view “innovations” as overnight breakthroughs that instantly solve problems. You build it, everyone cheers and someone hands you an industry award. But the reality is often far different: long, winding journeys requiring critical thinking, experimentation and unwavering commitment to smaller innovations that collectively build breakthrough solutions over time.
Innovation mostly isn’t a magical “eureka moment” that instantly convinces skeptics. It’s often a lonely process of backing ideas no one else believes in yet. Without funding, we have had to bootstrap and self-finance creative risks, as innovators do everywhere. At Coko we have reinvested surpluses from commissioned (open source) platform development (for the likes of DataCite, HHMI, NCBI, Caltech, etc.) into innovations like the JATS system. We build open source, to build open source (so to speak).
When we first proposed the use of semantic HTML manuscripts and submission systems as an alternative to dedicated 'JATS editors', our ideas were met with significant skepticism. However, as the industry evolved, the limitations of costly, failed JATS editors became apparent. Our method, once seen as unconventional, has now gained recognition and adoption, underscoring the value of proactive innovation in solving complex challenges.
We didn't wait for a solution to 'fall out of the sky' but did the necessary hard work to solve the hard problem, proving that creative thinking and persistence can produce great forward steps that challenge norms. We're proud that this lonely journey resulted in such a transformative outcome, one that can now benefit publishers everywhere.
Most exciting is that these cutting-edge innovations are available not just to large publishers, but to very small and mid-sized publishers too. Kotahi is, after all, 100% open source.
© Adam Hyde, 2024, CC-BY-SA
Image public domain, created by MidJourney from prompts by Adam.