PAPER2WEB: LET'S MAKE YOUR PAPER ALIVE!

Introduction

Traditional academic dissemination relies on static PDFs, which lack interactivity, multimedia integration, and responsive layouts, severely limiting accessibility and engagement. While recent tools like Paper2Poster and PresentAgent advance visual summaries, they sacrifice fine-grained content or fail to leverage rich media.

We introduce PAPER2WEB, a new task that transforms full papers into interactive, layout-aware webpages to bridge the gap between scholarly rigor and digital exploration. Our framework addresses the core shortcomings of existing approaches: disordered HTML rendering, static content, and negligible interactivity. We propose PWAGENT, a multi-agent system built on the Model Context Protocol (MCP) that enables structured asset parsing, intelligent layout allocation, and iterative refinement. The resulting generation pipeline achieves high content fidelity, visual balance, and usability, outperforming even methods that rely on human-designed templates on key metrics.

Figure 1: With PAPER2WEB, we supply the final piece of the puzzle for presenting and disseminating academic papers.

Dataset Overview

PAPER2WEB is the first large-scale dataset linking 10,716 peer-reviewed AI papers (2020–2025) from ICML, NeurIPS, ICLR, WWW, and other venues to their verified official project homepages. Homepage links were collected by automatically crawling paper bodies and code repositories, then filtered through LLM-assisted relevance scoring and human verification.
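The crawling step can be pictured with a minimal sketch that pulls candidate homepage links out of a paper's extracted text. It assumes links are found with a regular expression, treats GitHub links as code repositories to be crawled in a second pass, and drops hosts from a hand-picked list (`NON_HOMEPAGE_HOSTS`, hypothetical) that usually point to the paper rather than a project page. The LLM-assisted relevance scoring and human verification that follow are not shown.

```python
import re
from urllib.parse import urlparse

# Hypothetical block list: hosts that usually serve the paper itself, not a project page.
NON_HOMEPAGE_HOSTS = {
    "arxiv.org",
    "doi.org",
    "openreview.net",
    "proceedings.neurips.cc",
    "proceedings.mlr.press",
}

# A URL ends at whitespace, a closing bracket, or a quote character.
URL_PATTERN = re.compile(r"https?://[^\s)\]}>\"']+")


def split_links(text: str) -> tuple[list[str], list[str]]:
    """Split URLs found in paper text into homepage candidates and code-repository links."""
    homepages: list[str] = []
    repos: list[str] = []
    seen: set[str] = set()
    for match in URL_PATTERN.finditer(text):
        url = match.group(0).rstrip(".,;:")  # drop trailing sentence punctuation
        if url in seen:
            continue
        seen.add(url)
        host = urlparse(url).netloc.lower().removeprefix("www.")
        if host == "github.com":
            repos.append(url)  # crawl the repository README later for homepage links
        elif host not in NON_HOMEPAGE_HOSTS:
            homepages.append(url)
    return homepages, repos


text = ("Code: https://github.com/foo/bar. Project page: https://foo.github.io/proj/, "
        "paper: https://arxiv.org/abs/1234.5678.")
print(split_links(text))
```

In this toy input, the `github.io` page survives as a homepage candidate, the GitHub repository is queued for a second crawl, and the arXiv link is discarded.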

Each entry includes metadata (title, authors, venue, year, citation count), a semantic category (one of 13 topics, e.g., computer vision, NLP), and the website URL. We manually audited 2,000 samples to characterize website features: 38% of the websites are static, 45% include multimedia such as embedded videos, and only 19% offer interactive elements such as toggleable curves or clickable data points. The dataset enables quantitative study of how academic knowledge is disseminated today.
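One way to hold such an entry in code is a plain record type. The sketch below is an assumption, not the released dataset format: the field names and the three audit flags (`is_static`, `has_multimedia`, `has_interactive`) are hypothetical. The flags are kept as independent booleans because a single page can, for example, embed videos and also offer interactive plots.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class PaperWebEntry:
    title: str
    authors: list[str]
    venue: str
    year: int
    citation_count: int
    category: str      # one of the 13 semantic topics, e.g. "computer vision"
    website_url: str
    # Hypothetical audit flags, filled in only for manually audited samples.
    is_static: bool = False
    has_multimedia: bool = False
    has_interactive: bool = False


def audit_rates(entries: list[PaperWebEntry]) -> dict[str, float]:
    """Fraction of audited entries carrying each website feature."""
    n = len(entries)
    return {
        "static": sum(e.is_static for e in entries) / n,
        "multimedia": sum(e.has_multimedia for e in entries) / n,
        "interactive": sum(e.has_interactive for e in entries) / n,
    }


entry = PaperWebEntry("Example Paper", ["A. Author"], "ICLR", 2024, 12,
                      "computer vision", "https://example.github.io/", has_multimedia=True)
print(json.dumps(asdict(entry)))
```

Running `audit_rates` over the 2,000 audited records would yield the kind of percentages reported above.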

Figure 4: To study how to transform static papers into exploratory webpages, we collect the first paper-webpage dataset by crawling multiple top-tier conferences and filtering the results through online search and human annotation.

Figure 6: The right panel shows the categorization of our data: we divide the dataset into 13 categories and count the papers in each, and we also show the distributions by conference and by year. The top-left panel presents, for each category, the proportions of papers without and with a website among papers with low citation counts. The bottom-left panel shows the same split restricted to highly cited papers (those with over 1,000 citations).
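The two left-hand panels reduce to a per-category share of papers that have a website, computed separately for highly cited and remaining papers. A minimal sketch, assuming each paper is a `(category, citation_count, has_website)` tuple and treating every paper at or below the 1,000-citation threshold as low-cited (the caption does not state the exact low-citation cut-off):

```python
from collections import defaultdict

HIGH_CITATION_THRESHOLD = 1000  # "highly cited" means over 1,000 citations


def website_share_by_category(papers, highly_cited: bool) -> dict[str, float]:
    """Per-category fraction of papers with a website, within one citation group.

    papers: iterable of (category, citation_count, has_website) tuples.
    """
    counts = defaultdict(lambda: [0, 0])  # category -> [with_website, total]
    for category, citations, has_website in papers:
        if (citations > HIGH_CITATION_THRESHOLD) != highly_cited:
            continue  # paper belongs to the other citation group
        counts[category][1] += 1
        counts[category][0] += int(has_website)
    return {cat: with_site / total for cat, (with_site, total) in counts.items()}


papers = [("NLP", 5, True), ("NLP", 10, False), ("NLP", 2000, True),
          ("computer vision", 1500, False), ("computer vision", 3, True)]
print(website_share_by_category(papers, highly_cited=False))
print(website_share_by_category(papers, highly_cited=True))
```

The "without a website" share for each category is simply one minus the returned value.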

Methodology/Approach

PWAGENT is a multi-agent system that converts scholarly PDFs into interactive webpages through three stages. First, it deconstructs papers into structured assets (text, figures, tables, links) via Docling/Marker + LLM parsing, producing a JSON schema with summaries and metadata.
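As a rough illustration of the first stage, the sketch below turns Markdown (both Docling and Marker can export a parsed PDF as Markdown) into a small JSON asset bundle with IDs. It is a stand-in built on assumptions: the ID scheme (`sec-0`, `fig-0`) and field names are hypothetical, tables are omitted for brevity, and `summarize` is a placeholder that keeps the first sentence where the described pipeline would call an LLM.

```python
import json
import re

FIGURE_RE = re.compile(r"!\[([^\]]*)\]\(([^)]+)\)")              # ![caption](path)
LINK_RE = re.compile(r"(?<!!)\[([^\]]+)\]\((https?://[^)]+)\)")  # [label](url), not images


def summarize(text: str) -> str:
    """Placeholder for an LLM-written summary: keep the first sentence."""
    return text.split(". ")[0].strip()


def markdown_to_assets(markdown: str) -> dict:
    assets = {"sections": [], "figures": [], "links": []}
    current = None  # section that the following lines belong to
    for line in markdown.splitlines():
        if line.startswith("#"):
            current = {"id": f"sec-{len(assets['sections'])}",
                       "title": line.lstrip("#").strip(), "text": ""}
            assets["sections"].append(current)
            continue
        for caption, path in FIGURE_RE.findall(line):
            assets["figures"].append({"id": f"fig-{len(assets['figures'])}",
                                      "caption": caption, "path": path,
                                      "section": current["id"] if current else None})
        for label, url in LINK_RE.findall(line):
            assets["links"].append({"label": label, "url": url})
        # Non-empty, non-figure lines become section body text.
        if current is not None and line.strip() and not FIGURE_RE.search(line):
            current["text"] += line.strip() + " "
    for section in assets["sections"]:
        section["text"] = section["text"].strip()
        section["summary"] = summarize(section["text"])
    return assets


md = """# Introduction
Static PDFs limit access. We fix this.
![Pipeline overview](fig1.png)
See [code](https://github.com/x/y).
# Method
Three stages."""
print(json.dumps(markdown_to_assets(md), indent=2))
```

Each figure records the ID of the section it appeared in, which gives later stages a simple relational link between text and visuals.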

Second, it ingests these assets into a Model Context Protocol (MCP) server, creating a queryable repository with unique IDs, layout budgets, and relational links. Third, an MLLM orchestrator agent iteratively refines the webpage by analyzing rendered views, segmenting the output into visual tiles, detecting layout imbalances, and issuing precise tool calls to adjust spacing, emphasis, and alignment.

This loop continues until the visual hierarchy is optimized, ensuring balanced image-text ratios and semantic coherence while preserving core content.
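The refinement loop can be sketched in a few dozen lines if the MLLM's visual judgment is replaced by a simple numeric rule. Everything here is a hypothetical stand-in: a rendered page is a list of blocks with pixel heights, tiles are fixed 800-pixel viewport slices, a tile counts as imbalanced when images fill more than 60% of it, the "tool call" shrinks its images by 20%, and the repository's layout budget is modeled as a minimum height per asset ID.

```python
from dataclasses import dataclass


@dataclass
class Block:
    asset_id: str
    kind: str    # "text" or "image"
    height: int  # rendered height in pixels


def segment_into_tiles(blocks: list[Block], tile_height: int = 800) -> list[list[Block]]:
    """Group consecutive blocks into viewport-sized tiles (stand-in for screenshot tiling)."""
    tiles, current, used = [], [], 0
    for block in blocks:
        if current and used + block.height > tile_height:
            tiles.append(current)
            current, used = [], 0
        current.append(block)
        used += block.height
    if current:
        tiles.append(current)
    return tiles


def image_ratio(tile: list[Block]) -> float:
    """Share of a tile's height occupied by images."""
    total = sum(b.height for b in tile)
    if total == 0:
        return 0.0
    return sum(b.height for b in tile if b.kind == "image") / total


def refine(blocks: list[Block], min_heights: dict[str, int],
           max_image_ratio: float = 0.6, max_rounds: int = 10) -> list[Block]:
    """Shrink images in image-heavy tiles until every tile is balanced or rounds run out."""
    for _ in range(max_rounds):
        imbalanced = [t for t in segment_into_tiles(blocks) if image_ratio(t) > max_image_ratio]
        if not imbalanced:
            break  # layout judged balanced: stop refining
        for tile in imbalanced:
            for block in tile:
                if block.kind == "image":
                    # "Tool call": resize by 20%, never below the asset's layout budget.
                    block.height = max(min_heights.get(block.asset_id, 0),
                                       int(block.height * 0.8))
    return blocks


page = [Block("sec-0", "text", 200), Block("fig-0", "image", 600), Block("sec-1", "text", 300)]
refine(page, {"fig-0": 300})
print([(b.asset_id, b.height) for b in page])
```

Because of the budget floor, this toy loop can also end at `max_rounds` with some tiles still image-heavy; the agent described in this section has richer actions (spacing, emphasis, alignment) to fall back on.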

Figure 7: PWAGENT turns papers into project homepages. Papers are deconstructed via Docling/Marker + LLM into multiple assets and stored in an MCP repository. An agent drafts a page, then iteratively refines it until the layout and user experience are satisfactory.

Results/Evaluation

On the PAPER2WEB benchmark, PWAGENT outperforms all automated baselines. It achieves 28% higher connectivity than arXiv-HTML and an average gain of 12% in completeness. In both human and MLLM evaluations, PWAGENT ranks highest in interactivity (3.16), aesthetics (3.35), and informativeness (4.31), surpassing even template-aided models.

On PaperQuiz, PWAGENT scores highest across both open- and closed-source VLMs, with an average of 4.04 over the verbatim and interpretive settings. With the verbosity penalty applied, it still leads with a score of 2.03. In terms of efficiency, it costs only $0.025 per site, 82% less than GPT-4o ($0.141).
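The cost comparison is simple arithmetic; this snippet re-derives the 82% figure from the two per-site costs quoted above and also expresses it as a cost ratio.

```python
def relative_saving(cost: float, baseline: float) -> float:
    """Fraction of the baseline cost that is saved by the cheaper method."""
    return (baseline - cost) / baseline


pwagent_cost, gpt4o_cost = 0.025, 0.141  # USD per generated site, from the results above
print(f"saving: {relative_saving(pwagent_cost, gpt4o_cost):.0%}")  # saving: 82%
print(f"cost ratio: {gpt4o_cost / pwagent_cost:.1f}x")             # cost ratio: 5.6x
```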

Qualitative comparisons confirm that PWAGENT's structural integrity, balanced visuals, and visual hierarchy match human-designed sites more closely than those of any other automatic generation method.

Figure 8: Our evaluation metrics comprise three modules: (1) scoring Connectivity & Completeness by parsing HTML links and structure, with image-text balance and information-efficiency priors; (2) using an MLLM or human judge to rate interactivity, aesthetics, and informativeness; and (3) running a PaperQuiz QA on webpage screenshots with a verbosity penalty.
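The first module starts by parsing the generated HTML. The sketch below, built on Python's standard `html.parser`, only gathers raw counts (in-page anchor links, external links, media elements, visible text length) from which connectivity and image-text balance scores could be derived; the actual scoring formula and priors are not given in this section, so none is implied here. The class name `PageStats` and the choice of counted tags are hypothetical.

```python
from html.parser import HTMLParser


class PageStats(HTMLParser):
    """Counts links, media elements, and visible text characters in an HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.internal_links = 0  # in-page anchors such as href="#method"
        self.external_links = 0  # absolute http(s) links, e.g. to code or arXiv
        self.media = 0           # <img> and <video> elements
        self.text_chars = 0
        self._skip_depth = 0     # >0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href") or ""
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a" and href.startswith("#"):
            self.internal_links += 1
        elif tag == "a" and href.startswith(("http://", "https://")):
            self.external_links += 1
        elif tag in ("img", "video"):
            self.media += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:  # ignore script and style contents
            self.text_chars += len(data.strip())


def page_stats(html: str) -> dict[str, int]:
    parser = PageStats()
    parser.feed(html)
    parser.close()
    return {"internal_links": parser.internal_links, "external_links": parser.external_links,
            "media": parser.media, "text_chars": parser.text_chars}


html = ('<nav><a href="#method">Method</a></nav><p>Hello world</p>'
        '<img src="fig1.png"><a href="https://github.com/x/y">Code</a>'
        '<script>var x = 1;</script>')
print(page_stats(html))
```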

Table 1: Detailed evaluation of PAPER2WEB across Completeness & Connectivity and Human & MLLM Evaluation.

Table 2: PaperQuiz evaluation on PAPER2WEB using open- and closed-source MLLMs. The metrics are the Raw Score and the Score with Penalty, each reported under two settings: 'Verbatim' and 'Interpretive'.

Applications

PAPER2WEB enables researchers to rapidly transform manuscripts into professional websites that enhance dissemination, collaboration, and impact. Applications include: creating interactive project portals for grant submissions, integrating exploratory visualizations in open science, enabling live demos in educational platforms, and supporting reproducible research by embedding runnable code.

For example, a paper on 3D generation can include an embedded viewer allowing users to rotate, zoom, and inspect models. In teaching, interactive graphs let students explore parameter effects dynamically. For institutions, automated website generation scales outreach across thousands of publications. Researchers can start from PWAGENT outputs and manually enrich them with videos and animations. This streamlines the communication pipeline from paper to public-facing, interactive knowledge platform.

Figure 9: Illustration of website variants for the paper generated by different methods.

Figure 10: Illustration of website variants for the paper 'MLLM-as-a-Judge: Assessing Multimodal LLM-as-a-Judge with Vision-Language Benchmark' [3], generated by different methods.

Figure 11: Illustration of website variants for the paper 'Interactive3D: Create What You Want by Interactive 3D Generation' [5], generated by different methods.

Figure 12: Illustration of website variants for the paper 'SMIRK: 3D Facial Expressions through Analysis-by-Neural-Synthesis' [4], generated by different methods.

Figure 13: Illustration of website variants for the paper 'MVDream: Multi-view Diffusion for 3D Generation' [7], generated by different methods.

Figure 14: Illustration of website variants for the paper 'Masked Audio Generation using a Single Non-Autoregressive Transformer' [6], generated by different methods.

Figure 15: Illustration of website variants for the paper 'MVDream: Multi-view Diffusion for 3D Generation' [7], generated by different methods.

Resources & External Links

GitHub Repository

Official GitHub repository for PAPER2WEB, hosting source code and implementation details.

arXiv Paper

arXiv version of the paper, providing the full manuscript and latest updates.

ONE Lab

ONE Lab homepage, showcasing affiliated research projects and team members.
