DISE 2026

Workshop Overview

Modern software systems are becoming increasingly data-intensive. They process massive quantities of data and also produce large-scale, machine-generated data, such as traces and telemetry, which is used for AI training, runtime monitoring, log analysis, debugging, and testing. These analytical pipelines are integral to today's AI advancements and are performance-critical, and their behavior is highly dependent on the scale and content of the data they process.

To thrive at this scale, software organizations need intelligent AI agents that can not only process the data but also reason about it within the context of the software. Integrating AI-driven, data-informed feedback loops is essential for autonomous software operations and context-aware guidance to developers. Achieving trustworthy AI-native software engineering demands addressing challenges that directly influence the robustness, reliability, and trustworthiness of data-intensive systems.

Key Challenges

Trustworthy AI-enabled software engineering faces several mutually reinforcing challenges in data-intensive environments:

Data-dependent uncertainty. The dynamic nature of large-scale, machine-generated data complicates comprehensive testing and debugging, and agentic approaches can amplify diagnosis uncertainty.
Massive scale. Operational data overwhelms traditional analysis, demanding systems-level optimization for storage, tracing, and monitoring where critical behaviors manifest only at production scale.
Stringent latency requirements. Real-time or near real-time expectations leave little room for offline analysis; long-running workloads make techniques like fuzzing difficult to apply at scale.
Robustness under evolving conditions. Shifting data distributions and infrastructure conditions require AI techniques that remain stable despite drift, adversarial inputs, and noisy telemetry.
AI-tailored workloads. The classical principles of correctness, validation, and scale do not hold when such data pipelines are used for crucial training-data preparation and wrangling phases.

Workshop Goals

Provide a forum for advancing software engineering research around data-intensive development, debugging, monitoring, and testing.
Highlight AI-driven, data-informed feedback loops that enable autonomous operations and context-aware guidance.
Launch a community consortium dedicated to curating and maintaining industry-scale synthetic datasets, benchmarks, and bug corpora.

Topics of Interest

We invite discussions that explore fundamental ideas, practical solutions, and cross-domain innovations, including but not limited to:

Data-intensive software testing, debugging, runtime monitoring, and log analytics
Use of data-intensive software in AI training and inference phases
Semantics lifting from systems-generated data
Modelling application behavior via data-system coordination
AI, ML, and agentic approaches for monitoring, debugging, and testing systems-generated data
Algorithms and foundations for testing and debugging performance-critical systems
Benchmarks for testing, debugging, and analysis of systems-generated data

Workshop Program

Morning Session I (8:45 – 10:30)

Keynote I

Dr. Dongmei Zhang
Distinguished Scientist and Deputy Managing Director, Microsoft Research Asia

Paper Presentations & Invited Talk

Short Papers

A Holistic Risk Calculus for Microservice Systems using Multi-Source Signal Fusion
Shakthi Weerasinghe, Tomas Cerny (University of Arizona)
Cloud Intelligence/AIOps 2.0: Knowledge-Anchored Agentic AIOps
Dongmei Zhang, Qingwei Lin, Si Qin, Liqun Li, Lianbin Chi, Dawei Song, Biao Cheng, Chaoyun Zhang, Yingnong Dang, Samia Khalid, Saravan Rajmohan, Sitaram Lanka (Microsoft)
Freezing the Crime Scene: A State Snapshot Paradigm for Reproducible Agentic SRE Evaluation
Guangba Yu, Yilun Wang, Michael R. Lyu (The Chinese University of Hong Kong)

Poster

Why Property-Based Testing is Necessary for Data Intensive Scalable Computing
Yaoxuan Wu, Ingrid Lee, Miryung Kim (UCLA); Ahmad Humayun, Muhammad Ali Gulzar (Virginia Tech)

Coffee Break (10:30 – 11:00)

Morning Session II (11:00 – 12:30)

Overview of Breakout Group Topics + Panel

What are the research challenges in this space that we as a community should be working on?

Panelists: Lionel Briand, Michael Lyu, Dongmei Zhang, Chao Peng
Moderator: Miryung Kim

Breakout Session

Participants divide into focused breakout rooms organized around themes distilled from the morning program, exploring shared challenges, synergies, and actionable ideas.

Summary and Discussion

Each breakout group shares key takeaways for collective reflection.

Lunch (12:30 – 14:00)

Afternoon Session I (14:00 – 15:30)

Keynote II

Chao Peng
Principal Research Scientist, ByteDance; Program Co-Chair, AIware

Long Paper Presentations

PerfGen: Automated Performance Benchmark Generation for Big Data Analytics
Jiyuan Wang (Tulane); Jason Teoh, Miryung Kim (UCLA); Muhammad Ali Gulzar (Virginia Tech); Qian Zhang (UC Riverside)
AI-NativeBench: An Open-Source White-Box Agentic Benchmark Suite for AI-Native Systems
Zirui Wang (Sun Yat-sen University); Guangba Yu, Michael R. Lyu (The Chinese University of Hong Kong)
GADM-Oracle: A Domain-Adaptive Oracle for Detecting "Data Bugs" in Enterprise Data Stacks
Shakthi Weerasinghe, Amr S. Abdelfattah, Tomas Cerny (University of Arizona)

Coffee Break (15:30 – 16:00)

Afternoon Session II (16:00 – 17:30)

Re-group

Participants identify where they would like to continue their discussions.

Breakout Sessions

Participants divide into focused groups continuing to explore shared challenges and actionable ideas with an eye toward a community-driven article.

Panel + Summary

Panelists and breakout group leads reconvene to share outcomes and consolidate key insights.

Collective Writing

Groups contribute to a shared writing template covering challenges in data-intensive software engineering, the current landscape, and a roadmap for future research directions.

Breakout Themes

Data-Intensive Benchmarks, Bugs, and Oracles

Design and maintain realistic benchmarks, bug corpora, and domain-specific oracles that capture the diversity of contemporary data stacks.

Runtime Monitoring for Data-Intensive Software

Reinvent observability for heterogeneous, AI-augmented systems spanning telemetry pipelines, instrumentation, and fine-grained failure attribution.

Debugging and Testing Data-Intensive Systems

Address reproduction challenges, isolate root causes, and improve coverage in data-rich environments with emergent behaviors.

Non-Functional Quality Attributes

Ensure security, privacy, scalability, and resilience across shifting workloads and operating conditions.

AI Agents in Data-Intensive Pipelines

Understand and standardize how AI agents consume and generate telemetry, including decision logs and monitoring strategies.

Data-Intensive Pipelines For AI

Understand and standardize how data-intensive software evolves for AI training and inference needs at scale.

Call for Papers

We welcome multiple submission types that reflect the maturity and goals of your contribution:

Position Statements

Up to 2 pages, including references, highlighting early-stage ideas, industrial perspectives, or proposals that feed into the community report.

Short Papers

Up to 4 pages, plus 1 additional page for references, presenting early visions, experience notes, or framework sketches that invite feedback.

Full Papers

Up to 10 pages, plus 2 additional pages for references, documenting novel approaches, frameworks, or evaluations grounded in data-intensive software engineering.

arXiv Presentations

Recent preprints offered for open discussion in a non-archival format to spark collaboration.

Important Dates (AoE, UTC-12h)

Paper Submission Deadline: February 1, 2026
Paper Notification: February 25, 2026
Camera-Ready Deadline: April 2, 2026
Early Registration Deadline: April 24, 2026

Submission notes: Submissions must adhere to the FSE 2026 two-column industry track format. Detailed formatting guidelines can be found on the FSE 2026 "How to Submit" page. Submissions will undergo double-blind review by the program committee. Submissions must be original at the time of submission and must be uploaded via HotCRP. At least one author of each accepted paper must register and present at the workshop.

All accepted papers, except for position statements and arXiv presentations, will appear in the FSE 2026 workshop proceedings by default. Non-archival papers and position statements will be shared via the workshop website. Regardless of track, submissions must be unpublished elsewhere, and at least one author must attend and present.

Authors of accepted position statements will collaborate after the workshop to synthesize a community article, targeting venues such as IEEE Software, that captures shared insights and future directions. We also plan to give a spotlight award for the best emerging ideas in data-intensive software engineering.

Program Committee

Hamid Bagheri, University of Nebraska-Lincoln
Lionel Briand, Lero Centre, University College Dublin
Tse-Hsun (Peter) Chen, Concordia University
Muhammad Ali Gulzar, Virginia Tech
Pinjia He, The Chinese University of Hong Kong, Shenzhen
Miryung Kim, UCLA
Odej Kao, TU Berlin
Burcu Kulahcioglu Ozkan, Delft University of Technology
Yiling Lou, UIUC
Michael Lyu, The Chinese University of Hong Kong
Manuel Rigger, National University of Singapore
Weiyi Shang, University of Waterloo
Yintong Huo, Singapore Management University

International Workshop on Data Intensive Software Engineering (DISE)

Organizers

Prof. Lionel Briand

Prof. Tse-Hsun (Peter) Chen

Prof. Muhammad Ali Gulzar

Prof. Yintong Huo

Prof. Miryung Kim

Prof. Michael Lyu

Prof. Weiyi Shang
