ACM FSE 2026 Workshop

International Workshop on Data Intensive Software Engineering (DISE)

July 5, 2026

A one-day gathering for researchers and practitioners shaping trustworthy, AI-enabled, data-intensive software engineering.

Call for Participation

Important Dates (AoE, UTC-12h)

  • Paper Submission Deadline: February 1, 2026
  • Paper Notification: February 25, 2026
  • Camera-Ready Deadline: April 2, 2026
  • Early Registration Deadline: April 24, 2026
  • Workshop Date: July 5, 2026

Workshop Overview

Modern software systems are becoming increasingly data-intensive. They process massive quantities of data and also produce large-scale, machine-generated data, such as traces and telemetry, that feed AI training, runtime monitoring, log analysis, debugging, and testing. These analytical pipelines are integral to today's AI advances and are performance-critical; their behavior depends heavily on the scale and content of the data they process.

To thrive at this scale, software organizations need intelligent AI agents that can not only process the data but also reason about it within the context of the software. Integrating AI-driven, data-informed feedback loops is essential for autonomous software operations and context-aware guidance to developers. Achieving trustworthy AI-native software engineering demands addressing challenges that directly influence the robustness, reliability, and trustworthiness of data-intensive systems.

Key Challenges

Trustworthy AI-enabled software engineering faces several reinforcing challenges in data-intensive environments:

  • Data-dependent uncertainty. The dynamic nature of large-scale, machine-generated data complicates comprehensive testing and debugging, and agentic approaches can amplify diagnosis uncertainty.
  • Massive scale. Operational data overwhelms traditional analysis, demanding systems-level optimization for storage, tracing, and monitoring where critical behaviors manifest only at production scale.
  • Stringent latency requirements. Real-time or near-real-time expectations leave little room for offline analysis; long-running workloads make techniques like fuzzing difficult to apply at scale.
  • Robustness under evolving conditions. Shifting data distributions and infrastructure conditions require AI techniques that remain stable despite drift, adversarial inputs, and noisy telemetry.
  • AI-tailored workloads. Classical principles of correctness, validation, and scale no longer hold when data pipelines are used for crucial training-data preparation and wrangling.

Workshop Goals

  • Provide a forum for advancing software engineering research around data-intensive development, debugging, monitoring, and testing.
  • Highlight AI-driven, data-informed feedback loops that enable autonomous operations and context-aware guidance.
  • Launch a community consortium dedicated to curating and maintaining industry-scale synthetic datasets, benchmarks, and bug corpora.

Topics of Interest

We invite discussions that explore fundamental ideas, practical solutions, and cross-domain innovations, including but not limited to:

  • Data-intensive software testing, debugging, runtime monitoring, and log analytics
  • Use of data-intensive software in AI training and inference phases
  • Semantic lifting from systems-generated data
  • Modelling application behavior via data-system coordination
  • AI, ML, and agentic approaches for monitoring, debugging, and testing systems-generated data
  • Algorithms and foundations for testing and debugging performance-critical systems
  • Benchmarks for testing, debugging, and analysis of systems-generated data

Workshop Program

Morning Session I (8:45 – 10:30)

Keynote I

Dr. Dongmei Zhang
Distinguished Scientist and Deputy Managing Director, Microsoft Research Asia


Paper Presentations & Invited Talk

Short Papers

  1. A Holistic Risk Calculus for Microservice Systems using Multi-Source Signal Fusion
    Shakthi Weerasinghe, Tomas Cerny (University of Arizona)
  2. Cloud Intelligence/AIOps 2.0: Knowledge-Anchored Agentic AIOps
    Dongmei Zhang, Qingwei Lin, Si Qin, Liqun Li, Lianbin Chi, Dawei Song, Biao Cheng, Chaoyun Zhang, Yingnong Dang, Samia Khalid, Saravan Rajmohan, Sitaram Lanka (Microsoft)
  3. Freezing the Crime Scene: A State Snapshot Paradigm for Reproducible Agentic SRE Evaluation
    Guangba Yu, Yilun Wang, Michael R. Lyu (The Chinese University of Hong Kong)

Poster

  1. Why Property-Based Testing is Necessary for Data Intensive Scalable Computing
    Yaoxuan Wu, Ingrid Lee, Miryung Kim (UCLA); Ahmad Humayun, Muhammad Ali Gulzar (Virginia Tech)

Coffee Break (10:30 – 11:00)

Morning Session II (11:00 – 12:30)

Overview of Breakout Group Topics + Panel

What are the research challenges in this space that we as a community should be working on?

Panelists: Lionel Briand, Michael Lyu, Dongmei Zhang, Chao Peng
Moderator: Miryung Kim


Breakout Session

Participants divide into focused breakout rooms organized around themes distilled from the morning program, exploring shared challenges, synergies, and actionable ideas.


Summary and Discussion

Each breakout group shares key takeaways for collective reflection.

Lunch (12:30 – 14:00)

Afternoon Session I (14:00 – 15:30)

Keynote II

Chao Peng
Principal Research Scientist, ByteDance; Program Co-Chair, AIware


Long Paper Presentations

  1. PerfGen: Automated Performance Benchmark Generation for Big Data Analytics
    Jiyuan Wang (Tulane); Jason Teoh, Miryung Kim (UCLA); Muhammad Ali Gulzar (Virginia Tech); Qian Zhang (UC Riverside)
  2. AI-NativeBench: An Open-Source White-Box Agentic Benchmark Suite for AI-Native Systems
    Zirui Wang (Sun Yat-sen University); Guangba Yu, Michael R. Lyu (The Chinese University of Hong Kong)
  3. GADM-Oracle: A Domain-Adaptive Oracle for Detecting "Data Bugs" in Enterprise Data Stacks
    Shakthi Weerasinghe, Amr S. Abdelfattah, Tomas Cerny (University of Arizona)

Coffee Break (15:30 – 16:00)

Afternoon Session II (16:00 – 17:30)

Re-group

Participants identify where they would like to continue their discussions.


Breakout Sessions

Participants divide into focused groups that continue to explore shared challenges and actionable ideas, with an eye toward a community-driven article.


Panel + Summary

Panelists and breakout group leads reconvene to share outcomes and consolidate key insights.


Collective Writing

Groups contribute to a shared writing template covering challenges in data-intensive software engineering, the current landscape, and a roadmap for future research directions.

Breakout Themes

Anticipated working groups emphasize sustained collaboration and tangible outputs:

Data-Intensive Benchmarks, Bugs, and Oracles

Design and maintain realistic benchmarks, bug corpora, and domain-specific oracles that capture the diversity of contemporary data stacks.

Runtime Monitoring for Data-Intensive Software

Reinvent observability for heterogeneous, AI-augmented systems spanning telemetry pipelines, instrumentation, and fine-grained failure attribution.

Debugging and Testing Data-Intensive Systems

Address reproduction challenges, isolate root causes, and improve coverage in data-rich environments with emergent behaviors.

Non-Functional Quality Attributes

Ensure security, privacy, scalability, and resilience across shifting workloads and operating conditions.

AI Agents in Data-Intensive Pipelines

Understand and standardize how AI agents consume and generate telemetry, including decision logs and monitoring strategies.

Data-Intensive Pipelines for AI

Understand and standardize how data-intensive software evolves to meet AI training and inference needs at scale.

Call for Papers

We welcome multiple submission types that reflect the maturity and goals of your contribution:

Position Statements

Up to 2 pages, including references, highlighting early-stage ideas, industrial perspectives, or proposals that feed into the community report.

Short Papers

Up to 4 pages, plus 1 additional page for references, presenting early visions, experience notes, or framework sketches that invite feedback.

Full Papers

Up to 10 pages, plus 2 additional pages for references, documenting novel approaches, frameworks, or evaluations grounded in data-intensive software engineering.

arXiv Presentations

Recent preprints offered for open discussion in a non-archival format to spark collaboration.

Important Dates (AoE, UTC-12h)

  • Paper Submission Deadline: February 1, 2026
  • Paper Notification: February 25, 2026
  • Camera-Ready Deadline: April 2, 2026
  • Early Registration Deadline: April 24, 2026

Submission notes: Submissions must adhere to the FSE 2026 two-column industry track format. Detailed formatting guidelines can be found on the FSE 2026 "How to Submit" page. Submissions will undergo a double-blind review by the program committee. Submissions must be original at the time of submission and must be uploaded via HotCRP. At least one author of each accepted paper must register and present at the workshop.

All accepted papers, except for position statements and arXiv presentations, will appear in the FSE 2026 workshop proceedings by default. Non-archival papers and position statements will be shared via the workshop website. Regardless of track, submissions must be unpublished elsewhere, and at least one author must attend and present.

Authors of accepted position statements will collaborate after the workshop to synthesize a community article (targeting venues such as IEEE Software) that captures shared insights and future directions. We also plan to present a spotlight award for the best emerging ideas in data-intensive software engineering.

Organizers

The workshop is coordinated by an international team with extensive experience organizing, chairing, and hosting major conferences and workshops.

Prof. Lionel Briand

Lero Centre, University College Dublin

lbriand@uottawa.ca

Prof. Tse-Hsun (Peter) Chen

Concordia University

peterc@encs.concordia.ca

Prof. Muhammad Ali Gulzar

Virginia Tech

gulzar@cs.vt.edu

Prof. Yintong Huo

Singapore Management University

ythuo@smu.edu.sg

Prof. Miryung Kim

UCLA

miryung@cs.ucla.edu

Prof. Michael Lyu

The Chinese University of Hong Kong

lyu@cse.cuhk.edu.hk

Prof. Weiyi Shang

University of Waterloo

wshang@uwaterloo.ca

Program Committee

  • Hamid Bagheri, University of Nebraska-Lincoln
  • Lionel Briand, Lero Centre, University College Dublin
  • Tse-Hsun (Peter) Chen, Concordia University
  • Muhammad Ali Gulzar, Virginia Tech
  • Pinjia He, The Chinese University of Hong Kong, Shenzhen
  • Miryung Kim, UCLA
  • Odej Kao, TU Berlin
  • Burcu Kulahcioglu Ozkan, Delft University of Technology
  • Yiling Lou, UIUC
  • Michael Lyu, The Chinese University of Hong Kong
  • Manuel Rigger, National University of Singapore
  • Weiyi Shang, University of Waterloo
  • Yintong Huo, Singapore Management University