DISE 2026

Modern software systems are becoming increasingly data-intensive. They process massive quantities of data and also produce large-scale, machine-generated data, such as traces and telemetry, which is used for AI training, runtime monitoring, log analysis, debugging, and testing. These analytical pipelines are integral to today's AI advancements and are performance-critical, and their behavior is highly dependent on the scale and content of the data they process.

To thrive at this scale, software organizations need intelligent AI agents that can not only process the data but also reason about it within the context of the software. Integrating AI-driven, data-informed feedback loops is essential for autonomous software operations and for context-aware guidance for developers. Achieving trustworthy AI-native software engineering demands addressing challenges that directly influence the robustness, reliability, and trustworthiness of data-intensive systems.

Key Challenges

Trustworthy AI-enabled software engineering faces several mutually reinforcing challenges in data-intensive environments:

Data-dependent uncertainty. The dynamic nature of large-scale, machine-generated data complicates comprehensive testing and debugging, and agentic approaches can amplify diagnosis uncertainty.
Massive scale. Operational data overwhelms traditional analysis and demands systems-level optimization of storage, tracing, and monitoring, particularly where critical behaviors manifest only at production scale.
Stringent latency requirements. Real-time or near real-time expectations leave little room for offline analysis; long-running workloads make techniques like fuzzing difficult to apply at scale.
Robustness under evolving conditions. Shifting data distributions and infrastructure conditions require AI techniques that remain stable despite drift, adversarial inputs, and noisy telemetry.
AI-tailored workloads. The classical principles of correctness, validation, and scale do not hold when such data pipelines are used for the crucial training-data preparation and wrangling phases.

Workshop Goals

Provide a forum for advancing software engineering research around data-intensive development, debugging, monitoring, and testing.
Highlight AI-driven, data-informed feedback loops that enable autonomous operations and context-aware guidance.
Launch a community consortium dedicated to curating and maintaining industry-scale synthetic datasets, benchmarks, and bug corpora.

We invite discussions that explore fundamental ideas, practical solutions, and cross-domain innovations, including but not limited to:

Data-intensive software testing, debugging, runtime monitoring, and log analytics
Use of data-intensive software in AI training and inference phases
Semantics lifting from systems-generated data
Modelling application behavior via data-system coordination
AI, ML, and agentic approaches for monitoring, debugging, and testing systems-generated data
Algorithms and foundations for testing and debugging performance-critical systems
Benchmarks for testing, debugging, and analysis of systems-generated data

Keynote Session I

Jeromy Carriere, SVP, Technology at Oscar Health

Paper Presentations

Accepted contributions—position ideas, research visions, and experience reports—are presented to surface concrete challenges and seed collaborative themes for the afternoon sessions.

Keynote Session II

Chao Peng, Principal Research Scientist at ByteDance

Breakout Sessions

Participants divide into focused breakout rooms organized around four to five themes distilled from the morning program. Each group explores shared challenges, synergies, and actionable ideas, with an eye toward contributing to a community-driven article.

Round-Table Synthesis

Groups reconvene to share outcomes, debate perspectives, and consolidate an outline for a living article on challenges and reflections in data-intensive software development.

Data-Intensive Benchmarks, Bugs, and Oracles

Design and maintain realistic benchmarks, bug corpora, and domain-specific oracles that capture the diversity of contemporary data stacks.

Runtime Monitoring for Data-Intensive Software

Reinvent observability for heterogeneous, AI-augmented systems spanning telemetry pipelines, instrumentation, and fine-grained failure attribution.

Debugging and Testing Data-Intensive Systems

Address reproduction challenges, isolate root causes, and improve coverage in data-rich environments with emergent behaviors.

Non-Functional Quality Attributes

Ensure security, privacy, scalability, and resilience across shifting workloads and operating conditions.

AI Agents in Data-Intensive Pipelines

Understand and standardize how AI agents consume and generate telemetry, including decision logs and monitoring strategies.

Data-Intensive Pipelines for AI

Understand and standardize how data-intensive software evolves for AI training and inference needs at scale.

We welcome multiple submission types that reflect the maturity and goals of your contribution:

Position Statements

Up to 2 pages, including references, highlighting early-stage ideas, industrial perspectives, or proposals that feed into the community report.

Short Papers

Up to 4 pages, plus 1 additional page for references, presenting early visions, experience notes, or framework sketches that invite feedback.

Full Papers

Up to 10 pages, plus 2 additional pages for references, documenting novel approaches, frameworks, or evaluations grounded in data-intensive software engineering.

arXiv Presentations

Recent preprints offered for open discussion in a non-archival format to spark collaboration.

Important Dates (AoE, UTC-12h)

Paper Submission Deadline: February 1, 2026
Paper Notification: February 25, 2026
Camera-Ready Deadline: April 2, 2026
Early Registration Deadline: April 24, 2026

Submission notes: Submissions must adhere to the FSE 2026 two-column industry track format. Detailed formatting guidelines can be found on the FSE 2026 "How to Submit" page. Submissions will undergo double-blind review by the program committee. Submissions must be original at the time of submission and must be uploaded via HotCRP. At least one author of each accepted paper must register and present at the workshop.

All accepted papers, except for position statements and arXiv presentations, will appear in the FSE 2026 workshop proceedings by default. Non-archival papers and position statements will be shared via the workshop website. Regardless of track, submissions must be unpublished elsewhere, and at least one author must attend and present.

Authors of accepted position statements will collaborate after the workshop to synthesize a community article, targeting venues such as IEEE Software, that captures shared insights and future directions. We also plan to present a spotlight award for the best emerging ideas in data-intensive software engineering.