Given the explosion in data production, storage capabilities, communications technologies, computational power, and supporting infrastructure, data science is now recognized as a highly critical growth area with impact across many sectors, including science, government, finance, health care, manufacturing, advertising, and retail. Because data science technologies are being leveraged to drive crucial decisions, it is of paramount importance to be able to measure the performance of these technologies and to interpret their output correctly. The NIST Information Technology Laboratory is forming a cross-cutting data science program focused on driving advancements in data science through system benchmarking and rigorous measurement science.
Understanding the Data Science Technical Landscape:
- Primary challenges in, and technical approaches to, complex workflow components of Big Data systems, including extract-transform-load (ETL), lifecycle management, analytics, visualization, and human-system interaction.
- Major forms of analytics employed in data science.
- Generation of ground truth for large datasets and performance measurement with limited or no ground truth.
- Methods to measure the performance of data analytic workflows that involve multiple subcomponents, decision points, and human interactions.
- Methods to measure the flow of uncertainty across complex data analytic systems.
- Approaches to formally characterizing end-to-end analytic workflows.
- Useful properties for data science reference datasets.
- Leveraging simulated data in data science research.
- Efficient approaches to sharing research data.