Data Education - Systems Modeling and Data Analytics Core

The primary goal of the SMDA Data Education working group is to develop new introductory coursework for scientists and clinicians with little or no formal training in statistics or computational methods. This coursework is tailored to specific domains, building on trainees' undergraduate training and the core datasets that underpin each discipline. This domain-specific approach is crucial for making statistical concepts more intuitive and, therefore, easier to master.

Our approach follows a four-step procedure that we believe generalizes to any discipline-specific data-science course:

1. Map datasets that are relevant to trainees
2. Map software tools that enable trainees to answer standard questions with the datasets from step 1
3. Reinforce basic statistics training using the datasets from step 1
4. Teach basic coding skills using the datasets from step 1

Mapping Datasets and Tools

Information overload is one of the most significant challenges facing scientists and clinicians in our new “big data world.” Whether or not they have computational training, most investigators are unaware of the breadth of data relevant to their specific discipline. Understanding which datasets exist (or do not exist) is a critical prerequisite to:

- Finding data relevant to a question
- Applying tools to answer that question

Most large-scale biomedical databases have (1) internal rules guiding how their data is organized and (2) graphical user interfaces on their websites that let a general audience use their data to answer commonly asked questions. Each of these databases is analogous to a different country with its own laws and guidebooks. As a result, a key goal of the SMDA Data Education core is to build “world maps” that outline major biomedical datasets and provide tutorials on using their pre-existing software tools. Overall, ~70% of standard biomedical data analytics can be accomplished without any computational or coding training; it simply requires awareness of the available datasets and of the software tools that let investigators ask their own research questions. Increasing this awareness is a primary goal of SMDA.

Equipping Researchers with Data Science Tools

While most standard questions can be answered using pre-existing software tools, approximately 30% of biomedical data questions require more advanced statistical or computational methods that are not accessible to most scientists and clinicians. As a result, there is an increasing need to improve both statistical and computational training for these professionals. Fortunately, advanced training in these areas is rarely necessary. Instead, a basic foundation in common statistical tests and programming for data analytics is sufficient. Often, this involves revisiting and reinforcing statistical concepts from undergraduate coursework within the context of relevant biomedical datasets (e.g., when to use a t-test). Additionally, learning basic coding skills is crucial for applying these statistical methods at scale (e.g., performing a t-test 20,000 times in one minute).
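To make the point about scale concrete, here is a minimal Python sketch of running one t-test per gene across 20,000 genes in a single call. It assumes NumPy and SciPy are installed. The data are simulated, and the names `control` and `treated`, the group sizes, and the effect added to the first 100 genes are all hypothetical choices made for illustration, not part of any SMDA dataset or course material.

```python
import numpy as np
from scipy import stats

# Simulated (hypothetical) data: 20,000 genes, 10 control and 10 treated samples.
rng = np.random.default_rng(seed=0)
n_genes, n_per_group = 20_000, 10
control = rng.normal(loc=0.0, scale=1.0, size=(n_genes, n_per_group))
treated = rng.normal(loc=0.0, scale=1.0, size=(n_genes, n_per_group))

# Illustration only: shift the first 100 genes so some true differences exist.
treated[:100] += 3.0

# One call runs an independent two-sample t-test for every row (gene).
# equal_var=False selects Welch's t-test, which does not assume equal variances.
t_stats, p_values = stats.ttest_ind(treated, control, axis=1, equal_var=False)

# With 20,000 tests, a raw p < 0.05 cutoff would flag many genes by chance alone,
# so this sketch applies a simple, conservative Bonferroni correction.
alpha = 0.05
significant = p_values < alpha / n_genes
print(f"{significant.sum()} of {n_genes} genes pass the Bonferroni threshold")
```

The Bonferroni threshold is used here only because it is the simplest correction to explain; a real analysis might reasonably choose a different multiple-testing procedure.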

Tactical Priorities

Train Scientists and Clinicians:
- available datasets
- available data-tools
- basic statistics
- programming for data analytics
- machine learning
Build Collaborations to improve biomedical data-science education:
- educational researchers
- data scientists
- clinicians
- scientists
Build Data and Tool Repositories:
- Graphical Maps outlining available datasets
- Lists of most important software tools
- Cleaned, standardized datasets
- Code for Standard Operating Procedures in commonly used languages (R, Python, etc.)

Current Members

Eugene Douglass
Fred Maier
Kimberly Van Orman
Russ Palmer
Katie Smith
Anthony Roberto

Discipline-specific Data-science Pages:

-Omic Datasets in Cancer Research
- Data-Maps
- Software Tools
- Cleaned Datasets
- Clean Code for SOPs

……

…

Key Performance Indicators: summary statistics

KPI Category	KPI	Current Value
Research and Publications	Number of Published Papers
	Impact Factor of Journals
	Citations
	Conference Presentations
Collaboration and Engagement	Interdisciplinary Projects
	External Collaborations
	Collaborative Publications	%
	Workshops and Seminars
Funding and Grants	Research Grants Received	$
	Grant Applications Submitted
	Grant Success Rate	%
Data and Tools	Datasets Published
	Software Tools Developed
	Tool Adoption	downloads
Training and Development	Students Supervised
	Training Programs
	Skill Development	certificates
Impact and Outreach	Societal Impact
	Media Mentions
	Public Engagement	events
Operational Efficiency	Project Completion Rate
	Data Management Practices	% compliance
	Resource Utilization	% efficiency
Innovation and Excellence	Awards and Recognitions
	Innovative Solutions	breakthroughs
Feedback and Improvement	Stakeholder Feedback	satisfaction
	Continuous Improvement	# iterations

Key Performance Indicators: specific items list

Research and Publications
- Number of Published Papers
- Impact Factor of Journals
- Citations
- Conference Presentations
Collaboration and Engagement
- Interdisciplinary Projects
- External Collaborations
- Collaborative Publications
- Workshops and Seminars
Funding and Grants
- Research Grants Received
- Grant Applications Submitted
- Grant Success Rate
Data and Tools
- Datasets Published
- Software Tools Developed
- Tool Adoption
Training and Development
- Students Supervised
- Training Programs
- Skill Development
Impact and Outreach
- Societal Impact
- Media Mentions
- Public Engagement
Operational Efficiency
- Project Completion Rate
- Data Management Practices
- Resource Utilization
Innovation and Excellence
- Awards and Recognitions
- Innovative Solutions
Feedback and Improvement
- Stakeholder Feedback
- Continuous Improvement

…