Data Education

The primary goal of the SMDA Data Education working group is to develop new introductory coursework for scientists and clinicians with no training in statistics or computational methods. This coursework is tailored to specific domains, building on trainees' undergraduate training and on the core datasets that underpin each discipline. Grounding the material in familiar, domain-specific data makes statistical concepts more intuitive and therefore easier to master.

Our approach follows a four-step procedure that we believe generalizes to any discipline-specific data-science course:

  1. Map the datasets that are relevant to trainees.
  2. Map the software tools that enable trainees to answer standard questions with the datasets from step 1.
  3. Reinforce basic statistics training using the datasets from step 1.
  4. Teach basic coding skills using the datasets from step 1.

 

Mapping Datasets and Tools

Information overload is one of the most significant challenges facing scientists and clinicians in our new “big data world.” Whether or not they have computational training, most investigators are unaware of the full breadth of data relevant to their discipline. Knowing which datasets exist (and which do not) is a critical prerequisite for:

  • Finding data relevant to a question
  • Applying tools to answer that question

Most large-scale biomedical databases have (1) internal rules governing how their data are organized and (2) graphical user interfaces on their websites that let a general audience use the data to answer commonly asked questions. Each database is like a different country with its own laws and guidebooks. A key goal of the SMDA Data Education core is therefore to build “world maps” that outline the major biomedical datasets and to provide tutorials on their existing software tools. We estimate that roughly 70% of standard biomedical data analytics can be accomplished without any computational or coding training; it requires only awareness of the available datasets and of the software tools that let you ask your own research questions. Increasing this awareness is a central aim of SMDA.
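One way to picture a “world map” is as a small lookup table that links each dataset to the kinds of data it holds and to the existing web tools that answer common questions about it. The sketch below is a minimal illustration under that assumption; the `DatasetEntry` structure, the `find_resources` helper, and the two example entries are hypothetical and do not describe an existing SMDA resource.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetEntry:
    """One 'country' on the data map: a dataset plus the tools that work with it."""
    name: str
    data_types: set[str]
    tools: list[str] = field(default_factory=list)


# Illustrative entries only (hypothetical map); a real map would be curated by domain experts.
DATA_MAP = [
    DatasetEntry("TCGA", {"gene expression", "mutation", "clinical"}, ["cBioPortal"]),
    DatasetEntry("GEO", {"gene expression"}, ["GEO2R"]),
]


def find_resources(data_type: str) -> dict[str, list[str]]:
    """Return {dataset name: tools} for every mapped dataset holding the requested data type."""
    return {
        entry.name: entry.tools
        for entry in DATA_MAP
        if data_type in entry.data_types
    }


if __name__ == "__main__":
    print(find_resources("gene expression"))  # {'TCGA': ['cBioPortal'], 'GEO': ['GEO2R']}
```

Keeping the map as plain data rather than code means that adding a new dataset or tool is a one-line edit, which suits a resource maintained by non-programmers.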

Equipping Researchers with Data Science Tools

While most standard questions can be answered with existing software tools, approximately 30% of biomedical data questions require more advanced statistical or computational methods that are beyond the reach of most scientists and clinicians. This creates a growing need for better statistical and computational training for these professionals. Fortunately, advanced training in these areas is rarely necessary; a basic foundation in common statistical tests and in programming for data analytics is usually sufficient. Often, this means revisiting and reinforcing statistical concepts from undergraduate coursework in the context of relevant biomedical datasets (e.g., knowing when to use a t-test). Basic coding skills are equally important for applying these statistical methods at scale (e.g., running 20,000 t-tests in one minute).
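To make the scaling point concrete, the sketch below computes Welch's two-sample t statistic (which does not assume equal variances) for 20,000 simulated features, for example genes, using only the Python standard library. The data are random numbers generated for illustration, the function names are hypothetical, and the code stops at the t statistic and the Welch-Satterthwaite degrees of freedom. In practice, `scipy.stats.ttest_ind(a, b, axis=1, equal_var=False)` returns the same statistics together with p-values in one vectorized call.

```python
import math
import random
import statistics


def welch_t(a: list[float], b: list[float]) -> tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom (each group needs >= 2 values)."""
    n_a, n_b = len(a), len(b)
    # Sample variances (n - 1 denominator), each divided by its group size.
    va = statistics.variance(a) / n_a
    vb = statistics.variance(b) / n_b
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (n_a - 1) + vb ** 2 / (n_b - 1))
    return t, df


def t_tests_per_feature(group_a, group_b):
    """Run one Welch t-test per row; row i of each group holds feature i's measurements."""
    return [welch_t(row_a, row_b) for row_a, row_b in zip(group_a, group_b)]


if __name__ == "__main__":
    random.seed(0)
    n_features, n_per_group = 20_000, 10
    # Simulated data: group B is shifted up by 1.0 in the first 100 features only.
    group_a = [[random.gauss(0.0, 1.0) for _ in range(n_per_group)] for _ in range(n_features)]
    group_b = [
        [random.gauss(1.0 if i < 100 else 0.0, 1.0) for _ in range(n_per_group)]
        for i in range(n_features)
    ]
    results = t_tests_per_feature(group_a, group_b)
    print(len(results), "tests run")  # prints: 20000 tests run
```

The loop is the whole point: once a single test is written as a function, applying it to every feature in a dataset takes one extra line.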

 

 


Tactical Priorities

  • Train Scientists and Clinicians:
    • available datasets
    • available data-tools
    • basic statistics
    • programming for data analytics
    • machine learning
  • Build Collaborations to improve biomedical data-science education:
    • educational researchers
    • data scientists
    • clinicians
    • scientists
  • Build Data and Tool Repositories:
    • Graphical maps outlining available datasets
    • Lists of the most important software tools
    • Cleaned, standardized datasets
    • Code for standard operating procedures (SOPs) in commonly used languages (R, Python, etc.)


Current Members

  • Eugene Douglass
  • Fred Maier
  • Kimberly Van Orman
  • Russ Palmer
  • Katie Smith
  • Anthony Roberto

Discipline-specific Data-science Pages:

  • -Omic Datasets in Cancer Research
    • Data-Maps
    • Software Tools
    • Cleaned Datasets
    • Clean Code for SOPs

 


Key Performance Indicators: summary statistics

 

KPI Category                  KPI                           Current Value (unit)
Research and Publications     Number of Published Papers
                              Impact Factor of Journals
                              Citations
                              Conference Presentations
Collaboration and Engagement  Interdisciplinary Projects
                              External Collaborations
                              Collaborative Publications    %
                              Workshops and Seminars
Funding and Grants            Research Grants Received      $
                              Grant Applications Submitted
                              Grant Success Rate            %
Data and Tools                Datasets Published
                              Software Tools Developed
                              Tool Adoption                 downloads
Training and Development      Students Supervised
                              Training Programs
                              Skill Development             certificates
Impact and Outreach           Societal Impact
                              Media Mentions
                              Public Engagement             events
Operational Efficiency        Project Completion Rate
                              Data Management Practices     % compliance
                              Resource Utilization          % efficiency
Innovation and Excellence     Awards and Recognitions
                              Innovative Solutions          breakthroughs
Feedback and Improvement      Stakeholder Feedback          satisfaction
                              Continuous Improvement        # iterations
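Several KPIs in the table are ratios reported as percentages. The sketch below shows the arithmetic under assumed definitions: grant success rate as grants awarded divided by applications submitted, and collaborative publications as collaborative papers divided by all published papers. The function names and example counts are hypothetical, not SMDA data.

```python
from typing import Optional


def percentage(part: int, whole: int) -> Optional[float]:
    """Return part / whole as a percentage, or None when the denominator is zero."""
    if whole == 0:
        return None
    return 100.0 * part / whole


def grant_success_rate(grants_awarded: int, applications_submitted: int) -> Optional[float]:
    """Assumed definition: awarded grants as a share of applications submitted."""
    return percentage(grants_awarded, applications_submitted)


def collaborative_publication_share(collaborative_papers: int, published_papers: int) -> Optional[float]:
    """Assumed definition: collaborative papers as a share of all published papers."""
    return percentage(collaborative_papers, published_papers)


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    print(grant_success_rate(3, 12))               # 25.0
    print(collaborative_publication_share(4, 10))  # 40.0
```

Returning `None` for a zero denominator keeps an empty reporting period from being shown as a misleading 0%.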

 

 

 


Key Performance Indicators: specific items list


  • Research and Publications
    • Number of Published Papers
    • Impact Factor of Journals
    • Citations
    • Conference Presentations
  • Collaboration and Engagement
    • Interdisciplinary Projects
    • External Collaborations
    • Collaborative Publications
    • Workshops and Seminars
  • Funding and Grants
    • Research Grants Received
    • Grant Applications Submitted
    • Grant Success Rate
  • Data and Tools
    • Datasets Published
    • Software Tools Developed
    • Tool Adoption
  • Training and Development
    • Students Supervised
    • Training Programs
    • Skill Development
  • Impact and Outreach
    • Societal Impact
    • Media Mentions
    • Public Engagement
  • Operational Efficiency
    • Project Completion Rate
    • Data Management Practices
    • Resource Utilization
  • Innovation and Excellence
    • Awards and Recognitions
    • Innovative Solutions
  • Feedback and Improvement
    • Stakeholder Feedback
    • Continuous Improvement