XDASI: Exploratory Data Analysis and Statistical Inference
NYU Biology (BIOL-GA2030), Fall 2021
Table of contents
- Welcome!
- Course Description
- Instructors
- Meetings
- Grading
- Course Websites and Slack
- Textbook and Reference Materials
- Academic Integrity
- Religious Observance
- Diversity and Inclusion
- Accommodations for Students with Disabilities
Welcome!
We are very happy to welcome you to our biostatistics course, BIOL-GA2030, which we informally call “Exploratory Data Analysis and Statistical Inference (XDASI)”.
Course Description
This course will provide introductory theory and hands-on training in exploratory data analysis and statistics for graduate students in biology. Students will learn basic R programming as well as foundational concepts and practical tools that provide a starting point for further advanced study in genomics, bioinformatics and computational biology.
The course will cover both classical and modern statistical methods, including frequentist, Bayesian, and resampling methods, as well as exploratory data analysis (descriptive statistics and dimensionality reduction with clustering, PCA, and t-SNE). Applications to the analysis of laboratory data will include problems commonly encountered in bioinformatics, genomics, molecular biology, and systems biology. In-class exercises and problem sets will draw from data generated in our own department, public-domain websites, and simulations. Data sets will be analyzed in the context of the hypotheses underlying the experiments in which they were generated. The role of simulation techniques for testing statistical methods applied to real data will be emphasized throughout the course.
Several modern statistical methods fall under the broad umbrella term “resampling statistics.” Resampling methods date back to the 1930s, but they did not become practical until fast processors and inexpensive memory became widely available in personal computers. They are computer-intensive but no longer time-consuming. Resampling methods apply randomization, permutation, and Monte Carlo techniques to the original experimental data (the sample). They require no statistical tables and do not rely on parametric assumptions about the distributions of the underlying random variables. Resampling methods are intuitively satisfying and easy to understand. They are commonly used in bioinformatics, genomics, and systems biology, but they are not covered in many classical biostatistics textbooks.
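To make the idea of resampling concrete, here is a minimal sketch of a two-sided permutation test for a difference in group means. It is not taken from the course materials: the function name `permutation_test`, its parameters, and the measurements are hypothetical, and Python is used purely for illustration (the course itself uses R).

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Hypothetical helper for illustration; returns the observed
    difference (mean of a minus mean of b) and an estimated p-value.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    extreme = 0
    for _ in range(n_permutations):
        # Shuffle the pooled data and re-split it into two groups of the
        # original sizes, as if the group labels had been assigned at random.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1

    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Made-up measurements, for illustration only.
control = [4.1, 3.8, 4.4, 4.0, 3.9, 4.2]
treated = [4.9, 5.1, 4.6, 5.3, 4.8, 5.0]

observed, p_value = permutation_test(control, treated)
print(f"observed difference = {observed:.3f}, estimated p = {p_value:.4f}")
```

The p-value is estimated as the fraction of random relabelings that produce a difference at least as large as the observed one, so no statistical table or distributional formula is needed.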
The course assumes no previous background in statistics or programming. Students will receive hands-on training in R, a free, open-source statistical programming language, using the RStudio environment. R is widely used in bioinformatics, genomics, and systems biology. Like Matlab, it is an interactive environment for numerical computing, and it is platform-independent: it runs on Unix, Linux, Windows, and macOS. R is designed specifically for statistical analysis, data manipulation and visualization, and the generation of publication-quality figures.
Instructors
Professor - Kristin Gunsalus (kcg1-at-nyu.edu)
Co-Instructor - Bogdan Sieriebriennikov (bs167-at-nyu.edu)
Meetings
- One 120-minute lecture/lab (Mondays 9:30am-11:30am)
- One 120-minute lecture/lab (Thursdays 9:30am-11:30am)
- One 75-minute recitation (Fridays 2:00pm-3:15pm)
All classes will take place in Room 805, CGSB (12 Waverly Place). Office hours may be requested by appointment.
Grading
Breakdown
Class Participation: 10%
- Classes will include presentations, tutorials, group discussion, and in-class coding exercises.
- You will be asked to share thoughts, explain ideas, and share coding solutions with other students during class.
- You will sometimes be asked to work together with one or two other students to discuss concepts or work on an R exercise or coding challenge.
Quizzes: 15%
- Classes will begin with a 5-minute quiz designed to gauge your knowledge of basic statistical concepts. Quizzes will consist of 5 random questions from a set of study questions for each chapter in the textbook, which will be provided in advance.
Homework: 50%
- Weekly homework assignments involving R coding will be posted on Brightspace.
- A total of ten homework assignments will contribute toward the final grade for the course (the lowest scores will be dropped).
- Assignments will be due Sundays at 11:55pm EST/EDT. You are expected to complete all assigned homework.
- Students will have an opportunity to resubmit revised versions of their solutions through Tuesday at 1:00pm EST/EDT.
- Resubmissions will receive a 10% penalty if submitted by Monday at 11:55pm EST/EDT, and an additional 5% penalty if handed in after that.
- No homework submissions will be accepted later than Tuesday at 1:00pm EST/EDT.
Final In-class Exam: 10%
- An in-class written exam on major statistical concepts and theory will be given on the last day of class.
Final Take-Home Exam / Project: 15%
- An R coding project will be assigned that will challenge you to apply statistical techniques from throughout the semester to modern biological data. This will take the form of either an extended homework assignment or a final project of your choice. A final project will require prior approval from the instructor and the TA.
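As a rough illustration of how the percentages in the breakdown above could combine, the sketch below computes a weighted average of component scores. The syllabus does not spell out the exact calculation, so the simple weighted average, the `course_grade` helper, the component names, and the example scores are all assumptions. Python is used purely for illustration (the course itself uses R).

```python
# Weights taken from the grading breakdown; the key names are hypothetical.
WEIGHTS = {
    "participation": 0.10,
    "quizzes": 0.15,
    "homework": 0.50,
    "final_in_class_exam": 0.10,
    "final_take_home": 0.15,
}


def course_grade(scores):
    """Weighted average of component scores, each on a 0-100 scale.

    Assumes a plain weighted average; the actual grading procedure
    may differ.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Made-up component scores, for illustration only.
example_scores = {
    "participation": 100,
    "quizzes": 80,
    "homework": 90,
    "final_in_class_exam": 70,
    "final_take_home": 85,
}

print(f"course grade = {course_grade(example_scores):.2f}")
```

Because homework carries half of the total weight, a change in the homework average moves the overall grade five times as much as the same change in participation.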
Course Websites and Slack
Course Website
The course syllabus, reading assignments, class notes, in-class exercises, tutorials, and links to useful resources will be posted here.
NYU Brightspace
Quizzes, homework assignments, and class recordings will be posted here.
Course Slack Workspace
To facilitate communication, we have set up a Slack workspace for the course. The instructors will monitor the workspace regularly, so this is the place to post questions and share all things related to the course (thoughts, coding tricks, problems, etc.).
Textbook and Reference Materials
Statistics
- The textbook for the course is The Analysis of Biological Data by Whitlock and Schluter, 3rd edition. You may either purchase a hard copy or purchase / rent an e-copy from Amazon.
- Additional assigned reading will draw from a variety of reference books and other informational resources.
R Coding
DataCamp
We provide free access to a customized DataCamp for Education site for the duration of the course. Students with little or no experience in R programming are encouraged to complete the first series of online tutorials prior to the first class in order to familiarize themselves with R programming concepts, syntax, operators, and data structures.
Online Resources
There are many online resources for learning R. Some of these are listed on this website under R Resources. Students will be pointed to relevant references throughout the semester.
Safari Books Online
All NYU students have free access to electronic versions of a large number of O’Reilly technical programming manuals (R, Python, etc.) through NYU Libraries. You may access them here (requires NYU two-factor authentication; you will be redirected automatically).
- A good introductory text for R programming that is available here is R for Everyone: Advanced Analytics and Graphics, by Jared Lander.
ProQuest Bookshelf
NYU also provides a subscription to ProQuest, which gives free access to many additional reference books online. You can access it here. Note that you will need to either be on campus or connected to the NYU network through a proxy server (Cisco AnyConnect).
Please let us know if any of the above links do not work!
Academic Integrity
Students are expected to know and understand the policies on academic integrity, including University and CAS policies.
The instructors of this course will not tolerate cheating or plagiarism. When academic dishonesty is suspected, it will be dealt with seriously in adherence to these policies.
If a student is caught cheating or plagiarizing, the instructors may, at their discretion, give the student an academic sanction. Such a sanction may include a reduction of the grade on that assignment or exam (possibly to 0) or even a reduction of the final course grade (in consultation with the Director of Graduate Studies, who may meet with the instructors and the student to discuss the nature of the offense).
Depending on the severity of the infraction, the sanction could mean failure of the student in the course. The student may appeal any grade reduction to the Director of Graduate Studies. The departmental decision is final. In addition, any substantial case brought by an instructor to the Director of Graduate Studies must be referred to the Dean’s office for possible disciplinary action.
If you have any questions or uncertainties about these policies, please consult the instructor, Director of Graduate Studies, or Dean’s office.
Religious Observance
As a nonsectarian, inclusive institution, NYU permits members of any religious group to absent themselves from classes without penalty when required for compliance with their religious obligations. The policy and principles to be followed by students and faculty may be found here: The University Calendar Policy on Religious Holidays.
Diversity and Inclusion
The instructors of this course share NYU’s commitment to “building a culture that respects and embraces diversity, inclusion, and equity”. We aim to create a learning environment in which every student feels included, supported, and respected. We will hold students (and ourselves) to the CAS Honor Code’s pledge to “behave with decorum and civility, and with respectful regard” for others.
Accommodations for Students with Disabilities
Academic accommodations are available to any student with a chronic, psychological, visual, mobility, or learning disability, or who is deaf or hard of hearing. Students should register with the Moses Center for Students with Disabilities as early as possible in the semester.
NYU’s Henry and Lucy Moses Center for Students with Disabilities
726 Broadway, 2nd Floor
New York, NY 10003-6675
Telephone: 212-998-4980