A practical introduction to statistical programming focusing on the R programming language. Students will engage with the programming challenges inherent in the various stages of modern statistical analyses including everything from data collection/aggregation/cleaning to visualization and exploratory analysis to statistical model building and evaluation. This course places an emphasis on modern approaches / best practices for programming including: source control, collaborative coding, literate and reproducible programming, and distributed and multicore computing.
- Tuesday, 1:25PM – 2:40PM, French Science 2231
- Thursday, 1:25PM – 2:40PM, French Science 2231
- Monday, 4:40PM – 5:55PM, Old Chemistry 116
- Monday, August 26 - Fall semester classes begin; Drop/Add continues
- Monday, September 2 - Labor Day. Classes in session
- Friday, September 6 - Drop/Add ends
- Monday, October 7 - No class; Fall break
- Tuesday, October 8 - No class; Fall break
- Tuesday, November 26 - Graduate classes end
This class is about you doing as opposed to you watching or listening. Lectures and labs will be interactive and hands-on for us all. My role as instructor is to introduce you to new tools and techniques, but it is up to you to take them and make use of them. Most topics will include supplemental resources for you to delve deeper. Occasionally, there will be pre-class readings in order to enrich our lecture and lab experience.
Attendance will not be taken during lecture or lab, but you are expected to attend all sessions and meaningfully contribute to in-class exercises and homework assignments.
This course will involve a lot of group work. Functional and diverse teams will be constructed based on the first-day class survey; these teams will not change throughout the semester (barring extraordinary circumstances). You will work in these teams during class and on the homework assignments.
Homework will be assigned regularly throughout the semester. Some assignments will be done individually and some will be done in groups. For team-based assignments, all team members are expected to contribute equally to the completion of each assignment. It is also imperative that each team member has read, run, and understood all code in the final submission. An intragroup peer evaluation will be conducted after each assignment.
Students are expected to make use of the provided git repository on the course’s GitHub page as their central collaborative platform. Commits to this repository will be used as one metric of each team member’s relative contribution for each homework.
There will be two take-home exams that are to be completed individually. Details on what is and what is not permitted for each exam will be provided.
There will be a final project that is to be completed in a team of your choice. Details of the final project will be provided as the course progresses. It will include a written reproducible report with your code.
Your final grade will be computed based on the following weights.
- Homework: 45%
- Exam 1: 20%
- Exam 2: 20%
- Final Project: 15%
The exact ranges for letter grades may be curved and cutoffs will be determined at the end of the semester. The more evidence there is that the class has mastered the material, the more generous the curve will be.
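As a rough illustration of how the weights above combine, the sketch below computes a weighted score in the shell. The four component scores are made-up values on a 0-100 scale, and the result is only the raw weighted average before any curve or letter-grade cutoff is applied.

```shell
# Hypothetical component scores (0-100 scale); replace with your own.
homework=90
exam1=80
exam2=85
project=95

# Weights from the syllabus: 45% homework, 20% per exam, 15% final project.
final="$(awk -v h="$homework" -v e1="$exam1" -v e2="$exam2" -v p="$project" \
  'BEGIN { printf "%.2f", 0.45*h + 0.20*e1 + 0.20*e2 + 0.15*p }')"

echo "Weighted score: $final"
```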
There are no required textbooks for this course. The following textbooks are recommended, both for this course and for your future self.
- Advanced R
Wickham, H. (2019). Chapman and Hall/CRC.
- R for Data Science
Wickham, H., & Grolemund, G. (2017). O’Reilly.
- R packages
Wickham, H. (2015). O’Reilly.
R / RStudio
R is a programming language that is especially powerful for data exploration, visualization, and statistical analysis. To interact with R we will primarily be using RStudio, an integrated development environment (IDE), via a browser-based interface or on your own computer. You may use RStudio on the Department servers.
When you’re writing code, it is nice to have a text editor that is optimized for writing code. There is a huge variety of options out there; if you do not already have a preferred editor, try a few and see which one works best for you.
- Vim / Emacs - old-school Unix console-based editors; they have a steep learning curve but are incredibly powerful
- Nano - another Unix console editor; it has an easier learning curve but much less power
- Sublime Text - cross-platform GUI text editor with a robust plug-in ecosystem
Git is a state-of-the-art version control system. It lets you track who made changes to what and when, and it has options for easily updating a shared or public version of your code on GitHub.
- OSX - install Git for Mac by downloading and running the installer, or install Homebrew and use it to install git via brew install git.
- Unix / Linux - you should be able to install git via your preferred package manager (if it is not already installed).
- Windows - install Git for Windows by downloading and running the Git for Windows installer. This will provide you with git, the bash shell, and ssh on Windows.
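Once git is installed, a minimal sketch like the following shows how it records who changed what and when. It runs in a throwaway directory; the file name, commit messages, and the user name and email are placeholders chosen for illustration, not course requirements.

```shell
# Work in a temporary directory so no real project is touched.
repo="$(mktemp -d)"
cd "$repo"
git init -q

# Identity for this demo repository only (placeholder values).
git config user.name "Example Student"
git config user.email "student@example.com"

# First change: create a small R script and commit it.
echo 'x <- c(1, 2, 3)' > analysis.R
git add analysis.R
git commit -q -m "Add first analysis script"

# Second change: extend the script and commit again.
echo 'mean(x)' >> analysis.R
git commit -q -am "Compute the mean"

# Who changed what, and when: author, date, and message per commit.
git log --format='%an | %ad | %s' --date=short

# Number of commits per author on the current branch.
git shortlog -s -n HEAD
```

Sharing these commits on GitHub additionally requires a remote repository and a push, which this sketch leaves out.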
Unix shell(s) / ssh
We will be doing much of the work in the class on remote Linux systems. We will primarily be interacting with these machines through a remote terminal and a shell. Using a shell gives you more power to do more tasks more efficiently with your computer.
- OSX / Unix / Linux - these tools should already be installed and you should be able to access your shell through the Terminal application (name may vary slightly depending on your OS).
- Windows - there are several ways to install bash or a bash-like shell; the preferred method is to install the Git for Windows package as detailed above.
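To give a flavor of what working in a shell looks like, here is a small sketch. The ssh line is shown only as a comment because the user name and host are placeholders; the rest runs locally on made-up files in a temporary directory.

```shell
# Connecting to a remote machine generally looks like this
# (placeholder user and host, not run here):
#   ssh netid@server.example.edu

# Set up a scratch directory with a few example files.
workdir="$(mktemp -d)"
cd "$workdir"
printf 'id,group\n1,a\n2,b\n3,a\n' > data.csv
touch clean.R model.R notes.txt

# Count the R scripts in this directory.
n_scripts="$(find . -name '*.R' | wc -l | tr -d ' ')"
echo "R scripts: $n_scripts"

# Tally rows per group: skip the header, keep column 2, sort, count.
tail -n +2 data.csv | cut -d, -f2 | sort | uniq -c
```

Each tool does one small job, and the pipe (|) passes the output of one command to the next, which is where much of the efficiency of the shell comes from.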
We will be using Slack to facilitate communication and group work. Slack provides a single location for messaging, tools, and files - allowing for efficient collaboration.
Duke University is a community dedicated to scholarship, leadership, and service and to the principles of honesty, fairness, respect, and accountability. Citizens of this community commit to reflect upon and uphold these principles in all academic and non-academic endeavors, and to protect and promote a culture of integrity. Cheating on exams and quizzes, plagiarism on homework assignments and projects, lying about an illness or absence, and other forms of academic dishonesty are a breach of trust with classmates and faculty, violate the Duke Community Standard, and will not be tolerated. Such incidents will result in a grade of 0 for all parties involved and will be reported to the University Judicial Board. Additionally, there may be penalties to your final class grade. Please review Duke’s Standards of Conduct.
Students with disabilities who believe they may need accommodations in this class are encouraged to contact the Student Disability Access Office at (919) 668-1267 as soon as possible to better ensure that such accommodations can be made.
In an emergency, there are several ways that the University will contact you. Two are detailed below. Campus emergency procedures are described here: http://emergency.duke.edu
Text Messaging: An alert message may be sent to the mobile devices of Duke community members who register for a new text messaging system. Sign up for DukeALERT text messages or learn more about text messaging at Duke.
LiveSafe Mobile App: Notifications may be sent through the LiveSafe Mobile app to notify members of the Duke community of emergency situations. The free mobile app, available through the Apple App Store and Android App Store, offers real-time, two-way communication between Duke community members and the Duke University Police Department.
- Shawn Santo
- Wednesday 3:00 – 4:00pm, 207A Old Chemistry
- Friday 8:30 – 9:30am, 207A Old Chemistry