Course Syllabus
Table of contents
- Resources
- Reading
- Class participation and required readings
- Programming assignments
- Course projects
- Grading
- Late policy
- Academic Integrity
- Students with disabilities or learning needs
Resources
Go to Resource Tab.
Reading
There are no official textbooks. Required readings are most often seminal research papers, online documentation, and/or selected textbook chapters. Several books might be useful:
Operating Systems: Three Easy Pieces (OSTEP), by Remzi H. Arpaci-Dusseau and Andrea C. Arpaci-Dusseau, Aug, 2018 v 1.00 (free book).
Designing Data-Intensive Applications (1st Edition), by Martin Kleppmann (see the instructions below for accessing the free version via the UVA Library).
Distributed Systems, 3rd edition (2017), by Maarten van Steen and Andrew S. Tanenbaum (free book).
To access the O’Reilly textbook (Designing Data-Intensive Applications, 1st Edition, by Martin Kleppmann), do the following:
- Access the UVA Library website.
- Search the title of the textbook: Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems.
- Click on Library Catalog (Access Online), and sign in to the O’Reilly website with your UVA email address.
- If the O’Reilly link brings you to an audiobook, search for the textbook on the O’Reilly website and find the ebook.
Class participation and required readings
Class participation is required. We will discuss the design and use (application) of a variety of modern big data systems covered this semester. Most of these systems are described in research papers or, failing that, in online docs that present their original or evolved design. One of many great examples is Google’s MapReduce (and later its open-source implementation, Apache Hadoop), which opened a new era of what we call Big Data Systems today.
Specifically, the instructor (the professor or an invited guest speaker) will lead each lecture. In some lectures we will have moderate discussions about the papers/articles that everyone will have read before class. You are encouraged to participate in these discussions. To prepare, you need to complete the assigned reading, e.g., a research paper on a topic. One way to test your understanding is to fill out the required review form for that reading assignment. The review forms are worth 5% of your overall grade in total, so it matters that you 1) submit the review forms on time AND 2) participate in the discussion (which in fact requires you to do the required readings).
I also strongly encourage you to discuss the assigned/optional readings (papers, tech reports, online documentation) with other students in the class: you may have insights that others do not, and vice versa. Students often form reading groups, which I encourage; however, group discussion is not an effective substitute for actually reading the paper.
Programming assignments
We will have four programming assignments during the first half of the semester:
- Assignment 0: Using AWS Academy, EC2, and Linux shell.
- Assignment 1: Parallelizing Python processing with Dask.
- Assignment 2: A tour of Apache HDFS and Spark.
- Assignment 3: A deeper dive with Ray.
Course projects
Probably the most exciting part of this course is to complete an interesting project related to big data systems. I will provide you with a list of ideas around Week 4.
Grading
Your grade will be calculated as follows:
- 10% quizzes
- 5% participation (paper review forms)
- 5% assignment 0
- 10% assignment 1
- 10% assignment 2
- 10% assignment 3
- 10% midterm exam
- 10% project checkpoint 1 report
- 10% project checkpoint 2 report
- 20% final project report, presentation, artifact evaluation
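As a rough illustration of how these weights combine, here is a minimal Python sketch. The component names and the `overall_grade` helper are hypothetical labels chosen for this example, not part of any course tooling; the sketch assumes each component is scored as a percentage from 0 to 100 and that a missing component counts as zero.

```python
# Weights (in percent) copied from the grading list above.
# The dictionary keys are hypothetical names used only for this sketch.
WEIGHTS = {
    "quizzes": 10,
    "participation": 5,
    "assignment_0": 5,
    "assignment_1": 10,
    "assignment_2": 10,
    "assignment_3": 10,
    "midterm": 10,
    "checkpoint_1": 10,
    "checkpoint_2": 10,
    "final_project": 20,
}


def overall_grade(scores):
    """Return the weighted overall percentage.

    `scores` maps a component name to a percentage in [0, 100].
    Assumption: a component missing from `scores` counts as 0.
    """
    # Sanity check: the listed weights must add up to 100%.
    assert sum(WEIGHTS.values()) == 100
    weighted_sum = sum(WEIGHTS[name] * scores.get(name, 0.0) for name in WEIGHTS)
    return weighted_sum / 100
```

For example, `overall_grade({name: 100 for name in WEIGHTS})` returns `100.0`, and a student who has so far only earned 50% on the final project would be at `10.0` overall.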
Midterm exam
There will be a midterm exam scheduled around Week 8 (taken online).
Quizzes
There will be a short quiz due at the end of most Wednesdays. Make sure you know the rules regarding what is allowed and what is not.
Allowed
- However much time you need.
- Discussing answers with classmates who are taking the quiz at the same time.
- Referencing texts, notes, or provided course materials.
- Searching online for general information.
- Running code.
NOT allowed
- Taking it more than once.
- Discussing answers with anybody outside of the course.
- Discussing with classmates who have already completed the quiz when you haven’t completed it yourself yet.
- Posting anything online about the quizzes.
- Using quiz material posted online by other students who broke the preceding rule.
- Getting TA/instructor help on quiz questions prior to the quiz deadline.
Grading rules
The final grade is computed according to the following rules:
- A+: >= 98%; A: [93%, 98%); A-: [90%, 93%)
- B+: [87%, 90%); B: [83%, 87%); B-: [80%, 83%)
- C+: [77%, 80%); C: [73%, 77%); C-: [70%, 73%)
- D+: [67%, 70%); D: [63%, 67%); D-: [60%, 63%)
- F: < 60%
Team project grading
In cases where team members do not contribute equally to the project, we may assign different grades to different individuals, up to an extreme of deducting 50% of the team project grade for a student. We will evaluate each individual’s contribution on the basis of a variety of factors, including progress reports at project checkpoints, version control history, each student’s self-reflection, and each student’s peer evaluation during and/or at the end of the project. We will make regular efforts to collect and distribute this feedback throughout the project. Our ultimate goal is for all students to participate and receive full marks.
Late policy
Students must work individually on all assignments, including the programming assignments and projects. We encourage you to have high-level discussions with other students in the class about the assignments; however, when you turn in an assignment, it must be only your own work. That is, copying any part of another student’s assignment is strictly prohibited, and repercussions for doing so will be severe (up to and including failing the class outright). You are free to reuse small snippets of example code found on the Internet (e.g., via StackOverflow) provided that they are attributed. If you are concerned that reusing and attributing copied code may make it appear that you didn’t complete the assignment yourself, please raise a discussion with the instructor.
Your work is late if it is not turned in by the deadline.
- 10% will be deducted for late assignments turned in within 24 hours after the due date.
- Assignments submitted more than 24 hours late will receive a zero.
If you’re worried about being busy around the time of an assignment submission, please plan ahead and get started early. An assignment that does not compile or run will receive at most 50% credit.
For fairness to all, there are no exceptions to the above rules.
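The late rules above can be sketched as a small Python function. This is only an illustration under stated assumptions, not the official grading script: it assumes the 10% deduction means 10 percentage points of the assignment's total, that a submission exactly 24 hours late still falls in the first bucket, and that the 50% cap for work that does not compile or run is applied before the late deduction. The function and parameter names are hypothetical.

```python
from datetime import datetime, timedelta

LATE_WINDOW = timedelta(hours=24)  # late submissions accepted up to 24 hours
LATE_DEDUCTION = 10.0              # assumption: 10 percentage points
NOT_RUNNING_CAP = 50.0             # at most 50% credit if it does not compile or run


def adjusted_score(raw_score, due, submitted, runs=True):
    """Apply the late policy to a raw score in [0, 100].

    Assumption: the does-not-run cap is applied first, then the late deduction.
    """
    score = raw_score if runs else min(raw_score, NOT_RUNNING_CAP)
    lateness = submitted - due
    if lateness <= timedelta(0):
        return score                             # on time: no penalty
    if lateness <= LATE_WINDOW:
        return max(score - LATE_DEDUCTION, 0.0)  # within 24 hours: deduct 10
    return 0.0                                   # more than 24 hours late: zero
```

Under these assumptions, a raw score of 90 submitted one hour after the deadline becomes 80, and the same work submitted 25 hours late receives 0.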
Academic Integrity
The School relies upon and cherishes its community of trust. We firmly endorse, uphold, and embrace the University’s Honor principle that students will not lie, cheat, or steal, nor shall they tolerate those who do. We recognize that even one honor infraction can destroy an exemplary reputation that has taken years to build. Acting in a manner consistent with the principles of honor will benefit every member of the community both while enrolled in the School of Data Science and in the future. Students are expected to be familiar with the university honor code, including the section on academic fraud.
Citing ChatGPT (or other LLMs):
- Use of these tools is permitted, with proper citation.
- Include a “chats” directory containing screenshots or PDFs of every chat, in its entirety. Name the files chat1.png, chat2.png, etc. PDF and JPG formats are also permitted.
Students with disabilities or learning needs
It is my goal to create a learning experience that is as accessible as possible. If you anticipate any issues related to the format, materials, or requirements of this course, please meet with me outside of class so we can explore potential options. Students with disabilities may also wish to work with the Student Disability Access Center (SDAC) to discuss a range of options for removing barriers in this course, including official accommodations. Please visit their website for information on this process and to apply for services online. If you have already been approved for accommodations through SDAC, please send me your accommodation letter and meet with me so we can develop an implementation plan together.