Applied Distributed Systems

Spring 2022

COURSE

CSCI-B 649, Topics in Systems,
Computer Science, Luddy School of Informatics, Computing and Engineering, Indiana University

INSTRUCTORS

Suresh Marru and Marlon Pierce

CLASS SCHEDULE

Tuesdays and Thursdays from 3:00 pm to 4:15 pm in Cedar Hall-Union Street Center AC C002

OFFICE HOURS

Office Hours To Be Scheduled

Instructors

The course will be taught by Suresh Marru and Marlon Pierce, who lead the Pervasive Technology Institute’s Cyberinfrastructure Integration Research Center. Both are nominated members of the Apache Software Foundation and project management committee members for Apache Airavata, an open source distributed computing software framework.

Course Overview

Distributed software systems use software components operating on multiple, coordinated computing resources to handle large amounts of data, provide resilience by removing single points of failure, and achieve better performance than single-component systems. Such systems form the backbone of enormously scalable cloud-based software systems that power social networking sites, e-commerce, streaming media services, and many other modern businesses; however, many of the core concepts for distributed systems that underlie the modern services go back decades and need to be understood in order to build new systems.

Inherently, distributed systems face challenges that can be categorized as follows:

Scalability: How well does the system scale as it adds more resources? What are the overheads for management and coordination as the system grows?
Efficiency: The system handles a large amount of data, so performance is important.
Fault tolerance: Can the system continue to operate if some of the components fail? Can the system recover full capacity when resources come back online?
Communications: How do the components of the system communicate? How well does the system handle message latency and loss?
Heterogeneity: How can a system be built out of components developed using multiple programming languages, supporting components (such as databases), etc.?
Integration, Deployment, and Operation: How can multi-component systems developed by multiple teams be integrated, tested, deployed into production, and operated at scale?
User Environments: How can end-user environments for dynamic distributed systems be developed at scale? How can they efficiently evolve as the underlying system evolves?
Security: How can these systems be operated securely? How can security problems be detected?

As an applied course, students will get an opportunity to work with concrete instances of distributed systems. Students will learn from our experience developing science gateways, which are distributed computing environments that enable scientists to conduct computational experiments on computing clouds and supercomputers. Science gateways have revolutionized bioinformatics, computational chemistry, nano-engineering, atmospheric science, and other scientific fields by bringing unprecedented computing power to a broad community of scientists. The architecture, implementation, and operations of science gateways are interesting topics in their own right. Modern gateway systems use microservice architectures, DevOps principles, and user-centered design, adopting lessons learned from cloud-based Software as a Service activities.

In this course, students will be divided into development teams, and each team will build a distributed “software as a service” system from scratch. Teams will be encouraged to explore alternative technologies and approaches for building systems and to learn Cloud Native principles such as containerization, continuous integration, and continuous deployment for deploying robust cloud services. Students will also be introduced to the Apache Software Foundation’s open community governance principles for open source software and will learn how to interact effectively with Apache Software Foundation projects in order to become committers and project management committee members. Finally, students will have an opportunity to apply what they have learned to the Apache Airavata distributed systems framework.

Course Objectives

Provide a high-level, broad understanding of core distributed computing systems concepts and apply them to build “Software as a Service” systems.
Study both abstract concepts and practical techniques for building Cloud-Native Distributed Systems.
Provide hands-on experience in developing scalable application stacks while working with open source philosophies modeled after the Apache Software Foundation.
Apply the general concepts of Distributed Systems to understanding the state of the art in “real world” systems.

Course Outcomes

Demonstrate an applied understanding of cloud-native, microservice architectures, and their underlying distributed systems foundations.
Demonstrate an applied understanding of the DevOps principles of continuous integration and delivery as applied to the development and operations of science gateways.
Demonstrate an understanding of open source practices, particularly those of the Apache Software Foundation.
Demonstrate an ability to develop a metadata management system for managing the digital objects created by the system.
Demonstrate an ability to develop and consume API services.
Demonstrate an ability to apply discovery, load balancing, failure recovery, metrics, and monitoring to a distributed system.
Demonstrate an ability to perform scalability testing, canary rollouts, rate limiting, access control, and end-to-end authentication.

Course Structure

Course Goal: Students working in teams of three will learn and apply modern distributed computing concepts.

Projects
- Projects due every four weeks
- All projects are individually submitted; use GitHub to provide an auditable record of your contributions to your team’s work.

Project Themes for Spring 2022

For the Spring 2022 project theme, we derive a generalized version of data gateway infrastructures that serve the data management and research needs of scientific instruments such as electron microscopes, light-sheet microscopes, and next-generation sequencers. Data gateways provide secure, controlled access to data generated by these and many other scientific instruments. To keep the students focused on the Distributed Systems aspects, we use a photo-sharing application as a simpler analogy.

The students will develop a user interface to upload photos to an archive on remote storage servers. The projects will need to develop pipelines for extracting additional picture metadata using open-source image parsing libraries. The application should enable browsing photos organized into collections and searching them by metadata. An example might be to display all photos where flash was used. Students can also integrate image recognition machine learning algorithms, although the emphasis will be on building distributed systems that exhibit fault tolerance, scalability, good engineering and operations practices, etc.
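As a rough illustration of the metadata search described above, the sketch below filters already-extracted photo records for those where the flash fired. It assumes the extraction pipeline stores EXIF values in a dictionary keyed by numeric tag; the record layout, file names, and function names are hypothetical and are not part of any course-provided component. The only external fact relied on is that the EXIF Flash tag is 0x9209 and that its lowest bit is set when the flash fired.

```python
# EXIF tag 0x9209 (Flash): bit 0 is set when the flash fired.
FLASH_TAG = 0x9209


def flash_fired(exif: dict) -> bool:
    """Return True if the stored EXIF Flash value says the flash fired."""
    value = exif.get(FLASH_TAG)
    return value is not None and bool(value & 0x1)


def search(photos: list, predicate) -> list:
    """Return the names of photos whose metadata satisfies the predicate."""
    return [photo["name"] for photo in photos if predicate(photo["exif"])]


# Hypothetical records, shaped as a metadata pipeline might store them.
PHOTOS = [
    {"name": "lab.jpg", "collection": "microscopy", "exif": {FLASH_TAG: 0x01}},
    {"name": "field.jpg", "collection": "outdoor", "exif": {FLASH_TAG: 0x10}},
    {"name": "scan.png", "collection": "microscopy", "exif": {}},
]

flash_photos = search(PHOTOS, flash_fired)
print(flash_photos)
```

In a real project the dictionaries would be produced by whichever image parsing library the team chooses and stored in a searchable metadata service rather than an in-memory list; passing the predicate as a function keeps the search step independent of any single metadata field.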

The projects will need to exercise the distributed systems concepts discussed above and use Micro-services, Micro-Frontends, and Cloud-Native Architecture principles. The end system, developed in three incremental milestones, should be highly available and highly scalable, and its architecture should be demonstrably evolvable over time. Projects will be required to use Security (Custos Website) and Data Management components from the Apache Airavata ecosystem and to support accessible storage systems such as https://kb.iu.edu/d/aczn#research, https://uits.iu.edu/google, and https://uits.iu.edu/onedrive using Apache Airavata Managed File Transfer services (MFT Paper).

Grading

This will be a project-heavy course. Students will be divided into teams. Each team will complete a semester-long project divided into 3 project milestones. There will also be a midterm and a final presentation. The maximum number of points for the semester is 100. 90-100 points is an A, 80-89 points is a B, etc.

Course Projects 90%: There will be 3 project milestones, each worth 30 points.
- Must use Apache compatible open source licensed software and tools.
- Projects must be checked into GitHub and must be reproducibly executable on the deadline day by the TAs and instructors.
  - Linux/Unix compatible
  - If the instructors cannot execute your project and verify you have met the success criteria, the team receives 0 points.
  - A team may resubmit their assignment at any time before the next milestone. Each resubmission gets -1 points; i.e., 9 points if you get it right on the second try, 8 points on the third try, etc.
- Students who show no activity (no GitHub commits, no email discussions, etc.) for the milestone will receive 0 points.
- Up to 5 bonus points per project for GitHub interactions with other projects (not your team’s project). These have to be substantially demonstrated. Examples include
  - Posting bugs that get resolved
  - Resolving bugs in other teams’ projects. These must be accepted to the code base. Trivial issues don’t get rewarded.
Mid-term and final presentations: 10%. Pre-recorded project demonstrations in which each of the three teammates takes a turn presenting, covering specific topics related to the progress of project milestones.
Classroom Interactions and Peer Reviews: (Bonus) 10%. The projects and topics will require interactive, proactive participation. Mimicking real-world open source and software development practices, the course also requires students to be aware of other approaches to problems, to borrow ideas (with proper acknowledgment and without plagiarizing), and to peer review and offer constructive feedback. These demonstrated interactions (on GitHub issues and pull requests) will be worth 10 points.
- We expect students to attend all classes and to actively participate in the class by asking questions. The instructors will factor this into the final grades for individuals.
Project Grading
- Each project will be judged on ~4 quality attributes. The number will vary by assignment.
- To get all points, the project must demonstrate all attributes to the grader. The grader must also be able to easily install and test all software by following documentation for the milestone in the team’s GitHub Wiki.
- Each student on the team will submit a report describing what they did:
  - Give a percentage of effort for each attribute
  - Provide auditable proof via links to issues and commits.

Resources

During the course, instructors will provide references to journal and conference papers. A good understanding of the concepts discussed in these referenced papers will greatly help in absorbing the course material.

Beginner Materials

Principles of Distributed Computing – Lecture Collection
Overview of Apache Airavata
Apache Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center (Apache Mesos Paper)
Apache ZooKeeper: Wait-free coordination for Internet-scale systems (Apache Zookeeper Paper)

Open Collaboration

Reuse and building upon ideas or code are major parts of modern software development. As a professional programmer you will never write anything from scratch. This class is structured such that all solutions are public. You are encouraged to learn from the work of your peers. We won’t hunt down people who are simply copying-and-pasting solutions, because without challenging themselves, they are simply wasting their time and money taking this class.
Please respect the terms of use and/or license of any code you find, and if you reimplement or duplicate a design or code from elsewhere, credit the original source.