Fall 2018 - Science Gateway Architectures

  • Course: CSCI-B 649, Topics in Systems, Computer Science, School of Informatics and Computing, Indiana University
  • Instructors: Marlon Pierce, marpierc@iu.edu; Suresh Marru, smarru@iu.edu

  • Class Schedule: Tuesdays and Thursdays, 4:00 pm to 5:15 pm, I2 (Informatics East), Room 130
  • Office Hours: On request

Course Overview

Science gateways are distributed computing environments that enable scientists to conduct computational experiments on computing clouds and supercomputers. They have revolutionized bioinformatics, computational chemistry, nano-engineering, atmospheric science, and other scientific fields by bringing unprecedented computing power to a broad community of scientists. Gateways are also interesting topics in their own right. Modern gateway systems use microservice architectures and DevOps principles in their design and operations, adopting lessons learned from cloud-based Software as a Service activities.

Distributed systems are designed to scale software so that it can handle large amounts of data and achieve better performance. Because such a system consists of multiple processes that may run on different hardware, it inherently faces challenges related to scaling. The main challenges in distributed systems can be categorized as follows:

  • Scalability
  • Efficiency: The system handles large amounts of data, so performance is important.
  • Fault tolerance: Multiple processes work together to solve a problem. If one process goes down, the system may no longer be able to solve the problem.
  • Operations: Easy scaling reduces operational complexity and cost.
  • Avoid over-engineering
  • Ability to work with multiple devices
  • Change management
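To make the fault tolerance item concrete, here is a minimal sketch of one common approach, failover across replicas. It is an illustration under stated assumptions rather than anything taken from the course materials or from Apache Airavata: the function and class names (call_with_failover, AllReplicasFailed, down_replica, healthy_replica) are hypothetical, and each replica is modeled as a plain Python callable that raises ConnectionError when it is unavailable.

```python
from typing import Callable, Sequence


class AllReplicasFailed(Exception):
    """Raised when every replica failed to handle the request (hypothetical name)."""


def call_with_failover(replicas: Sequence[Callable[[str], str]], request: str) -> str:
    """Try each replica in order and return the first successful response."""
    errors = []
    for replica in replicas:
        try:
            return replica(request)
        except ConnectionError as exc:
            # Record the failure and fall through to the next replica.
            errors.append(exc)
    raise AllReplicasFailed(f"{len(errors)} replica(s) failed for request {request!r}")


def down_replica(request: str) -> str:
    """Stand-in for a process that has gone down."""
    raise ConnectionError("replica is down")


def healthy_replica(request: str) -> str:
    """Stand-in for a process that is still running."""
    return f"handled: {request}"


if __name__ == "__main__":
    # The first replica fails, so the request is served by the second one.
    print(call_with_failover([down_replica, healthy_replica], "job-42"))
```

With a single replica, one failure stops the whole computation. With several, the caller can route around the failed process, at the cost of running and coordinating more processes.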

In this course, students will be divided into development teams, and each team will build a distributed Software as a Service system from scratch. Teams will be encouraged to explore alternative technologies and approaches for building systems and to learn DevOps principles such as containerization, continuous integration, and continuous deployment for deploying robust cloud services. Students will also be introduced to the Apache Software Foundation’s open community governance principles for open source software and will learn how to interact effectively with Apache Software Foundation projects in order to become committers and project management committee members. Finally, students will have an opportunity to apply what they have learned to Apache Airavata based science gateways.

Course Objectives

  • Provide a high level, broad understanding of the application of core distributed computing systems concepts to “Software as a Service” systems that support scientific research and education.
  • Study both abstract concepts and practical techniques for building science gateways.
  • Provide hands-on experience in developing a science gateway while working with open source philosophies modeled after those of the Apache Software Foundation.
  • Apply the general concepts of distributed systems and understand the state of the art in applicable areas.

Course Outcomes

  • Demonstrate an applied understanding of microservice architectures and their underlying distributed systems foundations.
  • Demonstrate an applied understanding of the DevOps principles of continuous integration and delivery in the development and operations of science gateways.
  • Demonstrate an understanding of open source practices, particularly those of the Apache Software Foundation.
  • Demonstrate an ability to develop remote job submission interfaces to computational cyberinfrastructure such as the IU Big Red 2 supercomputer.
  • Demonstrate an ability to develop a simple metadata management system.
  • Demonstrate an ability to develop and consume API services.

Course Structure

Course Goal: Working in teams of 2 to 3, students will learn modern distributed computing concepts, apply them to a standalone Apache Airavata instance, and contribute the results to the code base.

  • All assignment reports are individual assignments even though projects are executed within groups.
  • The course will be split into two parts:
    • Part 1: Learning basic distributed computing concepts, microservices, DevOps, etc.
    • Part 2: Applying what you have learned to Apache Airavata

Instructors

The course will be taught by Marlon Pierce and Suresh Marru, who lead the Pervasive Technology Institute’s Science Gateways Research Center and are members of the Apache Software Foundation and project management committee members for the Apache Airavata open source software.

Apache Airavata Project Themes

  1. Load balancing and fault tolerance of the API Server
  2. Expanding Airavata microservice architecture with new services
    1. CI/CD for new services
    2. Allocation management
    3. Resource status, load, etc.
    4. Postprocessing pipelines
  3. Containerizing and orchestrating Airavata deployments
  4. Airavata CI/CD
    1. Resource provisioning
    2. CI/CD
    3. Blue-green deployments
    4. Testing and assurance
  5. Workflows, task orchestration, scheduling, parameter sweeps
  6. Improving interactions with remote resources
    1. Reducing errors with submissions
    2. Improving monitoring
    3. Application-specific monitoring
    4. Running batch cloud applications
  7. Logging, searching, and event detection
    1. Comprehensive, consolidated logging for all Airavata production services
    2. Logs for tenant admins
    3. Search, analytics, event detection
    4. Log visualization
  8. User interfaces and user environments
    1. Jupyter, Django, and related tools
    2. Application toolkits
    3. Visualization, science desktop integration
  9. Data mining as a back end application
  10. Data management and file transfer

Resources

During the course, the instructors will provide references to journal and conference papers. A good understanding of the concepts discussed in these papers will greatly help in absorbing the course material.

Beginner Materials

Open Collaboration

  • Reusing and building upon ideas or code is a major part of modern software development. As a professional programmer you will never write anything from scratch. This class is structured so that all solutions are public. You are encouraged to learn from the work of your peers. We won’t hunt down people who simply copy and paste solutions: without challenging themselves, they are wasting the time and money they spend on this class.

  • Please respect the terms of use and/or license of any code you find, and if you reimplement or duplicate a design or code from elsewhere, credit the original source.