Skip to content

Commit

Permalink
Merge pull request kubernetes#130 from smarterclayton/docs
Browse files Browse the repository at this point in the history
Add a roadmap to OpenShift
  • Loading branch information
openshift-merge-robot committed Jan 1, 2020
2 parents c728d5c + 2e6e05f commit d6d4731
Show file tree
Hide file tree
Showing 3 changed files with 275 additions and 20 deletions.
5 changes: 3 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,8 +2,7 @@

Enhancement tracking repository for OKD.

Inspired by [Kubernetes
enhancements](https://github.com/kubernetes/enhancements) process.
Inspired by the [Kubernetes enhancement](https://github.com/kubernetes/enhancements) process.

This repository provides a rally point to discuss, debate, and reach consensus
for how OKD [enhancements](./enhancements) are introduced. OKD combines
Expand All @@ -19,6 +18,8 @@ the basis of a community roadmap. Enhancements may be filed from anyone in the
community, but require consensus from domain specific project maintainers in
order to implement and accept into the release.

For an overview of the whole project, see [ROADMAP.md](./ROADMAP.md)

## Is My Thing an Enhancement?

A rough heuristic for an enhancement is anything that:
Expand Down
259 changes: 259 additions & 0 deletions ROADMAP.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,259 @@
---
title: roadmap
authors:
- "@smarterclayton"
- "@derekwaynecarr"
- "@jwforres"
- "@crawford"
- "@eparis"
reviewers:
- "@smarterclayton"
- "@derekwaynecarr"
- "@jwforres"
- "@crawford"
- "@eparis"
approvers:
- "@smarterclayton"
- "@derekwaynecarr"
- "@jwforres"
- "@crawford"
- "@eparis"
creation-date: 2019-11-24
last-updated: 2019-11-24
status: provisional
see-also:
replaces:
superseded-by:
---

# OpenShift Roadmap

## Summary

This document identifies the top level initiatives driving the OpenShift project
as a whole and identifies key interlocking objectives that provide context for
individual enhancements. This document is not a replacement for the enhancements
it references - instead it identifies thematic goals across the entire project
and helps orient developers, users, and advocates in specific directions. This
roadmap is advisory and describes problems and constraints that span multiple
areas of a very large project.


## Motivation

The roadmap helps drive continuity across releases and coherence across many
individual areas of the project. This document is intended to remain relatively
up to date and describe in broad details the top-level objectives of the project.

As a platform, predictability of lifecycle and direction is critical for consumers
making multi-year bets, and the roadmap must provide sufficient clarity that a
new consumer can assess the difference between short, medium, and long term risks.


### Goals

OpenShift generally attempts to satisfy the following objectives:

#### Platform

1. Provide a predictable and reliable distribution of Kubernetes that remains close to the upstream project cadence
2. Provide long-term stability of features and APIs (over a 1-3 year timeframe), regardless of upstream project choices
3. Be "secure by default" in terms of all choices in lifecycle, features, and configuration within the project
4. Provide balanced support for self-service by users on the platform as well as platform as deployment target

#### Ecosystem

5. Identify, stabilize, and operationalize critical ecosystem components and provide them "out-of-the-box" with the distribution (e.g. ingress, networking)
6. Make extension of the core platform (including replacement out-of-the-box components) easy
7. Make platform and component lifecycle trivially easy to manage and low risk

#### Operational

8. Be easily installable in all major environments in an opinionated best-practices fashion, but be flexible to user-provided opinionation
9. Ensure configuration, rollback, and reconfiguration of the platform is broadly consistent and easily automatable
10. Perform automatic maintenance of all software components and infrastructure, detect and repair drift, and continuously monitor subsystem health
11. Provide clear guidance via alerting, user interface, and dashboards when manual intervention is necessary

#### Applications

12. Make developing and deploying a broad range of applications from a broad range of developer skill sets easy and/or possible
13. Provide tools for operational teams to monitor, strictly control or enable self-service, and securely subdivide the resources within a cluster
14. Identify and enable key application development technologies to integrate well with the platform, while preserving the other objectives
15. Progressively orient and educate developers across a broad skill range about patterns and tools that can improve their effectiveness


### Non-Goals

1. Build new components that could be better adapted from within the ecosystem (unless otherwise necessary)
2. Endorse one particular "right way" to build and develop containerized applications - instead enable specific patterns (GitOps, iterative appdev, team driven microservices, etc) that can match a broad range of organizational needs
3. Be a "kitchen-sink" distribution - it is better to have a small core with stable APIs and a big ecosystem at different lifecycles that can evolve without regressing
4. Allow deep customization within the platform - for the components we ship, we want to avoid complex configuration and expansive test matrices
5. Ship upstream components as fast as possible - we emphasize "don't worry" over "fear of missing out" with respect to new changes


## Proposal

OpenShift is a containerized application platform built on Kubernetes and its ecosystem of
tools focused on maximizing operational and developer effiency. Everyone - from a single
developer to the world's largest companies - should be able to develop, build, and run
mission-critical applications with OpenShift in any enviroment and see benefits over their
existing platforms and toolchains.

### User Stories

These stories define the core use cases OpenShift looks to address.

#### Stable Enterprise Kubernetes

As an enterprise IT organization deploying Kubernetes,
I should have a stable and reliable Kubernetes distribution that reduces my support and operational burden while allowing me to meet the organizational, legal, and functional requirements I must work within,
so that I can quickly evaluate, integrate, and deploy Kubernetest to production.

This includes:

* Corporate identity integration like LDAP, SSO, and large scale team hierarchies
* Resource usage reporting and chargeback, hard and soft resource limits, and configurable self-service for teams
* Security and audit compliance (with or without regulatory features), like FIPS, FedRamp, off-cluster audit, secure containers, role-based access control for operations and teams, least-privilege default configurations, and encryption at rest of high value secrets
* Private clusters in cloud environments, airgapped cluster deployments, delegated install with preconfigured VPC networking
* Ability to both integrate with existing data center tooling (load balancing, DNS, networking) as well as the ability to take ownership of those problems within a cluster to reduce organizational friction and improve operational velocity
* A reliable bare-metal and multi-environment block and object storage solution
* Tooling and practices around common problems such as multiple datacenter high-availability, migration of containerized applications across clusters, whole cluster backup and restore, and network tracing control


#### Programmable containerized application deployment environment

As an organization with an existing development pipeline, or one building a new enterprise application
platform, or as a small to medium sized team using Kubernetes as a deployment target,
I should expect Kubernetes and the necessary ecosystem components to remain stable over multiple-year timeframes,
so that I can delivery applications more rapidly, with better operational efficiency, at higher scales, and with better availability.

This includes:

* API stability and conformance within the Kubernetes project and other ecosystem projects
* Backwards and forwards compatibility for all APIs and extensions - all breaks are regressions
* A clear lifecycle that matches my organizational needs with safe upgrades and long term support
* Automation for common operational patterns like autoscaling, machine lifecycle, and load balancer integration
* Automatic hardware, infrastructure, and software monitoring and remediation to mitigate entropy
* Easy infrastructure and user workload monitoring and alerting that can help track and monitor health
* Easy access to both reliable application components on platform and cloud or organizational services off platform
* Access to virtualization tools to migrate existing applications and reduce the need for alternative platforms
* A command line and web console that provide simple operational troubleshooting
* A single-pane-of-glass management experience across one or more clusters that targets planning, capacity, operational monitoring, and policy enforcement


#### Self-service developer platform

As an organization looking to modernize, innovate, or standardize large portions of application development,
I should have tools and patterns that are easily accessible and consumable by a wide range of
developer skillsets and that allow organizational, operational, or security practices to easily integrate,
so that I can rapidly improve my development organization efficiency and react more quickly to business needs.

This includes:

* Simple out-of-the-box tooling and user experiences to iteratively develop and deploy containerized applications
* A range of available runtime frameworks that combine sufficient lifecycles and reasonably recent versions
* A command line and web console that provide simple self-service development workflows on top of the platform
* Easy access to function-as-a-service, service mesh, remote cloud services, and easy to consume automated components (like queues, databases, and caches)
* Deployment and iteration integration with common IDEs, and an on-demand zero-install IDE for quick iteration, prototyping, and troubleshooting
* User experiences that enable incremental learning about Kubernetes, containerized applications, and advanced concepts


#### Project reliability engineering

As an open-source community and product focused organization,
OKD and OpenShift should have a development lifecycle that leverages automation and data capture to rapidly test, release, and
validate the projects being developed within the product,
so that we can deliver higher quality software faster to more environments, with less regressions, and with a tighter feedback loop between developer and deployer.

This includes:

* Broad CI automation to integrate the work of hundreds of open source projects
* Extensive test-before-merge and test-before-release gating via end-to-end and project specific suites, along with manual testing on pull-requests, to catch regressions before they are merged
* Short, automated, and reliable processes for promoting projects to release candidates and publishing them for consumption
* Remote health monitoring of CI, evaluation, and production clusters to identify issues as upgrades roll out and to determine common failures
* Predictable and short release cadences that reduce slippage by derisking delaying individual features


## Initiatives

This lists the important initiatives across the project. These are the ones that span
multiple releases, require close coordination between teams, or have subtle implications
on a large number of areas.


### Automating management of the control plane

Our goal is to fully automate control plane node lifecycle, reduce operational complexity
during recovery of a master, simplify the install sequence and remove the need for a
unique bootstrap node, prepare for vertical autosizing of masters, and enable some form of
non-HA clusters. As of 4.1, a number of operational advantages provided to worker nodes
cannot be realized. A brief sketch of the approach is covered below (in rough order):

1. Automate the core etcd quorum and lifecycle of etcd members with the cluster-etcd-operator
2. Make the bootstrap node look more like a full master and have additional masters join
3. Front the API servers and other master services with service load balancers
4. Automatically recover when a master machine dies on cloud providers by creating a new machine (machine health check)
5. Add out-of-the-box metal load balancing support (with metallb project?).
6. Allow masters to be vertically scaled by changing a machine size property and replacing mismatched masters
7. Add a simple backup recovery experience to etcd operator instances that requires no additional scripting / commands (form new cluster with X after shutting down other workers)
8. Allow the bootstrap node to be easily transitioned to a worker node post boot (to reduce minimum cluster requirements)

Completing this change will simplify the operational experience for masters to only a
single recovery action (purge other masters, pick leader or restore from backup) on all
clouds.

### Allow cluster control planes to be hosted on another cluster

TODO

### Improve management experience of one or more clusters

TODO

### Improve OpenShift on bare metal

TODO


### Improve platform observability and reactivity

The introduction of remote health monitoring and deeper CI monitoring in 4.x is allowing
us to more quickly identify and triage issues impacting the fleet and deliver fixes and
improved monitoring and alerting. We must continue to improve and invest in this pattern by:

1. Identify and prioritize top failure modes in production environments
2. Ensure thorough alert and metrics coverage of those failure modes
3. Improve usage of alerting by making configuration and status more obvious to end users (have you configured alerting yet?)
4. Refine and improve failure monitoring in operators and on cluster (health detection) for key components like ingress, networking, and machines
5. Better correlate configuration failures (on upgrade or in normal operation) and safeguard those changes
6. Identify and implement e2e tests that better simulate top problems (machine failure, master recovery, network loss)
7. Automate detection and reporting of failures as upgrades are being rolled out
8. Reduce triage time of failures with better standard development tooling and dashboarding
9. Better understand which features are in common use to prioritize investment

Investment in this area allows us to more effectively fix the most impactful issues, which
has better user outcomes.


### Improve operator lifecycle manager end-user experience and operator-author lifecycle

TODO


### Improve the networking stack

openshift-sdn has succeeded at being a no-frills default networking plugin for OpenShift.
The introduction of multus in 4.1 opened significant flexibility for integrators to provide
multiple networks and specialized use cases.

As a long term direction we believe OVN has better abstractions in place to grow feature
capability and integrations. IPv6 support (single and dual-stack) is planned only for OVN.
We will continue to improve support for third party networking plugins at install and
update time.

We also wish to improve the integration of multus with the project, potentially by adding
service integration to secondary interfaces.

Finally, a key challenge with SDN is detecting subtle bugs and misconfigurations. We would
like to add network tracing and failure detection to each node to better diagnose and catch
those issues.
31 changes: 13 additions & 18 deletions enhancements/README.md
Original file line number Diff line number Diff line change
@@ -1,39 +1,34 @@
# OKD Enhancement Proposals

An OKD Enhancement Proposal is a way to propose, communicate, and coordinate on
new efforts for the OKD project.
An OKD Enhancement Proposal is a way to propose, communicate, and coordinate on new efforts for the OKD project.

It is inspired from our experience with the Kubernetes Enhancement process where
many of our community participants collaborate each day.
It is inspired from our experience with the Kubernetes Enhancement process where many of our community participants collaborate each day.

This process is evolving, but is mandatory for all enhancements beginning with
release-4.3.
This process is evolving, but is mandatory for all enhancements beginning with release-4.3.

## Quick start

1. Socialize an idea with others. Make sure others think the work is worth
doing, and are willing to review design and code changes required.
2. Follow the process outlined in the [enhancement
template](template.md).
1. Socialize an idea with others. Make sure others think the work is worth doing, and are willing to review design and code changes required.
2. Follow the process outlined in the [enhancement template](template.md).

## FAQs

### Do I have to use the process?

If the enhancement has broad scope, yes. It helps everyone track why, when,
how, and by whom work is done.
If the enhancement has broad scope, yes. It helps everyone track why, when, how, and by whom work is done.

### Why would I want to use the process?

Provide a mechanism to communicate design and implementation strategies across
the OKD community.
Provide a mechanism to communicate design and implementation strategies across the OKD community.

### Do I put design in a particular directory?

If it has broad impact, place it in the root of this directory. If it's
localized to a particular domain, find the appropriate directory.
If it has broad impact, place it in the root of this directory. If it's localized to a particular domain, find the appropriate directory.

### Do I have to use the Oxford comma?

Yes. OKD is an open, inclusive, and diverse community, and there is absolutely no room for ambiguous clauses.

### My FAQ isn't answered here!

Open an issue and ask or even better open a PR with a question and proposed
answer.
Open an issue and ask or even better open a PR with a question and proposed answer.

0 comments on commit d6d4731

Please sign in to comment.