This repository has been archived by the owner on Jun 6, 2024. It is now read-only.

Installation Issue List #5321

Open
Starmys opened this issue Feb 25, 2021 · 4 comments

Comments

@Starmys
Contributor

Starmys commented Feb 25, 2021

  1. Add / Remove nodes

  2. Installation Enhancement

  3. 5100: Installation script refinement

    • dev-box can be inside master node
    • P3 all in one deployment: single node cluster support
      • allow master node to be worker at the same time;
    • uninstallation doc
    • ns 'pai-storage' already exists: if quick-start-service.sh fails, the namespace may already have been created by the failed run.
@Starmys
Contributor Author

Starmys commented Feb 25, 2021

Issue when exchanging the worker and master roles (delete and re-deploy): etcd config conflicts

@Starmys
Contributor Author

Starmys commented Mar 1, 2021

Plan

  1. docker dev box

    1. add into dockerfile
      # basic tools (refresh the package index first; -y keeps the image build non-interactive)
      apt-get update
      apt-get install -y software-properties-common sudo
      # python3.6
      sudo add-apt-repository -y ppa:deadsnakes/ppa
      sudo apt-get update
      sudo apt-get install -y python3.6
      # point python3 at python3.6
      sudo rm /usr/bin/python3
      sudo ln -s /usr/bin/python3.6 /usr/bin/python3
      sudo pip3 install setuptools
      # ansible, etc.: the bottom half of pai/contrib/kubespray/script/environment.sh
    2. docker run ... -v ~/pai:/root/pai -v ~/pai-deploy:/root/pai-deploy -v ~/.ssh:/root/.ssh
  2. integrate the steps of add / remove nodes into ./paictl scale

    1. before ./paictl scale: modify layout.yaml and services-configuration.yaml manually
    2. preparation
      1. check if layout.yaml conflicts with services-configuration.yaml
      2. compare the input layout.yaml with /cluster_configuration/layout.yaml to determine the list of nodes to add / remove
      3. ./paictl config push -p /updated-config
      4. modify kubespray
        1. remove-node.yml: gather_facts: yes
        2. roles/remove-node/post-remove/tasks/main.yml: remove run_once: true
    3. add nodes
      # check docker daemon config
      requirement.sh --limit ...
      # raise a notice to change docker daemon config and reload docker daemon manually if necessary
      # add node to k8s cluster
      cd /root/pai-deploy/kubespray
      ansible-playbook -i inventory/pai/hosts.yml scale.yml --become --become-user=root -e "@inventory/gcrv100/openpai.yml" --limit=nodelist
      # update config and restart service
      cd /root/pai/
      ./paictl service stop -n cluster-configuration rest-server hivedscheduler job-exporter
      ./paictl service start -n cluster-configuration rest-server hivedscheduler job-exporter
    4. remove nodes
      # remove node from k8s cluster
      cd /root/pai-deploy/kubespray
      ansible-playbook -i inventory/pai/hosts.yml remove-node.yml --become --become-user=root -e "@inventory/gcrv100/openpai.yml" --limit=nodelist
      # update config and restart service
      cd /root/pai/
      ./paictl service stop -n cluster-configuration rest-server hivedscheduler job-exporter
      ./paictl service start -n cluster-configuration rest-server hivedscheduler job-exporter
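
The "fetch the nodelist to add / remove" step in the preparation above could be sketched roughly as follows. This is only an illustration, not how paictl does it: it assumes the hostnames have already been extracted from the old and the new layout into plain text files with one hostname per line (the extraction is not shown), and the function names `node_diff`, `nodes_to_add`, `nodes_to_remove` and `join_limit` as well as the hostnames are hypothetical.

```shell
#!/bin/sh
# Sketch: derive the nodes to add / remove from two hostname lists.
# Assumption: each input file holds one hostname per line.

# node_diff FLAGS OLD NEW: run comm with FLAGS on sorted, de-duplicated copies
node_diff() {
  old_sorted=$(mktemp)
  new_sorted=$(mktemp)
  sort -u "$2" > "$old_sorted"
  sort -u "$3" > "$new_sorted"
  comm "$1" "$old_sorted" "$new_sorted"
  rm -f "$old_sorted" "$new_sorted"
}

# hostnames present only in NEW (comm -13 hides "only in OLD" and "in both")
nodes_to_add() { node_diff -13 "$1" "$2"; }

# hostnames present only in OLD (comm -23 hides "only in NEW" and "in both")
nodes_to_remove() { node_diff -23 "$1" "$2"; }

# join newline-separated hostnames from stdin with commas,
# the form accepted by ansible-playbook --limit
join_limit() { paste -sd, -; }

# Example with hypothetical hostnames
workdir=$(mktemp -d)
printf '%s\n' master-0 worker-0 worker-1 > "$workdir/old_nodes.txt"
printf '%s\n' master-0 worker-1 worker-2 worker-3 > "$workdir/new_nodes.txt"
nodes_to_add "$workdir/old_nodes.txt" "$workdir/new_nodes.txt" | join_limit     # worker-2,worker-3
nodes_to_remove "$workdir/old_nodes.txt" "$workdir/new_nodes.txt" | join_limit  # worker-0
rm -rf "$workdir"
```

The comma-separated result is what would replace the `nodelist` placeholder in `--limit=nodelist`.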

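The add / remove sequences in the plan differ only in the kubespray playbook, so a wrapper could look roughly like the sketch below. It is not the actual ./paictl scale implementation: `run`, `playbook_for`, `scale` and the `DRY_RUN` switch are hypothetical names, and paths, inventory files and service names are copied as written in the plan. The docker daemon check (`requirement.sh --limit ...`) is left out because its arguments are not spelled out. By default the sketch only prints the commands.

```shell
#!/bin/sh
# Sketch: print (DRY_RUN=1, the default) or execute the scale command sequence.

KUBESPRAY_DIR=/root/pai-deploy/kubespray
PAI_DIR=/root/pai
SERVICES="cluster-configuration rest-server hivedscheduler job-exporter"

# run CMD...: echo the command in dry-run mode, otherwise execute it
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# playbook_for ACTION: map add / remove to the playbook named in the plan
playbook_for() {
  case "$1" in
    add)    echo scale.yml ;;
    remove) echo remove-node.yml ;;
    *)      echo "unknown action: $1" >&2; return 1 ;;
  esac
}

# scale ACTION NODELIST: NODELIST is a comma-separated --limit value
scale() {
  playbook=$(playbook_for "$1") || return 1
  # add the nodes to, or remove them from, the k8s cluster
  run cd "$KUBESPRAY_DIR"
  run ansible-playbook -i inventory/pai/hosts.yml "$playbook" \
    --become --become-user=root \
    -e "@inventory/gcrv100/openpai.yml" --limit="$2"
  # update config and restart services (SERVICES is split on purpose)
  run cd "$PAI_DIR"
  run ./paictl service stop -n $SERVICES
  run ./paictl service start -n $SERVICES
}

# Example with hypothetical hostnames (prints five commands, runs nothing)
scale add worker-2,worker-3
```

Keeping dry-run as the default makes it possible to review the exact commands before touching a live cluster; setting `DRY_RUN=0` would execute them.
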
@fanyangCS
Contributor

related #2558

@fanyangCS
Contributor

related #4521
