Error check to pdsh function and other works to verify the deploying process #1
base: master
1. Change the pdsh function to call ".communicate()" itself and add an error check there; on error, call sys.exit().
2. Remove the ".communicate()" suffix from every pdsh caller in benchmark/*.py, cluster/ceph.py and monitoring.py.
3. Add a parameter to the pdsh function to skip the error check for commands that do not need to succeed, such as "pkill collectl".
Signed-off-by: Chendi.Xue <chendi.xue@intel.com>
Add a check in setup_fs: if the user specifies the osd devices in the yaml file, osds are deployed on those devices instead of /dev/disk/by-partlabel/osd-device-%s-data.

Example:

=== runtest.xfs.yaml ===
cluster:
  osds: [cceph01, cceph02]
  cceph01: [/dev/sda1, /dev/sdb1, /dev/sdc1, /dev/sdd1, /dev/sde1, /dev/sdf1, /dev/sdg1, /dev/sdh1]
  cceph02: [/dev/sda1, /dev/sdb1, /dev/sdc1, /dev/sdd1, /dev/sde1, /dev/sdf1, /dev/sdg1, /dev/sdh1]

=== deploy log ===
['/dev/sda1', '/dev/sdb1', '/dev/sdc1', '/dev/sdd1', '/dev/sde1', '/dev/sdf1', '/dev/sdg1', '/dev/sdh1']
pdsh: ['pdsh', '-R', 'ssh', '-w', 'root@cceph01', 'sudo umount /dev/sda1']
pdsh: ['pdsh', '-R', 'ssh', '-w', 'root@cceph01', 'sudo rm -rf /tmp/cbt/mnt/osd-device-0-data']
pdsh: ['pdsh', '-R', 'ssh', '-w', 'root@cceph01', 'sudo mkdir -p -m0755 -- /tmp/cbt/mnt/osd-device-0-data']
pdsh: ['pdsh', '-R', 'ssh', '-w', 'root@cceph01', 'sudo mkfs.xfs -f -i size=2048 -n size=64k /dev/sda1']
pdsh: ['pdsh', '-R', 'ssh', '-w', 'root@cceph01', 'sudo mount -o inode64,noatime,logbsize=256k -t xfs /dev/sda1 /tmp/cbt/mnt/osd-device-0-data']
...

If the osd devices are not specified, deployment follows the original behaviour and uses /dev/disk/by-partlabel/osd-device-%s-data.
Signed-off-by: Chendi.Xue <chendi.xue@intel.com>
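The device selection described in this commit can be sketched roughly as follows. This is an illustration only, not the code from the patch: the function name `osd_devices_for_host`, the `cluster_config` dict layout, and the `osds_per_node` parameter are assumptions made for the example.

```python
def osd_devices_for_host(cluster_config, host, osds_per_node):
    """Return the block devices to use for the osds on `host`.

    Hypothetical helper: if the yaml `cluster` section lists devices for
    this host, use them; otherwise fall back to the partition-label
    naming convention.
    """
    devices = cluster_config.get(host)
    if devices:
        # Devices were given explicitly in the yaml file.
        return list(devices)
    # Original behaviour: one partition label per osd index.
    return ["/dev/disk/by-partlabel/osd-device-%s-data" % i
            for i in range(osds_per_node)]


cluster = {
    "osds": ["cceph01", "cceph02"],
    "cceph01": ["/dev/sda1", "/dev/sdb1"],
}
explicit = osd_devices_for_host(cluster, "cceph01", osds_per_node=2)
fallback = osd_devices_for_host(cluster, "cceph02", osds_per_node=2)
```

Here `cceph01` gets the devices listed in the yaml, while `cceph02`, which has no device list in this example, falls back to the by-partlabel paths.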
After adding the error check, the ceph-osd and ceph-mon commands interrupt the deployment. Example:

pdsh: ['pdsh', '-R', 'ssh', '-w', 'root@cceph01', 'sudo ceph -c /tmp/cbt/None/ceph.conf -i /tmp/cbt/mnt/osd-device-0-data/keyring auth add osd.0 osd "allow *" mon "allow profile osd"']
[ERROR]:cceph01: added key for osd.0

pdsh receives the message "added key for osd.0" on stderr, although it should be a stdout message. Until this bug is fixed in ceph-osd, skip the error check for these commands.
Signed-off-by: Chendi.Xue <chendi.xue@intel.com>
ceph.conf is currently sent to tmp_dir (/tmp/cbt/None) with the pdcp function. It seems better to use scp here, copying ceph.conf from the head node to all osd/mon/mds nodes.
Signed-off-by: Chendi.Xue <chendi.xue@intel.com>
Currently, when deploying osds to the second node, the mount directory numbering starts from 0 again. Change it to use the osd_id as the mount directory number.
Signed-off-by: Chendi.Xue <chendi.xue@intel.com>
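A minimal sketch of the numbering change, assuming a cluster-wide osd_id counter that keeps increasing across hosts. The function name `mount_plan` and its arguments are hypothetical; only the /tmp/cbt/mnt/osd-device-N-data path pattern comes from the deploy log above.

```python
def mount_plan(hosts, devices_per_host, mnt_root="/tmp/cbt/mnt"):
    """Pair every device with a mount directory numbered by global osd_id."""
    plan = []
    osd_id = 0  # counts across all hosts, never reset per host
    for host in hosts:
        for device in devices_per_host[host]:
            mount_dir = "%s/osd-device-%d-data" % (mnt_root, osd_id)
            plan.append((host, device, mount_dir))
            osd_id += 1
    return plan


plan = mount_plan(
    ["cceph01", "cceph02"],
    {"cceph01": ["/dev/sda1", "/dev/sdb1"],
     "cceph02": ["/dev/sda1", "/dev/sdb1"]},
)
```

With two devices per host, the second host's devices get directories 2 and 3 rather than starting again at 0.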
stdout, stderr = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE, close_fds=True).communicate()
if force:
    return [stdout, stderr]
if stderr:
Having something in stderr doesn't mean that the command failed;
that's why I checked Popen.returncode in my pull request.
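To illustrate the point, here is a minimal sketch of a wrapper that decides on the exit status instead of on stderr content. It is not the code from either pull request: `run_checked` and `continue_if_error` are names invented for this example, and it is written for Python 3.

```python
import subprocess
import sys


def run_checked(args, continue_if_error=False):
    """Run a command and fail on a non-zero exit status, not on stderr output."""
    proc = subprocess.Popen(args, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE, close_fds=True)
    stdout, stderr = proc.communicate()
    if proc.returncode != 0 and not continue_if_error:
        # Only a non-zero exit status is treated as a failure.
        sys.stderr.write("[ERROR] %r exited with %d: %s\n"
                         % (args, proc.returncode,
                            stderr.decode(errors="replace")))
        sys.exit(1)
    return stdout, stderr, proc.returncode


# A command that writes to stderr but still exits with status 0.
noisy = [sys.executable, "-c",
         "import sys; print('added key for osd.0', file=sys.stderr)"]
out, err, rc = run_checked(noisy)
```

A command like the "added key for osd.0" case passes this check because its exit status is 0, while `continue_if_error=True` covers commands such as "pkill collectl" that may legitimately exit non-zero. One caveat when applying this to pdsh: as far as I know, pdsh only reports the remote commands' exit statuses when run with its `-S` option, so the wrapper would need that flag to see remote failures.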
I do like the idea of specifying block devices in yaml instead of relying on the partition name convention, though! Can we separate out that part and get that merged?
You have a point about using scp in at least one case: rpdcp did not pull results back to the test driver for me. I think it can push files out from the test driver to other systems just fine, though. We could run scps in parallel to copy per-host results to different subdirectories on the test driver host.
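The parallel scp idea could look roughly like this. It is a sketch under assumptions: `scp_command`, `fetch_results`, the `root` user, and the directory layout are all invented for the example, and the `run` argument is injectable so the logic can be exercised without a real cluster.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def scp_command(host, remote_dir, local_root, user="root"):
    """Build an scp command that copies remote_dir into local_root/<host>/."""
    dest = os.path.join(local_root, host)
    return ["scp", "-r", "%s@%s:%s" % (user, host, remote_dir), dest]


def fetch_results(hosts, remote_dir, local_root, run=subprocess.call):
    """Copy results from every host in parallel, one subdirectory per host."""
    for host in hosts:
        os.makedirs(os.path.join(local_root, host), exist_ok=True)
    cmds = [scp_command(h, remote_dir, local_root) for h in hosts]
    # One worker per host; each worker blocks on its own scp process.
    with ThreadPoolExecutor(max_workers=len(cmds) or 1) as pool:
        codes = list(pool.map(run, cmds))
    # Map each host to the exit status of its scp.
    return dict(zip(hosts, codes))
```

Calling `fetch_results(["cceph01", "cceph02"], "/tmp/cbt/results", "/tmp/cbt/archive")` would start one scp per host; the remote and local paths in that call are placeholders. The returned dict maps each host to its scp exit status, so a failed copy can be detected per host.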
Four commits here: