Error check to pdsh function and other works to verify the deploying process #1

xuechendi · 2015-03-18T07:01:29Z

Four commits here:

Add an error check to the pdsh function, so pdsh exits when stderr is not empty.
Add an OSD device field to the yaml; users can specify OSD devices there, and if none are given, the original path /dev/disk/by-partlabel/osd-device-%s-data is used.
Skip the error check when calling the ceph-osd and ceph-mon commands.
Change the pdcp of ceph.conf to all nodes into scp.

1. Change the pdsh function to call ".communicate()" itself and add an error check there; on error, do sys.exit().
2. Remove the ".communicate()" suffix from every pdsh caller in benchmark/*.py, cluster/ceph.py and monitoring.py.
3. Add a parameter to the pdsh function to skip the error check if the command doesn't require zero errors, like "pkill collectl".

Signed-off-by: Chendi.Xue <chendi.xue@intel.com>

Add a check in setup_fs: if the user specifies the OSD devices in the yaml, OSDs are deployed on those devices instead of /dev/disk/by-partlabel/osd-device-%s-data.

Example:

=== runtest.xfs.yaml ===
cluster:
  osds: [cceph01, cceph02]
  cceph01: [/dev/sda1, /dev/sdb1, /dev/sdc1, /dev/sdd1, /dev/sde1, /dev/sdf1, /dev/sdg1,/dev/sdh1]
  cceph02: [/dev/sda1, /dev/sdb1, /dev/sdc1, /dev/sdd1, /dev/sde1, /dev/sdf1, /dev/sdg1,/dev/sdh1]

=== deploy log ===
['/dev/sda1', '/dev/sdb1', '/dev/sdc1', '/dev/sdd1', '/dev/sde1', '/dev/sdf1', '/dev/sdg1', '/dev/sdh1']
pdsh: ['pdsh', '-R', 'ssh', '-w', 'root@cceph01', 'sudo umount /dev/sda1']
pdsh: ['pdsh', '-R', 'ssh', '-w', 'root@cceph01', 'sudo rm -rf /tmp/cbt/mnt/osd-device-0-data']
pdsh: ['pdsh', '-R', 'ssh', '-w', 'root@cceph01', 'sudo mkdir -p -m0755 -- /tmp/cbt/mnt/osd-device-0-data']
pdsh: ['pdsh', '-R', 'ssh', '-w', 'root@cceph01', 'sudo mkfs.xfs -f -i size=2048 -n size=64k /dev/sda1']
pdsh: ['pdsh', '-R', 'ssh', '-w', 'root@cceph01', 'sudo mount -o inode64,noatime,logbsize=256k -t xfs /dev/sda1 /tmp/cbt/mnt/osd-device-0-data']
...

If the OSD devices are not specified, the original device path /dev/disk/by-partlabel/osd-device-%s-data is used.

Signed-off-by: Chendi.Xue <chendi.xue@intel.com>
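The fallback described in this commit message can be sketched roughly as follows. This is only an illustration, not the code from the commit: the helper name `osd_devices_for_host`, the `osds_per_node` parameter and the plain-dict shape of the cluster settings are all assumptions made for the example.

```python
def osd_devices_for_host(cluster_cfg, host, osds_per_node):
    """Return the data devices to use for the OSDs on `host`.

    Hypothetical helper: `cluster_cfg` stands in for the parsed `cluster:`
    section of the yaml, assumed here to be a plain dict.
    """
    devices = cluster_cfg.get(host)
    if devices:
        # The user listed explicit devices for this host in the yaml.
        return list(devices)
    # Otherwise fall back to the partition-label naming convention.
    return ["/dev/disk/by-partlabel/osd-device-%s-data" % i
            for i in range(osds_per_node)]


cluster = {
    "osds": ["cceph01", "cceph02"],
    "cceph01": ["/dev/sda1", "/dev/sdb1"],
}
print(osd_devices_for_host(cluster, "cceph01", 2))
print(osd_devices_for_host(cluster, "cceph02", 2))
```

Here cceph01 gets its two listed devices, while cceph02, which has no entry, gets the two by-partlabel paths.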

After adding the error check, the ceph-osd and ceph-mon commands interrupt the deployment. Example:

pdsh: ['pdsh', '-R', 'ssh', '-w', 'root@cceph01', 'sudo ceph -c /tmp/cbt/None/ceph.conf -i \ /tmp/cbt/mnt/osd-device-0-data/keyring auth add osd.0 osd "allow *" mon "allow profile osd"']
[ERROR]:cceph01: added key for osd.0

pdsh gets the stderr message "added key for osd.0", which should be a stdout message. Until this bug is fixed in ceph-osd, skip the error check for these commands.

Signed-off-by: Chendi.Xue <chendi.xue@intel.com>

Currently the pdcp function is used to send ceph.conf to tmp_dir (/tmp/cbt/None). It seems better to use scp here, copying ceph.conf from the head node to all osd/mon/mds nodes.

Signed-off-by: Chendi.Xue <chendi.xue@intel.com>

Currently, when OSDs are deployed to the second node, the OSD device mount numbering starts from 0 again. Change it to use the osd_id as the mount dir number.

Signed-off-by: Chendi.Xue <chendi.xue@intel.com>
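A minimal sketch of the numbering change, assuming a running OSD counter shared across hosts. The function name `mount_plan` and the input shape are hypothetical; the mount path pattern is taken from the deploy log above.

```python
def mount_plan(hosts_devices, mnt_root="/tmp/cbt/mnt"):
    """Assign one mount dir per OSD, numbered by a cluster-wide osd_id.

    `hosts_devices` is a list of (host, [devices]) pairs (assumed shape).
    """
    plan = []
    osd_id = 0  # keeps counting across hosts instead of restarting at 0
    for host, devices in hosts_devices:
        for device in devices:
            mount_dir = "%s/osd-device-%d-data" % (mnt_root, osd_id)
            plan.append((host, device, mount_dir))
            osd_id += 1
    return plan


for entry in mount_plan([("cceph01", ["/dev/sda1", "/dev/sdb1"]),
                         ("cceph02", ["/dev/sda1", "/dev/sdb1"])]):
    print(entry)
```

With two devices per host, the second host's mount dirs are osd-device-2-data and osd-device-3-data rather than 0 and 1 again.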

koder-ua · 2015-08-06T11:20:05Z

common.py

+    stdout, stderr = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE, close_fds=True).communicate()
+    if force:
+        return [stdout, stderr]
+    if stderr:


Having something in stderr doesn't mean that the command failed.

bengland2 · 2015-08-06T13:05:02Z

that's why I checked Popen.returncode in my pull request.
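For illustration, a return-code based check might look like the sketch below. It is not the code from either pull request: the name `run_checked` and the `continue_if_error` parameter are made up for the example, and it raises an exception where the original patch calls sys.exit().

```python
import subprocess
import sys


def run_checked(args, continue_if_error=False):
    """Run a command and treat a non-zero exit status as the failure signal.

    Output on stderr alone is not treated as an error.
    """
    proc = subprocess.Popen(args, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE, close_fds=True)
    stdout, stderr = proc.communicate()
    if proc.returncode != 0 and not continue_if_error:
        raise RuntimeError("command %r exited with status %d: %s" % (
            args, proc.returncode,
            stderr.decode(errors="replace").strip()))
    return stdout, stderr, proc.returncode


# A command that writes to stderr but exits 0 is accepted.
noisy = [sys.executable, "-c",
         "import sys; sys.stderr.write('added key for osd.0\\n')"]
out, err, status = run_checked(noisy)
print(status, err)
```

One caveat when the command is pdsh itself: as far as I know, pdsh only reports the remote commands' exit statuses when run with its -S option, so a check like this depends on how pdsh is invoked.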

bengland2 · 2015-08-06T13:07:28Z

I do like the idea of specifying block devices in yaml instead of with the partition name convention though! Can we separate out that part and get that merged?

bengland2 · 2015-08-09T16:44:31Z

You have a point about using scp in at least one case: rpdcp did not pull results back to the test driver for me, but I think it can push files out from the test driver to other systems just fine. We could do scps in parallel to copy per-host results to different subdirectories on the test driver host.
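The parallel scp idea could be sketched like this. Everything here is an assumption for illustration: the function names, the `root` default user, the thread-pool approach and the injectable `runner` (which defaults to `subprocess.call` and exists so the logic can be exercised without real hosts).

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_scp_pull_commands(hosts, remote_dir, local_root, user="root"):
    """Build one recursive scp command per host.

    Each host's results go to its own subdirectory under `local_root`.
    """
    commands = []
    for host in hosts:
        source = "%s@%s:%s" % (user, host, remote_dir)
        destination = os.path.join(local_root, host)
        commands.append(["scp", "-r", source, destination])
    return commands


def pull_results(hosts, remote_dir, local_root, user="root",
                 runner=subprocess.call):
    """Run the scp commands in parallel; return {host: exit status}."""
    os.makedirs(local_root, exist_ok=True)
    commands = build_scp_pull_commands(hosts, remote_dir, local_root, user)
    with ThreadPoolExecutor(max_workers=max(1, len(commands))) as pool:
        statuses = list(pool.map(runner, commands))
    return dict(zip(hosts, statuses))
```

A caller would then check the returned statuses and retry or report any host whose copy exited non-zero.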

bengland2 · 2016-08-12T11:43:56Z

I think the error checking part of this pull request is now fixed, or at least mostly fixed; more error checking will be added over time. See merged PRs #110 and #107.

xuechendi added 5 commits March 18, 2015 19:18

Change pdcp ceph.conf to tmp_dir function to scp

eb9f5df

Change setup_fs disk id to osd_num

ed88686
