RancherOS fails to install on nvme drive

I am trying to install RancherOS on a new M.2 SSD, but it fails, seemingly while formatting the drive:

sudo ros install -t gptsyslinux -f --debug -c cloud-config.yml -d /dev/nvme0n1 -i rancher/os:v0.9.0
Installing from rancher/os:v0.9.0
> INFO[0000] start !isoinstallerloaded
> DEBU[0000] blkid type of /dev/sr0: iso9660
> DEBU[0000] mount (&exec.Cmd{Path:"/bin/mount", Args:[]string{"mount", "-t", "iso9660", "/dev/sr0", "/bootiso"}, Env:[]string(nil), Dir:"", Stdin:io.Reader(nil), Stdout:io.Writer(nil), Stderr:io.Writer(nil), ExtraFiles:[]*os.File(nil), SysProcAttr:(*syscall.SysProcAttr)(nil), Process:(*os.Process)(nil), ProcessState:(*os.ProcessState)(nil), ctx:context.Context(nil), lookPathErr:error(nil), finished:false, childFiles:[]*os.File(nil), closeAfterStart:[]io.Closer(nil), closeAfterWait:[]io.Closer(nil), goroutine:[]func() error(nil), errch:(chan error)(nil), waitDone:(chan struct {})(nil)})
> DEBU[0001] Mounted /dev/sr0
> INFO[0001] trying to load /bootiso/rancheros/installer.tar.gz
> INFO[0066] Loaded images from /bootiso/rancheros/installer.tar.gz
> INFO[0066] starting installer container for rancher/os:v0.9.0 (new)
> DEBU[0066] Run(&{/bin/system-docker [system-docker run --rm --net=host --privileged -v /:/host --volumes-from=all-volumes rancher/os:v0.9.0 -t gptsyslinux -d /dev/nvme0n1 -i rancher/os:v0.9.0 -f --no-reboot -c /opt/user_config.yml --isoinstallerloaded=1 --debug] []  <nil> <nil> <nil> [] <nil> <nil> <nil> <nil> <nil> false [] [] [] [] <nil> <nil>})
Unable to find image 'rancher/os:v0.9.0' locally
v0.9.0: Pulling from rancher/os
627beaf3eaaf: Pull complete
c62794cdced6: Pull complete
c4626eff5f27: Pull complete
b47e38b4585d: Pull complete
4914c31b1e0c: Pull complete
172683e78e1a: Pull complete
865dbcf06d52: Pull complete
df311c14e097: Pull complete
41861a405f90: Pull complete
Digest: sha256:f48961131d180cebfc55a324e89316e35227ca43025cb74886d5929b76cb21ee
Status: Downloaded newer image for rancher/os:v0.9.0
Installing from rancher/os:v0.9.0
mount: /dev/sr0 is write-protected, mounting read-only
> time="2017-04-06T00:43:40Z" level=error msg="mkfs.ext4: exit status 1"
> time="2017-04-06T00:43:40Z" level=error msg="formatdevice exit status 1"
> time="2017-04-06T00:43:40Z" level=error msg="formatAndMount exit status 1"
> time="2017-04-06T00:43:40Z" level=error msg="error layDownOS exit status 1"
> time="2017-04-06T00:43:40Z" level=fatal msg="Failed to run install" err="exit status 1"
> FATA[0074] Failed to run install                         err="exit status 1"

fdisk output for the device is:

Disk /dev/nvme0n1: 119.2 GiB, 128035676160 bytes, 250069680 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0xee88c077

Device         Boot Start       End   Sectors   Size Id Type
/dev/nvme0n1p1 *     2048 250066943 250064896 119.2G 83 Linux

I’ve tried a couple different versions to install (0.9.0, 0.9.1, and latest) and they all fail - so it doesn’t seem to be an issue with a specific installer image. Based on the fdisk output above it seems like the formatting is actually succeeding, not sure why it reports an exit status of 1.

I was able to manually format:

sudo mkfs.ext4 -L RANCHER_STATE /dev/nvme0n1
mke2fs 1.43.1 (08-Jun-2016)
Found a dos partition table in /dev/nvme0n1
Proceed anyway? (y,n) y
Discarding device blocks: done
Creating filesystem with 31258710 4k blocks and 7815168 inodes
Filesystem UUID: 2612d138-bd44-416a-abf4-960881281ebc
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
        4096000, 7962624, 11239424, 20480000, 23887872

Allocating group tables: done
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

I then tried the install again, with the same result. Finally, I tried wiping the partition table (the first sector of the disk) and installing:

sudo dd if=/dev/zero of=/dev/nvme0n1  bs=512  count=1
1+0 records in
1+0 records out

But again, failed in the same way. What am I missing? I have another machine that I was able to install on a USB, but want to install to a more reliable disk type.

The -d device is the unpartitioned device, and the installer will then try to create a partition /dev/nvme0n11 and install to it.

If you have already created the partition, then the install will fail.

There is a -p option (iirc), so you can use both -d and -p, but that might be something I just fixed in the upcoming 0.9.2 release. (Sorry, I’m on holidays in rural Tasmania, so not entirely sure atm.)

/dev/nvme0n1 is the unpartitioned device. I have tried partitioning and deleting all partitions (with fdisk) with the same result. I will try partitioning with the -p option tonight and follow-up. Thanks for the response and enjoy your vacation!

Just to make sure I get syntax correct, I’m assuming the proper values for partition 1 would be:

-d /dev/nvme0n1 -p /dev/nvme0n1p1

Following up - tried using -d and -p together, failed with the same error:

sudo ros install --debug -d /dev/nvme0n1 -p /dev/nvme0n1p1 -f -c cloud-config.yml
> INFO[0000] No install type specified...defaulting to generic
Installing from rancher/os:v0.9.2-rc1
> INFO[0000] start !isoinstallerloaded
> DEBU[0000] blkid type of /dev/sr0: iso9660
> DEBU[0000] mount (&exec.Cmd{Path:"/bin/mount", Args:[]string{"mount", "-t", "iso9660", "/dev/sr0", "/bootiso"}, Env:[]string(nil), Dir:"", Stdin:io.Reader(nil), Stdout:io.Writer(nil), Stderr:io.Writer(nil), ExtraFiles:[]*os.File(nil), SysProcAttr:(*syscall.SysProcAttr)(nil), Process:(*os.Process)(nil), ProcessState:(*os.ProcessState)(nil), ctx:context.Context(nil), lookPathErr:error(nil), finished:false, childFiles:[]*os.File(nil), closeAfterStart:[]io.Closer(nil), closeAfterWait:[]io.Closer(nil), goroutine:[]func() error(nil), errch:(chan error)(nil), waitDone:(chan struct {})(nil)})
> DEBU[0001] Mounted /dev/sr0
> INFO[0001] trying to load /bootiso/rancheros/installer.tar.gz
23b9c7b43573: Loading layer [==================================================>]  4.23 MB/4.23 MB
fff53cdc3366: Loading layer [==================================================>] 14.96 MB/14.96 MB
d72d7d111782: Loading layer [==================================================>] 4.608 kB/4.608 kB
4535d0555c04: Loading layer [==================================================>] 18.07 MB/18.07 MB
b38bf5c54934: Loading layer [==================================================>] 1.636 MB/1.636 MB
593b827a89b6: Loading layer [==================================================>] 1.536 kB/1.536 kB
de1c486c654a: Loading layer [==================================================>]  2.56 kB/2.56 kB
284afe3082b6: Loading layer [==================================================>] 3.072 kB/3.072 kB
> INFO[0039] Loaded images from /bootiso/rancheros/installer.tar.gz
> INFO[0039] starting installer container for rancher/os-installer:latest (new)
> DEBU[0039] Run(&{/bin/system-docker [system-docker run --rm --net=host --privileged -v /:/host --volumes-from=all-volumes rancher/os-installer:latest -t generic -d /dev/nvme0n1 -i rancher/os-installer:latest -f --no-reboot -c /opt/user_config.yml --isoinstallerloaded=1 --debug] []  <nil> <nil> <nil> [] <nil> <nil> <nil> <nil> <nil> false [] [] [] [] <nil> <nil>})
Installing from rancher/os-installer:latest
mount: /dev/sr0 is write-protected, mounting read-only
> time="2017-04-06T22:59:55Z" level=error msg="mkfs.ext4: exit status 1"
> time="2017-04-06T22:59:55Z" level=error msg="formatdevice exit status 1"
> time="2017-04-06T22:59:55Z" level=error msg="formatAndMount exit status 1"
> time="2017-04-06T22:59:55Z" level=error msg="error layDownOS exit status 1"
> time="2017-04-06T22:59:55Z" level=fatal msg="Failed to run install" err="exit status 1"
> FATA[0043] Failed to run install                         err="exit status 1"

And here is the whole process: delete the partition from the last attempt, verify there are no partitions, run the installer, then list the partitions and see that one was created:

 sudo fdisk -l /dev/nvme0n1
Disk /dev/nvme0n1: 119.2 GiB, 128035676160 bytes, 250069680 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x0465e265

Device         Boot Start       End   Sectors   Size Id Type
/dev/nvme0n1p1 *     2048 250066943 250064896 119.2G 83 Linux
[rancher@rancher ~]$ sudo fdisk /dev/nvme0n1

Welcome to fdisk (util-linux 2.28).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.


Command (m for help): d
Selected partition 1
Partition 1 has been deleted.

Command (m for help): w
The partition table has been altered.
Calling ioctl() to re-read partition table.
Syncing disks.
[rancher@rancher ~]$ sudo fdisk -l /dev/nvme0n1
Disk /dev/nvme0n1: 119.2 GiB, 128035676160 bytes, 250069680 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x0465e265
[rancher@rancher ~]$ sudo ros install -c cloud-config.yml -d /dev/nvme0n1 --debug -f
> INFO[0000] No install type specified...defaulting to generic
Installing from rancher/os:v0.9.2-rc1
> INFO[0000] start !isoinstallerloaded
> DEBU[0001] blkid type of /dev/sr0: iso9660
> DEBU[0001] mount (&exec.Cmd{Path:"/bin/mount", Args:[]string{"mount", "-t", "iso9660", "/dev/sr0", "/bootiso"}, Env:[]string(nil), Dir:"", Stdin:io.Reader(nil), Stdout:io.Writer(nil), Stderr:io.Writer(nil), ExtraFiles:[]*os.File(nil), SysProcAttr:(*syscall.SysProcAttr)(nil), Process:(*os.Process)(nil), ProcessState:(*os.ProcessState)(nil), ctx:context.Context(nil), lookPathErr:error(nil), finished:false, childFiles:[]*os.File(nil), closeAfterStart:[]io.Closer(nil), closeAfterWait:[]io.Closer(nil), goroutine:[]func() error(nil), errch:(chan error)(nil), waitDone:(chan struct {})(nil)})
> DEBU[0001] Mounted /dev/sr0
> INFO[0001] trying to load /bootiso/rancheros/installer.tar.gz
> INFO[0048] Loaded images from /bootiso/rancheros/installer.tar.gz
> INFO[0049] starting installer container for rancher/os-installer:latest (new)
> DEBU[0049] Run(&{/bin/system-docker [system-docker run --rm --net=host --privileged -v /:/host --volumes-from=all-volumes rancher/os-installer:latest -t generic -d /dev/nvme0n1 -i rancher/os-installer:latest -f --no-reboot -c /opt/user_config.yml --isoinstallerloaded=1 --debug] []  <nil> <nil> <nil> [] <nil> <nil> <nil> <nil> <nil> false [] [] [] [] <nil> <nil>})
Installing from rancher/os-installer:latest
mount: /dev/sr0 is write-protected, mounting read-only
> time="2017-04-06T23:16:17Z" level=error msg="mkfs.ext4: exit status 1"
> time="2017-04-06T23:16:17Z" level=error msg="formatdevice exit status 1"
> time="2017-04-06T23:16:17Z" level=error msg="formatAndMount exit status 1"
> time="2017-04-06T23:16:17Z" level=error msg="error layDownOS exit status 1"
> time="2017-04-06T23:16:17Z" level=fatal msg="Failed to run install" err="exit status 1"
[rancher@rancher ~]$ sudo fdisk -l /dev/nvme0n1
Disk /dev/nvme0n1: 119.2 GiB, 128035676160 bytes, 250069680 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0xfcecbb84

Device         Boot Start       End   Sectors   Size Id Type
/dev/nvme0n1p1 *     2048 250066943 250064896 119.2G 83 Linux
[rancher@rancher ~]$

Getting a little further. It looks like the lay-down-os script assumes the partition will be ${DEVICE}1. With the NVMe drive, the first partition is ${DEVICE}p1, so I built my own image that changes the partition suffix to p1, following your posts here: “GPT partition option”.

Then had issues with the script telling me ‘-’ was an invalid option but I couldn’t find a bad option in my command line so I just ran the container manually and then ran the script.

Had to change -t syslinux to -t generic. Getting further, but I’m being plagued with the invalid option errors:

sudo ros install -t generic -i jaknoll/os-installer -c cloud-config.yml -d /dev/nvme0n1 --debug -f
Installing from jaknoll/os-installer
> INFO[0000] start !isoinstallerloaded
> DEBU[0001] blkid type of /dev/sr0: iso9660
> DEBU[0001] mount (&exec.Cmd{Path:"/bin/mount", Args:[]string{"mount", "-t", "iso9660", "/dev/sr0", "/bootiso"}, Env:[]string(nil), Dir:"", Stdin:io.Reader(nil), Stdout:io.Writer(nil), Stderr:io.Writer(nil), ExtraFiles:[]*os.File(nil), SysProcAttr:(*syscall.SysProcAttr)(nil), Process:(*os.Process)(nil), ProcessState:(*os.ProcessState)(nil), ctx:context.Context(nil), lookPathErr:error(nil), finished:false, childFiles:[]*os.File(nil), closeAfterStart:[]io.Closer(nil), closeAfterWait:[]io.Closer(nil), goroutine:[]func() error(nil), errch:(chan error)(nil), waitDone:(chan struct {})(nil)})
> DEBU[0001] Mounted /dev/sr0
> INFO[0001] trying to load /bootiso/rancheros/installer.tar.gz
> INFO[0044] Loaded images from /bootiso/rancheros/installer.tar.gz
> INFO[0044] starting installer container for jaknoll/os-installer (new)
> DEBU[0044] Run(&{/bin/system-docker [system-docker run --rm --net=host --privileged -v /:/host --volumes-from=all-volumes jaknoll/os-installer -t generic -d /dev/nvme0n1 -i jaknoll/os-installer -f --no-reboot -c /opt/user_config.yml --isoinstallerloaded=1 --debug] []  <nil> <nil> <nil> [] <nil> <nil> <nil> <nil> <nil> false [] [] [] [] <nil> <nil>})
+ getopts i:f:c:d:t:r:o:p: OPTION
+ case $OPTION in
+ ENV=generic
+ getopts i:f:c:d:t:r:o:p: OPTION
+ case $OPTION in
+ DEVICE=/dev/nvme0n1
+ getopts i:f:c:d:t:r:o:p: OPTION
+ case $OPTION in
+ DIST=jaknoll/os-installer
+ getopts i:f:c:d:t:r:o:p: OPTION
+ case $OPTION in
+ FILES=--no-reboot
+ getopts i:f:c:d:t:r:o:p: OPTION
+ case $OPTION in
+ CLOUD_CONFIG=/opt/user_config.yml
+ getopts i:f:c:d:t:r:o:p: OPTION
/scripts/lay-down-os: illegal option -- -
+ case $OPTION in
+ exit 1
> FATA[0044] Failed to run install                         err="exit status 1"
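
For anyone puzzling over the trace above, here is a minimal sketch that appears to reproduce the ‘-’ error. It assumes the cause is the getopts string shown in the trace: "f:" declares that -f takes an argument, so -f swallows --no-reboot as FILES, and getopts then reads the leading dashes of --isoinstallerloaded=1 as an option named "-". The function name parse_args is hypothetical; only the getopts string is copied from the trace.

```sh
#!/bin/sh
# Reproduction sketch; parse_args is a hypothetical name, and only the
# getopts string is copied from the trace above.
parse_args() {
    OPTIND=1
    # stderr is silenced so only our own messages are printed
    while getopts "i:f:c:d:t:r:o:p:" OPTION "$@" 2>/dev/null; do
        case "$OPTION" in
            f) echo "FILES=$OPTARG" ;;          # "f:" makes -f consume the next word
            c) echo "CLOUD_CONFIG=$OPTARG" ;;
            \?) echo "illegal option"; return 1 ;;
        esac
    done
}

# Same argument order that the installer passed in the trace
parse_args -f --no-reboot -c /opt/user_config.yml --isoinstallerloaded=1 || echo "parse failed"
```

If this reading is right, the error comes from the script's option table not matching the arguments that ros passes, rather than from anything typed on the command line.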

Final follow up - finally got things to install (I think) but unfortunately my BIOS doesn’t support booting from NVMe. When I get a chance I’ll migrate to another system that does and see if things were successful.

These errors look like you’re using RancherOS 0.7.1 or earlier. The installer was rewritten in 0.8, so things are similar, but different.

From some of my posts you can see I started with 0.9.2-rc1.

I think that aside from the partition issue most of my problems were user error. I think in the rc1 version the lay-down-os script has a -p option to specify the partition, but it’s not fully plumbed through, and the script overwrites it with ${DEVICE}1. I think a simple fix would be to check the last character of the device name. If it ends in a digit (as nvme0n1 does), append p1 for the partition. Otherwise, append 1.
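
The suggested check could be sketched roughly as follows. This is an illustration only, not code from the lay-down-os script; the function name partition_for is hypothetical, and it only handles the first partition.

```sh
#!/bin/sh
# Hypothetical helper, not taken from the lay-down-os script.
# Devices whose names end in a digit (nvme0n1, mmcblk0) get a "p"
# separator before the partition number; others (sda) do not.
partition_for() {
    device=$1
    case "$device" in
        *[0-9]) echo "${device}p1" ;;
        *)      echo "${device}1" ;;
    esac
}

partition_for /dev/nvme0n1
partition_for /dev/sda
```

The first call prints /dev/nvme0n1p1 and the second prints /dev/sda1.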

See also https://github.com/rancher/os/issues/742

As mentioned in the issue, I struggled with this as well, but when looking at the link supplied by @sixcorners I noticed the partition flag. So, as a workaround, you can use the -p flag to specify the partition to install to.

I got it working on 1.0.0-rc1, but it probably works in 1.0.0 as well.

Just to confirm, that did work with a later version.

Here is what I did to get my NVME drive working.

sudo fdisk /dev/nvme0n1

Delete any existing partitions with "d" until none remain
Create a new GPT label with "g"
Create a new partition with "n"
Answer no with "N" when prompted
Write the changes with "w"

sudo mkfs.ext4 -F -i 4096 -O ^64bit -L RANCHER_STATE /dev/nvme0n1p1

sudo ros install -t gptsyslinux -c cloud-config.yml -d /dev/nvme0n1 -p /dev/nvme0n1p1
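
Since several of the failures in this thread came down to the partition node not being where the installer expected it, a small pre-flight check before that last command might help. This is a sketch only; the function name preflight is hypothetical and not part of RancherOS. It just confirms that the partition exists as a block device and then prints the install command from the steps above, without running it.

```sh
#!/bin/sh
# Hypothetical pre-flight helper; not part of RancherOS.
# Prints the install command only if the partition node exists.
preflight() {
    device=$1
    partition=$2
    if [ ! -b "$partition" ]; then
        echo "missing block device: $partition" >&2
        return 1
    fi
    echo "sudo ros install -t gptsyslinux -c cloud-config.yml -d $device -p $partition"
}
```

It would be called as: preflight /dev/nvme0n1 /dev/nvme0n1p1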