I am dealing with Cloud 5 and have some trouble creating instances. When creating a VM I assign a floating IP to it, and if I have multiple VMs running in the same network they can ping each other. But why does the VM itself still have no IP? When I log into an instance, I have to change the network settings manually before I can log in via SSH from the console. That does not seem very practical; is that really the workflow? I thought the IP address would be kind of “injected” automatically, but it’s not, am I right?
I just wanted to post my findings from last week. We are fighting with our cloud environment and have an open Service Request for an issue where openvswitch causes problems.
I didn’t count how many times I re-installed our cloud nodes from scratch in the last two weeks, but last week something strange happened. I don’t know why, and I haven’t been able to reproduce it since, but one installation worked and I had the chance to see what the network configuration is supposed to look like.
The difference between our settings when I posted this question and last week lies in the neutron barclamp. When we set up our first environments we used “linuxbridge” as the Modular Layer 2 mechanism driver, but the instances got no IP and a lot of manual modification was required to get everything working as desired. Then we tried openvswitch with VLAN, but we are facing major problems while trying to deploy the nova barclamp. As I already mentioned, we got it working only once, and it is not reproducible yet. Anyway, with openvswitch the instances get their IPs injected and you can see them directly in the login prompt. If you configure your security groups and rules correctly, you can access the instances via SSH etc. So if, and only if, you get your cloud working with openvswitch, the cloud handles all the traffic correctly and fully automatically.
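For anyone comparing the two setups: the mechanism driver ends up in the ML2 plugin configuration on the neutron node. The snippet below is only an illustrative sketch of the relevant section (the file path, physical network name, and VLAN range are examples; the barclamp normally writes this file for you):

```ini
# /etc/neutron/plugins/ml2/ml2_conf.ini (illustrative values)
[ml2]
type_drivers = vlan
tenant_network_types = vlan
# this is the setting we changed between attempts:
mechanism_drivers = openvswitch

[ml2_type_vlan]
# "physnet1" and the VLAN ID range are placeholders for your environment
network_vlan_ranges = physnet1:100:199
```

Switching back to linuxbridge would mean setting `mechanism_drivers = linuxbridge` here instead.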
I just wish there were a hint, a recommendation, or a description of which settings should be used when deploying neutron. Anyway, now I understand the description in Figure 12.2: http://docs.openstack.org/openstack-ops/content/network_troubleshooting.html
The configuration on the control (network) and compute nodes looked exactly as described.
Now I just need to get the cloud working again, which seems to be quite difficult; our SR has been open for weeks now.
A minor update on the issue: we’ve switched back to “linuxbridge” and found out that the VLAN interface on the compute node doesn’t get created automatically, and hence apparently isn’t added to the bridge. We’re using bonding of two Ethernet adapters; maybe that’s part of the problem.
We’re working with SUSE to resolve the issue and will report back once our cloud is operational.
Just a little comment on the latest findings:
The problems with our compute node are a result of our attempt to run it with SLES12. When we install SLES11 on the compute node, too, the cloud works perfectly! All instances get their IPs, and you can assign floating IPs and access them via SSH (if you have configured the security rules). We are still working with Support to get it running with SLES12.
Update: We found the solution!
Support pointed me in the right direction. I had to edit the file /usr/lib64/python2.6/site-packages/neutron/plugins/linuxbridge/agent/linuxbridge_neutron_agent.py to add a log entry, but I couldn’t see any more output. In linuxbridge_neutron_agent.py on the SLES11 compute node there is a hint:
I compared linuxbridge_neutron_agent.py on SLES11 with the same file on SLES12 (the directory is slightly different: /usr/lib/python2.7/site-packages/neutron/plugins/linuxbridge/agent), and strangely, this directory was not used on SLES12. The directory used there was /sys/devices/virtual/net. If you compare the contents of the two directories you’ll see the difference:
root@d0c-c4-7a-06-71-f0:/sys/devices/virtual/net # ll
drwxr-xr-x 6 root root 0 Aug 24 09:02 bond0
drwxr-xr-x 7 root root 0 Aug 24 09:03 brqe423e6c2-54
drwxr-xr-x 5 root root 0 Aug 24 09:01 lo
But if you look into /sys/class/net:
root@d0c-c4-7a-06-71-f0:/sys/class/net # ll
lrwxrwxrwx 1 root root 0 Aug 24 09:02 bond0 -> ../../devices/virtual/net/bond0
-rw-r--r-- 1 root root 4096 Aug 24 09:02 bonding_masters
lrwxrwxrwx 1 root root 0 Aug 24 09:03 brqe423e6c2-54 -> ../../devices/virtual/net/brqe423e6c2-54
lrwxrwxrwx 1 root root 0 Aug 24 09:02 eth0 -> ../../devices/pci0000:00/0000:00:01.1/0000:02:00.0/net/eth0
lrwxrwxrwx 1 root root 0 Aug 24 09:02 eth1 -> ../../devices/pci0000:00/0000:00:01.1/0000:02:00.1/net/eth1
lrwxrwxrwx 1 root root 0 Aug 24 09:01 lo -> ../../devices/virtual/net/lo
So I edited linuxbridge_neutron_agent.py on SLES12 to use the correct directory (/sys/class/net) and now it works! Support says the fix will make it into the Update channels soon.
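To make the difference between the two directories concrete, here is a minimal sketch of an interface-existence check like the one the agent performs (the function name and signature are illustrative, not the agent’s actual code). The key point: /sys/class/net lists all interfaces, physical and virtual, while /sys/devices/virtual/net only contains virtual devices such as lo, bridges, and bonds, so a physical NIC like eth0 can never be found there:

```python
import os

# Correct base path: contains physical AND virtual interfaces.
SYS_CLASS_NET = "/sys/class/net"
# The path SLES12 used: contains ONLY virtual interfaces (lo, bridges, bonds).
VIRTUAL_NET = "/sys/devices/virtual/net"

def device_exists(device, base_path=SYS_CLASS_NET):
    """Return True if a network interface is visible under base_path."""
    return os.path.isdir(os.path.join(base_path, device))
```

On a node like the one in the listings above, device_exists("eth0") is True with the /sys/class/net base path but False with /sys/devices/virtual/net, which is exactly why the agent failed to find the physical interface on SLES12.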