Thursday, February 3, 2011

Migrating to pv-grub kernels for kernel upgrades

After the release of the Ubuntu 10.04 LTS images that use the pv-grub kernel, there were some questions on what to do if you're running from an older AMI and want to take advantage of what pv-grub offers.

Here's a short list of how you might be affected:

  • If you are running an EBS root instance launched from or rebundled from an Official Ubuntu image of 10.04 LTS (Lucid Lynx) released 20110201.1 or later, 10.10 (Maverick Meerkat), Natty Narwhal, or later, then you do not need to do anything. You already are using pv-grub, and can simply apply software updates and reboot to get new kernel updates. You can stop reading now.
  • If you are running an instance-store instance that is not using a pv-grub kernel, there is nothing you can do. There is simply no way to change the kernel of an instance store instance.
  • If you are running an EBS-root based instance rebundled from an Ubuntu 9.10 (Karmic Koala) or older image, then there is currently no supported path to getting kernel upgrades. There were no officially released EBS-root based Ubuntu images of 9.10, and with Karmic's end of life coming in April, support for this new feature is unlikely.
  • If you are running an EBS-root instance launched from or rebundled from an official Ubuntu release of 10.04, read on.

Updating a 10.04-based image entails two steps: setting up /boot/grub/menu.lst, and then modifying your instance to use a pv-grub kernel.


Step 1: installing grub-legacy-ec2.

If you launched or rebundled your instance from an Ubuntu 10.04 image with build serial 20101020 or earlier, you need to do this step. If you started from the 20101228 release, you can skip it.

  • Apply software updates.

    Depending on how out of date you are, this might take a while.


    sudo apt-get update && sudo apt-get dist-upgrade

  • Install grub-legacy-ec2

    The 'grub-legacy-ec2' package is what the images use to manage /boot/grub/menu.lst. If you used Ubuntu before grub2 became the default, you will be familiar with how it works. grub-legacy-ec2 is basically just the menu.lst managing portion of the Ubuntu grub 0.97 package with some EC2 specifics thrown in.

    To get a functional /boot/grub/menu.lst, all you have to do is:


    sudo apt-get install grub-legacy-ec2
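To confirm that the package produced a usable menu, you can list the kernel images that menu.lst refers to. The following is a minimal sketch and not part of grub-legacy-ec2: the function name menu_lst_kernels is made up for illustration, and it assumes the usual grub 0.97 layout in which each boot entry has a line whose first word is 'kernel'.

```shell
#!/bin/sh
# Hypothetical helper: print the kernel image path of every boot entry
# in a grub-legacy menu.lst. Commented-out lines (starting with '#')
# are skipped because their first field is '#', not 'kernel'.
menu_lst_kernels() {
    # $1: path to menu.lst (defaults to /boot/grub/menu.lst)
    awk '$1 == "kernel" { print $2 }' "${1:-/boot/grub/menu.lst}"
}
```

Running menu_lst_kernels with no arguments reads /boot/grub/menu.lst. If it prints nothing, the menu has no boot entries and is worth a look before you move on to step 2.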


Step 2: modifying the instance to use pv-grub kernels

Your image should now have a functional /boot/grub/menu.lst, and grub-legacy-ec2 should be installed so that future kernels are automatically added and selected on reboot. However, you still have to change your instance to boot using pv-grub rather than the old kernel (AKI) that you originally started with.
  • Shut down the instance

    The best way to do this is probably to just issue '/sbin/poweroff' inside the instance. Alternatively, you could use the ec2 api tools, or do so from the AWS console.

    $ sudo /sbin/poweroff

  • Modify the instance's kernel to be a pv-grub kernel

    Once the instance has moved to the "stopped" state, you can change its kernel to a pv-grub kernel. The kernel you select depends on the arch and region. Use the table below to pick the right one:
    region          arch    aki id
    ap-southeast-1  x86_64  aki-11d5aa43
    ap-southeast-1  i386    aki-13d5aa41
    eu-west-1       x86_64  aki-4feec43b
    eu-west-1       i386    aki-4deec439
    us-east-1       x86_64  aki-427d952b
    us-east-1       i386    aki-407d9529
    us-west-1       x86_64  aki-9ba0f1de
    us-west-1       i386    aki-99a0f1dc

    Then, assuming $AKI is the appropriate aki from the table above, $IID is your instance id, and $REGION is your region, you can update the instance and then start it with:

    $ ec2-modify-instance-attribute --region ${REGION} --kernel ${AKI} ${IID}
    $ ec2-start-instances --region ${REGION} ${IID}
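If you manage instances in several regions, the table above can be wrapped in a small lookup so that $AKI is not copied by hand. This is only a sketch: pvgrub_aki and pvgrub_arch are hypothetical helper names, the aki ids are just the ones listed in this post, and the arch mapping assumes that 'uname -m' reports x86_64 on 64-bit instances and i386 through i686 on 32-bit ones.

```shell
#!/bin/sh
# Hypothetical helper: print the pv-grub aki id for a region/arch pair,
# using only the table from this post. Returns non-zero if unknown.
pvgrub_aki() {
    # $1: region (e.g. us-east-1), $2: arch (x86_64 or i386)
    case "$1/$2" in
        ap-southeast-1/x86_64) echo aki-11d5aa43 ;;
        ap-southeast-1/i386)   echo aki-13d5aa41 ;;
        eu-west-1/x86_64)      echo aki-4feec43b ;;
        eu-west-1/i386)        echo aki-4deec439 ;;
        us-east-1/x86_64)      echo aki-427d952b ;;
        us-east-1/i386)        echo aki-407d9529 ;;
        us-west-1/x86_64)      echo aki-9ba0f1de ;;
        us-west-1/i386)        echo aki-99a0f1dc ;;
        *) echo "no pv-grub aki listed for region=$1 arch=$2" >&2
           return 1 ;;
    esac
}

# Hypothetical helper: map a machine name (default: `uname -m` of the
# current host) to the arch names used in the table.
pvgrub_arch() {
    case "${1:-$(uname -m)}" in
        x86_64) echo x86_64 ;;
        i?86)   echo i386 ;;
        *)      return 1 ;;
    esac
}
```

Run pvgrub_arch inside the instance before you shut it down, then set AKI=$(pvgrub_aki "$REGION" "$ARCH") on the machine where you run the ec2 api tools. Checking the arch first matters because a 64 bit pv-grub kernel assigned to a 32 bit instance will not boot.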


Your instance will start with a new hostname/IP address, so get that from ec2-describe-instances and ssh to your instance. You can check that it worked by looking at /proc/cmdline. Your kernel command line should look something like this:


$ cat /proc/cmdline
root=UUID=7233f657-c156-48fe-8d60-31ae6400d0cf ro console=hvc0
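A quick way to script that check is to test the command line for the two pieces shown in the example: a root=UUID= argument and console=hvc0. This is a rough sketch under the assumption that your command line has the same shape as the example. A customized menu.lst could legitimately differ, and the function name cmdline_matches_example is invented here.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a kernel command line contains both a
# root=UUID=... argument and console=hvc0, as in the example above.
cmdline_matches_example() {
    # $1: command line text (defaults to the contents of /proc/cmdline)
    cmdline_text=${1:-$(cat /proc/cmdline)}
    # Pad with spaces so each argument can be matched as a whole word.
    case " $cmdline_text " in
        *" root=UUID="*) ;;
        *) return 1 ;;
    esac
    case " $cmdline_text " in
        *" console=hvc0 "*) return 0 ;;
        *) return 1 ;;
    esac
}
```

On the instance, something like 'cmdline_matches_example && echo ok' is enough. A non-zero exit only means the command line differs from the example, not that the boot failed.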


From now on, your instance will behave much more like a "normal server". If you apply software updates (apt-get dist-upgrade) and reboot, you'll boot into a fresh new kernel.

12 comments:

  1. Probably a really dumb question, but what about the /proc/cmdline output means success? I had updated to a newer kernel as described in another of your articles, and I didn't see the kernel change in uname -r.

  2. Tested and works well. Thanks.

  3. Hi Scott,

    Thanks for putting this guide together. I'm hopeful it will help me ease the upgrade burden for my EC2 servers.

    I just completed everything you described here but I'm still getting the following notice when I log in to my server:

    60 packages can be updated.
    30 updates are security updates.

    A newer build of the Ubuntu lucid server image is available.
    It is named 'release' and has build serial '20101228'.

    However, now when I do a 'sudo apt-get update && sudo apt-get dist-upgrade' it says there is nothing to upgrade.

    ubuntu@...:~$ uname -r
    2.6.32-314-ec2
    ubuntu@...:~$ cat /proc/cmdline
    root=UUID=53b9a4cb-3e23-4050-af81-a2c378040fff ro console=hvc0
    ubuntu@...:~$ sudo apt-get update

    ubuntu@...:~$ sudo apt-get dist-upgrade
    Reading package lists... Done
    Building dependency tree
    Reading state information... Done
    Calculating upgrade... Done
    0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.

    Have I missed something? I'm not sure why the login notice is contradicting the apt-get output.

    Thanks for your help!

  4. Hey Scott, What does it mean when you get something like this in the console output? This is happening to me on two instances, and I've been struggling for a couple days now to figure out what's causing it. It looks like the instance can't find the kernel or something, and is just falling back to the grub prompt:

    ******************* BLKFRONT for device/vbd/2051 **********
    backend at /local/domain/0/backend/vbd/186/2051
    Failed to read /local/domain/0/backend/vbd/186/2051/feature-barrier.
    Failed to read /local/domain/0/backend/vbd/186/2051/feature-flush-cache.
    1835008 sectors of 0 bytes
    **************************
    GNU GRUB version 0.97 (1781760K lower / 0K upper memory)
    [ Minimal BASH-like line editing is supported. For
    the first word, TAB lists possible command
    completions. Anywhere else TAB lists the possible
    completions of a device/filename. ]
    grubdom>
  5. I did everything you mentioned in your post. Executing the ec2-modify-instance-attribute command resulted in this:

    Unknown host: 'https://ec2.west-1.amazonaws.com'

    Nevertheless, the Kernel ID changed and everything seems to be fine. But after SSHing into my instance I still get

    A newer build of the Ubuntu lucid server image is available.
    It is named 'release' and has build serial '20110719'.

    Any ideas? Thanks.

  6. @Ulf,
    I suspect that you did 'REGION=west-1' rather than REGION=us-west-1 (which is why you see the 'Unknown host' error).
    Regarding the "newer image available", this is http://pad.lv/653220 . You can safely ignore this.

  7. My instance is not coming back up. ec2-get-console-output shows:

    Booting 'Ubuntu 10.04.3 LTS, kernel 2.6.32-318-ec2'

    root (hd0)
    Filesystem type is ext2fs, using whole disk
    kernel /boot/vmlinuz-2.6.32-318-ec2 root=UUID=7233f657-c156-48fe-8d60-31ae6400
    d0cf ro console=hvc0
    initrd /boot/initrd.img-2.6.32-318-ec2

    xc_dom_probe_bzimage_kernel: kernel is not a bzImage
    can only boot x86 64 kernels, not xen-3.0-x86_32p

    Error 13: Invalid or unsupported executable format

    Press any key to continue...

  8. Tim O,
    You assigned the 64 bit pv-grub kernel to a 32 bit instance.

  9. Scott,

    You're right. For some reason, I thought it was a 64-bit instance when it's actually 32-bit. I switched to using the 32-bit kernel and it works now. Thanks!

    1. Hi,
      1. I have installed Slackware 14.0 (64-bit) on my local machine.
      2. I created a 10 GB image file on the Slackware machine with the command below:
         dd if=/dev/zero of=slack14.img bs=1M count=10075
         and mounted slack14.img with:
         mount -o loop slack14.img /mnt/slack1464
      3. I formatted the image (slack14.img).
      4. I installed the custom package through a Ruby script. The custom package installed without any error.
      After that, I logged in to the mounted image (/mnt/slack1464) and installed the package.

      Then I bundled the image with this AKI (aki-427d952b) and uploaded it to Amazon.

      When I start the instance, I am unable to log in to the server. I get the error below:

      [6535502.145187] ip_tables: (C) 2000-2006 Netfilter Core Team
      [6535502.145235] TCP cubic registered
      [6535502.145244] NET: Registered protocol family 17
      [6535502.245110] XENBUS: Device with no driver: device/console/0
      [6535502.247428] EXT3-fs: barriers not enabled
      [6535502.257460] EXT3-fs (xvda1): mounted filesystem with writeback data mode
      [6535502.257484] VFS: Mounted root (ext3 filesystem) readonly on device 202:1.
      [6535502.257779] Freeing unused kernel memory: 484k freed
      [6535502.257953] kjournald starting. Commit interval 5 seconds
      [6535502.471724] mount used greatest stack depth: 4296 bytes left
      [6535512.662690] touch used greatest stack depth: 4120 bytes left
      [6535576.762574] xenbus_dev_shutdown: device/console/0: Initialising != Connected, skipping
      [6535577.114477] Restarting system.

      Earlier, I created .img images for Slackware 13.1 and 13.37 without any error.

      I only get this error with Slackware 14.0, on both 64-bit and 32-bit.

      Thanks in advance.
      David

    2. David,
      I'm not sure. It would take looking at it more, but it looks like your kernel got loaded and even mounted root. It looks like you may not have the correct 'console=' parameter and you're losing data after /sbin/init is invoked, but I'm not sure.
