Monday, January 25, 2010

Procedure to add a swap file

Use the dd command to create the swap file, then use the mkswap command to set up a Linux swap area on the file.

a) Login as the root user

b) Type the following command to create a 512 MB swap file (block size 1024 bytes x 524288 blocks = 512 MB):

# dd if=/dev/zero of=/swapfile1 bs=1024 count=524288

c) Set up a Linux swap area:

# mkswap /swapfile1

d) Activate /swapfile1 swap space immediately:

# swapon /swapfile1

e) To activate /swapfile1 automatically after a reboot, add an entry to the /etc/fstab file. Open the file with a text editor such as vi:

# vi /etc/fstab

Append the following line:

/swapfile1 swap swap defaults 0 0

The next time Linux comes up after a reboot, it will enable the new swap file for you automatically.

f) How do I verify that swap is activated?

Simply use the free command:

$ free -m
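
You can also list the active swap areas directly; either of the following shows /swapfile1 once it is enabled:

# swapon -s

# cat /proc/swaps

As an extra precaution (not part of the original steps above), it is a good idea to restrict access to the swap file, ideally before running mkswap, so that ordinary users cannot read swapped-out memory:

# chmod 600 /swapfile1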

Wednesday, January 13, 2010

Recover a dead hard drive using dd

The Unix program dd is a disk-copying utility that you can use at the command line to make a disk image. It makes a bit-by-bit copy of the drive it's copying, caring nothing about filesystem type, files, or anything else. It's a great way to work around the need for Norton Ghost.

Normally, in order to make a disk image, the disk you're copying from has to be able to spin up and talk -- in other words, it's OK to make a copy if the disk is healthy. But what happens when your disk is becoming a doorstop? As long as it continues to spin, even with physical damage on the drive, dd and Mac OS X will get you out of the fire.

We had a situation recently where a friend sent a disk to us that had hard physical errors on it. It would boot in Windows, but then it would hit one of these scratch marks and just die. We fired up dd, and it started OK, but stopped at the same physical error location -- complaining about a Hard Error.

So the workaround was to pass dd the conversion option noerror, which just slides over the hard stops, and to add sync, which pads the image with nulls at those points. We did it on BSD Unix, but as long as you can get the hard drive attached to your Mac, the command is the same:
dd bs=512 if=/dev/rXX# of=/some_dir/foo.dmg conv=noerror,sync
The bs=512 designates block size, and the if=/dev/rXX# is the UNIX path to the actual disk device. Make sure that the chosen directory (some_dir) has enough room to take the entire disk image -- which will be equal to the size of the drive. Since dd doesn't care about the contents of the drive, it copies every bit on the thing, so you get an image equal to the disk's capacity. A really big file. One workaround is to put it on a RAID array.
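
Another option, if you do not mind decompressing the image before mounting it, is to compress on the fly; this is only a sketch, and foo.dmg.gz is a made-up name. With no of= argument, dd writes to standard output, so it can be piped straight into gzip:

dd bs=512 if=/dev/rXX# conv=noerror,sync | gzip -c > /some_dir/foo.dmg.gz

Because the bad areas are padded with nulls and empty space compresses well, the compressed image is usually much smaller than the raw one.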

Once you've established the disk image (in this example, foo.dmg), you're almost home. Here's where your Mac OS X box is far and away the best thing to have. In this example, the dd output file is foo.dmg. You have to realize that this is an exact copy of a busted drive, but the "holes" are filled with nulls. As long as the damage isn't to the boot sector, though, when you double-click on it, Mac OS X mounts it without breathing hard ... who cares if it's FAT32, NTFS, whatever.

Due to the size of the image that we were copying, we put it on a RAID array, and had to access the image over the network -- it still mounted fine. In straight UNIX, if you try to mount a disk image, it complains that there is "no block device" and fails. Once your image is mounted, it appears in your Finder, and then it's easy work to retrieve the critical files from the image -- usually things like .doc files and .xls files and the lot.
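
On Linux the trick is to attach the image to a loop device when mounting; this is just a sketch, with foo.dmg and /mnt/recovered as placeholder names. If the image holds a single bare filesystem:

mkdir -p /mnt/recovered
mount -o loop,ro foo.dmg /mnt/recovered

If the image covers a whole disk with a partition table, find the partition's start sector first (fdisk -lu foo.dmg) and pass the byte offset; for example, for a partition starting at sector 63:

mount -o loop,ro,offset=$((63*512)) foo.dmg /mnt/recovered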

Finally, since your disk is actually dying, once you have your image, you can drop it to tape or something and you've not only recovered your files, you've made a viable backup as well. Once again, that which destroys a Windows box becomes a play thing to a Mac OS X box.


Help:
http://www.macosxhints.com/article.php?story=20050302225659382
http://www.unix-tutorials.com/search.php?act=search&term=Recover+a+dead+hard+drive+using+dd
http://tldp.org/LDP/LG/issue46/nielsen.html
http://www.cgsecurity.org/wiki/Damaged_Hard_Disk
http://www.debianadmin.com/recover-data-from-a-dead-hard-drive-using-ddrescue.html
http://www.pcstats.com/articleview.cfm?articleid=1583&page=5

Diagnostic Tools:
http://www.dataclinic.co.uk/scandisk-chkdsk-disk-checking-repair-software.htm
http://www.pcstats.com/articleview.cfm?articleid=1583&page=2
http://www.pcstats.com/articleview.cfm?articleid=1583&page=9


Different Types Of Hard Disk Failure:

http://www.dataclinic.co.uk/hard-disk-failures.htm
http://www.streetdirectory.com/travel_guide/124429/hardware/computer_hard_disk_failure___what_can_i_do_to_recover_my_data.html
http://data-recovery.mirandasbeach.com/
http://www.hard-drive-recovery-software.com/blog/hard-disk-failure-types-and-hard-drive-recovery-possibilities/

Raid:
http://en.wikipedia.org/wiki/RAID
http://raidcalculator.icc-usa.com/

Minimizing Hard Disk Drive Failure and Data Loss:

http://en.wikibooks.org/wiki/Minimizing_Hard_Disk_Drive_Failure_and_Data_Loss

Linux Software Raid How-To with GRUB Bootloader

Install and Prep

During the install, create the individual software RAID partitions on the drives first. It is easiest to deselect the drives you do not want each partition on while creating the individual partitions on specific drives (e.g. sda1 and sdb1, with the type set to software RAID). Then click the RAID button in Disk Druid; it will only offer those two partitions. Set your RAID level, ext3, and your partition label, then click OK. Tag the next two partitions and follow the same process.

The reason for being so meticulous is that this way the RAID partitions match up across the drives instead of ending up out of order. If you just create partitions and let Disk Druid do whatever it wants, they wind up out of order, which is a big headache when replacing drives and initiating rebuilds on partitions. (Trust me on this.)

After the install is complete, create a directory called /raidinfo in the root filesystem and place some information in it that will make replacing and rebuilding drives much easier if you have a failure.
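
Creating it is a single command:

#mkdir /raidinfo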

Prepping the System in Case of Drive Failure

First, back up the drive partition tables, which takes one simple command per drive:

#sfdisk -d /dev/sda > /raidinfo/partitions.sda
#sfdisk -d /dev/sdb > /raidinfo/partitions.sdb

Do this for every drive in the system so that you have a backup of each drive's partition table. When you have a new drive to replace a failed one, it is then very easy to load the saved partition table onto the new drive:

#sfdisk /dev/sda < /raidinfo/partitions.sda

This partitions the new drive with exactly the same partition table as before, so all the rebuild and RAID partitions end up in exactly the same places and you do not have to edit any RAID configuration files at all.
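
It can also be worth dropping a snapshot of the RAID layout itself into /raidinfo. This is a suggestion beyond the original steps, assuming a raidtools-style setup with /etc/raidtab (an mdadm-based system would save its mdadm.conf instead):

#cp /etc/raidtab /raidinfo/
#cat /proc/mdstat > /raidinfo/mdstat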

PROBLEM: During the install, GRUB is only installed on id0 for SCSI drives (or on /dev/hda for IDE drives). So if that first drive fails and you pull it, you have problems.

SOLUTION: Run grub and install it onto every other drive that carries one of the /boot RAID partitions. At present, RAID1 is the only Linux software RAID level that /boot can live on.

Installing GRUB Onto Drives that have /boot Raid Partitions

Type grub at the prompt to enter GRUB's shell:
#grub
grub>

grub> find /grub/stage1

This shows where all the GRUB setup files are located. On my RAID1 mirror system it listed:
(hd0,0)
(hd1,0)

Red Hat mounts your /boot partition as GRUB's root partition (this is Red Hat specific). What the find command lists are the locations of GRUB's root, which is GRUB's way of saying where the /boot partition lives. In GRUB's naming, sda=hd0 and sdb=hd1 for SCSI (hda=hd0 and hdb=hd1 for IDE), and the second number is the partition number, counting from 0. So the output above means the GRUB setup files were found on sda1 and sdb1 (SCSI) or hda1 and hdb1 (IDE).

Now you want to make sure GRUB gets installed into the master boot record of your additional RAID drives, so that if id0 is gone, the next drive has an MBR with GRUB ready to go. Both IDE and SCSI systems go through the drives in order and use the first MBR and active partition they find, so having multiple drives with MBRs and active partitions does not affect booting at all.

Using the output of the find command above, and knowing that hd0 already has GRUB in its MBR, we then run:

grub> device (hd0) /dev/sdb    (use /dev/hdb for IDE)
grub> root (hd0,0)
grub> setup (hd0)

GRUB will then print the commands it runs behind the scenes during setup; the embed and install commands should both finish with "succeeded", followed by "Done", before returning to the grub> prompt.

Notice that we mapped the second drive to device 0. Why? Because device 0 is the one GRUB writes the MBR to, so these commands temporarily treat the second mirror drive as hd0 and put a bootable MBR on it. When you quit grub, the original MBR on sda is untouched, and the system will keep booting from it until that drive goes missing.

You have now installed GRUB into the MBR of your other mirrored drive and marked its boot partition active as well. This ensures that if id0 fails, you can still boot the OS with id0 pulled and do not need an emergency boot floppy.
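
If you have more than one extra mirror member, the same steps can be scripted non-interactively with GRUB's batch mode; this is only a sketch for the two-disk SCSI layout above:

#grub --batch <<EOF
device (hd0) /dev/sdb
root (hd0,0)
setup (hd0)
quit
EOF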

Replacing Drives Before or After Failure

Remember saving the partition tables while prepping the system above? Those backups come in handy here and make life much easier when you have to replace a drive.

There is a lot of mythology about Linux software RAID being difficult and not supporting hot swap of drives. That is exactly what it is: a myth. Since most SCSI controllers are Adaptec, and the Adaptec driver supports hot swap through the Linux SCSI layer, you can echo a device out of and back into the /proc filesystem and the kernel will spin down or spin up the corresponding drive.

To get the device numbers you need to pass to the proc filesystem, run:

#cat /proc/scsi/scsi

This lists all SCSI devices Linux sees at that moment. Let's say the aic7xxx module is raising a SMART alert for a drive you want to replace, and the system is running software RAID1.

Output of the /proc/scsi/scsi is:

Host: scsi0 Channel: 00 Id: 00 Lun: 00
Vendor: Seagate Model: and so on
Host: scsi0 Channel: 00 Id: 01 Lun: 00
Vendor: Seagate and so on.

Id 1 is the drive raising the SMART alerts in the logs, and you have an identical drive on hand to replace it with.

You will need to pass this command to remove id1 from the system:

#echo "scsi remove-single-device 0 0 1 0" > /proc/scsi/scsi

You will then get a message saying the drive is spinning down, telling you which dev it is, and reporting when it has completed. Once it has, you can pull the drive and put in the new one.

The reverse command spins the drive up and makes it available to Linux:

#echo "scsi add-single-device 0 0 1 0" > /proc/scsi/scsi

This prints the dev name, reports that the drive is spinning up, shows the queue and tag information, and finally reports ready along with the size and device name.

You would then use sfdisk to reload the partition table onto the new drive. (Word of caution: if the system was rebooted with id0 missing, id1 is now sda, not sdb.)

#sfdisk /dev/sdb < /raidinfo/partitions.sdb

This repartitions the drive exactly as it was partitioned before, and you are now ready to recreate the RAID partitions that were previously on id1.

Running Raid Commands to Recover Partitions

The commands to rebuild or remove RAID partitions are rather simple. You just have to know where to look to check the health of the RAID partitions and which /dev/sda and /dev/sdb partitions are assigned to which md RAID device. You can see that in the /proc filesystem, just as you did with SCSI; the file is mdstat, at the root of /proc:

#less /proc/mdstat

md1 : active raid1 sda1[1]
40064 blocks [2/1] [_U]

It lists all the md devices, which are the RAID arrays, along with the /dev/sdXX partitions backing them. Notice the [2/1], which indicates that one of the two mirrored partitions is missing: the one that was on the other drive, sdb1. The [1] after sda1 denotes that this is the id1 drive. Adding the partition back into the array is done with the raidhotadd command:

#raidhotadd /dev/md1 /dev/sdb1

Now less /proc/mdstat shows:

md1 : active raid1 sdb1[0] sda1[1]
40064 blocks [2/2] [UU]

Now both partitions are shown, with slot 0 (sdb1) listed first. The [UU] designates that both halves of the mirror are up; the earlier [_U] meant the first half was missing. You can queue up all the partition rebuilds and they will be processed one by one in the order you queued them. Partitions still waiting show up with a 2 in brackets, e.g. sda5[2], the 2 denoting a spare waiting to be rebuilt.
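
Note that raidhotadd comes from the older raidtools package. On a system that ships mdadm instead (like the one in the next section), the equivalent hot-add, assuming the same device names as above, would be:

#mdadm --manage /dev/md1 --add /dev/sdb1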

If you've put in a new drive, don't forget to run the GRUB commands above to place GRUB in its MBR so that you can boot from it later if necessary.

Congratulations, you have now installed, worked with, and restored software RAID in Linux.

Replacing A Failed Hard Drive In A Software RAID1 Array

This guide shows how to remove a failed hard drive from a Linux RAID1 array (software RAID), and how to add a new hard disk to the RAID1 array without losing data.

I do not issue any guarantee that this will work for you!

1 Preliminary Note

In this example I have two hard drives, /dev/sda and /dev/sdb, with the partitions /dev/sda1 and /dev/sda2 as well as /dev/sdb1 and /dev/sdb2.

/dev/sda1 and /dev/sdb1 make up the RAID1 array /dev/md0.

/dev/sda2 and /dev/sdb2 make up the RAID1 array /dev/md1.

/dev/sda1 + /dev/sdb1 = /dev/md0

/dev/sda2 + /dev/sdb2 = /dev/md1

/dev/sdb has failed, and we want to replace it.

2 How Do I Tell If A Hard Disk Has Failed?

If a disk has failed, you will probably find a lot of error messages in the log files, e.g. /var/log/messages or /var/log/syslog.

You can also run

cat /proc/mdstat

and instead of the string [UU] you will see [U_] if you have a degraded RAID1 array.
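
You can also query a single array directly with mdadm; assuming /dev/md0 as in this example, a failed or removed member shows up at the bottom of the device list:

mdadm --detail /dev/md0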

3 Removing The Failed Disk

To remove /dev/sdb, we will mark /dev/sdb1 and /dev/sdb2 as failed and remove them from their respective RAID arrays (/dev/md0 and /dev/md1).

First we mark /dev/sdb1 as failed:

mdadm --manage /dev/md0 --fail /dev/sdb1

The output of

cat /proc/mdstat

should look like this:

server1:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid5] [raid4] [raid6] [raid10]
md0 : active raid1 sda1[0] sdb1[2](F)
24418688 blocks [2/1] [U_]

md1 : active raid1 sda2[0] sdb2[1]
24418688 blocks [2/2] [UU]

unused devices: <none>

Then we remove /dev/sdb1 from /dev/md0:

mdadm --manage /dev/md0 --remove /dev/sdb1

The output should be like this:

server1:~# mdadm --manage /dev/md0 --remove /dev/sdb1
mdadm: hot removed /dev/sdb1

And

cat /proc/mdstat

should show this:

server1:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid5] [raid4] [raid6] [raid10]
md0 : active raid1 sda1[0]
24418688 blocks [2/1] [U_]

md1 : active raid1 sda2[0] sdb2[1]
24418688 blocks [2/2] [UU]

unused devices: <none>

Now we do the same steps again for /dev/sdb2 (which is part of /dev/md1):

mdadm --manage /dev/md1 --fail /dev/sdb2

cat /proc/mdstat

server1:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid5] [raid4] [raid6] [raid10]
md0 : active raid1 sda1[0]
24418688 blocks [2/1] [U_]

md1 : active raid1 sda2[0] sdb2[2](F)
24418688 blocks [2/1] [U_]

unused devices: <none>

Then we remove /dev/sdb2 from /dev/md1:

mdadm --manage /dev/md1 --remove /dev/sdb2

server1:~# mdadm --manage /dev/md1 --remove /dev/sdb2
mdadm: hot removed /dev/sdb2

cat /proc/mdstat

server1:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid5] [raid4] [raid6] [raid10]
md0 : active raid1 sda1[0]
24418688 blocks [2/1] [U_]

md1 : active raid1 sda2[0]
24418688 blocks [2/1] [U_]

unused devices: <none>

Then power down the system:

shutdown -h now

and replace the old /dev/sdb hard drive with a new one (it must have at least the same size as the old one - if it's only a few MB smaller than the old one then rebuilding the arrays will fail).

4 Adding The New Hard Disk

After you have changed the hard disk /dev/sdb, boot the system.

The first thing we must do now is to create the exact same partitioning as on /dev/sda. We can do this with one simple command:

sfdisk -d /dev/sda | sfdisk /dev/sdb

You can run

fdisk -l

to check if both hard drives have the same partitioning now.

Next we add /dev/sdb1 to /dev/md0 and /dev/sdb2 to /dev/md1:

mdadm --manage /dev/md0 --add /dev/sdb1

server1:~# mdadm --manage /dev/md0 --add /dev/sdb1
mdadm: re-added /dev/sdb1

mdadm --manage /dev/md1 --add /dev/sdb2

server1:~# mdadm --manage /dev/md1 --add /dev/sdb2
mdadm: re-added /dev/sdb2

Now both arrays (/dev/md0 and /dev/md1) will be synchronized. Run

cat /proc/mdstat

to see when it's finished.
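
If you want the progress to refresh on its own instead of rerunning the command by hand, something like this works (watch simply reruns cat /proc/mdstat every two seconds):

watch -n 2 cat /proc/mdstat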

During the synchronization the output will look like this:

server1:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid5] [raid4] [raid6] [raid10]
md0 : active raid1 sda1[0] sdb1[1]
24418688 blocks [2/1] [U_]
[=>...................] recovery = 9.9% (2423168/24418688) finish=2.8min speed=127535K/sec

md1 : active raid1 sda2[0] sdb2[1]
24418688 blocks [2/1] [U_]
[=>...................] recovery = 6.4% (1572096/24418688) finish=1.9min speed=196512K/sec

unused devices: <none>

When the synchronization is finished, the output will look like this:

server1:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid5] [raid4] [raid6] [raid10]
md0 : active raid1 sda1[0] sdb1[1]
24418688 blocks [2/2] [UU]

md1 : active raid1 sda2[0] sdb2[1]
24418688 blocks [2/2] [UU]

unused devices: <none>

That's it, you have successfully replaced /dev/sdb!



Help:

http://www.unix-tutorials.com/go.php?id=795

http://www.howtoforge.com/replacing_hard_disks_in_a_raid1_array#comment-93



Friday, January 8, 2010

Initrd/ Initial Ramdisk

"Initrd" is the name of the "initial ramdisk" feature of Linux. With this, you have your loader (probably LILO or Grub) load a filesystem into memory (as a ramdisk) before starting the kernel. When it starts the kernel, it tells it to mount the ramdisk as the root filesystem. You put the disk device driver for your real root filesystem and all the software you need to load it in that ramdisk filesystem. Your startup programs (which live in the ramdisk) eventually mount the real (disk) filesystem as the root filesystem. Note that a ramdisk doesn't require any device driver.

This does not free you, however, from having to bind into the base kernel 1) the filesystem driver for the filesystem in your ramdisk, and 2) the executable interpreter for the programs in the ramdisk.
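
As a concrete illustration, here is roughly what a GRUB (legacy) boot stanza using an initrd looks like; the kernel version, image names, and root device are placeholders, not taken from the text above:

title Linux
        root (hd0,0)
        kernel /vmlinuz-2.6.18 ro root=/dev/sda2
        initrd /initrd-2.6.18.img

GRUB loads both files into memory; the kernel mounts the ramdisk image as its initial root filesystem, loads the drivers it contains, and then switches over to the real root filesystem on /dev/sda2.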
