Rescue Procedure - Disaster Recovery for the Blue Ring MBW-II

To God Goes All The Glory!

How to recover data from a crashed MBWE-II


NOTE

These instructions are relevant to the blue ring - two drive model of the My Book World II NAS device.

If you have the "white light" version, (or any newer version of the MBW), these instructions may not work for you as written; so you may wish to try to find relevant recovery steps for your version.

If you find articles relevant to the newer versions, please leave me a private message, and I will update this page to point to the relevant page(s) for the newer MBW editions.


UPDATE

After the original article was written, I have learned more about the recovery process. I am going to include the additional information I have found, as well as fix some glaring errors (make that read "STUPID errors") that I noticed while re-reading this article.


Acknowledgments:

I want to acknowledge the help given me by Gabriel (who sure earned his name this time!) along with everyone else on these fora who posted their own experiences with the MBWE. Without your help I would have been SO SCREWED it would not be funny.

As we all know, there's really no excuse for inadequate backups. And of all people, I know better.

However, there I was with 30+ years of accumulated experience, tools, tricks, tips, software, etc. on a single drive - just waiting for Good 'Ole Mr. Murphy to come in and make a complete balls-up of everything. This data was both critical and irreplaceable, so "failure is NOT an option!"

There was no choice, I had to recover that data "regardless of cost or loss!" - even if it meant I had to go through those disks byte-by-byte with a disk editor.

I was damned lucky.
I was able to recover about 99% of my data, with the lost data being (relatively) easily replaced.

It did cost me though. I went through about $700.00, four tanks of gasoline, and a number of trips to my local (!!) Micro-Center to get parts and materials. Not to mention two weeks of acid-reflux.

I am taking the trouble to document what eventually succeeded for me - in the hope that it will help others avoid some of the mistakes *I* made.

Lastly, please excuse the length of this article. Even though I will make it as brief as possible, it was a long time in the telling, and it won't be told here in three lines.


Hardware Requirements:

  • Your hard drive must still be spinning, with the potential for recovering data

Obviously if your drive's platters have frozen solid and don't spin, or the drive is suffering from a gross mechanical defect - such as pieces rattling around inside - your chances of success plummet like a rock.

  • You will need a computer that you can exclusively dedicate to this task for a while

"Awhile" might be measured in days, or even weeks. It took me two weeks of trial-and-error to get my data fully recovered.

  • You will need at least twice as many drives as there were drives in your MBWE

My device had two 500 gig drives, so I purchased four drives to rebuild data on.

  • Each drive will need to be at least twice the size of the drive you're trying to recover

Since I had two 500 gig drives, I purchased four 1T drives.

  • You will need a controller card - or available SATA space on your computer's motherboard - for the extra drives in addition to the drive(s) already in the system
  • You may need a replacement drive for the one that failed

Try to get as exact a replacement as possible. Western Digital, same size, same model series if possible, etc.


Software Requirements:

  • You will need a flavor of Linux compatible with your system and controller
    • Some people recommend the use of a "Live CD" for the recovery. I don't. I found it very convenient to be able to save log files, as well as some of the smaller data files, to my desktop. It's not so easy to do this with a "Live" CD.
    • Since you will need to download, install, save test artifacts and files, etc. etc. etc. I found it much easier to just do a flat "install from scratch" on the recovery system.
    • Additionally, the "Live" CD's I did try, (Ubuntu, Fedora, Knoppix), did NOT want to work with the SATA (RAID) card I bought. Chip revisions change, and sometimes the older drivers don't like the newer boards. I was able to get newer drivers, but only for Fedora, and they'd ONLY work on an "installed" system from the full-up install DVD - not the "Live CD" install.

(I upgraded / reinstalled from Fedora 8 (which the drivers were for) to Fedora 10 (the latest stable release as of this writing), and the additional drivers were not required. The Fedora 10 "Live CD" however did not work. Maybe they have to leave things off like "unusual" drivers on the CD? :-) )

  • You will need GParted

It usually does not come pre-installed, which I think is stupid, since it is one of the best tools you will find for working with disks. Because of that, you will need to install it using your system's package manager. (Example install commands for these tools follow at the end of this section.)

  • You will need ddrescue / dd_rescue / gddrescue

You will need to find, or download, a copy of one of these "rescue" flavors of dd. Note that GNU "ddrescue" (packaged as "gddrescue" on Ubuntu and related Ubuntu spins) and "dd_rescue" are two separate programs that do much the same job but take different options - the commands in this article use dd_rescue's option syntax. If your distribution does not come with it already, download and install it via your distribution's package manager.

  • You will need mdadm

This is commonly included in most recent distributions. If it's not included, you can download it via your distribution's package manager.

  • You will need a recent copy of the Western Digital Data Lifeguard Tools CD to make a boot floppy of the Western Digital Data Lifeguard "Diagnostics".
  • You will need to be on excellent terms with Lady Luck!

Or, as Scripture says: "The effectual fervent prayer of a righteous man availeth much."
And I'm not kidding.

If you're reading this, you are probably already in Deep Sneakers, and sinking fast. Luck, prayer, whatever, will be a primary constituent of your success.
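
Before we move on - for what it's worth, here is roughly how the tool installs looked on my Fedora box. The package names are from memory and may differ on your distribution, so treat this as a sketch; if yum can't find a package under these names, use your package manager's search to find what your distro calls it:

su -
yum install gparted mdadm dd_rescue
yum search rescue  [only if the dd_rescue install fails - this lists the rescue-related packages your distro actually carries]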

My MBWE-II Configuration and Status as of the time of the repair

  • My system was set up as a LINEAR array - that is, the two 500 gig drives in my system appeared to be one 1 terabyte drive.
    • Because of this - strictly speaking it's a linear (concatenated) array rather than a striped RAID-0, but the practical effect is the same - the failure of any one drive means the entire data store was garbage.
    • To recover this - if both drives are spinning! - I needed to copy off the data from both drives to somewhere I could work on it, and then try to "stitch" the two array halves back together again.
  • The Web Setup (admin) page for my system showed "Drive 'A' Failed"
    • When I used dd_rescue (described below) to image the "failed" drive, the system partitions showed a number of "bad blocks" - in this case, it turned out that they were not truly defective, but just corrupted beyond the ability of the operating system to repair them. However, because the system partitions had bad blocks, I had to hope that the system partitions on drive B (my hopefully "good" drive), would be intact enough to recover from.
    • In my case, it turned out that my "B" drive was "still good" - and as dd_rescue proved further down - I had no bad clusters on that drive - so I could try to use the system partitions from that drive to re-create the partitions on the "bad" drive.
      • Note that this will work regardless of which drive has failed. If your drive "A" is good, but drive "B" is pooched, just swap the letters around as you read.
  • I was able to prove - using the Western Digital Drive Diagnostics - that the "A" drive was actually not truly defective. That saved me from having to actually replace the drive. However, if that had been needed, the only difference would be to substitute the NEW hard drive for the OLD one when you begin the drive "A" rebuild process.
    • Update: Since this article was written, I have discovered that the failing "A" drive was actually flaky. Unfortunately, despite it not being reliable, the Western Digital Drive Diagnostics NEVER reported any drive error codes. Go figure.

Recovery Steps

Rule #1: Don't Touch That Drive!

You are already in trouble. Dinking around with the drive - potentially changing its contents - will only make it worse.


Notes:

  • You need to be logged in as root to do any of this stuff.
    • In Linux, "root" = "God". Seriously! As root you can do - literally - anything you want, no matter how stupid or careless, at the drop of a hat.
    • As root, Linux assumes that you know exactly what you are doing. There are no prompts. There is no UAC. There are no "Do you really want to format your system drive?" messages. All it takes is the wave of a hand and the snap of your fingers. Snap your fingers one way and wonderful things happen. Snap them just slightly differently - well, that's a completely different story.
    • Now don't get me wrong here. I'm not trying to scare you off. What I am trying to do is caution you about the power you wield as root on any 'nix system. If you're confident that you can wield that power safely, then go right ahead. If this is your first time with Linux, or a complex hard drive repair, you may want to enlist the help of someone more experienced.
  • Be EXTREMELY CAREFUL with the "dd" and "dd_rescue" commands - they are extremely powerful and useful commands - but a tiny typo could render your drives, or your computer, a quivering wasteland.
  • For brevity, I have NOT included examples of every possible command used (e.g. "mount", "umount", "ls", etc.). If you are not sure how to do this stuff, (or are not that familiar with Linux), as I said before, get help!

Prepare the new drives to receive the recovery data

  • Open a terminal session - or two! - and SU to root.
    • If you cannot SU directly to root, you may have to reset the root password. You do this by executing the following from a terminal session:
sudo passwd root
(here it will ask you for YOUR password)
password:  [here you type in the root password you want, "blank" won't work here.]
Re-enter password:  [here you re-type the same password you typed in before.]
passwd: all authentication tokens updated successfully  [or similar - if you get a success message, everything is just fine now.]
su -  [su to the root account, using the root context]
password:  [type in the new password you created above.]

#  [this is the root prompt, (the normal user prompt is a "$").  Once you see this, you're God.]

You will need to be ROOT (super-user) for any of this to work. (see method above)
Each time you shutdown and restart the system, you need to re-open your terminal sessions and re-su to root.

  • Attach all the NEWLY PURCHASED drives, create one single partition on each of the individual drives, (so that we can use the entire drive as storage), and format as ext3. (If your distribution offers ext4, go ahead and use it.)
    • You can do this one-at-a-time, or you can attach all four of the new recovery drives to the controller and format them all in one go. (A command-line sketch follows this list.)
    • Shutdown and remove all formatted drives and set them aside carefully.
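
If you would rather partition and format from the command line instead of GParted's GUI, something like this does the job for one drive. The device node (/dev/sde) is only an example - triple-check which node is which before you format anything:

parted --script /dev/sde mklabel msdos mkpart primary ext3 1MiB 100%
mkfs.ext3 /dev/sde1
e2label /dev/sde1 recover-a  [optional - a label makes it easier to tell the recovery drives apart later]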

Copying the data off the damaged drive.

  • Install the drive that is NOT damaged, view the partition table with GParted, and verify that the partition table is intact.
  • Your partition table should look like this:
    • Unallocated space. (This space is used to store individual system specific data, such as MAC address, serial number, etc.)
    • Partition #1, formatted as ext3. (This is the boot partition, with /boot, /root, etc. on it.)
    • Partition #2, formatted as swap (This is the system paging file.)
    • Partition #3, formatted as ext3 (This is the rest of the O/S, /var, etc.)
    • Partition #4, unknown format. (This is the data-store, don't modify or change this!)

These partitions will be essentially identical between the two drives on a two drive system - Linear array or mirrored.
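
If you want a second opinion from the command line, "fdisk -l" prints the same partition layout that GParted shows (substitute your actual device node):

fdisk -l /dev/sdb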

  • Using dd_rescue, copy the "un-damaged" drive to a file on one of the new drives.
    • This will take a fairly long while - measured in hours.
    • Take note of any failed blocks. (cut-and-paste to a text file.)
    • Shutdown the system, turn it off, remove the new drive with the file, label it, and put it somewhere safe.
    • Attach another new drive.
    • Reboot.
Commands to do the above:

dd_rescue -l /home/**uname**/Desktop/B-logfile.txt -o /home/**uname**/Desktop/B-bbfile.txt -v /dev/sdb /recover/b/b-recover-disk

Don't type the "asterisks" (**)

**uname** = your username (this is the path to your desktop)
-l = log file output
-o = bad-block log file output (you need both of these for repairs)
/dev/sdb = the physical device node the drive is on

NOTE: The logical drive node location, (/dev/sdb in this example), is very likely to be different. When you use GParted to look at the partition tables, make a careful note of which device node is being used. (i.e. /dev/sdh, /dev/sdc, whatever)

/recover/b/b-recover-disk = the output filename for the extracted disk image.
(I mounted my "recovery" drives at a mount-point called "/recover" on my system,
and the recovery drives were mounted as "a" and "b" - so I had "/recover/a" and "/recover/b"
as the two recovery drives on my system.)

  • Using dd_rescue, copy the last partition from the "undamaged" drive to a file on the new drive.
    • This will also take a long while. Almost exactly as long as the first copy, since this is where most of the data lives.
    • Again, take note of any failed blocks. Hopefully you won't find any on the "2nd" drive during either copy.
    • Shutdown the system, turn it off, remove both the new drive (mark it and put it somewhere safe) and the "B" drive (label it and put it somewhere else safe).
Commands to do the above:

dd_rescue -l /home/**uname**/Desktop/B-logfile.txt -o /home/**uname**/Desktop/B-bbfile.txt -v /dev/**sdb4** /recover/b/b-recover-data

/dev/sdb4 = The 4th partition on device "sdb". You can copy any partition by giving its number here.
/recover/b/b-recover-data = the output file containing the **data** partition from drive "B".

Again, note that the drive node may be different on your system.

  • Attempt to copy data from the system area of the disk to a file on the last new hard disk
    • Hopefully, one of the two disks had the system partitions without errors.
    • Since the "system" portion of the disk is a constant size on the MBW-II, we can copy just the section we need.

The "dd" command used to make this copy was:

dd if=/dev/sdx of="./updated_MyBookWorld(a).bin" count=8112826
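(count=8112826 is the number of blocks to copy; with dd's default block size of 512 bytes that works out to roughly the first 4 GB of the disk - the fixed-size system area. /dev/sdx is, as always, a placeholder for the actual device node of the good disk.)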

Note: A little further down, you will try to re-install this image to the replacement drive. Once you verify that this image file actually boots the system, copy it somewhere safe and place it in a directory called "images that work" or something like that. This way, if you ever need it again to recover a system, you already have it.

  • Add the failed drive to the system and attempt to verify partitions
    • Attach the failed drive ("A") to the controller where the "B" drive was, and re-run the GParted partition verification step as noted above.
    • Shut down and turn off the system.

IF the "failed" drive's partition table is OK, continue with the next section.
IF the "failed" drive's partition table is NOT OK, continue with the steps below.

  • Use dd to copy the first 512 bytes from the disk with the good partition table.
  • Copy that file to the first 512 bytes of the "bad" disk to see if we can recover valid partition data.
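Commands to do the above (a sketch - here /dev/sdX is the disk with the good partition table and /dev/sdY is the "bad" disk; get these backwards and you will destroy your good copy, so check twice). The first 512 bytes are the MBR, which includes the primary partition table:

dd if=/dev/sdX of=/home/**uname**/Desktop/good-partition-table.bin bs=512 count=1
dd if=/home/**uname**/Desktop/good-partition-table.bin of=/dev/sdY bs=512 count=1

Then reboot and re-check the "bad" disk with GParted to see if the partition table now reads correctly.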

Attempt to recover data from the failed drive

  • Attach the failed drive ("A") to the controller where the "B" drive was, and attach another new drive.
  • Reboot the system.
  • Using dd_rescue, copy the last partition (/dev/sdx4) of the "A" drive to a file on the new disk.
    • Again, this will take a long while.
    • Also, take careful note of any bad blocks.
    • Shutdown the system, turn it off, remove and label the new drive, and put it away safely.
    • Attach the last new drive and reboot.
  • Shutdown the system, turn it off, remove and label the last new drive, and put it away safely, leaving the potentially defective drive attached.
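Commands to do the dd_rescue copy above (a sketch - I'm assuming the "A" drive shows up as /dev/sda and the new drive is mounted at /recover/a; as always, confirm the actual device node with GParted before you start):

dd_rescue -l /home/**uname**/Desktop/A-logfile.txt -o /home/**uname**/Desktop/A-bbfile.txt -v /dev/sda4 /recover/a/a-recover-data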

At this point, you should have all the images you need.


Verify if the "failed" drive is really bad

  • At this point, the system should be shut down, with all the new drives removed, and the one failing drive still attached.
  • Boot the system using the "Diagnostics" floppy you created from the Western Digital Data Lifeguard CD.
  • Select the correct drive in your system.
  • Run the "Quick Test".
    • It is not necessary to run the "full" test.
    • If the drive passes the "Quick" test, repeat it a few times to verify that it always passes.
    • Ideally, each pass will return an error code of "0000"
  • If the drive passes, mark it so, and put it away.
  • If the drive fails, mark it so, and set it aside where you won't pick it up to use it.
    • Depending on how old the drive is, you may be able to take the failure code and go to Western Digital's web site and submit a claim for a warranty replacement. This way they send you a brand-new, excellent working drive for something like $9 to cover shipping costs.
    • The magnets out of a failed H/D make GREAT 'fridge magnets!
  • Replace it with the replacement drive you purchased, or go purchase one. Remember to get as exact a replacement as humanly possible.
  • Repeat this same exact procedure, substituting the other MBWE drive to verify it is OK.

Attempt to rebuild the damaged data array

  • Re-attach the data image drives and prepare to recover
    • Shutdown and turn off the system if not already shutdown.
    • Attach the two drives that have the two data-partition images on them in positions 1 & 2 on the controller.
    • Attach a blank drive - if available - as position #3.
    • Restart the system.
  • Mount the three drives in a convenient location
    • I will assume /recover/a, /recover/b, and /recover/c are the mount points.
    • I am also assuming that the drive with the drive "A" data image is first, the drive "B" data image is second.
  • Loop-mount the recovered data image files we created earlier.
    • I will assume that they're named "a-recover-data" and "b-recover-data"
    • Execute the following commands to loop-mount the two image files:
Commands to do the above:

losetup  /dev/loop0  /recover/a/a-recover-data
losetup  /dev/loop1  /recover/b/b-recover-data

This creates two "fake" (virtual) block devices, loop0 and loop1, that contain the contents of these two files.
Trick: You can loop-mount ANY valid file-system image - including things like cd/dvd ISO images, etc.
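
Incidentally, the "mount the three drives in a convenient location" step above would look something like this - the device nodes here are only examples, so check GParted for the real ones:

mkdir -p /recover/a /recover/b /recover/c
mount /dev/sdb1 /recover/a
mount /dev/sdc1 /recover/b
mount /dev/sdd1 /recover/c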

  • Merge the images into a copy of their original array
    • Execute the following command to re-create the original MBWE array structure:
Commands to do the above:

mdadm --assemble  /dev/md1  --force  /dev/loop0  /dev/loop1

This command takes the two loop-mounted array parts and (hopefully!) merges them into an array image similar to the one on the MBWE that the two drives came out of.

Hopefully the array built - and started! - correctly. If it didn't, I don't know how to help you here.

Assuming the array built correctly - mount /dev/md1 wherever convenient. (Let's assume /recover/md1)
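
Commands to do the above:

mkdir -p /recover/md1
mount /dev/md1 /recover/md1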

Navigate to the mount point, and view the contents of the root of that "drive". If all has gone well, at this point you should see a filesystem containing folders and data - as you had it on the original MBWE.

If you successfully see a filesystem - congratulate yourself, take a deep breath, and perhaps take a short break.
If you don't have a filesystem here - I am not sure how to fix this. Not without messing with it myself.


Make a "backup" of the filesystem's apparent content.

  • Very Important!
    • Using "cp -R", copy the entire contents of the /dev/md1 mount point to the empty drive you have mounted at your third hard drive mount point.
    • This will take a while. Take careful note of any files that generate errors.
    • We do this because when we try to repair the two partition images, things might get destroyed.
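Commands to do the above (assuming the array is mounted at /recover/md1 and the empty third drive at /recover/c - redirecting the errors to a file makes the note-taking easy):

cp -R /recover/md1/. /recover/c/ 2> /home/**uname**/Desktop/copy-errors.txt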

Attempt to repair / recover the partition images

  • Un-mount the /dev/md1 filesystem

The fsck command gets really fussy if you try to repair a mounted drive.
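
Commands to do the above (using the assumed /recover/md1 mount point):

umount /recover/md1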

  • Check array partitions for consistency
    • Execute the following command to verify the structure of the array partition's filesystem.
Commands to do the above:

fsck -t ext3 /dev/md1 -- -n -f -v

-n = Don't actually fix anything
-f = Force scan, even if screwy.
-v = Tell us a lot about what you see.
  • Again, remember to take careful note of any errors or issues seen.
  • In my case, there were a lot of "inode hash" errors.
  • Try a "real" fsck to clean up issues
    • This will discover if any of the issues disclosed were "serious" issues. (They probably are, but we can see if we get lucky…)
    • Execute the following command:
Commands to do the above:

fsck -t ext3 /dev/md1 -- -D -p -f -v

-D = consolidate and re-index directories.
-p = "Preen" (auto-repair) non-critical problems.
-f = Force checking.
-v = Tell us what's happening.
  • You may get a "/dev/md1: Adding dirhash hint to filesystem" message when you start the "real" fsck. This indicates that fsck is updating the partition to handle indexing properly. This is a non-problem.
  • When I did this, it still bailed out on me because "inode hash" issues are considered "critical" problems. What will happen is that when you force a file system repair (and you will need to, trust me), the directories and/or files with the inode hash errors will be deleted and the space they consumed returned to the free-space pool.
  • Retry fsck forcing it to fix all errors found
    • We still need to completely clean up the issues found, so we must (at this point) force fsck to fix things.
    • Execute the following code to do this:
Commands to do the above:

fsck -t ext3  /dev/md1  -- -y -f -v

(Note: we're omitting the "-D" here deliberately.)

-y = force auto-fix (answer every question "yes")
  • Re-execute the same command again to verify all issues have been resolved.
  • Repeat until there are no more errors found.
  • Once everything is OK, re-run fsck again to optimize and re-index directories.
Commands to do the above:

fsck -t ext3 /dev/md1 -- -D -y -f -v

This works just like before, except that the "-D" also forces directory re-indexing and optimization, which helps.

  • Stop the array
Commands to do the above:

mdadm --stop /dev/md1

Stop and take stock of things

Where we should be now

  • We should have two partition image files loop-mounted.
  • We should have them successfully assembled into an array.
  • We should have successfully run fsck on the array partition and cleaned up any errors.
  • We should have at least ONE good disk out of the two that came from the MBWE.
  • We should have at least ONE good system image from the two drives.
    • If you don't, you will need to download one and follow instructions to install it at a later step.

Begin the Rebuild Process

Begin rebuilding the two drives for the MBWE.

  • I am assuming that the "B" drive contained no bad blocks - and that if there were any, they were in the data partition, not the system partitions.
  • I am also assuming that we have a good drive "A", or a replacement, that may not have a good system image on it.
    • If this is not true - that is, you do not have ANY good system images - skip the single step below, download a system image, and follow the instructions to install it on the two drives, creating the last (fourth) partition.
  • Using dd, copy the saved system image to drive "A". This will replace the bad/missing system partitions, and re-create the 4th partition for the data.
dd if="./updated_MyBookWorld(a).bin" of=/dev/sde  count=8112826

* If this works, remember to save this system image as a "known good" system image.

  • Using dd_rescue, copy the drive "A" data partition image that we fixed-up before, back to partition 4 of drive "A".
  • Using dd_rescue, copy the drive "B" data partition image that we fixed-up before, back to partition 4 of drive "B".
  • Once that is done, completely shut-down and turn off power.
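Commands to do the above (a sketch - here I'm assuming drive "A" shows up as /dev/sde, drive "B" as /dev/sdf, and the image drives are mounted back at /recover/a and /recover/b; be absolutely certain of the device nodes before you hit Enter, because these commands overwrite the drives' data partitions):

dd_rescue -v /recover/a/a-recover-data /dev/sde4
dd_rescue -v /recover/b/b-recover-data /dev/sdf4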

Rebuild the MBWE

  • Re-install the hard drives
    • Replace the two side-rails on each hard drive (if you removed them)
    • Re-insert the two drives into the MBWE, remembering that drive "A" goes in the slot closest to the controller electronics.
    • Re-connect all connectors removed during MBWE tear-down.
  • Reconnect network and power
    • Re-attach the network cable to the MBWE.
    • Re-attach the power connector to the MBWE.
  • FIRE THAT PUPPY UP!! (and pray…)
    • Re-connect power.
    • Carefully monitor the front-panel lights.

At this point, the MBWE should boot, do a final internal fsck - which is indicated by the internal lit ring spinning - and then come fully back on-line.

  • Note: If you replaced the system partitions with downloaded partition data, you may have to re-configure the MBWE to your needs.

Verify correct operation

  • Attempt to access the web setup page
    • Verify that the web-setup page works, and that the drive status is "OK"
    • Re-configure any settings that you need to change.
  • Attempt to access the pre-existing shares on the MBWE
    • Verify that the original shares on the MBWE exist, you can access them, and you can read-and-write data to them.
    • Note that any files or directories that were "corrected" during the fsck of the partition array above may not be there - you may have to replace this data. THAT is why I asked you to take notes!

Verify everything's correct, replace any lost data, return to service

  • Satisfy yourself that everything is back to normal, by shutting down the MBWE, re-booting it, etc.
    • You will probably notice that the MBWE boots up - and serves files - much faster now than it ever did before.
    • This is a result of both cleaning up all the cruft and problems, as well as the consolidate, optimize, and re-index steps that we performed during the FSCK operations above.
  • Replace any necessary lost data
    • Replace any necessary lost data as noted during the FSCK passes above.
  • Return to Service
    • Return the MBWE to normal operational status.

Congratulate Yourself on a Job Well Done!

Final Notes

I sincerely believe that the primary cause for the hard-drive failure on my MBWE was that the cooling fan never came on, which caused the drives to overheat.

The MBWE-II comes with a fan that is supposed to both come on, and adjust its speed, based on the internal temperature of the hard drives. Unfortunately for me, I do not believe that the stinkin' little fan EVER came on - even when the drives were hot enough for me to smell.

We all know the standard equation for hard drives and heat:

  • Hard Drive + Heat = Disaster

To mitigate the problem, I found a big fan attached to the bottom cover of a dead PC power supply. I hooked it up to a 12v wall adapter, turned it on, and placed my newly recovered MBW on top of it - after removing the hard drive compartment's lid, (actually the bottom metal panel) - so that there was airflow over the drives keeping them much cooler. The catch: the fan, though powerful, was also driving me NUTS with its noise.

The real fix is to modify the internal fan and temperature monitoring software as well as the associated scripts to make sure the fan is always on at some reasonable speed.


Prerequisites:

  • You will need the ability to actually log into the internal Linux operating system
    • First, you will need to download the PuTTY terminal emulator so that you can actually log into the MBWE, once the hacks are in place.
    • Next, follow the steps on the "First Steps with MBWE page" which will actually enable access to your MBWE device.
    • I also strongly suggest that you look at the other updates on that page - steps two through five - as they will dramatically improve your experience with the MBWE.
  • Look at the "Smart Spindown" article
    • I, personally, think the "Smart Spindown" feature is about as useful as Teats on a Boar Hog. However, we need to get ahold of the updated hdparm file, and the instructions to do so are located here.
      • I would NOT install the entire Smart Spindown package. In my case it simply messed up the operation of the fan, so I ended up disabling it.
      • I would follow the first two steps, so that you have installed the new hdparm utility (if running Optware), or built it from scratch (if running without Optware). (A sketch of the Optware install commands follows this list.)
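
For the Optware route, the install boils down to something like this - assuming Optware's ipkg is already set up on the MBWE, and that the package is simply named "hdparm" (use "ipkg list" to check if it isn't):

ipkg update
ipkg install hdparm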

Modifying the fan's behavior

I followed the instructions given in this article: Alternative temperature monitor and fan control to modify the software and scripts needed to run the fan.

In this article's case, the author wanted to STOP the fan from running all the time. I wanted to START it running. So, after I finished the install, I made a few changes to the scripts he wrote. There are two scripts in particular: the "S20wdc-fan-test" script in /etc/init.d, and the "temperature_monitor" script in /usr/sbin.

I have in-lined both scripts in their entirety, so that you can understand exactly what I did to make this happen.

The specific changes I made are delineated by both explanatory comments, and rows of equal-signs (======)


The "S20wdc-fan-test" script in /etc/init.d:

The original "fan test" script caused the fan to start running at 100% (full speed), then at 50%, then 0% (stopped), leaving the fan stopped until something decided to start it - if ever. I changed the script to add a fourth step to restart the fan running at 60%, and leave it running until the rest of the system catches up with it.

I modified the original fan test script to read as follows:

#!/bin/sh
#
# Copyright (C) 2007 Western Digital
#
# This is a quickie fan test.
# This script assumes the appropriate kernel modules are loaded, e.g.,
# wdc-power-fan.ko

FAN_DEVICE=/sys/devices/platform/wdc-fan
FAN_SPEED=$FAN_DEVICE/speed

case "$1" in
    start|restart)
        # Wait a few seconds for the heat-monitor to get going,
        # then turn on the fan for testing. Test in the background
        # so we don't delay booting.

        {
            sleep 3
            echo 100 > $FAN_SPEED
            sleep 5
            echo 50 > $FAN_SPEED
            sleep 5
            echo 0 > $FAN_SPEED
#
# ===========================
#  I added this section
# 
            sleep 5
            echo 60 > $FAN_SPEED
#
# ===========================
# 
        } &

        ;;

    stop)
        # Nothing to stop!
        ;;

    *)
        echo $"Usage: $0 {start|stop|restart} where restart and start
are the same"
        exit 1
esac

exit $?

The "temperature_monitor" script in /usr/sbin:

The temperature monitor script (supposedly) replaces the original MBWE fan control logic, creating limit bands and specifying the speed that the fan should run at, based on those temperature bands.

The original scripts specified that at any temperature below 50° Celsius (about 122° Fahrenheit), the fan should be off. ( ! ! )

I personally believe that if the hard drives get above about 25-30° Celsius, (a bit above room temperature), the fan should begin the active cooling process - as it's easier to prevent heat build-up than to reduce it after things have gotten too hot.

To do this, I did two things:

  • I changed the lower temperature band to allow the fan to turn off only if the temperature drops below about 4° Celsius (about 39° Fahrenheit), let it run at 50% speed until the hard drive temperature rises above 55° Celsius, and then run it at 100% of its rated speed.
  • I changed the actual fan-speed steps so that the fan never drops below 50% of its rated speed.
#!/bin/sh
# An alternative temperature monitoring and fan control script for
# WD MyBook World Edition external hard drives
#
# Copyright (C) 2007 by kyyhkynen at gmail.com
#
# You may do with this file (and parts thereof) whatever you want, as long
# as my copyright notice is retained.
# last updated 2008/04/23
#
#######################################################################
#
# Configuration settings
#
# Disk(s) to monitor.
# If you have only one disk, leave DISK2 empty
# In that case, however, this script is pretty much useless.
DISK1=sda
DISK2=sdb

# hdparm location. If you are using the optware version,
# you have to change this into /opt/sbin/hdparm
# HDPARM=/sbin/hdparm

# ====================================
# I added this line and commented out the line above
# 
HDPARM=/opt/sbin/hdparm
#
#=====================================

# Temperature limits for speeds.
# ====================================
# ** I changed the low temperature limit (below) and, further down in the loop, the fan speeds, so:
# When the temperature of the disk(s) drops below TEMP_LIMIT_0, the fan speed is set to 50 (originally 0)
# When the temperature of the disk(s) is between the limits, the fan speed is set to 70 (originally 50)
# When the temperature of the disk(s) is above TEMP_LIMIT_50, the fan speed is set to 100

TEMP_LIMIT_0=4  # the original "low" temp limit was something like +50 Celsius.  I made it +4
TEMP_LIMIT_50=55

# ====================================

# The amount of time (in minutes) to wait before turning the fan off after the disk(s) have been spun down.
SPINDOWN_WAIT=10

# Logging stuff
LOG_ENABLED=false
LOG_INTERVAL=5
LOGGER=echo

#
# End of configuration. You shouldn't need to touch lines below this :)
#
#######################################################################

# Device name(s) for the disk(s).
DEVNAME1=/dev/${DISK1}
DEVNAME2=/dev/${DISK2}

# The file controlling the fan speed
FAN_SPEED=/sys/devices/platform/wdc-fan/speed

# Get current fan speed
CURRENT_FAN_SPEED=`cat ${FAN_SPEED}` ;

LOG_COUNTER=0
SPINDOWN_COUNTER=0

while [[ /bin/true ]]; do

        TEMPERATURE_DISK1=$(smartctl -d ata -A ${DEVNAME1}|grep Tempera|cut -c 88-90) ;

        $HDPARM -C $DEVNAME1 |grep active >/dev/null
        if [ "$?" == "0" ] ; then
                STATUS_DISK1=u ;
        else
                STATUS_DISK1=d ;
        fi ;

        if [ -z "$DISK2" ] ; then
                TEMPERATURE_DISK2=$TEMPERATURE_DISK1 ;
                STATUS_DISK2=$STATUS_DISK1 ;
        else
                TEMPERATURE_DISK2=$(smartctl -d ata -A ${DEVNAME2}|grep Tempera|cut -c 88-90) ;
                $HDPARM -C $DEVNAME2 |grep active >/dev/null
                if [ "$?" == "0" ] ; then
                        STATUS_DISK2=u ;
                else
                        STATUS_DISK2=d ;
                fi ;
        fi ;

        if [ $STATUS_DISK1 = "d" -a $STATUS_DISK2 = "d" ] ; then
                if [ $CURRENT_FAN_SPEED -gt 0 ] ; then
                        SPINDOWN_COUNTER=$((SPINDOWN_COUNTER+1)) ;
                        if [[ $SPINDOWN_COUNTER -ge $SPINDOWN_WAIT ]] ; then
                                echo 100 > $FAN_SPEED ;
                                CURRENT_FAN_SPEED=100 ;
                                SPINDOWN_COUNTER=0 ;
                        fi ;
                fi ;
        else
                if [ $TEMPERATURE_DISK1 -ge $TEMP_LIMIT_0 -o $TEMPERATURE_DISK2 -ge $TEMP_LIMIT_0 ] ; then
                        if [ $TEMPERATURE_DISK1 -ge $TEMP_LIMIT_50 -o $TEMPERATURE_DISK2 -ge $TEMP_LIMIT_50 ] ; then
                                echo 100 > $FAN_SPEED ;
                                CURRENT_FAN_SPEED=100 ;
                        else
                                if [ $CURRENT_FAN_SPEED -lt 50 ] ; then
                                        echo 100 > $FAN_SPEED ;
                                        sleep 2 ;
                                fi ;

#  ============================================================
#   I modified this section and changed the "default" fan speeds to 70% here and 50% below
#   This way, no matter what happens, the fan will always be running at - at least - 50%
# 
                                echo 70 > $FAN_SPEED ;
                                CURRENT_FAN_SPEED=70 ;
                        fi ;
                else
                        echo 50 > $FAN_SPEED ;
                        CURRENT_FAN_SPEED=50 ;
#
# =============================================================
# 
                fi ;
        fi ;

        LOG_COUNTER=$((LOG_COUNTER+1));
        if [[ $LOG_COUNTER -ge $LOG_INTERVAL ]] ; then
                $LOG_ENABLED && ${LOGGER} "${TEMPERATURE_DISK1}${STATUS_DISK1} ${TEMPERATURE_DISK2}${STATUS_DISK2} ${CURRENT_FAN_SPEED}" ;
                LOG_COUNTER=0 ;
        fi ;
        sleep 60 ;
done

Having done all this, the fan speed never drops below 50%, no matter what. The result: an MBWE where the drives stay nice and cool.


What say ye?

Jim

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License