Hi all, I can connect to my 1Tb MBWE via SSH and Samba, but overnight the MBWE becomes unavailable and I can't reconnect to the Samba shares. When this happens the MBWE can't be found on the network. SSH/SCP continues to work fine but Samba seems to fail. nmbd is still running (ps -A) but I can't connect to smbd, from either Vista 32 or XP machines. A reboot fixes it but not permanently. Any help really appreciated.
Sounds similar to my experience. My recently purchased 1Tb MBWE is connected to my home network and appears as a device in the Microsoft Windows Network in my workgroup and the shares created on the MBWE are visible and accessible from Windows.
After a while of either copying files or idle time, the shares become unavailable and return an 'access denied' error to windows even in the middle of copying files. The Shared Storage Manager web interface (port 80) remains available at all times.
A port scan shows that after a clean boot the MBWE provides a 'netbios-ssn Samba smbd' service on port 139 to the network. When the MBWE fails, the service on port 139 shows only 'tcpwrapped'.
In this state the MBWE shows no shares in windows explorer and no files are accessible. The device has to be re-booted to return it to a useable state.
WD aren't helping - anyone got any ideas?
I'm seeing similar behavior on my new 1TB MBWE. I have to reboot it almost every day, which is very strange for a Linux based device. I have a couple of linux based routers (WRT54G) that go for weeks without a problem. Also, I have a couple of Linux PCs using Samba that just keep working.
I had the same issue every night. Even FTP wouldn't work. The problem with FTP was that it couldn't write to /var/log/vsftpd.log because /dev/md3 became read-only overnight. I will try these solutions and let you know if they worked!
yes, strange, service seems to die after a set time. i wonder if it is in the firmware?
I have 02.00.15 firmware.
i have sshd running from inittab, and have disabled mionet in post_network_start.sh as kindly suggested in this forum.
i haven't done anything else as all i need right now are samba shares on my local network and ssh/sftp access from outside.
does anyone know where samba is started from? i notice that smb.conf is in /etc not in /var/oxsemi as others have it. would this suggest my samba install is different to others?
I wonder if installing a new samba might fix things, though it seems silly to have 2 installations.
anyway if anyone has any further suggestions??
and thanks for the site, otherwise i would have a useless doorstop that fell over every time i tried to copy a decent amount of files over. (formatting md4 fixed it)
I was having the same problem. I noticed that /dev/md3 was mounted read only (this mounts to /var and is where the logs are written to). I had to resort to using 'umount /dev/md3 -l' to unmount the volume. I'm sure some linux guru will tell me the error of my ways, but heck it worked for me. Ran 'fsck.ext3 /dev/md3' and fixed the problems that it found. Ran it a second time for good measure and it came up clean. Rebooted the box and it's been working fine now for 2 days straight.
OK - following the advice on this forum I've installed SSH to get a closer look at this thing. From the log I see these errors when the drive disappears:
user.alert kernel: journal_bmap: journal block not found at offset 3084 on md4
user.err kernel: Aborting journal on device md4.
user.warn kernel: __journal_remove_journal_head: freeing b_frozen_data
user.crit kernel: ext3_abort called.
user.crit kernel: EXT3-fs error (device md4): ext3_journal_start_sb: Detected aborted journal
user.crit kernel: Remounting filesystem read-only
Trawling the posts on this forum and looking on other linux sources suggests that there is a problem with the combination of samba, an ext3 filesystem, and raid which commonly gives this result. i.e. md4 remounts as read-only and access is denied.
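For anyone wanting to check for the same condition from an SSH session, here is a minimal sketch that pulls the affected device names out of a saved log. It assumes the kernel messages are available as a plain text file (the path /var/log/messages in the usage note is an assumption and may differ by firmware), and find_journal_aborts is just an illustrative helper name.

```shell
# Print the devices named in ext3 "Aborting journal" messages found in a log file.
# Assumption: the log is plain text; the path is supplied by the caller.
find_journal_aborts() {
    logfile=$1
    # Matches lines such as: "user.err kernel: Aborting journal on device md4."
    sed -n 's/.*Aborting journal on device \([a-z0-9]*\).*/\1/p' "$logfile" | sort -u
}
```

Possible usage: find_journal_aborts /var/log/messages (or save the output of dmesg to a file first and pass that file instead).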
Running fsck on md4 results in an error on Pass 1 which is:
Inode 8, i_blocks is 262416, should be 106768.
Again I think this is reported by others.
I'm wondering whether this is maybe:
1) a corrupt partition from a disk image in manufacturing, or
2) a more fundamental problem from the combination of ext3 and raid, or
3) a journalling problem - maybe processing speed, or
4) a samba version issue
- but why there is raid on a single disk is a puzzle.
OK, I'm on my last nerves on this… I have exactly the same issue as all of you above. I have only one share for testing (PUBLIC). It used to crap out after some files had been copied - reformatting /dev/md4 did help me with that issue. But the accessibility disappearing overnight has still not been fixed. Mionet disabled, sshd as permanent, no issues reported after fsck scan of /dev/mda4…
I've raised a ticket with WD, will let you know what their answer will be.
ah, btw: I'm running on 2.00.18
hmm OK, I can't access MyBook through SMB, but SSH works fine. Here is what I've noticed:
/dev/root on / type ext3 (rw,noatime,data=ordered)
proc on /proc type proc (rw)
devpts on /dev/pts type devpts (rw)
sys on /sys type sysfs (rw)
/dev/md3 on /var type ext3 (ro,noatime,data=ordered)
/dev/md4 on /shares/internal type ext3 (rw,noatime,data=ordered)
usbfs on /proc/bus/usb type usbfs (rw)
Why on earth does /dev/md3 turn to read only mode?
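Rather than eyeballing the mount output each time, the check can be scripted. The following is only a sketch: list_ro_mounts is a made-up helper name, it assumes the "DEVICE on MOUNTPOINT type FSTYPE (OPTIONS)" layout shown above, and it assumes an awk is available on the box. Mount points containing spaces would not be handled.

```shell
# List /dev/* devices that are currently mounted read-only.
# Reads "mount"-style lines on stdin:
#   DEVICE on MOUNTPOINT type FSTYPE (OPTIONS)
list_ro_mounts() {
    # Field 6 is the option list; "ro" must be the first option, as in the output above.
    awk '$1 ~ /^\/dev\// && $6 ~ /^\(ro[,)]/ { print $1, $3 }'
}
```

Possible usage: mount | list_ro_mounts, which prints one "device mountpoint" pair per read-only filesystem and nothing when everything is rw.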
Any ideas how can I scan this share? When I try to unmount to run fsck I get an error
umount: Couldn't umount /var: Device or resource busy
EDIT:
Here's what I found:
dmesg | grep md3
<6>md: created md3
<6>raid1: raid set md3 active with 1 out of 2 mirrors
<4>EXT3-fs warning (device md3): ext3_clear_journal_err: Filesystem error recorded from previous mount: IO failure
<4>EXT3-fs warning (device md3): ext3_clear_journal_err: Marking fs in need of filesystem check.
<6>EXT3 FS on md3, internal journal
OK it seems that I should reformat /md3 (that worked for my issues with /md4). Therefore the question: are there any step-by-step instructions for what I should do? /md3 is mounted on /var, so from what I know there's no way to unmount it while the system is running…
EDIT #2:
I spotted "umount /dev/md3 -l" couple of posts ago, but this didn't work for me (umount: forced umount of /var failed!)
EDIT #3:
OK this time I managed to force unmounting and ran fsck - fixed couple of errors. Will let you know tomorrow if this has worked for me…
OK, it seems that running fsck on /md3 actually DID work. I could normally use the disk in the morning and overnight I've uploaded about 120 GB of data… Problem solved?
No, actually not solved at all.
When i returned home there it was again:
[root@WDMyBook ~]# mount
/dev/root on / type ext3 (rw,noatime,data=ordered)
proc on /proc type proc (rw)
devpts on /dev/pts type devpts (rw)
sys on /sys type sysfs (rw)
/dev/md3 on /var type ext3 (ro,noatime,data=ordered)
/dev/md4 on /shares/internal type ext3 (rw,noatime,data=ordered)
usbfs on /proc/bus/usb type usbfs (rw)
It seems that running a fsck on /dev/md3 didn't cure my problem either. I eventually had to format /dev/md3. Follow the guide under troubleshooting/hardware but leave out the 'rm' commands. You'll have to # out the entry for md3 in the /etc/fstab file and issue a reboot command to be able to format md3. Your objective is to tar everything in var (complete with permissions), unmount and reformat /dev/md3, remount /dev/md3 and untar everything back to var. I suggest storing the archive on /dev/md4.
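For the "# out the entry for md3" step, a minimal sketch follows. It assumes the fstab entry starts with /dev/md3 followed by whitespace (the exact layout of the file on the MBWE is not shown in this thread), and comment_out_md3 is a hypothetical helper name. It keeps a .bak copy so the line can be restored after the reformat.

```shell
# Comment out the active fstab line for /dev/md3, keeping a backup copy.
# The file path is passed in so the edit can be tried on a copy first.
# Assumption: the entry begins with "/dev/md3" followed by whitespace.
comment_out_md3() {
    fstab=$1
    cp -p "$fstab" "$fstab.bak" || return 1
    # Prefix "#" to lines starting with /dev/md3; already-commented lines are untouched.
    sed 's|^/dev/md3[[:space:]]|#&|' "$fstab.bak" > "$fstab"
}
```

Possible usage: comment_out_md3 /etc/fstab, then check the result with cat /etc/fstab before rebooting. Copying /etc/fstab.bak back over /etc/fstab undoes the change.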
My wb has been running for 4 days now with no problems
Thanks darkhorse.
I actually plan to attach it to my desktop PC and run full format.
Anyway, what if I go without a full format? Here's the list of commands I understand I'd have to run:
cd /var
tar cpsf /root/var.tar *
umount /dev/md3
mkfs.ext3 -m 0 /dev/md3
mount /dev/md3
cd /var
tar xf /shares/internal/PUBLIC/var.tar
rm /shares/internal/PUBLIC/var.tar
Can someone confirm if the above is correct?
Anyway here's a strange issue: I've left music player running for over 26 hours straight taking music directly from MyBook. At this stage:
- The music still plays
- I CAN access it from windows normally from the PC where music is being played
- I CAN'T access the disk from my laptop or my girlfriend PC
- I did access it through SSH and I can see that /md3 has gone to read only mode again…
I just got done reformatting my /dev/md3 partition following Darkhorse's suggestion. It's too early to tell if that will resolve the samba going away issue, but I sure hope it does.
I had the same issue, with /dev/md3 mounting itself into read-only mode, despite correct entry in fstab.
Reformatting /dev/md3 helped. My MyBook is running straight for more than a week with 150+GB being copied onto it without any issues.
After reformatting md4 and then md3 it's finally looking like it's working the way it should have out of the box! I've copied 500GB onto it so far with no problems. It's slow, but I didn't buy it for speed just the fact that my PC can be in bits for a number of days without the rest of the household complaining that they can't refill their ipods or watch any films.
[root@nas ~]# uptime
22:58:54 up 4 days, 16:29, 0 users, load average: 0.00, 0.00, 0.00
Radziecki - You're almost there. Here is a list of what needs doing.
Make sure you are running as root
cd /var
tar cpsf /shares/internal/var.tar *
I got hold of winscp (Windows program) and transferred the var.tar file to my PC just in case something went wrong.
/etc/init.d/network_servers.sh stop
umount /dev/md3
mkfs.ext3 -m 0 /dev/md3
mount /dev/md3 /var
cd /var
tar xf /shares/internal/var.tar
/etc/init.d/network_servers.sh start
And hopefully you have a working system as well :)
When you're sure it's working
rm /shares/internal/var.tar
Thanks to alephsmith, Timon_vdm, smarty and hangon for their comments/advice
Not to be a pain, but I think the convention is to prepend a # to lines of code at the command prompt and put them in monospaced font.
You can use the button that looks like i++ to input code blocks. This will also ensure that commands are properly escaped and are not parsed by the wikisystem.
The chmod 1777 /tmp was due to a mistake in my instructions (see 25th June post by me) which caused /tmp to get overwritten with one that was stored in the var.tar file.
I corrected that mistake in the instructions above.
Just a note:
Instead of:
cd /var
tar cpsf /shares/internal/var.tar *
...
cd /var
tar xf /shares/internal/var.tar
I did the simpler:
cp -a /var /shares/internal/PUBLIC/
...
cp -a /shares/internal/PUBLIC/var /
It worked fine.
yep, I agree with these observations. /dev/md3 drops to ro rather than rw and the shares are not accessible. So, from /etc/init.d execute:
sh network_servers.sh stop
umount /dev/md3
and then with a swapfile configured, run fsck which confirms disk errors on md3. After this:
mount /dev/md3
sh network_servers.sh start
will bring things back to normal - but only for a while! Repeating this doesn't help - the errors recur and the device soon becomes read-only again.
I decided to follow the good advice in this thread and re-format /dev/md3 and this seems to have fixed the problem!
However, I think the restore of the /var files as described by darkhorse may result in them ending up in root rather than /var - anyway for me they did. After cleaning up things look fine except that lighttpd doesn't start up but gives the errors:
2008-06-20 19:54:36: (mod_fastcgi.c.807) bind failed for: /tmp/lighttpd.fcgi.socket-0 0 Permission denied
2008-06-20 19:54:36: (mod_fastcgi.c.1200) [ERROR]: spawning fcgi failed.
2008-06-20 19:54:36: (server.c.673) Configuration of plugins failed. Going down.
I guess it's permissions (somewhere!) but no luck tracking it down yet. Any ideas what might cause this with lighttpd and how to fix?
OOPS! did the exact same thing. didn't read your post first. extracted var.tar to root and that's where it went. cleaned it up and lighttpd failed.
I did a chmod o+w tmp from root and it worked. i'm not sure what the original permissions were for the tmp folder.
BTW most things in Darkhorse's post worked for me to fix samba with the exception of:
tar cpsf /shares/internal/var.tar * (wildcard at end)
/etc/init.d/network_servers.sh stop (as per smarty, to stop all network services, some individual services complained otherwise.)
i probably needed to cd /var before extracting var.tar as they ended up in root.
firmware 02.00.15
THANKS ALL !! i woke up this morning, fired up laptop and brick still running!!
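One way to avoid the "extracted to root" slip reported above is to tell tar which directory to work in, so the result no longer depends on where the shell happens to be. This is a sketch rather than the guide's method: backup_dir and restore_dir are made-up helper names, and it assumes the tar on the box accepts -C (GNU tar does, and BusyBox tar normally does too). Unlike the * wildcard, archiving "." also picks up hidden files.

```shell
# Archive the contents of a directory and restore them into a chosen directory,
# without depending on the shell's current working directory.
backup_dir() {    # usage: backup_dir SRC_DIR ARCHIVE
    # -C switches into SRC_DIR before adding "." (everything, including dotfiles)
    tar -cpf "$2" -C "$1" .
}
restore_dir() {   # usage: restore_dir ARCHIVE DEST_DIR
    # -p keeps the stored permissions; -C extracts into DEST_DIR
    tar -xpf "$1" -C "$2"
}
```

Possible usage: backup_dir /var /shares/internal/var.tar before the reformat, and restore_dir /shares/internal/var.tar /var afterwards. The archive must live outside the directory being archived.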
I'm not sure how I made such a basic error with the tar extraction process, but apologies to smarty and hangon for causing complications for you both.
I've updated the guide with the information from both of your posts.
Incidentally my /tmp directory has the permissions:
rwxrwxrwt
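The rwxrwxrwt string corresponds to octal mode 1777 (world-writable with the sticky bit set). A small sketch for checking a directory against that mode is below. has_sticky_1777 is a hypothetical helper name, and it relies only on the ls -ld output format.

```shell
# Succeed if the given directory has mode 1777 (drwxrwxrwt), fail otherwise.
has_sticky_1777() {
    case $(ls -ld "$1") in
        drwxrwxrwt*) return 0 ;;   # world-writable directory with sticky bit
        *)           return 1 ;;
    esac
}
```

Possible usage: has_sticky_1777 /tmp || chmod 1777 /tmp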
well i tried the steps suggested by DarkHorse but i can't get the mybook to start again after the first reboot. The device won't start or it is lost on my network with another IP address… So i think editing fstab and restarting is not a good idea. Any suggestions how i can fix this??
EDIT: Well i found the device with the following NMAP command: nmap -p 22 192.168.1.1-254. But i think it's a better idea to unmount the device rather than to restart it…
Timon - Is your Mybook set up to get its IP address from DHCP? What firmware is your mybook running?
But i think it's a better idea to unmount the device rather than to restart it…
I couldn't find a way to unmount /dev/md3 once the system had mounted it and was using it, hence the reason to edit /etc/fstab. If you know of a way to unmount it without the edit + reboot then please let me know and I'll edit my guide.
No, my mybook had a fixed IP which probably was stored somewhere in the /var folder. If you use DHCP it will probably stick with the same IP (depends on the router).
You can unmount /dev/md3 by first killing processes that use the /var folder. Killing Samba and ftp did the trick.
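To find out which processes are holding /var busy before killing anything, something like the following sketch may help. It assumes a Linux /proc filesystem and a readlink command on the box (neither is confirmed for the MBWE firmware in this thread), and pids_using is a made-up helper name. It only sees open file descriptors, not working directories or memory-mapped files, and needs root to see other users' processes.

```shell
# Print the PIDs of processes that have files open under a directory.
# Assumptions: Linux /proc is mounted and readlink is available.
pids_using() {
    dir=${1%/}
    for fd in /proc/[0-9]*/fd/*; do
        # Each fd entry is a symlink to the open file; skip ones we cannot read.
        target=$(readlink "$fd" 2>/dev/null) || continue
        case $target in
            "$dir"/*)
                pid=${fd#/proc/}      # strip the leading /proc/
                echo "${pid%%/*}"     # keep only the PID component
                ;;
        esac
    done | sort -un
}
```

Possible usage: pids_using /var, then look each PID up with ps to decide which services to stop before running umount.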
But besides that your fix worked out fine! Thanx!
You are absolutely right Timon. After a bit more investigation it seems that the ip settings are stored in /var/oxsemi/network-settings.
I'll edit my guide to show the processes that need to be stopped before /dev/md3 can be dismounted.
Thanks for your feedback.
well, at least /var hasn't gone read-only - this time it's /shares/internal i.e. /dev/md4 that has dropped to ro after 4Gb of backup has been written. Rather strange after 4Gb but I'm guessing that fsck will show errors on /dev/md4 and that a reformat will be required to fix it.
After the 'reformat fix' on /dev/md3 the share has remained visible. A step forward I think! And thanks for the /tmp directory permissions darkhorse. Correcting that has fixed the lighttpd problem (for reference: chmod 1777 /tmp)
Anybody got a step-by-step for a re-format of /dev/md4 ?
Reformatting /dev/md4 is more or less the same as reformatting /dev/md3. I recommend removing all the shares using the web interface before reformatting the partition to prevent any weirdness later on.
umount /dev/md4
mkfs.ext3 -m 0 /dev/md4
mount /dev/md4 /shares/internal
mkdir /shares/internal/PUBLIC
reboot
The reboot fixes the permissions on PUBLIC.
I'm cautiously coming to the view that MBWE drives are liable to have errors out of the box - certainly it looks like mine did, and others' experiences shared on this site tend to suggest the same.
Reformatting both /dev/md3 and /dev/md4 as discussed in this thread has improved the device from a drive that would become unusable for seemingly unpredictable reasons (both /var and /shares/internal switching to RO) to a NAS that will store files as advertised.
It's accessible as a windows share across my network without the mionet software, as you would expect. I now know that it's based on the busybox distro (which I shouldn't have to!) and my long forgotten linux knowledge is gradually resurfacing! Thanks for the help.
It looks to me like the master image they use to clone onto the MBWE drives has some corruption.
The MBWE is marketed as an easy-to-use shared network storage… If it worked the way it was supposed to out of the box it would be. If it wasn't for the fact that I do IT for a living and don't like declaring something 'broken' until I've spent at least 12 hours searching the web for possible solutions, I would have got a refund. The people I feel really sorry for are the ones that don't want to know how it works, they just want it to work, which it doesn't.
I'm a sad geek at heart. I quite enjoyed fixing my MBWE and take a certain amount of pride in the fact it now works the way it is supposed to. I also freely admit that I didn't discover any of the solutions that I applied or wrote a guide on. I simply took other people's work and modified it to fix the problems I was having.
I agree - this isn't consumer electronics by any stretch of the imagination! At last I've managed to backup my home server (66Gb) and transfer our 'resources files' (12Gb) to our MBWE NAS as planned some weeks ago now with no probs. It shouldn't be this difficult.
I have a 1 TB white light edition and it keeps stopping torrent downloads. I also have problems copying large files. I have tried to umount /dev/md4 and umount /dev/md3 but I get the same error.
[root@Myserver ~]# umount /dev/md4
umount: Couldn't umount /dev/md4: Invalid argument
I have copied 59 GB to the drive so far and I think I need to correct the problems I am having before I go further. What do you guys think I should do?
I know just enough linux to get myself in trouble.
thanks in advance.
Moltra