[Unionfs] RHEL 2.6.9 hangs with unionfs-1.1.5
Benoit Guillon
guillon at thalescomputers.fr
Thu Nov 30 12:07:54 EST 2006
Josef Sipek wrote:
>On Wed, Nov 29, 2006 at 06:33:09PM +0100, Benoit Guillon wrote:
>
>
>>Ok, I can reproduce the same problem with a server and a diskless node
>>having the same ix86 architecture. I use the crazy sparse file for this.
>>My conclusion so far is that it is not related to architecture but to
>>the sparse file. I attach the kernel trace (it's always the same thing,
>>pointing to the BUG_ON at unionfs_debugmacros.h:291).
>>
>>To reproduce this, I do the following on the diskless node once booted:
>>
>>echo 1 >> /var/log/lastlog
>>
>>I am still investigating. I'd like to create a sparse file from
>>scratch that reproduces the problem.
>>
>>
>
>dd if=/dev/zero of=file bs=1024k seek=10 count=1
>
>This makes a sparse file, 11MB in size, where the first 10MB make up a hole.
>
>
Thanks for the hint. With this command I've seen some interesting
things. The following simple script creates several sparse files in a
read-only layer:
#!/bin/sh
# Create sparse files with holes of 100 to 1000000 MiB, each followed by 1 MiB of zeros.
for i in 100 1000 10000 100000 1000000; do
    dd if=/dev/zero of=sparse$i.bin bs=1M seek=$i count=1
done
It gives:
[guest@node1 log]$ ll
total 5260
-rwxr-xr-x 1 root root 115 Nov 30 16:44 build_sparse
-rw-r--r-- 1 root root 1048577048577 Nov 30 16:57 sparse1000000.bin
-rw-r--r-- 1 root root 104858648576 Nov 30 16:44 sparse100000.bin
-rw-r--r-- 1 root root 10486808576 Nov 30 16:44 sparse10000.bin
-rw-r--r-- 1 root root 1049624576 Nov 30 16:44 sparse1000.bin
-rw-r--r-- 1 root root 105906176 Nov 30 16:44 sparse100.bin
[guest@node1 log]$ du -kh sparse100*
1.1M sparse1000000.bin
1.1M sparse100000.bin
1.1M sparse10000.bin
1.1M sparse1000.bin
1.1M sparse100.bin
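The ls and du output above can also be checked numerically by comparing the apparent size with the allocated size. The following is a minimal sketch, assuming GNU coreutils stat is available; the function names are made up for illustration and are not part of unionfs or its tools.

```shell
#!/bin/sh
# Sketch: compare apparent size with allocated size (assumes GNU stat).

# Expected apparent size of "dd bs=1M seek=N count=1":
# N MiB of hole plus 1 MiB of data.
expected_size() {
    echo $(( ($1 + 1) * 1048576 ))
}

# Apparent size in bytes, as shown by ls -l.
apparent_size() {
    stat -c %s "$1"
}

# Bytes actually allocated, which is what du counts.
allocated_size() {
    # %b = number of allocated blocks, %B = size in bytes of each block
    set -- $(stat -c '%b %B' "$1")
    echo $(( $1 * $2 ))
}

# Succeeds when fewer bytes are allocated than the apparent size.
is_sparse() {
    [ "$(allocated_size "$1")" -lt "$(apparent_size "$1")" ]
}
```

For example, `expected_size 1000` prints 1049624576, which matches the size ls reports for sparse1000.bin on node1, and `is_sparse sparse1000.bin && echo sparse` should print "sparse" as long as the hole is intact.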
The stack is mounted with unionfs and exported over NFS, and the
diskless node (node2) boots from this file system.
[root@node1 diskless]# ssh node2
root@node2's password:
-bash-3.00#
-bash-3.00# cd /var/log/
-bash-3.00# ll
...
-rw-r--r-- 1 root root 105906178 Nov 30 2006 sparse100.bin
-rw-r--r-- 1 root root 1049624578 Nov 30 2006 sparse1000.bin
-rw-r--r-- 1 root root 10486808578 Nov 30 2006 sparse10000.bin
-rw-r--r-- 1 root root 104858648578 Nov 30 2006 sparse100000.bin
-rw-r--r-- 1 root root 1048577048578 Nov 30 2006 sparse1000000.bin
-bash-3.00# du -kh sparse100*
1.1M sparse100.bin
1.1M sparse1000.bin
1.1M sparse10000.bin
1.1M sparse100000.bin
1.1M sparse1000000.bin
Then I append 2 characters to sparse1000.bin:
-bash-3.00# echo 1 >> sparse1000.bin
It takes a while to finish and... the file is no longer sparse!
-bash-3.00# du -kh sparse100*
1.1M sparse100.bin
1002M sparse1000.bin
1.1M sparse10000.bin
1.1M sparse100000.bin
1.1M sparse1000000.bin
Incidentally, the gigabyte-sized file is created in the COW directory.
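One way the 1002M could arise is a copy-up that reads the lower file from start to end and writes every byte, zeros included, into the writable branch. Whether unionfs 1.1.5 does exactly this is an assumption and has not been checked against its source. The userspace sketch below only illustrates the effect, using GNU cp and its --sparse=never and --sparse=always options; the function and file names are made up.

```shell
#!/bin/sh
# Userspace illustration only: a hole-unaware copy versus a hole-preserving one.
# Assumes GNU dd, cp and du. This is NOT the unionfs copy-up code.

demo_copy() {
    dir=$1
    # 8 MiB hole followed by 1 MiB of zeros written as real data
    dd if=/dev/zero of="$dir/lower.bin" bs=1M seek=8 count=1 2>/dev/null
    # Writes every byte to the destination, zeros included
    cp --sparse=never "$dir/lower.bin" "$dir/full.bin"
    # Turns runs of zeros in the source into holes in the destination
    cp --sparse=always "$dir/lower.bin" "$dir/holes.bin"
    # Same apparent size (ls -l), possibly different allocation (du)
    ls -l "$dir"
    du -k "$dir"/*.bin
}
```

Running `demo_copy "$(mktemp -d)"` on a file system that supports holes should show three files with the same apparent size, with full.bin taking roughly 9 MiB in du and lower.bin roughly 1 MiB. holes.bin may take almost nothing, because even the 1 MiB written by dd consists of zeros.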
Doing the same on sparse10000.bin then freezes the server, always with
the same kernel trace (attached).
I guess one thing that is wrong is that unionfs does not preserve the
sparseness of the file. Do you need any details about how things are
mounted or exported? Apart from providing that kind of information, I
can hardly investigate further. Can you reproduce the problem?
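For anyone trying to reproduce this, the append step can be wrapped so that it reports the allocation before and after. This is only a sketch: the function name is made up, and it assumes the sparse file already exists in the read-only layer and is reached through the union (or NFS) mount.

```shell
#!/bin/sh
# Sketch: append 2 bytes to a file and report its allocated size before and after.
# Usage: append_and_report FILE

append_and_report() {
    # du -k prints "<KiB><TAB><name>"; keep the first field
    before=$(du -k "$1" | cut -f1)
    # Appends 2 bytes: the character "1" and a newline
    echo 1 >> "$1" || return 1
    after=$(du -k "$1" | cut -f1)
    echo "$1: ${before}K allocated before append, ${after}K after"
}
```

On a local file system the two numbers should stay close to each other. In the session above, the same append on sparse1000.bin through the union went from 1.1M to 1002M.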
Thanks,
--
Benoît Guillon guillon at thalescomputers.fr
TRT/SML tel. : 33 (0)4 98 16 33 90
THALES RESEARCH & TECHNOLOGY
-------------- next part --------------
Nov 30 17:51:13 node1 kernel: ------------[ cut here ]------------
Nov 30 17:51:13 node1 kernel: kernel BUG at /tmp/build.mmm/unionfs-tools-1.1.5/unionfs_debugmacros.h:291!
Nov 30 17:51:13 node1 kernel: invalid operand: 0000 [#1]
Nov 30 17:51:13 node1 kernel: Modules linked in: unionfs(U) i915 nfsd exportfs lockd nfs_acl sunrpc i2c_dev i2c_core ipt_REJECT ipt_state ip_conntrack iptable_filter ip_tables dm_mirror dm_mod md5 ipv6 uhci_hcd ehci_hcd hw_random e1000 ext3 jbd
Nov 30 17:51:13 node1 kernel: CPU: 0
Nov 30 17:51:13 node1 kernel: EIP: 0060:[<f8e9d6c9>] Not tainted VLI
Nov 30 17:51:13 node1 kernel: EFLAGS: 00010246 (2.6.9-34.EL)
Nov 30 17:51:13 node1 kernel: EIP is at unionfs_d_revalidate+0x1a79/0x1b00 [unionfs]
Nov 30 17:51:13 node1 kernel: eax: 00000000 ebx: f084c094 ecx: f7aa5048 edx: 00000003
Nov 30 17:51:13 node1 kernel: esi: 00000002 edi: f5167800 ebp: f084c094 esp: f66f3d44
Nov 30 17:51:13 node1 kernel: ds: 007b es: 007b ss: 0068
Nov 30 17:51:13 node1 kernel: Process nfsd (pid: 2717, threadinfo=f66f3000 task=f671d970)
Nov 30 17:51:13 node1 kernel: Stack: f8ed9424 f8ed89e2 0000002c 00000006 f8edbe7b f8ee2600 f8ed89e2 f8ed9424
Nov 30 17:51:13 node1 kernel: 0000002c f8ed6bdc f8edb918 00000002 000000f4 00000008 f8edcf57 f8edcc60
Nov 30 17:51:13 node1 kernel: 00000000 00000001 00000001 00000000 f6b374e0 f66a0180 f66a018c f8ed90eb
Nov 30 17:51:13 node1 kernel: Call Trace:
Nov 30 17:51:13 node1 kernel: [<f8ed6bdc>] fist_print_file+0x17c/0x210 [unionfs]
Nov 30 17:51:13 node1 kernel: [<f8ed0d52>] unionfs_file_revalidate+0x132/0x14a0 [unionfs]
Nov 30 17:51:13 node1 kernel: [<c030fbf0>] __cond_resched+0x14/0x3b
Nov 30 17:51:13 node1 kernel: [<f8e9ef20>] unionfs_write+0x0/0x240 [unionfs]
Nov 30 17:51:13 node1 kernel: [<f8e9efae>] unionfs_write+0x8e/0x240 [unionfs]
Nov 30 17:51:13 node1 kernel: [<f8e9ef20>] unionfs_write+0x0/0x240 [unionfs]
Nov 30 17:51:13 node1 kernel: [<c0169091>] do_readv_writev+0x1c5/0x21d
Nov 30 17:51:13 node1 kernel: [<c0167dc1>] __dentry_open+0xca/0x16a
Nov 30 17:51:13 node1 kernel: [<c0167cf2>] dentry_open+0x48/0x4d
Nov 30 17:51:13 node1 kernel: [<c0169167>] vfs_writev+0x3e/0x43
Nov 30 17:51:13 node1 kernel: [<f8b37600>] nfsd_write+0xeb/0x28f [nfsd]
Nov 30 17:51:13 node1 kernel: [<c030fbf0>] __cond_resched+0x14/0x3b
Nov 30 17:51:13 node1 kernel: [<f8b3eee6>] nfsd3_proc_write+0xbf/0xd5 [nfsd]
Nov 30 17:51:13 node1 kernel: [<f8b40f94>] nfs3svc_decode_writeargs+0x0/0x243 [nfsd]
Nov 30 17:51:13 node1 kernel: [<f8b33947>] nfsd_dispatch+0xba/0x16f [nfsd]
Nov 30 17:51:13 node1 kernel: [<f8add8ec>] svc_process+0x432/0x6da [sunrpc]
Nov 30 17:51:13 node1 kernel: [<f8b335eb>] nfsd+0x2a7/0x549 [nfsd]
Nov 30 17:51:13 node1 kernel: [<f8b33344>] nfsd+0x0/0x549 [nfsd]
Nov 30 17:51:13 node1 kernel: [<c01041dd>] kernel_thread_helper+0x5/0xb
Nov 30 17:51:13 node1 kernel: Code: a4 94 ed f8 e9 aa fc ff ff 0f 0b 4a 00 a4 94 ed f8 0f 0b 41 00 a4 94 ed f8 e9 1d f2 ff ff 0f 0b 44 00 a4 94 ed f8 e9 86 f1 ff ff <0f> 0b 23 01 a4 94 ed f8 e9 24 fc ff ff 0f 0b 44 00 a4 94 ed f8
Nov 30 17:51:13 node1 kernel: <0>Fatal exception: panic in 5 seconds