[Unionfs] RHEL 2.6.9 hangs with unionfs-1.1.5

Benoit Guillon guillon at thalescomputers.fr
Thu Nov 30 12:07:54 EST 2006


Josef Sipek wrote:

>On Wed, Nov 29, 2006 at 06:33:09PM +0100, Benoit Guillon wrote:
>  
>
>>Ok, I can reproduce the same problem with a server and a diskless node 
>>having the same ix86 architecture. I use the crazy sparse file for this. 
>>My conclusion so far is that it is not related to the architecture but to 
>>the sparse file. I attach the kernel trace (it's always the same thing, 
>>pointing to the BUG_ON at unionfs_debugmacros.h:291).
>>
>>To reproduce this, I do the following on the diskless node once booted:
>>
>>echo 1 >> /var/log/lastlog
>>
>>I'm continuing to investigate. I'd like to create a sparse file from 
>>scratch that reproduces the problem.
>>    
>>
> 
>dd if=/dev/zero of=file bs=1024k seek=10 count=1
>
>This makes a sparse file, 11MB in size, where the first 10MB make up a hole.
>  
>
Thanks for the hint. Using this command, I've observed some interesting 
behavior. The following simple script creates several sparse files in a 
read-only layer:

#! /bin/sh
for i in 100 1000 10000 100000 1000000; do
  dd if=/dev/zero of=sparse$i.bin bs=1M seek=$i count=1
done

It gives:
[guest at node1 log]$ ll
total 5260
-rwxr-xr-x  1 root root           115 Nov 30 16:44 build_sparse
-rw-r--r--  1 root root 1048577048577 Nov 30 16:57 sparse1000000.bin
-rw-r--r--  1 root root  104858648576 Nov 30 16:44 sparse100000.bin
-rw-r--r--  1 root root   10486808576 Nov 30 16:44 sparse10000.bin
-rw-r--r--  1 root root    1049624576 Nov 30 16:44 sparse1000.bin
-rw-r--r--  1 root root     105906176 Nov 30 16:44 sparse100.bin
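The apparent sizes in this listing follow directly from the dd arguments. dd skips `seek` output blocks (leaving a hole) and then writes `count` blocks, so each file ends at (seek + count) × bs bytes, with bs=1M meaning 1048576 bytes. The sketch below reproduces that arithmetic; `expected_size` is a hypothetical helper, not part of the original script, and the large values assume a shell with 64-bit arithmetic (bash, for example).

```sh
#!/bin/sh
# Hypothetical helper: apparent size in bytes of a file created with
#   dd if=/dev/zero of=FILE bs=BS seek=SEEK count=COUNT
# dd leaves a hole of SEEK*BS bytes, then writes COUNT*BS bytes of zeroes.
expected_size() {
  seek=$1 count=$2 bs=$3
  echo $(( (seek + count) * bs ))
}

# Same loop values as build_sparse; bs=1M is 1048576 bytes.
for i in 100 1000 10000 100000 1000000; do
  echo "sparse$i.bin $(expected_size "$i" 1 1048576)"
done
```

The first four computed values match the listing exactly; the listed sparse1000000.bin (1048577048577 bytes, timestamp 16:57) is one byte larger than the computed 1048577048576.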

[guest at node1 log]$ du -kh sparse100*
1.1M    sparse1000000.bin
1.1M    sparse100000.bin
1.1M    sparse10000.bin
1.1M    sparse1000.bin
1.1M    sparse100.bin
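Comparing `ll` (apparent size) with `du` (allocated space) is what shows these files are sparse. The same check can be scripted per file. This sketch assumes GNU coreutils `stat`, whose `%s`, `%b` and `%B` format sequences give the apparent size, the number of allocated blocks, and the size in bytes of those blocks; `is_sparse` is a hypothetical helper name.

```sh
#!/bin/sh
# Hypothetical helper: succeed if FILE has fewer bytes allocated on disk
# than its apparent size, i.e. at least part of it is a hole.
# Assumes GNU stat: %s = apparent size, %b = allocated blocks,
# %B = size in bytes of each block counted by %b.
is_sparse() {
  info=$(stat -c '%s %b %B' "$1") || return 2
  set -- $info
  [ $(( $2 * $3 )) -lt "$1" ]
}

for f in sparse*.bin; do
  [ -e "$f" ] || continue   # no match: the glob stays unexpanded
  if is_sparse "$f"; then
    echo "$f: sparse"
  else
    echo "$f: fully allocated"
  fi
done
```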

The stack is mounted with unionfs and exported over NFS, and the diskless 
node (node2) boots from this file system.

[root at node1 diskless]# ssh node2
root at node2's password:
-bash-3.00#
-bash-3.00# cd /var/log/
-bash-3.00# ll
...
-rw-r--r--  1 root root     105906178 Nov 30  2006 sparse100.bin
-rw-r--r--  1 root root    1049624578 Nov 30  2006 sparse1000.bin
-rw-r--r--  1 root root   10486808578 Nov 30  2006 sparse10000.bin
-rw-r--r--  1 root root  104858648578 Nov 30  2006 sparse100000.bin
-rw-r--r--  1 root root 1048577048578 Nov 30  2006 sparse1000000.bin

-bash-3.00# du -kh sparse100*
1.1M    sparse100.bin
1.1M    sparse1000.bin
1.1M    sparse10000.bin
1.1M    sparse100000.bin
1.1M    sparse1000000.bin

Then I append 2 characters ("1" and a newline) to sparse1000.bin:

-bash-3.00# echo 1 >> sparse1000.bin

It takes a while to finish, and... the file is no longer sparse!

-bash-3.00# du -kh sparse100*
1.1M    sparse100.bin
1002M   sparse1000.bin
1.1M    sparse10000.bin
1.1M    sparse100000.bin
1.1M    sparse1000000.bin
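For comparison, appending to a sparse file on an ordinary local filesystem should only allocate the new tail, leaving the hole in place. A minimal sketch to check that in a scratch directory, assuming `mktemp -d` is available and the filesystem behind it supports sparse files (most Linux filesystems, including tmpfs, do). `alloc_kib` is a hypothetical helper reporting the same allocated size as `du -k`, and a 100 MiB hole is used instead of 1000 MiB to keep the test light.

```sh
#!/bin/sh
# Hypothetical helper: allocated size in KiB, as reported by du -k
# (the same measure as the du listings above).
alloc_kib() {
  du -k "$1" | cut -f1
}

dir=$(mktemp -d)
# 100 MiB hole followed by 1 MiB of zeroes.
dd if=/dev/zero of="$dir/sparse100.bin" bs=1M seek=100 count=1 2>/dev/null
before=$(alloc_kib "$dir/sparse100.bin")

# Same append as above: writes "1" and a newline at the end of the file.
echo 1 >> "$dir/sparse100.bin"
after=$(alloc_kib "$dir/sparse100.bin")

echo "allocated before: ${before} KiB, after: ${after} KiB"
rm -r "$dir"
```

On such a filesystem the growth is typically a few KiB, nowhere near the jump to about 1 GB reported above through unionfs.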

Incidentally, the gigabyte-sized file is created in the COW directory.
Doing the same on sparse10000.bin then freezes the server, always with 
the same kernel trace (attached).

My guess is that one thing going wrong is that unionfs does not preserve 
the sparseness of the file. Do you need details about how things are 
mounted or exported? Apart from providing that kind of information, I can 
hardly investigate any further. Can you reproduce the problem?
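One plausible way a file loses its holes is a copy that reads the hole back as zeroes and writes those zeroes out as real data. Whether the copy into the COW directory does exactly this is a hypothesis, not something confirmed in the unionfs code. The sketch below reproduces the effect with the same 10 MiB-hole file as in the dd example: a plain dd copy writes every block, while `conv=sparse` seeks over all-NUL blocks instead of writing them. It assumes GNU coreutils dd (for `conv=sparse`) and `mktemp -d`; `alloc_kib` is a hypothetical helper.

```sh
#!/bin/sh
# Hypothetical helper: allocated size in KiB, as reported by du -k.
alloc_kib() {
  du -k "$1" | cut -f1
}

dir=$(mktemp -d)
# Source file: a 10 MiB hole followed by 1 MiB of zeroes (11 MiB apparent).
dd if=/dev/zero of="$dir/src.bin" bs=1M seek=10 count=1 2>/dev/null

# Naive copy: the hole is read back as zeroes and written out as real data.
dd if="$dir/src.bin" of="$dir/dense.bin" bs=1M 2>/dev/null
# Sparse-aware copy: GNU dd seeks over all-NUL output blocks instead.
dd if="$dir/src.bin" of="$dir/holey.bin" bs=1M conv=sparse 2>/dev/null

for f in src.bin dense.bin holey.bin; do
  echo "$f: $(alloc_kib "$dir/$f") KiB allocated"
done

dense=$(alloc_kib "$dir/dense.bin")
holey=$(alloc_kib "$dir/holey.bin")
rm -r "$dir"
```

The naive copy ends up with roughly the full 11 MiB allocated, mirroring the 1.1M to 1002M change seen on sparse1000.bin, while the sparse-aware copy stays small.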

Thanks,

-- 
Benoît Guillon                guillon at thalescomputers.fr
TRT/SML                       tel. : 33 (0)4 98 16 33 90
 
THALES RESEARCH & TECHNOLOGY

-------------- next part --------------
Nov 30 17:51:13 node1 kernel: ------------[ cut here ]------------
Nov 30 17:51:13 node1 kernel: kernel BUG at /tmp/build.mmm/unionfs-tools-1.1.5/unionfs_debugmacros.h:291!
Nov 30 17:51:13 node1 kernel: invalid operand: 0000 [#1]
Nov 30 17:51:13 node1 kernel: Modules linked in: unionfs(U) i915 nfsd exportfs lockd nfs_acl sunrpc i2c_dev i2c_core ipt_REJECT ipt_state ip_conntrack iptable_filter ip_tables dm_mirror dm_mod md5 ipv6 uhci_hcd ehci_hcd hw_random e1000 ext3 jbd
Nov 30 17:51:13 node1 kernel: CPU:    0
Nov 30 17:51:13 node1 kernel: EIP:    0060:[<f8e9d6c9>]    Not tainted VLI
Nov 30 17:51:13 node1 kernel: EFLAGS: 00010246   (2.6.9-34.EL)
Nov 30 17:51:13 node1 kernel: EIP is at unionfs_d_revalidate+0x1a79/0x1b00 [unionfs]
Nov 30 17:51:13 node1 kernel: eax: 00000000   ebx: f084c094   ecx: f7aa5048   edx: 00000003
Nov 30 17:51:13 node1 kernel: esi: 00000002   edi: f5167800   ebp: f084c094   esp: f66f3d44
Nov 30 17:51:13 node1 kernel: ds: 007b   es: 007b   ss: 0068
Nov 30 17:51:13 node1 kernel: Process nfsd (pid: 2717, threadinfo=f66f3000 task=f671d970)
Nov 30 17:51:13 node1 kernel: Stack: f8ed9424 f8ed89e2 0000002c 00000006 f8edbe7b f8ee2600 f8ed89e2 f8ed9424
Nov 30 17:51:13 node1 kernel:        0000002c f8ed6bdc f8edb918 00000002 000000f4 00000008 f8edcf57 f8edcc60
Nov 30 17:51:13 node1 kernel:        00000000 00000001 00000001 00000000 f6b374e0 f66a0180 f66a018c f8ed90eb
Nov 30 17:51:13 node1 kernel: Call Trace:
Nov 30 17:51:13 node1 kernel:  [<f8ed6bdc>] fist_print_file+0x17c/0x210 [unionfs]
Nov 30 17:51:13 node1 kernel:  [<f8ed0d52>] unionfs_file_revalidate+0x132/0x14a0 [unionfs]
Nov 30 17:51:13 node1 kernel:  [<c030fbf0>] __cond_resched+0x14/0x3b
Nov 30 17:51:13 node1 kernel:  [<f8e9ef20>] unionfs_write+0x0/0x240 [unionfs]
Nov 30 17:51:13 node1 kernel:  [<f8e9efae>] unionfs_write+0x8e/0x240 [unionfs]
Nov 30 17:51:13 node1 kernel:  [<f8e9ef20>] unionfs_write+0x0/0x240 [unionfs]
Nov 30 17:51:13 node1 kernel:  [<c0169091>] do_readv_writev+0x1c5/0x21d
Nov 30 17:51:13 node1 kernel:  [<c0167dc1>] __dentry_open+0xca/0x16a
Nov 30 17:51:13 node1 kernel:  [<c0167cf2>] dentry_open+0x48/0x4d
Nov 30 17:51:13 node1 kernel:  [<c0169167>] vfs_writev+0x3e/0x43
Nov 30 17:51:13 node1 kernel:  [<f8b37600>] nfsd_write+0xeb/0x28f [nfsd]
Nov 30 17:51:13 node1 kernel:  [<c030fbf0>] __cond_resched+0x14/0x3b
Nov 30 17:51:13 node1 kernel:  [<f8b3eee6>] nfsd3_proc_write+0xbf/0xd5 [nfsd]
Nov 30 17:51:13 node1 kernel:  [<f8b40f94>] nfs3svc_decode_writeargs+0x0/0x243 [nfsd]
Nov 30 17:51:13 node1 kernel:  [<f8b33947>] nfsd_dispatch+0xba/0x16f [nfsd]
Nov 30 17:51:13 node1 kernel:  [<f8add8ec>] svc_process+0x432/0x6da [sunrpc]
Nov 30 17:51:13 node1 kernel:  [<f8b335eb>] nfsd+0x2a7/0x549 [nfsd]
Nov 30 17:51:13 node1 kernel:  [<f8b33344>] nfsd+0x0/0x549 [nfsd]
Nov 30 17:51:13 node1 kernel:  [<c01041dd>] kernel_thread_helper+0x5/0xb
Nov 30 17:51:13 node1 kernel: Code: a4 94 ed f8 e9 aa fc ff ff 0f 0b 4a 00 a4 94 ed f8 0f 0b 41 00 a4 94 ed f8 e9 1d f2 ff ff 0f 0b 44 00 a4 94 ed f8 e9 86 f1 ff ff <0f> 0b 23 01 a4 94 ed f8 e9 24 fc ff ff 0f 0b 44 00 a4 94 ed f8
Nov 30 17:51:13 node1 kernel:  <0>Fatal exception: panic in 5 seconds

