original development tree for Linux kernel GTP module; now long in mainline.
[PATCH] ext[34]: EA block reference count racing fix

There are race issues around the ext[34] xattr block release code.

ext[34]_xattr_release_block() checks the reference count of the xattr block (h_refcount) and frees that xattr block if it is the last one referencing it. Unlike ext2, the check of this counter is unprotected by any lock. ext[34]_xattr_release_block() will free the mb_cache entry before freeing that xattr block. There is a small window between the check for h_refcount == 1 and the call to mb_cache_entry_free(). During this small window another inode might find this xattr block from the mbcache and reuse it, racing the refcount update. The xattr block will later be freed by the first inode without noticing that the other inode is still using it. Later, if that block is reallocated as a data block for another file, more serious problems might happen.

We need to put a lock around the places that check the refcount as well, to avoid the race. Another place that needs this kind of protection is ext3_xattr_block_set(), which modifies the xattr block content on the fly if the refcount is 1 (meaning it is the only inode referencing it).

This will also fix another issue: the xattr block may not get freed at all if no lock protects the refcount check at release time. It is possible that the last two inodes release the shared xattr block at the same time, but both of them think they are not the last one and so only decrease h_refcount without freeing the xattr block at all.

We need to call lock_buffer() after ext3_journal_get_write_access() to avoid deadlock (because the latter will call lock_buffer()/unlock_buffer() as well).

Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Cc: Andreas Gruenbacher <agruen@suse.de>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
15 years ago
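The check-then-free pattern described in this commit message can be illustrated outside the kernel. The following is a minimal userspace sketch, not the ext3/ext4 code: `struct ea_block`, `ea_block_release()` and `ea_block_reuse()` are hypothetical names, and a pthread mutex stands in for the buffer lock. It shows the refcount check and the resulting action happening under one lock, so a concurrent reuse cannot slip in between them.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a shared xattr block. */
struct ea_block {
	pthread_mutex_t lock;   /* plays the role of the buffer lock */
	unsigned int refcount;  /* plays the role of h_refcount */
	bool cached;            /* still discoverable through the block cache */
	bool freed;             /* block has been handed back to the allocator */
};

static void ea_block_init(struct ea_block *blk, unsigned int refs)
{
	pthread_mutex_init(&blk->lock, NULL);
	blk->refcount = refs;
	blk->cached = true;
	blk->freed = false;
}

/*
 * Drop one reference. The "am I the last user?" check and the action
 * taken on it are done under the same lock, so nobody can take or drop
 * a reference in between. Returns true if this call freed the block.
 */
static bool ea_block_release(struct ea_block *blk)
{
	bool last;

	pthread_mutex_lock(&blk->lock);
	assert(blk->refcount > 0 && !blk->freed);
	last = (blk->refcount == 1);
	if (last) {
		blk->cached = false;  /* hide the block from lookups first */
		blk->freed = true;    /* then give it back */
	}
	blk->refcount--;
	pthread_mutex_unlock(&blk->lock);
	return last;
}

/* Try to share the block with another inode; fails once it is gone. */
static bool ea_block_reuse(struct ea_block *blk)
{
	bool ok;

	pthread_mutex_lock(&blk->lock);
	ok = blk->cached && !blk->freed;
	if (ok)
		blk->refcount++;
	pthread_mutex_unlock(&blk->lock);
	return ok;
}
```

With two holders, the first `ea_block_release()` only decrements, and the second one frees the block; a later `ea_block_reuse()` is refused because the block was removed from the cache under the same lock.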
ext4: Expand extra_inodes space per the s_{want,min}_extra_isize fields

We need to make sure that existing ext3 filesystems can also make use of the new fields that have been added to the ext4 inode. We use s_want_extra_isize and s_min_extra_isize to decide by how much we should expand the inode. If the EXT4_FEATURE_RO_COMPAT_EXTRA_ISIZE feature is set, then we expand the inode by max(s_want_extra_isize, s_min_extra_isize, sizeof(ext4_inode) - EXT4_GOOD_OLD_INODE_SIZE) bytes. It is still an open question whether users should be able to set s_*_extra_isize smaller than the known fields or not.

This patch also adds the functionality to expand inodes to include the newly added fields. We start by trying to expand by s_want_extra_isize bytes, and if that fails we try to expand by s_min_extra_isize bytes. This is done by changing i_extra_isize if enough space is available in the inode and no EAs are present. If EAs are present and there is enough space in the inode, then the EAs in the inode are shifted to make space. If enough space is not available in the inode due to the EAs, then one or more EAs are shifted to the external EA block. In the worst case, when even the external EA block does not have enough space, we inform the user that some EA would need to be deleted or s_min_extra_isize would have to be reduced.

Signed-off-by: Andreas Dilger <adilger@clusterfs.com>
Signed-off-by: Kalpak Shah <kalpak@clusterfs.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
14 years ago
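The sizing rule in this commit message is plain arithmetic, so a small sketch may help. The code below is illustrative only and is not taken from the kernel: `extra_isize_target()` and `pick_expansion()` are hypothetical helpers, the known inode size is passed in as a parameter instead of using sizeof(struct ext4_inode), and the EA-shifting steps are left out entirely (the sketch assumes no EAs are in the way).

```c
#include <assert.h>

/* Size of the original on-disk inode (EXT4_GOOD_OLD_INODE_SIZE). */
#define GOOD_OLD_INODE_SIZE 128u

static unsigned int max_u(unsigned int a, unsigned int b)
{
	return a > b ? a : b;
}

/*
 * Target expansion in bytes:
 *   max(want, min, known_inode_size - GOOD_OLD_INODE_SIZE)
 * known_inode_size is an assumed stand-in for sizeof(struct ext4_inode).
 */
static unsigned int extra_isize_target(unsigned int want, unsigned int min,
				       unsigned int known_inode_size)
{
	unsigned int known_extra = 0;

	if (known_inode_size > GOOD_OLD_INODE_SIZE)
		known_extra = known_inode_size - GOOD_OLD_INODE_SIZE;
	return max_u(max_u(want, min), known_extra);
}

/*
 * Fallback order from the commit message: try the wanted size first,
 * then the minimum size. Returns the number of bytes to expand by, or
 * 0 if neither fits in the free space of the inode.
 */
static unsigned int pick_expansion(unsigned int free_bytes,
				   unsigned int want, unsigned int min)
{
	if (want <= free_bytes)
		return want;
	if (min <= free_bytes)
		return min;
	return 0;
}
```

For example, with a wanted size of 32, a minimum of 16 and a known inode size of 156 bytes, the target is max(32, 16, 28) = 32; if only 20 bytes are free, `pick_expansion(20, 32, 16)` falls back to 16.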
ext4: fix race in xattr block allocation path

Ceph users reported that when using Ceph on ext4, the filesystem would often become corrupted, containing inodes with incorrect i_blocks counters.

I managed to reproduce this with a very hacked-up "streamtest" binary from the Ceph tree.

Ceph is doing a lot of xattr writes, to out-of-inode blocks. There is also another thread which does sync_file_range and close, of the same files. The problem appears to happen due to this race:

sync/flush thread               xattr-set thread
-----------------               ----------------

do_writepages                   ext4_xattr_set
ext4_da_writepages              ext4_xattr_set_handle
mpage_da_map_blocks             ext4_xattr_block_set
    set DELALLOC_RESERVE
                                ext4_new_meta_blocks
                                  ext4_mb_new_blocks
                                    if (!i_delalloc_reserved_flag)
                                      vfs_dq_alloc_block
ext4_get_blocks
    down_write(i_data_sem)
    set i_delalloc_reserved_flag
    ...
    up_write(i_data_sem)
                                if (i_delalloc_reserved_flag)
                                  vfs_dq_alloc_block_nofail

In other words, the sync/flush thread pops in and sets i_delalloc_reserved_flag on the inode, which makes the xattr thread think that it's in a delalloc path in ext4_new_meta_blocks(), and add the block for a second time, after already having added it once in the !i_delalloc_reserved_flag case in ext4_mb_new_blocks.

The real problem is that we shouldn't be using the DELALLOC_RESERVED state flag, and instead we should be passing EXT4_GET_BLOCKS_DELALLOC_RESERVE down to ext4_map_blocks() instead of using an inode state flag. We'll fix this for now with using i_data_sem to prevent this race, but this is really not the right way to fix things.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org
10 years ago
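The double accounting described in this commit message comes from consulting a shared per-inode flag twice while another thread may change it in between. The following single-threaded sketch models only that shape: every name in it (`struct sim_inode`, `alloc_meta_block_racy()`, `alloc_meta_block_by_arg()`, `flush_sets_flag()`) is hypothetical, and the other thread is simulated by a callback invoked between the two checks. The second function illustrates the approach the message calls the real fix, passing the decision down as an argument; it is not the i_data_sem stopgap that the patch itself applies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the per-inode state involved in the race. */
struct sim_inode {
	bool delalloc_reserved;  /* shared state flag, set by the flush path */
	long quota_blocks;       /* blocks charged to quota so far */
};

/* Callback standing in for "the other thread runs here". */
typedef void (*interleave_fn)(struct sim_inode *);

static void flush_sets_flag(struct sim_inode *ino)
{
	ino->delalloc_reserved = true;
}

/*
 * Racy shape: the shared flag is consulted twice. If it flips between
 * the two checks, the same block is charged to quota twice.
 */
static void alloc_meta_block_racy(struct sim_inode *ino, interleave_fn between)
{
	if (!ino->delalloc_reserved)
		ino->quota_blocks++;  /* inner allocator charges the block */
	if (between != NULL)
		between(ino);         /* flush thread sets the flag here */
	if (ino->delalloc_reserved)
		ino->quota_blocks++;  /* outer caller charges it again */
}

/*
 * The caller decides once and passes the decision down, so both checks
 * see the same value whatever happens to the shared flag meanwhile.
 */
static void alloc_meta_block_by_arg(struct sim_inode *ino, bool reserved,
				    interleave_fn between)
{
	if (!reserved)
		ino->quota_blocks++;
	if (between != NULL)
		between(ino);
	if (reserved)
		ino->quota_blocks++;
}
```

Calling `alloc_meta_block_racy()` with `flush_sets_flag` as the interleaving step charges two blocks for a single allocation, while `alloc_meta_block_by_arg()` charges exactly one under the same interleaving.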
/*
 * linux/fs/ext4/xattr.c
 *
 * Copyright (C) 2001-2003 Andreas Gruenbacher, <agruen@suse.de>
 *
 * Fix by Harrison Xing <harrison@mountainviewdata.com>.
 * Ext4 code with a lot of help from Eric Jarman <ejarman@acm.org>.
 * Extended attributes for symlinks and special files added per
 *  suggestion of Luka Renko <luka.renko@hermes.si>.
 * xattr consolidation Copyright (c) 2004 James Morris <jmorris@redhat.com>,
 *  Red Hat Inc.
 * ea-in-inode support by Alex Tomas <alex@clusterfs.com> aka bzzz
 *  and Andreas Gruenbacher <agruen@suse.de>.
 */

/*
 * Extended attributes are stored directly in inodes (on file systems with
 * inodes bigger than 128 bytes) and on additional disk blocks. The i_file_acl
 * field contains the block number if an inode uses an additional block. All
 * attributes must fit in the inode and one additional block. Blocks that
 * contain the identical set of attributes may be shared among several inodes.
 * Identical blocks are detected by keeping a cache of blocks that have
 * recently been accessed.
 *
 * The attributes in inodes and on blocks have a different header; the entries
 * are stored in the same format:
 *
 *   +------------------+
 *   | header           |
 *   | entry 1          | |
 *   | entry 2          | | growing downwards
 *   | entry 3          | v
 *   | four null bytes  |
 *   | . . .            |
 *   | value 1          | ^
 *   | value 3          | | growing upwards
 *   | value 2          | |
 *   +------------------+
 *
 * The header is followed by multiple entry descriptors. In disk blocks, the
 * entry descriptors are kept sorted. In inodes, they are unsorted. The
 * attribute values are aligned to the end of the block in no specific order.
 *
 * Locking strategy
 * ----------------
 * EXT4_I(inode)->i_file_acl is protected by EXT4_I(inode)->xattr_sem.
 * EA blocks are only changed if they are exclusive to an inode, so
 * holding xattr_sem also means that nothing but the EA block's reference
 * count can change. Multiple writers to the same block are synchronized
 * by the buffer lock.
 */

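/*
 * Illustrative userspace sketch, not part of the kernel source: a toy
 * version of the layout described in the comment above, with descriptors
 * growing downwards from the start of the area, values growing upwards
 * from its end, and four null bytes terminating the descriptor list.
 * Everything named toy_* is hypothetical. The fixed-size descriptor, the
 * 256-byte area and the absence of a header, of value padding and of
 * sorting are simplifying assumptions, not the ext4 on-disk format.
 */

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_TERMINATOR 4	/* four null bytes end the descriptor list */

/* Hypothetical descriptor; NOT the on-disk struct ext4_xattr_entry. */
struct toy_entry {
	uint8_t  name_len;	/* always >= 1, so a descriptor never looks like the terminator */
	uint8_t  name_index;
	uint16_t value_offs;	/* offset of the value from the start of the area */
	uint32_t value_size;
	char     name[8];	/* fixed-size name keeps the sketch short */
};

struct toy_area {
	unsigned char buf[256];
	size_t entries_end;	/* first free byte after the last descriptor */
	size_t values_start;	/* lowest byte occupied by any value */
};

static void toy_init(struct toy_area *a)
{
	memset(a->buf, 0, sizeof(a->buf));
	a->entries_end = 0;
	a->values_start = sizeof(a->buf);
}

/* Append one attribute. Returns 0 on success, -1 if it does not fit. */
static int toy_add(struct toy_area *a, uint8_t index, const char *name,
		   const void *value, uint32_t size)
{
	struct toy_entry e;
	size_t name_len = strlen(name);
	size_t free_bytes = a->values_start - a->entries_end;

	if (name_len == 0 || name_len > sizeof(e.name))
		return -1;
	/* Keep room for the descriptor, the terminator and the value. */
	if (free_bytes < sizeof(e) + TOY_TERMINATOR ||
	    free_bytes - sizeof(e) - TOY_TERMINATOR < size)
		return -1;

	memset(&e, 0, sizeof(e));
	e.name_len = (uint8_t)name_len;
	e.name_index = index;
	e.value_size = size;
	a->values_start -= size;		/* values grow upwards */
	e.value_offs = (uint16_t)a->values_start;
	memcpy(e.name, name, name_len);

	memcpy(a->buf + a->values_start, value, size);
	memcpy(a->buf + a->entries_end, &e, sizeof(e));
	a->entries_end += sizeof(e);		/* descriptors grow downwards */
	return 0;
}

/* Walk the descriptors until the four null bytes; NULL if not found. */
static const void *toy_find(const struct toy_area *a, uint8_t index,
			    const char *name, uint32_t *size)
{
	size_t name_len = strlen(name);
	size_t pos = 0;

	for (;;) {
		struct toy_entry e;
		uint32_t first_word;

		memcpy(&first_word, a->buf + pos, sizeof(first_word));
		if (first_word == 0)
			return NULL;
		memcpy(&e, a->buf + pos, sizeof(e));
		if (e.name_index == index && e.name_len == name_len &&
		    memcmp(e.name, name, name_len) == 0) {
			*size = e.value_size;
			return a->buf + e.value_offs;
		}
		pos += sizeof(e);
	}
}
```

/*
 * In this toy, the free space is simply the gap between entries_end and
 * values_start, which is why both regions can grow until they meet.
 */
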
#include <linux/init.h>
#include <linux/fs.h>
#include <linux/slab.h>
#include <linux/mbcache.h>
#include <linux/quotaops.h>
#include <linux/rwsem.h>
#include "ext4_jbd2.h"
#include "ext4.h"
#include "xattr.h"
#include "acl.h"

#ifdef EXT4_XATTR_DEBUG
# define ea_idebug(inode, f...) do { \
		printk(KERN_DEBUG "inode %s:%lu: ", \
			inode->i_sb->s_id, inode->i_ino); \
		printk(f); \
		printk("\n"); \
	} while (0)
# define ea_bdebug(bh, f...) do { \
		char b[BDEVNAME_SIZE]; \
		printk(KERN_DEBUG "block %s:%lu: ", \
			bdevname(bh->b_bdev, b), \
			(unsigned long) bh->b_blocknr); \
		printk(f); \
		printk("\n"); \
	} while (0)
#else
# define ea_idebug(inode, fmt, ...)	no_printk(fmt, ##__VA_ARGS__)
# define ea_bdebug(bh, fmt, ...)	no_printk(fmt, ##__VA_ARGS__)
#endif

static void ext4_xattr_cache_insert(struct buffer_head *);
static struct buffer_head *ext4_xattr_cache_find(struct inode *,
						 struct ext4_xattr_header *,
						 struct mb_cache_entry **);
static void ext4_xattr_rehash(struct ext4_xattr_header *,
			      struct ext4_xattr_entry *);
static int ext4_xattr_list(struct dentry *dentry, char *buffer,
			   size_t buffer_size);

static struct mb_cache *ext4_xattr_cache;

static const struct xattr_handler *ext4_xattr_handler_map[] = {
	[EXT4_XATTR_INDEX_USER]		     = &ext4_xattr_user_handler,
#ifdef CONFIG_EXT4_FS_POSIX_ACL
	[EXT4_XATTR_INDEX_POSIX_ACL_ACCESS]  = &ext4_xattr_acl_access_handler,
	[EXT4_XATTR_INDEX_POSIX_ACL_DEFAULT] = &ext4_xattr_acl_default_handler,
#endif
	[EXT4_XATTR_INDEX_TRUSTED]	     = &ext4_xattr_trusted_handler,
#ifdef CONFIG_EXT4_FS_SECURITY
	[EXT4_XATTR_INDEX_SECURITY]	     = &ext4_xattr_security_handler,
#endif
};

const struct xattr_handler *ext4_xattr_handlers[] = {
	&ext4_xattr_user_handler,
	&ext4_xattr_trusted_handler,
#ifdef CONFIG_EXT4_FS_POSIX_ACL
	&ext4_xattr_acl_access_handler,
	&ext4_xattr_acl_default_handler,
#endif
#ifdef CONFIG_EXT4_FS_SECURITY
	&ext4_xattr_security_handler,
#endif
	NULL
};

static __le32 ext4_xattr_block_csum(struct inode *inode,
				    sector_t block_nr,
				    struct ext4_xattr_header *hdr)
{
	struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
	__u32 csum;
	__le32 save_csum;
	__le64 dsk_block_nr = cpu_to_le64(block_nr);

	save_csum = hdr->h_checksum;
	hdr->h_checksum = 0;
	csum = ext4_chksum(sbi, sbi->s_csum_seed, (__u8 *)&dsk_block_nr,
			   sizeof(dsk_block_nr));
	csum = ext4_chksum(sbi, csum, (__u8 *)hdr,
			   EXT4_BLOCK_SIZE(inode->i_sb));
	hdr->h_checksum = save_csum;

	return cpu_to_le32(csum);
}

static int ext4_xattr_block_csum_verify(struct inode *inode,
					sector_t block_nr,
					struct ext4_xattr_header *hdr)
{
	if (EXT4_HAS_RO_COMPAT_FEATURE(inode->i_sb,
		EXT4_FEATURE_RO_COMPAT_METADATA_CSUM) &&
	    (hdr->h_checksum != ext4_xattr_block_csum(inode, block_nr, hdr)))
		return 0;
	return 1;
}

static void ext4_xattr_block_csum_set(struct inode *inode,
				      sector_t block_nr,
				      struct ext4_xattr_header *hdr)
{
	if (!EXT4_HAS_RO_COMPAT_FEATURE(inode->i_sb,
		EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))
		return;

	hdr->h_checksum = ext4_xattr_block_csum(inode, block_nr, hdr);
}

static inline int ext4_handle_dirty_xattr_block(handle_t *handle,
						struct inode *inode,
						struct buffer_head *bh)
{
	ext4_xattr_block_csum_set(inode, bh->b_blocknr, BHDR(bh));
	return ext4_handle_dirty_metadata(handle, inode, bh);
}

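/*
 * Illustrative userspace sketch, not part of the kernel source: the
 * pattern used by the checksum helpers above is to save the stored
 * checksum, zero the field, hash the block number followed by the whole
 * block, and then restore the field. The toy_* names and the 64-byte
 * block are hypothetical, and the hash is FNV-1a as a stand-in, NOT the
 * checksum that ext4_chksum() computes. The little-endian conversion of
 * the block number is also omitted here.
 */

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical block: a checksum field followed by payload bytes. */
struct toy_block {
	uint32_t checksum;
	unsigned char payload[60];
};

/* Stand-in hash (FNV-1a style), chained through the seed argument. */
static uint32_t toy_hash(uint32_t seed, const void *data, size_t len)
{
	const unsigned char *p = data;
	uint32_t h = seed;

	while (len--) {
		h ^= *p++;
		h *= 16777619u;
	}
	return h;
}

/* Checksum over block number + block, with the checksum field zeroed. */
static uint32_t toy_block_csum(uint32_t seed, uint64_t block_nr,
			       struct toy_block *blk)
{
	uint32_t saved = blk->checksum;
	uint32_t csum;

	blk->checksum = 0;
	csum = toy_hash(seed, &block_nr, sizeof(block_nr));
	csum = toy_hash(csum, blk, sizeof(*blk));
	blk->checksum = saved;

	return csum;
}

static void toy_block_csum_set(uint32_t seed, uint64_t block_nr,
			       struct toy_block *blk)
{
	blk->checksum = toy_block_csum(seed, block_nr, blk);
}

/* Returns 1 if the stored checksum matches, 0 otherwise. */
static int toy_block_csum_verify(uint32_t seed, uint64_t block_nr,
				 struct toy_block *blk)
{
	return blk->checksum == toy_block_csum(seed, block_nr, blk);
}
```

/*
 * Mixing the block number into the hash means that a block copied to a
 * different location no longer verifies, even if its contents are intact.
 */
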
static inline const struct xattr_handler *
ext4_xattr_handler(int name_index)
{
	const struct xattr_handler *handler = NULL;

	if (name_index > 0 && name_index < ARRAY_SIZE(ext4_xattr_handler_map))
		handler = ext4_xattr_handler_map[name_index];
	return handler;
}

/*
 * Inode operation listxattr()
 *
 * dentry->d_inode->i_mutex: don't care
 */
ssize_t
ext4_listxattr(struct dentry *dentry, char *buffer, size_t size)
{
	return ext4_xattr_list(dentry, buffer, size);
}

static int
ext4_xattr_check_names(struct ext4_xattr_entry *entry, void *end)
{
	while (!IS_LAST_ENTRY(entry)) {
		struct ext4_xattr_entry *next = EXT4_XATTR_NEXT(entry);
		if ((void *)next >= end)
			return -EIO;
		entry = next;
	}
	return 0;
}

static inline int
ext4_xattr_check_block(struct inode *inode, struct buffer_head *bh)
{
	int error;

	if (buffer_verified(bh))
		return 0;

	if (BHDR(bh)->h_magic != cpu_to_le32(EXT4_XATTR_MAGIC) ||
	    BHDR(bh)->h_blocks != cpu_to_le32(1))
		return -EIO;
	if (!ext4_xattr_block_csum_verify(inode, bh->b_blocknr, BHDR(bh)))
		return -EIO;
	error = ext4_xattr_check_names(BFIRST(bh), bh->b_data + bh->b_size);
	if (!error)
		set_buffer_verified(bh);
	return error;
}

static inline int
ext4_xattr_check_entry(struct ext4_xattr_entry *entry, size_t size)
{
	size_t value_size = le32_to_cpu(entry->e_value_size);

	if (entry->e_value_block != 0 || value_size > size ||
	    le16_to_cpu(entry->e_value_offs) + value_size > size)
		return -EIO;
	return 0;
}

static int
ext4_xattr_find_entry(struct ext4_xattr_entry **pentry, int name_index,
		      const char *name, size_t size, int sorted)
{
	struct ext4_xattr_entry *entry;
	size_t name_len;
	int cmp = 1;

	if (name == NULL)
		return -EINVAL;
	name_len = strlen(name);
	entry = *pentry;
	for (; !IS_LAST_ENTRY(entry); entry = EXT4_XATTR_NEXT(entry)) {
		cmp = name_index - entry->e_name_index;
		if (!cmp)
			cmp = name_len - entry->e_name_len;
		if (!cmp)
			cmp = memcmp(name, entry->e_name, name_len);
		if (cmp <= 0 && (sorted || cmp == 0))
			break;
	}
	*pentry = entry;
	if (!cmp && ext4_xattr_check_entry(entry, size))
		return -EIO;
	return cmp ? -ENODATA : 0;
}

static int
ext4_xattr_block_get(struct inode *inode, int name_index, const char *name,
		     void *buffer, size_t buffer_size)
{
	struct buffer_head *bh = NULL;
	struct ext4_xattr_entry *entry;
	size_t size;
	int error;

	ea_idebug(inode, "name=%d.%s, buffer=%p, buffer_size=%ld",
		  name_index, name, buffer, (long)buffer_size);

	error = -ENODATA;
	if (!EXT4_I(inode)->i_file_acl)
		goto cleanup;
	ea_idebug(inode, "reading block %llu",
		  (unsigned long long)EXT4_I(inode)->i_file_acl);
	bh = sb_bread(inode->i_sb, EXT4_I(inode)->i_file_acl);
	if (!bh)
		goto cleanup;
	ea_bdebug(bh, "b_count=%d, refcount=%d",
		  atomic_read(&(bh->b_count)), le32_to_cpu(BHDR(bh)->h_refcount));
	if (ext4_xattr_check_block(inode, bh)) {
bad_block:
		EXT4_ERROR_INODE(inode, "bad block %llu",
				 EXT4_I(inode)->i_file_acl);
		error = -EIO;
		goto cleanup;
	}
	ext4_xattr_cache_insert(bh);
	entry = BFIRST(bh);
	error = ext4_xattr_find_entry(&entry, name_index, name, bh->b_size, 1);
	if (error == -EIO)
		goto bad_block;
	if (error)
		goto cleanup;
	size = le32_to_cpu(entry->e_value_size);
	if (buffer) {
		error = -ERANGE;
		if (size > buffer_size)
			goto cleanup;
		memcpy(buffer, bh->b_data + le16_to_cpu(entry->e_value_offs),
		       size);
	}
	error = size;

cleanup:
	brelse(bh);
	return error;
}

int
ext4_xattr_ibody_get(struct inode *inode, int name_index, const char *name,
		     void *buffer, size_t buffer_size)
{
	struct ext4_xattr_ibody_header *header;
	struct ext4_xattr_entry *entry;
	struct ext4_inode *raw_inode;
	struct ext4_iloc iloc;
	size_t size;
	void *end;
	int error;

	if (!ext4_test_inode_state(inode, EXT4_STATE_XATTR))
		return -ENODATA;
	error = ext4_get_inode_loc(inode, &iloc);
	if (error)
		return error;
	raw_inode = ext4_raw_inode(&iloc);
	header = IHDR(inode, raw_inode);
	entry = IFIRST(header);
	end = (void *)raw_inode + EXT4_SB(inode->i_sb)->s_inode_size;
	error = ext4_xattr_check_names(entry, end);
	if (error)
		goto cleanup;
	error = ext4_xattr_find_entry(&entry, name_index, name,
				      end - (void *)entry, 0);
	if (error)
		goto cleanup;
	size = le32_to_cpu(entry->e_value_size);
	if (buffer) {
		error = -ERANGE;
		if (size > buffer_size)
			goto cleanup;
		memcpy(buffer, (void *)IFIRST(header) +
		       le16_to_cpu(entry->e_value_offs), size);
	}
	error = size;

cleanup:
	brelse(iloc.bh);
	return error;
}

/*
 * ext4_xattr_get()
 *
 * Copy an extended attribute into the buffer
 * provided, or compute the buffer size required.
 * Buffer is NULL to compute the size of the buffer required.
 *
 * Returns a negative error number on failure, or the number of bytes
 * used / required on success.
 */
int
ext4_xattr_get(struct inode *inode, int name_index, const char *name,
	       void *buffer, size_t buffer_size)
{
	int error;

	down_read(&EXT4_I(inode)->xattr_sem);
	error = ext4_xattr_ibody_get(inode, name_index, name, buffer,
				     buffer_size);
	if (error == -ENODATA)
		error = ext4_xattr_block_get(inode, name_index, name, buffer,
					     buffer_size);
	up_read(&EXT4_I(inode)->xattr_sem);
	return error;
}

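/*
 * Illustrative userspace sketch, not part of the kernel source: the
 * calling convention documented for ext4_xattr_get() above, reduced to
 * plain arrays. A NULL buffer asks only for the required size, a buffer
 * that is too small yields -ERANGE, and a miss in the first store
 * (standing in for the inode body) falls through to the second
 * (standing in for the external block). All toy_* names are
 * hypothetical, values are treated as strings for brevity, and the
 * xattr_sem locking is left out.
 */

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

struct toy_attr {
	const char *name;
	const char *value;
};

/* Look up one attribute in a single store. */
static int toy_store_get(const struct toy_attr *attrs, size_t count,
			 const char *name, void *buffer, size_t buffer_size)
{
	size_t i;

	for (i = 0; i < count; i++) {
		size_t size;

		if (strcmp(attrs[i].name, name) != 0)
			continue;
		size = strlen(attrs[i].value);
		if (buffer) {
			if (size > buffer_size)
				return -ERANGE;
			memcpy(buffer, attrs[i].value, size);
		}
		return (int)size;	/* bytes used, or bytes required */
	}
	return -ENODATA;
}

/* Try the first store, and fall back to the second only on -ENODATA. */
static int toy_get(const struct toy_attr *in_inode, size_t n_inode,
		   const struct toy_attr *in_block, size_t n_block,
		   const char *name, void *buffer, size_t buffer_size)
{
	int error;

	error = toy_store_get(in_inode, n_inode, name, buffer, buffer_size);
	if (error == -ENODATA)
		error = toy_store_get(in_block, n_block, name, buffer,
				      buffer_size);
	return error;
}
```

/*
 * A caller would typically probe with a NULL buffer first, allocate the
 * returned number of bytes, and then call again with the real buffer.
 */
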
static int
ext4_xattr_list_entries(struct dentry *dentry, struct ext4_xattr_entry *entry,
			char *buffer, size_t buffer_size)
{
	size_t rest = buffer_size;

	for (; !IS_LAST_ENTRY(entry); entry = EXT4_XATTR_NEXT(entry)) {
		const struct xattr_handler *handler =
			ext4_xattr_handler(entry->e_name_index);

		if (handler) {
			size_t size = handler->list(dentry, buffer, rest,
						    entry->e_name,
						    entry->e_name_len,
						    handler->flags);
			if (buffer) {
				if (size > rest)
					return -ERANGE;
				buffer += size;
			}
			rest -= size;
		}
	}
	return buffer_size - rest;
}

static int
ext4_xattr_block_list(struct dentry *dentry, char *buffer, size_t buffer_size)
{
	struct inode *inode = dentry->d_inode;
	struct buffer_head *bh = NULL;
	int error;

	ea_idebug(inode, "buffer=%p, buffer_size=%ld",
		  buffer, (long)buffer_size);

	error = 0;
	if (!EXT4_I(inode)->i_file_acl)
		goto cleanup;
	ea_idebug(inode, "reading block %llu",
		  (unsigned long long)EXT4_I(inode)->i_file_acl);
	bh = sb_bread(inode->i_sb, EXT4_I(inode)->i_file_acl);
	error = -EIO;
	if (!bh)
		goto cleanup;
	ea_bdebug(bh, "b_count=%d, refcount=%d",
		  atomic_read(&(bh->b_count)), le32_to_cpu(BHDR(bh)->h_refcount));
	if (ext4_xattr_check_block(inode, bh)) {
		EXT4_ERROR_INODE(inode, "bad block %llu",
				 EXT4_I(inode)->i_file_acl);
		error = -EIO;
		goto cleanup;
	}
	ext4_xattr_cache_insert(bh);
	error = ext4_xattr_list_entries(dentry, BFIRST(bh), buffer, buffer_size);

cleanup:
	brelse(bh);

	return error;
}

static int
ext4_xattr_ibody_list(struct dentry *dentry, char *buffer, size_t buffer_size)
{
	struct inode *inode = dentry->d_inode;
	struct ext4_xattr_ibody_header *header;
	struct ext4_inode *raw_inode;
	struct ext4_iloc iloc;
	void *end;
	int error;

	if (!ext4_test_inode_state(inode, EXT4_STATE_XATTR))
		return 0;
	error = ext4_get_inode_loc(inode, &iloc);
	if (error)
		return error;
	raw_inode = ext4_raw_inode(&iloc);
	header = IHDR(inode, raw_inode);
	end = (void *)raw_inode + EXT4_SB(inode->i_sb)->s_inode_size;
	error = ext4_xattr_check_names(IFIRST(header), end);
	if (error)
		goto cleanup;
	error = ext4_xattr_list_entries(dentry, IFIRST(header),
					buffer, buffer_size);

cleanup:
	brelse(iloc.bh);
	return error;
}

/*
 * ext4_xattr_list()
 *
 * Copy a list of attribute names into the buffer
 * provided, or compute the buffer size required.
 * Buffer is NULL to compute the size of the buffer required.
 *
 * Returns a negative error number on failure, or the number of bytes
 * used / required on success.
 */
static int
ext4_xattr_list(struct dentry *dentry, char