original development tree for Linux kernel GTP module; now long in mainline.

943 lines
22 KiB

[PATCH] fdtable: Implement new pagesize-based fdtable allocator

This patch provides an improved fdtable allocation scheme, useful for expanding fdtable file descriptor entries. The main focus is on the fdarray, as its memory usage grows 128 times faster than that of an fdset.

The allocation algorithm sizes the fdarray in such a way that its memory usage increases in easy page-sized chunks. The overall algorithm expands the allowed size in powers of two, in order to amortize the cost of invoking vmalloc() for larger allocation sizes. Namely, the following sizes for the fdarray are considered, and the smallest that accommodates the requested fd count is chosen:

    pagesize / 4
    pagesize / 2
    pagesize      <- memory allocator switch point
    pagesize * 2
    pagesize * 4
    ...etc...

Unlike the current implementation, this allocation scheme does not require a loop to compute the optimal fdarray size, and can be done in efficient straightline code.

Furthermore, since the fdarray overflows the pagesize boundary long before any of the fdsets do, it makes sense to optimize run-time by allocating both fdsets in a single swoop. Even together, they will still be, by far, smaller than the fdarray. The fdtable->open_fds is now used as the anchor for the fdset memory allocation.

Signed-off-by: Vadim Lobanov <vlobanov@speakeasy.net>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dipankar Sarma <dipankar@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
15 years ago
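The sizing rule described in the commit message can be tried out in userspace. The following is a minimal sketch, not the kernel code itself: `round_up_pow2()` is a hypothetical stand-in for the kernel's `roundup_pow_of_two()`, `fdtable_capacity()` is an illustrative name, and the pointer size is passed in as a parameter (an assumption made so that both 32-bit and 64-bit cases can be checked). It assumes the smallest fdarray chunk is 1024 bytes, as in the listing's `alloc_fdtable()`.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Round a 32-bit value (v >= 1) up to the next power of two without a loop,
 * by smearing the highest set bit into every lower position.
 * Hypothetical userspace stand-in for the kernel's roundup_pow_of_two().
 */
static unsigned int round_up_pow2(unsigned int v)
{
	v--;
	v |= v >> 1;
	v |= v >> 2;
	v |= v >> 4;
	v |= v >> 8;
	v |= v >> 16;
	return v + 1;
}

/*
 * Return how many fd slots a table would hold so that fd number `nr` fits
 * and the fdarray occupies 1024 bytes times a power of two.
 * `ptr_size` plays the role of sizeof(struct file *).
 */
static unsigned int fdtable_capacity(unsigned int nr, size_t ptr_size)
{
	unsigned int per_chunk;

	assert(ptr_size > 0 && ptr_size <= 1024);
	per_chunk = (unsigned int)(1024 / ptr_size); /* slots per 1024-byte chunk */

	nr /= per_chunk;             /* index of the chunk that holds fd nr */
	nr = round_up_pow2(nr + 1);  /* chunk count, rounded to a power of two */
	nr *= per_chunk;             /* back to a slot count */
	return nr;
}
```

With 8-byte pointers, a request for fd 127 gives 128 slots (a 1024-byte fdarray), while a request for fd 128 gives 256 slots. The sketch leaves out the clamp against `sysctl_nr_open` that the kernel applies afterwards.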
/*
 *  linux/fs/file.c
 *
 *  Copyright (C) 1998-1999, Stephen Tweedie and Bill Hawes
 *
 *  Manage the dynamic fd arrays in the process files_struct.
 */

#include <linux/syscalls.h>
#include <linux/export.h>
#include <linux/fs.h>
#include <linux/mm.h>
#include <linux/mmzone.h>
#include <linux/time.h>
#include <linux/sched.h>
#include <linux/slab.h>
#include <linux/vmalloc.h>
#include <linux/file.h>
#include <linux/fdtable.h>
#include <linux/bitops.h>
#include <linux/interrupt.h>
#include <linux/spinlock.h>
#include <linux/rcupdate.h>
#include <linux/workqueue.h>

int sysctl_nr_open __read_mostly = 1024*1024;
int sysctl_nr_open_min = BITS_PER_LONG;
int sysctl_nr_open_max = 1024 * 1024; /* raised later */

static void *alloc_fdmem(size_t size)
{
	/*
	 * Very large allocations can stress page reclaim, so fall back to
	 * vmalloc() if the allocation size will be considered "large" by the VM.
	 */
	if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) {
		void *data = kmalloc(size, GFP_KERNEL|__GFP_NOWARN);
		if (data != NULL)
			return data;
	}
	return vmalloc(size);
}

static void free_fdmem(void *ptr)
{
	is_vmalloc_addr(ptr) ? vfree(ptr) : kfree(ptr);
}

static void __free_fdtable(struct fdtable *fdt)
{
	free_fdmem(fdt->fd);
	free_fdmem(fdt->open_fds);
	kfree(fdt);
}

static void free_fdtable_rcu(struct rcu_head *rcu)
{
	__free_fdtable(container_of(rcu, struct fdtable, rcu));
}

/*
 * Copy the fd array and both fd bitmaps from the old fdtable into the new,
 * larger one, zeroing the slots past the end of the old table. Called with
 * the files spinlock held.
 */
static void copy_fdtable(struct fdtable *nfdt, struct fdtable *ofdt)
{
	unsigned int cpy, set;

	BUG_ON(nfdt->max_fds < ofdt->max_fds);

	cpy = ofdt->max_fds * sizeof(struct file *);
	set = (nfdt->max_fds - ofdt->max_fds) * sizeof(struct file *);
	memcpy(nfdt->fd, ofdt->fd, cpy);
	memset((char *)(nfdt->fd) + cpy, 0, set);

	cpy = ofdt->max_fds / BITS_PER_BYTE;
	set = (nfdt->max_fds - ofdt->max_fds) / BITS_PER_BYTE;
	memcpy(nfdt->open_fds, ofdt->open_fds, cpy);
	memset((char *)(nfdt->open_fds) + cpy, 0, set);
	memcpy(nfdt->close_on_exec, ofdt->close_on_exec, cpy);
	memset((char *)(nfdt->close_on_exec) + cpy, 0, set);
}

static struct fdtable * alloc_fdtable(unsigned int nr)
{
	struct fdtable *fdt;
	void *data;

	/*
	 * Figure out how many fds we actually want to support in this fdtable.
	 * Allocation steps are keyed to the size of the fdarray, since it
	 * grows far faster than any of the other dynamic data. We try to fit
	 * the fdarray into comfortable page-tuned chunks: starting at 1024B
	 * and growing in powers of two from there on.
	 */
	nr /= (1024 / sizeof(struct file *));
	nr = roundup_pow_of_two(nr + 1);
	nr *= (1024 / sizeof(struct file *));
	/*
	 * Note that this can drive nr *below* what we had passed if sysctl_nr_open
	 * had been set lower between the check in expand_files() and here.  Deal
	 * with that in caller, it's cheaper that way.
	 *
	 * We make sure that nr remains a multiple of BITS_PER_LONG - otherwise
	 * bitmaps handling below becomes unpleasant, to put it mildly...
	 */
	if (unlikely(nr > sysctl_nr_open))
		nr = ((sysctl_nr_open - 1) | (BITS_PER_LONG - 1)) + 1;

	fdt = kmalloc(sizeof(struct fdtable), GFP_KERNEL);
	if (!fdt)
		goto out;
	fdt->max_fds = nr;
	data = alloc_fdmem(nr * sizeof(struct file *));
	if (!data)
		goto out_fdt;
	fdt->fd = data;

	data = alloc_fdmem(max_t(size_t,
				 2 * nr / BITS_PER_BYTE, L1_CACHE_BYTES));
	if (!data)
		goto out_arr;
	fdt->open_fds = data;
	data += nr / BITS_PER_BYTE;
	fdt->close_on_exec = data;

	return fdt;

out_arr:
	free_fdmem(fdt->fd);
out_fdt:
	kfree(fdt);
out:
	return NULL;
}

/*
 * Expand the file descriptor table.
 * This function will allocate a new fdtable and both fd array and fdset, of
 * the given size.
 * Return <0 error code on error; 1 on successful completion.
 * The files->file_lock should be held on entry, and will be held on exit.
 */
static int expand_fdtable(struct files_struct *files, int nr)
	__releases(files->file_lock)
	__acquires(files->file_lock)
{
	struct fdtable *new_fdt, *cur_fdt;

	spin_unlock(&files->file_lock);
	new_fdt = alloc_fdtable(nr);
	spin_lock(&files->file_lock);
	if (!new_fdt)
		return -ENOMEM;
	/*
	 * extremely unlikely race - sysctl_nr_open decreased between the check in
	 * caller and alloc_fdtable().  Cheaper to catch it here...
	 */
	if (unlikely(new_fdt->max_fds <= nr)) {
		__free_fdtable(new_fdt);
		return -EMFILE;
	}
	/*
	 * Check again since another task may have expanded the fd table while
	 * we dropped the lock
	 */
	cur_fdt = files_fdtable(files);
	if (nr >= cur_fdt->max_fds) {
		/* Continue as planned */
		copy_fdtable(new_fdt, cur_fdt);
		rcu_assign_pointer(files->fdt, new_fdt);
		if (cur_fdt != &files->fdtab)
			call_rcu(&cur_fdt->rcu, free_fdtable_rcu);
	} else {
		/* Somebody else expanded, so undo our attempt */
		__free_fdtable(new_fdt);
	}
	return 1;
}

/*
 * Expand files.
 * This function will expand the file structures, if the requested size exceeds
 * the current capacity and there is room for expansion.
 * Return <0 error code on error; 0 when nothing done; 1 when files were
 * expanded and execution may have blocked.
 * The files->file_lock should be held on entry, and will be held on exit.
 */
static int expand_files(struct files_struct *files, int nr)
{
	struct fdtable *fdt;

	fdt = files_fdtable(files);

	/* Do we need to expand? */
	if (nr < fdt->max_fds)
		return 0;

	/* Can we expand? */
	if (nr >= sysctl_nr_open)
		return -EMFILE;

	/* All good, so we try */
	return expand_fdtable(files, nr);
}

static inline void __set_close_on_exec(int fd, struct fdtable *fdt)
{
	__set_bit(fd, fdt->close_on_exec);
}

static inline void __clear_close_on_exec(int fd, struct fdtable *fdt)
{
	__clear_bit(fd, fdt->close_on_exec);
}

static inline void __set_open_fd(int fd, struct fdtable *fdt)
{
	__set_bit(fd, fdt->open_fds);
}

static inline void __clear_open_fd(int fd, struct fdtable *fdt)
{
	__clear_bit(fd, fdt->open_fds);
}

static int count_open_files(struct fdtable *fdt)
{
	int size = fdt->max_fds;
	int i;

	/* Find the last open fd */
	for (i = size / BITS_PER_LONG; i > 0; ) {
		if (fdt->open_fds[--i])
			break;
	}
	i = (i + 1) * BITS_PER_LONG;
	return i;
}
/*
 * Allocate a new files structure and copy contents from the
 * passed in files structure.
 * errorp will be valid only when the returned files_struct is NULL.
 */
struct files_struct *dup_fd(struct files_struct *oldf, int *errorp)
{
	struct files_struct *newf;
	struct file **old_fds, **new_fds;
	int open_files, size, i;
	struct fdtable *old_fdt, *new_fdt;

	*errorp = -ENOMEM;
	newf = kmem_cache_alloc(files_cachep, GFP_KERNEL);
	if (!newf)
		goto out;

	atomic_set(&newf->count, 1);

	spin_lock_init(&newf->file_lock);
	newf->next_fd = 0;
	new_fdt = &newf->fdtab;
	new_fdt->max_fds = NR_OPEN_DEFAULT;
	new_fdt->close_on_exec = newf->close_on_exec_init;
	new_fdt->open_fds = newf->open_fds_init;
	new_fdt->fd = &newf->fd_array[0];

	spin_lock(&oldf->file_lock);
	old_fdt = files_fdtable(oldf);
	open_files = count_open_files(old_fdt);

	/*
	 * Check whether we need to allocate a larger fd array and fd set.
	 */
	while (unlikely(open_files > new_fdt->max_fds)) {
		spin_unlock(&oldf->file_lock);

		if (new_fdt != &newf->fdtab)
			__free_fdtable(new_fdt);

		new_fdt = alloc_fdtable(open_files - 1);
		if (!new_fdt) {
			*errorp = -ENOMEM;
			goto out_release;
		}

		/* beyond sysctl_nr_open; nothing to do */
		if (unlikely(new_fdt->max_fds < open_files)) {
			__free_fdtable(new_fdt);
			*errorp = -EMFILE;
			goto out_release;
		}

		/*
		 * Reacquire the oldf lock and re-read the pointer to its
		 * fd table: while the lock was dropped, oldf may have been
		 * given a new, bigger fd table. We need the latest pointer.
		 */
		spin_lock(&oldf->file_lock);
		old_fdt = files_fdtable(oldf);
		open_files = count_open_files(old_fdt);
	}

	old_fds = old_fdt->fd;
	new_fds = new_fdt->fd;

	memcpy(new_fdt->open_fds, old_fdt->open_fds, open_files / 8);
	memcpy(new_fdt->close_on_exec, old_fdt->close_on_exec, open_files / 8);

	for (i = open_files; i != 0; i--) {
		struct file *f = *old_fds++;
		if (f) {
			get_file(f);
		} else {
			/*
			 * The fd may be claimed in the fd bitmap but not yet
			 * instantiated in the files array if a sibling thread
			 * is partway through open(). So make sure that this
			 * fd is available to the new process.
			 */
			__clear_open_fd(open_files - i, new_fdt);
		}
		rcu_assign_pointer(*new_fds++, f);
	}
	spin_unlock(&oldf->file_lock);

	/* compute the remainder to be cleared */
	size = (new_fdt->max_fds - open_files) * sizeof(struct file *);

	/* This is long-word aligned, so an optimized version could be used */
	memset(new_fds, 0, size);

	if (new_fdt->max_fds > open_files) {
		int left = (new_fdt->max_fds - open_files) / 8;
		int start = open_files / BITS_PER_LONG;

		memset(&new_fdt->open_fds[start], 0, left);
		memset(&new_fdt->close_on_exec[start], 0, left);
	}

	rcu_assign_pointer(newf->fdt, new_fdt);

	return newf;

out_release:
	kmem_cache_free(files_cachep, newf);
out:
	return NULL;
}
static void close_files(struct files_struct * files)
{
	int i, j;
	struct fdtable *fdt;

	j = 0;

	/*
	 * It is safe to dereference the fd table without RCU or
	 * ->file_lock because this is the last reference to the
	 * files structure. But use RCU to shut RCU-lockdep up.
	 */
	rcu_read_lock();
	fdt = files_fdtable(files);
	rcu_read_unlock();
	for (;;) {
		unsigned long set;
		i = j * BITS_PER_LONG;
		if (i >= fdt->max_fds)
			break;
		set = fdt->open_fds[j++];
		while (set) {
			if (set & 1) {
				struct file * file = xchg(&fdt->fd[i], NULL);
				if (file) {
					filp_close(file, files);
					cond_resched();
				}
			}
			i++;
			set >>= 1;
		}
	}
}
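/*
 * Illustration only, not part of the kernel source: a user-space sketch of
 * the word-at-a-time bit walk that close_files() performs over open_fds.
 * Each word is loaded once and shifted right until it is zero, so trailing
 * clear bits in a word cost nothing. The names sketch_for_each_set_bit and
 * sketch_sum_fd are hypothetical; the callback stands in for the
 * xchg()/filp_close() step.
 */

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's BITS_PER_LONG. */
#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

typedef void (*sketch_visit_fn)(size_t fd, void *ctx);

/* Example callback: accumulate the visited indices into *ctx. */
static void sketch_sum_fd(size_t fd, void *ctx)
{
	*(size_t *)ctx += fd;
}

/* Call visit() for every set bit below max_bits; return how many were set. */
static size_t sketch_for_each_set_bit(const unsigned long *bitmap,
				      size_t max_bits,
				      sketch_visit_fn visit, void *ctx)
{
	size_t word = 0, visited = 0;

	assert(bitmap != NULL);

	for (;;) {
		size_t bit = word * SKETCH_BITS_PER_LONG;
		unsigned long set;

		if (bit >= max_bits)
			break;
		set = bitmap[word++];
		while (set) {
			if (set & 1) {
				if (visit)
					visit(bit, ctx);
				visited++;
			}
			bit++;
			set >>= 1;
		}
	}
	return visited;
}
```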
struct files_struct *get_files_struct(struct task_struct *task)
{
	struct files_struct *files;

	task_lock(task);
	files = task->files;
	if (files)
		atomic_inc(&files->count);
	task_unlock(task);

	return files;
}

void put_files_struct(struct files_struct *files)
{
	struct fdtable *fdt;

	if (atomic_dec_and_test(&files->count)) {
		close_files(files);
		/* not really needed, since nobody can see us */
		rcu_read_lock();
		fdt = files_fdtable(files);
		rcu_read_unlock();
		/* free the arrays if they are not embedded */
		if (fdt != &files->fdtab)
			__free_fdtable(fdt);
		kmem_cache_free(files_cachep, files);
	}
}

void reset_files_struct(struct files_struct *files)
{
	struct task_struct *tsk = current;
	struct files_struct *old;

	old = tsk->files;
	task_lock(tsk);
	tsk->files = files;
	task_unlock(tsk);
	put_files_struct(old);
}

void exit_files(struct task_struct *tsk)
{
	struct files_struct * files = tsk->files;

	if (files) {
		task_lock(tsk);
		tsk->files = NULL;
		task_unlock(tsk);
		put_files_struct(files);
	}
}
void __init files_defer_init(void)
{
	sysctl_nr_open_max = min((size_t)INT_MAX, ~(size_t)0/sizeof(void *)) &
			     -BITS_PER_LONG;
}
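/*
 * Illustration only, not part of the kernel source: a user-space sketch of
 * the arithmetic in files_defer_init() above. The limit is the smaller of
 * INT_MAX and the number of pointers that fit in the address space, rounded
 * down to a multiple of the word size. Masking with -BITS_PER_LONG is
 * written here as the equivalent ~(BITS_PER_LONG - 1). The names
 * sketch_nr_open_max and SKETCH_BITS_PER_LONG are hypothetical.
 */

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's BITS_PER_LONG. */
#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

static size_t sketch_nr_open_max(void)
{
	size_t by_int = (size_t)INT_MAX;
	size_t by_addr = ~(size_t)0 / sizeof(void *);
	size_t limit = by_int < by_addr ? by_int : by_addr;

	/* The mask trick only works when the word size is a power of two. */
	assert((SKETCH_BITS_PER_LONG & (SKETCH_BITS_PER_LONG - 1)) == 0);

	/* Round down to a whole number of bitmap words. */
	return limit & ~((size_t)SKETCH_BITS_PER_LONG - 1);
}
```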
struct files_struct init_files = {
	.count		= ATOMIC_INIT(1),
	.fdt		= &init_files.fdtab,
	.fdtab		= {
		.max_fds	= NR_OPEN_DEFAULT,
		.fd		= &init_files.fd_array[0],
		.close_on_exec	= init_files.close_on_exec_init,
		.open_fds	= init_files.open_fds_init,
	},
	.file_lock	= __SPIN_LOCK_UNLOCKED(init_files.file_lock),
};
/*
 * Allocate a file descriptor and mark it busy.
 */
int __alloc_fd(struct files_struct *files,
	       unsigned start, unsigned end, unsigned flags)
{
	unsigned int fd;
	int error;
	struct fdtable *fdt;

	spin_lock(&files->file_lock);
repeat:
	fdt = files_fdtable(files);
	fd = start;
	if (fd < files->next_fd)
		fd = files->next_fd;

	if (fd < fdt->max_fds)
		fd = find_next_zero_bit(fdt->open_fds, fdt->max_fds, fd);

	/*
	 * N.B. For clone tasks sharing a files structure, this test
	 * will limit the total number of files that can be opened.
	 */
	error = -EMFILE;
	if (fd >= end)
		goto out;

	error = expand_files(files, fd);
	if (error < 0)
		goto out;

	/*
	 * If we needed to expand the fd array we
	 * might have blocked - try again.
	 */
	if (error)
		goto repeat;

	if (start <= files->next_fd)
		files->next_fd = fd + 1;

	__set_open_fd(fd, fdt);
	if (flags & O_CLOEXEC)
		__set_close_on_exec(fd, fdt);
	else
		__clear_close_on_exec(fd, fdt);
	error = fd;
#if 1
	/* Sanity check */
	if (rcu_dereference_raw(fdt->fd[fd]) != NULL) {
		printk(KERN_WARNING "alloc_fd: slot %d not NULL!\n", fd);
		rcu_assign_pointer(fdt->fd[fd], NULL);
	}
#endif

out:
	spin_unlock(&files->file_lock);
	return error;
}
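/*
 * Illustration only, not part of the kernel source: a user-space sketch of
 * the lowest-free-descriptor search with the next_fd hint, as used by
 * __alloc_fd() above and undone by __put_unused_fd(). It assumes a fixed
 * table of SKETCH_MAX_FDS entries, so there is no expand_files() step, no
 * locking and no close-on-exec bitmap. All sketch_* names are hypothetical.
 */

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_MAX_FDS 128u
#define SKETCH_WORDS (SKETCH_MAX_FDS / SKETCH_BITS_PER_LONG)

struct sketch_table {
	unsigned long open_fds[SKETCH_WORDS];	/* one bit per descriptor */
	unsigned int next_fd;			/* no free fd exists below this */
};

static int sketch_fd_is_open(const struct sketch_table *t, unsigned int fd)
{
	return (t->open_fds[fd / SKETCH_BITS_PER_LONG] >>
		(fd % SKETCH_BITS_PER_LONG)) & 1UL;
}

/* Return the lowest free fd in [start, end), or -EMFILE if there is none. */
static int sketch_alloc_fd(struct sketch_table *t,
			   unsigned int start, unsigned int end)
{
	unsigned int fd = start;

	/* Skip the range that is known to be fully allocated. */
	if (fd < t->next_fd)
		fd = t->next_fd;
	while (fd < SKETCH_MAX_FDS && sketch_fd_is_open(t, fd))
		fd++;
	if (fd >= end || fd >= SKETCH_MAX_FDS)
		return -EMFILE;

	/* Only advance the hint if the search began at or below it. */
	if (start <= t->next_fd)
		t->next_fd = fd + 1;
	t->open_fds[fd / SKETCH_BITS_PER_LONG] |=
		1UL << (fd % SKETCH_BITS_PER_LONG);
	return (int)fd;
}

/* Release fd and pull the hint back if a lower slot became free. */
static void sketch_put_fd(struct sketch_table *t, unsigned int fd)
{
	assert(fd < SKETCH_MAX_FDS);
	t->open_fds[fd / SKETCH_BITS_PER_LONG] &=
		~(1UL << (fd % SKETCH_BITS_PER_LONG));
	if (fd < t->next_fd)
		t->next_fd = fd;
}
```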
static int alloc_fd(unsigned start, unsigned flags)
{
	return __alloc_fd(current->files, start, rlimit(RLIMIT_NOFILE), flags);
}

int get_unused_fd_flags(unsigned flags)
{
	return __alloc_fd(current->files, 0, rlimit(RLIMIT_NOFILE), flags);
}
EXPORT_SYMBOL(get_unused_fd_flags);

static void __put_unused_fd(struct files_struct *files, unsigned int fd)
{
	struct fdtable *fdt = files_fdtable(files);
	__clear_open_fd(fd, fdt);
	if (fd < files->next_fd)
		files->next_fd = fd;
}

void put_unused_fd(unsigned int fd)
{
	struct files_struct *files = current->files;
	spin_lock(&files->file_lock);
	__put_unused_fd(files, fd);
	spin_unlock(&files->file_lock);
}

EXPORT_SYMBOL(put_unused_fd);
/*
 * Install a file pointer in the fd array.
 *
 * The VFS is full of places where we drop the files lock between
 * setting the open_fds bitmap and installing the file in the file
 * array.  At any such point, we are vulnerable to a dup2() race
 * installing a file in the array before us.  We need to detect this and
 * fput() the struct file we are about to overwrite in this case.
 *
 * It should never happen - if we allow dup2() to do it, _really_ bad
 * things will follow.
 *
 * NOTE: __fd_install() variant is really, really low-level; don't
 * use it unless you are forced to by truly lousy API shoved down
 * your throat.  'files' *MUST* be either current->files or obtained
 * by get_files_struct(current) done by whoever had given it to you,
 * or really bad things will happen.  Normally you want to use
 * fd_install() instead.
 */

void __fd_install(struct files_struct *files, unsigned int fd,
		struct file *file)
{
	struct fdtable *fdt;
	spin_lock(&files->file_lock);
	fdt = files_fdtable(files);
	BUG_ON(fdt->fd[fd] != NULL);
	rcu_assign_pointer(fdt->fd[fd], file);
	spin_unlock(&files->file_lock);
}

void fd_install(unsigned int fd, struct file *file)
{
	__fd_install(current->files, fd, file);
}

EXPORT_SYMBOL(fd_install);
/*
 * The same warnings as for __alloc_fd()/__fd_install() apply here...
 */
int __close_fd(struct files_struct *files, unsigned fd)
{
	struct file *file;
	struct fdtable *fdt;

	spin_lock(&files->file_lock);
	fdt = files_fdtable(files);
	if (fd >= fdt->max_fds)
		goto out_unlock;
	file = fdt->fd[fd];
	if (!file)
		goto out_unlock;
	rcu_assign_pointer(fdt->fd[fd], NULL);
	__clear_close_on_exec(fd, fdt);
	__put_unused_fd(files, fd);
	spin_unlock(&files->file_lock);
	return filp_close(file, files);

out_unlock:
	spin_unlock(&files->file_lock);
	return -EBADF;
}
void do_close_on_exec(struct files_struct *files)
{
	unsigned i;
	struct fdtable *fdt;

	/* exec unshares first */
	spin_lock(&files->file_lock);
	for (i = 0; ; i++) {
		unsigned long set;
		unsigned fd = i * BITS_PER_LONG;
		fdt = files_fdtable(files);
		if (fd >= fdt->max_fds)
			break;
		set = fdt->close_on_exec[i];
		if (!set)
			continue;
		fdt->close_on_exec[i] = 0;
		for ( ; set ; fd++, set >>= 1) {
			struct file *file;
			if (!(set & 1))
				continue;
			file = fdt->fd[fd];
			if (!file)
				continue;
			rcu_assign_pointer(fdt->fd[fd], NULL);
			__put_unused_fd(files, fd);
			spin_unlock(&files->file_lock);
			filp_close(file, files);
			cond_resched();
			spin_lock(&files->file_lock);
		}

	}
	spin_unlock(&files->file_lock);
}
struct file *fget(unsigned int fd)
{
	struct file *file;
	struct files_struct *files = current->files;

	rcu_read_lock();
	file = fcheck_files(files, fd);
	if (file) {
		/* File object ref couldn't be taken */
		if (file->f_mode & FMODE_PATH ||
		    !atomic_long_inc_not_zero(&file->f_count))
			file = NULL;
	}
	rcu_read_unlock();

	return file;
}

EXPORT_SYMBOL(fget);

struct file *fget_raw(unsigned int fd)
{
	struct file *file;
	struct files_struct *files = current->files;

	rcu_read_lock();
	file = fcheck_files(files, fd);
	if (file) {
		/* File object ref couldn't be taken */
		if (!atomic_long_inc_not_zero(&file->f_count))
			file = NULL;
	}
	rcu_read_unlock();

	return file;
}

EXPORT_SYMBOL(fget_raw);
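/*
 * Illustration only, not part of the kernel source: fget() and fget_raw()
 * above take a reference only if f_count has not already dropped to zero,
 * which is what makes the lockless RCU lookup safe against a concurrent
 * final fput(). The sketch below shows one way to express that
 * "increment unless zero" step with C11 atomics in user space. It is not
 * the kernel's atomic_long_inc_not_zero() implementation, and the name
 * sketch_inc_not_zero is hypothetical.
 */

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Try to take a reference. Returns false if the count was already zero,
 * meaning the object is being torn down and must not be used.
 */
static bool sketch_inc_not_zero(atomic_long *count)
{
	long old;

	assert(count != NULL);

	old = atomic_load(count);
	while (old != 0) {
		/* On failure the CAS reloads 'old' with the current value. */
		if (atomic_compare_exchange_weak(count, &old, old + 1))
			return true;
	}
	return false;
}
```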
/*
 * Lightweight file lookup - no refcnt increment if fd table isn't shared.
 *
 * You can use this instead of fget if you satisfy all of the following
 * conditions:
 * 1) You must call fput_light before exiting the syscall and returning control
 *    to userspace (i.e. you cannot remember the returned struct file * after
 *    returning to userspace).
 * 2) You must not call filp_close on the returned struct file * in between
 *    calls to fget_light and fput_light.
 * 3) You must not clone the current task in between the calls to fget_light
 *    and fput_light.
 *
 * The fput_needed flag returned by fget_light should be passed to the
 * corresponding fput_light.
 */
struct file *fget_light(unsigned int fd, int *fput_needed)
{
	struct file *file;
	struct files_struct *files = current->files;

	*fput_needed = 0;
	if (atomic_read(&files->count) == 1) {
		file = fcheck_files(files, fd);
		if (file && (file->f_mode & FMODE_PATH))
			file = NULL;
	} else {
		rcu_read_lock();
		file = fcheck_files(files, fd);
		if (file) {
			if (!(file->f_mode & FMODE_PATH) &&
			    atomic_long_inc_not_zero(&file->f_count))
				*fput_needed = 1;
			else
				/* Didn't get the reference, someone has freed it */
				file = NULL;
		}
		rcu_read_unlock();
	}

	return file;
}
EXPORT_SYMBOL(fget_light);

struct file *fget_raw_light(unsigned int fd, int *fput_needed)
{
	struct file *file;
	struct files_struct *files = current->files;

	*fput_needed = 0;
	if (atomic_read(&files->count) == 1) {
		file = fcheck_files(files, fd);
	} else {
		rcu_read_lock();
		file = fcheck_files(files, fd);
		if (file) {
			if (atomic_long_inc_not_zero(&file->f_count))
				*fput_needed = 1;
			else
				/* Didn't get the reference, someone has freed it */
				file = NULL;
		}
		rcu_read_unlock();
	}

	return file;
}
void set_close_on_exec(unsigned int fd, int flag)
{
	struct files_struct *files = current->files;
	struct fdtable *fdt;
	spin_lock(&files->file_lock);
	fdt = files_fdtable(files);
	if (flag)
		__set_close_on_exec(fd, fdt);
	else
		__clear_close_on_exec(fd, fdt);
	spin_unlock(&files->file_lock);
}

bool get_close_on_exec(unsigned int fd)
{
	struct files_struct *files = current->files;
	struct fdtable *fdt;
	bool res;
	rcu_read_lock();
	fdt = files_fdtable(files);
	res = close_on_exec(fd, fdt);
	rcu_read_unlock();
	return res;
}
static int do_dup2(struct files_struct *files,
	struct file *file, unsigned fd, unsigned flags)
{
	struct file *tofree;
	struct fdtable *fdt;

	/*
	 * We need to detect attempts to do dup2() over a descriptor that
	 * has been allocated but is not yet fully set up.  NB: OpenBSD
	 * avoids that at the price of extra work in their equivalent of
	 * fget() - they insert struct file immediately after grabbing
	 * descriptor, mark it larval if more work (e.g. actual opening) is
	 * needed and make sure that fget() treats larval files as absent.
	 * Potentially interesting, but while extra work in fget() is
	 * trivial, locking implications and amount of surgery on
	 * open()-related paths in VFS are not.  FreeBSD fails with -EBADF
	 * in the same situation, NetBSD "solution" deadlocks in rather
	 * amusing ways, AFAICS.  All of that is out of scope of POSIX or
	 * SUS, since neither considers shared descriptor tables and this
	 * condition does not arise without those.
	 */
	fdt = files_fdtable(files);
	tofree = fdt->fd[fd];
	if (!tofree && fd_is_open(fd, fdt))
		goto Ebusy;
	get_file(file);
	rcu_assign_pointer(fdt->fd[fd], file);
	__set_open_fd(fd, fdt);
	if (flags & O_CLOEXEC)
		__set_close_on_exec(fd, fdt);
	else
		__clear_close_on_exec(fd, fdt);
	spin_unlock(&files->file_lock);

	if (tofree)
		filp_close(tofree, files);

	return fd;

Ebusy:
	spin_unlock(&files->file_lock);
	return -EBUSY;
}
int replace_fd(unsigned fd, struct file *file, unsigned flags)
{
	int err;
	struct files_struct *files = current->files;

	if (!file)
		return __close_fd(files, fd);

	if (fd >= rlimit(RLIMIT_NOFILE))
		return -EBADF;

	spin_lock(&files->file_lock);
	err = expand_files(files, fd);
	if (unlikely(err < 0))
		goto out_unlock;
	return do_dup2(files, file, fd, flags);

out_unlock:
	spin_unlock(&files->file_lock);
	return err;
}
SYSCALL_DEFINE3(dup3, unsigned int, oldfd, unsigned int, newfd, int, flags)
{
	int err = -EBADF;
	struct file *file;
	struct files_struct *files = current->files;

	if ((flags & ~O_CLOEXEC) != 0)
		return -EINVAL;

	if (unlikely(oldfd == newfd))
		return -EINVAL;

	if (newfd >= rlimit(RLIMIT_NOFILE))
		return -EBADF;

	spin_lock(&files->file_lock);
	err = expand_files(files, newfd);
	file = fcheck(oldfd);
	if (unlikely(!file))
		goto Ebadf;
	if (unlikely(err < 0)) {
		if (err == -EMFILE)
			goto Ebadf;
		goto out_unlock;
	}
	return do_dup2(files, file, newfd, flags);

Ebadf:
	err = -EBADF;
out_unlock:
	spin_unlock(&files->file_lock);
	return err;
}

SYSCALL_DEFINE2(dup2, unsigned int, oldfd, unsigned int, newfd)
{
	if (unlikely(newfd == oldfd)) { /* corner case */
		struct files_struct *files = current->files;
		int retval = oldfd;

		rcu_read_lock();
		if (!fcheck_files(files, oldfd))
			retval = -EBADF;
		rcu_read_unlock();
		return retval;
	}
	return sys_dup3(oldfd, newfd, 0);
}
SYSCALL_DEFINE1(dup, unsigned int, fildes)
{
	int ret = -EBADF;
	struct file *file = fget_raw(fildes);

	if (file) {
		ret = get_unused_fd();
		if (ret >= 0)
			fd_install(ret, file);
		else
			fput(file);
	}
	return ret;
}

int f_dupfd(unsigned int from, struct file *file, unsigned flags)
{
	int err;
	if (from >= rlimit(RLIMIT_NOFILE))
		return -EINVAL;
	err = alloc_fd(from, flags);
	if (err >= 0) {
		get_file(file);
		fd_install(err, file);
	}
	return err;
}
int iterate_fd(struct files_struct *files, unsigned n,
		int (*f)(const void *, struct file *, unsigned),
		const void *p)
{
	struct fdtable *fdt;
	int res = 0;
	if (!files)
		return 0;
	spin_lock(&files->file_lock);
	for (fdt = files_fdtable(files); n < fdt->max_fds; n++) {
		struct file *file;
		file = rcu_dereference_check_fdtable(files, fdt->fd[n]);
		if (!file)
			continue;
		res = f(p, file, n);
		if (res)
			break;
	}
	spin_unlock(&files->file_lock);
	return res;
}
EXPORT_SYMBOL(iterate_fd);