Sunday 15 August 2010

c - Why does readdir() in Linux grow non-linearly?



I have a directory with 1000 files, and readdir() takes less than 1 second to scan it; but a directory with 10000 files takes around 24 seconds.

Why? It should scale linearly.

Can anyone explain the reason? And is there a better solution if all I need are the file and sub-directory names in a directory?

Edit: this is on my local Linux PC.

It might be file-system specific. Perhaps using a suitably configured ext4 or btrfs file system would help. File systems using hashing or B-tree techniques make the complexity of accessing one file in a directory of size N O(log N); others are still linear, i.e. O(N), and the kernel might do weird things on top of that.

The shell may also have to sort the entries when globbing (see glob(7)), which is slow in huge directories; and you don't want auto-completion to take many seconds on each keystroke!

I believe you should never have huge directories (e.g. more than a few hundred entries), so 10000 files in a single directory is unreasonable. If that is your case, you would do better to organize your files differently, e.g. subdir01/file001.txt ... subdir99/file999.txt

By the way, if you need to access a lot of small items by a textual key, an indexed file (like gdbm), an sqlite "database", or a real database (PostgreSQL, MongoDB, ...) would be much more suitable and more efficient. Don't forget to dump the data (probably in a textual format) for backup.

Notice that the documentation of readdir(3) on Linux, and of POSIX readdir, does not mention any time complexity or linear behavior. That lack of mention is significant. On the commonly used FAT filesystem (e.g. on many USB keys) the time complexity is quadratic.

c linux
