UNIX Socket FAQ

A forum for questions and answers about network programming on Linux and all other Unix-like systems

#1 2005-01-21 11:55 PM

Nope
Administrator
From: Germany
Registered: 2004-01-24
Posts: 385

Re: stat returns "Resource temporarily unavailable"

Hi there.

I'm running SuSE 9.2.

I've done some work on my server thingy, ya know, to squeeze some
more speed out, and I've gone over my file cache subsystem to do so.

Now, I check those cache entries at regular intervals. First, I call stat()
to get the file status data, and if it has changed I update the cache entry.
Now I'm suddenly getting some pretty messed up results. Whenever I run
extensive benchmarks, this stat() fails with errno 11, which perror
translates into "Resource temporarily unavailable" (EAGAIN). How the
heck can that happen? I was pretty sure that Linux only supports blocking
file operations, so I'd expect it just to hang for a moment in case the file is
busy or something. Any idea how I can avoid that error? The stat call
takes the path as its argument, so it can't even be that I set the file descriptor
to nonblocking myself. However, the file itself is mmapped at that moment,
and the file was set to nonblocking before the mmap call. Could that have
anything to do with it? And no, I won't make my server slower! :wink:

Edited:
OK, I found a more in-depth description of glibc, and it seems Linux
does support non-blocking file I/O. Now I have to wonder where I read that
it wouldn't? Still, stat alone should use its own file descriptor; at the time of
the error only 6 of those were in use within the server app, so it can't be
an availability problem. I also have only 10 mmapped files overall. I am in a
single thread and no processes were spawned.

#2 2005-01-23 05:41 PM

i3839
Oddministrator
From: Amsterdam
Registered: 2003-06-07
Posts: 2,239

Re: stat returns "Resource temporarily unavailable"

Hi Nope,

You can get EAGAIN for other things too; that we're used to seeing it as something specific to non-blocking sockets is another matter. I'd just retry the stat() call whenever that happens (with some code to catch a long retry loop: if that happens, something is really wrong).
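
A minimal sketch of that bounded retry, for illustration only. The helper name stat_retry and the cap of 8 attempts are assumptions, not anything from the thread, and retrying on EINTR as well is an extra choice made here:

```c
#include <errno.h>
#include <sys/stat.h>

/* Arbitrary cap on retries (an assumption); hitting it suggests that
 * something is really wrong rather than a transient condition. */
enum { STAT_MAX_RETRIES = 8 };

/* Hypothetical helper: call stat(), retrying while it fails with
 * EAGAIN or EINTR. Returns 0 on success, -1 on failure with errno
 * left as set by the last stat() call. */
int stat_retry(const char *path, struct stat *st)
{
    for (int attempt = 0; attempt <= STAT_MAX_RETRIES; attempt++) {
        if (stat(path, st) == 0)
            return 0;
        if (errno != EAGAIN && errno != EINTR)
            return -1;      /* a real error: report it immediately */
    }
    return -1;              /* still EAGAIN/EINTR after the cap: give up */
}
```

A caller can tell the "gave up" case apart from a real error by checking whether errno is still EAGAIN after a -1 return.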

Do you keep the files in your cache by mmapping them with the MAP_LOCKED flag or something? Then why do you need the stat() call, as the content will change immediately? If you simply read the file and cache that data then you're caching for nothing, as the kernel will cache the files too; with double caching the memory usage is higher and thus the files are evicted sooner from the cache. It's probably better to tell the kernel to keep those files in memory by mmapping them with the MAP_LOCKED flag, using mlock() or perhaps madvise().

You could also try to use dnotify (or inotify if you don't mind patching) instead of polling with stat().
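
For readers on current kernels: inotify has since been merged into mainline Linux, so no patching is needed there. A rough sketch of replacing the stat() polling with an inotify watch follows. The helper names watch_file and file_changed are made up for this example, and the event mask is just one plausible choice:

```c
#include <errno.h>
#include <stdio.h>
#include <sys/inotify.h>
#include <unistd.h>

/* Hypothetical helper: returns a non-blocking inotify descriptor that
 * watches `path` for content or metadata changes, deletion and renames.
 * Returns -1 on error. */
int watch_file(const char *path)
{
    int fd = inotify_init1(IN_NONBLOCK | IN_CLOEXEC);
    if (fd < 0) {
        perror("inotify_init1");
        return -1;
    }
    if (inotify_add_watch(fd, path,
                          IN_MODIFY | IN_ATTRIB |
                          IN_DELETE_SELF | IN_MOVE_SELF) < 0) {
        perror("inotify_add_watch");
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: drains all pending events without parsing them.
 * Returns 1 if at least one event was queued, 0 if none, -1 on error. */
int file_changed(int fd)
{
    char buf[4096];
    int changed = 0;

    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0) {
            changed = 1;        /* something happened; keep draining */
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;           /* interrupted: just try again */
        if (n < 0 && errno != EAGAIN)
            return -1;          /* unexpected read error */
        return changed;         /* queue is empty */
    }
}
```

In an event-driven server the inotify descriptor could sit in the same select()/poll() set as the sockets, so the cache entry is only re-examined when the kernel reports a change.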

I can't find the other thread where you mentioned that calling malloc was faster than using already allocated memory, so I'll answer here:

It's because of the CPU cache: your already allocated pages aren't in the CPU cache, while the kernel will give out cache-hot pages first when asked for memory. That's why doing malloc can be much faster than using already allocated pages. At least that's my little theory.

Without seeing any code all the above is mere speculation and wild guessing of course. ;-)

#3 2005-01-23 07:17 PM

RobSeace
Administrator
From: Boston, MA
Registered: 2002-06-12
Posts: 3,839

Re: stat returns "Resource temporarily unavailable"

I'm pretty sure one case where you can see EAGAIN in such a
call is if the file (or, some dir in the path) changed significantly while
the syscall was in the middle of processing...  Look through the
ext3 kernel code (I'm assuming you're using ext3, anyway) and
you'll see lots of EAGAIN returns when the "branch changed" (ie:
the chain of inodes and indirect blocks gets modified, which I think
could happen on any significant size change of the file)...  So, if you
had some process continually growing the file, and another continually
stat()'ing it, I would expect the stat()'er to get EAGAIN occasionally...
(Note: I'm completely guessing here, and haven't actually tested this
to see if it happens...)

#4 2005-01-23 11:39 PM

Nope
Administrator
From: Germany
Registered: 2004-01-24
Posts: 385

Re: stat returns "Resource temporarily unavailable"

I do indeed mmap the file. However, it's not the content that's interesting but the changed modification time, as well as the possibly changed size. The mmap does take a size parameter, after all. But the real issue is that I also cache part of the HTTP response header in there, and that is something I have to update too, along with other stuff I need for transmitting the whole thing. I don't lock the mmapped pages, else I might have to deal with segfaults at this point.
Btw, malloc internally falls back on the mmap call (MAP_ANONYMOUS) for large allocations anyway. Indeed, both provide the same speed if tested under the same conditions. I use MAP_ANONYMOUS together with MAP_SHARED in my fork-based server to share the preloaded file cache between the processes (instead of normal shared memory); works like a charm.

To the topic itself. Neither the file changed nor anything else within the whole path. Basically it would return a dataset that is exactly the same as it was when I last updated it. It doesn't happen all the time either; sometimes it goes through, sometimes it doesn't. Now, whenever I check the file, no one is reading or writing it. It is actually not in use, else I'd have my "in use counter" higher than zero, which would cause the routine to skip the test altogether. The only thing present right at that time is the existing mmap on the file. The thing is that you can't really delete a file that has an open descriptor (an open file and an mmap both increase the file system's in-use counter). A delete will make it "invisible" but it's still there until the last open file descriptor is closed, then it vanishes. The mmap area also doesn't change automatically if the size of the file changes, so I have to update that. The original file descriptor used to mmap it in the first place is long closed at that moment too.
I never encountered an EAGAIN for a stat() before. It is supposed to be a blocking operation. Its file descriptor can't be set to nonblocking as there is none in the first place (unlike fstat) and the default should be blocking. There can't be an EAGAIN for a blocking file operation; that would destroy the reason why you use that in the first place! It looks to me as if the OS sees the shared mmap and uses it with all its settings whenever a new descriptor is opened for the file, and that can't be right.
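
The point above about a deleted file staying alive while it is still mapped can be checked with a short sketch like the following. The function name and the scratch path passed to it are made up for this example:

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical demo: creates `path`, maps it, closes the descriptor and
 * unlinks the file. Returns 1 if the name is gone but the data is still
 * readable through the mapping, 0 if not, -1 on a setup error. */
int mapping_survives_unlink(const char *path)
{
    static const char msg[] = "still here";

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (write(fd, msg, sizeof msg) != (ssize_t)sizeof msg) {
        close(fd);
        unlink(path);
        return -1;
    }

    char *map = mmap(NULL, sizeof msg, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                      /* the mapping keeps its own reference */
    if (map == MAP_FAILED) {
        unlink(path);
        return -1;
    }
    if (unlink(path) != 0) {
        munmap(map, sizeof msg);
        return -1;
    }

    struct stat st;
    int gone = stat(path, &st) != 0;                  /* name no longer resolves */
    int readable = memcmp(map, msg, sizeof msg) == 0; /* data is still there */

    munmap(map, sizeof msg);        /* last reference dropped: file vanishes */
    return gone && readable;
}
```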

It's like this: I search the dynamic file cache (dfc), and if the search comes back negative I get the stat of the file. If that succeeds I open the file. If the open succeeds I mmap the file (if the size is within a set limit), close the file descriptor right afterwards, and then add it to the dfc. If the mmap fails I keep the file open, add the stat alone to the dfc, send the file directly with sendfile, and then close the file. If the dfc already had the file in it (with or without mmap), and a certain minimum time has passed since the entry was created, I call stat again to make sure the dfc is still up to date. Adding an entry to the dfc also creates a couple of HTTP header lines for later use. Why should I, for example, create the Last-Modified entry anew every time when the time-to-string conversion is such an expensive op?

If someone wants to know, the dfc uses a self-optimising (that's not balancing!) translucent binary tree that, thanks to my comparison function, behaves like a ternary one without the hassle. I also use an additional self-sorting FIFO to determine which entry to delete in case it was idle too long or in case I need more space for a new entry. However, it is fast enough to beat Zeus, LiteSpeed Professional as well as Tux (at least in some cases) for static content. Even my fork/thread based ones manage to keep up with, for example, Boa.
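
The server's source was never posted, so the following is only a rough sketch of the load and revalidate steps described above. All names (dfc_entry, dfc_load, dfc_revalidate) and both limits are assumptions, and the sendfile fallback, the header caching and the tree are left out:

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/* Assumed limits: largest file to mmap, minimum seconds between checks. */
enum { DFC_MAX_MAP = 1 << 20, DFC_RECHECK_SECS = 5 };

struct dfc_entry {
    void  *map;      /* NULL when the file was not mapped */
    off_t  size;     /* st_size at load time */
    time_t mtime;    /* st_mtime at load time */
    time_t checked;  /* when the entry was last validated */
};

/* stat, open, mmap (if small enough), close. Returns 0 on success. */
int dfc_load(struct dfc_entry *e, const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;

    e->map = NULL;
    e->size = st.st_size;
    e->mtime = st.st_mtime;
    e->checked = time(NULL);

    if (st.st_size > 0 && st.st_size <= DFC_MAX_MAP) {
        int fd = open(path, O_RDONLY);
        if (fd < 0)
            return -1;
        void *m = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
        close(fd);               /* the mapping outlives the descriptor */
        if (m != MAP_FAILED)
            e->map = m;          /* on failure keep only the stat data */
    }
    return 0;
}

void dfc_unload(struct dfc_entry *e)
{
    if (e->map != NULL)
        munmap(e->map, (size_t)e->size);
    e->map = NULL;
}

/* Returns 0 if the entry is still usable, 1 if it was reloaded,
 * -1 if the file could not be checked or reloaded. */
int dfc_revalidate(struct dfc_entry *e, const char *path, time_t now)
{
    if (now - e->checked < DFC_RECHECK_SECS)
        return 0;                /* checked recently: skip the stat() */

    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    if (st.st_mtime == e->mtime && st.st_size == e->size) {
        e->checked = now;        /* unchanged: just restart the timer */
        return 0;
    }
    dfc_unload(e);               /* size or mtime changed: remap */
    return dfc_load(e, path) == 0 ? 1 : -1;
}
```

The remap on a size change matters because the existing mapping keeps the old length, as noted earlier in the post.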

#5 2005-01-24 12:29 AM

i3839
Oddministrator
From: Amsterdam
Registered: 2003-06-07
Posts: 2,239

Re: stat returns "Resource temporarily unavailable"

Just accept that EAGAIN isn't used for non-blocking files only and that stat() can fail.

A lot of boasting, but no source. Moving on...

#6 2005-01-24 01:30 PM

RobSeace
Administrator
From: Boston, MA
Registered: 2002-06-12
Posts: 3,839

Re: stat returns "Resource temporarily unavailable"

I really do tend to agree with Nope, though: one shouldn't HAVE to
deal with spuriously interrupted syscalls for no obvious or documented
reason, just due to some temporary anomaly that the system itself
could work around by simply retrying on its own, without bugging the
higher-level coder...  (Though, if you think about it, we really have to
deal with it all the time: EINTR being the classic example...  And, don't
think I don't loathe that, as well... ;-)  All systems should auto-restart
interrupted syscalls, unless told otherwise, dammit... ;-))  And, I
personally have never seen stat() fail with EAGAIN, either...  And, I
definitely would be equally surprised and annoyed by it...  I certainly
never code my stat()'s in an EAGAIN loop, a la EINTR protection...
And, I'd be justifiably annoyed if I had to start doing so, and no one
could give me a clear reason WHY I suddenly need to, when it's
never been necessary in the past, or at least no one ever informed
me it might be...  (The man pages don't even mention EAGAIN as a
possible errno value for stat() failure...  Not that you can EVER trust
a man page further than you can spit a rat... ;-))  At the very least,
I'd definitely want to find out WHAT exactly is causing this strange
condition to occur, and possibly try to prevent its occurrence in the
first place, rather than kluge around it with an EINTR-style loop...
(And, anyone that doesn't think EINTR loops are kluges from hell
is absolutely insane... ;-))

#7 2005-01-24 06:50 PM

i3839
Oddministrator
From: Amsterdam
Registered: 2003-06-07
Posts: 2,239

Re: stat returns "Resource temporarily unavailable"

Yes, it's terribly annoying, but once you have a loop for EINTR it's not much work to add EAGAIN too. ;-)

Just imagining the code bloat when every program starts handling all syscall errors "properly" with loops and stuff is scary...

#8 2005-01-26 11:10 AM

Nope
Administrator
From: Germany
Registered: 2004-01-24
Posts: 385

Re: stat returns "Resource temporarily unavailable"

I actually do handle as many error conditions as possible. I have a lot of syscalls encapsulated in a class by now, just to get that code overhead out of my sight.

Yep, a lot of bloat. But what code do you expect? The stat call part alone? That doesn't show anything. I'd have to post all the code that deals with the resource, and to get the connections between those snippets right, I'd have to post at least a framework of the whole multiplexing core. I don't think you'd want to look through 300 or so lines where the reason still might be in the parts I left out.

It seems that the EAGAIN condition does not occur when I disable TCP_DEFER_ACCEPT. Whether that's because of the overall speed loss or because that pre-accepting by the OS really is the source of the whole mess, I can't say. For now I just treat the EAGAIN as meaning the file is still valid, but don't set a new re-test time, which is almost the same as a loop without causing useless delay. I have googled and read a couple of hundred pages too, but none mentions EAGAIN for stat either. And I haven't seen an EAGAIN for stat in the 19 years I've been writing C code, so it is worth a note, at least as a hint for others about where they might run into trouble later.

#9 2005-01-26 06:48 PM

i3839
Oddministrator
From: Amsterdam
Registered: 2003-06-07
Posts: 2,239

Re: stat returns "Resource temporarily unavailable"

I meant the code of your server. All nice and well if you're telling how great it is compared to others, but if it's not even open source then it's not interesting at all, at least not to me. So either release the source, or stop boasting how good it is. ;-)

Perhaps putting a break point in the error path of stat() to see what's going on with gdb, or something like that? Or try with a tmpfs or another filesystem, if it's a valid error code from stat then it's probably some internal filesystem generated error code, so if switching just filesystems gets rid of it then that may be a hint.

#10 2005-01-28 12:42 AM

Nope
Administrator
From: Germany
Registered: 2004-01-24
Posts: 385

Re: stat returns "Resource temporarily unavailable"

Truth is, I might go opensource one day, but as long as I am looking for a
job and I don't know if I can use that code to improve my chances, it'd be
dumb to do so now. But in case I can't use it in a job I'll go open source
asap, that's if I don't have to continue to make my living by other means
as it is now.

Honestly, the source of the problem is not an issue as long as I can't solve
it within the code. Even if I can solve it locally by changing the file system
the next user might just run into it again, so I have to leave the handling
code as it is anyway. But I might just look into the stat as you suggested
to get at least a hint where it comes from. Or, what might be a better idea,
post it in a kernel developer forum to get the right answers. I should have
gone there in the first place if I'd only known of one. Time to google again I
guess.
