The online racing simulator
#1 - amp88
[OT] Java - Network Delay Problem
I've written a little utility app to store various details about files located on my network. I've been trying to optimise it as it's taking quite a bit of time to run. There are 3 main sections in the program flow.

1 - Gathering details for each of the files in the specified network location.
2 - Checking to see if there are any duplicates within those files gathered above.
3 - Displaying statistical information about the files.

Roughly 95% of the time taken for the program to run is in the gathering of file details. This is to be expected, but now I'm trying to optimise the retrieval of the details across the network. For each file I need several pieces of information:

The file name, absolute file path, file size and last modified date.

I've timed the retrieval of each of these pieces of information (using System.nanoTime()) and the time taken to get the file name and absolute path is always 0ms, but getting the file size takes 3-4ms and getting the last modified date takes another 3-4ms.

Is it possible to force the program to retrieve both the file size and the last modified date in one network trip? This would roughly halve the time taken to gather the file details.

Thanks in advance for any help or information.
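On Java 7 or later, java.nio.file offers Files.readAttributes(path, BasicFileAttributes.class), which returns the size and the last-modified time from a single call. The following is a minimal sketch; the class and method names are made up for illustration. Whether this really costs one network round trip instead of two depends on the operating system and the file-sharing protocol, so it is worth timing it against File.length() plus File.lastModified() before relying on it.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.BasicFileAttributes;

public class ReadAttributesSketch {
    // Returns {size in bytes, last-modified time in epoch millis} from one readAttributes call.
    static long[] sizeAndModified(Path p) throws IOException {
        BasicFileAttributes attrs = Files.readAttributes(p, BasicFileAttributes.class);
        return new long[] { attrs.size(), attrs.lastModifiedTime().toMillis() };
    }

    public static void main(String[] args) throws IOException {
        Path p = Paths.get(args.length > 0 ? args[0] : ".");
        long[] info = sizeAndModified(p);
        System.out.println(p + ": " + info[0] + " bytes, last modified " + info[1]);
    }
}
```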
If you have a method for getting all the file information in one function call, I would use that, as it seems that it's just your network lag time that is causing most of the problems.

system.file.info(), something like that.
java.io.File doesn't implement a way to grab all file properties at once - and, as far as I know, there isn't any default Java solution for your problem. However, this isn't a good time of the evening for me to be thinking about such things... so I'll have a fresh look at it tomorrow!
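That holds for java.io.File. On Java 7 or later, Files.walkFileTree passes a BasicFileAttributes object to each visited file, so the size and modification time arrive with the traversal instead of through separate per-file calls. The sketch below assumes that API is available. WalkTreeSketch and its FileInfo holder are hypothetical names. How many network requests the walk actually makes is platform-dependent and worth measuring; on some platforms, Windows among them, the attributes may come straight from the directory listing.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;

public class WalkTreeSketch {
    // Hypothetical holder for the per-file details the thread needs.
    static final class FileInfo {
        final String path;
        final long size;
        final long lastModified;

        FileInfo(String path, long size, long lastModified) {
            this.path = path;
            this.size = size;
            this.lastModified = lastModified;
        }
    }

    // Walks root recursively; each file's attributes are handed to visitFile by the walker.
    static List<FileInfo> gather(Path root) throws IOException {
        final List<FileInfo> out = new ArrayList<FileInfo>();
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                if (attrs.isRegularFile()) {
                    out.add(new FileInfo(file.toAbsolutePath().toString(),
                            attrs.size(), attrs.lastModifiedTime().toMillis()));
                }
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFileFailed(Path file, IOException exc) {
                // Skip entries that cannot be read instead of aborting the whole walk.
                return FileVisitResult.CONTINUE;
            }
        });
        return out;
    }

    public static void main(String[] args) throws IOException {
        List<FileInfo> files = gather(Paths.get(args.length > 0 ? args[0] : "."));
        System.out.println(files.size() + " files");
    }
}
```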
#4 - Stuff
java.io.File does have the listFiles() method (along with lastModified, length/size, hidden, etc) that returns an array of java.io.File objects in the directory. I have not tested it but it's where I would start. Some examples here: http://exampledepot.com/egs/java.io/pkg.html
That's why we keep you around, Ray. Nice!
Handy function. :up:
File.listFiles() is a handy method, but I don't think it's going to solve amp88's problem. The File objects returned are just abstract representations of the files on disk. Each time you call length() or lastModified(), it fetches the data again.

amp88 - have you done any profiling with hprof or something to see where time is being spent at a finer grain?
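Because each length() or lastModified() call goes back to the file system, one cheap improvement is to call each of them exactly once per file and keep the values. In the method amp88 posts further down, length() is called once for the filesizeLimit check and again when building the FileObject, so every file pays for the size lookup twice. The sketch below sticks to java.io.File. CachedListingSketch and FileInfo are hypothetical names and are not the thread's code.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class CachedListingSketch {
    // Hypothetical value object: each attribute is read from the File once and stored.
    static final class FileInfo {
        final String name;
        final String absPath;
        final long length;
        final long lastModified;

        FileInfo(File f, long length) {
            this.name = f.getName();
            this.absPath = f.getAbsolutePath();
            this.length = length;              // reuse the value already fetched
            this.lastModified = f.lastModified(); // single lastModified() call
        }
    }

    // Recursively collects files larger than sizeLimit, calling length() once per file.
    static void gather(File dir, long sizeLimit, List<FileInfo> out) {
        File[] entries = dir.listFiles();
        if (entries == null) {
            return; // not a directory, or an I/O error occurred
        }
        for (File f : entries) {
            if (f.isDirectory()) {
                gather(f, sizeLimit, out);
            } else {
                long length = f.length(); // read once, used for both the filter and the record
                if (length > sizeLimit) {
                    out.add(new FileInfo(f, length));
                }
            }
        }
    }

    public static void main(String[] args) {
        List<FileInfo> found = new ArrayList<FileInfo>();
        gather(new File(args.length > 0 ? args[0] : "."), 0L, found);
        System.out.println(found.size() + " files");
    }
}
```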
A Java and .NET code dump thread like the one we have for PHP would be useful.

[Fourth Request for an Off-Topic, while on topic of programming, Forum.]
Quote from Dygear :[Fourth Request for an Off-Topic, while on topic of programming, Forum.]

Denied, for the simple reason that it would give grounds to anyone arguing that the community is capable of teaching people about programming in general. While the requests here stay small and very specific, you can argue that it isn't, and so such requests are allowed. From this point on, completely trivial requests will be moved or closed with a pointer to where people can find the answer; e.g. "what's an if statement omgwtfz! :doh:" will result in a closed thread pointing at Google. Things like Ian.H's MD5 tool will be moved to off topic.

If you want to post on a purely programming forum, which you apparently do, go and find one. At the end of the day this is a LFS related forum, so things should be mostly about LFS.
Quote from Stuff :java.io.File does have the listFiles() method (along with lastModified, length/size, hidden, etc) that returns an array of java.io.File objects in the directory. I have not tested it but it's where I would start. Some examples here: http://exampledepot.com/egs/java.io/pkg.html

I'm already using listFiles(), but as someone pointed out, it doesn't retrieve all the information I need. It does bring back the path, so the subsequent call to file.getAbsolutePath() is instant (well, as instant as a lookup can be...), but the calls to length() and lastModified() each take a network trip. Thanks for the reply, though.

To make the situation a bit clearer, here's the recursive method to retrieve the file details:

The printTimeInfo method is a helper method that takes an input time and a string for what the task was. It prints the time difference between now and the time passed in. Sample output for a file would be the following:

0 ms get file name
0 ms get absolute file path
3 ms get file size
3 ms get last modified
8 ms add new file object

So, we can see that retrieving the file size and the last modified date takes the vast majority of the time for each file (the 8 ms for "add new file object" is the total for the whole file, i.e. all of the above).


private void gatherFileDetailsFile(String pathname) {
    // Uses members defined elsewhere in the class (filesGlobal, filesizeLimit,
    // filesExcluded, FileObject, printTimeInfo); requires import java.io.File.
    try {
        long startGatherFileDetailsTime = System.nanoTime();

        File dir = new File(pathname);
        File[] files = dir.listFiles();

        printTimeInfo(startGatherFileDetailsTime, "gather file details for " + pathname);

        // listFiles() returns null if pathname is not a directory or an I/O error occurs
        if (files == null) {
            return;
        }

        for (int i = 0; i < files.length; i++) {
            if (files[i].isDirectory()) {
                gatherFileDetailsFile(files[i].getAbsolutePath());
            } else {
                // Note: this length() call is itself a network trip; the size is fetched again below
                if (files[i].length() > filesizeLimit) {
                    long startGatheringFileDetailsTime = System.nanoTime();

                    long startGettingNameTime = System.nanoTime();
                    String name = files[i].getName();
                    printTimeInfo(startGettingNameTime, "get file name");
                    long startGettingAbsPathTime = System.nanoTime();
                    String absPath = files[i].getAbsolutePath();
                    printTimeInfo(startGettingAbsPathTime, "get absolute file path");
                    long startGettingLengthTime = System.nanoTime();
                    long length = files[i].length();
                    printTimeInfo(startGettingLengthTime, "get file size");
                    long startGettingLastModifiedTime = System.nanoTime();
                    long lastModified = files[i].lastModified();
                    printTimeInfo(startGettingLastModifiedTime, "get last modified");
                    /*long startGettingIsDirTime = System.nanoTime();
                    boolean isDir = files[i].isDirectory();
                    printTimeInfo(startGettingIsDirTime, "get is directory");*/

                    // Safe to pass "false" to FileObject constructor as this file is not a directory
                    filesGlobal.add(new FileObject(name, absPath, length, lastModified, false));

                    printTimeInfo(startGatheringFileDetailsTime, "add new file object");
                } else {
                    filesExcluded++;
                }
            }
        }
    } catch (Exception e) {
        System.out.println("Exception (gatherFileDetailsFile): "
                + e.getMessage());
        // addLog("Exception (gatherFileDetailsFile): "+e.getMessage());
    }
}
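The printTimeInfo helper isn't shown in the post. Going only by its description (a start time plus a task label, printing the elapsed time), it might look something like the sketch below; the body is a guess, and TimingSketch and elapsedMillis are hypothetical names. Integer division truncates, so with this version a "0 ms" line means under a millisecond rather than literally no time.

```java
public class TimingSketch {
    // Guessed reconstruction of printTimeInfo: prints "<elapsed> ms <task>".
    static void printTimeInfo(long startNanos, String task) {
        System.out.println(elapsedMillis(startNanos, System.nanoTime()) + " ms " + task);
    }

    // Whole milliseconds between two System.nanoTime() readings (truncated, not rounded).
    static long elapsedMillis(long startNanos, long endNanos) {
        return (endNanos - startNanos) / 1000000L;
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.nanoTime();
        Thread.sleep(5);
        printTimeInfo(start, "sleep for 5 ms");
    }
}
```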

Quote from amp88 :

To make the situation a bit clearer, here's the recursive method to retrieve the file details:
.
.
<snip>
.
.
Sample output for a file would be the following:

0 ms get file name
0 ms get absolute file path
3 ms get file size
3 ms get last modified
8 ms add new file object

So, we can see that retrieving the file size and the last modified date are taking the vast majority of the time for each file

The only way I can think of escaping the network overhead you're incurring now is to create a service of one kind or another on the target machine that you can call to get the information you're requesting all in one transfer. I'm not sure what the requirements for the program you're writing are, though. So, that might not even be a consideration.
Quote from rheiser :The only way I can think of escaping the network overhead you're incurring now is to create a service of one kind or another on the target machine that you can call to get the information you're requesting all in one transfer. I'm not sure what the requirements for the program you're writing are, though. So, that might not even be a consideration.

Don't know why I didn't think of that before. I've rewritten it to use RMI and now it's taking roughly half a millisecond per file rather than the ~8 ms per file as before, since it's doing all the work on the local machine. Thanks for the suggestion.
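For anyone wanting to try the same approach, here is a minimal sketch of an RMI service that walks the file tree on the machine that owns the files and returns everything in one reply. The interface, class and method names are hypothetical, and this is not amp88's actual code. In a real deployment the server would bind its stub in an RMI registry on the file server and the client would look it up; main() below exports and calls it inside a single JVM just to show the moving parts.

```java
import java.io.File;
import java.io.Serializable;
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.server.UnicastRemoteObject;
import java.util.ArrayList;
import java.util.List;

public class RmiSketch {
    // Serializable record, so the whole list can travel back in a single RMI reply.
    public static final class FileInfo implements Serializable {
        private static final long serialVersionUID = 1L;
        public final String absPath;
        public final long length;
        public final long lastModified;

        public FileInfo(String absPath, long length, long lastModified) {
            this.absPath = absPath;
            this.length = length;
            this.lastModified = lastModified;
        }
    }

    // Hypothetical remote interface: one call returns details for a whole tree.
    public interface FileInfoService extends Remote {
        List<FileInfo> gather(String rootPath) throws RemoteException;
    }

    // Runs on the machine that owns the files, so length()/lastModified() are local disk reads.
    public static final class FileInfoServiceImpl implements FileInfoService {
        @Override
        public List<FileInfo> gather(String rootPath) {
            List<FileInfo> out = new ArrayList<FileInfo>();
            walk(new File(rootPath), out);
            return out;
        }

        private static void walk(File dir, List<FileInfo> out) {
            File[] entries = dir.listFiles();
            if (entries == null) {
                return; // not a directory, or unreadable
            }
            for (File f : entries) {
                if (f.isDirectory()) {
                    walk(f, out);
                } else {
                    out.add(new FileInfo(f.getAbsolutePath(), f.length(), f.lastModified()));
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Advertise the loopback address so this single-JVM demo connects locally.
        System.setProperty("java.rmi.server.hostname", "127.0.0.1");
        FileInfoServiceImpl impl = new FileInfoServiceImpl();
        // Port 0 means an anonymous port; the returned stub makes real RMI calls over TCP.
        FileInfoService stub = (FileInfoService) UnicastRemoteObject.exportObject(impl, 0);
        try {
            List<FileInfo> files = stub.gather(args.length > 0 ? args[0] : ".");
            System.out.println(files.size() + " files returned in one RMI call");
        } finally {
            UnicastRemoteObject.unexportObject(impl, true);
        }
    }
}
```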
