Problem when switching root files: max_file_size?

Dear ROOTers

In my program I have an option that allows a user to save temporary trees in a temporary file instead of the main file.

When testing my program I have never seen any problem with this approach. However, now a user, who was using my R package
with many huge files resulting in file sizes > 2GB, reported the following crash:

> data.rma <- rma(data.exon,"MixRMAMetacorePS",tmpdir=tmpdir,background="antigenomic",
                 normalize=T,option="probeset",exonlevel="core")
Creating new temporary file <.../tmp_bgrd_310151_rbg.root> for <rma>...
Creating new temporary file <.../tmp_rkq_cqu.root> for <quantile>...
Creating new temporary file <.../tmp_expr_310151_mdp.root> for <medianpolish>...
Creating new file <.../MixRMAMetacorePS.root>...
Opening file <.../Scheme_HuEx10stv2r2_na25.root> in <READ> mode...
Opening file <.../HuTissuesExon_cel.root> in <READ> mode...
Preprocessing data using method <preprocess>...
   Background correcting raw data...
      calculating background for <A01.cel>...

...

Fill: Switching to new file: .../tmp_bgrd_310151_rbg_1.root
      calculating background for <F02.cel>...
      setting selector mask for typepm <9216>

 *** Break *** illegal instruction
Using host libthread_db library "/lib64/tls/libthread_db.so.1".
Attaching to program: /proc/6480/exe, process 6480
[Thread debugging using libthread_db enabled]
[New Thread 182896897472 (LWP 6480)]
0x000000325498f9c4 in waitpid () from /lib64/tls/libc.so.6
#1  0x0000003254939bbf in do_system () from /lib64/tls/libc.so.6
#2  0x0000002aab50abfd in TUnixSystem::StackTrace ()
   from /root/Desktop/root/lib/libCore.so
#3  0x0000002aab50797a in TUnixSystem::DispatchSignals ()
   from /root/Desktop/root/lib/libCore.so
#4  <signal handler called>
#5  0x000000000a97c729 in ?? ()
#6  0x0000007fbfffba68 in ?? ()
#7  0x0000002aaaecc28b in XGCProcesSet::AdjustBackground ()
   from /usr/local/lib64/R/library/xps/libs/xps.so
#8  0x0000002aaaeac222 in XGCProcesSet::Preprocess ()
   from /usr/local/lib64/R/library/xps/libs/xps.so
#9  0x0000002aaaea8419 in XPreProcessManager::Preprocess ()
   from /usr/local/lib64/R/library/xps/libs/xps.so
#10 0x0000002aaaeeaf8f in PreprocessRMA ()
   from /usr/local/lib64/R/library/xps/libs/xps.so
#11 0x0000000000513e69 in do_dotCode (call=0x4399f88, op=0x7f4c78, 
    args=0x1a58e28, env=Variable "env" is not available.
) at dotcode.c:1774
#12 0x0000000000536a03 in Rf_eval (e=0x4399f88, rho=0xda3450) at eval.c:489
#13 0x0000000000537c52 in Rf_DispatchOrEval (call=0x4399ff8, op=0x813818, 
    generic=0x600c9a "$", args=0x1a58c68, rho=0xda3450, ans=0x7fbfffcc20, 
    dropmissing=0, argsevald=0) at eval.c:1862
#14 0x000000000049143f in do_subset3 (call=0x4399ff8, op=0x813818, 
    args=0x1a58c68, env=0xda3450) at subset.c:981
#15 0x00000000005368c7 in Rf_eval (e=0x4399ff8, rho=0xda3450) at eval.c:463
#16 0x00000000005386b6 in do_set (call=0x439a0a0, op=0x812d48,
    args=0x439a068, rho=0xda3450) at eval.c:1420
#17 0x00000000005368c7 in Rf_eval (e=0x439a0a0, rho=0xda3450) at eval.c:463
#18 0x000000000053873c in do_begin (call=0x43961c0, op=0x813ab8, 
    args=0x439a0d8, rho=0xda3450) at eval.c:1172
#19 0x00000000005368c7 in Rf_eval (e=0x43961c0, rho=0xda3450) at eval.c:463
#20 0x0000000000539df4 in Rf_applyClosure (call=0x439d520, op=0x4396498, 
    arglist=0xdaeb08, rho=0xdd4988, suppliedenv=0x835388) at eval.c:669
#21 0x00000000005367d7 in Rf_eval (e=0x439d520, rho=0xdd4988) at eval.c:507
#22 0x000000000053873c in do_begin (call=0x4395648, op=0x813ab8, 
    args=0x439d558, rho=0xdd4988) at eval.c:1172
#23 0x00000000005368c7 in Rf_eval (e=0x4395648, rho=0xdd4988) at eval.c:463
#24 0x0000000000539765 in R_execClosure (call=0xb79310, op=0x4390a98, 
    arglist=0xb22ed0, rho=0xb5db90, newrho=0xdd4988) at eval.c:754
#25 0x0000000000539a71 in R_execMethod (op=0x4390a98, rho=0xae42b0)
    at eval.c:857
#26 0x0000002a988ce3f9 in R_dispatchGeneric (fname=0x43b2e58, ev=0xae42b0, 
    fdef=0xae4358) at methods_list_dispatch.c:905
#27 0x0000000000425966 in do_standardGeneric (call=Variable "call" is not available.
) at objects.c:965
#28 0x0000000000536a97 in Rf_eval (e=0x43b22c8, rho=0xae42b0) at eval.c:492
#29 0x0000000000539df4 in Rf_applyClosure (call=0xb79310, op=0x438ce88, 
    arglist=0xb22ed0, rho=0xb5db90, suppliedenv=0x835388) at eval.c:669
#30 0x00000000005367d7 in Rf_eval (e=0xb79310, rho=0xb5db90) at eval.c:507
#31 0x00000000005386b6 in do_set (call=0xb79268, op=0x812d48, args=0xb792a0,
    rho=0xb5db90) at eval.c:1420
#32 0x00000000005368c7 in Rf_eval (e=0xb79268, rho=0xb5db90) at eval.c:463
#33 0x000000000053873c in do_begin (call=0xb791f8, op=0x813ab8,
    args=0xb79230, rho=0xb5db90) at eval.c:1172
#34 0x00000000005368c7 in Rf_eval (e=0xb791f8, rho=0xb5db90) at eval.c:463
#35 0x00000000005368c7 in Rf_eval (e=0xc330b0, rho=0xb5db90) at eval.c:463
#36 0x000000000053873c in do_begin (call=0xc33040, op=0x813ab8,
    args=0xc33078, rho=0xb5db90) at eval.c:1172
#37 0x00000000005368c7 in Rf_eval (e=0xc33040, rho=0xb5db90) at eval.c:463
#38 0x0000000000539df4 in Rf_applyClosure (call=0xc49938, op=0xc49ba0, 
    arglist=0xb75480, rho=0x835350, suppliedenv=0x835388) at eval.c:669
#39 0x00000000005367d7 in Rf_eval (e=0xc49938, rho=0x835350) at eval.c:507
#40 0x00000000005386b6 in do_set (call=0xc499e0, op=0x812d48, args=0xc499a8,
    rho=0x835350) at eval.c:1420
#41 0x00000000005368c7 in Rf_eval (e=0xc499e0, rho=0x835350) at eval.c:463
#42 0x0000000000413f24 in Rf_ReplIteration (rho=0x835350, savestack=0, 
    browselevel=0, state=0x7fbfffed70) at main.c:257
#43 0x0000000000414058 in R_ReplConsole (rho=0x835350, savestack=0, 
    browselevel=0) at main.c:306
#44 0x0000000000414302 in run_Rmainloop () at main.c:967
#45 0x0000000000412428 in main (ac=Variable "ac" is not available.
) at Rmain.c:35

As you can see, the crash occurred in method AdjustBackground() after switching to a new temporary file.
The relevant part of method XGCProcesSet::AdjustBackground() may be:

Int_t XGCProcesSet::AdjustBackground(Int_t numdata, TTree **datatree, Int_t &numbgrd, TTree **bgrdtree)
{
   TDirectory *savedir = gDirectory;
   TFile      *tmpfile = fBackgrounder->GetFile();   // fetched once, before the loop

   for (Int_t k=0; k<numdata; k++) {

      // Change directory: use the temporary file if there is one, else the main file
      if (tmpfile != 0) tmpfile->cd();
      else if (!fFile->cd(fName)) return errGetDir;

      // Create the background tree with a single branch of XBgCell objects
      bgrdtree[k] = new TTree(bgrdname, fSchemeName);
      XBgCell *bgcell = new XBgCell();
      bgrdtree[k]->Branch("BgrdBranch", "XBgCell", &bgcell, 64000, split);

      // (excerpt: the code between creating and writing the tree is not shown)

      // Write background tree to file
      WriteTree(bgrdtree[k], TObject::kOverwrite);
   }
}

Interestingly, the user reported that he could run my program without a crash when he did not use the temporary-file option
but saved all trees in the main file, even though the main file was then split into sub-files.

1, Do you have any ideas what might be the reason for this crash?
Could changing to ‘tmpfile->cd()’ in the for-loop be the reason?

2, How could I test this situation?

3, For testing purposes I would like to decrease the maximum file size to 100kB only. Is this possible, and how?
Would this reflect the real situation (since the trees would probably remain in RAM)?

Best regards
Christian

see : root.cern.ch/root/html/TTree.htm … ChangeFile

Rene

Dear Rene

Thank you for this link.

What I do not understand is this: the link says "The file should not contain sub-directories".
Although my tmp-file does not contain sub-directories, the trees in the main file are stored in a sub-directory.
Why does the user have no problems when storing all trees in the main directory only?

Best regards
Christian

To benefit from the automatic file switching you can have only one Tree per file and no sub-directories.
To disable the automatic switch, call TTree::SetMaxTreeSize with a sufficiently large value.
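
As a rough sketch of how such a limit could be prepared: the value is a byte count held in a 64-bit integer (Long64_t in ROOT), so large limits are best computed in 64-bit arithmetic. The helper names and the two example values below (100 kB to provoke switching in a test, 20 GB to avoid it) are only illustrative, and the ROOT call itself appears as a comment so that the snippet compiles without ROOT.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Stand-in for ROOT's Long64_t so that this sketch compiles without ROOT.
using Long64 = std::int64_t;

// Hypothetical helpers (decimal units). The multiplication is done in
// 64-bit arithmetic; the same product in plain 32-bit int would overflow.
constexpr Long64 Kilobytes(Long64 n) { return n * 1000; }
constexpr Long64 Gigabytes(Long64 n) { return n * 1000 * 1000 * 1000; }

// A small limit to provoke file switching in a test run ...
constexpr Long64 kTestLimit = Kilobytes(100);
// ... and a large one to keep everything in a single file.
constexpr Long64 kLargeLimit = Gigabytes(20);

static_assert(kLargeLimit > std::numeric_limits<std::int32_t>::max(),
              "the large limit does not fit into a 32-bit integer");

// With ROOT available, the limit would be set once at start-up, before
// any tree is filled (the method is static):
//    TTree::SetMaxTreeSize(kLargeLimit);
```

Whether a very small limit reproduces the behaviour seen with multi-GB files is a separate question that this sketch does not answer.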

Rene

Dear Rene

This is really bad news.

For the moment I will set SetMaxTreeSize(20 GB) but this is no solution.
I am not sure if a 32-bit OS can handle such a file, and 20 GB may not be enough!

Since SetMaxTreeSize() is a static method I assume that I need to set it only once in my program.
Is this correct?

Using only one tree per file is also no solution, since it will clutter people’s directories with many hundreds of files.

Thus, ultimately I need to implement file switching manually:
1, Is this in principle possible, even when I am using sub-directories?
2, Is there some demo code which I could modify for my purposes?

Best regards
Christian

[quote]This is really bad news.
[/quote]Why?

[quote]For the moment I will set SetMaxTreeSize(20 GB) but this is no solution.
I am not sure if a 32-bit OS can handle such a file, and 20 GB may not be enough!
[/quote]Unless you use an antique, prehistoric OS you should not have any problem creating a 20 GB file.

[quote]Since SetMaxTreeSize() is a static method I assume that I need to set it only once in my program.
Is this correct?
[/quote]Yes, see example in $ROOTSYS/test/MainEvent.cxx

[quote]Using only one tree per file is also no solution, since it will clutter people’s directories with many hundreds of files.
[/quote]I am lost! It should be the opposite.

[quote]Thus, ultimately I need to implement file switching manually:
1, Is this in principle possible, even when I am using sub-directories?
2, Is there some demo code which I could modify for my purposes?
[/quote]There is a contradiction in your request. Having one single file should remove the problem.
If you want to do the file switching yourself, you have to monitor the file size at regular intervals (see the TFile functions) and decide yourself when to create a new file with the directory structure that you created at startup time, i.e. call your initialisation routine again.
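
The control flow described above (check the size, open the next file, rerun the initialisation) can be sketched without ROOT as follows. This is only an illustration using plain std::ofstream: RollingWriter, the ".dat" extension and the init callback are hypothetical names, and the "_1", "_2" suffixes merely imitate the naming seen in the log above. In a ROOT program the size check would presumably use one of the size accessors of TFile, and the callback would recreate the sub-directories in the new file.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <functional>
#include <string>
#include <utility>

// Hypothetical helper, not part of ROOT or xps: writes records to
// <base>.dat and moves on to <base>_1.dat, <base>_2.dat, ... whenever the
// next record would push the current file past maxBytes.
class RollingWriter {
public:
    using InitFn = std::function<void(std::ostream&)>;

    RollingWriter(std::string base, std::uintmax_t maxBytes, InitFn init)
        : fBase(std::move(base)), fMaxBytes(maxBytes), fInit(std::move(init)) {
        Open();
    }

    void Write(const std::string& record) {
        // Monitor the size before each write; never switch on a file that
        // holds no records yet, so an oversized record still gets written.
        if (fRecords > 0 && fBytes + record.size() > fMaxBytes) {
            ++fIndex;
            Open();
        }
        fOut << record;
        fBytes += record.size();
        ++fRecords;
    }

    int FileCount() const { return fIndex + 1; }
    std::string CurrentName() const { return Name(fIndex); }

private:
    std::string Name(int i) const {
        return i == 0 ? fBase + ".dat"
                      : fBase + "_" + std::to_string(i) + ".dat";
    }

    // Close the current file (if any), open the next one and rerun the
    // initialisation routine so that the new file gets the same structure.
    void Open() {
        if (fOut.is_open()) fOut.close();
        fOut.open(Name(fIndex), std::ios::binary | std::ios::trunc);
        fInit(fOut);
        fBytes = static_cast<std::uintmax_t>(fOut.tellp());
        fRecords = 0;
    }

    std::string    fBase;
    std::uintmax_t fMaxBytes;
    InitFn         fInit;
    std::ofstream  fOut;
    std::uintmax_t fBytes = 0;
    int            fRecords = 0;
    int            fIndex = 0;
};
```

Because the writer always goes through its own current stream, no caller keeps a handle to a file that has already been closed; any pointer or handle obtained before a switch would have to be refreshed afterwards.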

Rene

Dear Rene

You are right, I should have said:
Although I would prefer that all trees are stored in one file only, many people still use FAT32, which does
not allow file sizes larger than about 4 GB.

Ideally, I would prefer to implement my program in a way which allows the user to choose either option.

I do not quite understand why one tree per file should be a good solution for my purposes:
When I have the data of 400 DNA chips, I need to import them into 400 trees, and one tree per file
would mean saving 400 files. Creating only one tree with 400 branches is not an option for me.

Best regards
Christian