Hi experts,
As a follow-up to the recent CMS open data tutorial, I've been experimenting on the Google Cloud Platform to get a grasp of possible modes of access to open data for open data users.
I’ve observed the following behaviour when running CMSSW analysis jobs on CMS AOD files (reading through xrootd from CERN):
These are 10 AOD files with a total size of 40 GB and 163k events, and processing them would be a normal first step of an analysis. I believe some dependence of the job time on the cluster location is expected, but it is quite strong.
For comparison, in the next step, i.e. running an analysis in plain ROOT on the output data of the first step (again reading the input through xrootd from CERN), I see the following:
This is a single ROOT file of 16 GB with 51M events.
As you can see, the dependence on the cluster's distance from CERN is four times stronger for CMSSW jobs reading AOD ROOT files than for ROOT jobs reading plain ROOT files.
I'm curious to know why this happens. Do you have any ideas?
Best, Kati
CMSSW job:
- input data from http://opendata.cern.ch/record/6029
- code from http://opendata.cern.ch/record/12340
- container from https://hub.docker.com/r/cmsopendata/cmssw_5_3_32
ROOT job:
- input data from http://opendata.cern.ch/record/12359
- code from the skim in http://opendata.cern.ch/record/12350
- container from rootproject/root-conda:6.18.04