strange problem in cclm

Added by Iya Belova over 2 years ago

Dear colleagues,

We have a problem with cclm: the program terminates suddenly for no clear reason.
The int2lm script works correctly and all of its output files appear.
We tried installing both the starter package and the normal versions of int2lm and cclm, but the problem is the same.
The tests from the starter package work fine.
We also tried changing the parameters in the GRIBIN section, but it didn’t help.
I’ve attached all the files that we think may be useful for understanding the problem.
Could you please help us?

Kind regards,
Iya Belova

P.S. Because the Fortran compiler on our cluster is an old version, we had to change lines from READ (nuin, inictl, IOSTAT=iz_err, IOMSG=iomsg_str) to READ (nuin, inictl, IOSTAT=iz_err) in some of the cclm install files.


Replies (7)

RE: strange problem in cclm - Added by Hans-Juergen Panitz over 2 years ago

Dear Iya

Actually, I have no explanation for the error.
At first glance, it seems to be a system (MPI?) error.
But who knows.
Nevertheless, here are a few comments on your setup:

1. The time step dt: you are using dt=120 (s) together with a spatial resolution of about 12 km.
dt=120 is, in my opinion, much too large. There is a high risk of violating the CFL criterion.
I would use dt=75.

2. If I understand your setup correctly, you want to perform a 30-day simulation, starting at 2009120100 and ending at 2009123100.
That is a simulation duration of 720 hours (30 days * 24 hours/day), which should be the value of the namelist parameter “hstop”.
But you are using “hstop=30*720” (see your cclm setup).
This could be corrected in your setup file by defining
NHOURS=24
instead of
NHOURS=720

3. The triple of values for the namelist parameter “nhour_restart” should be
nhour_restart=120,$HSTOP,120
and not
nhour_restart=0,$HSTOP,120
where the values are given in hours.
However, this mistake (the first value of the triple) is corrected by CCLM itself (see cclm.exe.out).

4. Can someone else comment on Iya’s choice of tuning parameters (see cclm.exe.out and YUSPECIF)? They seem to be rather “extreme”.

Best regards
Hans-Juergen
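
Items 1 and 2 above can be checked with a few lines of arithmetic. The following is only a rough sketch in Python: the function names are hypothetical, the 100 m/s maximum wind speed is an assumed illustrative value, and the one-dimensional advective Courant number used here is a simplification, not the stability criterion actually implemented in CCLM.

```python
from datetime import datetime


def simulation_hours(start: str, end: str) -> int:
    """Return the number of hours between two YYYYMMDDHH date strings."""
    fmt = "%Y%m%d%H"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 3600)


def courant_number(wind_speed_ms: float, dt_s: float, dx_m: float) -> float:
    """Simplified 1-D advective Courant number C = u * dt / dx."""
    return wind_speed_ms * dt_s / dx_m


# Item 2: duration of the run quoted in the thread.
hstop = simulation_hours("2009120100", "2009123100")
print("hstop should be", hstop, "hours")           # 30 days * 24 h
print("hstop=30*720 gives", 30 * 720, "hours")     # 21600 h, i.e. 900 days

# Item 1: compare dt=120 s and dt=75 s on a grid of about 12 km.
dx = 12_000.0      # grid spacing in metres (about 12 km, as stated above)
max_wind = 100.0   # ASSUMPTION: illustrative strong upper-level wind in m/s
for dt in (120.0, 75.0):
    print(f"dt={dt:.0f} s -> Courant number {courant_number(max_wind, dt, dx):.3f}")
```

Under these assumptions dt=120 puts the Courant number right at 1, while dt=75 leaves a margin; the real limit depends on the numerical scheme and on the actual wind speeds in the domain.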

RE: strange problem in cclm - Added by Burkhardt Rockel over 2 years ago

Regarding Hans-Jürgen’s item 4:
Iya, can you use the tuning parameters as in the starter package script and test your job?

RE: strange problem in cclm - Added by Iya Belova over 2 years ago

Thank you for your answers.
This problem really does look like an MPI error. We had almost the same problem some months ago (you can find that discussion in the Starter Package Support forum).
We made all the changes you suggested in your answers, but it didn’t help. I’ve attached the new .out file just in case, but it seems that nothing has really changed there.

P.S. In the starter package cclm script we had the following tuning parameters:
” wichfakt=0.,
tur_len=500.,
v0snow=20.,
tkhmin=0.35,
tkmmin=1.,
rlam_heat=0.5249,
mu_rain=0.5,
entr_sc=0.0002,
uc1=0.0626,
fac_rootdp2=0.9000,
soilhyd=1.6200”
We also tried to start without any tuning parameters at all, but it gave the same result. Maybe there is some “soft” (more conservative) configuration that we could try?

P.P.S. There are multiple “src_input: check completeness of input data” lines in the .out file. Maybe this tells us something about the source of the problem?

RE: strange problem in cclm - Added by Burkhardt Rockel over 2 years ago

Since you wrote that the starter package tests work, you may try to make your changes step by step from the starter package settings to your requested settings.

The multiple “src_input: check completeness of input data” lines in the .out file appear because every processor writes this message. This can be suppressed by changing the line

PRINT *, ' src_input: check completeness of input data'
to
IF (my_cart_id == 0) PRINT *, ' src_input: check completeness of input data'
Then only processor 0 writes the output.

RE: strange problem in cclm - Added by Hans-Juergen Panitz over 2 years ago

And a further suggestion, to find out whether there really is an MPI problem on your system:

Since the error already occurs at the very beginning of your simulation,
try to run it using only one process: nprocx=1, nprocy=1

If the error no longer occurs, then I would say it is an MPI/system problem.

Hans-Juergen

RE: strange problem in cclm - Added by Iya Belova over 2 years ago

Dear Hans-Juergen,

I mentioned before that we had a problem with the starter package: we were not able to start the program in single-processor mode. When we try to do so, the script fails even earlier, while reading the ncdf files (both int2lm and cclm).

RE: strange problem in cclm - Added by Iya Belova over 2 years ago

Dear colleagues,

Thank you for your advice.
The problem was that our cluster is too weak for the chosen LM grid.
We made a new grid with the following options, and now everything works:
startlat_tot = -7.6, startlon_tot = -7.6,
pollat = 34.3, pollon = -142.5,
dlon=0.152, dlat=0.152,
ie_tot=100, je_tot=100, ke_tot=40,

Kind regards,
Iya Belova
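
For readers who want a feel for the size of the grid that finally worked, here is a rough sketch in Python. The namelist values are copied from the post above; the factor of about 111.2 km per degree is an assumption (spherical Earth, grid near the rotated equator), and the variable names are only illustrative. The thread does not give the dimensions of the original, larger grid, so no comparison is attempted.

```python
# Values taken from the working namelist quoted in the thread.
ie_tot, je_tot, ke_tot = 100, 100, 40
dlon = dlat = 0.152                    # grid spacing in rotated degrees
startlat_tot = startlon_tot = -7.6     # lower-left corner, rotated coordinates

# ASSUMPTION: about 111.2 km per degree on a spherical Earth (radius ~6371 km).
KM_PER_DEGREE = 111.2

total_points = ie_tot * je_tot * ke_tot
spacing_km = dlat * KM_PER_DEGREE
# A grid with n points along one direction spans (n - 1) intervals.
endlat_tot = startlat_tot + (je_tot - 1) * dlat
endlon_tot = startlon_tot + (ie_tot - 1) * dlon

print("total grid points:", total_points)
print(f"approximate grid spacing: {spacing_km:.1f} km")
print(f"rotated latitude range:  {startlat_tot:.3f} to {endlat_tot:.3f}")
print(f"rotated longitude range: {startlon_tot:.3f} to {endlon_tot:.3f}")
```

Memory use and run time scale roughly with the total number of grid points, which is why shrinking ie_tot, je_tot or ke_tot can help on a small cluster.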
