Error messages

From Gerris

Revision as of 21:16, 1 December 2011
EmilyMLane

Revision as of 02:22, 9 December 2011
EmilyMLane
(Nicholson Canyon viscous serial run)

Emily's Favourite Error Messages


This page is a place for storing error messages so I can compare and contrast.


Nicholson Canyon viscous3D parallel runs

I get the same message on both turbine and the new modelling computer and it occurs at the same position:


Modelling computer:

   step:       1 t:      0.50000000 dt:  5.000000e-01 cpu:     31.87250000 real:     34.53414900
   MAC projection        before     after       rate
   niter:    7
   residual.bias:   -7.007e-03 -5.369e-05
   residual.first:   7.016e-03  5.373e-05      2
   residual.second:  1.461e-02  7.277e-05    2.1
   residual.infty:   3.525e-02  7.272e-04    1.7
   Approximate projection
   niter:    3
   residual.bias:   -1.151e-04 -5.477e-05
   residual.first:   1.547e-04  5.481e-05    1.4
   residual.second:  5.344e-04  7.187e-05      2
   residual.infty:   2.145e-02  7.902e-04      3
   [NIWA-36410:06625] *** Process received signal ***
   [NIWA-36410:06625] Signal: Segmentation fault (11)
   [NIWA-36410:06625] Signal code: Address not mapped (1)
   [NIWA-36410:06625] Failing at address: 0x8
   [NIWA-36410:06625] [ 0] /lib64/libpthread.so.0(+0xf2d0) [0x7ffb7880b2d0]
   [NIWA-36410:06625] [ 1] /usr/lib64/libgfs3D-1.3.so.2(+0x4728e) [0x7ffb794a628e]
   [NIWA-36410:06625] [ 2] /usr/lib64/libgfs3D-1.3.so.2(+0x152a9) [0x7ffb794742a9]
   [NIWA-36410:06625] *** End of error message ***
   mpirun noticed that job rank 0 with PID 6619 on node NIWA-36410 exited on signal 15 (Terminated).
   7 additional processes aborted (not shown)

Turbine:

   step:       1 t:      0.50000000 dt:  5.000000e-01 cpu:     54.26500000 real:     57.84363200
   MAC projection        before     after       rate
   niter:    7
   residual.bias:   -7.007e-03 -5.369e-05
   residual.first:   7.016e-03  5.373e-05      2
   residual.second:  1.461e-02  7.277e-05    2.1
   residual.infty:   3.525e-02  7.272e-04    1.7
   Approximate projection
   niter:    3
   residual.bias:   -1.151e-04 -5.477e-05
   residual.first:   1.546e-04  5.481e-05    1.4
   residual.second:  5.338e-04  7.187e-05      2
   residual.infty:   2.141e-02  7.902e-04      3
   [rotor03:20387] *** Process received signal ***
   [rotor03:20387] Signal: Segmentation fault (11) 
   [rotor03:20387] Signal code: Address not mapped (1) 
   [rotor03:20387] Failing at address: 0x8 
   [rotor03:20387] [ 0] /lib64/libpthread.so.0 [0x7f64b72c0c00]
   [rotor03:20387] [ 1] /usr/lib64/libgfs3D-1.3.so.2 [0x7f64b8c0944f]
   [rotor03:20387] [ 2] /usr/lib64/libgfs3D-1.3.so.2 [0x7f64b8bdc609]
   [rotor03:20387] *** End of error message ***
   mpirun noticed that job rank 0 with PID 20381 on node rotor03 exited on signal 15 (Terminated).
   7 additional processes aborted (not shown)

The same code runs in serial. I also tried with 2 and 4 CPUs, and it failed with the same error message (although at t=1 for 4 nodes and before writing any output for 2 nodes). The failure seemed to be consistent each time.
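Since the point of this page is to compare messages, here is a minimal Python sketch that pulls the host-independent fields out of the two signal headers above and checks that they agree. The function name `signal_summary` and the regular expression are illustrative assumptions, not part of Gerris or Open MPI; the input lines are copied from the two reports.

```python
import re

# Signal headers copied from the two reports above.
MODELLING = """\
[NIWA-36410:06625] *** Process received signal ***
[NIWA-36410:06625] Signal: Segmentation fault (11)
[NIWA-36410:06625] Signal code: Address not mapped (1)
[NIWA-36410:06625] Failing at address: 0x8
"""
TURBINE = """\
[rotor03:20387] *** Process received signal ***
[rotor03:20387] Signal: Segmentation fault (11)
[rotor03:20387] Signal code: Address not mapped (1)
[rotor03:20387] Failing at address: 0x8
"""

# Matches "[host:pid] <field>: <value>" and ignores the host/pid prefix.
FIELD = re.compile(
    r"^\[[^\]]+\]\s+(Signal|Signal code|Failing at address):\s*(.+?)\s*$"
)


def signal_summary(report):
    """Return the signal, signal code and failing address of one report."""
    summary = {}
    for line in report.splitlines():
        match = FIELD.match(line.strip())
        if match:
            summary[match.group(1)] = match.group(2)
    return summary


modelling = signal_summary(MODELLING)
turbine = signal_summary(TURBINE)
print("same signal header on both machines:", modelling == turbine)
print("failing address:", modelling["Failing at address"])
```

Both machines report the same signal, signal code and failing address. A failing address as small as 0x8 is often what a read through a null pointer looks like, but that is only a guess from the address and is not confirmed by anything on this page.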


Gerris Debugging

When debugging Gerris after the fact (i.e. doing the following, as Stephane suggested):

% ulimit -c unlimited
% <rerun, crash should display "(core dumped)">
% ls core*
% gdb gerris3D core
gdb> where

I get:

#0  0x00007fd8198c228e in match_periodic_bc ()
   from /usr/lib64/libgfs3D-1.3.so.2
#1  0x00007fd8198902a9 in cell_traverse_boundary_leafs ()
   from /usr/lib64/libgfs3D-1.3.so.2
#2  0x00007fd8198902a9 in cell_traverse_boundary_leafs ()
   from /usr/lib64/libgfs3D-1.3.so.2
#3  0x00007fd8198902a9 in cell_traverse_boundary_leafs ()
   from /usr/lib64/libgfs3D-1.3.so.2
#4  0x00007fd8198902a9 in cell_traverse_boundary_leafs ()
   from /usr/lib64/libgfs3D-1.3.so.2
#5  0x00007fd8198902a9 in cell_traverse_boundary_leafs ()
   from /usr/lib64/libgfs3D-1.3.so.2
#6  0x00007fd8198902a9 in cell_traverse_boundary_leafs ()
   from /usr/lib64/libgfs3D-1.3.so.2
#7  0x00007fd8198c5faf in match_box_bc () from /usr/lib64/libgfs3D-1.3.so.2
#8  0x00007fd8198c8db3 in domain_foreach () from /usr/lib64/libgfs3D-1.3.so.2
#9  0x00007fd8198cde21 in gfs_domain_tag_droplets ()
   from /usr/lib64/libgfs3D-1.3.so.2
#10 0x00007fd8198ce0ae in gfs_domain_remove_droplets ()
   from /usr/lib64/libgfs3D-1.3.so.2
#11 0x00007fd8198d6b59 in gfs_remove_droplets_event ()
   from /usr/lib64/libgfs3D-1.3.so.2
#12 0x00007fd8198d2825 in gfs_event_do () from /usr/lib64/libgfs3D-1.3.so.2
#13 0x00007fd81965fcb7 in slist_container_foreach ()
   from /usr/lib64/libgts-0.7.so.5
#14 0x00007fd8198da97f in simulation_run () from /usr/lib64/libgfs3D-1.3.so.2
#15 0x00007fd8198dc2a8 in gfs_simulation_run ()
   from /usr/lib64/libgfs3D-1.3.so.2
#16 0x0000000000402d3f in main ()

Removing the remove droplets command (the gfs_remove_droplets_event in the trace above) seems to fix the problem.
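The two unnamed libgfs3D frames in the modelling-computer report can be tied to the named frames in the gdb trace by comparing offsets. The sketch below is plain address arithmetic in Python: it subtracts the `(+0x...)` offsets printed in the MPI report from the absolute addresses in both traces and checks that each trace yields a single, page-aligned load address. It assumes the same build of libgfs3D-1.3.so.2 was loaded in both runs; the helper name `load_base` is made up for this illustration.

```python
def load_base(absolute, offset):
    """Address the library must be mapped at for `absolute` to sit at `offset`."""
    return absolute - offset


# Offsets printed as (+0x...) in frames [1] and [2] of the modelling-computer report.
offsets = [0x4728E, 0x152A9]

# Absolute addresses of the same two frames in that report.
mpi_addresses = [0x7FFB794A628E, 0x7FFB794742A9]

# Absolute addresses of gdb frames #0 and #1, with their symbol names.
gdb_frames = [
    ("match_periodic_bc", 0x7FD8198C228E),
    ("cell_traverse_boundary_leafs", 0x7FD8198902A9),
]

mpi_bases = {load_base(a, o) for a, o in zip(mpi_addresses, offsets)}
gdb_bases = {load_base(a, o) for (_, a), o in zip(gdb_frames, offsets)}

# One base per trace means the offsets are consistent with the gdb symbols.
print("MPI report base:", [hex(b) for b in mpi_bases])
print("gdb trace base: ", [hex(b) for b in gdb_bases])
for (name, _), offset in zip(gdb_frames, offsets):
    print(f"+{offset:#x} -> {name}")
```

Each trace gives exactly one base address (0x7ffb7945f000 and 0x7fd81987b000), both multiples of 0x1000, which is consistent with frame [1] being match_periodic_bc and frame [2] being cell_traverse_boundary_leafs. The bases differ between runs, as expected when the library is mapped at a different address each time.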

Nicholson Canyon viscous serial run

I got the following error right at the end of a serial run for slide width 2500. It did everything except output the final sim file and print the final time.

 *** Process received signal ***
[NIWA-36410:06681] Signal: Floating point exception (8)
[NIWA-36410:06681] Signal code: Floating point divide-by-zero (3)
[NIWA-36410:06681] Failing at address: 0x7f4b0233b0c9
[NIWA-36410:06681] [ 0] /lib64/libpthread.so.0(+0xf2d0) [0x7f4b0ad172d0]
[NIWA-36410:06681] [ 1] /usr/lib64/gerris/libtopics3D.so(+0x60c9) [0x7f4b0233b0c9]
[NIWA-36410:06681] [ 2] /usr/lib64/libgfs3D-1.3.so.2(ftt_cell_traverse_condition+0xcc) [0x7f4b0b98318c]
[NIWA-36410:06681] [ 3] /usr/lib64/libgfs3D-1.3.so.2(ftt_cell_traverse_condition+0x126) [0x7f4b0b9831e6]
[NIWA-36410:06681] *** End of error message ***
./run_nicholson.moab: line 15:  6681 Floating point exception gerris3D nicholson_viscous.gfs


Pegasus Canyon viscous parallel run

#0  0x00007fb97083bab5 in raise () from /lib64/libc.so.6
#1  0x00007fb97083cf47 in abort () from /lib64/libc.so.6
#2  0x00007fb9712d5b4a in g_logv () from /lib64/libglib-2.0.so.0
#3  0x00007fb9712d5bd3 in g_log () from /lib64/libglib-2.0.so.0
#4  0x00007fb9712d3c17 in g_realloc () from /lib64/libglib-2.0.so.0
#5  0x00007fb9712a0739 in ?? () from /lib64/libglib-2.0.so.0
#6  0x00007fb9712a0ab7 in g_array_append_vals () from /lib64/libglib-2.0.so.0
#7  0x00007fb971803f52 in gfs_output_location_read () from /usr/lib64/libgfs3D-1.3.so.2
#8  0x00007fb971839f04 in simulation_read () from /usr/lib64/libgfs3D-1.3.so.2
#9  0x00007fb9715c128d in gts_graph_read () from /usr/lib64/libgts-0.7.so.5
#10 0x00007fb9718287eb in gfs_domain_read () from /usr/lib64/libgfs3D-1.3.so.2
#11 0x00007fb971837910 in gfs_simulation_read () from /usr/lib64/libgfs3D-1.3.so.2
#12 0x000000000040207c in main ()

Parallel runs of solid block scenario

I have problems with running the solid block example too. Prior to dying I often (but not always) get the following warning:

Gfs-CRITICAL **: PE 0 (NIWA-36411): gfs_line_center: assertion `a > 0. && a < 1.' failed

The error message when it fails is:

[NIWA-36411:10302] *** Process received signal ***
[NIWA-36411:10302] Signal: Floating point exception (8)
[NIWA-36411:10302] Signal code: Invalid floating point operation (7)
[NIWA-36411:10302] Failing at address: 0x7ffbfa528ecd
[NIWA-36411:10303] *** Process received signal ***
[NIWA-36411:10303] Signal: Floating point exception (8)
[NIWA-36411:10303] Signal code: Invalid floating point operation (7)
[NIWA-36411:10303] Failing at address: 0x7f855b613ecd
[NIWA-36411:10302] [ 0] /lib64/libpthread.so.0(+0xf2d0) [0x7ffbf9ae72d0]
[NIWA-36411:10302] [ 1] /usr/lib64/libgts-0.7.so.5(triBoxOverlap+0xfd) [0x7ffbfa528ecd]
[NIWA-36411:10302] [ 2] /usr/lib64/libgts-0.7.so.5(gts_bbox_overlaps_triangle+0x122) [0x7ffbfa4f0452]
[NIWA-36411:10302] [ 3] /usr/lib64/libgfs3D-1.3.so.2(+0x8e5b2) [0x7ffbfa7c95b2]
[NIWA-36411:10302] [ 4] /lib64/libglib-2.0.so.0(g_hash_table_foreach+0x43) [0x7ffbfa21e0d3]
[NIWA-36411:10302] [ 5] /usr/lib64/libgts-0.7.so.5(gts_surface_foreach_face+0x35) [0x7ffbfa507215]
[NIWA-36411:10302] [ 6] /usr/lib64/libgfs3D-1.3.so.2(+0x9068e) [0x7ffbfa7cb68e]
[NIWA-36411:10302] [ 7] /usr/lib64/libgfs3D-1.3.so.2(+0x8f9a7) [0x7ffbfa7ca9a7]
[NIWA-36411:10302] [ 8] /usr/lib64/libgfs3D-1.3.so.2(gfs_cell_traverse_cut+0x1f) [0x7ffbfa7cac4f]
[NIWA-36411:10302] [ 9] /usr/lib64/libgfs3D-1.3.so.2(+0x4ddb3) [0x7ffbfa788db3]
[NIWA-36411:10302] [10] /usr/lib64/libgfs3D-1.3.so.2(gfs_domain_traverse_cut+0x38) [0x7ffbfa789728]
[NIWA-36411:10302] [11] /usr/lib64/libgfs3D-1.3.so.2(+0x9cb5b) [0x7ffbfa7d7b5b]
[NIWA-36411:10302] [12] /usr/lib64/libgfs3D-1.3.so.2(gfs_simulation_run+0x58) [0x7ffbfa79c2a8]
[NIWA-36411:10302] [13] gerris3D(main+0xecf) [0x402d3f]
[NIWA-36411:10302] [14] /lib64/libc.so.6(__libc_start_main+0xfd) [0x7ffbf9789bfd]
[NIWA-36411:10302] [15] gerris3D() [0x401d99]
[NIWA-36411:10302] *** End of error message ***
[NIWA-36411:10303] [ 0] /lib64/libpthread.so.0(+0xf2d0) [0x7f855abd22d0]
[NIWA-36411:10303] [ 1] /usr/lib64/libgts-0.7.so.5(triBoxOverlap+0xfd) [0x7f855b613ecd]
[NIWA-36411:10303] [ 2] /usr/lib64/libgts-0.7.so.5(gts_bbox_overlaps_triangle+0x122) [0x7f855b5db452]
[NIWA-36411:10303] [ 3] /usr/lib64/libgfs3D-1.3.so.2(+0x8e5b2) [0x7f855b8b45b2]
[NIWA-36411:10303] [ 4] /lib64/libglib-2.0.so.0(g_hash_table_foreach+0x43) [0x7f855b3090d3]
[NIWA-36411:10303] [ 5] /usr/lib64/libgts-0.7.so.5(gts_surface_foreach_face+0x35) [0x7f855b5f2215]
[NIWA-36411:10303] [ 6] /usr/lib64/libgfs3D-1.3.so.2(+0x9068e) [0x7f855b8b668e]
[NIWA-36411:10303] [ 7] /usr/lib64/libgfs3D-1.3.so.2(+0x8f9a7) [0x7f855b8b59a7]
[NIWA-36411:10303] [ 8] /usr/lib64/libgfs3D-1.3.so.2(gfs_cell_traverse_cut+0x1f) [0x7f855b8b5c4f]
[NIWA-36411:10303] [ 9] /usr/lib64/libgfs3D-1.3.so.2(+0x4ddb3) [0x7f855b873db3]
[NIWA-36411:10303] [10] /usr/lib64/libgfs3D-1.3.so.2(gfs_domain_traverse_cut+0x38) [0x7f855b874728]
[NIWA-36411:10303] [11] /usr/lib64/libgfs3D-1.3.so.2(+0x9cb5b) [0x7f855b8c2b5b]
[NIWA-36411:10303] [12] /usr/lib64/libgfs3D-1.3.so.2(gfs_simulation_run+0x58) [0x7f855b8872a8]
[NIWA-36411:10303] [13] gerris3D(main+0xecf) [0x402d3f]
[NIWA-36411:10303] [14] /lib64/libc.so.6(__libc_start_main+0xfd) [0x7f855a874bfd]
[NIWA-36411:10303] [15] gerris3D() [0x401d99]
[NIWA-36411:10303] *** End of error message ***
mpirun noticed that job rank 0 with PID 10302 on node NIWA-36411 exited on signal 8 (Floating point exception).
1 additional process aborted (not shown)
core


The following was generated using the debugging suggestion above. This was running on an 8-core modelling computer. Looking at the core dump using gdb I get:

#0  0x00007f330388fecd in triBoxOverlap () from /usr/lib64/libgts-0.7.so.5
#1  0x00007f3303857452 in gts_bbox_overlaps_triangle () from /usr/lib64/libgts-0.7.so.5
#2  0x00007f3303b305b2 in face_overlaps_box () from /usr/lib64/libgfs3D-1.3.so.2
#3  0x00007f33035850d3 in g_hash_table_foreach () from /lib64/libglib-2.0.so.0
#4  0x00007f330386e215 in gts_surface_foreach_face () from /usr/lib64/libgts-0.7.so.5
#5  0x00007f3303b3268e in cell_is_cut () from /usr/lib64/libgfs3D-1.3.so.2
#6  0x00007f3303b319a7 in cell_traverse_cut () from /usr/lib64/libgfs3D-1.3.so.2
#7  0x00007f3303b31c4f in gfs_cell_traverse_cut () from /usr/lib64/libgfs3D-1.3.so.2
#8  0x00007f3303aefdb3 in domain_foreach () from /usr/lib64/libgfs3D-1.3.so.2
#9  0x00007f3303af0728 in gfs_domain_traverse_cut () from /usr/lib64/libgfs3D-1.3.so.2
#10 0x00007f3303b3eb5b in simulation_moving_run () from /usr/lib64/libgfs3D-1.3.so.2
#11 0x00007f3303b032a8 in gfs_simulation_run () from /usr/lib64/libgfs3D-1.3.so.2
#12 0x0000000000402d3f in main ()


Actually, if I do a serial run of the simple version, I get the following error message and debug info:

#0  0x00007fe13609c0c9 in print_interface () from /usr/lib64/gerris/libtopics3D.so
#1  0x00007fe13dac518c in ftt_cell_traverse_condition () from /usr/lib64/libgfs3D-1.3.so.2
#2  0x00007fe13dac51e6 in ftt_cell_traverse_condition () from /usr/lib64/libgfs3D-1.3.so.2
#3  0x00007fe13dac51e6 in ftt_cell_traverse_condition () from /usr/lib64/libgfs3D-1.3.so.2
#4  0x00007fe13dac51e6 in ftt_cell_traverse_condition () from /usr/lib64/libgfs3D-1.3.so.2
#5  0x00007fe13daf56d7 in box_traverse_condition () from /usr/lib64/libgfs3D-1.3.so.2
#6  0x00007fe13dafadb3 in domain_foreach () from /usr/lib64/libgfs3D-1.3.so.2
#7  0x00007fe13dafb5fc in gfs_domain_cell_traverse_condition () from /usr/lib64/libgfs3D-1.3.so.2
#8  0x00007fe13609b9fd in gfs_output_interface_grid_event () from /usr/lib64/gerris/libtopics3D.so
#9  0x00007fe13db04825 in gfs_event_do () from /usr/lib64/libgfs3D-1.3.so.2
#10 0x00007fe13d891cb7 in slist_container_foreach () from /usr/lib64/libgts-0.7.so.5
#11 0x00007fe13db4999b in simulation_moving_run () from /usr/lib64/libgfs3D-1.3.so.2
#12 0x00007fe13db0e2a8 in gfs_simulation_run () from /usr/lib64/libgfs3D-1.3.so.2
#13 0x0000000000402d3f in main ()
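Several of the gdb traces on this page contain runs of the same recursive frame (ftt_cell_traverse_condition here, cell_traverse_boundary_leafs earlier), which makes them harder to compare by eye. The Python sketch below collapses consecutive identical frames into one entry with a count. The regular expression assumes the `#N  0xADDR in name ()` layout shown on this page, and the name `collapse` is just for illustration.

```python
import re
from itertools import groupby

# First six frames of the serial-run trace above.
TRACE = """\
#0  0x00007fe13609c0c9 in print_interface () from /usr/lib64/gerris/libtopics3D.so
#1  0x00007fe13dac518c in ftt_cell_traverse_condition () from /usr/lib64/libgfs3D-1.3.so.2
#2  0x00007fe13dac51e6 in ftt_cell_traverse_condition () from /usr/lib64/libgfs3D-1.3.so.2
#3  0x00007fe13dac51e6 in ftt_cell_traverse_condition () from /usr/lib64/libgfs3D-1.3.so.2
#4  0x00007fe13dac51e6 in ftt_cell_traverse_condition () from /usr/lib64/libgfs3D-1.3.so.2
#5  0x00007fe13daf56d7 in box_traverse_condition () from /usr/lib64/libgfs3D-1.3.so.2
"""

# "#N", an optional "0x... in ", then the function name followed by " ()".
FRAME = re.compile(r"^#\d+\s+(?:0x[0-9a-f]+ in )?(\S+) \(\)")


def collapse(trace):
    """Return [(function, repeat_count), ...] with consecutive repeats merged."""
    names = []
    for line in trace.splitlines():
        match = FRAME.match(line.strip())
        if match:  # wrapped "from /usr/lib64/..." continuation lines are skipped
            names.append(match.group(1))
    return [(name, len(list(group))) for name, group in groupby(names)]


collapsed = collapse(TRACE)
for name, count in collapsed:
    print(f"{name} x{count}" if count > 1 else name)
```

Comparing the collapsed lists rather than the raw traces makes it easier to see when two crashes go through the same call path even if the recursion depth differs.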