fix pgbinary window bug #954

Merged
VincentVanlaer merged 2 commits into main from
EbF/fix_pgbinary_window_bug
Apr 18, 2026

@Debraheem
Member

This branch attempts to address some pgbinary bugs raised by @mathren. @mathren, can you document here how to reproduce your initial issue?

@VincentVanlaer
Member

Either this is a placebo and not actually fixing anything, or I am seeing a very different issue (although I have the exact same stack trace as was shown on slack). The problem I am seeing is

  • Initial period is set to 1e99
  • History therefore contains 1e99
  • pgbinary reads the history but uses single-precision floats: 1e99 is cast to INF
  • Plot ylimit calculations break resulting in NaN y limits
  • Deep down in the pgplot code (the NaN just propagates), it casts the limits to integers
  • On my system (or any other Intel-based system), the result of the cast is a large negative number.
  • This is used to index an array of pixels, resulting in the segfault.
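
The first steps of this chain can be reproduced outside MESA. The following is a minimal Python sketch, not MESA or pgplot code: `to_single` is a hypothetical helper that mimics an IEEE double-to-single conversion, and the three-point history and the margin value are made up for illustration. It assumes the period column holds 1e99 at every model, so that the minimum and maximum are both infinite and their difference is NaN.

```python
import math
import struct


def to_single(x: float) -> float:
    """Round a double to IEEE single precision; out-of-range values become +/-inf."""
    try:
        return struct.unpack("f", struct.pack("f", x))[0]
    except OverflowError:
        # struct refuses to pack values beyond the single-precision range;
        # an IEEE conversion would overflow to infinity instead.
        return math.copysign(math.inf, x)


# Hypothetical history column: the period stays at 1e99 for every model.
period_history = [to_single(1e99) for _ in range(3)]

ymin = min(period_history)
ymax = max(period_history)
dy = ymax - ymin                  # inf - inf is NaN
ymargin = 0.1                     # illustrative margin, not a MESA default
ymin_plot = ymin - ymargin * dy   # NaN propagates into both limits
ymax_plot = ymax + ymargin * dy

print(period_history[0], dy, ymin_plot, ymax_plot)
```

Python raises an exception on `int(float("nan"))`, so the last steps (the integer cast and the out-of-bounds pixel index) are not reproduced here; in compiled code the result of that cast is platform dependent, as noted above.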

This patch does not fix that for me (it seems unlikely that it could). The following does however (this just guards the limit calculations against bad numbers):

diff --git a/star/private/pgstar_support.f90 b/star/private/pgstar_support.f90
index 232dada2b..48544ec06 100644
--- a/star/private/pgstar_support.f90
+++ b/star/private/pgstar_support.f90
@@ -854,17 +854,23 @@ contains
       if (use_given_ymin) then
          ymin = given_ymin
       else
-         ymin = minval(yvec(1:npts))
+         ymin = minval(yvec(1:npts), mask=.not. is_bad(yvec(1:npts)))
       end if

       use_given_ymax = abs(given_ymax + 101.0) > 1e-6
       if (use_given_ymax) then
          ymax = given_ymax
       else
-         ymax = maxval(yvec(1:npts))
+         ymax = maxval(yvec(1:npts), mask=.not. is_bad(yvec(1:npts)))
       end if
       dy = ymax - ymin

+      if (is_bad(dy)) then
+         ymax = given_ymax
+         ymin = given_ymin
+         dy = ymax - ymin
+      end if
+
       if (.not. use_given_ymin) ymin = ymin - ymargin * dy
       if (.not. use_given_ymax) ymax = ymax + ymargin * dy
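
For readers who want to experiment with the guard without building MESA, here is a rough Python transcription of the patched logic. It is a sketch under assumptions: `is_bad` is taken to mean "NaN or infinite", the default margin is arbitrary, and `math.inf` / `-math.inf` stand in for what Fortran's `minval` / `maxval` return when every element is masked out (`huge` and `-huge`).

```python
import math

NOT_GIVEN = -101.0  # sentinel implied by the "+ 101.0" test in the diff


def is_bad(x: float) -> bool:
    # Assumption: "bad" means NaN or infinite.
    return math.isnan(x) or math.isinf(x)


def y_limits(yvec, given_ymin=NOT_GIVEN, given_ymax=NOT_GIVEN, ymargin=0.1):
    """Compute plot limits while ignoring bad values, mirroring the patch."""
    use_given_ymin = abs(given_ymin + 101.0) > 1e-6
    use_given_ymax = abs(given_ymax + 101.0) > 1e-6

    good = [y for y in yvec if not is_bad(y)]
    ymin = given_ymin if use_given_ymin else min(good, default=math.inf)
    ymax = given_ymax if use_given_ymax else max(good, default=-math.inf)
    dy = ymax - ymin

    if is_bad(dy):
        # No usable data: fall back to the given limits, as the patch does.
        ymin, ymax = given_ymin, given_ymax
        dy = ymax - ymin

    if not use_given_ymin:
        ymin -= ymargin * dy
    if not use_given_ymax:
        ymax += ymargin * dy
    return ymin, ymax


print(y_limits([1.0, 3.0, math.inf]))   # the infinite point is ignored
print(y_limits([math.inf, math.inf]))   # falls back to the given limits
```

With no finite data and no user-supplied limits, the fallback returns the sentinel for both limits, so the range is finite but has zero width; the sketch only shows that NaN no longer reaches the plotting layer.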

@mathren
Contributor

mathren commented Apr 2, 2026

Oh good catch @VincentVanlaer! I tried it with the default $MESA_DIR/binary/work without changing the period, as I didn't think the value was important! Maybe in my original report I was hitting two problems at once? (Although I was also running on AMD processors, not intel, if that matters)

@VincentVanlaer
Member

Do you mean you only tested Eb's fix with the default inlists, or did you hit the same problem with the default inlists?

@mathren
Contributor

mathren commented Apr 2, 2026

Original issue (found in MESA 24.08.1):

I'm encountering a weird pgbinary behavior. I make a binary with period P=1d99 (this was to get single stars in a quick and dirty way with a specific setup). With Grid1_win_flag = .true. in inlist_pgbinary everything is fine. If I turn that flag to .false. I get a segfault from pgbinary. It appears that having pgbinary_flag = .true. in binary_job without any pgbinary window and with a large period causes the segfault.

Setting pgbinary_flag to .false. causes no problems

I originally thought it was just some interplay between the flags, unrelated to the choice of period -- this seems now incorrect.

@mathren
Contributor

mathren commented Apr 2, 2026

> Do you mean you only tested Eb's fix with the default inlists, or did you hit the same problem with the default inlists?

I only tested Eb's fix with the default and found no issues (once a pgbinary is provided in the inlist_project, with no pgbinary namelist I also hit a segfault once I turned on pgbinary_flag in binary_job).

@Debraheem
Member Author

My fix is just a patch/guard for bad numbers, not the real solution. I believe Vincent has identified the actual source of bad numbers. I can't test though, since I can't reproduce on ARM.

@VincentVanlaer
Member

@Debraheem Did your patch fix this issue for you? Cause that's what I am still somewhat confused about, since it doesn't seem that it can fix this issue.

@VincentVanlaer
Member

> > Do you mean you only tested Eb's fix with the default inlists, or did you hit the same problem with the default inlists?
>
> I only tested Eb's fix with the default and found no issues (once a pgbinary is provided in the inlist_project, with no pgbinary namelist I also hit a segfault once I turned on pgbinary_flag in binary_job).

Do you have a workdir that reproduces this? I'm trying to get this to happen, but I don't know what exactly your setup is.

@Debraheem
Member Author

Debraheem commented Apr 3, 2026

Mathieu's original backtrace and files attached: [home.zip](https://github.com/user-attachments/files/26450666/home.zip)

Backtrace for this error:

#0  0x7f279f0237a2 in ???
#1  0x7f279f022935 in ???
#2  0x7f279ee5a04f in ???
	at ./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0
#3  0x7f279ef714bb in __memset_avx2_unaligned_erms
	at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:367
#4  0x7f279f94d6f2 in ???
#5  0x7f279f94de3b in ???
#6  0x7f279f904629 in ???
#7  0x7f279f9074c3 in ???
#8  0x7f279f9076cc in ???
#9  0x7f279f9071af in ???
#10  0x7f279f907742 in ???
#11  0x7f279f93973a in ???
#12  0x442a37 in __pgbinary_support_MOD_show_box_pgbinary
	at ../private/pgbinary_support.f90:406
#13  0x488f49 in __pgbinary_history_panels_MOD_do_history_panels_plot
	at ../private/pgbinary_history_panels.f90:839
#14  0x48cafe in __pgbinary_history_panels_MOD_do_history_panels1_plot
	at ../private/pgbinary_history_panels.f90:94
#15  0x4918e9 in __pgbinary_grid_MOD_grid_plot
	at ../private/pgbinary_grid.f90:421
#16  0x494415 in __pgbinary_grid_MOD_grid1_plot
	at ../private/pgbinary_grid.f90:61
#17  0x42c177 in __pgbinary_MOD_onscreen_plots
	at /home/mrenzo/Documents/Research/codes/mesa/mesa-24.08.1/binary/make/pgbinary.f90:878
#18  0x42c37d in __pgbinary_MOD_do_pgbinary_plots
	at /home/mrenzo/Documents/Research/codes/mesa/mesa-24.08.1/binary/make/pgbinary.f90:764
#19  0x42efb6 in __pgbinary_MOD_update_pgbinary_plots
	at /home/mrenzo/Documents/Research/codes/mesa/mesa-24.08.1/binary/make/pgbinary.f90:85
#20  0x42a90a in __run_binary_support_MOD_do_run1_binary
	at ../private/run_binary_support.f90:712
#21  0x40ac4c in __binary_lib_MOD_run1_binary
	at ../public/binary_lib.f90:72
#22  0x40a59c in __run_binary_MOD_do_run_binary
	at /home/mrenzo/Documents/Research/codes/mesa/mesa-24.08.1/binary/job/run_binary.f90:7
#23  0x40a5b8 in binary_run
	at ../src/binary_run.f90:4
#24  0x40a5ef in main
	at ../src/binary_run.f90:2
./rn: line 8: 1522597 Segmentation fault      ./binary
DATE: 2026-03-25
TIME: 18:53:24

@VincentVanlaer
Member

Hashed it out with @mathren over slack. The reproducer for the pgbinary namelist missing issue is just to add pgbinary_flag = .true. to inlist_project in the default workdir (I was still looking at the modified one, hence my confusion). The reproducer from Eb above is for a second issue which I described in #954 (comment)

@mathren
Contributor

mathren commented Apr 3, 2026

Just for reference, to get the segfault issue because of the lack of a pgbinary namelist, take $MESA_DIR/binary/work and add in binary_job the line pgbinary_flag=.true. and erase the empty pgbinary namelist that is now present. No other change needed. After Eb's fix, any content of pgbinary namelist would work provided the namelist is there (empty, with a *_win_flag=.true. and .false., all cases with the default finite values of the binary properties worked).

So it appears there may be two problems at once:

  • lack of pgbinary namelist when pgbinary_flag = .true. in binary_job
  • 1d99 turning into NaN and propagating when doing crazy things with the period.
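
The first of these two problems can be spotted before launching a run. Below is a small Python sketch of such a pre-flight check. `has_namelist` is a hypothetical helper, not part of MESA, and the inlist text is a simplified stand-in for the default work directory's inlist_project. It relies only on the fact that a Fortran namelist group begins with `&name`.

```python
import re


def has_namelist(inlist_text: str, name: str) -> bool:
    """Return True if a namelist group `&name` opens on some line of the text."""
    pattern = re.compile(r"^\s*&" + re.escape(name) + r"\b", re.IGNORECASE)
    return any(pattern.match(line) for line in inlist_text.splitlines())


# Simplified, illustrative inlists (not the real MESA template).
with_pgbinary = """
&binary_job
   pgbinary_flag = .true.
/ ! end of binary_job namelist

&pgbinary
/ ! end of pgbinary namelist
"""

without_pgbinary = """
&binary_job
   pgbinary_flag = .true.
/ ! end of binary_job namelist
"""

for label, text in [("with", with_pgbinary), ("without", without_pgbinary)]:
    print(label, has_namelist(text, "pgbinary"))
```

The check is deliberately crude: it does not parse values or detect a missing terminating `/`, it only reports whether the group header is present.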

@VincentVanlaer
Member

I think it is not a segfault for the lack of pgbinary, but it shows a backtrace nonetheless as that is the default when mesa_error is called. I get


Failed while trying to read pgbinary namelist file: inlist_project
Perhaps the following runtime error message will help you find the problem.

At line 1416 of file private/pgbinary_ctrls_io.f90
Fortran runtime error: End of file

Error termination. Backtrace:
#0  0x7f801f02b655 in ???
#1  0x7f801f02c219 in ???
#2  0x7f801f02cd8f in ???
#3  0x7f801f2e8a80 in ???
#4  0x7f801f2e9d5b in ???
#5  0x7f801f2ec9d4 in ???
#6  0x7f801f2ecc73 in ???
#7  0x557cbfae5aad in __pgbinary_ctrls_io_MOD_read_pgbinary_file
	at private/pgbinary_ctrls_io.f90:1416
#8  0x557cbfad680a in __pgbinary_ctrls_io_MOD_read_pgbinary_file
	at private/pgbinary_ctrls_io.f90:1432
#9  0x557cbfa9590e in __pgbinary_MOD_do_read_pgbinary_controls
	at private/pgbinary_full.f90:159
#10  0x557cbfa8fcdc in __run_binary_support_MOD_do_run1_binary
	at private/run_binary_support.f90:714
#11  0x557cbfa6f3e4 in __run_binary_MOD_do_run_binary
	at /home/vincentva/software/mesa/dev/binary/job/run_binary.f90:25
#12  0x557cbfa6ed35 in binary_run
	at src/binary_run.f90:7
#13  0x557cbfa6ed35 in main
	at src/binary_run.f90:3
make: *** [Makefile:18: run] Error 2

@VincentVanlaer force-pushed the EbF/fix_pgbinary_window_bug branch 2 times, most recently from 186321d to 2d782a0 on April 16, 2026 22:34

@VincentVanlaer
Member

@Debraheem I have replaced your changes with mine. While it won't be needed with the upcoming refactor of the inlist readers, I have kept @mathren's changes, since people might wonder what pgbinary is doing in the inlist and learn a thing or two ;)

Since the changes are separate, it would be best I think to merge this without squashing

@Debraheem
Member Author

I can't remember if it was a written rule or just an informal one, but I think we do not squash commits being merged to main. I don't know if there is a specific reason, but it's something we can discuss later.

@VincentVanlaer
Member

Squashing can be useful if there is a bunch of small iterations on the same thing, with potentially broken commits in between.

@Debraheem
Member Author

Scratch what I said, here are the guidelines: https://docs.mesastar.org/en/latest/developing/contributing.html#merging-a-pull-request.

@VincentVanlaer
Member

Yeah that makes sense to me

@Debraheem
Member Author

can you run one test before we merge this?

@VincentVanlaer
Member

Do you mean one of the test cases, the entire test suite, or just that this bug is gone? I did test with the 1e99 orbit and that worked fine.

@Debraheem
Member Author

Debraheem commented Apr 17, 2026

I meant the test_suite, does not need to be an optional test.

mathren and others added 2 commits April 18, 2026 10:22
If copying the template and setting `pgbinary_flag=.true.` in binary_job
will error and stop if there isn't a pgbinary namelist in inlist_project
@VincentVanlaer force-pushed the EbF/fix_pgbinary_window_bug branch from 2d782a0 to 0767959 on April 18, 2026 08:23

@VincentVanlaer
Member

Test suite has passed: https://testhub.mesastar.org/main/commits/2d782a0

I have force-pushed in the meantime, but that only changes the "add empty pgbinary namelist" commit, to restore indentation that was accidentally removed.

@VincentVanlaer
Member

See #970 for the follow-up

@VincentVanlaer merged commit d2e43a4 into main on Apr 18, 2026
8 of 9 checks passed
@VincentVanlaer deleted the EbF/fix_pgbinary_window_bug branch on April 18, 2026 12:31

3 participants