[AIX] Fix hangs during testing #137967

Open
wants to merge 3 commits into
base: master

Conversation

mustartt
Contributor

@mustartt mustartt commented Mar 3, 2025

Fixes all current test hangs experienced during CI runs.

  1. The IPv6 link-local address (on the loopback device) is assigned an automatic zone ID of 0, which causes an assert to fail and the test to hang in library/std/src/net/udp/tests.rs
  2. The const alloc test does not fail gracefully
  3. The debuginfo test has a problem with gdb's auto-load safe-path
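
The failing assert itself is not shown in this thread. As a minimal sketch of why a zone ID can matter in such a test: `SocketAddrV6` equality in Rust's standard library also compares the scope (zone) ID, so two addresses that differ only in that field compare unequal. The helper name `same_endpoint_ignoring_scope`, the port, and the scope ID values below are hypothetical and are not taken from the PR.

```rust
use std::net::{Ipv6Addr, SocketAddr, SocketAddrV6};

/// Hypothetical helper: compare two socket addresses while ignoring
/// the IPv6 flow info and scope (zone) id.
fn same_endpoint_ignoring_scope(a: &SocketAddr, b: &SocketAddr) -> bool {
    match (a, b) {
        (SocketAddr::V6(a), SocketAddr::V6(b)) => a.ip() == b.ip() && a.port() == b.port(),
        // For IPv4 (or mixed families) fall back to ordinary equality.
        _ => a == b,
    }
}

fn main() {
    let ip = Ipv6Addr::LOCALHOST;
    // Two addresses that differ only in scope id (both values are made up).
    let first = SocketAddr::V6(SocketAddrV6::new(ip, 8080, 0, 0));
    let second = SocketAddr::V6(SocketAddrV6::new(ip, 8080, 0, 1));

    println!("strict equality: {}", first == second);
    println!(
        "ignoring scope:  {}",
        same_endpoint_ignoring_scope(&first, &second)
    );
}
```

Whether the actual fix relaxes the comparison or adjusts the expected address is not stated here; the sketch only illustrates the comparison semantics.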

@rustbot
Collaborator

rustbot commented Mar 3, 2025

r? @ChrisDenton

rustbot has assigned @ChrisDenton.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Mar 3, 2025
Contributor

@daltenty daltenty left a comment


LGTM from the AIX perspective. These test cases hang the test run indefinitely at the moment, so this unblocks regular runs.

@@ -2,6 +2,7 @@
// on 32bit and 16bit platforms it is plausible that the maximum allocation size will succeed
// FIXME (#135952) In some cases on AArch64 Linux the diagnostic does not trigger
//@ ignore-aarch64-unknown-linux-gnu
//@ ignore-aix: FIXME(#137966)
Member

@workingjubilee workingjubilee Mar 3, 2025


If the system behaves badly on large allocations, then there is nothing to fix here.

Suggested change
//@ ignore-aix: FIXME(#137966)
//@ ignore-aix: alloc failure on AIX can result in SIGKILL instead of nullptr

@workingjubilee
Member

@daltenty Do you have any idea why it sometimes hangs and sometimes SIGKILLs?

@mustartt
Contributor Author

mustartt commented Mar 4, 2025

@daltenty Do you have any idea why it sometimes hangs and sometimes SIGKILLs?

It's not exactly a "hang". The mmap and zero-initializing the mapped region take quite a while on our dev machines, so the test either times out our CI or gets SIGKILL'd after a very long time.

@workingjubilee
Member

...Is the problem that you literally have 128TiB of RAM?

@workingjubilee
Member

workingjubilee commented Mar 4, 2025

Hm, wait... laziness in paging due to overcommit, resulting in the system accepting an allocation that can't possibly be respected if called but assuming that no one will actually call that bluff?
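
For readers unfamiliar with the "graceful" failure mode being discussed, here is a minimal sketch of an allocation that reports failure through a null pointer. It uses only `std::alloc`; the helper name `try_zeroed` and the sizes are hypothetical, and this is not the compiler's const-evaluation code path, which is not shown in this thread.

```rust
use std::alloc::{alloc_zeroed, dealloc, Layout};

/// Hypothetical helper: try to allocate `size` zeroed bytes.
/// Returns `None` if the layout is invalid or the allocator returns null.
fn try_zeroed(size: usize) -> Option<(*mut u8, Layout)> {
    if size == 0 {
        // Zero-sized layouts must not be passed to the global allocator.
        return None;
    }
    // Sizes above `isize::MAX` are rejected here, before any allocation.
    let layout = Layout::from_size_align(size, 1).ok()?;
    // SAFETY: `layout` has a non-zero size.
    let ptr = unsafe { alloc_zeroed(layout) };
    if ptr.is_null() {
        None
    } else {
        Some((ptr, layout))
    }
}

fn main() {
    match try_zeroed(4096) {
        Some((ptr, layout)) => {
            // SAFETY: `ptr` is valid for `layout.size()` bytes, all zeroed.
            let first = unsafe { *ptr };
            println!("allocated {} bytes, first byte = {}", layout.size(), first);
            // SAFETY: `ptr` was allocated with exactly this `layout`.
            unsafe { dealloc(ptr, layout) };
        }
        None => println!("allocation failed gracefully"),
    }
    println!("oversized request rejected: {}", try_zeroed(usize::MAX).is_none());
}
```

The null check only helps when the system refuses the request up front. On a system that overcommits, the call can succeed and the failure shows up later, when the pages are actually touched, which matches the behavior described above.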

@@ -2,6 +2,9 @@
// on 32bit and 16bit platforms it is plausible that the maximum allocation size will succeed
// FIXME (#135952) In some cases on AArch64 Linux the diagnostic does not trigger
//@ ignore-aarch64-unknown-linux-gnu
// AIX will allow allow the allocation to go through, and get SIGKILL when zero initializing
Member


Suggested change
// AIX will allow allow the allocation to go through, and get SIGKILL when zero initializing
// AIX will allow the allocation to go through, and get SIGKILL when zero initializing

@@ -2,6 +2,9 @@
// on 32bit and 16bit platforms it is plausible that the maximum allocation size will succeed
// FIXME (#135952) In some cases on AArch64 Linux the diagnostic does not trigger
//@ ignore-aarch64-unknown-linux-gnu
// AIX will allow allow the allocation to go through, and get SIGKILL when zero initializing
// the overcommited page.
Member


Suggested change
// the overcommited page.
// the overcommitted page.

Comment on lines +8 to +9
// AIX will allow allow the allocation to go through, and get SIGKILL when zero initializing
// the overcommited page.
Member


Suggested change
// AIX will allow allow the allocation to go through, and get SIGKILL when zero initializing
// the overcommited page.
// AIX will allow the allocation to go through, and get SIGKILL when zero initializing
// the overcommitted page.

@workingjubilee
Member

address nits, squash, and then r=me

Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue.

5 participants