AWS ParallelCluster - Join AD not working seamlessly for Ubuntu

0

We are trying to join the Microsoft AD that we defined using AWS Directory Service to our ParallelCluster head node using the config file, but the join does not work seamlessly.

We filled in the options below in the pcluster config file:

DirectoryService:
  DomainName: AD_DOMAIN_NAME
  DomainAddr: ldap://IP_1_AD,ldap://IP_2_AD
  PasswordSecretArn: ARN_SECRET_PLAIN_TEXT
  DomainReadOnlyUser: cn=ReadOnlyUser,ou=Users,ou=AWS-JIMMY,dc=CORP,dc=EXAMPLE,dc=COM
  AdditionalSssdConfigs:
    ldap_auth_disable_tls_never_use_in_production: True

According to the pcluster documentation, this should be enough to join the AD, but when I run the "realm discover AD_DOMAIN_NAME" command I get an error.

Are we missing something in the config file? Is there a command that has to be run after cluster creation? I saw nothing about this in the pcluster documentation.

Best regards

  • Our head node OS is Ubuntu.

    Pings to IP_1_AD and IP_2_AD work properly.

    We suspect that something is wrong with DNS resolution.

5 Answers
3
Accepted Answer

Hi

If you suspect a DNS resolution issue, here are steps to help diagnose and troubleshoot it:

Ensure that your head node is using the correct DNS servers. Your /etc/resolv.conf file should contain the IP addresses of your AD DNS servers (IP_1_AD and IP_2_AD). It should look something like this:

nameserver IP_1_AD
nameserver IP_2_AD

If these entries aren't present or if other DNS servers are listed, update it to reflect the AD DNS servers.

Test DNS:

nslookup AD_DOMAIN_NAME
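
The resolv.conf check above can be scripted. The following is a minimal sketch, not an official ParallelCluster tool: the helper names list_nameservers and has_nameserver are made up for illustration, and IP_1_AD / IP_2_AD are the same placeholders used in the question. On Ubuntu, /etc/resolv.conf is often managed by systemd-resolved, so manual edits may be overwritten and "resolvectl status" may be a more reliable view of the DNS servers actually in use.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not part of any AWS tooling).

# list_nameservers FILE: print every address on a "nameserver" line
# of a resolv.conf-style file. Commented-out lines are ignored.
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "$1"
}

# has_nameserver FILE IP: succeed only if IP is listed exactly as a nameserver.
has_nameserver() {
    list_nameservers "$1" | grep -Fxq "$2"
}

# Example usage on the head node (placeholders must be replaced first):
#   for ip in IP_1_AD IP_2_AD; do
#       has_nameserver /etc/resolv.conf "$ip" || echo "missing nameserver: $ip"
#   done
```

If both addresses are present and nslookup still fails, the problem is more likely on the network path (security groups, NACLs) than in the resolver configuration.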

Check SSSD: Make sure your SSSD configuration file /etc/sssd/sssd.conf has the correct domain and DNS settings. You might want to set:

dns_discovery_domain = AD_DOMAIN_NAME

Also, ensure that the SSSD service is running, for example with "sudo systemctl status sssd".

Check for Firewalls and Security Groups:

Ensure that your security groups, NACLs, and any OS-level firewalls (e.g., ufw on Ubuntu) allow communication on DNS port 53 between the head node and your AD servers.

EXPERT
answered 3 months ago
EXPERT
reviewed 3 months ago
  • So I modified the /etc/resolv.conf file to add the IPs of my AD, and I managed to join my AD using "sudo realm join -U AD_DOMAIN_NAME".

    That is one step solved. Now we are having trouble getting "sudo login" to work with an AD_USER.

    Do we need to modify any other file, or restart some Linux service?

  • Now, if you're facing issues with logging in using an AD user, here’s a step-by-step guide to ensure a smooth login process:

    Check /etc/sssd/sssd.conf:

    Make sure your SSSD configuration file is correctly set up. Here’s a sample configuration:

    [sssd]
    services = nss, pam
    config_file_version = 2
    domains = AD_DOMAIN_NAME

    [domain/AD_DOMAIN_NAME]
    id_provider = ad
    auth_provider = ad
    access_provider = ad
    ldap_id_mapping = True
    cache_credentials = True
    # Optionally specify the ad_hostname if needed
    #ad_hostname = headnode.example.com
    override_homedir = /home/%u
    default_shell = /bin/bash

    Restart the SSSD and Other Services:

    After modifying /etc/sssd/sssd.conf, restart the following services to apply the changes:

    sudo systemctl restart sssd
    sudo systemctl restart realmd
    sudo systemctl restart nscd # Name Service Cache Daemon (optional)

    Modify PAM Configuration (Pluggable Authentication Modules):

    Ensure that your PAM configuration files allow AD logins:

    Edit /etc/pam.d/common-auth and ensure the following line is present:

    auth [success=1 default=ignore] pam_sss.so use_first_pass

    To create home directories at first login, /etc/pam.d/common-session should also contain:

    session required pam_mkhomedir.so skel=/etc/skel/ umask=0077

    Allow SSH Access:

    If you’re trying to SSH into the head node as the AD user, make sure the user is not restricted:

    Edit /etc/ssh/sshd_config and ensure UsePAM yes is set. If you restrict SSH access by group, include the AD group, for example:

    AllowGroups domain^users
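
Two of the checks from the comment above are easy to get wrong and can be verified with a short script. This is a minimal sketch under stated assumptions: the function names conf_mode_ok and sshd_option are hypothetical, "stat -c" assumes GNU coreutils (the default on Ubuntu), and sshd_option only reads a single file, so it ignores Include directives and Match blocks. SSSD generally refuses to start when sssd.conf is readable by other users, which is why mode 600 is checked here.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative only).

# conf_mode_ok FILE: succeed if FILE has mode 600 (owner read/write only),
# which is what SSSD normally expects for /etc/sssd/sssd.conf.
conf_mode_ok() {
    [ "$(stat -c '%a' "$1")" = "600" ]
}

# sshd_option FILE KEYWORD: print the lowercased value of the first
# uncommented occurrence of KEYWORD (case-insensitive) in FILE.
# sshd uses the first value it obtains for a keyword.
sshd_option() {
    awk -v key="$2" '
        tolower($1) == tolower(key) { print tolower($2); exit }
    ' "$1"
}

# Example usage on the head node (needs root to read sssd.conf):
#   sudo sh -c '. ./checks.sh; conf_mode_ok /etc/sssd/sssd.conf || echo "fix mode: chmod 600"'
#   [ "$(sshd_option /etc/ssh/sshd_config UsePAM)" = "yes" ] || echo "UsePAM is not yes"
```

The file name checks.sh in the usage comment is an assumption. For the effective SSH settings including defaults, "sudo sshd -T" prints the full resolved configuration, and "sudo sssctl config-check" (from the sssd-tools package) validates sssd.conf more thoroughly than a mode check.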

  • DomainReadOnlyUser: cn=ReadOnlyUser,ou=Users,ou=AWS-JIMMY,dc=CORP,dc=EXAMPLE,dc=COM ==> should we change ReadOnlyUser to the name of the group we defined in our AD forest?

    I tried what you proposed, but without success so far.

0

Recipe: aws-parallelcluster-environment::finalize_directory_service

  • execute[Fetch user data from remote directory service] action run
    [2024-09-23T06:40:14+00:00] INFO: Processing execute[Fetch user data from remote directory service] action run (aws-parallelcluster-environment::finalize_directory_service line 24)
    [2024-09-23T06:40:14+00:00] ERROR: execute[Fetch user data from remote directory service] (aws-parallelcluster-environment::finalize_directory_service line 24) had an error: Mixlib::ShellOut::ShellCommandFailed: Expected process to exit with [0], but received '2'
    ---- Begin output of sudo -u ubuntu getent passwd ReadOnlyUser ----
    STDOUT:
    STDERR:
    ---- End output of sudo -u ubuntu getent passwd ReadOnlyUser ----
    Ran sudo -u ubuntu getent passwd ReadOnlyUser returned 2; ignore_failure is set, continuing

This is an error I found while digging into the /var/log/chef-client.log file.

answered 3 months ago
0

After doing all these steps, I still get the error "Permissions Denied" when trying to connect via SSH to the cluster's instance. Here are the lines from auth.log for these attempts:

Sep 23 13:50:10 aws sshd[1186]: Connection reset by authenticating user user@domain IP_src port 24058 [preauth]
Sep 23 13:50:30 aws sshd[1189]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=IP_SRC  user=user@domain
Sep 23 13:50:30 aws sshd[1189]: pam_sss(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=IP_src user=user@domain
Sep 23 13:50:30 aws sshd[1189]: pam_sss(sshd:auth): received for user user@domain: 4 (System error)
Sep 23 13:50:32 aws sshd[1189]: Failed password for user@domain from IP_SRC port 62867 ssh2
Sep 23 13:50:34 aws sshd[1189]: Connection reset by authenticating user user@domain IP_SRC port 62867 [preauth]

When I run the command

sudo getent passwd user

it gives this result :

user:*:1338001636:1338000513:Admin User:/home/user@domain:/bin/bash
answered 3 months ago
0

After trying this command, it appears that I can connect as an AD user:

sudo su <AD_user>

But when we drop "sudo" and enter the AD user's password, it doesn't work. How can we ensure password synchronisation between the AD and Linux?

answered 3 months ago
0

Some information that could help solve this issue: when we run

sudo apt install krb5-user

we get this result:

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
 krb5-user : Depends: libkrb5-3 (= 1.17-6ubuntu4.7) but 1.19.2-2ubuntu0.4 is to be installed
E: Unable to correct problems, you have held broken packages.

Authentication does not seem to work on our cluster, and krb5-user seems to be what handles it. Do you know how to resolve this dependency problem so that we can install or update it?

answered 3 months ago
