Cluster 3 has been repurposed as the CCAST Collaboration Cluster (aka "Framling") and supports NDSU participation in OSG, CCC and XSEDE. There is no local access to this machine other than through the mechanisms of each collaboration. NDSU researchers receive an extra level of benefit over non-NDSU researchers on these resources, but there is no guarantee of pre-emption or queue order.
The Collaboration Cluster consists of 127 nodes, each with 8 processor cores. Each node has 2.66 GHz Intel Xeon X5550 processors and 48 GB of memory. The nodes are connected to each other with 1 Gbps Ethernet. The total aggregate memory of the cluster is approximately 6.1 TB, and the theoretical peak performance is approximately 10.9 Tflops. The Collaboration Cluster is ideal for multi-node parallel jobs.
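The core count and aggregate memory follow directly from the per-node figures. A quick check in shell arithmetic, using only the numbers quoted above:

```shell
#!/bin/bash
# Per-node figures quoted in the cluster description.
nodes=127
cores_per_node=8
mem_per_node_gb=48

total_cores=$((nodes * cores_per_node))     # 127 * 8
total_mem_gb=$((nodes * mem_per_node_gb))   # 127 * 48

echo "Total cores:  $total_cores"
echo "Total memory: $total_mem_gb GB"
```

The results, 1016 cores and 6096 GB, match the 1016 processor cores in the queue table and the roughly 6.1 TB of aggregate memory.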
|Route Queue||Route Order||Execution Queue||Available Nodes per Queue||Processor Cores||Max Walltime||Max Nodes per Job||Max Processors per Job||Notes|
|default||1||short||127||1016||8 hours||127||1016||Only non-interactive jobs will be accepted. Use route queue to route as ordered.|
|devel||4||32||4 hours||4||32||Interactive jobs allowed on this queue|
For more information, please see CCAST User Policies under Queue Policies.
The Environment Modules package provides an easy way to dynamically modify a user’s environment. It enables users to use multiple versions of different software. Each modulefile contains the information needed to configure the shell environment for an application. Once the modules package is initialized, the environment can be modified on a per-module basis using the module command, which interprets modulefiles. Typically, modulefiles instruct the module command to alter or set shell environment variables such as PATH, MANPATH, etc. Modulefiles can be shared by many users on a system.
Modules can be "loaded" and "unloaded" dynamically and atomically. All popular shells are supported, including bash, sh, ksh, zsh, csh, tcsh as well as some scripting languages such as perl and python.
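Conceptually, loading a module amounts to editing variables such as PATH, and unloading reverses the edit. The sketch below mimics that effect by hand. It is not how the module command is implemented, and the install prefix /opt/gcc-4.6.2 is a hypothetical path chosen for illustration.

```shell
#!/bin/bash
# Hypothetical install prefix; real modulefiles define their own paths.
prefix=/opt/gcc-4.6.2

old_path=$PATH

# "Load": put the application's bin directory at the front of PATH.
PATH="$prefix/bin:$PATH"
echo "after load, first PATH entry: ${PATH%%:*}"

# "Unload": strip that leading entry again, restoring the previous PATH.
PATH=${PATH#"$prefix/bin:"}
if [ "$PATH" = "$old_path" ]; then
    echo "after unload, PATH is back to its original value"
fi
```

The module command does this bookkeeping for every variable a modulefile touches, which is why loading and unloading through it is safer than editing PATH by hand.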
Please enter the following commands in a terminal window.
To check the available modules
|$> module avail|
To load a module
|$> module load gcc-4.5.3|
(Replace 'gcc-4.5.3' with the desired module name)
To load a different version of the module
|$> module swap gcc-4.5.3 gcc-4.6.2|
To list the currently loaded modules
|$> module list|
To unload a module
|$> module unload gcc-4.6.2|
To unload all loaded modules
|$> module purge|
Cluster 3 FAQs
How do I log into a CCAST Cluster?
There are currently four clusters at CCAST. The following host names can be used:
How to login from a Windows Computer
The PuTTY SSH client should be used to access any CCAST cluster from a Windows computer. Once PuTTY is installed, double-click the application and enter the host name to access the cluster. Below is an example of how to log into the Thunder cluster:
How to login from Apple/Linux computer
Open a terminal window and execute the following line to access any CCAST cluster.
NOTE: “username” is user’s login name and the host name can be changed to the cluster's name. Below is an example of how to log into the Thunder cluster.
Unlike on a typical desktop, the user cannot execute their application by logging into the compute node directly or running the application on the login nodes. The user can edit files and submit their application to the batch queue system from the login nodes.
How can I transfer data to a CCAST Cluster?
There are currently four clusters at CCAST. The following host names can be used:
Transferring between a Windows computer and a CCAST cluster
The WinSCP client should be used to transfer data between your Windows computer and any CCAST cluster. WinSCP is available as a free download.
Once your download is complete, open the application and fill in the fields as seen in the screenshot below.
Any of the host names from the above list can be substituted for the host name field.
Please enter your username and password in the appropriate fields and select 'SCP' as the File protocol and click login.
Once you are logged in you should be able to see the following screen:
Now you can drag and drop your files between your computer and any CCAST cluster easily.
Transferring between an Apple/Linux computer and a CCAST cluster
You can use the scp command to transfer files between your computer and any CCAST cluster. Enter the following commands in a terminal on your own computer.
To transfer files from CCAST cluster to your computer:
|$> scp [username@hostname]:[source-file] [destination]|
For example, you can transfer files from Cluster 3 to your computer as shown below:
|$> scp firstname.lastname@example.org:/home/your_username/myfile.txt /home/mycomputer/myfile.txt|
To transfer files from your computer to CCAST cluster:
|$> scp [source-file] [username@hostname]:[destination]|
For example, you can transfer files from your computer to the Thunder cluster as shown below:
|$> scp myfile.txt email@example.com:/home/your_username|
Any of the host names from the above list can be substituted for the fully qualified domain name field.
How do I work with Batch Queue Systems (PBS/Torque)?
Some useful commands in the batch queue system:
Here at CCAST, we use TORQUE, which is an open source implementation of PBS. The batch system, or resource manager, divides up larger computer systems so that multiple users and their jobs can run on them. Another part of the batch system is the scheduler, which decides where and when a job will run. A user requests resources from the resource manager in terms of time, the number of compute nodes, and the resources on those nodes, which may be RAM, network, disk, software licenses, etc. Once a job is submitted to the batch system, the scheduler decides where and when that job will run.
Below you will find information on:
- A Sample PBS batch script
- Environment variables available to your script
- Batch system commands
Sample batch script
The best way to demonstrate using the batch system is with an example input script. Below is a sample script that can be modified to run a job; more application-specific scripts can be found on the interactive nodes under /usr/local/PBS_EXAMPLES. More information can be found in the manual page for qsub by typing man qsub. The '#PBS' lines are significant: they are not ordinary comments but directives to the batch system, although the scripting language itself ignores them. To submit a job, simply type qsub $SCRIPTNAME.
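A minimal sketch of such a script follows. It is an illustration rather than the CCAST-provided sample: the job name, resource request, and placeholder payload are assumptions, while the short queue comes from the queue table above.

```shell
#!/bin/bash
#PBS -N sample_job
#PBS -q short
#PBS -l nodes=1:ppn=8
#PBS -l walltime=01:00:00
#PBS -j oe

# Lines starting with '#PBS' are read by qsub; bash sees them as comments.
# The job name, node count, and walltime above are illustrative values.

# Start in the directory the job was submitted from. The fallback to the
# current directory only matters when the script runs outside the batch system.
cd "${PBS_O_WORKDIR:-$PWD}"

job_id=${PBS_JOBID:-not-in-batch}
echo "Job $job_id started on $(uname -n) in $(pwd)"

# Replace this line with the real application command.
echo "payload would run here"
```

Saved as, say, sample.pbs (a hypothetical file name), it would be submitted with qsub sample.pbs.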
Some environment variables are provided by the batch system to your script. Some of the more meaningful ones are:
|PBS_JOBID||This is the unique job identifier of your job. This can be used to create scratch directories or other unique identifiers that are specific to your job.|
|PBS_O_WORKDIR||This is the working directory where you originally submitted your job from. It is common to use 'cd $PBS_O_WORKDIR' or to access input/output files via $PBS_O_WORKDIR|
|PBS_JOBNAME||This is the name specified with the -N option when submitting your script|
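As a sketch of the first use mentioned in the table, a job can derive a private scratch directory from PBS_JOBID. The base directory here falls back to TMPDIR or /tmp, which is an assumption for illustration; substitute the scratch location appropriate for the cluster.

```shell
#!/bin/bash
# Outside the batch system PBS_JOBID is unset, so fall back to the shell's PID.
job_id=${PBS_JOBID:-manual.$$}

# Assumed base directory; replace with the cluster's scratch file system.
scratch_base=${TMPDIR:-/tmp}
scratch_dir="$scratch_base/job.$job_id"

mkdir -p "$scratch_dir"
echo "Scratch directory: $scratch_dir"

# ... run the application with its temporary files in "$scratch_dir" ...

# Clean up once the results have been copied back to $PBS_O_WORKDIR.
rm -rf "$scratch_dir"
```

Because the job identifier is unique, two jobs from the same user never share a scratch directory.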
Batch system commands
|qstat||qstat||Display status of all jobs in the batch system.|
|qstat||qstat -f JOBID||Display detailed information about a specific job.|
|qsub||qsub submit_script||Submit a job to the batch system.|
|qdel||qdel JOBID||Terminate a specific job.|
|qalter||qalter -N nickname JOBID||Change the nickname of a specific job.|
|qmove||qmove queue JOBID||Move a specific job to another queue.|
How to start using Materials Studio?
You have to contact the CCAST support staff if you need to start using Materials Studio. We would be glad to come down to your workplace to install it for you.
How to run interactive jobs in the cluster?
Please use the command below to run interactive jobs in Cluster 2 and Cluster 3:
qsub -I -l nodes=1:ppn=8 -l walltime=4:00:00 -q devel
Please use the command below to run interactive jobs in Thunder:
qsub -I -l nodes=1:ppn=8 -l walltime=4:00:00 -q def-devel
Note: Please limit your interactive jobs to a maximum of 2 hours.
What kind of jobs should I run on Cluster 3?
Cluster 3 is ideal for multi-node, largely parallel jobs.
I get “warning:regcache incompatible with malloc” when I’m running a job in Cluster 3. How to fix?
Add MX_RCACHE=0 option to the run command. For example,
mpirun --mca mpi_paffinity_alone 1 -np $NUM_PROC -machinefile $PBS_NODEFILE -x MX_RCACHE=0 --mca pml cm /path/to/executable
Can I access individual compute nodes?
Yes, you can access individual nodes if you have a job running on that particular node. Suppose you have a job running on cluster3-24 and cluster3-25; you can log in to these nodes by typing
ssh cluster3-xx (xx replaces the node number)
NOTE: Access is not allowed unless you have a job already running on the node through the batch system.
My rsync request times out before the process is complete?
When you issue an rsync command, it runs on the cluster head node. The head node limits commands to 30 minutes of run time, so an rsync of a large amount of data will time out after 30 minutes. The best option is to split the data into multiple directories and rsync them separately, one by one. The other option is to remove the 'z' option from the command, which saves the time spent on compression:
rsync -av [SOURCE] [DESTINATION]
When using rsync, be extremely careful about the trailing slash. For example, look at the following two rsync commands:
#1: rsync -av /some/path/a/ /some/otherpath/
#2: rsync -av /some/path/a /some/otherpath/
The first command makes /some/otherpath/ mirror the content of /some/path/a/, whereas the second creates a directory named a inside /some/otherpath/ that mirrors the content of /some/path/a. Be extremely mindful of which one you want.
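The difference is easy to see on a throwaway directory tree. The sketch below builds one under a temporary directory (all paths are created by the script itself) and runs both forms, skipping the copy if rsync is not installed.

```shell
#!/bin/bash
# Build a throwaway source tree: $demo/a/file.txt
demo=$(mktemp -d)
mkdir -p "$demo/a" "$demo/dst1" "$demo/dst2"
echo "hello" > "$demo/a/file.txt"

if command -v rsync >/dev/null 2>&1; then
    # 1: trailing slash on the source copies the CONTENTS of a/
    rsync -a "$demo/a/" "$demo/dst1/"
    # 2: no trailing slash copies the directory a itself
    rsync -a "$demo/a" "$demo/dst2/"

    # List what landed where, relative to the demo directory.
    find "$demo/dst1" "$demo/dst2" -type f | sed "s|^$demo/||"
else
    echo "rsync is not installed on this machine"
fi
```

With rsync available, the listing shows dst1/file.txt for the first form and dst2/a/file.txt for the second. The demo directory can be removed afterwards with rm -rf "$demo".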
How do I run openmpi over a different interface?
You can run Open MPI over a different interface by changing the --mca options in the run command. The default is usually --mca pml cm, which runs over Myrinet. To run over a different interface, change these options. For example, you can use --mca btl tcp,sm,self if you want to run over Ethernet.
The myrinet network also supports TCP/IP communications, and to use it you will have to create your own hostfile with the following commands:
sed s/cluster3/cluster3m/ < $PBS_NODEFILE > /scratch/nodefile.$PBS_JOBID
mpirun -machinefile /scratch/nodefile.$PBS_JOBID ...
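To see what the substitution does, the sketch below applies the same sed expression to a stand-in node file. The file contents are sample data using the node names cluster3-24 and cluster3-25; inside a real job the input would be $PBS_NODEFILE.

```shell
#!/bin/bash
# Stand-in for $PBS_NODEFILE: one line per allocated core (sample contents).
nodefile=$(mktemp)
printf '%s\n' cluster3-24 cluster3-24 cluster3-25 cluster3-25 > "$nodefile"

# Same substitution as above: rewrite each host name to its cluster3m name.
myri_nodefile=$(mktemp)
sed 's/cluster3/cluster3m/' < "$nodefile" > "$myri_nodefile"

cat "$myri_nodefile"
```

Each cluster3-NN entry becomes cluster3m-NN, and the rewritten file is what gets passed to mpirun with -machinefile.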
How to find the nodes my job is running on?
While your job is running, you can use
qstat -n JOB_ID
to get the node information.
NOTE: The first node is MPI node 0 of your job.
Example:- Your output would be something like this
This means that your job runs on processors 3, 4, 5, 6 and 7 on cluster3 node 60.
How do I find out how much memory my job is using?
First find out which nodes are running your job. Then go to cluster3.ccast.ndsu.edu in your web browser and select the node that you want to monitor. There you will see the processor usage, memory usage and much more information about your job.
While your job is running, you can also do qstat -f YOUR_JOB_ID and look for resources_used.mem. Also, while your job is running, you can ssh to the node (ssh NODENAME), and run the top command.
How do I know how much storage I’m using in GPFS Scratch?
Navigate to the GPFS scratch directory and enter,
This will show you the amount of storage you are using in GPFS Scratch. If you want more information on your sub-directories do:
du -sch *
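A common way to get a single total for the current directory is du -sh ., shown here as an assumed example alongside the du -sch * form. The sketch runs both in a throwaway directory so it can be tried anywhere; the directory and file names are made up for the demo.

```shell
#!/bin/bash
# Throwaway directory standing in for the GPFS scratch directory.
work=$(mktemp -d)
mkdir -p "$work/run1" "$work/run2"
echo "some data" > "$work/run1/out.txt"
echo "more data" > "$work/run2/out.txt"

(
    cd "$work" || exit 1

    # One line: total usage of the current directory.
    du -sh .

    # One line per entry, plus a grand total on the last line.
    du -sch *
)
```

The -s option summarizes each argument, -c adds the grand total, and -h prints sizes in human-readable units.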
How do I run Materials Studio on my computer?
Materials Studio is available for the use of the CCAST Users. Should a user need to use Materials Studio on their computer please send an email to firstname.lastname@example.org. We would come and install Materials Studio for you.
I want to run a job for a longer time than the queue permits?
Currently, the longest queue we have (the long queue) allows for 2 weeks of computation. We believe this is sufficient for most users. However, if your job needs more than 2 weeks of computation time, please email us at email@example.com
How do I run a sequence of similar jobs on cluster?
If you want to run an array of jobs on the cluster, use
qsub -t 1-5 array.pbs
Instead of 1-5, you can use whatever range matches the number of jobs you want to run. To run specific ids, you can type -t 1,10,20 instead of the range -t 1-5.
NOTE: The only difference between the jobs is the PBS_ARRAYID environment variable, which is available in each job and holds that job's own id within the array.
You can delete jobs from the queue with the same -t syntax. So, qdel -t 1-5 YOUR_JOB_ID will delete all 5 of the array jobs.
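A typical use of PBS_ARRAYID is to select a different input for each job in the array. The sketch below shows one possible array.pbs; the resource request and the input_N.dat / result_N.out naming scheme are hypothetical.

```shell
#!/bin/bash
#PBS -N array_demo
#PBS -l nodes=1:ppn=1
#PBS -l walltime=00:10:00

# Inside an array job TORQUE sets PBS_ARRAYID to this job's index.
# The fallback of 1 only lets the script run outside the batch system.
index=${PBS_ARRAYID:-1}

# Hypothetical naming scheme: input_1.dat, input_2.dat, ...
input_file="input_${index}.dat"
output_file="result_${index}.out"

echo "Array index $index: would process $input_file into $output_file"
```

Submitted with qsub -t 1-5 array.pbs, the five jobs would each pick up their own input file.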
How do I forward an X11 connection from my job
Users may want to use X11-forwarded applications for pre-processing or post-processing. However, since our users are spread across many platforms, there are many reasons for this to fail. Therefore, we strongly discourage users from forwarding X11. In unavoidable circumstances, users can connect to the cluster using SSH with X11 enabled using the following command,
ssh -X USERNAME@clusterX.ccast.ndsu.edu (X should be replaced by cluster number and USERNAME with your username)
Once you are logged in run the following command to use X11 forwarding.
qsub -I -l nodes=1:ppn=8 -q short -l walltime=04:00:00 -X
How do I make a job depend on another?
Add the -W depend=CONDITIONS option to your qsub command. CONDITIONS can be after:jobid[:jobid...] or afterok:jobid[:jobid...], where OK is defined as the job exiting with a 0 exit status. There are a number of dependency options, and it is best to run man qsub on the cluster to determine which job dependency options are best for you.
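The dependency string is just the type followed by colon-separated job IDs. The sketch below assembles it with a small helper and prints the resulting qsub command instead of running it; depend_arg, the job IDs, and postprocess.pbs are all hypothetical names used for illustration.

```shell
#!/bin/bash
# Build the value for qsub's -W option from a dependency type and job IDs.
depend_arg() {
    local type=$1
    shift
    local IFS=:          # makes "$*" join the remaining arguments with ':'
    echo "depend=$type:$*"
}

# Hypothetical job IDs; in practice these are what qsub prints on submission.
first_job=1001.cluster3
second_job=1002.cluster3

dep=$(depend_arg afterok "$first_job" "$second_job")
echo "qsub -W $dep postprocess.pbs"
```

In a real workflow the IDs would usually be captured at submission time, for example first_job=$(qsub step1.pbs), and then passed to the next qsub call.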
Why isn’t my job running?
There can be multiple reasons for this. First, check whether your job is listed in the queue. If it is, your job will run eventually. Remember that this is a shared resource and that all of your jobs will run in time. Try to provide more accurate resource requirements (walltime, processors). The system prefers shorter and smaller jobs, since those are the easiest to schedule. You can use the checkjob JOBID command to get extra information about your job.
If your job is not listed in the queue (running or waiting), this could be due to a problem with either your PBS script or your input file. Please refer to the appropriate software sections of the website for more information about running particular jobs. If you still can't find the problem, contact firstname.lastname@example.org for help.
My question is not here, who can I ask for help?
Please do not hesitate to contact us.
How can I get additional software installed on the clusters?
Just send a support request, and if the software is appropriate for use on the systems, it will be centrally installed. Send the request to email@example.com
Can I install 3rd party modules for software available on the clusters?
You can install 3rd party modules for any software that supports 3rd party modules, under your home directory. The process of installing these 3rd party modules differs from software to software. Below is an example of installing a 3rd party module for Python 3.
First you need to download the tarball for the module you want to install and untar it. Then navigate into the extracted directory and type,
python setup.py install --home=/home/YOURUSERNAME
This would install the 3rd party module in your home directory.
When you want to import this module in a Python program, add the following lines at the top of the script:
import sys
sys.path.append('/home/YOURUSERNAME/lib/python/')
import YOUR_3RD_PARTY_MODULE
If you need any help with installing a 3rd party module for the software, please email us. We would be happy to help you.
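An alternative to editing sys.path in every script is to set the PYTHONPATH environment variable in the shell or job script. This is a sketch of that approach, not something prescribed above; it assumes the module was installed with --home pointing at your home directory, which places it under lib/python there.

```shell
#!/bin/bash
# Directory used by "setup.py install --home=$HOME".
module_dir="$HOME/lib/python"

# Prepend it, keeping any existing PYTHONPATH entries after it.
export PYTHONPATH="$module_dir${PYTHONPATH:+:$PYTHONPATH}"

echo "PYTHONPATH=$PYTHONPATH"
```

With this in place, a plain import YOUR_3RD_PARTY_MODULE should work without touching sys.path.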
How do I get rid of the special characters that get generated when I edit a file in Windows?
When you edit a file in Windows, it adds special characters to the end of each line to mark the line return. You need to remove these special characters for the file to work with Linux. The easiest way to do this is to run the dos2unix command on the file.
This removes the unnecessary characters from your script. For more options, please look at the man page or email us.
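As a sketch of the conversion, the script below creates a small file with Windows line endings, converts it, and checks the result. The fallback using tr is an assumption for machines where dos2unix is not installed; the temporary file is created by the script itself.

```shell
#!/bin/bash
# Create a small script with Windows (CRLF) line endings for the demo.
script=$(mktemp)
printf 'echo one\r\necho two\r\n' > "$script"

if command -v dos2unix >/dev/null 2>&1; then
    dos2unix "$script"
else
    # Fallback when dos2unix is not installed: delete every carriage return.
    tr -d '\r' < "$script" > "$script.tmp" && mv "$script.tmp" "$script"
fi

# Count lines that still contain a carriage return.
cr=$(printf '\r')
remaining=$(grep -c "$cr" "$script")
echo "Lines with carriage returns left: $remaining"
```

For your own files, the command is simply dos2unix followed by the file name.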
How to work with modules
The user can find many applications available on any CCAST cluster. To run these applications correctly, the user must set up the Linux shell environment correctly. On the Thunder cluster, the user can change the shell environment according to the application of interest by using the environment modules as shown below. Each application is named “application_name/version-compiler” in environment modules. If the user does not specify the “version-compiler”, the default one will be selected.
Useful Commands in Environment Modules
|module avail||List available applications.|
|module load application||Load shell environment variables for specific application.|
|module unload application||Unload shell environment variables for specific application.|
|module list||List currently loaded shell environment variables for applications.|
|module swap application1 application2||Swap shell environment variables from application1 to application2.|
|module purge||Unload all currently loaded shell environment variables by environment modules.|