
Introduction to Cluster 2

Architecture of Cluster 2

Cluster 2 consists of 32 nodes, each with 8 processor cores. Each node has 2.66 GHz Penryn processors and 32 GB of memory. The nodes are connected to each other by 1 Gbps Ethernet. Cluster 2 is intended for single-node jobs and is ideal for sequential jobs that fit on one node. The theoretical peak performance of this cluster is approximately 2.7 Tflops.
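The quoted peak figure can be checked with a quick calculation. The sketch below assumes each Penryn core completes 4 double-precision floating-point operations per clock cycle; that value is an assumption and is not stated on this page.

```shell
#!/bin/sh
# Figures taken from the architecture description above.
nodes=32
cores_per_node=8
clock_ghz=2.66
# Assumption: 4 double-precision floating-point operations per cycle per core.
flops_per_cycle=4

# nodes * cores * GHz * flops/cycle gives Gflops; divide by 1000 for Tflops.
peak_tflops=$(awk -v n="$nodes" -v c="$cores_per_node" \
                  -v g="$clock_ghz" -v f="$flops_per_cycle" \
                  'BEGIN { printf "%.2f", n * c * g * f / 1000 }')

echo "Theoretical peak: ${peak_tflops} Tflops"
```

Under that assumption the result rounds to the 2.7 Tflops figure given above.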

The Cluster 2 login node can be accessed using SSH from any computer within the NDSU network (including WiFi) if you have access to the CCAST resources. If you need access to Cluster 2 from an off-campus location, send an email with your IP address, stating why you need to access Cluster 2 from outside the university network.

 

Types of Queues

Cluster 2

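Route Queue | Route Order | Execution Queue | Available Nodes per Queue | Available Processor Cores per Queue | Max Walltime | Max Nodes per Job | Max Processors per Job | Notes
default | 1 | short | 32 | 256 | 8 hours | 1 | 8 | Non-interactive
default | 2 | medium | 16 | 128 | 24 hours | 1 | 8 | Non-interactive
default | 3 | long | 4 | 32 | 1 week | 1 | 8 | Non-interactive
- | - | matlab | 4 | 32 | 8 hours | 4 | 32 |
- | - | devel | 1 | 8 | 4 hours | 1 | 8 | Interactive jobs allowed on this queue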

For more information, please see CCAST User Policies under Queue Policies.

Modules

The Environment Modules package provides an easy way to modify a user's environment dynamically. It enables users to use multiple versions of different software. Each modulefile contains the information needed to configure the shell environment for an application. Once the modules package is initialized, the environment can be modified on a per-module basis using the module command, which interprets modulefiles. Typically, modulefiles instruct the module command to alter or set shell environment variables such as PATH, MANPATH, etc. These modulefiles can be shared by many users on a system.

Modules can be "loaded" and "unloaded" dynamically and atomically. All popular shells are supported, including bash, sh, ksh, zsh, csh, tcsh as well as some scripting languages such as perl and python.

To check the available modules 

$> module avail

To load a module 

$> module load gcc-4.5.3

(Replace 'gcc-4.5.3' with the desired module name)

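To switch to a different version of a loaded module

$> module swap gcc-4.5.3 gcc-4.6.2

(Here 'gcc-4.5.3' is the module currently loaded and 'gcc-4.6.2' is the version to switch to)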

To list the currently loaded modules

$> module list

To unload a module

$> module unload gcc-4.6.2

To unload all loaded modules

$> module purge
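The commands above are often combined at the start of a session or job script so that the environment is in a known state. A minimal sketch, assuming the example module name gcc-4.6.2 from this page exists on the cluster (check with module avail first); the check for the module command is only there so the snippet fails gracefully on a machine without Environment Modules.

```shell
#!/bin/bash
# Example module name taken from this page; replace it with the one you need.
wanted="gcc-4.6.2"

if type module >/dev/null 2>&1; then
    module purge            # start from a clean environment
    module load "$wanted"   # load the requested module
    module list             # show what is loaded now
    status="loaded"
else
    echo "module command not found; run this on a cluster login node" >&2
    status="unavailable"
fi
```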

Cluster 2 FAQs

How do I log into a CCAST Cluster?

There are currently four clusters at CCAST. The following host names can be used:

cluster2.ccast.ndsu.edu
cluster3.ccast.ndsu.edu
thunder.ccast.ndsu.edu
cyrus.ccast.ndsu.edu

How to log in from a Windows computer

The PuTTY SSH client should be used to access any CCAST cluster from a Windows computer. Once PuTTY is installed, double-click the application and enter the host name of the cluster you want to access. For example, to log into the Thunder cluster, enter thunder.ccast.ndsu.edu as the host name.

How to log in from an Apple/Linux computer

Open a terminal window and run the ssh command with your login name and the cluster's host name to access any CCAST cluster.

NOTE: "username" is the user's login name, and the host name can be changed to that of any cluster. For example, to log into the Thunder cluster:

ssh username@thunder.ccast.ndsu.edu

Unlike on a typical desktop, the user cannot execute their application by logging into the compute node directly or running the application on the login nodes.  The user can edit files and submit their application to the batch queue system from the login nodes.

How can I transfer data to a CCAST Cluster?

There are currently four clusters at CCAST. The following host names can be used:

cluster2.ccast.ndsu.edu
cluster3.ccast.ndsu.edu
thunder.ccast.ndsu.edu
cyrus.ccast.ndsu.edu

Transferring between a Windows computer and a CCAST cluster

The WinSCP client should be used to transfer data between your Windows computer and any CCAST cluster. WinSCP is a free download.
Once your download is complete, open the application and fill in the login fields.

Any of the host names from the above list can be entered in the host name field.

Enter your username and password in the appropriate fields, select 'SCP' as the file protocol, and click Login.
Once you are logged in, the WinSCP file browser opens.

Now you can drag and drop your files between your computer and any CCAST cluster.

Transferring between an Apple/Linux computer and a CCAST cluster

You can use the scp command to transfer files between your computer and any CCAST cluster. Enter the following commands in a terminal on your own computer.

To transfer files from a CCAST cluster to your computer:

$> scp [username@hostname]:[source-file] [destination]

For example, you can transfer files from Cluster 3 to your computer as shown below:

$> scp your_username@cluster3.ccast.ndsu.edu:/home/your_username/myfile.txt /home/mycomputer/myfile.txt

To transfer files from your computer to a CCAST cluster:

$> scp [source-file] [username@hostname]:[destination]

For example, you can transfer files from your computer to the Thunder cluster as shown below:

$> scp myfile.txt your_username@thunder.ccast.ndsu.edu:/home/your_username

Any of the host names from the above list can be substituted for the host name in these commands.

How do I work with Batch Queue Systems (PBS/Torque)?

Some useful commands in the batch queue system:

Here at CCAST, we use TORQUE, which is an open source implementation of PBS. The batch system, or resource manager, divides up larger computer systems so that multiple users and their jobs can run on them. Another part of the batch system is the scheduler, which decides where and when a job will run. A user requests resources from the resource manager in terms of time, the number of compute nodes, and the resources on those nodes, which may be RAM, network, disk, software licenses, etc. Once a job is submitted to the batch system, the scheduler decides where and when that job will run.

Below you will find information on:

  • A Sample PBS batch script
  • Environment variables available to your script
  • Batch system commands

Sample batch script

The best way to demonstrate using the batch system is with an input script example. Below is a sample script that can be modified to run a job; more application-specific scripts can be found on the interactive nodes under /usr/local/PBS_EXAMPLES. More information can be found in the manual page for qsub by typing man qsub. The '#PBS' directives are significant: the scripting language ignores them as comments, but the batch system reads them as special directives. To submit a job, simply type qsub $SCRIPTNAME.

PBS script example

#!/bin/bash
#
# file name: sample.pbs
# usage: qsub sample.pbs
#
# nickname of your job
#PBS -N my_first_job
#
# resource limits: number of nodes and number of processors per node to be used
# In this case, requesting a single node and eight processors on that node.
# nodes: number of compute nodes
# ppn: number of processors per node
#PBS -l nodes=1:ppn=8
#
# resource limits: amount of memory to be used
#PBS -l mem=1024mb
#
# resource limits: maximum wall clock time that can be allocated
#PBS -l walltime=3:20:00
#
# path and filename of standard output
#PBS -o path/filename.o
#
# path and filename of standard error
#PBS -e path/filename.e
#
# queue name, one of {default, special express}
# The default queue, "default", need not be specified
#PBS -q default
#
# user's email address
#PBS -M my-email-address
#
# send an email when the job aborts with an error (a), begins (b), and ends (e)
# (combined into one directive, since repeated -m lines may override one another)
#PBS -m abe
#
# export all current shell environment variables to the job
#PBS -V
#
/path/to/the/executable

Batch environment variables available in your script

The batch system provides some environment variables to your script. Some of the more useful ones are listed below.

Batch Variables
Variable | Description
PBS_JOBID | The unique job identifier of your job. This can be used to create scratch directories or other unique identifiers that are specific to your job.
PBS_O_WORKDIR | The working directory from which you originally submitted your job. It is common to use 'cd $PBS_O_WORKDIR' or to access input/output files via $PBS_O_WORKDIR.
PBS_JOBNAME | The name specified with the -N option when submitting your script.
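As an illustration of how these variables are typically used, here is a minimal sketch of a job script body. The job name, the scratch location (TMPDIR or /tmp), and the fallback values are assumptions for illustration; the fallbacks only exist so the script can be dry-run outside the batch system, where the PBS variables are not set.

```shell
#!/bin/bash
#PBS -N env_demo
#PBS -l nodes=1:ppn=8
#PBS -l walltime=0:10:00

# Inside a batch job these are set by the batch system.
# Outside one, fall back to local values so the script can be dry-run.
workdir="${PBS_O_WORKDIR:-$PWD}"
jobid="${PBS_JOBID:-dryrun.$$}"
jobname="${PBS_JOBNAME:-env_demo}"

# Return to the directory the job was submitted from.
cd "$workdir" || exit 1

# Create a per-job scratch directory (location is an assumption).
scratch="${TMPDIR:-/tmp}/${jobname}.${jobid}"
mkdir -p "$scratch"
echo "Job $jobname ($jobid) running in $workdir, scratch in $scratch"

# ... run your application here ...

# Clean up the scratch directory when the job is done.
rm -rf "$scratch"
```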
Batch system commands
Command | Example | Description
qstat | qstat | Display the status of all jobs in the batch system.
qstat | qstat -f JOBID | Display detailed information about a specific job.
qsub | qsub submit_script | Submit a job to the batch system.
qdel | qdel JOBID | Terminate a specific job.
qalter | qalter -N nickname JOBID | Change the nickname of a specific job.
qmove | qmove queue JOBID | Move a specific job to another queue.

 

 

How to start using Materials Studio?

You have to contact the CCAST support staff if you need to start using Materials Studio. We would be glad to come down to your workplace to install it for you.

How to run interactive jobs in the cluster?

Please use the command below to run interactive jobs in Cluster 2 and Cluster 3:
qsub -I -l nodes=1:ppn=8 -l walltime=4:00:00 -q devel

Please use the command below to run interactive jobs in Thunder:
qsub -I -l nodes=1:ppn=8 -l walltime=4:00:00 -q def-devel 

Note: Please limit your interactive jobs to a maximum of 2 hours.

What kind of jobs should I run on Cluster 2?

Cluster 2 is ideal for single node 8 processor sequential jobs.

Can I access individual compute nodes?

Yes, you can access an individual node if you have a job running on that particular node. For example, if you have a job running on cluster3-24 and cluster3-25, you can log in to these nodes by typing

ssh cluster3-xx (replace xx with the node number)

NOTE: Access is not allowed unless you have a job already running on the node through the batch system.

My rsync request times out before the process is complete?

When you issue an rsync command, it runs on the cluster head node. The cluster head node has a 30-minute time limit for running commands. Therefore, if you are trying to rsync a large amount of data, the transfer will time out after 30 minutes. The best option is to split the data into multiple directories and rsync them separately, one by one. The other option is to remove the 'z' option from the command, which saves the time spent on compression:

rsync -av [SOURCE] [DESTINATION]

When using rsync, be extremely careful about the trailing slash. For example, look at the following two rsync commands:

#1: rsync -av /some/path/a/ /some/otherpath/
#2: rsync -av /some/path/a /some/otherpath/

The first command will make /some/otherpath/ mirror the content of /some/path/a/, whereas the second command will create a directory named a inside /some/otherpath/ to mirror the content of /some/path/a. Therefore, be extremely mindful of which behavior you want.
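The trailing-slash behavior is easy to try safely on throwaway data before running a real transfer. The sketch below uses a temporary directory with hypothetical names (a, dest1, dest2) and only runs rsync if it is installed.

```shell
#!/bin/sh
# Build a small throwaway source tree and two empty destinations.
tmp=$(mktemp -d)
mkdir -p "$tmp/a" "$tmp/dest1" "$tmp/dest2"
echo "hello" > "$tmp/a/file.txt"

if command -v rsync >/dev/null 2>&1; then
    # With a trailing slash: the CONTENTS of a are copied into dest1.
    rsync -a "$tmp/a/" "$tmp/dest1/"
    # Without a trailing slash: the directory a itself is created in dest2.
    rsync -a "$tmp/a" "$tmp/dest2/"

    ls "$tmp/dest1"   # lists: file.txt
    ls "$tmp/dest2"   # lists: a
fi
# Remove the test data with: rm -rf "$tmp"
```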

How to find the nodes my job is running on?

While your job is running, you can use 

qstat -n JOB_ID

to get the node information.

NOTE: The first node is MPI node 0 of your job.

Example: Your output would look something like this:

cluster3-60/7+cluster3-60/6+cluster3-60/5+cluster3-60/4+cluster3-60/3

This means that your job runs on processors 3, 4, 5, 6, and 7 of Cluster 3 node 60.
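If a job spans many cores, the node list can be summarized with standard text tools. A small sketch, using the example string above as input; on the cluster you would paste in the node list from your own qstat -n output instead.

```shell
#!/bin/sh
# Example node list copied from the qstat -n output shown above.
exec_host="cluster3-60/7+cluster3-60/6+cluster3-60/5+cluster3-60/4+cluster3-60/3"

# Entries are separated by '+'; each entry has the form node/processor.
nodes=$(printf '%s\n' "$exec_host" | tr '+' '\n' | cut -d/ -f1 | sort -u)
cores=$(printf '%s\n' "$exec_host" | tr '+' '\n' | cut -d/ -f2 | sort -n | paste -sd, -)

echo "nodes: $nodes"   # prints: nodes: cluster3-60
echo "cores: $cores"   # prints: cores: 3,4,5,6,7
```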

How do I know how much storage I’m using in GPFS Scratch?

Navigate to the GPFS scratch directory and enter,

du -sch

This will show you the amount of storage you are using in GPFS Scratch.  If you want more information on your sub-directories do:

du -sch *

 

How do I find out how much memory my job is using?

First, find out which nodes are running your job. Then go to cluster2.ccast.ndsu.edu in your web browser and select the node that you want to monitor. There you will be able to see the processor usage, memory usage, and much more information about your job.

How do I run Materials Studio on my computer?

Materials Studio is available for the use of the CCAST Users. Should a user need to use Materials Studio on their computer please send an email to support@ccast.ndsu.edu. We would come and install Materials Studio for you. 

I want to run a job for a longer time than the queue permits?

Currently, the longest queue we have (the long queue) allows for 2 weeks of computation. We believe this is sufficient for most users. However, if your job needs more than 2 weeks of computation time, please email us at support@ccast.ndsu.edu

How do I run a sequence of similar jobs on cluster?

If you want to run an array of jobs on the cluster, use

qsub -t 1-5 array.pbs

Instead of 1-5, you can use any range that matches the number of jobs you want to run. If you want to run specific array IDs, you can type a list such as -t 1,10,20 instead of the range -t 1-5.

NOTE: The only difference between the jobs is that the PBS_ARRAYID environment variable is set in each one, holding that job's index within the array (for example, 1 through 5 for -t 1-5).

You can delete jobs from the queue with the same -t syntax. So, qdel -t 1-5 YOUR_JOB_ID[] will delete all 5 of the array jobs.
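A minimal sketch of what an array.pbs script might look like, using PBS_ARRAYID to select a different input file for each array member. The file naming scheme (input_N.dat, result_N.out) and resource requests are hypothetical; the fallback value of 1 only exists so the script can be dry-run outside the batch system.

```shell
#!/bin/bash
#PBS -N array_demo
#PBS -l nodes=1:ppn=1
#PBS -l walltime=0:10:00

# PBS_ARRAYID is set by the batch system for each member of the array.
# Default to 1 so the script can be tested outside a batch job.
task="${PBS_ARRAYID:-1}"

# Hypothetical naming scheme: one input and one output file per task.
input="input_${task}.dat"
output="result_${task}.out"

echo "Array task $task reads $input and writes $output"
# /path/to/the/executable "$input" > "$output"
```

Submitted with qsub -t 1-5 array.pbs, each of the five jobs would work on its own numbered file.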

How do I forward an X11 connection from my job?

Users may want to use X11-forwarded applications for pre-processing or post-processing. However, since our users are spread across many platforms, there are many reasons for this to fail. Therefore, we strongly discourage users from forwarding X11. In unavoidable circumstances, users can connect to a cluster using SSH with X11 enabled using the following command:

ssh -X USERNAME@clusterX.ccast.ndsu.edu (X should be replaced by cluster number and USERNAME with your username)

Once you are logged in run the following command to use X11 forwarding.

qsub -I -l nodes=1:ppn=8 -q short -l walltime=04:00:00 -X 

 

How do I make a job depend on another?

Add the -W depend=CONDITIONS option to your qsub command.

CONDITIONS can be after:jobid[:jobid...] or afterok:jobid[:jobid...], where OK means the job exited with a 0 exit status. There are a number of dependency options, and it's best to run man qsub on the cluster to determine which job dependency options are best for you.
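As a sketch of how a dependency is usually chained: qsub prints the ID of the job it submits, so that ID can be captured and passed to the second submission. The script names first.pbs and second.pbs and the function name submit_chain are hypothetical.

```shell
#!/bin/bash
# Submit two scripts so that the second starts only if the first exits with status 0.
submit_chain() {
    # qsub prints the new job's ID on standard output; capture it.
    first_id=$(qsub "$1") || return 1
    qsub -W "depend=afterok:${first_id}" "$2"
}

if command -v qsub >/dev/null 2>&1; then
    submit_chain first.pbs second.pbs
else
    echo "qsub not found; run this on a cluster login node" >&2
fi
```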

Why isn’t my job running?

There can be multiple reasons for this. First, check whether your job is listed in the queue. If it is, your job will run eventually. Remember that this is a shared resource, and that eventually all of your jobs will run. Try to provide more accurate resource requirements (walltime, processors). The system prefers shorter and smaller jobs, because those are the easiest to schedule. You can use the checkjob JOBID command to get extra information about your job.

If your job is not listed in the queue (running or waiting), this could be due to a problem with either your PBS script or your input file. Please refer to the appropriate software sections of the website for more information about running particular jobs. If you still can't find the problem, contact support@ccast.ndsu.edu for help.

My question is not here, who can I ask for help?

Please do not hesitate to contact us.

How can I get additional software installed on the clusters?

Just send a support request, and if the software is appropriate for use on the systems, it will be centrally installed.  Send the request to support@ccast.ndsu.edu

Can I install 3rd party modules for software available on the clusters?

You can install 3rd party modules under your home directory for any software that supports them. The installation process differs from software to software. Below is an example of installing a 3rd party module for Python 3.

First you need to download the tarball for the module you want to install and untar it. Then navigate into the extracted directory and type,

python setup.py install --home=/home/YOURUSERNAME

This would install the 3rd party module in your home directory.
When you want to import this module in a Python program, add the following lines at the top of the script:

import sys
sys.path.append('/home/YOURUSERNAME/lib/python/')
import YOUR_3RD_PARTY_MODULE

If you need any help with installing a 3rd party module, please email us. We would be happy to help you.

How do I get rid of the special characters that get generated when I edit a file in Windows?

When you edit a file in Windows, it adds a special character to the end of each line as part of the Windows line ending. You need to remove these special characters for the file to work on Linux. The easiest way to do this is to use the dos2unix command.

Just type:

$> dos2unix /path/to/filename

This removes the unnecessary characters from your script (replace /path/to/filename with the path to your file). For more options, please look at the man page or email us.
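To check whether a file actually contains Windows line endings, or to convert it where dos2unix is not installed, standard tools can be used instead. The sketch below is an alternative technique, not the dos2unix method described above: it builds a small test file with carriage returns and strips them with tr.

```shell
#!/bin/sh
# Create a throwaway file with Windows-style (CR+LF) line endings.
tmp=$(mktemp)
printf 'echo hello\r\necho world\r\n' > "$tmp"

# A carriage return character, used as the search pattern.
cr=$(printf '\r')

# Count the lines that contain a carriage return before conversion.
before=$(grep -c "$cr" "$tmp")

# Delete every carriage return, writing the result to a new file.
tr -d '\r' < "$tmp" > "$tmp.unix"

# Count again after conversion.
after=$(grep -c "$cr" "$tmp.unix")

echo "lines with CR before: $before, after: $after"
```

Writing to a new file rather than overwriting the original keeps the input intact until you have checked the result.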

How to work with modules

Environment Modules

The user can find many applications available on any CCAST cluster. To run these applications correctly, the user must set up the Linux shell environment correctly. On the Thunder cluster, the user can change the shell environment according to the application of interest by using the environment modules as shown below. Applications are named "application_name/version-compiler" in environment modules. If the user does not specify the "version-compiler" part, the default one will be selected.

Useful Commands in Environment Modules

Command | Description
module avail | List available applications.
module load application | Load shell environment variables for a specific application.
module unload application | Unload shell environment variables for a specific application.
module list | List the applications whose shell environment variables are currently loaded.
module swap application1 application2 | Swap shell environment variables from application1 to application2.
module purge | Unload all shell environment variables currently loaded by environment modules.

 



CCAST Support
Phone: 701.231.5184
Physical/delivery address:  1805 NDSU Research Park Drive/Fargo, ND 58102
Mailing address:  P.O. Box 6050—Dept. 4100/Fargo, ND 58108-6050
Page manager: CCAST

Last Updated: Friday, July 28, 2017 8:31:08 AM