The Luna Cluster
For quick reference, download the LSF Cheat Sheet. For further assistance, email Nicholas Socci: soccin@mskcc.org.
Directories
/home/$USERNAME – 100GB quota, backed up (mirrored)
- Use home for scripts/programs/workfiles
- Not for intermediate files or results
/ifs/work/$LABNAME/$USERNAME – 2TB quota (more can be requested)
- Fast disk
- Use for scratch/intermediate files
/ifs/res/$LABNAME/$USERNAME
- Medium-performance disk; use for intermediate-term storage of results
/opt/common
- binaries organized by OS/PROGRAM/VERSION
- E.g.: PERL
- /opt/common/CentOS6/perl/
perl-5.16.3
perl-5.20.1
perl-5.20.2
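For example, to run a specific version directly (this sketch assumes each version directory has the usual bin/ subdirectory; check the actual layout):
/opt/common/CentOS6/perl/perl-5.20.2/bin/perl --version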
LSF Commands
Before you can run LSF commands you need to source the following file:
/common/lsf/conf/profile.lsf
Add this to your .profile, .bashrc, or similar.
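For example, the line to append to your ~/.bashrc:
# make the LSF commands (bsub, bjobs, ...) available in every shell
source /common/lsf/conf/profile.lsf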
Simple command:
bsub sleep 30
To request N CPUs use:
-n N
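For example, to request 4 CPUs (the program name is hypothetical):
bsub -n 4 ./my_program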
For memory:
-R "rusage[mem=GB]"
-R "rusage[mem=10]"
Requests 10GB
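For example, to run a job with 10GB of memory (the program name is hypothetical):
bsub -R "rusage[mem=10]" ./my_program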
If your job will run longer than 1 hour, please specify the expected run time with:
-We HOURS:MINUTES
e.g., for a 2-hour job:
-We 2:00
or
-We 120
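For example, a job expected to run about two hours (the script name is hypothetical):
bsub -We 2:00 ./long_analysis.sh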
To send output to a file, use:
-o out.txt
or
-o DIR/
to write to a directory.
- bsub -I – run the job interactively
- bsub -J NAME – name the job
- bsub -w "NAME" – wait on the named job
- bsub -w $jobid – wait on the job with the given job ID
- bsub -o filename – redirect stdout to the file filename
- bsub -e filename – redirect stderr to the file filename
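Putting several of these options together (job, script, and file names are hypothetical):
bsub -n 2 -R "rusage[mem=8]" -We 4:00 -J align_sample1 -o align_sample1.out -e align_sample1.err "./align.sh sample1"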
The current defaults for jobs are:
-R "rusage[mem=1]"
-R "span[hosts=1]"
-R "rusage[iounits=1]"
The quotes are necessary.
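These are only defaults; passing your own -R should override them. For example, to request 32GB instead of the 1GB default (the program name is hypothetical):
bsub -R "rusage[mem=32]" ./big_mem_job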
Notes
- Do not submit jobs from /ifs/data, do not read input data from /ifs/data, and do not write any results there. It is only mounted on the head nodes.
- Automatic emailing is turned off; if you want email notification you must request it in your bsub command.
- -We is the expected runtime. Anything less than 60 (minutes) is considered a short job, which can run on all the nodes; long jobs (anything not short) can take up to 20% of the s nodes and half of the t nodes.
- Stdout will go to the -o file. If you want to redirect output yourself, you must put quotes around the execution command in the bsub command. Example:
bsub -We 1 -J jobName -o output_file.txt "ls -al 1> redirect_file.txt"
- Hold option in bsub:
-w "post_done($PREV_JOBNAME)"
Use post_done for holding, not "done" (done sometimes fires too early).
If holding for multiple jobs with similar names, -w "post_done($prev_Job*)" will work. This will ONLY let the job run if the $PREV_JOBNAME job completed with exit status 0 AND finished its post-done processing (not sure what that is).
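A minimal sketch of a two-step pipeline using named jobs and post_done (job and script names are hypothetical):
bsub -J step1_sample1 -o step1.out "./step1.sh sample1"
bsub -J step2_sample1 -w "post_done(step1_sample1)" -o step2.out "./step2.sh sample1"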
More Notes
- Kill all my jobs:
bkill -b -u $USERNAME 0
- bkill -b is much better to use if you are killing multiple jobs; otherwise it takes a lot longer and has the potential to crash the system.
- bjobs shows your running jobs; bjobs -a also includes jobs finished within the last 3 days
- bjobs -w shows full job names
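For example, to list all of your recent jobs (including finished ones) with their full names:
bjobs -a -w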
To search further back through older jobs, use:
bhist -n N -a -J "JOBNAME"
-n is how many event files to go through (jobs are rotated to older event files roughly every day),
-a means include old jobs,
-J is the job name (can contain wildcards).
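For example, to look back through roughly the last four days of event files for jobs matching a (hypothetical) name pattern:
bhist -n 4 -a -J "align_*"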
bresources -g -l shows what guaranteed resources are being used
brequeue -e jobID requeues a job that died (exited), so it runs again.
bmod -wn jobID removes the wait dependencies from a job so it will run.
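For example, with a hypothetical job ID of 123456:
brequeue -e 123456
bmod -wn 123456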
The Luna Cluster is made up of:
- luna.cbio.mskcc.org (login host): HP DL380 Gen8, one Xeon E5-2650 v2 @ 2.60GHz, 64GB RAM
- 62 compute nodes, 1024 cores total (2048 threads total)
- u01-u36: 36 HP ProLiant DL160 Gen9, dual 8-core Xeon E5-2640 v3 @ 2.60GHz, 256GB RAM per node
- s01-s24: 24 HP ProLiant DL160 Gen8, dual 8-core Xeon E5-2660 0 @ 2.20GHz, 384GB RAM per node
- t01-t02: 2 HP ProLiant DL580 Gen8, quad 8-core Xeon E7-4820 v2 @ 2.00GHz, 1.5TB RAM per node
- Compute nodes have 800GB of local scratch at /scratch/$USER
- SolISI (Isilon array): 1.5–2 PB (NL and X)
Luna is the head node for submitting jobs to the cluster.
Some nodes have internet access. We will describe how to access these nodes in the future.