
The Luna Cluster

For quick reference, download the LSF Cheat Sheet. For further assistance, email Nicholas Socci: soccin@mskcc.org.

Directories

/home/$USERNAME – 100 GB quota, backed up (mirrored)

  • Use home for scripts/programs/workfiles
  • Not for intermediate files or results

/ifs/work/$LABNAME/$USERNAME – 2 TB quota (can request more)

  • Fast disk
  • Use for scratch/intermediate files

/ifs/res/$LABNAME/$USERNAME

  • Medium-performance disk. Use for intermediate-term storage of results

/opt/common

  • binaries organized by OS/PROGRAM/VERSION
  • E.g.: PERL
    • /opt/common/CentOS6/perl/
      • perl-5.16.3
      • perl-5.20.1
      • perl-5.20.2

LSF Commands

Before you can run LSF you need to source the following file:

/common/lsf/conf/profile.lsf

Add to .profile or .bashrc or …
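For example, to make the LSF commands available in every new shell, you could append the source line to your rc file (a sketch; adjust to whichever rc file your shell actually reads):

```shell
# Make LSF commands (bsub, bjobs, bkill, ...) available in every new shell.
# Appends the source line from above to ~/.bashrc, then loads it now.
echo 'source /common/lsf/conf/profile.lsf' >> ~/.bashrc
source ~/.bashrc
```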

Simple command:

bsub sleep 30

To request N CPUs, use:

-n N

For memory:

-R "rusage[mem=GB]"
-R "rusage[mem=10]"

The second form requests 10 GB.
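Putting the CPU and memory options together, a submission requesting 4 cores and 10 GB might look like this (the job script name is a placeholder):

```shell
# Request 4 CPU cores and 10 GB of memory.
# "./run.sh" stands in for your own command or script.
bsub -n 4 -R "rusage[mem=10]" ./run.sh
```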

If your job will run longer than 1 hour, please specify the expected runtime with:

-We HOURS:MINUTES

e.g., for a 2-hour job:

-We 2:00

or

-We 120

To send output to a file, use:

-o out.txt

or

-o DIR/

to write to a directory.

  • bsub -I – run interactively
  • bsub -J NAME – name the job
  • bsub -w "NAME" – wait on the named job
  • bsub -w $jobid – wait on a job by number
  • bsub -o filename – redirect stdout to the file filename
  • bsub -e filename – redirect stderr to the file filename
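Combining the options above, a named 2-hour job with stdout and stderr captured to separate files could be submitted like this (job name, file names, and script are placeholders):

```shell
# Named 2-hour job on 2 cores; stdout and stderr go to separate files.
bsub -J align01 -We 2:00 -n 2 \
     -o align01.out.txt -e align01.err.txt \
     ./align.sh
```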

The current defaults for jobs are:

-R "rusage[mem=1]" \
-R "span[hosts=1]" \
-R "rusage[iounits=1]"

Quotes are necessary.

Notes

  • Do not submit jobs from /ifs/data and do not source data from /ifs/data. It is mounted only on the head nodes, so do not read inputs from it or write any results to it.
  • Automatic emailing is turned off; you need to request it explicitly in the bsub command
  • '-We' is the expected runtime. Anything less than 60 minutes is considered a short job, which can run on all the nodes; long jobs can take up to 20% of all s nodes and half of the t nodes
  • Stdout goes to the -o file. If you want to redirect it yourself, wrap the executed command in quotes in the bsub command. Example:
    bsub -We 1 -J jobName -o output_file.txt "ls -al 1> redirect_file.txt"
  • Hold option in bsub:
    -w "post_done($PREV_JOBNAME)"

Use post_done for holding, not "done" (done sometimes triggers too early).

If holding for multiple jobs with similar names, -w "post_done($prev_Job*)" will work. This will ONLY let the held job run if the $PREV_JOBNAME job completed with exit status 0 AND finished its post-done processing.
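As a sketch, chaining two hypothetical jobs with post_done so the second starts only after the first finishes cleanly (job names and scripts are placeholders):

```shell
# Step 1: submit the first job with a name we can depend on.
bsub -J step1 -We 1:00 -o step1.txt ./step1.sh

# Step 2: held until step1 exits 0 and its post-done processing finishes.
bsub -J step2 -We 1:00 -o step2.txt -w "post_done(step1)" ./step2.sh
```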

More Notes

  • Kill all my jobs:
    bkill -b -u $USERNAME 0
  • bkill -b is much better when killing multiple jobs. Without it, the kill takes a lot longer and can potentially crash the system.
  • bjobs lists your running jobs; bjobs -a also includes jobs finished within the last 3 days
  • bjobs -w shows full job names

To search through older jobs, use:

bhist -n N -a -J "JOBNAME"

-n is how many events files to go through (jobs are rotated to older events files ~ every day),

-a means old jobs,

-J is job name (can contain wildcard).

bresources -g -l shows what guaranteed resources are being used

brequeue -e jobID resubmits a job that died, so you can try it again

bmod -wn jobID removes the wait dependencies from a job so it will run.
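For example, recovering a failed job and releasing a stuck one (the job IDs are placeholders):

```shell
# Resubmit a job that exited abnormally.
brequeue -e 12345

# Release a held job whose wait dependency will never be satisfied.
bmod -wn 67890
```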

http://www.ccs.miami.edu/hpc/lsf/7.0.6/admin/

https://www.american.edu/cas/hpc/upload/LSF-commands.pdf

The Luna Cluster is made up of:

  • luna.cbio.mskcc.org (login host): HP DL380 Gen8, one Xeon E5-2650 v2 @ 2.60GHz, 64 GB RAM
  • 62 compute nodes, 1024 cores total (2048 threads total)
    • u01-u36: 36 HP ProLiant DL160 Gen9, dual 8-core Xeon E5-2640 v3 @ 2.60GHz, 256 GB RAM per node
    • s01-s24: 24 HP ProLiant DL160 Gen8, dual 8-core Xeon E5-2660 0 @ 2.20GHz, 384 GB RAM per node
    • t01-t02: 2 HP ProLiant DL580 Gen8, quad 8-core Xeon E7-4820 v2 @ 2.00GHz, 1.5 TB RAM per node

Compute nodes have 800 GB of local scratch at /scratch/$USER

  • SolISI (Isilon array): 1.5–2 PB (NL and X)

Luna is the head node for submitting jobs to the cluster.

Some nodes have internet access. We will describe how to access these nodes in the future.
