Dione user instructions » History » Version 6

Anonymous, 2019-10-04 15:57

h1. User instructions for Dione cluster

University of Turku
Åbo Akademi
Jussi Salmi (jussi.salmi@utu.fi)

h2. 1. Resources

h3. 1.1. Computation nodes

<pre>
PARTITION NODES NODELIST  MEMORY
normal    36    di[1-36]  192GB
gpu       6     di[37-42] 384GB
</pre>

Dione has 6 GPU nodes for calculations that benefit from very fast, massively parallel number crunching, e.g. neural networks. The other 36 nodes are general-purpose compute nodes. The nodes are connected via a fast InfiniBand network, which enables MPI (Message Passing Interface) usage in the cluster. In addition, the cluster is connected to the EGI grid (European Grid Infrastructure) and NORDUGRID, which are allowed to use a part of the computational resources.

The website https://p55cc.utu.fi/ contains information on the cluster and a cluster monitor, and provides instructions on getting access to and using the cluster.

h3. 1.2. Disk space

The system has an NFSv4 file system with 100TB capacity on the home partition. The system is not backed up anywhere, so users must handle backups themselves.
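
Since backups are left to the user, one simple approach is to pack a directory into a dated archive and then copy that archive off the cluster. The function below is only a sketch: the name @archive_dir@ and the example paths are made up for illustration, and it assumes the standard @tar@ and @date@ tools are available.

```shell
# archive_dir SRC DEST_DIR
# Pack the directory SRC into DEST_DIR/<name>-<YYYYMMDD>.tar.gz
# and print the path of the archive. (Hypothetical helper, not part of the cluster.)
archive_dir() {
    src=${1%/}                      # drop one trailing slash, if any
    dest=$2
    name=$(basename "$src")
    archive="$dest/$name-$(date +%Y%m%d).tar.gz"
    mkdir -p "$dest" || return 1
    # -C switches to the parent directory so the archive stores relative paths
    tar -czf "$archive" -C "$(dirname "$src")" "$name" || return 1
    echo "$archive"
}
```

For example, @archive_dir "$HOME/project" "$HOME/backups"@ would create an archive of a hypothetical @project@ directory. An archive stored on the same file system is not yet a backup, so it still has to be copied to another machine, for instance with @scp@ or @rsync@ run from your own computer.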

h3. 1.3. Software

The system uses the SLURM workload manager (Simple Linux Utility for Resource Management) for scheduling jobs.

The cluster uses the module system for loading software modules, so that different versions of the software can be selected for execution.

h2. 2. Executing jobs in the cluster

Users may not execute jobs on the login node. All jobs must be dispatched to the cluster with SLURM commands. Normally a script is used to define the jobs and the parameters for SLURM. A large number of parameters and environment variables can be used to define how the jobs should be executed; see the SLURM manual for a complete list.

A typical script for starting jobs can look as follows (file name: batch-submit.job):

<pre>
#!/bin/bash
#SBATCH --job-name=test
#SBATCH -o result.txt
#SBATCH --workdir=<Workdir path>
#SBATCH -c 1
#SBATCH -t 10:00
#SBATCH --mem=10M
module purge # Purge modules for a clean start
module load <desired modules if needed> # You can either inherit module environment, or insert one here

srun <executable>
srun sleep 60
</pre>

The script is run with

<pre>
sbatch batch-submit.job
</pre>

The script defines several parameters that will be used for the job.

<pre>
--job-name    defines the name of the job
-o result.txt redirects the standard output to result.txt
--workdir     defines the working directory (newer SLURM releases call this option --chdir)
-c 1          sets the number of CPUs per task to 1
-t 10:00      sets the time limit of the job to 10 minutes; after that the process is stopped
--mem=10M     sets the memory required per node to 10MB
</pre>

srun starts a task within the job. When the script is submitted, SLURM gives the job a job ID, which can be used to track its execution with e.g. the squeue command.
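
The submit-and-track workflow described above could be sketched as a small shell function. The name @submit_and_show@ is made up for this example. It assumes that @sbatch@ and @squeue@ are available, as they should be on the login node, and it uses the @--parsable@ option of @sbatch@, which prints only the job ID (optionally followed by a semicolon and the cluster name).

```shell
# submit_and_show SCRIPT
# Submit a batch script and show the resulting job in the queue.
# (Hypothetical helper for illustration only.)
submit_and_show() {
    script=$1
    if ! command -v sbatch >/dev/null 2>&1; then
        echo "sbatch not found; run this on the cluster login node" >&2
        return 1
    fi
    # --parsable prints "jobid" or "jobid;cluster" instead of a full sentence
    jobid=$(sbatch --parsable "$script") || return 1
    jobid=${jobid%%;*}              # keep only the job ID part
    echo "Submitted job $jobid"
    squeue -j "$jobid"              # show this job's entry in the queue
}
```

It would be called as @submit_and_show batch-submit.job@. The job ID it prints can then be passed to the other SLURM commands, such as @scontrol show job@ or @scancel@.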

h2. 3. The module system

Many of the software packages in Dione require you to load the corresponding environment modules before using the software. Different versions of the software can be selected with the module command.

<pre>
module avail                Show available modules
module list                 Show loaded modules
module unload <module>      Unload a module
module load <module>        Load a module
module load <module>/10.0   Load version 10.0 of <module>
module purge                Unload all modules
</pre>

h2. 4. Useful commands in SLURM

<pre>
sinfo                        Shows the current status of the cluster
sinfo -p gpu                 Shows the status of the GPU partition
sinfo -O all                 Shows a comprehensive status report node by node

sstat <job id>               Shows status information on your running job

squeue                       Shows the status of the job queue
squeue -u <username>         Shows only your jobs

srun <command>               Dispatches a job to the scheduler

sbatch <script>              Submits a script defining jobs to be run

scontrol                     Views and modifies the state of your jobs
scontrol show job <job id>   Shows details about the job
scontrol -u <username>       Attempts a job update as the given user; to list a user's jobs, use squeue -u <username>

scancel <job id>             Cancels a job
scancel -u <username>        Cancels all your jobs
</pre>
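
The commands above can also be combined in scripts. As a minimal sketch, the function below polls @squeue@ until a job no longer appears in the queue. The name @wait_for_job@ and the default polling interval of 30 seconds are assumptions made for this example. The @-h@ option suppresses the header line of @squeue@, and @-j@ restricts the output to the given job.

```shell
# wait_for_job JOBID [INTERVAL]
# Poll squeue until the job is no longer listed, then print a message.
# (Hypothetical helper; INTERVAL is in seconds and defaults to 30.)
wait_for_job() {
    jobid=$1
    interval=${2:-30}
    # With -h there is no header, so any output means the job is still listed
    while [ -n "$(squeue -h -j "$jobid" 2>/dev/null)" ]; do
        sleep "$interval"
    done
    echo "Job $jobid is no longer in the queue"
}
```

For example, @wait_for_job <job id> 60@ would check once a minute. Leaving the queue only means that the job has ended, not that it succeeded, so the output file still needs to be checked.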

h2. 5. Further information

For further information, contact the administrators (fgi-admins@lists.utu.fi).