Wednesday, March 16, 2016

File Placement Optimizer (FPO) setup with Spectrum Scale 4.2

What is the File Placement Optimizer (FPO)?

 

         GPFS File Placement Optimizer (FPO) is a set of features that allows GPFS to operate efficiently in systems based on a shared-nothing architecture. It is particularly useful for "big data" applications that process massive amounts of data.

Why this post?

 

          The Spectrum Scale 4.2 installer toolkit does not support enabling an FPO configuration at installation time. This blog provides a step-by-step guide for configuring an FPO setup on top of a Spectrum Scale installer toolkit installation.

Where to start?

 

               Let's start by extracting and configuring the Spectrum Scale installer toolkit, just as you would for a regular setup. Here are the details of my setup as given to the installer toolkit. If you are looking for more help with the installer toolkit, you will find it here - Overview of the spectrumscale installation toolkit
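
If you are building the toolkit definition from scratch, the configuration below can be produced with commands roughly along these lines. This is only a sketch based on my hostnames and IPs; the exact flags can vary between toolkit releases, so verify them with ./spectrumscale node add -h. (The NSD server role on viknode1 shows up automatically once an NSD is added with viknode1 as its server.)

[root@viknode1 installer]# ./spectrumscale setup -s 10.0.100.71
[root@viknode1 installer]# ./spectrumscale config gpfs -c vwnode.gpfscluster
[root@viknode1 installer]# ./spectrumscale node add viknode1 -a -q
[root@viknode1 installer]# ./spectrumscale node add viknode2 -q
[root@viknode1 installer]# ./spectrumscale node add viknode3 -q
[root@viknode1 installer]# ./spectrumscale node add viknode4 -m -p
[root@viknode1 installer]# ./spectrumscale node add viknode5 -m -p
[root@viknode1 installer]# ./spectrumscale config protocols -e 10.0.100.76,10.0.100.77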

[root@viknode1 installer]# ./spectrumscale node list
[ INFO  ] List of nodes in current configuration:
[ INFO  ] [Installer Node]
[ INFO  ] 10.0.100.71
[ INFO  ]
[ INFO  ] [Cluster Name]
[ INFO  ] vwnode.gpfscluster
[ INFO  ]
[ INFO  ] [Protocols]
[ INFO  ] Object : Enabled
[ INFO  ] SMB : Enabled
[ INFO  ] NFS : Enabled
[ INFO  ]
[ INFO  ] GPFS Node Admin  Quorum  Manager  NSD Server  Protocol  GUI Server
[ INFO  ] viknode1   X       X                  X
[ INFO  ] viknode2           X                  
[ INFO  ] viknode3           X                  
[ INFO  ] viknode4                   X                     X
[ INFO  ] viknode5                   X                     X
[ INFO  ]
[ INFO  ] [Export IP address]
[ INFO  ] 10.0.100.76 (pool)
[ INFO  ] 10.0.100.77 (pool)
[root@viknode1 installer]# ./spectrumscale nsd list
[ INFO  ] Name FS            Size(GB) Usage           FG Pool    Device        Servers
[ INFO  ] nsd1 cesSharedRoot unknown  dataAndMetadata 1  Default /dev/dm-2     [viknode1]

Here I have added one NSD, which is required by the cesSharedRoot file system.
The CES shared root (cesSharedRoot) is needed for storing CES shared configuration data, for protocol recovery, and for some other protocol-specific purposes.
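
For reference, an NSD like this can be registered with the toolkit roughly as follows (a sketch using my device and node names; confirm the exact option names with ./spectrumscale nsd add -h):

[root@viknode1 installer]# ./spectrumscale nsd add /dev/dm-2 -p viknode1 -fs cesSharedRoot -u dataAndMetadata -fg 1
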
Here is a high-level diagram for this setup -


Let's run the install command to install the base GPFS packages and create the cluster.

[root@viknode1 installer]# ./spectrumscale install
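
Assuming the install completes successfully and the cluster comes up, it is worth sanity-checking the cluster state before moving on, for example:

[root@viknode1 installer]# mmlscluster
[root@viknode1 installer]# mmgetstate -a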


Configuring NSDs for FPO 

 

               Configuring the NSDs is more or less everything there is to FPO. According to IBM's official documentation, it is recommended that a GPFS FPO configuration have two storage pools: a system pool for metadata only and a data pool. On my setup I will create three storage pools: a system pool, a fast pool, and a slow pool. The fast pool, say, holds all the SSDs and other fast disks; the slow pool holds all the HDDs and other slow disks; and the pool named 'system' stores only metadata.
Storage pool:
 
Storage pool stanzas are used to specify the type of layout map and write affinity depth, and to enable write affinity, for each storage pool.
Storage pool stanzas have the following format:

%pool: 
  pool=StoragePoolName  # name of the storage pool.
  blockSize=BlockSize  # the block size of the disks in the storage pool.
  usage={dataOnly | metadataOnly | dataAndMetadata}  # the type of data to be stored in the storage pool.
  layoutMap={scatter | cluster}  # The block allocation map type cannot be changed after the storage pool has been created.
  allowWriteAffinity={yes | no}  # Indicates whether the IBM Spectrum Scale File Placement Optimizer (FPO) feature is to be enabled for the storage pool.
  writeAffinityDepth={0 | 1 | 2}  # Specifies the allocation policy to be used by the node writing the data. It is also used for FPO-enabled pools.
  blockGroupFactor=BlockGroupFactor  # Specifies how many file system blocks are laid out sequentially on disk to behave like a single large block. This option only works on FPO enabled pools, where --allow-write-affinity is set for the data pool. 

For more details check Planning for IBM Spectrum Scale FPO
NSD:

          Every local disk to be used by GPFS must have a matching entry in the stanza file.
          NSD stanzas have the following format:

%nsd:
  device=DiskName  # device name that appears in /dev
  nsd=NsdName  # name of the NSD to be created
  servers=ServerList  # comma-separated list of NSD server nodes
  usage={dataOnly | metadataOnly | dataAndMetadata | descOnly | localCache}  # disk usage
  failureGroup=FailureGroup  # the failure group to which this disk belongs
  pool=StoragePool  # the name of the storage pool to which the NSD is assigned

On my setup I have three virtual disks, one on each of three nodes, which I'll use for the new NSDs.

[root@viknode1 ~]# ls /dev/dm-3
/dev/dm-3
[root@viknode2 ~]# ls /dev/dm-4
/dev/dm-4
[root@viknode3 ~]# ls /dev/dm-5
/dev/dm-5
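
Before adding them to the stanza file, you may want to confirm that each device is visible and not already in use, for example:

[root@viknode1 ~]# lsblk /dev/dm-3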

Now let's create a new stanza file in /tmp. Note that the name nsd1 is already taken by the cesSharedRoot NSD, so the new NSDs are named nsd2, nsd3, and nsd4.

[root@viknode1 ~]# cat /tmp/newStanzaFile
%pool:
pool=fast
layoutMap=cluster
blocksize=1024K
allowWriteAffinity=yes  # this option enables FPO feature
writeAffinityDepth=1  # place 1st copy on disks local to the node writing data
blockGroupFactor=128  # Defines chunk size of 128MB

%pool:
pool=slow
layoutMap=cluster
blocksize=1024K
allowWriteAffinity=yes  # this option enables FPO feature
writeAffinityDepth=1  # place 1st copy on disks local to the node writing data
blockGroupFactor=128  # Defines chunk size of 128MB

#Disks in system pool are defined for metadata
%nsd:
nsd=nsd2
device=/dev/dm-3
servers=viknode1
usage=metadataOnly
failureGroup=101
pool=system

# Disks in fast pool
%nsd:
nsd=nsd3
device=/dev/dm-4
servers=viknode2
usage=dataOnly
failureGroup=102
pool=fast

# Disk(s) in slow pool
%nsd:
nsd=nsd4
device=/dev/dm-5
servers=viknode3
usage=dataOnly
failureGroup=103
pool=slow

Here, I have three pools -
1) System pool - created by default; I will use it to store metadata only.
2) Fast pool - for fast disks, used to store data.
3) Slow pool - for slow disks, used to store data.

Let's create these NSDs:

[root@viknode1 ~]# mmcrnsd -F /tmp/newStanzaFile

NSD creation may take a little while to complete.
After the NSDs are created, you can verify them with the mmlsnsd command.

[root@viknode1 ~]# mmlsnsd

 File system   Disk name    NSD servers
---------------------------------------------------------------------------
 (free disk)   nsd1         viknode1
 (free disk)   nsd2         viknode1
 (free disk)   nsd3         viknode2
 (free disk)   nsd4         viknode3

Now we are going to create a GPFS file system on these NSDs. I am going with all default parameters, but you can tune them to your requirements. Here is the guide to the mmcrfs command.

[root@viknode1 ~]#  mmcrfs gpfs0 -F /tmp/newStanzaFile -T /ibm/gpfs0
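
If you want replication from the start, which is common on FPO clusters, you can raise the maximum replica counts at creation time, since -M and -R cannot be changed afterwards. This is only a sketch; my test layout has a single disk per data pool, so actual replication is not possible here:

[root@viknode1 ~]# mmcrfs gpfs0 -F /tmp/newStanzaFile -T /ibm/gpfs0 -M 3 -R 3

The default replica counts can then be raised later with mmchfs (for example, mmchfs gpfs0 -m 2 -r 3) once the data pools span enough failure groups.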

Once the file system is created, you can check it with the mmlsfs command.

[root@viknode1 installer]# mmlsfs all

File system attributes for /dev/gpfs0:
======================================
flag                value                    description
------------------- ------------------------ -----------------------------------
 -f                 8192                     Minimum fragment size in bytes (system pool)
                    32768                    Minimum fragment size in bytes (other pools)
 -i                 4096                     Inode size in bytes
 -I                 16384                    Indirect block size in bytes
 -m                 1                        Default number of metadata replicas
 -M                 2                        Maximum number of metadata replicas
 -r                 1                        Default number of data replicas
 -R                 2                        Maximum number of data replicas
 -j                 cluster                  Block allocation type
 -D                 nfs4                     File locking semantics in effect
 -k                 nfs4                     ACL semantics in effect
 -n                 32                       Estimated number of nodes that will mount file system
 -B                 262144                   Block size (system pool)
                    1048576                  Block size (other pools)
 -Q                 none                     Quotas accounting enabled
                    none                     Quotas enforced
                    none                     Default quotas enabled
 --perfileset-quota No                       Per-fileset quota enforcement
 --filesetdf        No                       Fileset df enabled?
 -V                 15.01 (4.2.0.0)          File system version
 --create-time      Thu Apr  7 08:06:30 2016 File system creation time
 -z                 No                       Is DMAPI enabled?
 -L                 4194304                  Logfile size
 -E                 Yes                      Exact mtime mount option
 -S                 No                       Suppress atime mount option
 -K                 whenpossible             Strict replica allocation option
 --fastea           Yes                      Fast external attributes enabled?
 --encryption       No                       Encryption enabled?
 --inode-limit      65792                    Maximum number of inodes
 --log-replicas     0                        Number of log replicas
 --is4KAligned      Yes                      is4KAligned?
 --rapid-repair     Yes                      rapidRepair enabled?
 --write-cache-threshold 0                   HAWC Threshold (max 65536)
 -P                 system;fast;slow         Disk storage pools in file system
 -d                 nsd2;nsd3;nsd4           Disks in file system
 -A                 yes                      Automatic mount option
 -o                 none                     Additional mount options
 -T                 /ibm/gpfs0               Default mount point
 --mount-priority   0                        Mount priority

You can check the storage pools with the mmlspool command.

[root@viknode1 installer]# mmlspool gpfs0 all -L
Pool:
  name                   = system
  poolID                 = 0
  blockSize              = 256 KB
  usage                  = metadataOnly
  maxDiskSize            = 98 GB
  layoutMap              = cluster
  allowWriteAffinity     = no
  writeAffinityDepth     = 0
  blockGroupFactor       = 1

Pool:
  name                   = fast
  poolID                 = 65537
  blockSize              = 1024 KB
  usage                  = dataOnly
  maxDiskSize            = 64 GB
  layoutMap              = cluster
  allowWriteAffinity     = yes
  writeAffinityDepth     = 1
  blockGroupFactor       = 128

Pool:
  name                   = slow
  poolID                 = 65538
  blockSize              = 1024 KB
  usage                  = dataOnly
  maxDiskSize            = 64 GB
  layoutMap              = cluster
  allowWriteAffinity     = yes
  writeAffinityDepth     = 1
  blockGroupFactor       = 128

'allowWriteAffinity = yes' in the output above shows that the disks in those pools are enabled for FPO.
Let's mount this file system on all nodes.

[root@viknode1 ~]# mmmount gpfs0 -a
Wed Mar 16 10:40:42 EDT 2016: mmmount: Mounting file systems ...
[root@viknode1 ~]# mmlsmount gpfs0
File system gpfs0 is mounted on 5 nodes.
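
One caveat for this layout: because the system pool is metadataOnly, GPFS needs a file placement policy before it will write file data into the fast or slow pools; without one, file creation fails with out-of-space errors. A minimal sketch (the rule name and the /tmp/policy.rules path are just examples):

[root@viknode1 ~]# cat /tmp/policy.rules
RULE 'default' SET POOL 'fast'
[root@viknode1 ~]# mmchpolicy gpfs0 /tmp/policy.rules

You can later add rules that place selected files in the slow pool, or migrate them there with mmapplypolicy.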

Enable protocols as per your requirements.
Don't forget to specify the correct file system and mount point when deploying the protocols.
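
If the protocols are not already enabled in your toolkit configuration, they can be switched on with something like the following (a sketch; check ./spectrumscale enable -h for the protocol names your release accepts):

[root@viknode1 installer]# ./spectrumscale enable nfs smb object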

[root@viknode1 installer]# ./spectrumscale node list
[ INFO  ] List of nodes in current configuration:
[ INFO  ] [Installer Node]
[ INFO  ] 10.0.100.71
[ INFO  ]
[ INFO  ] [Cluster Name]
[ INFO  ] vwnode.gpfscluster
[ INFO  ]
[ INFO  ] [Protocols]
[ INFO  ] Object : Enabled
[ INFO  ] SMB : Enabled
[ INFO  ] NFS : Enabled
[ INFO  ]
[ INFO  ] GPFS Node Admin  Quorum  Manager  NSD Server  Protocol  GUI Server
[ INFO  ] viknode1   X       X                  X
[ INFO  ] viknode2           X                  X
[ INFO  ] viknode3           X                  X
[ INFO  ] viknode4                   X                     X
[ INFO  ] viknode5                   X                     X
[ INFO  ]
[ INFO  ] [Export IP address]
[ INFO  ] 10.0.100.76 (pool)
[ INFO  ] 10.0.100.77 (pool)
[root@viknode1 installer]# ./spectrumscale config protocols -f cesSharedRoot -m /ibm/cesSharedRoot
[root@viknode1 installer]# ./spectrumscale config object -f gpfs0 -m /ibm/gpfs0
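
Before kicking off the deployment itself, the toolkit configuration can be validated first (assuming your toolkit build supports the precheck option):

[root@viknode1 installer]# ./spectrumscale deploy --precheck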

Now you can deploy the protocols, and your setup will be ready with FPO.

[root@viknode1 installer]# ./spectrumscale deploy
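
Once the deployment finishes, you can confirm that the protocol services are running and the export IPs have been assigned, for example:

[root@viknode1 installer]# mmces service list -a
[root@viknode1 installer]# mmces address list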

For more details, here are some recommended videos -
Spectrum Scale (GPFS) for Hadoop Technical Introduction (Part 1 of 2)
Spectrum Scale (GPFS) for Hadoop Technical Introduction (Part 2 of 2)