GlusterFS is a scalable network file system capable of scaling to several petabytes and serving thousands of clients. It is a distributed file system designed to run in user space. This tutorial introduces GlusterFS and walks through installing it and configuring storage across two server nodes.
|Published (Last):||19 May 2010|
Storage becomes a challenge once the data you are dealing with grows large. There are several possible solutions to this problem, and in this post we will discuss one open source solution.
Let’s discuss why finding a storage solution matters so much when your data becomes large. Say you have a large RAID array with a capacity of around 10 TB attached to one of your servers, which exports it over NFS, and other clients mount that volume from it.
Now every read and write operation that the NFS clients issue has to be completed at the server end by the server’s NFS process. Imagine all the clients doing heavy read and write operations on the NFS share at the same time. In such cases the NFS server will be under heavy load and will run short of memory and processing power.
What if you could combine the memory and processing power of two machines, along with their individual disks, to form a single volume that clients can access? The basic idea is to aggregate multiple storage servers over TCP or InfiniBand (a high-performance, high-throughput link used in data centres) to make one very large storage pool.
GlusterFS does exactly that: it combines multiple storage servers to form one large storage volume.
Let’s see some important and noteworthy points about GlusterFS. A good storage solution must provide elasticity in both storage and performance without affecting active operations. That’s right: elasticity in both storage size and performance is the most important factor that makes a storage solution successful. Although most storage solutions mention linearity as a key term in their product details, 80 percent of them do not give you the absolute linearity that’s required.
However, this kind of linear scaling does not happen most of the time, and a storage solution that does not provide it is abusing the term “linear”. On a graph with storage size on the X axis and performance on the Y axis, linearity means the two grow together in equal steps, which rarely happens in the real world. That linearity can, however, be achieved to a certain extent with the glusterfs file system.
It works something like this. You can add more disks if your organization wants to increase storage size, without affecting performance. And you can increase the number of nodes (servers) taking part in the storage if your organization wants to improve performance. Performance in gluster is the combined memory and processing power of all nodes taking part in the storage cluster. You can also increase both storage size and performance by adding more nodes and more disks per node.
Similarly, you can cut down costs by removing nodes or removing disks. There is a principle to keep in mind while working with glusterfs, although it describes the working architecture of gluster more than a principle.
Let’s discuss this performance versus storage size trade-off with the help of a couple of configurations. The first configuration can be modified by adding more 1 TB drives to each server in order to increase the storage size, without affecting performance.
However, if you want to improve the performance of that same storage without increasing its size, you need to add two more server nodes.
If you go on increasing the number of nodes in this second configuration, keep in mind that our network is only 1G. After performance has improved to a certain extent, network speed will become the bottleneck. To solve that, we need to upgrade our network to 10G.
The combined processing power and memory of all the nodes in the system contribute to the improved performance of glusterfs. Glusterfs has been a popular choice for good-performance storage in the range of petabytes. The configuration that you should deploy in your environment depends on the kind of operations and requirements you have.
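The scaling behaviour described above can be sketched with a small model. This is only a rough illustration under stated assumptions: the function names, the per-node throughput figure and the idea that one shared link caps total throughput are all hypothetical simplifications, not measurements of a real gluster cluster.

```python
def raw_capacity_tb(nodes: int, disks_per_node: int, disk_tb: float) -> float:
    """Raw capacity when every disk on every node contributes to the volume."""
    return nodes * disks_per_node * disk_tb


def aggregate_throughput_gbps(nodes: int, per_node_gbps: float,
                              network_gbps: float) -> float:
    """Throughput grows with the node count until the network link caps it.

    Assumption: a single shared link of `network_gbps` is the only limit
    besides the nodes themselves.
    """
    return min(nodes * per_node_gbps, network_gbps)


if __name__ == "__main__":
    # Hypothetical numbers: 1 TB disks, 0.25 Gbit/s served per node.
    for nodes in (2, 4, 8):
        print(nodes, "nodes:",
              raw_capacity_tb(nodes, 1, 1.0), "TB,",
              aggregate_throughput_gbps(nodes, 0.25, 1.0), "Gbit/s on 1G,",
              aggregate_throughput_gbps(nodes, 0.25, 10.0), "Gbit/s on 10G")
```

In this model, adding disks changes only the capacity figure, while adding nodes raises throughput until the 1G link is saturated; past that point only the faster network helps.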
In simple words, VFS (the Virtual File System layer in the kernel) is nothing but a single interface that applications use without worrying about the underlying type of file system. Remember that the actual read and write operations are not done by VFS itself. VFS just hands over the operation requested by the user or application to the underlying file system module inside the kernel.
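The hand-over that VFS performs can be pictured with a toy model. The sketch below is not kernel code: `ToyVFS`, `ToyFileSystem` and the mount points are hypothetical names chosen for illustration, and the real VFS does far more than this.

```python
from abc import ABC, abstractmethod
from typing import Dict, Tuple


class FileSystem(ABC):
    """The operations every file system module must implement."""

    @abstractmethod
    def write(self, path: str, data: str) -> None: ...

    @abstractmethod
    def read(self, path: str) -> str: ...


class ToyFileSystem(FileSystem):
    """Stand-in for a real file system module (hypothetical, in-memory)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.files: Dict[str, str] = {}

    def write(self, path: str, data: str) -> None:
        self.files[path] = data

    def read(self, path: str) -> str:
        return self.files[path]


class ToyVFS:
    """Single entry point: finds the owning file system and delegates to it."""

    def __init__(self) -> None:
        self.mounts: Dict[str, FileSystem] = {}

    def mount(self, mount_point: str, fs: FileSystem) -> None:
        self.mounts[mount_point.rstrip("/") or "/"] = fs

    def _resolve(self, path: str) -> Tuple[FileSystem, str]:
        # The longest mount point that contains the path owns it.
        candidates = [
            m for m in self.mounts
            if m == "/" or path == m or path.startswith(m + "/")
        ]
        if not candidates:
            raise FileNotFoundError(path)
        best = max(candidates, key=len)
        relative = path if best == "/" else path[len(best):] or "/"
        return self.mounts[best], relative

    def write(self, path: str, data: str) -> None:
        fs, relative = self._resolve(path)
        fs.write(relative, data)  # VFS itself stores nothing

    def read(self, path: str) -> str:
        fs, relative = self._resolve(path)
        return fs.read(relative)
```

An application only ever calls `ToyVFS.read` and `ToyVFS.write`; which file system actually stores the data depends purely on where the path is mounted.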
Before going to the installation and configuration of the gluster file system, let’s first discuss something called FUSE. As the names suggest, user space is the memory allocated to programs, and kernel space is the memory reserved for the kernel to run in. Most file system code (the program that actually does the job on the file system) lives inside the kernel in the form of a module.
GFS – Gluster File System – A complete Tutorial Guide for an Administrator
So if you want to make your own file system, a kernel module for that specific file system normally has to be made part of the kernel.
However, there is an easier way to achieve this without touching or writing code for a kernel module. As mentioned before, the area in memory isolated for the kernel is called kernel space, and you can write your own file system code in user space without touching kernel space.
You might be wondering how your file system will become part of VFS. There is a mechanism available in the form of a module called FUSE (Filesystem in Userspace) that lets you run your own file system code without touching the kernel.
This module provides a bridge between your file system and VFS. Hence you can make any new file system compatible with Linux with the help of the FUSE module.
This module is used for mounting a very large number of file systems in Linux, and glusterfs is one of them.
Glusterfs is widely adopted by Red Hat for their Red Hat Enterprise Storage solutions, and Red Hat recommends it where scaling and elasticity are required. For installation, you can download the core packages from the official site. Before installing gluster, let’s make our configuration clear: we will use two server nodes, server1 and server2, and one client server.
These two server nodes will together make a storage volume, which our client server will mount. We first need to install the glusterfs packages, available from the glusterfs official site, on the servers that will act as glusterfs server nodes (server1 and server2 in our case).
Once the packages are installed, the first thing we need to do is start the glusterd service. Remember that glusterd must be started on all node servers. Now that glusterd is running on both server1 and server2, let’s make a storage pool with these two servers as members.
For this example I will be using small partitions, sized in megabytes, on each server. The size of the volume that the client server ends up mounting depends on the type of gluster volume you are using, which we will discuss shortly.
Let’s first create a storage pool before going ahead. A storage pool simply authorizes the servers you choose as members of that pool. Remember that you need a working DNS setup if you are going to use host names while adding servers to the pool. You also need to modify the firewall rules to allow the servers in the pool to probe each other; otherwise the probe may fail with an error.
So our storage pool has two servers, server1 and server2. You might have noticed that I did not add server1 with the same probe command.
That’s because localhost becomes part of the pool by itself. If you try to add it anyway, gluster tells you that probing localhost is not needed. A gluster volume can then be created with one brick from each node server, and its complete information can be inspected with the gluster command.
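The steps so far (start glusterd, probe the peer, create the volume, inspect it) can be summarised as a command plan. The sketch below only builds the command lines and prints them; it does not run anything. The `gluster` subcommands used are standard ones, but the volume name `test-volume`, the brick path `/data/brick1`, the use of `service` to start glusterd and the helper function names are assumptions made for illustration, so adjust them to your own setup and init system.

```python
import shlex
from typing import List


def peer_probe_cmd(host: str) -> List[str]:
    """Command that adds `host` to the trusted storage pool."""
    return ["gluster", "peer", "probe", host]


def volume_create_cmd(name: str, bricks: List[str],
                      transport: str = "tcp") -> List[str]:
    """Command that creates a volume from `host:/path` bricks."""
    return ["gluster", "volume", "create", name,
            "transport", transport, *bricks]


def setup_plan(name: str, peers: List[str],
               bricks: List[str]) -> List[List[str]]:
    """Ordered commands to run on the first node (server1 in this tutorial)."""
    plan = [peer_probe_cmd(host) for host in peers]
    plan.append(volume_create_cmd(name, bricks))
    plan.append(["gluster", "volume", "start", name])
    plan.append(["gluster", "volume", "info", name])
    return plan


if __name__ == "__main__":
    # Assumed to be run on every node first; depends on your init system.
    print(shlex.join(["service", "glusterd", "start"]))
    # Hypothetical brick paths; server1 is localhost and needs no probe.
    for cmd in setup_plan("test-volume", ["server2"],
                          ["server1:/data/brick1", "server2:/data/brick1"]):
        print(shlex.join(cmd))
```

Keeping the commands as argument lists makes it easy to review the plan first and only later pass each entry to something like `subprocess.run` on the node itself.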
The volume info command gives you the complete information about the volume. Some of the critical fields it shows are described here. Volume Name specifies the volume name. Volume ID specifies the unique volume ID. Status shows whether the volume is started or not. Number of Bricks tells you the total number of bricks taking part in the volume.
Type tells you the type of volume. This field needs a little more explanation, as there are large differences between the types of volumes in gluster. We will discuss them one by one in this article. The default one is “distributed”. Transport-type tells you the transport mechanism used to establish communication between the server nodes taking part in the volume.
There are different transport methods that can be used. One is TCP; the other is a high-speed InfiniBand connection between the server nodes, which requires the specific InfiniBand drivers to be installed on both servers. If you do not specify the transport method during volume creation, the default is selected, which is TCP in our case. As we saw in the volume info output, every volume has a type.
The selection depends on your requirements. Some types are good for scaling storage size, some are good for performance, and you can also combine two types to get the advantages of both. Distributed gluster volumes spread files randomly across the bricks in the volume.
In other words, file 1 might be stored on the first brick, and file 2 might be stored on the other brick. Gluster provides no redundancy if you have created a distributed volume.
The main purpose of making a distributed storage volume with gluster is to scale the volume size easily.
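The way a distributed volume places each file on exactly one brick can be pictured with a small sketch. This is a deliberately simplified stand-in, assuming a plain hash of the file name modulo the number of bricks; gluster’s own placement logic is more involved, and the function names and brick names here are hypothetical.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def pick_brick(filename: str, bricks: List[str]) -> str:
    """Map a file name to exactly one brick (toy rule: hash modulo count)."""
    digest = hashlib.md5(filename.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(bricks)
    return bricks[index]


def place_files(filenames: List[str], bricks: List[str]) -> Dict[str, List[str]]:
    """Group file names by the brick that would store them."""
    layout: Dict[str, List[str]] = defaultdict(list)
    for name in filenames:
        layout[pick_brick(name, bricks)].append(name)
    return dict(layout)


if __name__ == "__main__":
    bricks = ["server1:/data/brick1", "server2:/data/brick1"]  # hypothetical
    files = ["file%d.txt" % i for i in range(1, 7)]
    for brick, names in place_files(files, bricks).items():
        print(brick, "->", names)
```

Each file lives on a single brick with no second copy, which is why losing one brick of a distributed volume means losing the files stored on it.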