Container storage is the new commanding height

Editor: admin | Updated: 2019-07-17

Storage circles are talking about flash and software-defined storage. One is a replacement of storage media; the other is a change in storage architecture. Broaden your horizons, however, and you will see a new "commanding height" of storage technology development: container storage, that is, high-performance distributed file storage.

The ultimate form of cloud computing applications
Cloud computing is a broad topic involving compute, storage, and networking. Computing virtualization, SDS (software-defined storage), and SDN/NFV (software-defined networking/network function virtualization) are its inevitable results.

From the application point of view, cloud-native, microservice, and container technologies connect seamlessly to the elastic infrastructure of cloud computing. It can be said that only cloud-native, containerized applications are true cloud applications; although moving a traditional application to the cloud brings a certain degree of flexibility, it cannot fully exploit the characteristics of the cloud. Containerization is therefore imperative.

Take the Internet industry as an example: one reason Internet companies are thriving is that they lead in applying container technology. Traditional enterprises can keep pace with this development only by catching up.
This trend will inevitably affect the status quo of storage technology development
Compared with traditional applications, the advantage of containers lies in rapid reconstruction of the system. When a container or a container node fails, Kubernetes (K8S), the container scheduling platform, can rebuild the container. The problem is that the data a container processes, such as website pages, configuration files, databases, and big data workloads, disappears when the container is rebuilt or destroyed, which disrupts business applications.

To maintain business continuity, container rebuilds must not lose data. Technically, containers need to be provided with persistent storage support.

Today, K8S defines two storage plug-in mechanisms, FlexVolume and CSI, through which external storage can be consumed as a container service via an API. With optional external storage such as Ceph RBD or Ceph iSCSI, a PV (Persistent Volume) is created through a YAML file, and storage is provided to the container by attaching or detaching a virtual block device. But attach and detach are time-consuming and error-prone operations.
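As a sketch of what such a YAML-defined PV looks like, the manifest below declares a volume backed by an external CSI block driver. The metadata name, volume handle, and monitor address are illustrative placeholders, not values from the article:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: rbd-pv-demo            # hypothetical PV name
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce            # a block device is typically mounted on one node at a time
  csi:
    driver: rbd.csi.ceph.com   # Ceph RBD CSI driver name
    volumeHandle: demo-image   # hypothetical RBD image identifier
    fsType: ext4
```

When a Pod using this PV is rescheduled, the block device behind it must be detached from the old node and attached to the new one, which is exactly the operation the article criticizes.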

In practical applications, a faulty node may host dozens or even hundreds of containers, and every container that uses such storage must complete the detach-and-reattach step on its new node during reconstruction. At that scale, the operation is basically infeasible.


Container storage solution
Technically, once a container is given persistent storage, it moves from stateless to stateful operation, and rapid container rebuilds then require the support of new technologies.

YRCloudFile is a distributed container storage system self-developed by Beijing Yanrong Technology Co., Ltd. to address these problems of container applications.
How does YRCloudFile do it?
Yanrong Technology chose distributed file storage as the carrier, so every node in the K8S cluster can share real-time access to an individual PV (Persistent Volume). This solves the problem that Ceph RBD and Ceph iSCSI, with their node-level access mechanism, must reattach the volume whenever a Pod is rebuilt on a different node.

Currently, K8S defines three access modes for persistent storage volumes: RWO (ReadWriteOnce), ROX (ReadOnlyMany), and RWX (ReadWriteMany). For container storage, the ReadWriteMany mode is indispensable; only with it do applications such as ElasticSearch, WordPress, and a highly available Harbor have a storage guarantee.
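A ReadWriteMany claim is what lets many Pods on many nodes mount the same volume simultaneously. A minimal sketch of such a PVC, assuming a hypothetical StorageClass name (the real class name depends on the deployed CSI driver):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data               # hypothetical claim name
spec:
  accessModes:
    - ReadWriteMany               # RWX: shared read/write access across nodes
  resources:
    requests:
      storage: 100Gi
  storageClassName: yrcloudfile   # assumed class name, not an official identifier
```

Block-device backends such as Ceph RBD generally cannot satisfy RWX for filesystem mounts, which is why a shared file system is the natural fit here.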

This is also an important reason why YRCloudFile is better suited to container applications than Ceph RBD and Ceph iSCSI.

Beyond not losing data when containers are rebuilt, the number of container instances can grow more than tenfold compared with virtual machine scenarios once applications are broken into containerized services.
The key to container storage design is therefore that the MDS (metadata service) must not become a bottleneck. YRCloudFile adopts a horizontally scalable metadata cluster architecture and a dynamic-subtree metadata management algorithm to ensure MDS access efficiency.


In terms of IO performance, file striping ensures random read/write performance for large files, and by supporting RoCE or InfiniBand networks, RDMA is used to improve performance further.
YRCloudFile's test data show that with RDMA, small-IO performance increases by more than 400% compared with TCP transmission.


The latest IO500 test gave an initial demonstration of YRCloudFile's high-performance characteristics. Although constrained by the limited number of disks in the test cluster, YRCloudFile entered the first echelon of high-performance distributed storage on the IO500 list, comparable to first-tier vendors such as EMC, WekaIO, IBM, HPE, and Inspur.

Not just "supporting" containers
With the popularity of K8S, many cloud vendors, software-defined storage vendors, and hyper-converged vendors now claim to support containers and dock seamlessly with K8S. But mere support is not enough; applications impose further functional requirements.

As is well known, K8S provides the FlexVolume and CSI plug-in mechanisms for storage vendors to connect their storage solutions to K8S. Yanrong Technology provides a CSI plug-in for YRCloudFile that takes a leading position compared with other container storage CSI solutions on the market.

The latest release, YRCloudFile 6.0, adds a fault-awareness function to the CSI plug-in container. When creating and scheduling a new Pod with data-persistence requirements, K8S can automatically filter out abnormal CSI plug-in containers and worker nodes whose connection to the YRCloudFile cluster is abnormal.

Next comes QoS, a crucial technology. QoS helps the system control the use of limited resources (such as IOPS and bandwidth). The YAML file that creates a PV can set that PV's IOPS (or bandwidth) limit, preventing some applications from being "starved" by insufficient resources.
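The article does not show YRCloudFile's exact QoS syntax. A common pattern among CSI drivers is to express per-volume limits as StorageClass parameters that the provisioner interprets; the sketch below uses entirely hypothetical key names and a placeholder driver:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: limited-iops            # hypothetical class name
provisioner: csi.example.com    # placeholder CSI driver, not a real product identifier
parameters:
  iopsLimit: "5000"             # hypothetical QoS keys; real names depend on
  bandwidthLimit: "200Mi"       # the vendor's CSI implementation
```

Every PV provisioned from this class would then inherit the stated ceilings, so a noisy tenant cannot monopolize the cluster's IOPS.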

Next is PV IO pressure tracking and locating.
As PVs multiply, tracking, monitoring, and locating the PVs under heavy IO pressure becomes a challenge for users of container cloud platforms. YRCloudFile's real-time tracking of per-PV IOPS and bandwidth helps users locate business bottlenecks and optimize their applications.


Then there is Prometheus monitoring. Today Prometheus is a key member of the Cloud Native Computing Foundation (CNCF), second in prominence only to K8S, and a mainstream monitoring system. Users can take the Prometheus exporter provided by YRCloudFile and display YRCloudFile cluster monitoring data with Grafana or other tools.
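Assuming the exporter exposes metrics over HTTP in the standard Prometheus text format, wiring it into an existing Prometheus server is a routine scrape_config entry; the job name and host:port below are illustrative assumptions:

```yaml
# prometheus.yml fragment; the exporter endpoint is a hypothetical example
scrape_configs:
  - job_name: yrcloudfile
    scrape_interval: 15s
    static_configs:
      - targets: ['storage-node-1:9500']   # assumed exporter host and port
```

Grafana then reads these series from Prometheus as a data source, which is how the cluster dashboards mentioned above are built.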

Features such as PV Insight, PV Resize, and PV Quota let administrators monitor and manage PVs at a finer granularity. PV Insight helps administrators quickly view, in graphical form, the data volume and data temperature inside a PV for analysis and adjustment. PV Resize adjusts a PV's size for elastic expansion. PV Quota enforces the quota specified in the YAML file, returning an error when written data exceeds the capacity limit, and YRCloudFile also provides PV-granularity performance alerts.
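PV Resize maps naturally onto K8S's standard volume-expansion flow: the StorageClass opts in with allowVolumeExpansion, and the user simply raises the PVC's storage request. A sketch with placeholder names:

```yaml
# The StorageClass must opt in to online expansion
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: yrcloudfile-expandable   # hypothetical class name
provisioner: csi.example.com     # placeholder CSI driver
allowVolumeExpansion: true
---
# Growing an existing claim: only the request changes (e.g. 100Gi -> 200Gi);
# the CSI driver expands the underlying volume in place
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data              # hypothetical existing claim
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 200Gi             # raised from the original 100Gi
  storageClassName: yrcloudfile-expandable
```

Shrinking is not supported by the K8S expansion mechanism, which is consistent with the article framing resize as "elastic expansion".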


In containerized applications, shared access to massive data is a typical scenario: content management systems such as Drupal and WordPress, or image recognition, video encoding, video rendering, and the like.

These scenarios require persistent shared data access for containers, high performance, and low cost. YRCloudFile already offers container persistence and high performance. As for helping enterprises meet the digital transformation goals of reducing cost and raising efficiency, the Smart Tiering function in the latest version of YRCloudFile provides the answer.

Statistics show that 80% of data exhibits clear access periodicity: after a certain period, data gradually cools, and applications rarely read or write it. To exploit this pattern, YRCloudFile keeps hot data on a high-performance SSD tier, while the cold tier directly accesses object storage exposed through the standard S3 interface by any third party.

Container applications are unaware of which tier their data resides on when accessing it. Users can customize the cold-data policy and the time at which cold data automatically migrates across tiers. Data remains readable and writable during migration between the hot and cold tiers, with no impact on the business.

Smart Tiering function
On the one hand, YRCloudFile automatically compresses hot data before it enters object storage as cold data; for specific data such as logs, the compressed size is only 5% of the original. On the other hand, object-based erasure coding raises overall cluster disk utilization, reducing overall cost by 40%-50%. At the same time, the high-performance hot tier always provides sufficient performance support for upper-layer container applications.

These rich features may go by different names, but all of them are indispensable capabilities in application practice.

If container persistent storage is a blue-ocean market, it recalls the story of "selling shoes on the island": different people see different things and adopt different strategies.
As an innovative company, Yanrong Technology has taken the lead with YRCloudFile. How will the container storage market develop? Will it become a new commanding height of storage technology competition? Can container storage realize Yanrong Technology's dream? The market will answer.

