Apache Airflow: An Integral Part of the CASFS+ Cloud Workspace

When building CASFS+, Code Willing wanted to give its clients as much as possible without requiring them to download other extensions or add-ons. One result of this standard is the inclusion of Apache Airflow, one of the most popular open-source tools for building robust ETL pipelines.

When working in the CASFS+ Cloud Workspace, clients have immediate access to Apache Airflow, with no configuration or download needed. With Airflow in CASFS+, users can easily launch multiple nodes in their network, each with its own pre-configured, isolated Airflow instance.

After completing CASFS+, Code Willing used the platform to solve problems for its own clients. Here are three ways the company uses Airflow and CASFS+ together to deliver data ETL solutions to its clients:

Data Processing

First, the data processing “Directed Acyclic Graphs” (DAGs) in Code Willing’s CASFS+ environment form the core of its Airflow ETL pipelines. These DAGs extract, transform, and load (ETL) raw client data into a processed, usable dataset.
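A DAG is a set of tasks plus dependency edges that contain no cycles; the scheduler runs each task only after all of its upstream tasks have finished. The minimal sketch below illustrates that idea with Python's standard-library `graphlib` rather than Airflow itself, so it runs without an Airflow install. The task names, the made-up sample records, and the `run_pipeline` helper are hypothetical and are not Code Willing's pipeline. In Airflow, the same edges would be declared with the `>>` operator (`extract >> transform >> load`).

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical ETL tasks sharing a dict of intermediate results.
# Real Airflow tasks would pass data via files or XCom, not a shared dict.

def extract(state):
    # Stand-in for reading raw client files: made-up "key,value" records.
    state["raw"] = ["a,1", "b,2", "malformed"]

def transform(state):
    # Split each record and keep only rows with exactly two fields.
    rows = [line.split(",") for line in state["raw"]]
    state["clean"] = [(key, float(value)) for key, value in (r for r in rows if len(r) == 2)]

def load(state):
    # Stand-in for writing the processed dataset somewhere durable.
    state["loaded"] = list(state["clean"])

TASKS = {"extract": extract, "transform": transform, "load": load}

# Each task maps to the set of upstream tasks it depends on.
DEPENDENCIES = {"transform": {"extract"}, "load": {"transform"}}

def run_pipeline():
    """Run every task once, each only after its upstream tasks have finished."""
    state = {}
    order = list(TopologicalSorter(DEPENDENCIES).static_order())
    for name in order:
        TASKS[name](state)
    return order, state
```

If the dependencies ever formed a cycle, `graphlib` would raise `CycleError`, which is exactly the situation the "acyclic" in DAG rules out.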

Airflow supports this ETL process by monitoring the CASFS+ file system in real time for the files each task depends on and launching the tasks as soon as those files arrive. This ensures that Code Willing’s clients always receive their time-sensitive production data as soon as possible.
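Airflow provides sensors for this pattern, such as `FileSensor` in `airflow.sensors.filesystem`, which repeatedly "pokes" for a path every `poke_interval` seconds until the path appears or a `timeout` expires. The sketch below reproduces that poke-until-timeout idea in plain Python so it runs anywhere; `files_ready`, `wait_then_run`, and the default intervals are hypothetical illustrations, not how Code Willing configured its sensors.

```python
import time
from pathlib import Path

def files_ready(paths):
    """One 'poke': True only if every dependent file already exists."""
    return all(Path(p).exists() for p in paths)

def wait_then_run(paths, task, poke_interval=30.0, timeout=3600.0):
    """Poll until all dependent files exist, then run task(paths).

    Raises TimeoutError if the files have not all arrived before the timeout.
    """
    deadline = time.monotonic() + timeout
    while not files_ready(paths):
        if time.monotonic() >= deadline:
            raise TimeoutError(f"dependent files missing after {timeout}s: {paths}")
        time.sleep(poke_interval)
    return task(paths)
```

A blocking loop like this holds a worker for the whole wait. Airflow sensors address that with `mode="reschedule"`, which releases the worker slot between pokes.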

Data Quality

Second, several data quality DAGs test the output of the Data Processing stage. These checks range from verifying the format of the processed data to ensuring that files in the ETL stage arrive on time. Airflow is also configured to send both email and Slack messages to Code Willing’s support team whenever an alert, such as a failed data quality check or a late file, is raised.
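The two kinds of checks described above, format and lateness, can be sketched as small functions that return alert messages. This is a hypothetical illustration: the field-count rule, the deadline logic, and the message text are assumptions, and the actual routing to email or Slack (in Airflow, typically via failure callbacks on a task that fails when a check does) is left out.

```python
from datetime import datetime, timedelta

def check_format(rows, expected_columns):
    """Return one alert per row whose field count differs from expected_columns."""
    return [
        f"row {i}: expected {expected_columns} fields, got {len(row)}"
        for i, row in enumerate(rows)
        if len(row) != expected_columns
    ]

def check_arrival(arrived_at, deadline, grace=timedelta(0)):
    """Return an alert if the file never arrived or arrived after deadline + grace."""
    if arrived_at is None:
        return [f"file missing; deadline was {deadline:%H:%M}"]
    if arrived_at > deadline + grace:
        return [f"file late by {arrived_at - deadline}"]
    return []

def run_quality_checks(rows, expected_columns, arrived_at, deadline):
    """Collect all alerts; an empty list means every check passed."""
    # A real deployment would route non-empty results to email/Slack.
    return check_format(rows, expected_columns) + check_arrival(arrived_at, deadline)

if __name__ == "__main__":
    deadline = datetime(2021, 3, 1, 9, 0)
    print(run_quality_checks([["a", "1"], ["b"]], 2, datetime(2021, 3, 1, 9, 15), deadline))
    # ['row 1: expected 2 fields, got 1', 'file late by 0:15:00']
```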

System Health Check Alerts

Finally, Code Willing has also planned for the possibility that the entire Airflow service could fail. If that happens, the data quality DAGs discussed above become ineffective, because the service can no longer run them to confirm that data is arriving as expected. To cover this case, Code Willing runs external health checks that regularly query Airflow’s API endpoints for errors. As with the data quality checks above, if the Airflow service is deemed to be “down,” the Code Willing team receives Slack and email notifications about the issue.
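In Airflow 2, the webserver exposes a `/health` endpoint that reports, as JSON, the status of the metadata database and the scheduler, each with a `"status"` of `"healthy"` or `"unhealthy"`. The sketch below assumes that shape; newer releases may report additional components, which this version ignores. The base URL, the timeout, and the rule that any problem (including an unreachable endpoint) means "down" are assumptions, and sending the Slack or email notification is left out. Because the check watches Airflow itself, it would run outside Airflow, for example from cron or a separate monitoring service.

```python
import json
import urllib.request

EXPECTED_COMPONENTS = ("metadatabase", "scheduler")

def parse_health(payload):
    """Return a list of problems found in an Airflow /health JSON payload."""
    problems = []
    for component in EXPECTED_COMPONENTS:
        status = payload.get(component, {}).get("status")
        if status != "healthy":
            problems.append(f"{component}: {status or 'missing'}")
    return problems

def check_airflow(base_url, timeout=10.0):
    """Query <base_url>/health; an unreachable endpoint also counts as 'down'."""
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError) as exc:
        # OSError covers URLError/HTTPError and socket timeouts;
        # ValueError covers a body that is not valid JSON.
        return [f"health endpoint unreachable: {exc}"]
    return parse_health(payload)
```

An empty list from `check_airflow` means the service looks healthy; anything else would trigger the notification.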


Code Willing’s goal is to build a high level of expertise in most of the popular data engineering and analysis tools. This expertise allows Code Willing to effectively support clients who use well-known third-party tools and infrastructure in CASFS+. Leveraging these open-source tools also lets Code Willing be both flexible and effective in meeting its clients’ needs. Apache Airflow in particular has been instrumental in Code Willing’s ability to manage time-sensitive data ETL pipelines for its clients, so much so that it is now included by default in the CASFS+ Cloud Workspace suite of tools.

Data Processing Study Reveals Quick, Economical Way to Process Big Data

After studying several Python DataFrame libraries, the financial technology company Code Willing determined that Ray and PySpark were the fastest and most cost-efficient libraries for big data processing when using CASFS on AWS.

CASFS is the brainchild of Code Willing. Facing time and cost constraints when processing enormous amounts of data, the company’s data scientists needed something for their large-scale data analysis that no one else offered. So they built it, and CASFS was born.

The study compared five libraries (Pandas, Polars, Dask, Ray, and PySpark) on three dataset sizes:

  • 1 file – 2,524,365 rows x 20 columns
  • 10 files – 23,746,635 rows x 20 columns
  • 100 files – 241,313,625 rows x 20 columns

For single-machine runs, they processed the data on an r5.24xlarge instance with 96 vCPUs and 768 GB of RAM. For runs on a cluster, they used 10 r5.2xlarge instances with a combined 80 vCPUs and 640 GB of RAM.

These individual tests showed that Ray and PySpark offered the best performance and memory efficiency for big data processing.
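The whitepaper's exact methodology is not described here, so the following is only a generic, hypothetical sketch of how one might record wall-clock time and peak memory for each workload under comparison, using the standard-library `time.perf_counter` and `tracemalloc`. The `benchmark` and `groupby_sum` names and the toy workload are illustrative assumptions. One important caveat: `tracemalloc` only sees memory allocated through Python's allocator, so it would miss most of the memory used by Polars (Rust), Arrow buffers, Spark's JVM, or Ray's object store; comparing those libraries fairly would require process-level or cluster-level memory metrics instead. Tracing also adds some overhead to the timings.

```python
import time
import tracemalloc

def benchmark(fn, *args, repeats=3):
    """Run fn(*args) `repeats` times; return (best seconds, peak traced MiB)."""
    best = float("inf")
    peak_bytes = 0
    for _ in range(repeats):
        tracemalloc.start()
        start = time.perf_counter()
        fn(*args)
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        best = min(best, elapsed)
        peak_bytes = max(peak_bytes, peak)
    return best, peak_bytes / 2**20

def groupby_sum(rows):
    """Toy workload: sum values per key, like a DataFrame group-by."""
    totals = {}
    for key, value in rows:
        totals[key] = totals.get(key, 0) + value
    return totals

if __name__ == "__main__":
    rows = [(i % 10, i) for i in range(100_000)]
    seconds, mib = benchmark(groupby_sum, rows)
    print(f"best {seconds:.4f} s, peak {mib:.2f} MiB")
```

Taking the best of several runs reduces noise from caching and background load, while the peak memory is taken as the maximum across runs.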

While Ray and PySpark proved best for processing large amounts of data, Code Willing’s study also determined where each of the remaining libraries works best in the data analysis process.

For the study’s in-depth findings, review Dalton’s full whitepaper on Code Willing’s website: https://www.codewilling.com/whitepapers/dataframe.html

The Pandemic and The Cloud

How the Pandemic Intensified Demand for Cloud Computing

Even before the Covid-19 outbreak, the cloud was on its way to becoming the dominant solution for our data storage needs. But the pandemic that followed fast-tracked that evolution in a way few dared to predict before 2020. From organizations forced to work from home to lockdowns that limited movement, the post-pandemic world has experienced first-hand the real risks of on-premise and unconsolidated data solutions.

It was no coincidence that investments in data center infrastructure returned 19% in the first half of 2020, a period in which returns in most industrial sectors plummeted. People in the industry now understand the potential and necessity of cloud computing for a future of secure, accessible, and available data storage.

In this post, we explore how the pandemic has affected the cloud computing industry and what that means for you as an institution looking to migrate to a cloud workspace of your own.

Embracing the Cloud Amid the Pandemic

Given how suddenly the pandemic spread, most industries, whether IT, healthcare, education, or government, had to adapt to remote work within a matter of days. Most of them turned to cloud computing as the fastest way to continue their operations virtually.

It’s not a stretch to say that cloud computing became the thread our economy has been hanging by since the start of the pandemic. With its ability to provide fast solutions and relative ease of migration, cloud computing kept businesses and their operations afloat and allowed employees to safely work from home.

A survey conducted by Flexera during the first quarter of 2020 found that 93% of enterprises had integrated multi-cloud storage and 87% had adopted a hybrid cloud strategy. In addition, 59% of enterprises expected their cloud-related spending to increase compared to the previous year.

According to the Synergy Research Group, the pandemic-induced shift to cloud computing pushed cloud infrastructure market revenue to $129 billion, up from $97 billion the previous year.

Synergy Research Group data also show that enterprises spent over $37 billion on cloud infrastructure in Q4 2020, a 35% increase over Q4 2019.

Looking at these numbers, it’s safe to say the pandemic has triggered a cloud computing revolution around the world. Even now, as the vaccine rollout has begun to contain the virus, companies are pushing toward secure cloud storage and secure cloud workspace solutions, applying the lessons learned from this unanticipated and devastating situation.

With the emergence of new technologies like 5G and AI, cloud computing still has room to grow and more value to offer its customers. Hosting your organization’s data, file systems, applications, software, and workflows in the cloud is an investment in the future of your company. The rapid spread of Covid-19 showed how quickly the economic world can change and the challenges such change can pose.

Challenges of Increased Demand in Cloud Computing

While many organizations embraced cloud computing during the pandemic, the increased demand and need for fast solutions also created several challenges for the industry.

The biggest concern organizations migrating to the cloud had to address was ensuring the security of data hosted in the cloud.

Even though major cloud service providers offer secure cloud storage options, their complex configuration processes can leave loopholes in a system. Because the pandemic forced organizations to migrate quickly, their security may have been compromised while the systems were being set up.

As people began working from home, securely accessing cloud workspaces required special security measures, such as VPNs. Setting up VPNs for employees scattered across a region under lockdown proved challenging for organizations migrating to the cloud.

The surge in cloud demand also increased network demand, putting extra strain on cloud service providers trying to maintain uninterrupted service. The impact was visible when a popular streaming service reduced its picture quality to keep its service available to everyone. To meet this growing demand, 2020 saw a boom in data center construction across the US.

While cloud-based solutions have given organizations easier maintenance and expansion, better resources, and greater accessibility, for some they have also considerably increased operating costs.

According to the same Flexera survey, businesses reported that their cloud-related spending exceeded the budgeted amount by an average of 23%, and 47% expected their cloud spending to increase over the next 12 months. These figures suggest that the high cost of cloud workspaces is one of the factors hindering business growth, especially for small businesses. Cloud service providers are aware of this obstacle, and some have found solutions.

The cloud services industry is now evolving fast as it looks for strategies and solutions to these challenges. New cloud service platforms are entering the market with improvements in these areas, some more effective than others. The pandemic-driven digital transformation of the industry is a sure sign that whatever shortcomings remain in cloud computing will be resolved sooner than expected.

Migrating to the Cloud During a Pandemic

With the uncertainty of the pandemic keeping the business environment off-kilter and the cloud services industry growing rapidly, now is the best time for your business to move away from on-premise storage facilities and embrace the ease of a cloud workspace.

Setting up secure cloud storage quickly, with minimal downtime and at low cost, is not an easy feat. But there is a solution for that too.

The lift-and-shift approach to cloud migration with CASFS+ provides a transition that meets all of these needs, and you can keep using the same applications you ran on-premise before moving to the cloud. Best of all, the lift-and-shift approach saves you from having to make platform-specific adjustments to your applications.


The Covid-19 pandemic has accelerated cloud computing to the point that we are seeing a massive cloud revolution around the world today. Organizations are moving to secure cloud storage after realizing that the reliability of on-premise storage solutions can change on a dime.

As discussed in this article, the pandemic has significantly changed the industry’s outlook and created promising prospects for its improvement. There isn’t a better time for businesses to take the leap to the cloud than right now. And with secure cloud computing services like CASFS+ delivering a smooth transition and letting clients enjoy the benefits of the cloud right away, it really couldn’t be any easier.