Why should we think about hardware in Data Science?

High-Performance Computing (HPC) in Data Science nowadays (Part 2)

Altimetrik Poland Tech Blog
6 min read · Aug 18, 2022

Series index:

1. What is Data Science nowadays?

2. Why should we think about hardware in Data Science?

3. What are the elements of High Performance Computing (HPC) nowadays?

4. Where is the synergy between HPC and Data Science? — publication coming soon

5. Project using big clusters or supercomputers — publication coming soon

Introduction

Nowadays, Data Science stands in great synergy with hardware, as I mentioned in the first part of this series. There, I described the characteristics of large-scale Data Science and highlighted the importance of numerical algorithms, data volume and computational power. Let's keep these things in mind while reading this part, because all of them will be connected.

In this part, I will expand on the following topics:

  • How does Data Science use hardware?
  • Why is choosing the right type of hardware crucial?
  • What are the hardware trends in Data Science?

These are important questions from both an engineering and a business perspective, and I will discuss each of them below.

How does Data Science use hardware?

Data Science mainly needs hardware to perform calculations, so we often simply need the fastest possible computing machine. In other words, we need TFLOPS. What is a TFLOPS? FLOPS is a unit of measurement for the number of floating-point operations per second, and the letter "T" stands for the prefix "tera", i.e. 10¹². Currently, the Nvidia A100 GPU offers about 312 TFLOPS of peak performance (for dense FP16 tensor operations), and the next generation of Nvidia GPUs (the Hopper architecture) will be even more efficient than the Ampere series. The conclusion is simple: the more TFLOPS, the better for us.
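To make the unit less abstract, here is a minimal back-of-the-envelope sketch of how TFLOPS translates into wall-clock time. The compute budget and the utilization factor below are assumptions for illustration only, not measurements:

```python
# Back-of-the-envelope: how long would a given compute budget take on a
# single accelerator? All numbers below are illustrative assumptions.

peak_tflops = 312          # e.g. A100 peak for dense FP16 tensor operations
utilization = 0.4          # real workloads rarely reach peak (assumption)
total_flops = 1e21         # hypothetical size of a training job

effective_flops = peak_tflops * 1e12 * utilization
seconds = total_flops / effective_flops
print(f"~{seconds / 86400:.0f} days of compute on a single GPU")
```

With these assumed numbers, the hypothetical job needs roughly three months on one GPU, which is exactly why more TFLOPS (and more GPUs) matter.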
The next aspect is the video RAM (VRAM) on the GPU. This is the second key resource, because we need it to store the model parameters. In my opinion, VRAM is even more important: when we don't have enough memory, we can't keep all the model parameters on the GPU, and sometimes training becomes impossible, while TFLOPS only determine how long the training will take. So it is better to have enough VRAM first, and then as many TFLOPS as possible.
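A rough sketch of why VRAM tends to become the bottleneck first. The 16 bytes per parameter used here is a common rule of thumb for mixed-precision training with an Adam-style optimizer, not an exact figure, and activation memory is ignored entirely:

```python
# Minimal sketch: will a model fit in GPU memory? Rule-of-thumb assumption:
# mixed-precision training with Adam needs roughly 16 bytes per parameter
# (weights, gradients and optimizer states), ignoring activations.

def training_memory_gb(n_params: float, bytes_per_param: float = 16.0) -> float:
    """Very rough lower bound on the memory needed to train a model."""
    return n_params * bytes_per_param / 1e9

vram_gb = 80  # a single A100 80GB
for n_params in (1e9, 13e9, 176e9):  # 1B, 13B and a Bloom-sized 176B model
    need = training_memory_gb(n_params)
    print(f"{n_params / 1e9:>5.0f}B params -> ~{need:,.0f} GB "
          f"(fits on one {vram_gb} GB GPU: {need <= vram_gb})")
```

Even a 13B-parameter model blows past a single 80 GB card under these assumptions, which is why large models are trained across many GPUs.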

Why is choosing the right type of hardware crucial?

This is the most important question in this article, and there are several correct answers depending on the perspective. In my opinion, the two most important perspectives are engineering and business benefits. Data Science projects are projects where the technical world and the business world clash very strongly. Business requirements very often demand specific technical solutions, such as the right algorithm and the appropriate throughput. In such a situation, you often have to look for a golden mean between the satisfaction of the business team and that of the technical team. What do I mean? Each team often has different expectations. From the business perspective, the priorities are mainly cost reduction and speed of implementation; this is a natural approach, as companies want to reduce costs and maximize income. The technical part of the team, on the other hand, wants the solution to be as efficient and well engineered as possible. These are somewhat opposite goals, since a technically better solution usually generates higher costs to begin with.

All right, but is there a point of intersection where costs can be minimized on both sides? The answer is yes, and it lies in the hardware! Let's imagine that our solution is already running in production. At this point, cost reduction and scalability of the product become important, as we anticipate that traffic will grow. We can achieve both goals mainly by choosing the right hardware. The right hardware gives us a permanent cost reduction, because we will need fewer machines to scale the product properly. Cloud solutions are very popular nowadays, but they are not cheap if we need to rent a large number of computing machines, especially ones with GPUs. Reducing the number of these machines matters, and it is strongly tied to proper hardware-aware optimization: we need to design the software so that it maximizes hardware utilization. With this approach, we get the best performance-to-cost ratio, as the sketch below illustrates.
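A toy comparison under entirely made-up prices and throughputs: the nominally more expensive machine can still be the cheaper choice once you account for how quickly it finishes the work, which is the essence of the performance-to-cost argument:

```python
# Toy comparison of two hypothetical rental options for the same workload.
# All prices and throughputs are made-up assumptions for illustration.

options = {
    "large GPU instance": {"price_per_hour": 32.0, "samples_per_second": 9000},
    "cheap CPU instance": {"price_per_hour": 4.0, "samples_per_second": 400},
}

workload_samples = 5_000_000_000  # total work to be done (assumption)

for name, spec in options.items():
    hours = workload_samples / spec["samples_per_second"] / 3600
    cost = hours * spec["price_per_hour"]
    print(f"{name}: ~{hours:,.0f} h, ~${cost:,.0f}")
```

Under these assumptions the GPU instance finishes the job for roughly a third of the money, despite the eight-times-higher hourly rate.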
In the previous article, I gave the example of training a large language model (LLM) called Bloom. The training of this model has since finished: it lasted 117 days (March 11 to July 6, 2022) and took place on 384 Nvidia A100 80GB GPUs on the Jean Zay supercomputer in France. What am I getting at? Let's look at the task from the perspective of cost and preparation time. About 1,000 researchers from more than 70 countries and more than 250 institutions worked on this model, and it took them about a year to prepare the code and dataset for training. Of course, the project is open source and the contributors were not paid a dedicated salary for it; it was mostly volunteer work. But do you see how expensive it could be from a human-resources perspective? Back to the topic of hardware, let's estimate the money needed to train this model, taking Amazon Web Services (AWS) pricing as a reference point. Let's make the following assumptions:

1. We rent EC2 P4 instances, each with 8 Nvidia A100 80GB GPUs
2. We need 384 Nvidia A100 80GB GPUs
3. The training time is 117 days
4. We use the Microsoft DeepSpeed library to scale the training process

Visualization of an AWS cluster built from EC2 P4 instances (source: https://aws.amazon.com/ec2/instance-types/p4/)

With these assumptions, we can estimate the cost. An EC2 P4 instance with 8 Nvidia A100s costs $40.96 per hour (as of 18.08.2022). How many instances do we need? 384 GPUs / 8 GPUs = 48 instances. How many training hours do we need? 117 days * 24 hours = 2,808 hours. So we have everything we need: the final estimated cost is 48 instances * 2,808 hours * $40.96 per hour ≈ $5,520,752. Yes, roughly $5.52 million; in my opinion, a lot of money for one model, one experiment. In addition, remember that, as I wrote in the previous article, significant hardware optimization had already been applied, and the training still required millions of dollars' worth of compute. Believe me, without these hardware optimizations it would have been much more expensive. This example shows how expensive large-scale Data Science is, and why we must not forget proper hardware selection and software optimization for it. What more do I want to say? Believe me, this is not the most expensive model or experiment ever: there have already been experiments worth tens of millions of dollars, for example Google's AlphaGo Zero.

Source: https://bigscience.huggingface.co/blog/bloom
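For completeness, here is the same arithmetic as a small Python sketch, so you can plug in your own instance prices or training times. The hourly rate is the one quoted above (as of 18.08.2022); actual AWS pricing varies by region and changes over time:

```python
# Back-of-the-envelope cost estimate for the Bloom-style training setup.

gpus_needed = 384                 # A100 80GB GPUs used to train Bloom
gpus_per_instance = 8             # one EC2 P4 instance holds 8 A100s
price_per_instance_hour = 40.96   # USD, on-demand rate quoted in the text
training_days = 117

instances = gpus_needed // gpus_per_instance         # 48
hours = training_days * 24                           # 2,808
cost = instances * hours * price_per_instance_hour   # ~5.52 million USD
print(f"{instances} instances x {hours} h x ${price_per_instance_hour}/h "
      f"= ${cost:,.2f}")
```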

What are the hardware trends in Data Science?

This paragraph is a small introduction to the next article. In Data Science we can observe some clear trends. Globally, models are getting bigger and more complex and need more and more data, and algorithms are becoming more and more parallel, which imposes certain requirements on the hardware designed for Data Science. The conclusion is that we need ever more efficient hardware, both in terms of computing performance and energy consumption. Recently, a number of hardware architectures have been developed that try to meet these requirements. I will write about them in the next part of the article.

Summary

What did we learn in this part of the series?

1. We understand how Data Science uses hardware and what matters most in hardware from a Data Science perspective.
2. We know what TFLOPS is: the most important unit of measurement of computing performance.
3. We understood why the choice of hardware is important from both a business and a technical perspective.
4. We have gained an insight into the costs of modern large-scale Data Science experiments.
5. We have learned that optimization is very important in projects of this scale, as it significantly reduces the already high costs of an experiment.
6. We realized that hundreds of people can work on such projects.
7. We understood what the main hardware trends in Data Science are.

Stay tuned for the next parts of the series!

Words by Patryk Binkowski, Data Scientist at Altimetrik Poland

Copywriting by Kinga Kuśnierz, Content Writer at Altimetrik Poland https://www.linkedin.com/in/kingakusnierz/

