A little story of a system test running 24 times faster after 2 performance improvements

Category: Dev

Parallelize all

I am a self-proclaimed fan of system tests. A system test is, to me, an automatic test which checks what’s in production and compares it to expected reality. It can be as simple as downloading your web page via curl/wget and comparing the main title to what you have in your CMS/database. If they don’t match, you know you have a problem somewhere in your pipeline. It is about discovering inconsistencies before users report them. I like to call them «sleep well at night» tests. For this post I will talk about a system test we have that checks whether all 5 video qualities of a given programme have the same basic filename (see explanation 1 at the bottom for details).

The test

The test is quite simple: it traverses the directory structure at Akamai and checks whether all the files in a given folder start with the same GUID. A typical structure looks something like this: /root/some/program/path/guidfilename_2250.mp4. Below is a code excerpt:
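The excerpt itself was an image in the original post. As a rough reconstruction of the shape of such a crawl — only the `GetList`/`DirectoryExist` call names come from the post; the delegates, signatures, and prefix check below are my own sketch:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class CrawlTest
{
    // Hypothetical reconstruction of the crawling test. listSubfolders and
    // listFiles stand in for the FTP client's DirectoryExist/GetList-style
    // calls named in the post; everything else is assumed.
    public static List<string> FindMismatchedFolders(
        string root,
        Func<string, IEnumerable<string>> listSubfolders,
        Func<string, IEnumerable<string>> listFiles)
    {
        var bad = new List<string>();
        var files = listFiles(root).ToList();       // one round trip (~60 ms)
        if (files.Any())
        {
            // All qualities of a programme must start with the same GUID,
            // e.g. guidfilename_2250.mp4 and guidfilename_1266.mp4.
            var guid = files[0].Split('_')[0];
            if (files.Any(f => !f.StartsWith(guid)))
                bad.Add(root);
        }
        foreach (var sub in listSubfolders(root))   // another round trip each
            bad.AddRange(FindMismatchedFolders(
                root + "/" + sub, listSubfolders, listFiles));
        return bad;
    }
}
```

The point to notice is that every folder in the hierarchy costs at least one network round trip, even the intermediate ones that contain no files.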

Benchmarking

First step to any performance problem is to establish a baseline. In our staging environment the test took:

baseline benchmark

I also did some profiling using the ANTS performance profiler, and sure enough, the time spent chatting with the server is the killer here. Every call to conn.DirectoryExist and conn.GetList takes about 60 ms, and when there are thousands of folders to traverse it adds up to a lot of time.

Improvements

First improvement: Querying instead of traversing folders

Since we already have the paths to all the files in our database, I thought we could query for these paths instead of traversing to discover them. Below is the SQL query for retrieving a substring of the Akamai path.

select distinct substr(files.cdn_folder_location, 0, (instr(files.cdn_folder_location, '/', -1, 1)-1)) as path from files where file_path is not null;

Who knew Oracle had a decent set of built-in functions?! The syntax is nothing special, but substringing and finding the last instance of «/» using instr was not that bad. Getting all the folders from the database allowed me to call Directory.GetList(path) once per folder and get its files directly. Remember, we are only interested in the files in the last folder in the tree. Now the same test excerpt looks like this:
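This excerpt was also an image in the original post. The following is only a sketch of the shape the folder-driven version might take, assuming the folder list comes from the SQL query above; the `listFolder` delegate stands in for `Directory.GetList`, and everything except that call name is assumed:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class LeafFolderTest
{
    // Check only the known leaf folders: exactly one listing call per folder
    // instead of a full crawl of the hierarchy.
    public static List<string> FindMismatches(
        IEnumerable<string> leafFolders,
        Func<string, IEnumerable<string>> listFolder)
    {
        var bad = new List<string>();
        foreach (var folder in leafFolders)
        {
            var basenames = listFolder(folder)      // one round trip per folder
                .Select(f => f.Split('_')[0])       // the GUID part of the name
                .Distinct()
                .Count();
            if (basenames > 1)                      // more than one basic filename
                bad.Add(folder);
        }
        return bad;
    }
}
```

The design win is that the expensive network calls now scale with the number of leaf folders rather than with the size of the whole directory tree.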

Time for a new benchmark:

After first improvement

Down to using 64% of the time, nice.

Second improvement: Parallelize

Since the test checks 40 different Akamai net storages in prod, the obvious next thought was to parallelize. The operations are completely self-contained and don’t access shared resources. To parallelize I used Parallel.ForEach for the first time. Now I feel stupid for not using it before. It is brilliant, especially with the thread-safe local variables.
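A minimal sketch of the `Parallel.ForEach` overload with thread-local state, roughly the pattern described here; the storage names and the `CheckStorage` stub are made up:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

class StorageCheck
{
    // Stub standing in for the per-storage test; hypothetical.
    static IEnumerable<string> CheckStorage(string storage) =>
        Enumerable.Empty<string>();

    static void Main()
    {
        var storages = new[] { "storage1", "storage2", "storage3" }; // made up
        var allMismatches = new ConcurrentBag<string>();

        // The localInit delegate gives every worker thread its own list, so
        // the loop body never touches shared state; localFinally merges each
        // thread's results once at the end.
        Parallel.ForEach(
            storages,
            () => new List<string>(),                  // thread-local accumulator
            (storage, state, local) =>
            {
                local.AddRange(CheckStorage(storage)); // independent work
                return local;
            },
            local =>
            {
                foreach (var m in local) allMismatches.Add(m);
            });

        Console.WriteLine($"{allMismatches.Count} mismatched folders");
    }
}
```

Because each storage is checked independently, total wall-clock time approaches that of the slowest single storage rather than the sum of all of them.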


Last benchmark:

After parallelizing

Now we are getting somewhere.

In the staging environment where the benchmarking was done we only have 3 storages, so we can only run 3 operations in parallel. In prod, though, we can parallelize across all 40. This took the test from around 12 hours down to 30 minutes: 24 times faster! Below is a graph from Jenkins showing time spent running the test. You can probably see when the new test was introduced.

Build time trend

Takeaways

  1. Do not crawl FTP servers; it takes a lot of time. This is quite obvious, because every call to conn.DirectoryExist or GetList has to go over the wire, and with a lot of calls they add up. Instead of crawling, it is more efficient to request the paths you already know about directly. That way you only call GetList on the folders you actually need, not on every one in the folder hierarchy.
  2. Oracle has a decent set of built-in functions. Maybe not as impressive or easy to use as those in PostgreSQL, but they work.
  3. Parallelize all

I already knew that parallelization can give huge savings, and I have been doing it more and more when working with F# code. The fact that it was so easy in C# gives me hope for doing it more in that language as well.


1.
We use Akamai as our CDN to make programmes available at tv.nrk.no. We upload them via FTP to net storages, from where they are distributed throughout Akamai’s network and played back by the video player on our site and in our apps. We upload all our programmes in 5 qualities. For our player to know what to play, there is a manifest file which is created with the basic filename plus the qualities. For example e77054d2-d9a0-4399-af58-fbbb9b2c449c_,141,316,563,1266,2250,.mp4, where the filename base is a GUID and the qualities (141, 316, etc.) are listed.
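To make the format concrete, here is a small, hypothetical parser for such a manifest filename. The filename format comes from the example above; the parsing code itself is mine and is not part of the original test:

```csharp
using System;
using System.Linq;

static class Manifest
{
    // Split "guid_,141,316,...,.mp4" into the basic filename and the list of
    // qualities encoded between the commas.
    public static (string BaseName, int[] Qualities) Parse(string fileName)
    {
        var withoutExt = fileName.Substring(0, fileName.LastIndexOf('.'));
        var sep = withoutExt.IndexOf("_,", StringComparison.Ordinal);
        var baseName = withoutExt.Substring(0, sep);
        var qualities = withoutExt.Substring(sep + 2)
            .Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(int.Parse)
            .ToArray();
        return (baseName, qualities);
    }
}
```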

It is vitally important for us that these filenames are the same for each programme, or else the programme won’t play.
