Streaming Large Files Asynchronously using .NET 4.5

Recently I had a project that required me to transfer large files via a service call to a remote server, where each file would be ingested by an always-on application for further processing. When thinking about the requirements for such a service, a few things came to mind:

  1. The transfer had to be streaming, as attempting to buffer requests for files in excess of 100 MB would be taxing on the host server.
  2. To maximize throughput, request threads had to be released as fast as possible so that new requests could be processed.
  3. The service had to be as simple and quick to build as possible.

After a bit of research, I hit the jackpot with the HttpTaskAsyncHandler class and the recently introduced GetBufferlessInputStream() method, and I soon got to work mocking up the functionality.

HttpTaskAsyncHandler

HttpTaskAsyncHandler is an implementation of the IHttpAsyncHandler interface that adds the async and await features introduced in .NET 4.5, which give a much easier syntax for asynchronous processing than the Begin/End pattern of the older interface. The only method that needs to be implemented is ProcessRequestAsync(HttpContext context). We can then use the await keyword to tell the handler that the work to be done will be longer running. Let’s look at the code:

using System;
using System.IO;
using System.Threading.Tasks;
using System.Web;

public class StreamingFileHandler : HttpTaskAsyncHandler
{
    public override bool IsReusable
    {
        get { return true; }
    }

    public override async Task ProcessRequestAsync(HttpContext context)
    {
        // While the I/O inside TransferFileAsync is in flight, the request
        // thread goes back to the pool instead of blocking.
        var result = await TransferFileAsync(context);

        // Execution resumes here, possibly on a different thread, to respond to the client.
        context.Response.ContentType = "text/plain";
        using (var writer = new StreamWriter(context.Response.OutputStream))
        {
            writer.Write(result ? "Success" : "Fail");
            writer.Flush();
        }
    }

    private async Task<bool> TransferFileAsync(HttpContext context)
    {
        string tempFilePath = null;

        try
        {
            if (context.Request.ContentType != "application/xml" && context.Request.ContentType != "text/xml")
                throw new Exception("Content-Type must be either 'application/xml' or 'text/xml'.");

            // Ticks keeps the name free of characters that are invalid in file names (':' and '/').
            tempFilePath = String.Format(@"c:\output\newfile{0}.txt", DateTime.Now.Ticks);
            using (var reader = new StreamReader(context.Request.GetBufferlessInputStream(true)))
            // The final argument (useAsync: true) is required for truly asynchronous file I/O.
            using (var filestream = new FileStream(tempFilePath, FileMode.Create, FileAccess.Write, FileShare.Read, 4096, true))
            using (var writer = new StreamWriter(filestream))
            {
                var dataToWrite = await reader.ReadToEndAsync();
                await writer.WriteAsync(dataToWrite);
                // Flush asynchronously so that Dispose() has nothing left to write synchronously.
                await writer.FlushAsync();

                return true;
            }
        }
        catch (Exception)
        {
            context.Response.StatusCode = 500;
            return false;
        }
    }
}

When the compiler sees the await keyword, it rewrites the method into a state machine, and HttpTaskAsyncHandler adapts the resulting Task to the Begin/End pattern that ASP.NET expects from an IHttpAsyncHandler. Let’s walk through ProcessRequestAsync(). When await TransferFileAsync(context) executes, each await on ReadToEndAsync() or WriteAsync() that cannot complete immediately suspends TransferFileAsync and returns the thread to the pool to process other requests. When the async operation completes, .NET takes a thread from the pool to resume the method right where it left off. No thread sits blocked while the network and disk I/O are in flight, so the threads handling requests stay available to process the maximum number of requests. Even better, if your current execution environment has a SynchronizationContext, it is captured and used to run the continuation in that context (on the UI thread of a forms app, for example).
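To watch the suspend-and-resume behavior described above outside of ASP.NET, a small sketch like the following can help. The class name AwaitThreadSketch and its method are hypothetical and not part of the handler. It prints the managed thread id before and after an awaited asynchronous file write; in a console app (which has no SynchronizationContext) the two ids will often differ, but that is not guaranteed.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical demo class, not part of the handler in this post.
public static class AwaitThreadSketch
{
    // Writes the payload with async file I/O and returns the resulting file size.
    public static async Task<long> WritePayloadAsync(string path, byte[] payload)
    {
        Console.WriteLine("Before await: thread {0}", Thread.CurrentThread.ManagedThreadId);

        // useAsync: true asks for overlapped I/O so that no thread blocks during the write.
        using (var stream = new FileStream(path, FileMode.Create, FileAccess.Write,
                                           FileShare.Read, 4096, useAsync: true))
        {
            await stream.WriteAsync(payload, 0, payload.Length);
        }

        // The continuation may run on a different pool thread than the one that started.
        Console.WriteLine("After await: thread {0}", Thread.CurrentThread.ManagedThreadId);
        return new FileInfo(path).Length;
    }
}
```

Calling AwaitThreadSketch.WritePayloadAsync(Path.GetTempFileName(), new byte[1024 * 1024]).Wait() from a console Main is enough to see the two messages.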

The handler can get wired up in the web.config as follows:

<system.webServer>
  <asp enableChunkedEncoding="true"/>
  <handlers>
    <add name="Stream Files" path="/postfile" type="Demonstration.StreamingFileHandler, Demonstration" verb="POST"/>
  </handlers>
</system.webServer>
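To exercise a handler registered this way, a client can POST a file with chunked transfer encoding. The following is only a sketch: the class UploadClientSketch and its method names are made up for illustration, and the endpoint URL is whatever host your site runs on. It uses HttpClient and StreamContent from System.Net.Http, and sets the content type to text/xml with no charset parameter because the handler compares the Content-Type string exactly.

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

// Hypothetical client helper for posting a file to the /postfile handler.
public static class UploadClientSketch
{
    // Builds a POST request that streams the body instead of buffering it.
    public static HttpRequestMessage BuildUploadRequest(Uri endpoint, Stream body)
    {
        var content = new StreamContent(body);
        content.Headers.ContentType = new MediaTypeHeaderValue("text/xml");

        var request = new HttpRequestMessage(HttpMethod.Post, endpoint) { Content = content };
        // Chunked encoding lets the server start reading before the whole file is sent.
        request.Headers.TransferEncodingChunked = true;
        return request;
    }

    // Sends the file and returns the handler's plain-text reply ("Success" or "Fail").
    public static async Task<string> UploadAsync(HttpClient client, Uri endpoint, string filePath)
    {
        using (var file = new FileStream(filePath, FileMode.Open, FileAccess.Read,
                                         FileShare.Read, 4096, useAsync: true))
        using (var request = BuildUploadRequest(endpoint, file))
        using (var response = await client.SendAsync(request))
        {
            return await response.Content.ReadAsStringAsync();
        }
    }
}
```

A call would look like await UploadClientSketch.UploadAsync(new HttpClient(), new Uri("http://localhost/postfile"), path), where the localhost URL is an assumption about where the site is hosted.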

GetBufferlessInputStream

Normally, when you access the Request.InputStream property, ASP.NET hands over the stream only once the entire message body has been received. For a large message (like an extremely large file), that means the memory use of the process inflates like a balloon to the size of the file being transferred and is released only once processing has taken place. If the service is handling multiple large file requests concurrently, the machine can quickly run out of memory to process further requests. In contrast, the Request.GetBufferlessInputStream() method gives access to the stream immediately, as the request starts to flow in. This allows full control over the processing of the input stream as well as a smaller memory footprint. Let’s take a look at the part of our handler example that processes the incoming stream:

private async Task<bool> TransferFileAsync(HttpContext context)
{
    string tempFilePath = null;

    try
    {
        if (context.Request.ContentType != "application/xml" && context.Request.ContentType != "text/xml")
            throw new Exception("Content-Type must be either 'application/xml' or 'text/xml'.");

        // Ticks keeps the name free of characters that are invalid in file names (':' and '/').
        tempFilePath = String.Format(@"c:\output\newfile{0}.txt", DateTime.Now.Ticks);
        using (var reader = new StreamReader(context.Request.GetBufferlessInputStream(true)))
        // The final argument (useAsync: true) is required for truly asynchronous file I/O.
        using (var filestream = new FileStream(tempFilePath, FileMode.Create, FileAccess.Write, FileShare.Read, 4096, true))
        using (var writer = new StreamWriter(filestream))
        {
            var dataToWrite = await reader.ReadToEndAsync();
            await writer.WriteAsync(dataToWrite);
            // Flush asynchronously so that Dispose() has nothing left to write synchronously.
            await writer.FlushAsync();

            return true;
        }
    }
    catch (Exception)
    {
        context.Response.StatusCode = 500;
        return false;
    }
}

In the example above, we wrap the stream returned by GetBufferlessInputStream() in a StreamReader so that we can control how much data we read and write. Here I’m calling the reader’s ReadToEndAsync() method. Keep in mind that it’s horrible practice to read an entire file into memory, since doing so gives back the memory savings of the bufferless stream, but I’m simply trying to demonstrate the benefits of using the async and await features of the 4.5 framework.
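For real workloads, one way to avoid holding the whole file in memory is to copy the request stream to disk in fixed-size chunks with Stream.CopyToAsync. The sketch below is an alternative to the ReadToEndAsync() call above, not the code from the handler; the class name StreamCopySketch, its method, and the 80 KB buffer size are assumptions made for illustration. It takes a plain Stream so that it can be tried outside of ASP.NET.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

// Hypothetical helper: copies any input stream to a file without buffering it all.
public static class StreamCopySketch
{
    // Assumed chunk size; only this much of the body is held in memory at a time.
    private const int BufferSize = 81920;

    public static async Task<long> CopyToFileAsync(Stream input, string destinationPath)
    {
        // useAsync: true is needed for the file writes to be truly asynchronous.
        using (var output = new FileStream(destinationPath, FileMode.Create, FileAccess.Write,
                                           FileShare.Read, BufferSize, useAsync: true))
        {
            // Reads and writes one chunk at a time, awaiting each I/O operation.
            await input.CopyToAsync(output, BufferSize);
            return output.Length;
        }
    }
}
```

Inside the handler, the call would be along the lines of await StreamCopySketch.CopyToFileAsync(context.Request.GetBufferlessInputStream(true), tempFilePath), which also skips the text decoding and re-encoding that StreamReader and StreamWriter perform.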

Final Trimmings

There’s one last setting that needs to be taken care of to ensure that ASP.NET will allow access to the incoming stream as soon as possible. In the config file for our handler, we add the following:

<system.webServer>
  <asp enableChunkedEncoding="true"/>
</system.webServer>  

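It may also be beneficial to override the ASP.NET maximum request size limit. The maxRequestLength value is expressed in kilobytes, so the setting below allows requests of just under 2 GB: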

<system.web>
  <httpRuntime maxRequestLength="2097151"/>
</system.web>
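On IIS 7 and later, the request filtering module enforces a separate limit, maxAllowedContentLength, which is measured in bytes and defaults to 30000000 (roughly 30 MB). A sketch of raising it to match the maxRequestLength value above, assuming the site is hosted on IIS with request filtering enabled:

```xml
<system.webServer>
  <security>
    <requestFiltering>
      <!-- Bytes, not kilobytes: 2147482624 = 2097151 KB -->
      <requestLimits maxAllowedContentLength="2147482624"/>
    </requestFiltering>
  </security>
</system.webServer>
```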

That’s all there is to it. Happy streaming!

I am a software architect with over 13 years of experience developing scalable, enterprise level applications targeted for Windows and the Web. I am specialized in the latest Microsoft Development Technologies, with recent experience using C# 4.5, the .NET Framework 4.5 and SQL Server 2005/2008/2012.

  • Dave

    Update: I’ve edited the code example to
    1.) Compile!
    2.) Give a better illustration of how to efficiently make use of async without offloading to other worker threads.
    3.) Use the correct filestream constructor so writing will actually work async.

    Thanks Royi for calling these out.

  • Royi Namir

    Heck, this won’t even compile at [ var resultContent = await FileTransfer(context);]

    • Dave Marini

      You’re right, it doesn’t. I apologize for the confusion and the delay in my response. This example was quickly (and poorly) pulled from a larger example. I’m going to edit the post to contain a better working model. It also addresses your last post. You are correct that the Task.Factory.StartNew is unnecessary. What you didn’t catch is my more damning mistake: the FileStream class (although it has been updated with the Async methods) still has to be constructed with useAsync = true, which is something I often forget when I work with streams in async :)

  • Royi Namir

    In your example, you’re not gaining anything. When control reaches [ var resultContent = await FileTransfer(context); ] it enters [ private Task FileTransfer(HttpContext context)], which calls [return Task.Factory.StartNew(() => TransferFile(context));]. Now, TransferFile does use an async I/O operation, which is fine, and the main thread does return to the thread pool. BUT(!) you used another thread pool thread for [Task.Factory.StartNew(() => TransferFile(context));], so you basically don’t scale. (Task.Factory uses the thread pool by default, unless LongRunning is specified.)