Streaming Large Files Asynchronously using .NET 4.5

Recently I had a project that required me to be able to transfer large files via a service call to a remote server where the file would be ingested by an always-on application for further processing. When thinking about the requirements for such a service, a few things came to mind:

  1. The transfer had to be streaming, as attempting to buffer requests for files in excess of 100MB would be taxing on the host server.
  2. To maximize throughput, requests needed to be released as quickly as possible so that new requests could be processed.
  3. The service should be as simple and quick to build as possible.

After a bit of research on the matter, I hit the jackpot with the HttpTaskAsyncHandler class and the newly introduced GetBufferlessInputStream() method, and soon got to work mocking up the functionality.

HttpTaskAsyncHandler

The HttpTaskAsyncHandler, new in .NET 4.5, is an implementation of the long-standing IHttpAsyncHandler interface, but with the added benefit of the async and await features introduced in .NET 4.5, which give a much easier syntax for asynchronous processing than the Begin/End pattern of the older interface. The only method that needs to be implemented is ProcessRequestAsync(HttpContext context). We can then use the await keyword to indicate to the handler that the work to be done will be longer running. Let’s look at the code:

using System;
using System.IO;
using System.Threading.Tasks;
using System.Web;

public class StreamingFileHandler : HttpTaskAsyncHandler
{
    public override bool IsReusable
    {
        get { return true; }
    }

    public override async Task ProcessRequestAsync(HttpContext context)
    {
        //after this line, we depart the handling thread and continue work on a threadpool thread.
        var resultContent = await FileTransfer(context);

        //we pick up here on another thread to return to the client
        context.Response.ContentType = "text/plain";
        using (var writer = new StreamWriter(context.Response.OutputStream))
        {
            writer.Write(resultContent);
            writer.Flush();
        }
    }

    private Task<string> FileTransfer(HttpContext context)
    {
        return Task.Factory.StartNew(() => TransferFile(context));
    }

    private string TransferFile(HttpContext context)
    {
        string tempFilePath = null;

        try
        {
            if (context.Request.ContentType != "application/xml" && context.Request.ContentType != "text/xml")
                throw new Exception("Content-Type must be either 'application/xml' or 'text/xml'.");

            //use ticks in the file name so it contains no characters that are invalid in a path
            tempFilePath = String.Format(@"c:\output\newfile{0}.txt", DateTime.Now.Ticks);
            using (var reader = new StreamReader(context.Request.GetBufferlessInputStream(true)))
            using (var filestream = new FileStream(tempFilePath, FileMode.Create))
            using (var writer = new StreamWriter(filestream))
            {
                //copy the request body to disk line by line as it streams in
                string line;
                while ((line = reader.ReadLineAsync().Result) != null)
                    writer.WriteLine(line);
            }

            return "Success";
        }
        catch (Exception ex)
        {
            context.Response.StatusCode = 500;
            return "A critical error occurred. Please try your request again: " + ex;
        }
    }
}

When the compiler sees the await keyword, it rewrites the method into the begin/end async pattern for us. Let’s take a look at the implementation of ProcessRequestAsync(). During execution, when await FileTransfer(context) is reached, the thread currently handling the request is returned to the thread pool immediately, and a worker thread is spawned to do the work in TransferFile(). Once TransferFile() completes, the system allocates another thread to run the rest of the logic (in this case, writing the return string to the response stream). This ensures that the request-handling threads remain available to process the maximum number of requests while the worker threads handle the disk I/O.
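As an aside, .NET 4.5 also added Stream.CopyToAsync, so the same hand-off could be sketched without dedicating a worker thread to the copy at all; the I/O itself is simply awaited. The following is a rough alternative sketch, not the handler used in this article, and the output path is the same illustrative assumption as above (Task.Run would likewise be the more idiomatic .NET 4.5 shorthand for the Task.Factory.StartNew call shown earlier):

//a hypothetical fully asynchronous variant, not the handler registered below
public class FullyAsyncFileHandler : HttpTaskAsyncHandler
{
    public override async Task ProcessRequestAsync(HttpContext context)
    {
        //illustrative output path, same assumption as the main handler
        var tempFilePath = String.Format(@"c:\output\newfile{0}.txt", DateTime.Now.Ticks);

        //stream the request body straight to disk; CopyToAsync keeps the
        //thread free while it waits on network and disk I/O
        using (var input = context.Request.GetBufferlessInputStream(true))
        using (var output = new FileStream(tempFilePath, FileMode.Create, FileAccess.Write,
                                           FileShare.None, 4096, useAsync: true))
        {
            await input.CopyToAsync(output);
        }

        context.Response.ContentType = "text/plain";
        context.Response.Write("Success");
    }
}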

The handler can get wired up in the web.config as follows:

<system.webServer>
  <handlers>
    <add name="Stream Files" path="/postfile" type="Demonstration.StreamingFileHandler, Demonstration" verb="POST"/>
  </handlers>
</system.webServer>
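With the handler registered, a client can exercise it by posting a file as the request body. Here is a rough sketch using HttpClient and StreamContent from System.Net.Http (available in .NET 4.5); the URL and file path are placeholder assumptions:

using System;
using System.IO;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

class UploadClient
{
    static void Main()
    {
        //placeholder values; point these at your own server and file
        UploadAsync("http://localhost/postfile", @"c:\data\largefile.xml").Wait();
    }

    static async Task UploadAsync(string url, string filePath)
    {
        using (var client = new HttpClient())
        using (var fileStream = File.OpenRead(filePath))
        {
            var content = new StreamContent(fileStream);
            content.Headers.ContentType = new MediaTypeHeaderValue("application/xml");

            //post the file as the request body; the handler expects an XML content type
            var response = await client.PostAsync(url, content);
            Console.WriteLine(await response.Content.ReadAsStringAsync());
        }
    }
}

If you want the request to go out with chunked transfer encoding (to go along with the enableChunkedEncoding setting discussed below), HttpRequestMessage.Headers.TransferEncodingChunked can be set on a hand-built request, though that isn't shown here.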

GetBufferlessInputStream

Normally, when you access the Request.InputStream property of the HttpContext object, ASP.NET will not hand you the stream until the entire message body has been received and buffered. For cases where the message is large (like an extremely large file), that translates into the memory use of the process inflating like a balloon to the size of the file being transferred, and only deflating once the processing has taken place. You can imagine that if the service is handling multiple large file requests concurrently, this can quickly lead to the machine running out of memory to process further requests. In contrast, the Request.GetBufferlessInputStream() method gives access to the stream immediately, as the request body starts to flow in. This allows full control over the processing of the input stream as well as a much smaller memory footprint. Let’s take a look at the part of our handler example that processes the incoming stream:

private string TransferFile(HttpContext context)
{
    string tempFilePath = null;

    try
    {
        if (context.Request.ContentType != "application/xml" && context.Request.ContentType != "text/xml")
            throw new Exception("Content-Type must be either 'application/xml' or 'text/xml'.");

        //use ticks in the file name so it contains no characters that are invalid in a path
        tempFilePath = String.Format(@"c:\output\newfile{0}.txt", DateTime.Now.Ticks);
        using (var reader = new StreamReader(context.Request.GetBufferlessInputStream(true)))
        using (var filestream = new FileStream(tempFilePath, FileMode.Create))
        using (var writer = new StreamWriter(filestream))
        {
            //copy the request body to disk line by line as it streams in
            string line;
            while ((line = reader.ReadLineAsync().Result) != null)
                writer.WriteLine(line);
        }

        return "Success";
    }
    catch (Exception ex)
    {
        context.Response.StatusCode = 500;
        return "A critical error occurred. Please try your request again: " + ex;
    }
}

In the example above, we wrap the stream returned by GetBufferlessInputStream() in a StreamReader so that we can control how much data we read and write at a time. Here I’m calling the reader’s ReadLineAsync() method in a loop to copy the body to disk line by line. Blocking on the Result property effectively makes each read synchronous, but I’m using it here to demonstrate another of the benefits of the async and await features of the 4.5 framework.
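For what it's worth, if you wanted the reads to stay truly asynchronous instead of blocking on Result, the copy could live in its own async method and be awaited directly from ProcessRequestAsync. A minimal sketch under that assumption (the TransferFileAsync name is mine, not part of the handler above, and error handling is omitted):

private async Task<string> TransferFileAsync(HttpContext context)
{
    //same illustrative output path as the main handler
    var tempFilePath = String.Format(@"c:\output\newfile{0}.txt", DateTime.Now.Ticks);

    using (var reader = new StreamReader(context.Request.GetBufferlessInputStream(true)))
    using (var filestream = new FileStream(tempFilePath, FileMode.Create))
    using (var writer = new StreamWriter(filestream))
    {
        string line;
        //awaiting each read keeps the thread free while waiting on the network
        while ((line = await reader.ReadLineAsync()) != null)
            await writer.WriteLineAsync(line);
    }

    return "Success";
}

ProcessRequestAsync could then await TransferFileAsync(context) directly, with no Task.Factory.StartNew hand-off needed.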

Final Trimmings

There’s one last setting that needs to be taken care of to ensure that ASP.NET will allow access to the incoming stream as soon as possible. In the config file for our handler, we add the following:

<system.webServer>
  <asp enableChunkedEncoding="true"/>
</system.webServer>  

It may also be beneficial to raise the ASP.NET maximum request size limit (maxRequestLength is specified in kilobytes, so the value below allows roughly 2 GB):

<system.web>
  <httpRuntime maxRequestLength="2097151"/>
</system.web>
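If the service is hosted in IIS 7 or later, the web server enforces its own request size cap as well: requestFiltering's maxAllowedContentLength, which is specified in bytes and defaults to roughly 30 MB. Raising it might look something like this:

<system.webServer>
  <security>
    <requestFiltering>
      <!-- value is in bytes; this raises the IIS limit to roughly 2 GB -->
      <requestLimits maxAllowedContentLength="2147483647"/>
    </requestFiltering>
  </security>
</system.webServer>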

That’s all there is to it. Happy streaming!

I am a software architect with over 13 years of experience developing scalable, enterprise-level applications for Windows and the Web. I specialize in the latest Microsoft development technologies, with recent experience using C# 5.0, the .NET Framework 4.5, and SQL Server 2005/2008/2012.