Skip to content Skip to sidebar Skip to footer

Download Pdf From Url In Swift

Apple has a sophisticated caching system in iOS, which is enabled by default. However, documentation around URLCache is quite sparse. Today, we'll look at the behavior of caching when dealing with large files.

Motivation, aka Why Not Build Your Own Cache?

Why even bother having a cache? After all, once a file is downloaded, it's stored locally and can be accessed offline. However, caching logic helps us solve another hard problem: detecting if a file changes on the server.

Getting this correct is not trivial. There are a variety of HTTP headers that can help verify if a file needs to be downloaded again: Expires, ETag, Last-Modified, and If-Unmodified-Since. They are also functionally different. For example, while ETag makes it possible to reliably compare fingerprints, Last-Modified requires more guesswork/heuristics. The two can also be used together.

Apple spent years building logic that correctly takes all of these details into account, so let's try to reuse what the system offers.

ℹ️ Note: Since we're only using Foundation objects in this article, everything applies to all Apple platforms — from iOS to macOS to tvOS and watchOS. All code here is written in Swift 5, but everything can be used equally in Objective-C, where objects are prefixed (e.g. NSURLCache or NSURLSession).

Caching isn't magic. The system needs a hint from the server as to how long a file should be cached or how to verify it. If your server doesn't emit any hints for the caching system, files won't be cached. Here's a simple way to see that HTTP headers are using curl:

curl -i https://www.hq.nasa.gov/alsj/a17/A17_FlightPlan.pdf  HTTP/1.1 200 OK Date: Wed, 06 Nov 2020 10:24:34 GMT Server: Apache Strict-Transport-Security: max-age=63072000; includeSubdomains; preload X-Content-Type-Options: nosniff X-Frame-Options: SAMEORIGIN Last-Modified: Sun, 19 May 2002 14:49:00 GMT Accept-Ranges: bytes Content-Length: 20702285 Content-Type: application/pdf

In this example, Last-Modified is set so caching will work with an algorithm based on how old the modified date is. This "freshness lifetime" algorithm is defined in RFC2616 and is 10 percent of the current age (proof).

([Sun, 19 May 2002 14:49:00 GMT] - [Fri, 6 Nov 2020 12:00:00 GMT]) * 10% 6,745 days, 19 hours, 35 minutes, and 34 seconds * 10%

= The cache for the above file is valid for ~674 days.

Cache invalidation works differently, depending on the tags used. If only ETag is set, the system must query the server every time (refer to the decision tree from Apple below):

NSURLRequestUseProtocolCachePolicy decision tree for HTTP and HTTPS. Copyright Apple.com

If you'd like to learn more about HTTP caching headers, Google has a great resource at web.dev.

Downloading Large Files on iOS and macOS

There are many ways files can be downloaded on iOS, however, the modern approach is using URLSession with either a dataTask or a downloadTask. The main difference is storage. Data tasks return the data directly, while download tasks return a file URL. The returned file needs to be copied to a local destination in the completion handler to remain accessible.

Let's look at a complete example that downloads the Apollo 11 Flight Plan into the Documents directory of the current application (error handling is omitted for the sake of brevity):

                    let                    remoteURL                    =                    URL                    (                    string                    :                    "https://www.nasa.gov/specials/apollo50th/pdf/a11final-fltpln.pdf"                    )                    !                    let                    documentURL                    =                    FileManager                    .                    default                    .                    urls                    (                    for                    :                    .                    documentDirectory                    ,                    in                    :                    .                    userDomainMask                    )[                    0                    ]                    let                    targetURL                    =                    documentURL                    .                    appendingPathComponent                    (                    remoteURL                    .                    lastPathComponent                    )                    let                    downloadTask                    =                    URLSession                    .                    shared                    .                    downloadTask                    (                    with                    :                    remoteURL                    )                    {                    url                    ,                    response                    ,                    error                    in                    guard                    let                    tempURL                    =                    url                    else                    {                    return                    }                    _                    =                    try                    ?                    FileManager                    .                    default                    .                    replaceItemAt                    (                    targetURL                    ,                    withItemAt                    :                    tempURL                    )                    }                    downloadTask                    .                    resume                    ()                  

This will download the file correctly, but it'll likely not use the cache in a way you'd expect.

Caching Download Tasks

Download tasks support caching via either the default URLCache or a custom URLCache. However, there are a few details you need to know about:

  1. Background Downloading — A download task can run in the background if the background URLSession is used. In this scenario, the download is managed by a system daemon which has no access to the app-local cache.

  2. Manual Storage — While download tasks will query the cache, they — unlike data tasks — don't automatically store the result in the cache.

In order to store the result of a download task, you need to manually call storeCachedResponse on the cache:

                    let                    req                    =                    URLRequest                    (                    url                    :                    remoteURL                    )                    let                    downloadTask                    =                    URLSession                    .                    shared                    .                    downloadTask                    (                    with                    :                    req                    )                    {                    url                    ,                    response                    ,                    error                    in                    print                    (                    "Download Task complete."                    )                    if                    let                    response                    =                    response                    ,                    let                    url                    =                    url                    ,                    cache                    .                    cachedResponse                    (                    for                    :                    req                    )                    ==                    nil                    ,                    let                    data                    =                    try                    ?                    Data                    (                    contentsOf                    :                    url                    )                    {                    cache                    .                    storeCachedResponse                    (                    CachedURLResponse                    (                    response                    :                    response                    ,                    data                    :                    data                    ),                    for                    :                    req                    )                    }                    }                  

The URLCache class has been thread-safe since iOS 8. Things weren't so great in earlier releases.

Verifying URLCache

Each app has a default, sandboxed cache that lives under <APP_ROOT>/Library/Caches/<APP_BUNDLE_ID>. The default size isn't documented, but it can be queried easily. This has been tested on macOS Big Sur via Catalyst and might be different depending on the device:

(lldb) p (int)[[NSURLCache sharedURLCache] diskCapacity] (int) $0 = 20000000 // ~19 MB (lldb) p (int)[[NSURLCache sharedURLCache] memoryCapacity] (int) $1 = 512000   // ~500 KB

To effectively use URLCache for file downloads, it needs to be much bigger. We can be quite bold with size requests, as iOS will clean up automatically as needed:

In iOS, the on-disk cache may be purged when the system runs low on disk space, but only when your app is not running (Source: Apple Documentation).

On disk, the cache is a regular SQLite database named Cache.db, including the -shm and -wal files SQLite uses to improve performance. Binary files aren't stored in SQLite but in a separate folder named fsCachedData. The data stored here isn't processed; it's the same data you downloaded. In our case, we can open the PDF by simply renaming and opening 49E5D82A-5749-4094-A934-5D61B767CBF0.

NSURLCache Folder

Creating a Custom URLCache

The code block below will set up a cache with ~10 MB memory and ~1 GB disk cache space. We use the Caches directory because it isn't backed up in iCloud, and we certainly don't want a cache to be backed up:

                    let                    cachesURL                    =                    FileManager                    .                    default                    .                    urls                    (                    for                    :                    .                    cachesDirectory                    ,                    in                    :                    .                    userDomainMask                    )[                    0                    ]                    let                    diskCacheURL                    =                    cachesURL                    .                    appendingPathComponent                    (                    "DownloadCache"                    )                    let                    cache                    =                    URLCache                    (                    memoryCapacity                    :                    10_000_000                    ,                    diskCapacity                    :                    1_000_000_000                    ,                    directory                    :                    diskCacheURL                    )                  

There is anecdotal evidence that the cache rejects files if they're more than 10 percent in size of the total cache size. So you really want to pick a generous size for the cache to make it work reliably if you're dealing with large files like we are in this example. The exact numbers don't seem to be documented.

Next, we can control which cache should be used on a per-request basis. Instead of using URLSession.shared, we use a custom session object:

                    let                    config                    =                    URLSessionConfiguration                    .                    default                    config                    .                    urlCache                    =                    cache                    let                    session                    =                    URLSession                    (                    configuration                    :                    config                    )                  

That's all that's needed! Your magical new disk cache is ready to go.

Accessing the Cache Offline

Using the cache can be controlled via the cachePolicy setting on URLRequest. The default is .useProtocolCachePolicy, which usually does the right thing — including returning a cached copy if the content is new enough. Depending on your content, you might want to use .returnCacheDataElseLoad in the offline case instead:

let req = URLRequest(url: flightPlanURL, cachePolicy: .returnCacheDataElseLoad)

ℹ️ Note: Depending on the cache rules, it will still load files that have already been deleted remotely, as the cache won't always hit the network.

Conclusion

Using URLCache for large files is straightforward once one is aware of the gotchas around file downloading. Refer to this gist to see how all the snippets fit together to build reliable offline caching for downloaded files. We hope this helps some of you reuse Apple's caching instead of rolling your own.

Free 60-Day Trial Try PSPDFKit in your app today.

Free Trial

Source: https://pspdfkit.com/blog/2020/downloading-large-files-with-urlsession/

Posted by: dinorahdinorahhaniblee0277851.blogspot.com

Post a Comment for "Download Pdf From Url In Swift"