How to Use the Blob Storage Service from Python
This guide shows you how to perform common tasks with the Windows Azure Blob storage service. The samples are written using the Python API. The scenarios covered include uploading, listing, downloading, and deleting blobs. For more information on blobs, see the Next Steps section.
Table of Contents
What is Blob Storage?
Concepts
Create a Windows Azure Storage Account
How To: Create a Container
How To: Upload a Blob into a Container
How To: List the Blobs in a Container
How To: Download Blobs
How To: Delete a Blob
How To: Upload and Download Large Blobs
Next Steps
What is Blob Storage?
Windows Azure Blob storage is a service for storing large amounts of unstructured data that can be accessed from anywhere in the world via HTTP or HTTPS. A single blob can be hundreds of gigabytes in size, and a single storage account can contain up to 100TB of blobs. Common uses of Blob storage include:
- Serving images or documents directly to a browser
- Storing files for distributed access
- Streaming video and audio
- Performing secure backup and disaster recovery
- Storing data for analysis by an on-premises or Windows Azure-hosted service
You can use Blob storage to expose data publicly to the world or privately for internal application storage.
Concepts
The Blob service contains the following components:
- Storage Account: All access to Windows Azure Storage is done through a storage account. This is the highest level of the namespace for accessing blobs. An account can contain an unlimited number of containers, as long as their total size is under 100TB.
- Container: A container provides a grouping of a set of blobs. All blobs must be in a container. An account can contain an unlimited number of containers. A container can store an unlimited number of blobs.
- Blob: A file of any type and size. There are two types of blobs that can be stored in Windows Azure Storage: block and page blobs. Most files are block blobs. A single block blob can be up to 200GB in size. This tutorial uses block blobs. Page blobs, another blob type, can be up to 1TB in size, and are more efficient when ranges of bytes in a file are modified frequently. For more information about blobs, see Understanding Block Blobs and Page Blobs.
- URL format: Blobs are addressable using the following URL format:
http://<storage account>.blob.core.windows.net/<container>/<blob>
For example, the following URL addresses the blob MOV1.AVI in the movies container of the sally storage account:
http://sally.blob.core.windows.net/movies/MOV1.AVI
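The URL format above can be sketched as a small helper. The function name blob_url and its scheme parameter are illustrative, not part of the Python API, and the sketch assumes the names need no percent-encoding.

```python
def blob_url(account, container, blob, scheme="http"):
    """Build a blob address from the storage account, container, and blob names.

    Hypothetical helper for illustration; it only formats the string and
    does not contact the service or escape special characters.
    """
    return "{0}://{1}.blob.core.windows.net/{2}/{3}".format(
        scheme, account, container, blob)


print(blob_url("sally", "movies", "MOV1.AVI"))
# http://sally.blob.core.windows.net/movies/MOV1.AVI
```

Passing scheme="https" produces the HTTPS form of the same address.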
Create a Windows Azure Storage Account
To use storage operations, you need a Windows Azure storage account. You can create a storage account by following these steps. (You can also create a storage account using the REST API.)
- Log into the Windows Azure Management Portal.
- At the bottom of the navigation pane, click NEW.
- Click DATA SERVICES, then STORAGE, and then click QUICK CREATE.
- In URL, type a subdomain name to use in the URI for the storage account. The entry can contain from 3-24 lowercase letters and numbers. This value becomes the host name within the URI that is used to address Blob, Queue, or Table resources for the subscription.
- Choose a Region/Affinity Group in which to locate the storage. If you will be using storage from your Windows Azure application, select the same region where you will deploy your application.
- Optionally, you can enable geo-replication.
- Click CREATE STORAGE ACCOUNT.
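The naming rule from the URL step (3-24 lowercase letters and numbers) can be checked locally before you type a name into the portal. This is a minimal sketch. The function name is_valid_account_name is hypothetical, and the check covers only the length and character rule stated above, not whether the name is still available.

```python
import re

# 3 to 24 characters, lowercase ASCII letters and digits only.
_ACCOUNT_NAME = re.compile(r"\A[a-z0-9]{3,24}\Z")


def is_valid_account_name(name):
    """Return True if name satisfies the length and character rule."""
    return _ACCOUNT_NAME.match(name) is not None


print(is_valid_account_name("myaccount"))   # True
print(is_valid_account_name("My-Account"))  # False
```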
How to: Create a Container
Note: If you need to install Python or the Client Libraries, please see the Python Installation Guide.
The BlobService object lets you work with containers and blobs. Add the following import near the top of any Python file in which you wish to programmatically access Windows Azure Storage:
from azure.storage import *
The following code creates a BlobService object using the storage account name and account key. Replace 'myaccount' and 'mykey' with the real account and key.
blob_service = BlobService(account_name='myaccount', account_key='mykey')
All storage blobs reside in a container. You can use a BlobService object to create the container if it doesn't exist:
blob_service.create_container('mycontainer')
By default, the new container is private, so you must specify your storage access key (as you did above) to download blobs from this container. If you want to make the files within the container available to everyone, you can create the container and pass the public access level using the following code:
blob_service.create_container('mycontainer', x_ms_blob_public_access='container')
Alternatively, you can modify a container after you have created it using the following code:
blob_service.set_container_acl('mycontainer', x_ms_blob_public_access='container')
After this change, anyone on the Internet can see blobs in a public container, but only you can modify or delete them.
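The private and public cases above can be folded into one helper, as a rough sketch. The name make_container and its public flag are hypothetical. The helper only forwards to the create_container calls shown in this section, and it assumes blob_service is a BlobService object created as described earlier.

```python
def make_container(blob_service, name, public=False):
    """Create a container that is private by default.

    Hypothetical convenience wrapper: with public=True it passes the same
    x_ms_blob_public_access='container' argument used in this section.
    """
    if public:
        return blob_service.create_container(
            name, x_ms_blob_public_access='container')
    return blob_service.create_container(name)
```

For example, make_container(blob_service, 'mycontainer', public=True) makes the same call as the public-container line above.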
How to: Upload a Blob into a Container
To upload a file to a blob, use the put_blob method to create the blob, using a file stream as the contents of the blob. First, create a file called task1.txt (arbitrary content is fine) and store it in the same directory as your Python file.
with open(r'task1.txt', 'r') as f:
    myblob = f.read()
blob_service.put_blob('mycontainer', 'myblob', myblob, x_ms_blob_type='BlockBlob')
How to: List the Blobs in a Container
To list the blobs in a container, use the list_blobs method with a for loop to display the name of each blob in the container. The following code outputs the name and url of each blob in a container to the console.
blobs = blob_service.list_blobs('mycontainer')
for blob in blobs:
    print(blob.name)
    print(blob.url)
How to: Download Blobs
To download blobs, use the get_blob method to transfer the blob contents to a stream object that you can then persist to a local file.
blob = blob_service.get_blob('mycontainer', 'myblob')
with open(r'out-task1.txt', 'w') as f:
    f.write(blob)
How to: Delete a Blob
Finally, to delete a blob, call delete_blob.
blob_service.delete_blob('mycontainer', 'myblob')
How to: Upload and Download Large Blobs
The maximum size for a block blob is 200 GB. For blobs smaller than 64 MB, the blob can be uploaded or downloaded using a single call to put_blob or get_blob, as shown previously. For blobs larger than 64 MB, the blob needs to be uploaded or downloaded in blocks of 4 MB or smaller.
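To make the size limits above concrete, the following sketch computes whether a blob needs block transfers, how many 4 MB blocks it takes, and which byte ranges those blocks cover. All names here are illustrative and none of the code calls the storage service. Treating a blob of exactly 64 MB as fitting in a single call is an assumption, since the text does not say.

```python
MAX_SINGLE_CALL = 64 * 1024 * 1024  # limit for one put_blob/get_blob call
BLOCK_SIZE = 4 * 1024 * 1024        # largest block transferred at once


def needs_blocks(size):
    """True if a blob of this many bytes must be transferred in blocks."""
    return size > MAX_SINGLE_CALL  # assumption: exactly 64 MB fits one call


def block_count(size, block_size=BLOCK_SIZE):
    """Number of blocks needed to cover size bytes (ceiling division)."""
    return (size + block_size - 1) // block_size


def byte_ranges(size, block_size=BLOCK_SIZE):
    """Yield inclusive (start, end) byte offsets, one pair per block."""
    for start in range(0, size, block_size):
        yield start, min(start + block_size, size) - 1


size = 100 * 1024 * 1024            # a 100 MB blob
print(needs_blocks(size))           # True
print(block_count(size))            # 25
```

The last range is shorter than the others whenever the size is not a multiple of the block size.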
The following code shows examples of functions to upload or download block blobs of any size.
import base64
chunk_size = 4 * 1024 * 1024
def upload(blob_service, container_name, blob_name, file_path):
    blob_service.create_container(container_name, None, None, False)
    blob_service.put_blob(container_name, blob_name, '', 'BlockBlob')
    block_ids = []
    index = 0
    with open(file_path, 'rb') as f:
        while True:
            data = f.read(chunk_size)
            if data:
                # Block IDs within one blob must all have the same length,
                # so zero-pad the index before encoding it.
                block_id = base64.b64encode('{0:08d}'.format(index))
                blob_service.put_block(container_name, blob_name, data, block_id)
                block_ids.append(block_id)
                index += 1
            else:
                break
    # Commit the uploaded blocks, in order, to form the final blob.
    blob_service.put_block_list(container_name, blob_name, block_ids)
def download(blob_service, container_name, blob_name, file_path):
    props = blob_service.get_blob_properties(container_name, blob_name)
    blob_size = int(props['content-length'])
    index = 0
    with open(file_path, 'wb') as f:
        while index < blob_size:
            chunk_range = 'bytes={}-{}'.format(index, index + chunk_size - 1)
            data = blob_service.get_blob(container_name, blob_name, x_ms_range=chunk_range)
            length = len(data)
            index += length
            if length > 0:
                f.write(data)
                if length < chunk_size:
                    break
            else:
                break
If you need blobs larger than 200 GB, you can use a page blob instead of a block blob. The maximum size of a page blob is 1 TB, with pages that align to 512-byte page boundaries. Use put_blob to create a page blob, put_page to write to it, and get_blob to read from it.
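The 512-byte alignment mentioned for page blobs can be illustrated with plain arithmetic. The helper names below are hypothetical and the code does not call put_blob or put_page. It only shows how a size or offset relates to page boundaries.

```python
PAGE_SIZE = 512  # page blobs align to 512-byte boundaries


def aligned_size(size):
    """Round size up to the next multiple of the page size."""
    return ((size + PAGE_SIZE - 1) // PAGE_SIZE) * PAGE_SIZE


def is_page_aligned(offset):
    """True if offset falls exactly on a page boundary."""
    return offset % PAGE_SIZE == 0


print(aligned_size(1000))     # 1024
print(is_page_aligned(1000))  # False
print(is_page_aligned(1024))  # True
```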
Next Steps
Now that you’ve learned the basics of blob storage, follow these links to learn how to do more complex storage tasks.