Azure Blob Storage

Azure Blob Storage is Microsoft's object storage service for the cloud. It is optimized for storing large amounts of unstructured data, such as:

  • Text files: Word documents, spreadsheets, presentations, log files.
  • Binary files: audio, images, video, virtual disks.

A storage account is organized into containers and blobs:
  • Storage account: its name must be unique across all storage accounts in Azure, because the name is part of the account URL. For example, the URL of a storage account called abarrenecheademo is https://abarrenecheademo.blob.core.windows.net
  • Blob containers: a container is like a root folder for your blobs, and a storage account can have multiple containers. To access the blobs in a container, append the container name to the storage account URL. For a container called images, the URL is https://abarrenecheademo.blob.core.windows.net/images
  • Blobs: the items in a container (text files, images, videos, PDFs). Azure Blob Storage has no real folders, but they can be simulated by including a path in the blob name (a virtual directory). For example, in https://abarrenecheademo.blob.core.windows.net/images/icons/icon.svg the blob name is icons/icon.svg.
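The URL scheme above is mechanical enough to sketch in code. The helper below is hypothetical (not part of any SDK) and assumes the standard public-cloud endpoint suffix blob.core.windows.net; it also checks the account naming rule (3 to 24 characters, lowercase letters and digits only). Blob names are not URL-encoded here, so names with spaces or special characters would need extra handling.

```python
import re

# Storage account names: 3-24 characters, lowercase letters and digits only.
ACCOUNT_NAME_RE = re.compile(r"^[a-z0-9]{3,24}$")


def blob_url(account: str, container: str, blob_name: str = "") -> str:
    """Build the URL of a container, or of a blob inside it (hypothetical helper)."""
    if not ACCOUNT_NAME_RE.match(account):
        raise ValueError(f"invalid storage account name: {account!r}")
    url = f"https://{account}.blob.core.windows.net/{container}"
    if blob_name:
        # A '/' inside the blob name is what creates a "virtual directory".
        url += f"/{blob_name}"
    return url


print(blob_url("abarrenecheademo", "images"))
print(blob_url("abarrenecheademo", "images", "icons/icon.svg"))
```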

Some of the key features of this service are:

  • You can store unstructured data such as text files (logs, documents) and binary files (images, videos, PDFs, virtual disks for VMs).
  • Containers and blobs can be accessed via a REST API over HTTP/HTTPS.
  • Access is available through the Azure Portal, Azure PowerShell/Azure CLI, and the Azure Storage client libraries (.NET, Java, Python, etc.).
  • Data at rest is encrypted with Storage Service Encryption (SSE).

Creating a Storage Account

A storage account can be created through the Azure Portal, the Azure CLI, or Azure Resource Manager (ARM) templates. When creating a storage account, keep the following properties in mind:

  • Resource Group
  • Storage account name: must be globally unique, 3 to 24 characters long, lowercase letters and numbers only
  • Location (Region)
  • Performance: Standard (magnetic disks) or Premium (SSD)
  • Account kind: StorageV2 (general purpose v2), Storage (general purpose v1), BlobStorage, BlockBlobStorage, FileStorage
  • Replication: for example, read-access geo-redundant storage (RA-GRS)
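As a sketch of the Azure CLI route, the snippet below assembles an `az storage account create` command from the properties listed above. The resource group name demo-rg and the region westeurope are placeholder assumptions; the SKU value combines performance and replication (Standard_RAGRS means Standard performance with RA-GRS).

```python
import shlex


def create_account_command(name: str, resource_group: str, location: str,
                           sku: str = "Standard_RAGRS", kind: str = "StorageV2") -> str:
    """Return an `az storage account create` command line for the given properties."""
    args = [
        "az", "storage", "account", "create",
        "--name", name,
        "--resource-group", resource_group,
        "--location", location,
        "--sku", sku,    # performance + replication, e.g. Standard_LRS, Premium_LRS
        "--kind", kind,  # StorageV2, BlockBlobStorage, FileStorage, ...
    ]
    return shlex.join(args)  # quotes any argument that needs it


print(create_account_command("abarrenecheademo", "demo-rg", "westeurope"))
```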

Authorize Requests to Blob Storage

There are four different ways of authorizing access to Blob Storage resources.

  • Shared Key (storage account key): used in connection strings; most common when accessing the data through code/SDKs. The key grants full access to the account, so it must be kept secret.
  • Shared Access Signatures (SAS): tokens that you append to URLs. They provide fine-grained control, because when generating a SAS token you can specify the permissions, the start and expiry time, the allowed IP addresses, and the allowed protocol (HTTPS only, or HTTP and HTTPS).
  • Azure Active Directory (now Microsoft Entra ID): identities are authorized through Azure role-based access control (RBAC).
  • Anonymous public read access: configured at the container level. By default the access level is private, but you can set it to blob-level or container-level access. The difference is that container-level access also allows anonymous clients to list the blobs in the container.
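Shared Key and SAS both rely on the same signing primitive: an HMAC-SHA256 of a "string-to-sign", computed with the base64-decoded account key and then base64-encoded. The sketch below shows only that primitive. The key and string-to-sign are placeholders; the real string-to-sign has a precise, version-dependent layout defined in the Azure Storage REST documentation, and in practice a helper such as `generate_blob_sas` from the azure-storage-blob package builds it for you. `append_sas` is a hypothetical helper that assumes the URL has no query string yet.

```python
import base64
import hashlib
import hmac


def sign(account_key_b64: str, string_to_sign: str) -> str:
    """HMAC-SHA256 signature in the form used by Shared Key and SAS authorization."""
    key = base64.b64decode(account_key_b64)  # account keys are distributed base64-encoded
    digest = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")


def append_sas(url: str, sas_token: str) -> str:
    """Append a SAS token (with or without a leading '?') to a URL without a query string."""
    return f"{url}?{sas_token.lstrip('?')}"


# Placeholder key and string-to-sign, for illustration only.
fake_key = base64.b64encode(b"not-a-real-key").decode()
print(sign(fake_key, "placeholder-string-to-sign"))
```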

Blob types

Azure Blob Storage offers three blob types, each designed for a different use case.

  • Block Blob: stores data such as images, audio, PDFs and text files. The data is uploaded as separate blocks that are then committed to form the blob.
  • Append Blob: made of blocks optimized for append operations, which makes it ideal for log files.
  • Page Blob: optimized for random reads and writes at any position; a page blob is a collection of 512-byte pages. Azure VM disks are backed by page blobs.
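The two upload models can be sketched without any Azure dependency. For block blobs, each chunk is staged with a block ID, and the ordered list of IDs is committed afterwards (in the Python SDK, `BlobClient.stage_block` and `BlobClient.commit_block_list`); block IDs must be base64 strings of equal length within a blob, hence the zero-padded index. For page blobs, writes must be aligned to 512-byte pages. The block ID format "block-000000" is an arbitrary choice for illustration.

```python
import base64

PAGE_SIZE = 512  # page blobs are collections of 512-byte pages


def split_into_blocks(data: bytes, block_size: int) -> list:
    """Split data into (block_id, chunk) pairs ready to be staged as blocks."""
    blocks = []
    for i, start in enumerate(range(0, len(data), block_size)):
        # Zero-padding keeps every encoded block ID the same length.
        block_id = base64.b64encode(f"block-{i:06d}".encode()).decode()
        blocks.append((block_id, data[start:start + block_size]))
    return blocks


def is_page_aligned(offset: int, length: int) -> bool:
    """A page blob write must start and end on 512-byte boundaries."""
    return offset % PAGE_SIZE == 0 and length % PAGE_SIZE == 0


blocks = split_into_blocks(b"x" * 10, block_size=4)
print([len(chunk) for _, chunk in blocks])
print(is_page_aligned(1024, 512), is_page_aligned(100, 512))
```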

Storage Account Kinds

  • StorageV2 (general purpose v2): the default kind, recommended for most scenarios. When creating a StorageV2 account you can choose between two performance tiers, Standard and Premium.
    • Standard performance (magnetic drives): supports Blob (block, append, page), File, Queue, Table
    • Premium performance (SSD): supports Blob (page blobs only)
  • BlockBlobStorage (Premium performance only): supports Blob (block, append). Ideal when there are many transactions with small objects.
  • FileStorage (Premium performance only): supports File

Replication Strategies

The data in a storage account is always replicated. Azure Blob Storage offers the following strategies to configure the durability of your data:

  • Locally Redundant Storage (LRS): three copies within a single data center in the primary region (for example, West Europe).
    Geo-redundant storage (GRS) adds an asynchronous copy to a secondary region, where the data is again stored as three copies in a single data center.
    A failover to the secondary region usually takes about an hour. With read-access geo-redundant storage (RA-GRS), you can also read the data from the secondary region at any time through a secondary URL.
  • Zone Redundant Storage (ZRS): distributes the data across three data centers (availability zones) in the same region. Not supported in all regions; the region must have availability zones.
    Geo-zone-redundant storage (GZRS) adds an asynchronous copy to a secondary region, where the data is stored as three copies in a single data center.
    A failover to the secondary region usually takes about an hour. With read-access geo-zone-redundant storage (RA-GZRS), you can also read the data from the secondary region at any time through a secondary URL.

To read data from the secondary region, append "-secondary" to the storage account name in the URL. For example: https://abarrenecheademo-secondary.blob.core.windows.net

Replication options supported by each account type:

  • StorageV2 (Standard performance): LRS, GRS, RA-GRS (default), ZRS, GZRS, RA-GZRS
  • Premium performance (BlockBlobStorage, FileStorage): LRS, ZRS
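The table above and the "-secondary" rule can be encoded as a small lookup. This is a sketch built only from these notes: the dictionary keys such as "StorageV2/Standard" are an invented labeling, and only the read-access options (RA-GRS, RA-GZRS) return a readable secondary endpoint.

```python
from typing import Optional

# Replication options per account type, as listed above.
REPLICATION_OPTIONS = {
    "StorageV2/Standard": {"LRS", "GRS", "RA-GRS", "ZRS", "GZRS", "RA-GZRS"},
    "BlockBlobStorage/Premium": {"LRS", "ZRS"},
    "FileStorage/Premium": {"LRS", "ZRS"},
}

# Only read-access options expose the secondary region for reads.
READ_ACCESS = {"RA-GRS", "RA-GZRS"}


def secondary_endpoint(account: str, replication: str, account_type: str) -> Optional[str]:
    """Return the secondary blob endpoint, or None if the secondary is not readable."""
    if replication not in REPLICATION_OPTIONS[account_type]:
        raise ValueError(f"{replication} is not available for {account_type}")
    if replication not in READ_ACCESS:
        return None
    return f"https://{account}-secondary.blob.core.windows.net"


print(secondary_endpoint("abarrenecheademo", "RA-GRS", "StorageV2/Standard"))
```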

Interacting with Azure Blob Storage

To interact with Azure Blob Storage we can use an SDK (SDKs are available for .NET, Java, JavaScript/TypeScript, Python, C++, and Android/iOS). You can read more about the SDKs in their GitHub repositories.

The Azure SDK provides a collection of client libraries named Azure.[service-category].[service-name]. For .NET they are available as NuGet packages; the blob library is Azure.Storage.Blobs. It contains classes such as:

  • BlobServiceClient: performs tasks related to the Blob service of the storage account, such as getting the list of containers.
  • BlobContainerClient: performs tasks related to containers, such as creating/deleting a container or getting the list of blobs.
  • BlobClient: interacts with individual blobs, for example to read/write them.
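The same three-level hierarchy exists in the Python library (package azure-storage-blob), where the container-level class is called ContainerClient instead of BlobContainerClient. The sketch below assumes that package is installed and that you have a valid connection string; the function name is hypothetical, and the import is placed inside the function so the module still loads without the package.

```python
def upload_and_list(connection_string: str, container: str,
                    blob_name: str, data: bytes) -> list:
    """Upload one blob and return the names of all blobs in its container."""
    # Requires: pip install azure-storage-blob
    from azure.storage.blob import BlobServiceClient

    # BlobServiceClient: account-level operations, such as listing containers.
    service = BlobServiceClient.from_connection_string(connection_string)
    print("containers:", [c.name for c in service.list_containers()])

    # ContainerClient: container-level operations (create, delete, list blobs).
    container_client = service.get_container_client(container)
    if not container_client.exists():
        container_client.create_container()

    # BlobClient: operations on a single blob (upload, download, delete).
    blob_client = container_client.get_blob_client(blob_name)
    blob_client.upload_blob(data, overwrite=True)

    return [blob.name for blob in container_client.list_blobs()]
```

For local experiments, a connection string for the Azurite storage emulator can be used instead of a real account.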

Properties & Metadata

Properties & Metadata exist on both blob containers and blobs. They are set and retrieved via HTTP headers (either by the browser, when you load a blob through the address bar, or programmatically using the SDK).

Blob containers have:
  • ETag: string ID that changes on every update
  • Last-Modified
  • User-defined metadata: string-based key-value pairs
Blobs have:
  • ETag: string ID that changes on every update (read-only)
  • Last-Modified (read-only)
  • Content-Type: the MIME type of the blob
  • Content-Length: the size of the blob in bytes
  • x-ms-blob-type: BlockBlob, AppendBlob or PageBlob
  • User-defined metadata: string-based key-value pairs
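At the HTTP level, each user-defined metadata entry travels as its own header with the x-ms-meta- prefix, alongside system properties such as ETag and Content-Type. The two helpers below are a stdlib-only sketch of that mapping; their names are hypothetical and they do not perform any HTTP calls.

```python
META_PREFIX = "x-ms-meta-"


def metadata_to_headers(metadata: dict) -> dict:
    """Turn user-defined metadata into x-ms-meta-* request headers."""
    return {META_PREFIX + name: value for name, value in metadata.items()}


def headers_to_metadata(headers: dict) -> dict:
    """Extract user-defined metadata from response headers (names compared case-insensitively)."""
    return {
        name[len(META_PREFIX):]: value
        for name, value in headers.items()
        if name.lower().startswith(META_PREFIX)
    }


print(metadata_to_headers({"author": "augusto"}))
print(headers_to_metadata({"ETag": '"0x8DC0000000000"', "x-ms-meta-author": "augusto"}))
```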

Access tiers for block blobs

The default access tier is set per storage account (Hot or Cool) and is inherited (inferred) by block blobs that don't have an explicit tier. The tier can also be set on individual block blobs.

  • Hot (default): for frequently accessed data. Low access and transaction costs, but a high storage cost per GB.
  • Cool: for infrequently accessed data that is stored for at least 30 days. Lower storage cost per GB than Hot, but higher access and transaction costs.
  • Archive: can only be set on individual blobs (not on storage accounts or containers). For data stored for at least 180 days that you don't plan to access during that time. High access and transaction costs, low storage cost per GB.
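The 30-day and 180-day minimums matter for billing: if a blob is deleted or moved out of Cool or Archive before the minimum period, Azure charges a prorated early-deletion fee for the remaining days. The sketch below only computes those remaining days using the minimums from these notes; the exact charges come from the Azure pricing page.

```python
# Minimum number of days a blob is expected to stay in each tier (from the notes above).
MIN_DAYS = {"Hot": 0, "Cool": 30, "Archive": 180}


def early_deletion_days(tier: str, days_in_tier: int) -> int:
    """Days billed as early deletion if a blob leaves `tier` after `days_in_tier` days."""
    return max(0, MIN_DAYS[tier] - days_in_tier)


print(early_deletion_days("Archive", 45))
```

For example, deleting an archived blob after 45 days is billed as if it had stayed for the remaining 135 days.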

Data Archiving & Data retention

By default, a storage account's access tier is set to Hot. You can set it to Cool when creating the account. Archive is only allowed for individual blobs, because it takes the blob offline: the blob is inaccessible until it is rehydrated to an online tier.

It is also possible to change the tier through the following methods:

  • Programmatically, using the SDK
  • From the command line, using the CLI
  • Lifecycle management (rules): you can set up rules in the storage account to move blobs to a different tier. For example, move a blob from Hot to Cool if it has not been accessed in the last 15 days.

Manage the blob lifecycle

Lifecycle management lets you define rules that move blobs between tiers automatically. Example: if a blob has not been accessed in over 15 days, move it to the Cool tier; apply the rule only to blobs in the container "samplecontainer/".
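Lifecycle rules are stored as a JSON management policy on the storage account. The sketch below builds a policy for the example rule above; the rule name "move-to-cool" is arbitrary. The daysAfterLastAccessTimeGreaterThan condition only works when last-access-time tracking is enabled on the account (otherwise daysAfterModificationGreaterThan is the usual alternative). The printed JSON could be saved to a file and applied with `az storage account management-policy create --policy @policy.json` plus the account name and resource group.

```python
import json


def tier_to_cool_policy(prefix: str, days: int) -> dict:
    """Lifecycle policy: move block blobs under `prefix` to Cool after `days` days without access."""
    return {
        "rules": [
            {
                "enabled": True,
                "name": "move-to-cool",  # arbitrary rule name
                "type": "Lifecycle",
                "definition": {
                    "actions": {
                        "baseBlob": {
                            # Requires last-access-time tracking on the storage account.
                            "tierToCool": {"daysAfterLastAccessTimeGreaterThan": days}
                        }
                    },
                    "filters": {
                        "blobTypes": ["blockBlob"],
                        "prefixMatch": [prefix],  # container name, optionally followed by a path
                    },
                },
            }
        ]
    }


print(json.dumps(tier_to_cool_policy("samplecontainer/", 15), indent=2))
```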

Soft Delete

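Soft delete keeps deleted blobs for a configurable number of days so that they can be restored. To enable it, go to "Data Protection" > "Turn on soft delete for blobs" and choose how many days deleted blobs should be kept. Soft delete also covers snapshots and versions, which means you can restore a blob together with its snapshots and versions.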
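The same setting can be changed from code. This sketch assumes the azure-storage-blob package and a valid connection string, and that blob versioning is not enabled (restoring works differently with versioning). It enables soft delete with a retention of `days` days (the allowed range is 1 to 365) and then restores every soft-deleted blob in one container; the function name is hypothetical.

```python
def enable_soft_delete_and_restore(connection_string: str, container: str,
                                   days: int = 7) -> list:
    """Turn on blob soft delete, then undelete every soft-deleted blob in `container`."""
    # Requires: pip install azure-storage-blob
    from azure.storage.blob import BlobServiceClient, RetentionPolicy

    service = BlobServiceClient.from_connection_string(connection_string)
    # Equivalent of "Turn on soft delete for blobs" with a retention of `days` days.
    service.set_service_properties(
        delete_retention_policy=RetentionPolicy(enabled=True, days=days)
    )

    container_client = service.get_container_client(container)
    restored = []
    # include=["deleted"] makes the listing return soft-deleted blobs too.
    for blob in container_client.list_blobs(include=["deleted"]):
        if blob.deleted:
            container_client.get_blob_client(blob.name).undelete_blob()
            restored.append(blob.name)
    return restored
```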

Snapshot

A snapshot is a read-only copy of a blob at a specific point in time. You can create one manually from the portal with the "Create snapshot" option. To access a snapshot, use the blob URL with a snapshot query parameter that contains the snapshot's timestamp (e.g. https://abarrenecheademo.blob.core.windows.net/images/augusto.jpe?snapshot=2024-04-13T10:00:00.0000000Z).
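Building a snapshot URL is just a matter of adding the query parameter. This hypothetical helper URL-encodes the timestamp (the colons become %3A). Note that reading the snapshot still requires authorization (for example a SAS token) unless the container allows anonymous access. In the Python SDK, `BlobClient.create_snapshot()` returns a dictionary whose "snapshot" entry holds this timestamp.

```python
from urllib.parse import urlencode


def snapshot_url(blob_url: str, snapshot: str) -> str:
    """Return the URL of a specific snapshot of a blob."""
    return f"{blob_url}?{urlencode({'snapshot': snapshot})}"


print(snapshot_url(
    "https://abarrenecheademo.blob.core.windows.net/images/augusto.jpe",
    "2024-04-13T10:00:00.0000000Z",
))
```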

Versions

Versioning automatically keeps the previous state of a blob every time it is modified or deleted. It must be enabled at the storage account level: go to "Data Protection" > "Tracking".

Leases

A lease is a lock that protects a blob from unwanted modifications or deletion. The "Acquire lease" option on a blob returns a lease ID, which is then required to modify or delete the blob until the lease is released or expires. Even the delete option in the Azure portal is disabled while the blob has an active lease. Leases are useful for implementing pessimistic concurrency.
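The rules above can be illustrated with a small in-memory model. This is not the Azure SDK: the class and its methods are hypothetical, it models only infinite leases (real leases last 15 to 60 seconds or are infinite), and it raises PermissionError where Azure would return an HTTP error. With the Python SDK, the real counterpart is roughly `lease = blob_client.acquire_lease(lease_duration=-1)` followed by passing `lease=lease` to calls such as `upload_blob` or `delete_blob`.

```python
import uuid
from typing import Optional


class LeasedBlob:
    """In-memory model of blob lease rules (illustration only, not the Azure SDK)."""

    def __init__(self, data: bytes):
        self.data = data
        self.lease_id: Optional[str] = None
        self.deleted = False

    def acquire_lease(self) -> str:
        if self.lease_id is not None:
            raise PermissionError("blob already has an active lease")
        self.lease_id = str(uuid.uuid4())
        return self.lease_id

    def _check(self, lease_id: Optional[str]) -> None:
        # While a lease is active, every write or delete must present its ID.
        if self.lease_id is not None and lease_id != self.lease_id:
            raise PermissionError("a valid lease ID is required")

    def write(self, data: bytes, lease_id: Optional[str] = None) -> None:
        self._check(lease_id)
        self.data = data

    def delete(self, lease_id: Optional[str] = None) -> None:
        self._check(lease_id)
        self.deleted = True

    def release_lease(self, lease_id: str) -> None:
        self._check(lease_id)
        self.lease_id = None
```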

Immutable Blob Storage

Immutable storage supports two types of policies:

  • Time-based retention: lets you specify a retention period in days. During that period you cannot modify or delete the blob. After the period has expired you still cannot modify the blob, but you can delete it.
  • Legal hold: prevents the blob from being modified or deleted until the hold is cleared. It lets you add tags that help you remember why the blob is on hold.
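The combined effect of the two policies can be summarized as a small decision function. It follows the rules exactly as described in these notes and is a simplification: it measures the retention period from the blob's creation, treats `retention_days=None` as "no time-based policy", and its name and parameters are hypothetical.

```python
from typing import Optional


def allowed_operations(retention_days: Optional[int], age_days: int,
                       legal_hold_tags: set) -> dict:
    """Which operations are allowed on a blob under the policies described above."""
    has_policy = retention_days is not None or bool(legal_hold_tags)
    in_retention = retention_days is not None and age_days < retention_days
    return {
        # Once an immutability policy applies, the blob can no longer be overwritten.
        "modify": not has_policy,
        # Deletion is blocked during retention and while any legal hold tag exists.
        "delete": not in_retention and not legal_hold_tags,
    }


print(allowed_operations(30, 45, set()))
```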

Copy items in Blob Storage

  • Azure CLI: copy a blob, or a container with all its blobs, from one storage account to another.
  • AzCopy: copy from a local folder to a container, copy blobs from one container to another, or copy all containers from one storage account to another.
  • .NET client library
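AzCopy is driven by `azcopy copy <source> <destination>`, where Blob Storage URLs usually carry a SAS token and `--recursive` copies whole folders or containers. The sketch below only assembles the command line; the URLs and SAS values are placeholders, and the actual authorization options are described in the AzCopy documentation.

```python
import shlex


def azcopy_copy(source: str, destination: str, recursive: bool = True) -> str:
    """Build an `azcopy copy` command line (URLs may include a SAS token)."""
    args = ["azcopy", "copy", source, destination]
    if recursive:
        args.append("--recursive")
    return shlex.join(args)  # quotes URLs containing '?' or '&' for the shell


# Local folder -> container (placeholder SAS token).
print(azcopy_copy("./logs", "https://abarrenecheademo.blob.core.windows.net/logs?sv=...&sig=..."))
```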