AWS S3 Notes

S3 Bucket, Object, Encryption

S3

  • Object-based storage that spreads data across multiple devices and facilities, with virtually unlimited capacity.

Bucket

  • File vault
  • It must have a globally unique name.
  • Defined at the Region level
  • Naming convention
    • No uppercase letters
    • No underscores
    • 3-63 characters long
    • Must not be formatted as an IP address
    • Must start with a lowercase letter or a number
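
The naming rules above can be checked mechanically. The following is a minimal sketch, not an official validator: the function name `is_valid_bucket_name` is made up for illustration, and two details (only lowercase letters, digits, dots and hyphens are accepted, and the name must also end with a letter or digit) are assumptions taken from the general S3 naming rules rather than from these notes.

```python
import ipaddress
import re

# Lowercase letters, digits, dots and hyphens; 3-63 characters in total;
# first and last character must be a letter or digit.
_BUCKET_NAME_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def is_valid_bucket_name(name: str) -> bool:
    """Return True if `name` passes the naming rules listed above."""
    if not _BUCKET_NAME_RE.fullmatch(name):
        return False
    try:
        ipaddress.IPv4Address(name)
    except ValueError:
        return True   # not an IP address, so the name is acceptable
    return False      # formatted like 192.168.5.4


print(is_valid_bucket_name("my-bucket"))   # a valid name
print(is_valid_bucket_name("My_Bucket"))   # uppercase and underscore are rejected
```

This only covers the rules in the list; S3 applies a few more restrictions (for example on reserved prefixes) that the sketch does not attempt to model.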

Object

  • Objects have keys.
  • The key is the full path of the object
    • s3://my-bucket/prabhat.jpg
    • s3://my-bucket/my_folder1/another_folder/prabhat.jpg
  • A key consists of a prefix plus the object name (above: prefix my_folder1/another_folder/, object name prabhat.jpg)
  • There is no real concept of a directory. (The console UI just displays keys containing slashes as if they were directories.)
  • The object value is the body of the content.
    • Max object size: 5 TB
    • An object larger than 5 GB cannot be uploaded in a single PUT; you must use a multipart upload.
  • Metadata (system metadata)
  • Tags
  • Version ID
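
To make "key = prefix + object name" concrete, here is a small sketch that splits an S3 URI into its parts. The helper name `split_s3_uri` is hypothetical; it uses only the standard library and does not talk to AWS.

```python
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> tuple[str, str, str]:
    """Split an s3:// URI into (bucket, prefix, object_name)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    key = parsed.path.lstrip("/")
    prefix, _, object_name = key.rpartition("/")
    # rpartition drops the separator, so add it back when there is a prefix
    return parsed.netloc, (prefix + "/" if prefix else ""), object_name


print(split_s3_uri("s3://my-bucket/my_folder1/another_folder/prabhat.jpg"))
```

The "folders" are just part of the key string, which is why an object at the top level of the bucket simply has an empty prefix.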

Versioning

  • Files in S3 can be versioned.
  • Enabled at the bucket level
  • If you overwrite the same key, a new version with its own version ID is created.
  • Notes
    • Protects against accidental deletion
    • Easy to roll back to a previous version
    • Files uploaded before versioning was enabled have the version ID null.
    • Suspending versioning on a bucket does not delete the existing versions; it only stops new versions from being created for later uploads.
    • If you delete a file with versioning turned on, only a delete marker is added. (The previous versions remain as they are.)
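
The delete-marker behaviour is easier to see in a toy model. The sketch below is an in-memory stand-in, not real S3: the class `VersionedBucket` and its methods are invented for illustration, and random hex strings stand in for S3 version IDs.

```python
import uuid


class VersionedBucket:
    """Toy in-memory model of a versioning-enabled bucket (not real S3)."""

    def __init__(self):
        # key -> list of (version_id, body); newest last; body None = delete marker
        self._versions = {}

    def put(self, key, body):
        version_id = uuid.uuid4().hex
        self._versions.setdefault(key, []).append((version_id, body))
        return version_id

    def delete(self, key):
        # A plain delete removes nothing: it stacks a delete marker on top.
        return self.put(key, None)

    def delete_version(self, key, version_id):
        # Deleting a specific version ID is the permanent delete.
        self._versions[key] = [v for v in self._versions[key] if v[0] != version_id]

    def get(self, key, version_id=None):
        versions = self._versions.get(key, [])
        if version_id is None:
            candidates = versions[-1:]  # current version only
        else:
            candidates = [v for v in versions if v[0] == version_id]
        if not candidates or candidates[0][1] is None:
            raise KeyError(key)  # missing, or hidden by a delete marker
        return candidates[0][1]
```

In this model, rolling back an accidental delete amounts to calling `delete_version` on the marker's ID, after which the previous version becomes current again.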

Object encryption

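  • There are 4 encryption methods
    • SSE-S3: server-side encryption with keys managed by Amazon S3
    • SSE-KMS: server-side encryption with keys managed in AWS Key Management Service
    • SSE-C: server-side encryption with keys provided by the customer
    • Client-side encryption: the client encrypts the data before upload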

SSE-S3

  • Server side encryption
  • Encryption keys are handled and managed by Amazon S3.
  • Uses AES-256 encryption
  • "x-amz-server-side-encryption": "AES256" must be set in the header

SSE-KMS

  • Server side encryption
  • KMS handles encryption keys
  • Advantages of KMS: user control over the keys + an audit trail of key usage
  • "x-amz-server-side-encryption": "aws:kms" must be set in the header.

SSE-C

  • Server-side encryption with a key managed by the customer, outside of AWS
  • S3 does not store the encryption key
  • HTTPS is mandatory, because the key is sent in the request headers.
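
The three server-side options differ mainly in which request headers accompany the upload. The sketch below only builds header dictionaries and sends nothing; the function name `sse_headers` and its parameters are made up for illustration. The SSE-C header names (`x-amz-server-side-encryption-customer-*`) and the optional KMS key ID header are not mentioned in these notes and are included as an assumption based on the S3 REST API.

```python
import base64
import hashlib


def sse_headers(mode, kms_key_id=None, customer_key=None):
    """Return the encryption-related request headers for one upload."""
    if mode == "SSE-S3":
        return {"x-amz-server-side-encryption": "AES256"}
    if mode == "SSE-KMS":
        headers = {"x-amz-server-side-encryption": "aws:kms"}
        if kms_key_id:  # optional: pick a specific KMS key
            headers["x-amz-server-side-encryption-aws-kms-key-id"] = kms_key_id
        return headers
    if mode == "SSE-C":
        if customer_key is None or len(customer_key) != 32:
            raise ValueError("SSE-C needs a 256-bit (32-byte) key")
        return {
            "x-amz-server-side-encryption-customer-algorithm": "AES256",
            # The key itself travels in the request, hence the HTTPS requirement.
            "x-amz-server-side-encryption-customer-key":
                base64.b64encode(customer_key).decode("ascii"),
            "x-amz-server-side-encryption-customer-key-MD5":
                base64.b64encode(hashlib.md5(customer_key).digest()).decode("ascii"),
        }
    raise ValueError(f"unknown mode: {mode}")


print(sse_headers("SSE-S3"))
```

With SSE-C the same key headers have to be supplied again on every download, since S3 keeps no copy of the key.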

Client-side encryption

  • The client performs the encryption
  • Encrypted before sending to S3
  • Responsibility for decryption is also on the client side.

S3 Security, Websites, CORS

S3 Security

  • User based
    • IAM Policies: which API calls should be allowed
  • Resource based
    • Bucket Policies
    • Object Access Control List (ACL)
    • Bucket Access Control List (ACL)
  • Note: an IAM principal can access an S3 object if
    • the user's IAM permissions allow it OR the resource policy allows it
    • AND there is no explicit DENY
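
The access rule in the note above reduces to one boolean expression. This is a deliberately simplified sketch (the function name is hypothetical, and real policy evaluation has more inputs, such as cross-account rules):

```python
def is_access_allowed(iam_allows: bool, resource_policy_allows: bool,
                      explicit_deny: bool) -> bool:
    """Allow if either policy type grants access and nothing explicitly denies it."""
    return (iam_allows or resource_policy_allows) and not explicit_deny


print(is_access_allowed(iam_allows=True, resource_policy_allows=False, explicit_deny=False))
print(is_access_allowed(iam_allows=True, resource_policy_allows=True, explicit_deny=True))
```

The second call shows that an explicit DENY wins even when both policies allow the request.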

S3 Bucket Policies

  • JSON based
    • Policies are defined as JSON documents
    • Resource: the buckets and objects the policy applies to
    • Effect: Allow / Deny
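
As an illustration of the JSON structure, the sketch below builds a bucket policy that allows anyone to read the objects in a bucket, the kind of policy a public static website needs. The function name `public_read_policy`, the `Sid` value, and the bucket name `my-bucket` are placeholders chosen for this example.

```python
import json


def public_read_policy(bucket: str) -> dict:
    """Build a bucket policy that lets anyone GET objects in `bucket`."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PublicReadGetObject",
                "Effect": "Allow",                       # Allow / Deny
                "Principal": "*",                        # who: everyone
                "Action": "s3:GetObject",                # which API call
                "Resource": f"arn:aws:s3:::{bucket}/*",  # which objects
            }
        ],
    }


policy_json = json.dumps(public_read_policy("my-bucket"), indent=2)
print(policy_json)
```

The printed JSON is what would be pasted into the bucket policy editor or passed to the API as a string.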

Security - Other

  • Networking
  • Logging and Audit
    • S3 access logs can be stored in a different S3 bucket
    • API calls can be logged in AWS CloudTrail
  • User Security
    • MFA Delete: require MFA to delete object versions
    • Pre-Signed URLs: URLs that are only valid for a limited time

S3 Websites

  • S3 can host a static website and make it accessible on the public web
  • The website URL is
    • bucket-name.s3-website-AWS-region.amazonaws.com
  • If a 403 (Forbidden) error is returned, make sure the bucket policy allows public reads.

S3 CORS

  • Cross Origin Resource Sharing
  • By default, web browsers only allow requests to the same origin and block requests to a different origin.
  • Same origin: same scheme (protocol), host and port
  • Different origin (cross origin): any of the three differs
  • If the response lacks the correct CORS headers, the web browser blocks the request.
  • CORS headers (e.g. Access-Control-Allow-Origin) must be set by the server being called.
  • When a client sends a cross-origin request to our S3 bucket, we need to configure CORS on the bucket so that it returns the correct headers.
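
The sketch below illustrates both halves of the idea: how two URLs are compared for "same origin", and how a bucket CORS configuration decides whether to answer with an `Access-Control-Allow-Origin` header. The helper names and the origin `https://www.example.com` are placeholders. The matching logic is a simplification written for this note (real S3 also supports a wildcard inside an origin and checks requested headers).

```python
from urllib.parse import urlsplit

_DEFAULT_PORTS = {"http": 80, "https": 443}


def origin_of(url: str) -> tuple:
    """Return the (scheme, host, port) triple that defines an origin."""
    parts = urlsplit(url)
    return parts.scheme, parts.hostname, parts.port or _DEFAULT_PORTS.get(parts.scheme)


def same_origin(url_a: str, url_b: str) -> bool:
    return origin_of(url_a) == origin_of(url_b)


# Shape of a bucket CORS configuration; the origin below is a placeholder.
cors_configuration = {
    "CORSRules": [
        {
            "AllowedOrigins": ["https://www.example.com"],
            "AllowedMethods": ["GET"],
            "AllowedHeaders": ["*"],
            "MaxAgeSeconds": 3000,
        }
    ]
}


def allow_origin_header(config: dict, origin: str, method: str) -> dict:
    """Return the CORS response header a matching rule would produce, else {}."""
    for rule in config["CORSRules"]:
        if method not in rule["AllowedMethods"]:
            continue
        if "*" in rule["AllowedOrigins"]:
            return {"Access-Control-Allow-Origin": "*"}
        if origin in rule["AllowedOrigins"]:
            return {"Access-Control-Allow-Origin": origin}
    return {}


print(allow_origin_header(cors_configuration, "https://www.example.com", "GET"))
```

An empty result corresponds to the case where the browser finds no CORS header in the response and blocks the request.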

S3 Consistency Model

S3 Advanced 1

S3 MFA Delete

  • Requires Multi-Factor Authentication (MFA) for sensitive operations
  • Versioning must be enabled on the bucket to use it.
  • MFA is required to
    • Permanently delete an object version
    • Suspend versioning on the bucket
  • MFA is not required to
    • Enable versioning
    • List deleted versions
  • Only the bucket owner (root account) can enable or disable it.
  • It cannot be enabled from the console, only through the CLI, using root account credentials.

S3 Default Encryption vs Bucket policy

  • To force encryption, you can set a bucket policy that rejects any upload (PUT) request that has no encryption header.
  • Another way is to use the S3 default encryption option.
  • Bucket policies are evaluated before default encryption is applied.
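
A bucket policy that forces encryption is typically a Deny statement with a condition on the encryption header. The sketch below builds such a policy as a Python dict; the function names, the `Sid`, and the bucket name are placeholders, and the `Null` condition on `s3:x-amz-server-side-encryption` is an assumption based on the commonly documented pattern rather than something stated in these notes. The small `would_be_denied` helper only mimics the condition locally.

```python
import json


def require_encryption_policy(bucket: str) -> dict:
    """Build a bucket policy that denies uploads lacking an encryption header."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyUnencryptedObjectUploads",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
                # "Null": "true" matches requests where the header is absent
                "Condition": {"Null": {"s3:x-amz-server-side-encryption": "true"}},
            }
        ],
    }


def would_be_denied(request_headers: dict) -> bool:
    """Local stand-in for the condition: deny when the header is missing."""
    return "x-amz-server-side-encryption" not in request_headers


print(json.dumps(require_encryption_policy("my-bucket"), indent=2))
```

Because the policy is checked first, an upload without the header is rejected under this policy even if default encryption would otherwise have encrypted it.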

S3 access log

  • Logs all requests made to a bucket, for auditing purposes.
  • Logs are stored in another bucket.
  • Can be analyzed using Athena or other analysis tools.
  • Never use the monitored bucket as its own logging bucket. (This creates a logging loop, and the bucket grows without bound.)

S3 Replication

  • Copies objects from one bucket to another bucket, in another Region or in the same one.
  • Versioning of source and destination must be enabled
  • Cross Region Replication (CRR)
  • Same Region Replication (SRR)
  • Buckets can be in different accounts.
  • Replication is asynchronous
  • Appropriate IAM permissions must be granted to s3
  • CRR use cases: regulatory compliance, lower-latency data access
  • SRR use cases: log aggregation, replication between production and test environments
  • Only objects created after replication is enabled are replicated.
  • DELETE case
    • Delete markers can optionally be replicated
    • Deletions of a specific version ID are not replicated
  • No chained replication
    • If bucket 1 replicates to bucket 2 and bucket 2 replicates to bucket 3
    • objects created in bucket 1 are not replicated to bucket 3

S3 Presigned URLs

  • A way to grant temporary permission to upload to S3 or download from S3, through a URL that is only valid for a limited time
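
The idea behind a time-limited signed URL can be shown with a toy scheme. To be clear, this is not AWS Signature Version 4 and the URLs it produces will not work against S3; it is a standard-library HMAC sketch whose names (`SECRET`, `presign`, `is_valid`) and query parameters are invented for illustration.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlsplit

SECRET = b"demo-secret"  # hypothetical signing key, stands in for AWS credentials


def _signature(method, path, expires):
    message = f"{method}\n{path}\n{expires}".encode()
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def presign(method, path, ttl_seconds, now=None):
    """Return `path` with an expiry time and a signature over (method, path, expiry)."""
    expires = int(now if now is not None else time.time()) + ttl_seconds
    query = urlencode({"Expires": expires, "Signature": _signature(method, path, expires)})
    return f"{path}?{query}"


def is_valid(method, url, now=None):
    """Check the signature and that the URL has not expired."""
    parts = urlsplit(url)
    params = parse_qs(parts.query)
    expires = int(params["Expires"][0])
    expected = _signature(method, parts.path, expires)
    not_expired = (now if now is not None else time.time()) <= expires
    return not_expired and hmac.compare_digest(expected, params["Signature"][0])


url = presign("GET", "/my-bucket/prabhat.jpg", ttl_seconds=60)
print(url, is_valid("GET", url))
```

The holder of the URL inherits the signer's permission for that one method and object until the expiry passes, which is the property real pre-signed URLs provide.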

S3 Advanced 2

S3 storage class

  • Amazon S3 Standard - General Purpose
  • Amazon S3 Standard-Infrequent Access(IA)
  • Amazon S3 One Zone-Infrequent Access
  • Amazon S3 Intelligent Tiering
  • Amazon Glacier
  • Amazon Glacier Deep Archive
  • Amazon S3 Reduced Redundancy Storage (deprecated)

S3 Standard - General Purpose

  • Durability: 99.999999999% (11 nines), so object loss is extremely unlikely; availability: 99.99%
  • Can sustain two concurrent facility failures
  • Very general use

S3 Standard - Infrequent Access(IA)

  • Data that is accessed infrequently but needs to be accessed quickly
  • Availability: 99.9%
  • Less expensive than S3 Standard
  • For disaster recovery, backup, and storage of unused data

S3 One Zone - Infrequent Access(IA)

  • Same as Standard-IA, except that data is stored in only one AZ.
  • Availability is slightly lower (99.5%), but low latency and high throughput can still be expected.
  • Data is lost if the AZ is destroyed
  • Supports encryption.
  • 20% cheaper than IA
  • Used to save backup files, image thumbnails, etc.

S3 Intelligent Tiering

  • Same low latency and high performance as S3 Standard
  • There is a small monthly monitoring and auto-tiering fee.
  • Tiering means objects are placed in a frequent-access or infrequent-access tier according to how they are used.
  • Objects move automatically between the general-purpose (Standard) tier and the IA tier

AWS Glacier

  • low-cost object storage
  • For backup / archiving
  • For storing data for a very long period of time (10 years or more)
  • An alternative to magnetic tape storage
  • Storage is very cheap at $0.004 per GB per month, but retrieval has an additional cost.
  • Each item is called an Archive and can be up to 40 TB.
  • Archives are stored in Vaults, not buckets.

AWS Glacier & Glacier Deep Archive

  • 3 retrieval options
    • Expedited (1-5 minutes)
    • Standard (3-5 hours)
    • Bulk (5-12 hours)
    • Minimum storage period: 90 days
  • Deep Archive: Storage for a longer period of time
    • Standard (12 hours)
    • Bulk (48 hours)
    • Minimum storage period: 180 days

S3 Lifecycle Rules

  • Objects can be moved between storage classes
  • Infrequently accessed objects can go to STANDARD_IA
  • Objects that do not need real-time access can go to GLACIER or DEEP_ARCHIVE
  • Objects can be moved manually, but they can also be moved automatically using a lifecycle configuration.

Lifecycle rules

  • Transition actions: define when an object will be moved to another storage class
    • Transfer to Standard IA class after 60 days
    • Move to Glacier class after 6 months
  • Expiration actions: delete an object after a period of time has elapsed
    • Access logs can be set to be deleted after 365 days
    • Can be used to delete old versions of files
    • Can be used to delete incomplete multipart uploads
  • Rules can be created with specific prefixes
  • Rules can be defined for specific tags
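
A lifecycle rule combining the actions above might look like the following sketch. The dictionary follows the shape used by the S3 lifecycle configuration API; the rule ID, the `logs/` prefix, and the 7-day cleanup of incomplete uploads are hypothetical values chosen for the example. The helper `storage_class_at` is not an AWS function, just a local way to read the rule.

```python
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "archive-then-expire-logs",   # hypothetical rule name
            "Filter": {"Prefix": "logs/"},      # hypothetical prefix
            "Status": "Enabled",
            "Transitions": [
                {"Days": 60, "StorageClass": "STANDARD_IA"},
                {"Days": 180, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 365},
            "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
        }
    ]
}


def storage_class_at(rule: dict, age_days: int):
    """Return the storage class of an object of the given age, or None once expired."""
    expiration = rule.get("Expiration", {}).get("Days")
    if expiration is not None and age_days >= expiration:
        return None
    current = "STANDARD"
    for transition in sorted(rule.get("Transitions", []), key=lambda t: t["Days"]):
        if age_days >= transition["Days"]:
            current = transition["StorageClass"]
    return current


rule = lifecycle_configuration["Rules"][0]
for age in (10, 60, 200, 365):
    print(age, storage_class_at(rule, age))
```

The helper treats the day thresholds as exact; in practice S3 applies lifecycle actions asynchronously, so the real transition can lag slightly behind the configured day.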

S3 Lifecycle Rule Scenario 1

  • The application creates image thumbnails after user profile photos are uploaded to S3.
  • These thumbnails can be easily regenerated and only need to be kept for 45 days.
  • The original images must be retrievable immediately for 45 days; after 45 days, waiting up to 6 hours is acceptable.

=> Keep the original images in STANDARD and transition them to GLACIER after 45 days.
=> Put the thumbnails in ONEZONE_IA and delete them after 45 days (because they can be re-created).

S3 Lifecycle Rule Scenario 2

  • Deleted S3 objects must be recoverable immediately for 15 days
  • After that, and for up to 1 year, deleted objects must be recoverable within 48 hours

=> Enable S3 versioning, so that deleted objects are kept as noncurrent versions.
=> Move noncurrent versions to STANDARD_IA (can be restored immediately).
=> After 15 days, move them to DEEP_ARCHIVE (can be restored within 48 hours).

S3 Analysis - Storage Class Analysis

  • You can set up s3 analytics to decide when to send objects from standard to standard_ia.
  • Only works with standard -> standard_ia
  • After it is first enabled, it takes 24-48 hours for results to appear
  • Good for improving lifecycle rules

S3 - Baseline Performance

  • Amazon S3 automatically scales to handle a very large number of requests.
  • Latency is very low.
  • 3,500 PUT/COPY/POST/DELETE requests and 5,500 GET requests per second, per prefix
  • object path => prefix
    • bucket/folder1/sub1/file => /folder1/sub1/
    • bucket/folder1/sub2/file => /folder1/sub2/
    • bucket/1/file => /1/
    • bucket/2/file => /2/
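
Because the limits apply per prefix, spreading objects over more prefixes raises the total request rate a bucket can serve. The sketch below reproduces the path-to-prefix mapping from the list above and multiplies it out; the function names are made up for this note, and the result is an upper bound that assumes reads are spread evenly.

```python
GET_RPS_PER_PREFIX = 5500
WRITE_RPS_PER_PREFIX = 3500


def prefix_of(path: str) -> str:
    """Map 'bucket/folder1/sub1/file' to '/folder1/sub1/'."""
    _bucket, _, key = path.partition("/")
    folder, sep, _name = key.rpartition("/")
    return f"/{folder}/" if sep else "/"


def max_get_rps(paths) -> int:
    """Upper bound on GET requests per second when reads spread evenly over the prefixes."""
    return GET_RPS_PER_PREFIX * len({prefix_of(p) for p in paths})


paths = [
    "bucket/folder1/sub1/file",
    "bucket/folder1/sub2/file",
    "bucket/1/file",
    "bucket/2/file",
]
print(max_get_rps(paths))  # 4 distinct prefixes * 5,500 = 22000
```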

S3 - KMS Restrictions

  • When using SSE-KMS, S3 performance can be limited by the KMS quotas
  • Each upload calls the KMS API GenerateDataKey.
  • Each download calls the KMS API Decrypt.
  • KMS has a quota on requests per second, and these calls count toward it
  • You can request a quota increase through the service quotas console.

S3 - Performance

  • Multipart Upload
    • Recommended for files of 100 MB or more
    • Required for files larger than 5 GB
    • Parts can be uploaded in parallel
  • S3 Transfer Acceleration
    • Files are first sent to a nearby edge location
    • From the edge location they travel to the bucket over the AWS private network, which is faster
    • Compatible with multipart upload
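
Planning a multipart upload comes down to cutting the object into numbered byte ranges that can then be uploaded in parallel. The sketch below only does the arithmetic. The function name `plan_parts` and the 100 MiB default part size are choices made for this example, while the 5 MiB minimum part size and the 10,000-part maximum are S3 limits that these notes do not mention.

```python
MIB = 1024 * 1024
MIN_PART_SIZE = 5 * MIB   # every part except the last must be at least 5 MiB
MAX_PARTS = 10_000


def plan_parts(object_size: int, part_size: int = 100 * MIB):
    """Return (part_number, first_byte, last_byte) tuples covering the object."""
    if part_size < MIN_PART_SIZE:
        raise ValueError("part size is below the 5 MiB minimum")
    parts = []
    for number, start in enumerate(range(0, object_size, part_size), start=1):
        end = min(start + part_size, object_size) - 1
        parts.append((number, start, end))
    if len(parts) > MAX_PARTS:
        raise ValueError("too many parts; use a larger part size")
    return parts


for part in plan_parts(250 * MIB):
    print(part)
```

A 250 MiB object yields two full 100 MiB parts and a final 50 MiB part; each tuple could be handed to a separate worker for upload.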

S3 Performance - S3 Byte-Range Fetches

  • A GET request can ask for a specific byte range of an object instead of the whole object.
  • Ranges can be requested in parallel to speed up downloads
  • Provides better recovery from failures (only the failed range has to be requested again)
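
A byte-range download can be sketched as: compute the `Range` header for each chunk, fetch the chunks in parallel, and join them in order. In the sketch below, `fetch_range` is a hypothetical callable supplied by the caller (it would wrap the actual HTTP or SDK call); here it is faked with an in-memory byte string so the example runs without AWS.

```python
from concurrent.futures import ThreadPoolExecutor


def byte_range_headers(object_size: int, chunk_size: int):
    """Return the Range header value for each chunk; HTTP ranges are inclusive."""
    return [
        f"bytes={start}-{min(start + chunk_size, object_size) - 1}"
        for start in range(0, object_size, chunk_size)
    ]


def fetch_in_parallel(fetch_range, object_size: int, chunk_size: int) -> bytes:
    """Fetch every range concurrently and reassemble the object in order."""
    headers = byte_range_headers(object_size, chunk_size)
    with ThreadPoolExecutor(max_workers=4) as pool:
        # pool.map returns results in input order, so the chunks line up
        return b"".join(pool.map(fetch_range, headers))


# Fake object and fake fetcher, standing in for real GET requests.
data = bytes(range(250))


def fake_fetch(range_header: str) -> bytes:
    first, last = range_header[len("bytes="):].split("-")
    return data[int(first):int(last) + 1]


print(byte_range_headers(250, 100))
print(fetch_in_parallel(fake_fetch, 250, 100) == data)
```

If one chunk fails, only that range needs to be requested again, which is the recovery benefit mentioned above.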