AWS just dropped one of the most impactful storage announcements in years: Amazon S3 Files — a feature that lets you mount S3 buckets as fully-featured file systems on any compute resource. No data duplication, no syncing pipelines, no code changes. Your existing file-based tools, AI agents, and ML pipelines can now work directly with S3 data using standard file system operations.
This is a game-changer for anyone who has ever struggled with the gap between object storage and file-based workflows. Let's break down what S3 Files is, how it works, and how you can start using it today.
The Problem S3 Files Solves
Amazon S3 has been the gold standard for cloud object storage — durable, scalable, and cost-effective. But there has always been a fundamental friction: file-based applications cannot work with S3 directly.
If you had an ML training pipeline, a log processing script, or an AI agent that needed to read and write files, you had to:
- Set up a separate file system (EFS, FSx, or local disk)
- Copy data from S3 to the file system
- Process it
- Copy results back to S3
- Build sync pipelines to keep everything consistent
This meant duplicated data, higher costs, complex pipelines, and stale copies. S3 Files eliminates all of this.
What Is Amazon S3 Files?
S3 Files creates a file system view of your S3 bucket that you can mount on any EC2 instance, ECS container, Lambda function, or EKS pod. It is built on Amazon EFS technology but connects directly to your S3 data.
Key characteristics:
- No data duplication — your data never leaves S3
- Full file system semantics — read, write, rename, list directories, file locking
- Low-latency access — intelligent caching of actively used data
- Massive throughput — multiple terabytes per second aggregate reads
- Concurrent access — thousands of compute resources mounting the same file system simultaneously
- Dual access — data accessible via file system AND S3 APIs at the same time
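Because the same object is reachable both as a file and via the S3 API, it can be handy to translate between object keys and mounted paths. Here is a minimal sketch in Python, assuming the bucket is mounted at /mnt/s3data (the helper names are illustrative, not part of S3 Files):

```python
from pathlib import PurePosixPath

MOUNT_ROOT = "/mnt/s3data"  # assumed mount point for the bucket

def key_to_path(key: str) -> str:
    """Map an S3 object key to its path under the mount point."""
    return str(PurePosixPath(MOUNT_ROOT) / key)

def path_to_key(path: str) -> str:
    """Map a mounted file path back to its S3 object key."""
    return str(PurePosixPath(path).relative_to(MOUNT_ROOT))

print(key_to_path("logs/2026/04/10/app.log"))    # /mnt/s3data/logs/2026/04/10/app.log
print(path_to_key("/mnt/s3data/models/ckpt.pt")) # models/ckpt.pt
```

The mapping is deliberately trivial: dual access means there is no translation layer to manage, just a prefix.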
How to Set Up S3 Files
Setting up S3 Files is straightforward using the AWS CLI or Console:
Step 1: Create an S3 File System on Your Bucket
# Create a file system access point for your S3 bucket
aws s3api create-file-system --bucket my-data-lake-bucket --file-system-id fs-s3-myfilesystem
# Or using the new S3 Files CLI
aws s3files create --bucket my-data-lake-bucket --performance-mode enhanced --cache-size 500 # GB of local cache
Step 2: Mount on Your EC2 Instance
# Install the S3 Files mount helper (Amazon Linux 2023+)
sudo yum install -y amazon-s3-files-utils
# Create mount point
sudo mkdir /mnt/s3data
# Mount the S3 bucket as a file system
sudo mount -t s3files my-data-lake-bucket /mnt/s3data
# Verify
df -h /mnt/s3data
ls -la /mnt/s3data/
Step 3: Add to /etc/fstab for Persistent Mounting
# Add to /etc/fstab
echo "my-data-lake-bucket /mnt/s3data s3files _netdev,cache=500G 0 0" | sudo tee -a /etc/fstab
That's it. Your S3 bucket is now accessible as a regular directory at /mnt/s3data.
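If you manage several mounts, templating the fstab entry keeps them consistent. A small sketch, assuming the same field layout as the line above (the helper function is illustrative):

```python
def fstab_entry(bucket: str, mount_point: str, cache_gb: int) -> str:
    """Build an /etc/fstab line for an s3files mount.
    Field layout follows the example entry above."""
    options = f"_netdev,cache={cache_gb}G"
    return f"{bucket} {mount_point} s3files {options} 0 0"

print(fstab_entry("my-data-lake-bucket", "/mnt/s3data", 500))
# my-data-lake-bucket /mnt/s3data s3files _netdev,cache=500G 0 0
```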
Real-World Use Cases
1. AI Agent Memory and State
AI agents can now persist memory, share state across pipeline stages, and checkpoint progress — all directly on S3:
import json
import os
# Agent writes state directly to S3 via file system
STATE_DIR = "/mnt/s3data/agents/research-agent"

def save_agent_state(agent_id, state):
    os.makedirs(STATE_DIR, exist_ok=True)
    with open(os.path.join(STATE_DIR, f"{agent_id}_state.json"), "w") as f:
        json.dump(state, f)

def load_agent_state(agent_id):
    path = os.path.join(STATE_DIR, f"{agent_id}_state.json")
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return None

# Multiple agents across different containers
# can read/write the same state directory simultaneously
2. ML Training Without Data Staging
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
# Point directly at S3-backed file system
# No need to download dataset first!
train_dataset = datasets.ImageFolder(
    root="/mnt/s3data/training-data/imagenet/",
    transform=transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
    ])
)
train_loader = DataLoader(
    train_dataset,
    batch_size=64,
    shuffle=True,
    num_workers=8,  # multi-process reads from the S3 cache
)
# Train as if the data were local (model and criterion defined elsewhere)
for images, labels in train_loader:
    # S3 Files handles caching automatically
    outputs = model(images)
    loss = criterion(outputs, labels)
    # ...
3. Log Processing and Analytics
#!/bin/bash
# Process logs directly from S3 — no download needed
# Count errors across all application logs
grep -r "ERROR" /mnt/s3data/logs/2026/04/ | wc -l
# Use standard Unix tools on S3 data
cat /mnt/s3data/logs/2026/04/10/*.log | awk '{print $1}' | sort | uniq -c | sort -rn | head -20
# Tail the latest log file in real-time
tail -f /mnt/s3data/logs/2026/04/10/app-latest.log
S3 Files vs Other AWS Storage Options
| Feature | S3 Files | EFS | FSx Lustre | S3 + s3fs-fuse |
|---|---|---|---|---|
| Data location | S3 (no copy) | EFS storage | FSx storage | S3 (FUSE layer) |
| Performance | TB/s reads, cached | GB/s | TB/s | Limited |
| File semantics | Full | Full | Full | Partial |
| Concurrent mounts | Thousands | Thousands | Thousands | Limited |
| Data duplication | None | Full copy | Full copy | None |
| S3 API access | Simultaneous | No | No | Yes |
| Cost | S3 pricing + access fee | EFS pricing | FSx pricing | S3 pricing |
Pricing and Availability
S3 Files is available in all AWS commercial regions. Pricing follows the S3 model — you pay for S3 storage as usual, plus a per-GB fee for data accessed through the file system interface. There are no upfront commitments or minimum fees.
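Since the per-GB access fee is not quoted here and varies by region, a quick estimator that takes the rate as an input can help with budgeting. Both the function and the example rate below are hypothetical, not published prices:

```python
def monthly_access_cost(gb_accessed: float, per_gb_fee: float) -> float:
    """Estimate the S3 Files access surcharge: GB read through the file
    system interface times the per-GB fee for your region (supplied by
    the caller, not a published price)."""
    return gb_accessed * per_gb_fee

# Example: 1 TiB accessed at a hypothetical $0.03/GB fee
print(round(monthly_access_cost(1024, 0.03), 2))  # 30.72
```

Remember this is in addition to standard S3 storage costs, which you are already paying on the same data.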
For most workloads, this will be significantly cheaper than maintaining a separate EFS or FSx file system alongside S3, since you eliminate data duplication costs entirely.
Getting Started Today
To start using S3 Files:
- Update your AWS CLI to the latest version: pip install --upgrade awscli
- Install the S3 Files mount utilities on your instances
- Create a file system on any existing S3 bucket
- Mount and start using it — no migration needed
This feature works with all existing S3 data. There is no migration, no format change, and no lock-in. Your S3 APIs continue to work exactly as before, alongside the new file system access.
The Bottom Line
Amazon S3 Files is arguably the most important AWS storage feature since S3 itself. It eliminates the oldest friction point in cloud storage — the gap between object storage and file-based applications. For AI/ML teams, data engineers, and anyone building file-based workflows, this is a massive simplification.
No more data duplication. No more sync pipelines. No more choosing between S3 durability and file system convenience. You get both, on the same data, at the same time.
Reference: Official AWS Announcement
