300TB of data. Dropbox and Google are dead to me. Next options. Cloud? Tape? NAS?
So I run a video production company. We have 300TB of archived projects (and growing daily).
Many years ago, our old solution for archiving was simply to dump old projects off onto an external drive, duplicate that, and have one drive at the office, one offsite elsewhere. This was ok, but not ideal. Relatively expensive per TB, and just a shit ton of physical drives.
A few years ago, we had an unlimited Google Drive and 1000/1000 fibre internet. So we moved to a system where we would drop a project onto an external drive, keep that offsite, and have a duplicate of it uploaded to Google Drive. This worked ok until we reached a hidden file number limit on Google Drive. Then they removed the unlimited sizing of Google Drive accounts completely. So that was a dead end.
So then we moved that system to Dropbox a couple of years ago, as they were offering an unlimited account. This was the perfect situation. Dropbox was feature-rich, fast, integrated beautifully into Finder/Explorer, and just a great solution all round. It meant it was easy to give clients access to old data directly if they needed, etc. Anyway, as you all know, that gravy train has come to an end recently, and we now have 12 months' grace with our storage on there before we have to have this sorted onto another system.
Our options seem to be:
Go back to our old system of duplicated external drives, with one living offsite. We'd need ~$7500AUD worth of new drives to duplicate what we currently have.
Buy a couple of LTO-9 tape drives (2 offices in different cities) and keep one copy on an external drive and one copy on a tape archive. This would be ~$20000AUD of hardware upfront + media costs of ~$2000AUD (assuming we'd get maybe 30TB per tape on the 18TB raw LTO 9 tapes). So more expensive upfront but would maybe pay off eventually?
Build a linustechtips style beast of a NAS. Raw drive cost would be similar to the external drives, but would have the advantage of being accessible remotely. Would then need to spend $5000-10000AUD on the actual hardware on top of the drives. Also have the problem of ever growing storage needs. This solution we could potentially not duplicate the data to external drives though and live with RAID as only form of redundancy...
Another cloud storage service? Anything fast and decent enough that comes at a reasonable cost?
AWS Glacier Deep Archive is designed for this. Something you access a couple of times per year, if that, and it's ~$0.99/TB/mo. Price that out against a $10k NAS or a tape setup that will still need consumables like drives and tapes, and it might be your best option. There are costs on retrieval, but since, as you've said, this is archive footage that customers might request, you could pass that cost down to them.
Over the last 24 months I've built 300TB (a mix of 10 and 14TB disks) for $2500 in disks. I could do that right now for $2100. An 18TB LTO-9 tape is more expensive per TB than what I'm paying for 14TB disks.
$700 in hardware to build the NAS with 25 bays.
Glacier would cost you $1080/mo in storage fees alone (300,000GB @ $0.0036) not including the $0.09/GB to get any data back out. Deep Glacier is less (by half, for storage), but comes with strings attached.
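The arithmetic above can be sketched quickly. A minimal sketch, assuming the per-GB rates quoted in this thread (which may be out of date; check current AWS pricing):

```python
# Cloud storage cost sketch; the per-GB rates are the ones quoted in this
# thread and may be out of date (check current AWS pricing).
GB_PER_TB = 1000  # cloud providers bill in decimal units

def monthly_storage_cost(tb, rate_per_gb_month):
    """Monthly storage fee in USD for `tb` terabytes."""
    return tb * GB_PER_TB * rate_per_gb_month

glacier = monthly_storage_cost(300, 0.0036)   # S3 Glacier, per the comment
deep = monthly_storage_cost(300, 0.0018)      # Deep Archive, "less by half"

print(f"Glacier:      ${glacier:,.0f}/mo")  # $1,080/mo
print(f"Deep Archive: ${deep:,.0f}/mo")     # $540/mo
```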
Don't forget to factor in labor hours of what it's going to cost you to maintain a tape library or a local server in general.
Are you charging clients for long term storage after a project is complete? If not, you should be.
You have 3 issues: online archive of past projects, long-term (offline) storage, and client access.
LTO is your long term solution for offline archive of projects. Depending on the average / largest project you might want to do 1 project per tape so LTO7/8 sizes. Scales really well, multiple copies, etc.
For the online storage, a NAS is really the only option. How it's sized & configured comes into play. You can go cheaper with used enterprise gear, but then you're dealing with more disks & higher power bills. Fewer larger disks can help with the power bill & noise levels.
Splitting things between a read-only share (of things that have been archived to tape), and a normal working share would help on the workflow.
The catch is what you do for client data exchanges. Giving them access via Dropbox is nice, but you need better housekeeping around data. Once the 1 year grace is over, what's the size they have committed to? While self-hosting a client accessible share is possible, there's ongoing costs & I would be cautious around exposing the NAS to the internet directly.
Have you considered Amazon S3? It’s made for enterprises with unlimited storage, a lot of pricing options and could save you a lot of headaches long term.
S3 is designed with high availability and high throughput in mind; OP needs a cold storage solution like AWS Glacier or Azure cold storage. But even that is not cheap.
AWS as well as Azure provide cold storage on the order of $1/TB/mo. There are caveats, such as retrieval costs and such, but depending on your situation that might be OK.
I use this for a client - they have an on site server, which backs up to Azure Archive Tier storage. They have around 60TB up there and pay just under $100 per month.
Message me if you like and I'll go into exactly what we did, but it works well for them!
Really depends on how often you need to touch your data. Tape has a high upfront cost ($4-5k for an LTO-9 tape drive + ~$3.50/TB in tapes) but you don't have to worry about archive space anymore. Otherwise, NAS space (if you self-host) is ~$15/TB + a server, which would also be slightly above $5k right now to store your 300TB.
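Totalling those ballpark figures for 300TB gives a rough comparison. All numbers below are this comment's estimates, not quotes from a price list:

```python
# Rough totals from the figures above (assumptions: ~$5k LTO-9 drive plus
# ~$3.50/TB in tapes, vs. a ~$5k server plus ~$15/TB in NAS disks).
def tape_total(tb, drive_cost=5_000, per_tb=3.5):
    return drive_cost + tb * per_tb

def nas_total(tb, server_cost=5_000, per_tb=15):
    return server_cost + tb * per_tb

print(tape_total(300))  # 6050.0
print(nas_total(300))   # 9500
```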
IMO it depends on how organized you are and how often you need to access archived video.
LTO-9 is cheaper per TB (haven’t run the numbers, but on the order of 100s of TB it’s almost definitely true) but relies on someone physically finding the right tape and putting it into the system (unless you shell out for a very expensive automated system). Not good for fast access, but cheaper for expanding.
If you need fast, automated access I’d recommend the NAS option, but keep in mind that it would be in one physical location. A fire or flood and you’re fucked.
Plus, since the cost per TB of tape is so much cheaper than HDD, expanding your archive is probably much cheaper with tape (keeping in mind the organization/automation aspect)
How often are you actually needing to access the 300TB? If 250TB are “cold storage”, then LTO is the way (you can rent the readers usually, rather than buy)
If you’re needing to have access but not edit from, NAS is the way, 300TB wouldn’t even be THAT expensive (still expensive), just slow to move to, but once you’re up and running a decent rig should last years.
If you’re needing to access all 300TB, then you’re looking at a LTT style NAS that needs to handle read and write from multiple users at a time, and that’s gonna be the real $$$.
I feel like you might do well from a mixture of all of these. A smallish NAS for day to day/project use, and once that project is done you move it to the big “slow” server for onsite backup, and once every 2-3mo you rent the LTO drive and load up a few tapes, and ship them off to the void for offsite backup and cold storage.
This is only for archived projects. But we'd probably still need to access ~10-20TB of that data relatively regularly to update branding, change edits, etc. That said, as mentioned in the OP, if we went tape or cloud, we'd likely keep a physical local copy on an external hard drive for quicker access. We just need a redundant backup of these archives.
If we went NAS, I feel like maybe we could get away without the redundancy? Risky...
That's the thing, you could, but it wouldn't be best practice. At the end of the day the 3-2-1 rule applies to any data.
I know it's a hard pill to swallow, but ideally you'd need both a NAS (I'd go with Proxmox on a PC) and the tape backup for that NAS to ensure the safety of the data.
However, Backblaze may take the spot of the tapes - unsure if the NAS as well. Have a look at their offer and see what fits your budget. I would personally go with the NAS on site and back it up daily to Backblaze. Note that Backblaze B2 runs about $6/TB/month, which amounts to about $21,600/year. That stings, but then again it's safe and it's the best value (all the competition seems to be more expensive).
I mean; the NAS would have some built in redundancy via RAID 5 or 6 or whatever, but you wouldn’t have an offsite backup. What you’d wanna look into is something like Backblaze B2, but even that is going to be $1800 a month, so at that point I would say build a 2nd NAS and pay for it to be in a data center, that would only be a couple hundred a year, or even just run it at your house and run a nightly backup.
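A quick break-even sketch for that comparison. The $6/TB/mo B2 rate is from this thread; the $10k second-NAS cost is an assumed round number:

```python
# Months until a one-off second NAS beats recurring B2 fees; the $10k
# build cost is an assumed round number, not a quote.
def breakeven_months(upfront_cost, monthly_fee):
    return upfront_cost / monthly_fee

b2_monthly = 300 * 6  # 300 TB at ~$6/TB/mo = $1,800/mo
print(round(breakeven_months(10_000, b2_monthly), 1))  # 5.6
```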
This is a bit out there, but have you considered encrypting it & putting it in the Filecoin network? According to a network explorer, they've got ~24EiB of space online currently.
Apparently, a lot of the time storing is done for free, because the miners need valid data to store to qualify for network rewards.
I work in a TV production company. Masters and rushes are archived to LTO8.
Drives are cheap but a real pain to keep around and you can’t keep them indefinitely.
But you'll likely want a library. These are expensive - not necessarily to buy, but to license and get the software. I think our entire system (library, 2x LTO-8 drives, server, software and licenses) cost 15-20k.
And we only licensed 25 out of the 50 slots, which is a real fucker, as you have to license the slots twice: once on the library and again in the archive software.
But it’s been an absolute godsend, having archive projects available makes life so much easier.
You should follow in the steps of Linus Tech Tips and build a massive file server. That's probably the most comparable system and situation to your case.
This is specifically their high-capacity storage system. They don't experience failures with that. LTT does hype things up as failing, but never their essential stuff.
It's not just them that use these either; there are also companies like 45Drives.
I'd recommend the hybrid approach with NAS and Tape Drive
Build a robust NAS system for remote accessibility, but consider setting up a hierarchical storage management (HSM) system. Frequently accessed or recent projects can reside on the NAS, while older and less accessed ones can be automatically moved to more cost-effective storage.
Invest in LTO-9 tape drives for archival purposes. While the upfront cost is higher, tapes provide long-term, cost-effective storage. This is particularly useful for archival data that doesn't require frequent access, and it adds an extra layer of redundancy and security.
I hope you charge your clients for archiving purposes. I work in a similar field to yours, and there's no chance I'm archiving this much data if the clients aren't paying for it. I have a contract that stipulates assets are kept for 1 year, then they pay a yearly fee for archiving or they agree that we may delete them.
Raid/NAS, as many others have said, isn't a backup.
However, you could have a single NAS and back it up to AWS Glacier, where storage for larger files is cheap going in; getting it back out in a DR scenario is expensive, but maybe covered by your insurance depending on the DR event.
Have you looked at Wasabi or Backblaze? Possibly cheaper. It’s always cheaper to do this via a nas in the end. Big Synology or two smaller units at each site with expandability.
Copy 1, long term storage: LTO / LTFS is the way to go for one of your copies, it is by far the most reliable storage solution.
Copy 2, a big-ass TrueNAS build; they can be done for the cost of the hard drives and can be configured for very high availability with 3 parity drives. Pro tip: have spare drives in the chassis but not configured as hot spares (it's a long story; you will learn if you go down the TrueNAS path). Get a used chassis for the drives and hook it up with a SAS adapter.
Copy 3, if you have important data, you will need a 3rd copy, and that should be LTO as well.
Go back to our old system of duplicated external drives, with one living offsite.
That would be the cheapest method I suppose
assuming we'd get maybe 30TB per tape on the 18TB raw LTO 9 tapes
LOL. No. You're going to get 18TB per tape. If anything you're probably going to get less
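At raw capacity, the tape count per copy works out like this (assuming the full 300TB goes to tape):

```python
import math

# Plan on LTO-9's raw 18 TB per tape, since video won't compress.
tapes_per_copy = math.ceil(300 / 18)
print(tapes_per_copy)  # 17 tapes per copy, 34 for a duplicated set
```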
Would then need to spend $5000-10000AUD on the actual hardware on top of the drives
You don't have to(?). You're only going to be serving at 1Gbps at a time. Interface limitations aside, even a $30 Raspberry Pi can do that.
LTTs set up is different because they have high speed networking, multiple users requesting a ton of data all at the same time, and ZFS dedupe going on. You don't.
This solution we could potentially not duplicate the data to external drives though and live with RAID as only form of redundancy...
Not sure what this means
Another cloud storage service?
S3 glacier is potentially a solution, provided you (or someone at work) is willing to put in the time and effort to read the fine print. I have a post here on the subreddit if you want a somewhat summarized version.
Even shorter version : 300TB = USD $300 per month. Every single terabyte you want to get back is $100 on top of that. Do the math on how much egress you need.
Oh, and again, don't forget to read the fine print. Because you WILL get screwed over if you try to use it as a drop in replacement for google drive.
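The short-version math above, as a sketch (the rates are this thread's ballpark figures, not current AWS pricing):

```python
# Yearly cost: ~$1/TB/mo to store plus ~$100/TB to pull back out
# (ballpark rates from the comment above, not current AWS pricing).
def yearly_glacier_cost(stored_tb, restored_tb_per_year,
                        storage_rate=1.0, egress_rate=100.0):
    return stored_tb * storage_rate * 12 + restored_tb_per_year * egress_rate

# 300 TB parked, 10 TB restored per year:
print(yearly_glacier_cost(300, 10))  # 4600.0
```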
Many comments here suggested AWS S3 Glacier. Just wanted to let you know that transferring this amount of data might be faster (and more expensive) with AWS Snowball devices than over the wire.
AWS Glacier is several times more expensive FOR ONE YEAR than an LTO library. And if you absolutely can't afford to get the data back (around $300,000-400,000 for 300TB) you may as well not do it at all.
I work for a US VFX company. We mostly use tapes - they're fine as long as you don't need to pull data off them. Most of ours were backed up with Veeam which is in the process of screwing over all tape customers now. New backups are being done with Archiware. We've also started using AWS Deep Glacier which is roughly $1 per TB a month without egress. This is for any archives. If people are still working on stuff here and there, we use storinators to host that data. Hot tier is all flash.
Enterprises should use enterprise setups, or you'll end up with data loss. Personal or homelab setups are whatever, but make sure you're set up correctly for anything business-related.
If it's for archives and retrieval is rare and not time-critical, I'd look into amazon S3 with Storage Class DEEP_ARCHIVE. It's the cheapest cloud storage.
However, tape may still be the better solution long-term.
I'm going to chime in on this. From what you've said, what you want is a Backblaze-style 45-drive storage pod. This isn't going to be cheap, but with the right setup you'd have an easy-to-update storage server - you'd need two of them and mirror the two offices.
You need to think about how often you need to access the data. If it's once or twice a year, then the added overhead of having to find and load a tape wouldn't add up that quickly and, IMO, should be acceptable.
However, for projects you currently work on, you'd want hard drives and/or SSDs, preferably on a network, I suppose. Unless all your in-flight footage resides on the computers you edit on (in which case I hope they have redundant storage).
Also, if any of your clients needed some archived data, would it be feasible to go back to the tapes, read, upload and share them? If you had a NAS and a fast enough internet connection, you may be able to host a site yourself, with no need for reading the tape and uploading to a cloud.
Also, if it's video footage, then you shouldn't really count on LTO's compression ability. It's not particularly good for pictures and videos.
In AUD, for a data centre in Sydney, with S3 Glacier Flexible you’re paying around $0.0045 per GB, and with Glacier Deep Archive it’s $0.002 per GB. This is your solution
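In monthly terms for 300TB at those Sydney rates (a rough sketch; check current AWS AUD pricing before relying on it):

```python
# Monthly AUD cost for 300 TB at the Sydney rates quoted above.
GB = 300_000  # 300 TB, decimal billing

flexible = GB * 0.0045  # S3 Glacier Flexible Retrieval
deep = GB * 0.002       # S3 Glacier Deep Archive

print(f"Flexible:     ${flexible:,.0f} AUD/mo")  # $1,350 AUD/mo
print(f"Deep Archive: ${deep:,.0f} AUD/mo")      # $600 AUD/mo
```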
Most will combine on-premise and multiple cloud storages and then proxy a low res for previewing, with custom metadata modeling to find and retrieve everything.
I still use Google Drive, but I won't trust it. My Gmail/Google presence, which I'd had since Gmail was in closed beta, including the last 15+ years of photos (my wedding photos, the day my kids were born), everything gone.
I got a notification from Google one day on my "backup" account that my account had been suspended for breaking the TOS, but it didn't say what I did.
It's my own fault for not duplicating my data and trusting Google to keep it safe for me, considering I paid for the 2TB Drive plan.
You are basically taking on customer data archiving as a part of your business.
If you are doing this as a business, everything has a cost and that cost should be passed to the customers.
There is a reason that companies doing long term record retention charge absurd amounts for it.... Iron Mountain takes on a ton of liability and responsibility to keep your crap intact while they have it. I would never take that on willingly.
Some people I know offer to package and transfer the assets to the customer as a paid service when the project is done, making long term storage their problem. (Photo, design)
I also have friends who do the contract line that assets are only kept for a year.
Truthfully... as a business, why would you want to keep anything? If the customers lose their data, they need to pay you to make things again, which is better for you.
If you need fast and regular access to the archive, anything up to 1PB can be handled with HDDs nowadays. If you don't need that, LTO tape will be much cheaper. For your offsite backup, encryption + archival storage such as GCP Coldline or Archive storage is very cost-effective and can be combined with either.
Think about your data and organization. Perhaps you only need fast access to a part of the data, so combining the two might be the best solution. Consider if you have an IT department or a data steward to set up a system for organizing that data.
I'm in the exact same boat as you. Cloud storage is completely unusable for people like us.
I've gone back and forth between what I'd want to set up and what would be most effective. The solution I've settled on is building a cheap NAS out of a normal PC, throwing as many drives as I possibly can in it, and storing that at someone's house. It sucks that that's the best solution, but for our budget and use case, it's the best for us. Granted, you have more data than us, so it probably wouldn't all fit in a normal case, but you get the idea.
A lot of people have suggested charging clients for long-term storage. I agree with that sentiment. If you go this route, you may be able to use cloud storage a la Dropbox/gDrive - which seems most convenient for you. Costs for consumer-facing cloud storage run roughly $10 USD for 2TB. Expensive for hundreds of terabytes indefinitely, but if a single client needs access to (idk) 0.5TB you could easily charge $30-50 a year to provide them a shared folder in Google Drive. Maybe more if you want redundancy against the cloud provider losing data.
For anything you need to actively use for work, a giant NAS is probably your best bet. Those YouTubers you’ve seen also use it as part of their team workflow, and maybe that’d also apply to you anyways. You should probably run a regular backup job of these to the other office or to AWS/backblaze. Should be manageable cost if you only need 10-20TB of data for active work.
For everything else… maybe tape if you really want to keep everything. A lot of big organizations seem to be moving away from tape towards networked spinning disk as the price drops. Seems mostly driven by tape being seen as a massive pain to use (not that I have personal experience with it) and expensive equipment. It’s really an organizational decision to directly quantify long term archival needs and value. Once you have a $/TB value to the business, see what fits your budget (could be nothing!) You could try Backblaze or AWS glacier but those get expensive and the cost is ongoing forever.
There are a whole bunch of niche and small-scale companies doing cloud data storage, but I don’t know how they’d get lower cost per byte stored over some big companies (lower margins? Slower speeds? Lower guarantees?). I’d be suspicious of them for mission-critical storage. It’s one thing for a home-user to use them to store their torrented movies, but it’s very different for a business. It could be worth it to just search around. Look at what’s supported as a target by whatever NAS software you use if that’s your route.
Listen to me. Here is the pro solution.
Get yourself something like a Fujitsu ETERNUS CS800 plus a Fujitsu LTO tape library. Contact the sales team and tell them how much data you are going to put there.
The result will be that all the data is available quickly if it resides in the disk cache, or a little later if it needs to be pulled from tape.
From your point of view, the data will be available from a mounted network share, transparent in terms of the technical magic behind it. Basically, imagine an infinite folder where an algorithm moves data to and from tapes, keeps them healthy, and refreshes and consolidates them when needed.
20 tapes at 12TB each, plus dedup, is like 0.5PB of data. And you can always duplicate tapes and move them to an external location. Even if somebody stole everything from the first location, including the hardware, you can get the data back.
I just want to say a massive thank you to everyone contributing advice and thoughts here. There's a lot to get through and I'm taking it all in.
To those saying we should be charging for this, we hear you, you’re not the first to tell us. We’re looking into implementing that going forward and need to assess how we’ll tackle that for older clients.
I feel like this is a good point to assess our whole data infrastructure (live edits and archiving) and we’ll keep you all up to date once we decide on a direction. In the meantime keep the thoughts rolling in!
I've worked for several production companies that have similar or larger archives (one was well into the Petabyte range). LTO is the way to go. It is the cheapest option for very large archives, and if the tapes are properly stored, they last a lot longer than hard drives sitting on a shelf.
The real way to do it is a tiered archive, where everything goes to LTO, you have more recent media (1-2 years old, depending on project length) on hard drives, and current media (still in use + past year or so) on a NAS. LTO is still your primary archive; everything else is for easy access to media you're more likely to need now or in the near future.
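The tiering rule above could be sketched like this. The one- and two-year thresholds are this comment's suggestion; everything still goes to LTO regardless, and the function only picks where the fast-access copy lives:

```python
from datetime import date, timedelta

def fast_access_tier(project_end: date, today: date) -> str:
    """Pick where the convenience copy lives; LTO stays the primary archive."""
    age = today - project_end
    if age < timedelta(days=365):
        return "NAS"        # current media: in use or under ~1 year old
    if age < timedelta(days=730):
        return "HDD shelf"  # recent media: roughly 1-2 years old
    return "LTO only"       # older media: restore from tape on demand

print(fast_access_tier(date(2021, 6, 1), today=date(2024, 6, 1)))  # LTO only
```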
From the sound of it, you want a NAS and a tape archive.
So get a device which holds your working projects; you mentioned around 20-40TB, which is no problem nowadays. Can be done for under 1k with off-the-shelf stuff.
And tape backup for the stuff you don't need regularly. Maybe choose an older generation of LTO; I would look for something that can hold about one project per tape. LTO-5 is pretty cheap used, can be had for 500 bucks, but is only 1.5TB per tape.
Disclaimer: with LTO, never look at the compressed number; it's for compressible data only, which video is not. Thus with LTO-9 you will only get 18TB.
Yeah, we've got a solid situation for our live projects. Each of us works off a 40TB Thunderbolt RAID with local external drives as our backup, plus a live online backup to Dropbox.
This is for our archived work, but yeah, of that, we access around 20-40TB fairly regularly. Good to know that tape won't compress video data at all!
Not to be rude or anything, but external RAIDs individual to each user is not really a solid solution. It may work for 1-2 people working on one project at a time, but it just does not scale. What if someone needs to access files from that project? They move the RAID, or plug their laptop into a different workspace? Not really a great solution, IMO.
Like you say in the last part, having a NAS with a bit of room to grow, so maybe 100TB, might be the best option. That way everyone can access the data and work across projects. More importantly, it would allow working from a different place in the office, or even from home.
Yeah, with tape the compressed numbers are very misleading. That's a best-case scenario where the files compress 2:1 with TAR+gzip, which literally never happens. The best case I have seen was 1.2:1 on a folder of config files. Basically nothing you will interact with nowadays is compressible, except text files, depending on the format. So it's best to always assume the raw capacity is the space you get.
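You can see this yourself with a quick test. Encoded video behaves statistically like random bytes, so a lossless compressor gains nothing on it (random bytes here stand in for video; the text sample is a contrived best case):

```python
import gzip
import os

random_like = os.urandom(1_000_000)        # stands in for encoded video
text_like = b"config_key = value\n" * 50_000

ratio_video = len(gzip.compress(random_like)) / len(random_like)
ratio_text = len(gzip.compress(text_like)) / len(text_like)

print(f"video-like bytes: {ratio_video:.3f}")  # ~1.000, no savings at all
print(f"repetitive text:  {ratio_text:.3f}")   # far below 1.0
```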
300TB in Backblaze B2, using their online calculator, is $21,600 USD a year. I'm sure you can build / expand a new NAS every year for a similar price. But then you have to deal with the overhead of managing it and replacing disks.
Wasabi has their Reserved Capacity Storage where you can get discounts if you commit to a minimum amount of storage. According to their site the absolute minimum to qualify is 25TB.
I'm in the same biz. I use tape - specifically a Mac mini + Canister from the guys that make Hedge. I then index each tape with NeoFinder, which makes it easy to find and pull projects. The idea was to make a system simple enough that it wasn't one person's full-time job.
Backblaze B2 or Wasabi seem to be a lot cheaper than going with AWS S3.
I've checked with wasabi a while ago and there are no fees for downloading/uploading.
$6/TB/month
A DS1821+ with (2) DX517 expansion bays would cost 4.1K AUD, presuming 10% tax, and would give 307TB presuming (18) 22TB drives with a Btrfs file system running SHR-2 (allows for 2 drive failures).
(18) 22TB drives @ $22/tb AUD = $9.5K
So an all in cost for 307TB is 13.6K AUD using that equipment. 27.2K AUD to have a mirrored backup, but it sounds like you're ready for another 300+ TB right now, so 54.4K AUD to have 1:1 backups and 307TB of runway.
If AWS Glacier is what you're comparing to, then you make that up in 6 months.
Rack mount would be more convenient, as you can have 1PB volumes and a less cumbersome, tidier setup - the 1821+ with expansion bays maxes out at 108TB per volume, so you'd have to deal with 6 different volumes, though that's maybe not a big deal if your filing system is by year/month. But getting into rack mount with Synology, for example, would basically double your infrastructure cost. Or you bite the big bullet now on scalability and use a 60-bay rack mount @ 29.9K AUD for just one, but it's still roughly the same cost per drive bay as the 16-bay.
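The capacity side of that build can be sanity-checked quickly. SHR-2 reserves roughly two drives' worth of space for parity; the raw figure below lands above the ~307TB quoted because filesystem overhead and TB-vs-TiB reporting shave some off (an assumption on my part about where the difference goes):

```python
def shr2_raw_usable_tb(drive_count, tb_per_drive):
    # SHR-2 survives two drive failures, so ~two drives' worth goes to parity
    return (drive_count - 2) * tb_per_drive

print(shr2_raw_usable_tb(18, 22))  # 352 TB before overhead and unit reporting
```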