So yeah, I've been yapping about Joyent's stuff for a while now, but hear me out. Their Manta Storage service is completely awesome.
If you read the marketing material, it might seem sort of odd that I'm into this. At first glance it looks like it's for big Hadoop jobs and people looking for oil. I'm partial to thinking it's the best thing for Rubyists since delayed_job.
For the Uninitiated Rubyist
So you're going to need people to upload stuff to your site. Things like pictures of cats, avatars, spreadsheets, whatever.
The easiest approach is just to have people upload directly to your app server. Your app server makes some adjustments, like making different sized thumbnails, and then shoves all the images off to where they'll permanently live.
There's a problem with that.
Tying up Ruby
If you take the easy approach, there's a pretty nasty issue. If you're using MRI Ruby and the average app server like Thin, your Ruby process is handling all the steps of making the picture available. That means the same process is not able to do things like handle requests from other people coming to your site. Yes, that person uploading a 5MB picture of their cat is going to completely ruin the experience for everyone else. Without any sort of code change, your best bet is process concurrency, that is, running another Ruby process to handle more incoming connections. This does not scale well.
The First Generation of Fixing This
So let's break this up into three steps.
- The browser uploads the image
- The server resizes images
- The server puts those images somewhere for permanent storage.
In an effort to offload the latter two tasks from your Ruby web process, the fine folks at Shopify created a Ruby gem called delayed_job. When the upload is done, another Ruby process, one dedicated to your sweet cat pictures, takes over and lets your Ruby web process get back to what it's doing. Traffic gets bursty? Lots of cat pics incoming? Get in line!
Literally. This and the later generations of "workers", or Ruby processes dedicated to doing this sort of work, just queue up jobs until they're done. A lot of progress has been made dealing with this. My favorite has to be Mike Perham's Sidekiq -- I've used it to melt servers into small suns -- it's completely baller.
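Here's a rough sketch of that "get in line" idea using nothing but Ruby's built-in Queue and a thread. It is not how delayed_job or Sidekiq are implemented (those run separate worker processes backed by a database or Redis); the names handle_upload, start_worker and make_thumbnails are made up for illustration, and the "resize" just invents filenames instead of touching an image.

```ruby
# Stand-in for the expensive part: pretend to resize an upload into a few
# versions. A real app would call an image library here.
SIZES = [64, 128, 256].freeze

def make_thumbnails(filename)
  base = File.basename(filename, ".*")
  ext  = File.extname(filename)
  SIZES.map { |px| "#{base}_#{px}px#{ext}" }
end

JOBS = Queue.new # what the web process pushes onto
DONE = Queue.new # what the worker has finished

# The "worker": pulls jobs off the queue one at a time until told to stop.
def start_worker
  Thread.new do
    while (filename = JOBS.pop) != :stop
      DONE << make_thumbnails(filename)
    end
  end
end

# The "web request": enqueue the work and return right away.
def handle_upload(filename)
  JOBS << filename
  :accepted
end

worker  = start_worker
status  = handle_upload("cat.jpg") # returns immediately
JOBS << :stop                      # shut the worker down for this demo
worker.join
results = DONE.pop                 # the thumbnail names the worker produced
```

The point is only the shape: the request handler returns as soon as the job is queued, and the worker chews through the line at its own pace.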
Keep this in mind for later: A full-time worker is going to cost you a chunk of change per month even for the times you're not using it and it has to stay up just in case someone comes along and uploads something.
The Other Problem
Cloud storage is pretty readily available, and it's where your stuff is going to end up anyway, so why even bother with the intermediary? Have the browser upload straight to cloud storage instead.
Then your worker has to go and fetch the image from the cloud, make your different sized images, and shove all those back into your cloud storage. This approach may sound like a complete pain in the ass, but it's actually one of the best ones available right now.
One Fell Swoop
Whatever that means.
So Manta. What is it? It's cloud storage with integrated computational capabilities. You can use it just like you would any other cloud storage service and just put things in it and retrieve them, but you can also create jobs on Manta so it can manipulate your data. You can even be fancy and use map/reduce functions.
This is where seasoned Rails people can start reading.
Manta's job queues work as such:
- When you create a job, you give it as many tasks as you want: map-only or map/reduce, using any language or tool of your choice (for the most part).
- You add your Manta asset as an input to a job.
- Manta processes your input. You can wait on it or just shove it in the job and go away.
- You can close your Manta job inputs for processing now. Billing ends. Did I mention you're billed by the second?
- Your assets are ready. Or if you didn't close your inputs, feed it some more data.
In practice, these happen as asynchronous callbacks, and happen very quickly. There's no limit to the resources you can throw at a problem, so very little time is spent waiting. Here, go read about it. It's in English and a quick read.
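To make the steps above a little more concrete, here's a sketch in Ruby that only builds the data you'd hand to Manta, without talking to the network. The payload shape (a name plus a list of phases, each with a type and an exec command) and the $MANTA_INPUT_FILE variable are how I understand Manta's job API, so treat them as assumptions and check the official docs. The helper names, the /your_login/... paths and the ImageMagick command are purely illustrative.

```ruby
require "json"

# Build the JSON body describing a job: one map phase, plus an optional
# reduce phase. Key names are assumed from Manta's job API docs.
def job_payload(name, map_exec, reduce_exec = nil)
  phases = [{ type: "map", exec: map_exec }]
  phases << { type: "reduce", exec: reduce_exec } if reduce_exec
  JSON.generate(name: name, phases: phases)
end

# Job inputs are just object paths, one per line.
def job_inputs(paths)
  paths.join("\n") + "\n"
end

# Illustrative map command: resize whatever object the task was handed and
# write the result to stdout.
resize  = "convert $MANTA_INPUT_FILE -resize 200x200 jpg:-"
payload = job_payload("cat-thumbnails", resize)

# Hypothetical object paths; substitute your own account and directories.
inputs = job_inputs([
  "/your_login/stor/cats/fluffy.jpg",
  "/your_login/stor/cats/grumpy.jpg"
])

puts payload
puts inputs
```

Creating the job, feeding it inputs and closing the inputs are separate requests, which is why you can keep shoving data in until you say you're done.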
So yeah. I should stop. I should mention that there are services that will handle image resizing for you. They're pretty quick -- as soon as the uploads are done callbacks are fired that will process all 6 versions of the image you have to create in parallel and upload them to your storage bucket. It's pretty performant to work in this manner. Let's just hope that free developer account on that service doesn't hit its limits when you're on vacation.
F*%#ING SCIENCE SIDEBAR!
So Manta is a whole lot more than images. Go look at the tools available in Manta. I'll wait.
So since we have this giant compute facility easily available, why not store your Nginx logs in Manta? It's a no-brainer if you're already running in Joyent's cloud. Oh look, built-in GeoIP, PostGIS, and Python language support. Why not figure out how many of your customers living on National Park lands within 50 miles of an ocean or Billy Ocean click on your product pages for surf wax on sunny weekdays from June to September? 5TB of logs? No problem, the data never leaves Manta.
Surely any of the top great services can transcode your video for you to the formats you need. Oh wait, a professional customer wants color histograms of their content as a QA process. No problem, you've got gnuplot, FFMPEG, and R. Yes, I used to do this at TWC, but it took hours and we could only do a few at a time.
There's something that's a bit more compelling to me as a developer about Manta vs pre-boxed services. It's not all that difficult to set up FFMPEG to do your transcoding work, but sure, it's easier to use a transcoding service. What's compelling to me is that it's less of a stretch and more cost effective to have an entire toolchain at your disposal to make the next great transcoding service, and it's probably pretty effortless.
So let's talk price. You can check compute prices for yourself. In short, unless your cat pictures are also finding the cure for cancer, you're not going to be paying much. Say you process 30,000 pictures this month. With a dedicated worker on another service, it's a fixed price: the worker is often going to be idle, at other times too busy, and as of this writing it's going to cost you ~$40 for the month. Manta is going to cost you $1.20 and will always meet demand.
This may sound like nickels and dimes and not something a business would worry about, but multiply all those numbers by a factor of 100. How about a factor of 1000? $40,000 vs $1,200 is kinda a big deal. And you never need to touch the magic scale slider. Seriously. This makes the Heroku magic scale slider look hard to use.
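If you want to play with those numbers yourself, here's the back-of-the-envelope math as a few lines of Ruby. The two monthly figures are the ones quoted above; the assumption that both costs scale linearly with volume is mine (it's what "multiply by a factor of 1000" implies), and real pricing will differ. Amounts are kept in cents to dodge floating point rounding.

```ruby
WORKER_CENTS = 4000 # ~$40/month for a dedicated worker at 30,000 pictures
MANTA_CENTS  = 120  # $1.20/month on Manta for the same 30,000 pictures

# Format an integer number of cents as dollars.
def dollars(cents)
  format("$%.2f", cents / 100.0)
end

# Compare the two monthly bills at `scale` times the 30,000-picture baseline,
# assuming both costs grow linearly with volume.
def compare(scale)
  worker = WORKER_CENTS * scale
  manta  = MANTA_CENTS * scale
  { worker: dollars(worker), manta: dollars(manta), savings: dollars(worker - manta) }
end

[1, 100, 1000].each do |scale|
  puts "x#{scale}: #{compare(scale)}"
end
```

Either way you slice it, the dedicated worker comes out at roughly 33 times the Manta bill in this example.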
There's something else to think about here. Want to re-process all your images to webp format? What about some other new format that becomes available? If your storage and your processing live in different places, be prepared to pay to get it all out of and back into your storage.
In the end it's hard to argue with anything billed by the second for just about any on-demand resource you could need.
So Let Me at it Already Danko!
For the brave, there's the official documentation. It's really pretty easy to follow. There are libraries for Node.js, Python, and Ruby, and they even explain how to access it with bash.
So tonight I spent some time hacking support into the Fog gem so it works with Manta. Fog is a library for dealing with cloud resources, like putting things in storage or starting jobs in a compute cloud. You can use my fork, but mine's pretty dangerous, and official support from official sources is coming soon, according to the Joyent dev I bothered while he was on vacation (sorry kevinykchan, I didn't know). I've yet to try integration into CarrierWave, but for storage it should work. I'll have to spend some time hacking through carrierwave/lib/carrierwave/processing to add the rest of the goodies.
There are also CLI utilities. If you already have Node installed, you can install them with:
npm install manta -g
Where to From Here?
I have no idea. I'm old and can't think of such things anymore. But there's a lot at your fingers, so go give it a shot.