November 27, 2020 1:52 PM

Using the DigitalOcean Spaces API

This is a very long post. So long that I cut off the post unless you look at the individual entry page for this post. I doubt anyone would care to read this, unless someone is interviewing me for a job, and they want to know how I approach problems. So, read it if you want.

So, I have a bunch of pictures on my WordPress site that I need to port over to my Spaces CDN. I've got a script that parses an XML WordPress export, and any time it runs into an image, I need to:

Simple enough. So what I did was take each image and extract the path out of its URL. So for this URL, https://meyzdaisy.files.wordpress.com/2019/01/img_0308.jpg, I'd need the 2019/01/img_0308.jpg part. I have a bucket that I store all the files in, and each file is accessed using a key. I created a folder in my bucket called from-wordpress just for the images coming from my WP site. With that, the key for this particular image would be from-wordpress/2019/01/img_0308.jpg.
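The URL-to-key step can be sketched like this. It's a minimal example using only core Perl; the helper name spaces_key_for is made up for illustration, and the regex is just one way to pull out the path (a module like URI would work too).

```perl
use strict;
use warnings;

# Hypothetical helper: turn a WordPress image URL into a Spaces object key.
# "from-wordpress" is the folder name from the post.
sub spaces_key_for {
    my ($url, $prefix) = @_;

    # Drop the scheme and host, keep everything after the first slash,
    # and ignore any query string or fragment.
    my ($path) = $url =~ m{^https?://[^/]+/([^?#]+)}
        or die "Can't find a path in '$url'\n";

    return "$prefix/$path";
}

print spaces_key_for(
    'https://meyzdaisy.files.wordpress.com/2019/01/img_0308.jpg',
    'from-wordpress',
), "\n";
# prints from-wordpress/2019/01/img_0308.jpg
```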

Now to use the Spaces API. If you read their documentation, you'll notice that they have examples in JavaScript, Go, Python, PHP, and Ruby. No Perl. Of course there's no Perl. A quick Google search for "DigitalOcean Spaces Perl" doesn't come up with anything. Sometimes I think, man, it would be so much easier if I just learned Python and used that instead of Perl. But I am a Perl programmer and I will figure this out.

The Spaces API was built to be compatible with Amazon's S3 storage API. That's a pretty popular service. I bet there's a Perl module for that. After some quick searching on CPAN, I found a few different modules, but Net::Amazon::S3 seemed to have the most ++'s and it looks like it's actively being worked on. Looking at the POD, I see that you can specify a vendor in the constructor. So there's my starting point.

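    use strict;
    use warnings;
    use Net::Amazon::S3;
    use Net::Amazon::S3::Client;
    use Net::Amazon::S3::Vendor::Generic;
    use Net::Amazon::S3::Authorization::Basic;

    # Point the client at Spaces instead of AWS
    my $vendor = Net::Amazon::S3::Vendor::Generic->new(
        host                 => 'nyc3.digitaloceanspaces.com',
        authorization_method => 'Net::Amazon::S3::Signature::V4',
        default_region       => 'nyc3',
    );

    my $client = Net::Amazon::S3::Client->new(
        authorization_context => Net::Amazon::S3::Authorization::Basic->new(
            aws_access_key_id     => 'MYACCESSKEY',
            aws_secret_access_key => 'REDACTED',
        ),
        vendor => $vendor,
    );

    # List every bucket the credentials can see
    my @buckets = $client->buckets;
    for my $bucket (@buckets) {
        print "Bucket: " . $bucket->name . "\n";
    }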

I cloned the git repo for Net::Amazon::S3, and I wrote up this simple script to test if it would work. Unfortunately, it didn't.

    meyling:~ $ perl -I ~/net-amazon-s3/lib ~/test-spaces-api.pl
    403: Forbidden at /home/meyling/net-amazon-s3/lib/Net/Amazon/S3/Error/Handler/Confess.pm line 18.

Now, how the heck does authentication even work? According to the API docs, both V2 and V4 signatures should work, and I need to use the Authorization header. So, let's track down the headers and see if what we're sending is correct. After doing some digging, I find that the HTTP request is created in Net::Amazon::S3::HTTPRequest. So let's throw a warning in here to see what we've got...

     sub http_request {
         my $self = shift;
         my $request = $self->_build_request;

         $self->authorization_method->new( http_request => $self )->sign_request( $request )
             unless $request->header( 'Authorization' );
    +
    +    warn $request->header('Authorization') . "\n";

         return $request;
     }

(I went and threw in some new lines to make this a little more readable.)

    AWS4-HMAC-SHA256
    Credential=MYACCESSKEY/20201127/us-east-1/s3/aws4_request,
    SignedHeaders=content-length;date;host;x-amz-content-sha256;x-amz-date,
    Signature=REDACTED
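For orientation: the part of Credential after the access key is the signing scope (date, region, service, then the literal aws4_request). In Signature V4, each of those values is mixed into the signing key with one HMAC-SHA256 round, so changing the region string changes the resulting signature. Here's a minimal sketch of that key derivation using only the core Digest::SHA module. The sub name sigv4_signing_key, the secret, and the string being signed are all placeholders; a real request signs a canonical string built from the method, path, and headers, which this skips.

```perl
use strict;
use warnings;
use Digest::SHA qw(hmac_sha256 hmac_sha256_hex);

# Derive a Signature V4 signing key. Each scope component from the
# Credential string (date, region, service) gets its own HMAC round.
# Note that Digest::SHA takes the data first and the key last.
sub sigv4_signing_key {
    my ($secret, $date, $region, $service) = @_;

    my $k_date    = hmac_sha256($date,          "AWS4$secret");
    my $k_region  = hmac_sha256($region,        $k_date);
    my $k_service = hmac_sha256($service,       $k_region);
    return          hmac_sha256('aws4_request', $k_service);
}

# Placeholder secret, not a real credential.
my $example_secret = 'EXAMPLESECRET';

for my $region (qw(us-east-1 nyc3)) {
    my $key = sigv4_signing_key($example_secret, '20201127', $region, 's3');

    # Sign the same dummy string with each key to show the results differ.
    printf "%-10s %s\n", $region, hmac_sha256_hex('string-to-sign', $key);
}
```

The two printed signatures come out different, which is all this is meant to show: the region in the scope is an input to the signature, not just a label.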

First thing I notice is that "us-east-1". That's definitely wrong. That's an Amazon-specific location. I should have "nyc3" there. After doing some more digging, I see that "us-east-1" is indeed hardcoded in Net::Amazon::S3::Operation::Buckets::List::Request (as well as other places). But how do I fetch that default region that I set? I notice that in the HTTPRequest module, we were able to get the host by doing $self->s3->host. Okay, let's try something similar with default_region...

     sub http_request {
         my $self = shift;

         return $self->_build_http_request (
             use_virtual_host => 0,
    -        region           => 'us-east-1',
    +        region           => $self->s3->default_region // 'us-east-1',
         );
     }

    meyling:~ $ perl -I ~/net-amazon-s3/lib ~/test-spaces-api.pl
    Can't locate object method "default_region" via package "Net::Amazon::S3" at /home/meyling/net-amazon-s3/lib/Net/Amazon/S3/Operation/Buckets/List/Request.pm line 19.

Guess that doesn't work. But why doesn't it work?! I grep for default_region in the whole repo, and the only place it's used is in a method called guess_bucket_region, and the only place that's actually used is in Net::Amazon::S3::Bucket, and to have that, I need to already have a bucket. So, this default_region attribute is a bit useless right now. I just want to be able to access it. Clearly I can access host from the S3 object, so I look into Net::Amazon::S3, and I think I see what I need to do.

     has vendor => (
         is       => 'ro',
         isa      => 'Net::Amazon::S3::Vendor',
         required => 1,
         handles  => {
             authorization_method => 'authorization_method',
             host                 => 'host',
             secure               => 'use_https',
             use_virtual_host     => 'use_virtual_host',
    +        default_region       => 'default_region',
         },
     );

handles. I don't know what that does exactly, but it looks like default_region needs to be added. Aaaand, it works! Sort of. I got "nyc3" to be added to the Authorization header, but it's still telling me I'm Forbidden.

    meyling:~ $ perl -I ~/net-amazon-s3/lib ~/test-spaces-api.pl
    AWS4-HMAC-SHA256 Credential=5B2BCUV6RXLWDC7ZUFM3/20201127/nyc3/s3/aws4_request,SignedHeaders=content-length;date;host;x-amz-content-sha256;x-amz-date,Signature=e4a3def2c58433e749605b275172324f5f787786eba30a1aab524a33a68c6323
    403: Forbidden at /home/meyling/net-amazon-s3/lib/Net/Amazon/S3/Error/Handler/Confess.pm line 18.
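A side note on that handles option: in Moose it sets up delegation. Each key in the hash becomes a method on the class, and calling it forwards to the named method on the object stored in the attribute. That's why secure => 'use_https' lets you call secure on the S3 object and get the vendor's use_https. Below is a rough plain-Perl sketch of the idea. The package names My::Vendor and My::S3 are made up, and this is not how Moose implements it internally, just roughly what the generated methods amount to.

```perl
use strict;
use warnings;

package My::Vendor;    # stand-in for the vendor class

sub new            { my ($class, %args) = @_; return bless {%args}, $class }
sub host           { $_[0]{host} }
sub default_region { $_[0]{default_region} }
sub use_https      { $_[0]{use_https} }

package My::S3;        # stand-in for the class that holds a vendor

sub new    { my ($class, %args) = @_; return bless {%args}, $class }
sub vendor { $_[0]{vendor} }

# Roughly what handles => { local_name => 'vendor_method' } gives you:
# one forwarding method per pair.
my %handles = (
    host           => 'host',
    secure         => 'use_https',
    default_region => 'default_region',
);
for my $local (keys %handles) {
    my $target = $handles{$local};
    no strict 'refs';    # needed to install a sub by name
    *{"My::S3::$local"} = sub { my $self = shift; $self->vendor->$target(@_) };
}

package main;

my $s3 = My::S3->new(
    vendor => My::Vendor->new(
        host           => 'nyc3.digitaloceanspaces.com',
        default_region => 'nyc3',
        use_https      => 1,
    ),
);

print $s3->default_region, "\n";    # forwards to $s3->vendor->default_region
print $s3->secure, "\n";            # forwards to $s3->vendor->use_https
```

Without the default_region entry in the handles hash, the forwarding method never gets created, which matches the "Can't locate object method" error from earlier.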

So what else is off about this? I look into the SignedHeaders. When I look at the example request in the API Doc, it has both x-amz-content-sha256 and x-amz-date. So those are clearly okay. date, host, and content-length are all listed under Common Headers, but something about content-length seems wrong to me. The doc says "The length in bytes of the request body. Required with PUT requests containing an XML body." But this is a GET request. There is no body. Why are we sending a Content-Length? Looking in Net::Amazon::S3::HTTPRequest, it's explicitly doing that. I don't know why, but I'm gonna comment it out and see what happens.

     sub _build_request {
         my $self = shift;

         my $method   = $self->method;
         my $headers  = $self->headers;
         my $content  = $self->content;
         my $metadata = $self->metadata;

         my $http_headers = $self->_merge_meta( $headers, $metadata );
         my $uri          = $self->request_uri;

         my $http_request
             = HTTP::Request->new( $method, $uri, $http_headers, $content );
    -    $http_request->content_length (0) unless $http_request->content_length;
    +    #$http_request->content_length (0) unless $http_request->content_length;

         return $http_request;
     }

    meyling:~ $ perl -I ~/net-amazon-s3/lib ~/test-spaces-api.pl
    Can't call method "findvalue" on an undefined value at /home/meyling/net-amazon-s3/lib/Net/Amazon/S3/Operation/Buckets/List/Response.pm line 25.

Hey! That's a different error message! It looks like we got past the authentication and managed to get a response back. So let's drill down to where the error is happening.

    sub _parse_data {
        my ($self) = @_;

        my $xpc = $self->xpath_context;

        my $data = {
            owner_id          => $xpc->findvalue ("/s3:ListAllMyBucketsResult/s3:Owner/s3:ID"),
            owner_displayname => $xpc->findvalue ("/s3:ListAllMyBucketsResult/s3:Owner/s3:DisplayName"),
            buckets           => [],
        };

That $self->xpath_context is returning undefined. So what is that doing? Grepping around tells me that this attribute is defined in Net::Amazon::S3::Response. I throw in some more debugging statements and track down that it's not able to create an XML document. I finally track down the issue to this part of the code.

     sub is_xml_content {
         my ($self) = @_;

    -    return $self->content_type =~ m:^application/xml\b: && $self->decoded_content;
    +    return $self->content_type =~ m:^(text|application)/xml\b: && $self->decoded_content;
     }

The content type that DigitalOcean is spitting back is actually "text/xml", not "application/xml". I'll be honest, I don't know the correct way to handle this, but I figured I could just expand this regex to account for either text/xml or application/xml. Aaaaand let's see what happens...

    meyling:~ $ perl -I ~/net-amazon-s3/lib ~/test-spaces-api.pl
    Bucket: meylingtaing

It worked! Holy crap! That took forever to do something where I was just testing out the tool. I haven't even gotten to doing what I need to do for this blog yet.
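For anyone who wants to poke at the widened content-type regex on its own, here's a small standalone check. The sub name looks_like_xml is made up, and I've used a non-capturing group and m{} delimiters, which should match the same strings as the pattern in the is_xml_content patch.

```perl
use strict;
use warnings;

# Same idea as the patched is_xml_content check, minus the body test:
# accept either text/xml or application/xml at the start of the string.
sub looks_like_xml {
    my ($content_type) = @_;
    return $content_type =~ m{^(?:text|application)/xml\b} ? 1 : 0;
}

for my $type ('application/xml', 'text/xml', 'text/xml; charset=utf-8', 'text/html') {
    printf "%-25s %s\n", $type, looks_like_xml($type) ? 'xml' : 'not xml';
}
```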