AWS S3 Bucket MP4 Videos wrong mime-type

When copying files from one bucket to another, or uploading files to a bucket, some things may go wrong. For example, some keys representing folders in the file structure go missing – a solution to this problem is explained in a different post, see it here. Recently, another problem appeared: MP4 videos embedded as HTML5 videos played in all browsers except Internet Explorer 9. In my case this happened because S3 returned headers which contained “application/octet-stream” instead of “video/mp4” as mime-type. This obviously stopped IE 9 from playing the videos.
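As a quick first check – assuming the video is publicly readable, and using a placeholder URL below – you can inspect the served Content-Type header with PHP's built-in get_headers() before involving the SDK at all:

<?php

// fetch the response headers of a (publicly readable) S3 object;
// the URL is a placeholder – replace it with the URL of your video
$headers = get_headers('https://your-bucket.s3.amazonaws.com/path/to/video.mp4', 1);

// prints e.g. "application/octet-stream" if the mime-type is wrong
echo $headers['Content-Type'];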

How to check the mime-type of a file in S3

If you run into this problem, you can easily find out whether the mime-type is set correctly on the video file in your bucket. The following example uses PHP with the PHP SDK for AWS, but you can adapt this example to any programming language with the respective SDK (the SDKs can be found here).

First of all, add a simple configuration array containing your credentials and the location of your bucket. Delete the “signature” line if your region does not require v4. Save the file as “config.php”.

<?php
// configuration for the AWS SDK for PHP (v2) service builder
return array(
    'includes' => array('_aws'),
    'services' => array(
        'default_settings' => array(
            'params' => array(
                'key'    => 'YOUR_KEY',
                'secret' => 'YOUR_SECRET',
                'region' => 'eu-central-1',
                'signature'    => 'v4'
            )
        )
    )
);

The following PHP script requires the Amazon PHP SDK; you can see the “autoloader” in the “require” statement. It uses the configuration file we created in the previous step, so be sure that the path to the file is correct. If the file resides in the same directory and is named “config.php”, you can leave the line as it is.

Use the variable “$params” to define the name of your bucket and the path to the mp4 file you want to check. The “var_dump” at the end prints the result of getting the object, wrapped in a “pre” tag for readability. The PHP script looks like this:

<?php

require 'aws/aws-autoloader.php';
use Aws\Common\Aws;

// build the service locator from the config file and fetch an S3 client
$aws = Aws::factory('config.php');
$client = $aws->get('s3');

// bucket name and key of the mp4 file to check
$params = array(
    'Bucket' => 'bucket_name',
    'Key' => 'user_upload/folder_one/folder_two/file_name.mp4'
);

// dump the full response, including the ContentType
echo '<pre>';
var_dump($client->getObject($params));
echo '</pre>';

When executing this script, you will get a result which looks like the following:

object(Guzzle\Service\Resource\Model)#89 (2) {
  ["structure":protected]=>
  NULL
  ["data":protected]=>
  array(22) {
    ["Body"]=>
    object(Guzzle\Http\EntityBody)#92 (6) {
      ["contentEncoding":protected]=>
      bool(false)
      ["rewindFunction":protected]=>
      NULL
      ["stream":protected]=>
      resource(7) of type (stream)
      ["size":protected]=>
      NULL
      ["cache":protected]=>
      array(9) {
        ["wrapper_type"]=>
        string(3) "PHP"
        ["stream_type"]=>
        string(4) "TEMP"
        ["mode"]=>
        string(3) "w+b"
        ["unread_bytes"]=>
        int(0)
        ["seekable"]=>
        bool(true)
        ["uri"]=>
        string(10) "php://temp"
        ["is_local"]=>
        bool(true)
        ["is_readable"]=>
        bool(true)
        ["is_writable"]=>
        bool(true)
      }
      ["customData":protected]=>
      array(1) {
        ["default"]=>
        bool(true)
      }
    }
    ["DeleteMarker"]=>
    bool(false)
    ["AcceptRanges"]=>
    string(5) "bytes"
    ["Expiration"]=>
    string(0) ""
    ["Restore"]=>
    string(0) ""
    ["LastModified"]=>
    string(29) "Mon, 01 Dec 2014 13:43:25 GMT"
    ["ContentLength"]=>
    string(8) "60209681"
    ["ETag"]=>
    string(34) ""531b21ac92269de9ca9b6459bccf62a0""
    ["MissingMeta"]=>
    string(0) ""
    ["VersionId"]=>
    string(4) "null"
    ["CacheControl"]=>
    string(0) ""
    ["ContentDisposition"]=>
    string(0) ""
    ["ContentEncoding"]=>
    string(0) ""
    ["ContentLanguage"]=>
    string(0) ""
    ["ContentType"]=>
    string(9) "video/mp4"
    ["Expires"]=>
    string(0) ""
    ["WebsiteRedirectLocation"]=>
    string(0) ""
    ["ServerSideEncryption"]=>
    string(0) ""
    ["SSECustomerAlgorithm"]=>
    string(0) ""
    ["SSECustomerKeyMD5"]=>
    string(0) ""
    ["SSEKMSKeyId"]=>
    string(0) ""
    ["RequestId"]=>
    string(16) "830D3203B2E744AC"
  }
}

Here you need to look at the output of “ContentType” – if it is anything other than “video/mp4”, the mime-type for the mp4 video is not set correctly. Since this seems to happen sometimes when moving buckets, or when bulk uploading many mp4 files, I wrote a script which gets all mp4 videos, checks the “ContentType” and sets it to “video/mp4” if it is not already set. The script looks like this:

<?php

require 'aws/aws-autoloader.php';
use Aws\Common\Aws;

ob_implicit_flush(true);
ob_end_flush();

$aws = Aws::factory('config.php');
$client = $aws->get('s3');

echo "starting to get objects";
echo "<br />";

$allObjects = AWSUtil::listObjectsHelper('pwwstage', $client);

$changedKeys = array();

foreach($allObjects as $object) {
    $paramsGet = array(
        'Bucket' => 'pwwstage',
        'Key' => $object['Key'],
    );

    // only look at mp4 files
    if (AWSUtil::endsWith($object['Key'], ".mp4")) {
        $getResult = $client->getObject($paramsGet);
        $bodyContent = $getResult['Body'];

        echo "checking key " . $object['Key'];
        echo "<br />";

        // fix the object if its ContentType is set but not "video/mp4"
        if (($getResult['ContentType']) && ($getResult['ContentType'] != 'video/mp4')) {
            $paramsPut = array(
                'Bucket' => 'pwwstage',
                'Key' => $object['Key'],
                'ContentType' => 'video/mp4',
                'Body' => $bodyContent
            );

            echo "updating contenttype on " . $object['Key'];
            echo "<br />";

            // putObject replaces the object, so the body is uploaded again
            $client->putObject($paramsPut);
            array_push($changedKeys, $object['Key']);
        }
    }
}

foreach($changedKeys as $changedKey) {
    echo $changedKey;
    echo "<br />";
}

echo "done";

class AWSUtil
{
    /**
     * Helper function to receive all keys from a bucket, not only up to 1000.
     * This works recursively.
     *
     * @param string $bucket
     * @param \Aws\S3\S3Client $s3client
     * @param array $responseArray
     * @param string $marker
     * @return array
     */
    public static function listObjectsHelper($bucket, $s3client, $responseArray = array(), $marker = NULL)
    {

        // first call: marker is not set
        if (!$marker) {
            $responseTemp = $s3client->listObjects(array(
                'Bucket' => $bucket,
            ))->toArray();
        } else {
            $responseTemp = $s3client->listObjects(array(
                'Bucket' => $bucket,
                'Marker' => $marker
            ))->toArray();
        }

        // guard against a response without contents (e.g. an empty bucket)
        if (isset($responseTemp['Contents'])) {
            foreach ($responseTemp['Contents'] as $res) {
                array_push($responseArray, $res);
            }
        }

        // if the listing was truncated, continue after the last key we received
        if (isset($responseTemp['IsTruncated']) && $responseTemp['IsTruncated']) {
            $contents = $responseTemp['Contents'];
            $marker = $contents[count($contents) - 1]['Key'];
            return AWSUtil::listObjectsHelper($bucket, $s3client, $responseArray, $marker);
        } else {
            return $responseArray;
        }
    }

    public static function endsWith($haystack, $needle)
    {
        $length = strlen($needle);
        if ($length == 0) {
            return true;
        }

        return (substr($haystack, -$length) === $needle);
    }
}
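A note on the putObject call in the script above: it downloads each affected video and uploads it again, because putObject replaces the whole object. As a lighter alternative – a minimal sketch, assuming your SDK version supports copyObject with a metadata directive – you can let S3 rewrite the metadata server-side by copying the object onto itself:

<?php

// sketch: update the ContentType without re-uploading the body;
// copying an object onto itself with MetadataDirective 'REPLACE'
// makes S3 rewrite the metadata server-side
$client->copyObject(array(
    'Bucket' => 'pwwstage',
    'Key' => $object['Key'],
    'CopySource' => 'pwwstage/' . $object['Key'], // URL-encode keys with special characters
    'ContentType' => 'video/mp4',
    'MetadataDirective' => 'REPLACE'
));

Keep in mind that a copy does not carry the ACL over, so pass an “ACL” parameter if your objects need one.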

Amazon S3 NoSuchKey on folder level – folders don’t exist

While working with Amazon S3 buckets and connecting them to Typo3, I stumbled upon a problem: when I copied the contents to the bucket from another bucket or via tools such as BucketExplorer, the XML output (on bucketname.s3.amazonaws.com) showed all files, but not the folders. As an example:

<Contents>
  <Key>folder_one/file_one</Key>
  <LastModified>2014-11-18T14:16:19.000Z</LastModified>
  <ETag>"6e563e7cf3fda1549cb13aeae7a53b24"</ETag>
  <Size>6148</Size>
  <StorageClass>STANDARD</StorageClass>
</Contents>
<Contents>
  <Key>folder_one/folder_two/file_two</Key>
  <LastModified>2014-11-18T14:16:19.000Z</LastModified>
  <ETag>"6e563e7cf3fda1549cb13aeae7a53b24"</ETag>
  <Size>6148</Size>
  <StorageClass>STANDARD</StorageClass>
</Contents>

The problem here is that I apparently also need entries for the folders themselves, in the example that would be “folder_one/” and “folder_one/folder_two/”. Without these entries, connected systems like Typo3 won’t recognize that there are folders. If the entries are missing for the folders on the root level, I even get an exception like this:

HTTP/1.1 404 Not Found
x-amz-request-id: XXX
x-amz-id-2: XXX
Content-Type: application/xml
Transfer-Encoding: chunked
Date: Tue, 18 Nov 2014 09:10:16 GMT
Server: AmazonS3

<Error>
  <Code>NoSuchKey</Code>
  <Message>The specified key does not exist</Message>
</Error>

This is confusing, because the folders are shown when exploring the bucket in the Amazon web interface.

After some research I found out that everything in an Amazon S3 bucket is treated as an “object”, files as well as folders. I assume that while uploading or copying to a bucket, the tool being used just copies the objects for the files, but not the objects for the folders. In order to fix this, I wrote a short script in PHP which goes through all objects and adds the missing objects for folders. As an example: imagine you have an empty bucket and just one file with the object “/folder_one/folder_two/file”. The script will see this path and add the two objects “/folder_one/” and “/folder_one/folder_two/”.

In order to use the script, download the Amazon AWS SDK for PHP here. Prepare a config file, which contains your keys:

<?php
return array(
    'includes' => array('_aws'),
    'services' => array(
        'default_settings' => array(
            'params' => array(
                'key'    => 'YOUR_KEY',
                'secret' => 'YOUR_SECRET_KEY',
                'region' => 'YOUR_REGION',
                'signature'    => 'v4'
            )
        )
    )
);

Note: Fill in your key, secret key and the region. I have to use v4 of the signature, because the bucket I am using is located in Frankfurt (eu-central-1). If you use v2 of the authorization mechanism, just delete the line containing the signature.

The following is the file which contains the logic:

<?php

require 'aws/aws-autoloader.php';
use Aws\Common\Aws;

$aws = Aws::factory('config.php');
$client = $aws->get('s3');

$addObjects = array();

$iterator = $client->getIterator('ListObjects', array('Bucket' => 'YOUR_BUCKET_NAME'));
foreach ($iterator as $object) {
    $path = $object['Key'];

    // only enter the condition, if the path does not end with a slash
    if (substr($path, -1) != "/") {
        // iterate as long as there is a slash in the path
        while (strpos($path, "/") !== FALSE ) {
            $path = substr($path, 0, strrpos($path, "/"));
            if (!in_array($path . "/", $addObjects)) {
                array_push($addObjects, $path . "/");
            }
        }
    }
}

// create an empty object for each missing folder key
foreach ($addObjects as $object) {
    $params = array(
        'ACL' => 'bucket-owner-full-control',
        'Body' => "",
        'Bucket' => 'YOUR_BUCKET_NAME',
        'Key' => $object
    );
    $client->putObject($params);
}

Be sure to require the autoloader of AWS. We use the factory method of the Aws class and the config file we wrote. The first loop iterates over the objects which are available in the bucket. It reads the path of each object and adds all folders which are part of the path to the array “$addObjects”. The second loop iterates over the folder paths we just collected and adds objects for them to the bucket. Note: Replace the two occurrences of “YOUR_BUCKET_NAME” with the name of the bucket you are using.
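To verify the result, you can list the bucket again and print only the folder objects – a minimal sketch, reusing the client from above and the same placeholder bucket name:

<?php

// list all objects and print only the folder keys (keys ending with a slash)
$iterator = $client->getIterator('ListObjects', array('Bucket' => 'YOUR_BUCKET_NAME'));
foreach ($iterator as $object) {
    if (substr($object['Key'], -1) === "/") {
        echo $object['Key'] . "<br />";
    }
}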