Recovering files on Google Cloud Storage

Google Cloud is a complex and very powerful platform. One of its components is Cloud Storage – a bucket-based place where we can store files. It supports many options such as file versioning, object metadata, signed URLs for downloading and uploading files, and more. One of the most problematic operations is data recovery: if you check the Google Cloud Console, you will not find any option for it. Does that mean we cannot recover deleted files? Fortunately, it is possible, just a bit tricky. The goal of this post is to explain how to do it.

Of course, this is only possible if one condition is met: versioning must be enabled on the bucket. Without it, Google provides no way to recover deleted files, so remember to turn this option on unless the bucket holds only temporary data – in every other scenario, it should be enabled.
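
For reference, versioning can be checked and enabled from the command line with gsutil (a minimal sketch; my-bucket is a placeholder for your bucket name):

# check whether versioning is enabled on the bucket
gsutil versioning get gs://my-bucket

# enable versioning so that noncurrent and deleted versions are kept
gsutil versioning set on gs://my-bucket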

Versioning means we can keep multiple versions of one file and switch between them at any time – this also applies to deleted files. Each file uploaded to Google Storage has a generation number, which is used to determine the active (current) version. If we delete a file on a bucket with versioning enabled, there will no longer be an active version, but the file will still be available in the bucket. The tricky part: if you try to find it, you will get no results – neither in the bucket view in the Cloud Console nor with the standard gsutil command. So first, we need to find our deleted files.

Checking deleted files

This operation is simple; we can do it with the following command:

gsutil ls -a gs://{BUCKET_NAME}

It looks like a normal ls, but the -a argument also includes deleted (noncurrent) versions. Of course, the command can also list only the files from a specific directory:

gsutil ls -a gs://{BUCKET_NAME}/{DIRECTORY_PATH}
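
For illustration, the output might look like this (hypothetical bucket and object names); each line ends with the generation number after the # character:

gs://my-bucket/images/logo.png#1689145344557329
gs://my-bucket/images/logo.png#1689150122000451
gs://my-bucket/images/old-banner.jpg#1675012345678901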

Conclusion: you can list all active files, or all files (active + inactive), but there is no way to list only the inactive ones. This has some implications if we need to do something bigger (see the workaround sketched below), but for now, let's just try to recover a single file.
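
If you do need only the deleted objects, one workaround is to compare the live listing with the full one (a rough sketch, assuming GNU coreutils and a bucket named my-bucket – this is not a built-in gsutil option):

# live objects only (no generation suffix)
gsutil ls gs://my-bucket/** | sort > live.txt

# all versions, with everything from the # onwards stripped to get plain paths
gsutil ls -a gs://my-bucket/** | sed 's/#.*$//' | sort -u > all.txt

# paths that appear only in all.txt have no live version, i.e. they were deleted
comm -13 live.txt all.txt > deleted.txt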

Recovery procedure

Now that we have the generation numbers, we can “recover” files. It is quite simple, because instead of any dedicated command we only use cp – the command for copying objects. In this scenario we pass not only the path, but also the generation number of the version we want to recover. For example:

gsutil cp gs://{BUCKET_NAME}/{PATH_TO_OBJECT}#{GENERATION_NUMBER} gs://{BUCKET_NAME}/{PATH_TO_OBJECT}

And a real scenario:

gsutil cp gs://my-bucket/myfile.jpg#1689145344557329 gs://my-bucket/myfile.jpg
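
After copying, you can confirm that the object has a live version again (same hypothetical object name as above):

gsutil stat gs://my-bucket/myfile.jpg

# or simply list it – without -a, only live objects are shown
gsutil ls gs://my-bucket/myfile.jpg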

Mass recovery

As I mentioned, mass recovery is trickier because we cannot list only the inactive files. As a result, we must either have all the object paths or scan a directory and check the files' metadata. If they have DeleteMarker set to True, it means they have been deleted, and we may want to recover them using the commands above.

Below is an example bash script that recovers all files listed in objects.txt (paths to the deleted objects, or to all objects in our bucket), processing 50 files in parallel to speed up the operation. It is only an example, so if you need a more specific solution, adjust it to your needs.

#!/bin/bash

# objects.txt must contain versioned paths (gs://bucket/object#generation)

process_object() {
  object_version="$1"
  echo "Currently checking file: $object_version"

  # read the metadata of this particular version
  metadata=$(gsutil stat "$object_version")

  # recover only versions whose metadata marks them as deleted
  if [[ $metadata == *"DeleteMarker: True"* ]]; then
    # strip the #generation suffix to get the live object path
    live_object=${object_version%%#*}
    echo "Need to recover: $object_version"
    gsutil cp "$object_version" "$live_object"
  fi
}

export -f process_object

# Run 50 processes in parallel (requires GNU parallel)
cat objects.txt | parallel -j 50 process_object

echo "Processing complete."
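
To produce objects.txt, the listing from the first section can be reused (again assuming a bucket named my-bucket; the ** wildcard lists objects recursively):

gsutil ls -a gs://my-bucket/** > objects.txt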

Summary

As you can see, because of some limitations, recovering deleted files from Google Storage buckets can be a bit problematic – it all depends on your needs. The good thing is that you only need to know a few commands and can then adjust the rest to meet your requirements. This makes these buckets flexible and, even if not the best choice in some scenarios, still a good option.