Google Cloud is a complex and very powerful platform. One of its components is Cloud Storage, which is built around buckets where we can store files. It supports many options, such as object versioning, saving metadata, and generating signed URLs for downloading or uploading files. One of the most problematic areas is data recovery: if you check the Google Cloud Console, you will not see any option for it. Does that mean we cannot recover deleted files? Fortunately, it is possible, but a bit tricky. This post explains how to do it.
Of course, this is only possible if one condition is met: the bucket must have versioning enabled. Without it, Google does not provide a way to recover deleted files, so remember to enable this option unless the bucket holds only temporary data. In every other scenario, versioning should be on.
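Versioning can be checked and turned on with gsutil's versioning command (gsutil versioning get / gsutil versioning set on). The sketch below wraps the check in a small helper. The function name versioning_enabled is hypothetical, and it assumes gsutil versioning get prints a line of the form gs://BUCKET: Enabled when versioning is on.

```shell
#!/bin/bash
# Hypothetical helper: reads the output of "gsutil versioning get" on stdin
# and succeeds only if it reports that versioning is enabled.
# Assumption: the output looks like "gs://BUCKET: Enabled"
# (or "gs://BUCKET: Suspended" when versioning is off).
versioning_enabled() {
  grep -q ': Enabled$'
}

# Usage (requires gsutil and access to the bucket):
#   if gsutil versioning get gs://my-bucket | versioning_enabled; then
#     echo "Versioning is on"
#   else
#     gsutil versioning set on gs://my-bucket
#   fi
```

Keeping the parsing separate from the gsutil call makes the check easy to reuse in scripts that audit several buckets.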
Versioning means we can keep multiple versions of one file and switch between them at any time, and this also applies to deleted files. Each file uploaded to Cloud Storage has a generation number, which is used to determine the active (current) version. If we delete a file while versioning is enabled, there will be no active version, but the file will still be stored in the bucket as an inactive version. The tricky part: if you try to find it, you will get no results, either in the bucket view of the Cloud Console or with the standard gsutil ls command. So first, we need to find our deleted files.
Checking deleted files
This operation is simple; we can use the following command:
gsutil ls -a gs://{BUCKET_NAME}
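It looks like a normal ls, but the -a argument also includes deleted files. Each entry is printed with a #{GENERATION} suffix: the generation number that identifies that particular version. Of course, the command can also list only the files in a specific directory: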
gsutil ls -a gs://{BUCKET_NAME}/{DIRECTORY_PATH}
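Since every entry of that listing has the form PATH#GENERATION, it is handy to split it into the live object path and the generation number. Below is a minimal sketch using bash parameter expansion. The function names object_path and object_generation are hypothetical, and it assumes object names themselves contain no # character.

```shell
#!/bin/bash
# Hypothetical helpers for version-specific URLs such as
#   gs://my-bucket/myfile.jpg#1689145344557329
# Assumption: the object name itself contains no '#'.

# Everything before the first '#': the live object path
object_path() {
  printf '%s\n' "${1%%#*}"
}

# Everything after the last '#': the generation number
# (prints nothing if the URL has no generation suffix)
object_generation() {
  if [[ $1 == *'#'* ]]; then
    printf '%s\n' "${1##*#}"
  fi
}
```

For gs://my-bucket/myfile.jpg#1689145344557329, object_path prints gs://my-bucket/myfile.jpg and object_generation prints 1689145344557329.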
Conclusion: you can list all active files, or all files (active and inactive), but you cannot list only the inactive ones. This has implications when we need to recover many files at once, but for now, let's recover a single file.
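One possible workaround is to compare two listings: a path that appears in gsutil ls -a output but not in plain gsutil ls output no longer has a live version. The sketch below does this with awk and, for each such path, keeps only the entry with the highest generation number. It assumes a higher generation number means a newer version, that each input line is a bare URL (no -l details), and that object names contain no #. The function name latest_deleted_versions and the input file names are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: prints, for every object with no live version,
# the version-specific URL with the highest generation number.
#   $1: file with the output of "gsutil ls -a ..." (PATH#GENERATION lines)
#   $2: file with the output of "gsutil ls ..."    (live PATH lines)
# Assumption: a higher generation number means a newer version.
latest_deleted_versions() {
  awk '
    # First file (live listing): remember every path that is still live
    FILENAME == ARGV[1] { live[$0] = 1; next }
    # Second file: split PATH#GENERATION and skip lines without a generation
    {
      n = index($0, "#")
      if (n == 0) next
      path = substr($0, 1, n - 1)
      gen = substr($0, n + 1)
      if (path in live) next
      # Keep the highest generation seen so far for this path
      if (!(path in best) || gen + 0 > best[path] + 0) best[path] = gen
    }
    END { for (p in best) print p "#" best[p] }
  ' "$2" "$1" | sort
}

# Usage (requires gsutil):
#   gsutil ls -a gs://my-bucket/photos > all_versions.txt
#   gsutil ls gs://my-bucket/photos > live.txt
#   latest_deleted_versions all_versions.txt live.txt > objects.txt
```

Picking a single version per path avoids copying several old versions of the same object over each other.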
Recovery procedure
Now that we have the generation numbers, we can “recover” files. It is quite simple: there is no dedicated restore command, we only need cp, the command for copying objects. In this scenario, we pass not only the path but also the generation number of the version we want to recover, and copy it back over the object's live path. For example:
gsutil cp gs://{BUCKET_NAME}/{PATH_TO_OBJECT}#{GENERATION} gs://{BUCKET_NAME}/{PATH_TO_OBJECT}
And a real example:
gsutil cp gs://my-bucket/myfile.jpg#1689145344557329 gs://my-bucket/myfile.jpg
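The same cp call can be wrapped in a small function that derives the destination from the version-specific URL, so the path only has to be typed once. This is a minimal sketch: restore_version and the DRY_RUN variable are hypothetical names, and with DRY_RUN=1 the function only prints the gsutil command instead of running it.

```shell
#!/bin/bash
# Hypothetical wrapper: restores a noncurrent version by copying it
# over its own live path, e.g.
#   restore_version gs://my-bucket/myfile.jpg#1689145344557329
# Set DRY_RUN=1 to print the command instead of executing it.
restore_version() {
  local version_url="$1"
  # Refuse URLs without a "#generation" suffix
  if [[ $version_url != *'#'* ]]; then
    echo "Not a version-specific URL: $version_url" >&2
    return 1
  fi
  # The destination is the same path without the "#generation" suffix
  local live_url="${version_url%%#*}"
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    echo "gsutil cp $version_url $live_url"
  else
    gsutil cp "$version_url" "$live_url"
  fi
}
```

Running it once with DRY_RUN=1 is a cheap way to confirm the source and destination before actually overwriting anything.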
Mass Recovery
As I mentioned, mass recovery is trickier because we cannot list only the inactive files. As a result, we must either have the paths of all objects or scan the directory and check each file's metadata. If DeleteMarker is set to True, the file has been deleted, and we may want to recover it using the commands above.
Below is an example bash script that recovers all files listed in objects.txt (version-specific paths, with the #generation suffix, of deleted objects or of all objects in a bucket), processing 50 files in parallel with GNU parallel to speed up the operation. It is only an example, so if you need a more specific solution, you will have to adjust it.
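#!/bin/bash
# Recovers deleted objects listed in objects.txt, one version-specific URL
# per line (e.g. gs://YOUR_BUCKET_NAME/path/file.jpg#1689145344557329,
# as printed by gsutil ls -a).

process_object() {
  local object_version="$1"
  echo "Currently checking file: $object_version"
  local metadata
  metadata=$(gsutil stat "$object_version")
  if [[ $metadata == *"DeleteMarker: True"* ]]; then
    # Strip the "#generation" suffix to get the live object path
    local live_object=${object_version%%#*}
    echo "Need to recover: $object_version"
    # Copy the old version back over the live path to make it current again
    gsutil cp "$object_version" "$live_object"
  fi
}

# Make the function available to the shells started by GNU parallel
export -f process_object

# Run 50 processes in parallel, one line of objects.txt per job
parallel -j 50 process_object :::: objects.txt

echo "Processing complete."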
Summary
As you can see, because of some limitations, recovering deleted files from Cloud Storage buckets can be a bit problematic, depending on your needs. The good news is that you only need to know a few commands and can adjust the rest to meet your requirements. This makes buckets flexible: maybe not the best choice in every scenario, but still a good option.