S3

MongoDB Replica-Set Aware Backup Script

I’ve created a nice little bash script to take MongoDB backups that is replicaset aware.

It will only take a backup from a replica so if you have the classic master,replica,arbiter configuration you can setup the script via cron on both (current) master and replica and the backup will only run on the replica.

It will then tar.gz the backup and upload it to Google Storage. It can be easily adapted to upload the backup to S3 using s3cmd or the aws cli (aws-cli).

Clone S3 Bucket Script

I had to backup an S3 bucket so I whiped out a small script to clone a bucket.

It’s written in Python and depends on the excellent Boto library. If you are running Python < 2.7 you’ll also need the argparse library (both available also via pip).

View the gist here: https://gist.github.com/1275085

Or here below:

#Amazon S3 #Amazon Simple Storage Service #Bucket #Clone #Code #gist #S3

Crawling to the people

Yaniv let the cat out of the bag about some of our ideas for making other parts of the search and its relevant data open, free and accessible to all of us.

I’d thought I’ll add some background and my thoughts on the subject.

First, the idea was iterated a couple of times when we were in that place where you have a solution(s) and you are seeking a problem(s) to solve.

[Read More]

#Amazon #crawling #EC2 #Google #indexing #infrsatructure #open-api #S3 #spinn3r #standards #web-crawling #web-index