Sunday, January 3, 2010

Google Docs + cURL = Fun

Well for a little back story I'll just explain why this all came about. I play WoW and my guild uses an online site to display DKP (This doesn't really matter if you don't know what WoW and DKP are. Long story short I didn't like the way the site was displaying the information so I wrote a script that pulls the data off of the original site and stores the data in a CSV file on my local computer.

Ok, Cool. Now I have a CSV file that I could easily upload to Google docs from the web interface, but I wanted to automate this since I'm lazy. :).

Problems:

1. I have a CSV I want to automatically upload to google docs with one command.
2. Which programing interface to the gdocs api should I use?

Well since I am a linux system admin by trade I love to use pure command-line tools that are available on most linux machines. Initially I was going to use Python because I like it, but when I was viewing the different interfaces I noticed an interface called "Protocol". Well basically this interface allows me to use curl inside a bash script which warms my heart :). Ok now that story time is over lets get into the nitty gritty.

Sometimes the Google's documentation of their api is kind of confusing so I will try to explain how I did things the best I can and maybe someone will find it useful. Also keep in mind that there are separate api's for spreadsheet's etc, but I am using a broad approach and just using the docs api which includes spreadsheets as well.

First things first, authentication. We will use curl to get an authentication token so that we can use this token to perform other needed tasks.

curl https://www.google.com/accounts/ClientLogin \
-d Email=USERNAME@gmail.com -d Passwd=CHANGEME \
-d accountType=GOOGLE \
-d source=cURL \
-d service=writely

Ok, now this will return something like the following:

SID=XXXXXX
LSID=XXXXX
Auth=XXXXX

Now all we really care about here is the "Auth" portion so we can use some bash magic if you pipe the output to the following it would do what you are needing.

grep Auth | cut -d\= -f2

In my scripts I stored this value as ${token}. Next thing that we will do now that we have our authentication token is get a list of our documents from google.

curl -L --silent --header "Authorization: GoogleLogin auth=${token}" "https://docs.google.com/feeds/documents/private/full"

Now this returns unformatted XML so if you need to view the output in a more human readable fashion so that you can get an idea of what you need to parse out you can pipe the output through the following, but generally your scripts shouldn't need a human readable version ;).

tidy -xml -indent -quiet

Now that we have pretty output you need to find the document that you are wanting to update. Since I am dynamically creating the spreadsheet on my end I am just wanting to upload a completely new version of the file. To do this you will need to parse out the URL for the "edit-media" link as this will be the URL that you will need to send your post command to. You should end up with a link similar to the following.

https://docs.google.com/feeds/media/private/full/spreadsheet%3A0ApXXXxxkCdHXXaWZOWjZyU2JVUlVHYjZwZ25WNHc/g3qaf1ch

Again I will store this value in a variable called ${document} as I will need it later. It is also handy to parse out the documents name which I store as ${slug}. Ok now we have our auth token, the correct URL for the document that we are wanting to edit, and document name. All that's left is issuing the POST via cURL to our ${document} to upload the local temp.csv.

curl -L --silent --request PUT --header "Authorization: GoogleLogin auth=${token}" --header "Slug: ${slug}" --header "Content-Type: text/csv" --data-binary "@temp.csv" "${document}"

The curl command should spit some XML back at you once the upload is complete. Woot! Our document has now been uploaded.

Note if you are wanting to do this for something other than a CSV. You will need to use a different "Content-Type" which can be found here:

http://code.google.com/apis/documents/faq.html#WhatKindOfFilesCanIUpload

References:

http://code.google.com/apis/gdata/articles/using_cURL.html
http://code.google.com/apis/documents/docs/2.0/developers_guide_protocol.html

10 comments:

dean said...

thanks...

nice to meet u.. ;)

Antonio Fernandes Jr said...

Thanks you rock for this but i would like to note that this gives an error on:

curl -L --silent --header "Authorization: GoogleLogin auth=${token}" "http://docs.google.com/feeds/documents/private/full

after some googling i found that the url should have https instead of http

I appreciate all your research

John said...

Yep minor oversight when typing the url out, it was https in my original script. I have updated the post.

Thanks.

Chris said...

Hi, awesome page! I've been looking for this for a long time! Could you please just guide me in one final step? When using this method you described, the CSV file overwrites the entire Spreadsheet. I want it to write only to Sheet 1 in the spreadsheet. On Sheet 2 I would like to keep some other manually entered data. How would I do this? Is there some switch that I need to add in the --header section of the curl command?

Thanks, Chris

John said...

Chris,

I have not tested this but I believe you would simply append #gid=NUMBER to the document URL for which ever sheet you wanted to upload to. NUMBER is the corresponding sheet you are wanting to edit, the first sheet would be 0 then 1 ... etc.

Hope that helps.

John

Unknown said...
This comment has been removed by the author.
Anonymous said...

This is great, except I can't get it to work. I get "Content-Type text/csv is not a valid input type"

Any idea what I'm doing wrong?

Thanks, Ron

Anonymous said...

I was sloppy,
feeds/media/private vs. feeds/documents/private
Thanks again for this blog

Anonymous said...

Qhat script did your use to use {$token}, I dont feel like putting in the super long code every time.

Anonymous said...

I got the xml after running the last script but no update to my spreadsheet.