Carlsbad Aquafarm SeapHOx

Carlsbad Aquafarm SeapHOx

15-Jan-2018    

Here is a Jupyter Notebook with a Python script that I’ve been using occasionally to scrape a Google Sheet with near-real-time data. The Google Sheet updates ~ 2x/hr but this website is static so it updates only when I manually do so. See my ThingSpeak channel for the near-real-time data output.

PJB-20180420_ScrapeGoogleSheet

Google Sheets Scraper

Goal: scrape the Google Sheet with autofilling data from the Particle Electron at Carlsbad Aquafarm Google Sheet named "SeapHOx_OuterLagoon" is here

In [1]:
import numpy as np
import pandas as pd
# import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import os

%matplotlib inline

Scrape Google Sheets

Run python script from https://developers.google.com/sheets/api/quickstart/python which needs to run separately as it interacts with the web browser (in other words, don't copy and paste script into here.

In [2]:
%run ./quickstart.py
# type(values)
# print(values[-5:])

Grab Useful Data from Sheet

In [3]:
electron_array = pd.DataFrame(values) # includes timestamp
data_col = electron_array.iloc[:, 1]
data_array = pd.DataFrame(data_col.str.split(',', expand = True))
data_array.columns = ['Date', 'Time', 'V_batt', 'V_int', 'V_ext', 'P_dbar', 'pH_int', 'O2_umolkg', 'temp_SBE', 'sal_SBE', 'V_batt_elec', 'charge_status']
data_array.set_index(pd.to_datetime(data_array['Date'] + ' ' + data_array['Time']), inplace = True)
data_array.drop(['Date', 'Time'], axis = 1, inplace = True)
data_array.head()
Out[3]:
V_batt V_int V_ext P_dbar pH_int O2_umolkg temp_SBE sal_SBE V_batt_elec charge_status
2018-04-17 18:30:25 18.860 0.07985 -0.83308 5.33600 8.05349 328.51599 13.43740 33.55860 4.05 84.77
2018-04-17 19:00:25 18.870 0.08055 -0.83331 5.41600 8.08004 328.32300 13.48790 33.56140 4.05 84.62
2018-04-17 19:30:25 18.860 0.08002 -0.83326 5.38800 8.07537 325.82300 13.53800 33.55890 4.05 84.62
2018-04-17 20:02:20 18.860 0.07946 -0.83334 5.35400 8.06744 324.28900 13.54920 33.56210 4.05 84.48
2018-04-17 20:32:51 18.850 0.07959 -0.83293 5.21900 8.06995 326.27499 13.65300 33.56350 4.05 84.48

Filter

  • Filter based on date
  • Cast to type float (for some reason the str.split leaves it as arbitrary object)
  • This was necessary in early notebook as the input data wasn't filtered at all but the Google Sheet should be cleaner to begin with (i.e., no land data)
  • Filtration may come in handy later so keep this here for now
In [4]:
date_filt = data_array.index > '2018-04-17 18:30:00'
data_filt = data_array[date_filt]

import pytz
pacific = pytz.timezone('US/Pacific')
data_filt.index = data_filt.index.tz_localize(pytz.utc).tz_convert(pacific)

data_filt = data_filt.astype('float')

data_filt.tail()
# data_filt.V_press
Out[4]:
V_batt V_int V_ext P_dbar pH_int O2_umolkg temp_SBE sal_SBE V_batt_elec charge_status
2018-05-08 12:00:25-07:00 17.26 0.08581 -0.82617 4.389 8.14761 291.12201 17.0914 33.6705 3.52 1.40
2018-05-08 12:30:25-07:00 17.26 0.08594 -0.82605 4.406 8.15002 292.57901 17.0995 33.6732 3.52 1.38
2018-05-08 13:00:25-07:00 17.26 0.08536 -0.82661 4.392 8.13971 291.05701 17.2207 33.6641 3.52 1.40
2018-05-08 13:30:25-07:00 17.26 0.08583 -0.82597 4.426 8.14509 290.01300 17.9711 33.4835 3.52 1.40
2018-05-08 14:00:25-07:00 17.26 0.08517 -0.82652 4.497 8.13257 286.94101 17.7326 33.5828 3.52 1.35

Plot

In [5]:
fig, axs = plt.subplots(6, 1, figsize = (10, 10), sharex = True)
axs[0].plot(data_filt.index, data_filt.V_batt)
axs[0].set_ylabel('V_batt')
ax2 = axs[0].twinx()
ax2.plot(data_filt.index, data_filt.V_batt_elec, 'r')
ax2.set_ylabel('V_batt_elec', color='r')
ax2.tick_params('y', colors='r')

axs[1].plot(data_filt.index, data_filt.P_dbar)
axs[1].set_ylabel('P_dbar')

axs[2].plot(data_filt.index, data_filt.sal_SBE)
axs[2].set_ylabel('sal_SBE')

axs[3].plot(data_filt.index, data_filt.temp_SBE)
axs[3].set_ylabel('Temp')

axs[4].plot(data_filt.index, data_filt.V_int)
axs[4].set_ylabel('V_int')
ax2 = axs[4].twinx()
ax2.plot(data_filt.index, data_filt.V_ext, 'r')
ax2.set_ylabel('V_ext', color='r')
ax2.tick_params('y', colors='r')

axs[5].plot(data_filt.index, data_filt.pH_int)
axs[5].set_ylabel('pH')
ax2 = axs[5].twinx()
ax2.plot(data_filt.index, data_filt.O2_umolkg, 'r')
ax2.set_ylabel('O2', color='r')
ax2.tick_params('y', colors='r')

axs[0].xaxis_date() # make sure it knows that x is a date/time

for axi in axs.flat:
#     axi.xaxis.set_major_locator(plt.MaxNLocator(3))
#     print(axi)
    axi.yaxis.set_major_locator(plt.MaxNLocator(3))
#     axi.yaxis.set_major_formatter(ticker.FormatStrFormatter("%.02f"))

fig.autofmt_xdate() # makes the date labels easier to read.
In [6]:
fig, axs = plt.subplots(1, 1, figsize = (10, 10), sharex = True)
pHOx = axs.scatter(x = data_filt.pH_int,
                   y = data_filt.O2_umolkg,
                   c = data_filt.P_dbar,
                   s = 100)
axs.set_xlabel('pH (int)')
axs.set_ylabel('O2 (umol per something)')
plt.colorbar(pHOx, label = 'P (dbar)');
In [7]:
cmap = plt.get_cmap('coolwarm')
corr = data_filt.corr()
corr.style.background_gradient(cmap, axis=1)\
    .set_properties(**{'max-width': '80px', 'font-size': '10pt'})\
    .set_caption("SeapHOx Correlations")\
    .set_precision(2)
Out[7]:
<th id="T_44f60da2_530a_11e8_b87e_acde48001122" class="row_heading level0 row0" rowspan=1> V_batt <th id="T_44f60da2_530a_11e8_b87e_acde48001122" class="row_heading level0 row1" rowspan=1> V_int <th id="T_44f60da2_530a_11e8_b87e_acde48001122" class="row_heading level0 row2" rowspan=1> V_ext <th id="T_44f60da2_530a_11e8_b87e_acde48001122" class="row_heading level0 row3" rowspan=1> P_dbar <th id="T_44f60da2_530a_11e8_b87e_acde48001122" class="row_heading level0 row4" rowspan=1> pH_int <th id="T_44f60da2_530a_11e8_b87e_acde48001122" class="row_heading level0 row5" rowspan=1> O2_umolkg <th id="T_44f60da2_530a_11e8_b87e_acde48001122" class="row_heading level0 row6" rowspan=1> temp_SBE <th id="T_44f60da2_530a_11e8_b87e_acde48001122" class="row_heading level0 row7" rowspan=1> sal_SBE <th id="T_44f60da2_530a_11e8_b87e_acde48001122" class="row_heading level0 row8" rowspan=1> V_batt_elec <th id="T_44f60da2_530a_11e8_b87e_acde48001122" class="row_heading level0 row9" rowspan=1> charge_status
SeapHOx Correlations
<th class="col_heading level0 col0" colspan=1> V_batt <th class="col_heading level0 col1" colspan=1> V_int <th class="col_heading level0 col2" colspan=1> V_ext <th class="col_heading level0 col3" colspan=1> P_dbar <th class="col_heading level0 col4" colspan=1> pH_int <th class="col_heading level0 col5" colspan=1> O2_umolkg <th class="col_heading level0 col6" colspan=1> temp_SBE <th class="col_heading level0 col7" colspan=1> sal_SBE <th class="col_heading level0 col8" colspan=1> V_batt_elec <th class="col_heading level0 col9" colspan=1> charge_status </tr> </thead>
1 -0.34 -0.22 0.059 -0.26 0.41 -0.68 -0.48 0.91 0.88 </tr>
-0.34 1 0.99 0.032 0.99 0.54 0.7 0.21 -0.012 0.047 </tr>
-0.22 0.99 1 -0.025 0.98 0.6 0.67 0.13 0.1 0.16 </tr>
0.059 0.032 -0.025 1 0.091 0.24 -0.32 0.05 0.035 0.028 </tr>
-0.26 0.99 0.98 0.091 1 0.61 0.6 0.16 0.071 0.12 </tr>
0.41 0.54 0.6 0.24 0.61 1 -0.04 -0.34 0.5 0.49 </tr>
-0.68 0.7 0.67 -0.32 0.6 -0.04 1 0.4 -0.48 -0.4 </tr>
-0.48 0.21 0.13 0.05 0.16 -0.34 0.4 1 -0.39 -0.31 </tr>
0.91 -0.012 0.1 0.035 0.071 0.5 -0.48 -0.39 1 0.99 </tr>
0.88 0.047 0.16 0.028 0.12 0.49 -0.4 -0.31 0.99 1 </tr> </tbody> </table> </div> </div> </div> </div> </div>
In [ ]:

</div> </div> </body> </html>