3 Useful Tables

In this section we review some of the most useful tables and good places to get started.

3.1 Browse Data

Let’s start by clicking on the Browse Data tab in the top right of the Metabase environment:

Here, you should see the options Octoparse and ScrapeStorm:

  • Octoparse is the schema that is associated with data I have collected by using the Octoparse web scraping software.

  • Similarly, ScrapeStorm is the schema associated with data collected using the ScrapeStorm web scraping software.

  • You should also see an option for PredictCryptoPredictions. This schema does not have much in it right now, but it will populate with new tables as I do more predictive modeling; it is used to simulate model performance before starting to trade programmatically on the predictions. This guide ignores that schema/database for now and focuses on the raw data itself, which always comes from the Octoparse and ScrapeStorm databases.

Back in Metabase, let’s click on the option that says Octoparse:

I would recommend starting here because Octoparse was the first/original database and has more historical data than ScrapeStorm, which I got up and running much later.

  • Now you should see the tables that are contained within the Octoparse schema. By hovering over each table, you will see three options appear, which will be better explained in the next section about Documentation Usage. In the screenshot the mouse is hovering over the i symbol for the Bitgur table:

    • By clicking on the middle button that says Learn more about this table, you will be brought to its documentation:

For now, let’s go ahead and click on the name of the table Bitgur:

After clicking on the table name, you should see some example data show up. This shows the first 2,000 rows of data found in the table:

In the next section Usage Guide we will walk through some of the functionality associated with the things circled in red in the screenshot above using the Bitgur table as an example.

3.2 Useful Tables

  • For the previews below, keep an eye out for a button to show more columns:

  • Most fields live as chr/strings within the database, because I found that saving everything as a string prevents schema conflicts that would otherwise stop data from being uploaded after it is collected. The previews below also show the data types, so keep in mind that you may need to convert data types after extracting the data from the database.

  • The data shown below should be no more than one day old: the latest data is shown for each table, and this document refreshes automatically every day.
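Because most columns come back as strings, you will usually want to cast them right after extraction. A minimal sketch using pandas (the column names and values here are hypothetical, not from an actual table):

```python
import pandas as pd

# Hypothetical extract: most columns arrive as chr/strings, so numeric
# and date/time fields need casting before analysis.
raw = pd.DataFrame({
    "symbol": ["BTC", "ETH"],
    "price": ["9155.32", "263.87"],                        # stored as string
    "scraped_at": ["2020-02-27 04:00:00", "2020-02-27 05:00:00"],
})

# Cast to the types you actually want to work with;
# errors="coerce" turns unparseable values into NaN instead of failing.
raw["price"] = pd.to_numeric(raw["price"], errors="coerce")
raw["scraped_at"] = pd.to_datetime(raw["scraped_at"])

print(raw.dtypes)
```

The same pattern applies to any table you pull: extract first, then cast only the columns you need.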

3.2.1 Tables in Octoparse db

All date/time fields in the Octoparse database are in UTC.

  • Bitgur
  • BitgurPerformance
  • CoinCheckup
  • CoinCheckupDetails
  • CoinStatsPrices
  • CoinToBuy
  • TechnicalAnalysis

3.2.2 Tables in ScrapeStorm db

Any date/time fields in the ScrapeStorm database are in the MST timezone (Colorado time).

  • Messari
  • ShrimpyPrices
  • ShrimpyPricesBTC
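Since Octoparse timestamps are in UTC and ScrapeStorm timestamps are in Colorado time, you may need to convert one side before joining across the two databases. A minimal sketch in Python, assuming the "America/Denver" zone for Colorado time:

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # Python 3.9+

# A hypothetical ScrapeStorm timestamp in Colorado time
# (MST, UTC-7, since late February is outside daylight saving time)
scrapestorm_ts = datetime(2020, 2, 27, 9, 0, tzinfo=ZoneInfo("America/Denver"))

# Convert to UTC so it lines up with the Octoparse tables
utc_ts = scrapestorm_ts.astimezone(ZoneInfo("UTC"))
print(utc_ts)  # 2020-02-27 16:00:00+00:00
```

Note that "America/Denver" observes daylight saving time, so the offset is UTC-7 in winter and UTC-6 in summer; using the named zone rather than a fixed offset handles both cases.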

3.2.3 PredictCryptoPredictions db

  • For the PredictCrypto project, I have been working on different iterations of the predictive models to predict and trade on the live cryptocurrency markets. For an overview of what this process looks like from start to finish, please see the Alteryx Use Case for the project: https://community.alteryx.com/t5/Alteryx-Use-Cases/Predicting-and-Trading-on-the-Cryptocurrency-Markets-using/ta-p/494058

  • As I improve the predictive modeling side of things, I am going to create different iterations of the model, write out predictions made in real time by the newest models, and save those predictions so I can analyze what would have happened by actually trading on them. Once I have made more progress, I will provide more tools and better documentation for analyzing the performance of the different predictive models.

3.2.4 Database size info

Size in MB of the Octoparse, ScrapeStorm, and PredictCryptoPredictions databases as of the last time this document was refreshed (updated daily):

DB name Size in MB Today
Octoparse 34,026.359 2020-02-27
PredictCryptoPredictions 1,859.141 2020-02-27
ScrapeStorm 1,633.703 2020-02-27
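For reference, size figures like these can be produced directly in SQL. The document does not state which database system is used, so this is only a sketch assuming PostgreSQL, where `pg_database_size` returns a database's size in bytes:

```python
# Sketch of a query that would produce the size table above,
# assuming the databases live in PostgreSQL (an assumption, not
# stated in this guide). Run it via any SQL client or driver.
size_query = """
SELECT datname AS db_name,
       pg_database_size(datname) / (1024.0 * 1024.0) AS size_mb
FROM pg_database
WHERE datname IN ('Octoparse', 'ScrapeStorm', 'PredictCryptoPredictions');
"""
print(size_query)
```

Other database systems expose equivalent information under different names (for example, MySQL's `information_schema.tables`), so adjust the query to whatever system actually hosts these databases.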

Number of rows by table:

Database Table Name Rows
Octoparse Bitgur 23,421,450
Octoparse BitgurPerformance 3,676,837
Octoparse TechnicalAnalysis 2,118,285
Octoparse InvestingPrices 1,956,712
ScrapeStorm SymbolsMessariJoin 1,836,586
ScrapeStorm Messari 1,327,760
ScrapeStorm ShrimpyPricesBTC 1,255,753
Octoparse CoinToBuyCountries 1,225,981
ScrapeStorm BitgurBackup 969,123
Octoparse CoinCheckup 878,801
ScrapeStorm ShrimpyPrices 758,782
ScrapeStorm IntoTheBlockPrices 327,292
Octoparse CoinCheckupDetails 127,200
ScrapeStorm KuCoinPrices 102,375
ScrapeStorm GithubActivity_9months 29,564
ScrapeStorm GithubActivity_6months 25,196
ScrapeStorm GithubActivity_12months 25,146
ScrapeStorm GithubActivity_3months 19,826

3.2.5 Why two web scraping tools?

Web scraping has its challenges in terms of stability, so I built in additional resilience by using two different tools that work independently of each other and do similar things (and in some cases collect the same data). Although not a perfect solution, having both up and running means that gaps arising in one tool can usually be filled by the other.