However, after some analysis, I decided to pause this feature: the query that retrieves the most popular accounts is very slow, taking more than 20 minutes to return results. The query for the most blocked accounts, on the other hand, finishes in under a minute, which makes it feasible to keep.
When a query takes that long, SQLite runs into its well-known database-lock problem. SQLite handles concurrency poorly: while one connection is reading through all the rows, every write operation (such as an insertion) from other connections is blocked:
```
database is locked
Traceback (most recent call last):
  File "test.py", line 135, in <module>
    con.commit()
sqlite3.OperationalError: database is locked
```
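Two standard mitigations for this lock error are a busy timeout (writers wait for the reader's lock instead of failing immediately) and WAL journal mode (readers no longer block the writer). This is a minimal sketch, not the script from the post; the database file name is illustrative.

```python
import os
import sqlite3
import tempfile

# Illustrative file-backed database (WAL requires a file, not :memory:).
db_path = os.path.join(tempfile.mkdtemp(), "mute.db")

# timeout=30 makes this connection retry for up to 30 seconds when the
# database is locked, instead of raising OperationalError right away.
con = sqlite3.connect(db_path, timeout=30)

# WAL mode allows concurrent readers while a single writer proceeds.
mode = con.execute("PRAGMA journal_mode=WAL").fetchone()[0]
print(mode)  # prints 'wal'
con.close()
```

WAL does not make a 20-minute aggregation fast, but it stops that long read from stalling every insertion in the meantime.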
Both rankings use almost the same query, just with different parameters: for "most blocked" the filter is `where what = 'ignore'`, and for "most popular" it is `where what = 'blog'`.
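A reconstructed sketch of the "most blocked" aggregation, runnable against a toy copy of the table. The `group by whom` / `order by count` shape and the sample rows are assumptions for illustration, not the author's exact query.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Simplified copy of the mute table with a few fake rows.
con.execute("create table mute (who text, whom text, what text, "
            "time text, block integer)")
con.executemany(
    "insert into mute(who, whom, what) values (?, ?, ?)",
    [("a", "x", "ignore"), ("b", "x", "ignore"), ("a", "y", "blog")],
)

# "Most blocked": count how many accounts have muted each `whom`.
# Swapping 'ignore' for 'blog' yields the "most popular" ranking.
most_blocked = con.execute(
    "select whom, count(*) as cnt from mute "
    "where what = 'ignore' group by whom order by cnt desc"
).fetchall()
print(most_blocked)  # [('x', 2)]
```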
Here is the table schema along with its indices:
```sql
sqlite> .schema
CREATE TABLE mute (
    who text,
    whom text,
    what text,
    time text,
    block integer,
    constraint pkey primary key (who, whom)
);
CREATE INDEX index_from on mute(who);
CREATE INDEX index_to on mute(whom);
CREATE INDEX index_time on mute(time);
CREATE INDEX index_what on mute(what);
CREATE INDEX index_block on mute(block);
```
And if we run a query to count the candidate rows for each filter:
```sql
sqlite> select count(rowid) from mute where what='ignore';
332096
sqlite> select count(rowid) from mute where what='blog';
100757608
```
That is a huge difference! For both queries, SQLite uses the `index_what` index and then groups the matching rows, so the more rows the filter selects, the more time the query takes.
This can be confirmed with the EXPLAIN statement, which shows you how SQLite will process your query.
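For example, `EXPLAIN QUERY PLAN` (the human-readable variant of EXPLAIN) reports which index is chosen and whether grouping needs a temporary B-tree. This is a self-contained sketch against an empty copy of the schema; the plan text varies slightly between SQLite versions.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Recreate the relevant parts of the schema from the post.
con.execute("create table mute (who text, whom text, what text, "
            "time text, block integer, "
            "constraint pkey primary key (who, whom))")
con.execute("create index index_what on mute(what)")

# EXPLAIN QUERY PLAN describes the strategy without running the query.
plan = con.execute(
    "explain query plan "
    "select whom, count(*) from mute "
    "where what = 'blog' group by whom"
).fetchall()
for row in plan:
    # The last column is the plan text, e.g. a SEARCH using index_what.
    print(row[-1])
```

The plan confirms the filter is served by `index_what`; the cost comes from visiting every matching row afterwards.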
SQLite is an embedded relational database, which makes it inefficient at handling a dataset of this size. The Steem blockchain has been running for years, and the data accumulates quickly.