Importing pokedex can take several seconds due to its rather large
dependencies—in particular, sqlalchemy, whoosh, and pkg_resources seem
to be the largest offenders. Normally, it would be possible to import
only the submodules one needs (pokedex.db, say), but pokedex.__init__
brings in all the submodules, for use by the command-line interface.
The fix is rather obvious:
- Move the command-line stuff into pokedex.main.
Note: because the submodules are no longer imported by default, any
script which expects `import pokedex` to be useful will likely break.
Note: the `pokedex` command will not work until you re-run `python
setup.py develop`, to update entry_points.txt.
- Don't import pkg_resources until necessary.
- Everything now accepts -i, -e, -q, and -v.
- Plumbing commands now announce what database/index they're using and
where they got them from.
- New command status, which does nothing but still does the announcing.
- New command reindex, which recreates only the whoosh index.
Now state is held within an object, rather than passed back to the
caller who must then pass it in again. That was retarded and I don't
know why I ever did it.
Code is much cleaner now.
With apologies to anyone running annotate.
English fuzzy matches are preferred, followed by Roomaji and then
everything else.
The return tuple from lookup() now has a `name` parameter for the actual
name that was matched.
The setup command loads the default data into a default location, then
creates a whoosh index in a default location.
get_index is now open_index and can be made to explicitly recreate the
index. It also actually opens the index if it already existed, even
across processes, now that FileStorage is working.
The lookup command takes no switches for aiming at a different database;
it only uses the default data stores.
csvimport is now load; csvexport is now dump.
Both take an optional -e switch to specify an engine, but will happily
use a default SQLite database in the pokedex package directory.
Additionally, the CSV directory is now controlled by the optional -d
switch, and defaults to Doing The Right Thing.
So `pokedex load` now does exactly what you'd expect: loads the data
from the right files into a consistently-located database.
Good news: This no longer relies on InnoDB's default row order.
Bad news: InnoDB in MySQL 5.0 has a bug where it will sort rows
physically according to a secondary index, if there's a composite
primary key and a single-column index and the phase of the moon is
right. So a couple tables have been, once again, reordered -- but
correctly this time.
Good news: This bug will no longer fuck me up!
Curse's type_id was 0, which is bogus; this has been fixed by creating a
real ????? type.
Fourth-gen moves all had zero as a contest effect id, which was also
bogus.
Pokémon 494 and 495 were junk and have been scrapped entirely.
pokemon_form_groups's description column was too short.
pokedex's connect() now takes kwargs passed to sessionmaker().
A more major change: some tables, like pokemon, are self-referential and
contain rows that refer to rows later in the table (for example, Pikachu
evolves from Pichu, which has a higher id). At the moment such a row is
loaded, the foreign key is thus bogus. I solved this by turning on
autocommit and wrapping add() in a try block, then attempting to readd
every failed row again after the rest of the table is finished. Slows
the import down a bit, but makes it work perfectly with foreign key
checks on.
Apparently the secret property on a singleton hidden in the guts of
SQLAlchemy has been made private recently, so what I wanted to do (get a
list of all ORM classes) is now impossible. I gave up on trying to find
a real solution and just slapped together something using dir().
It used to abruptly abort if a csv file were missing, which wasn't very
nice when I'd just added a new table definition and was trying to reload
everything else.
Now it prints a status per table while loading, and will declare missing
tables to be... missing.
Tables weren't being defined as UTF-8 if that wasn't the server default.
A lot of tables were trying to create erroneous auto_increment columns.
Foreign key checks were pretty much fucking everything up.