Linked-to objects aren't required to have identifiers now, so object_url()
in custom extensions might need to be changed.
The one in the test did, for example.
Now results are sorted by is-this-your-language (times Levenshtein
distance, if appropriate), then by rough class of result (Pokémon, then
moves, then abilities, etc.) and finally by name.
This fixes a couple of issues:
- If both a foreign name and a local name matched a wildcard lookup,
you'll see the local name. Before, you'd see whichever happened to be
first alphabetically.
- Wildcard results are more likely to have useful stuff at the top,
rather than being dominated by foreign junk and names of obscure
locations.
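
A minimal sketch of that ordering, not the actual lookup code. The
Result tuple, the KIND_RANK table and the language weights of 1 and 2
are made up for illustration; only the order of the criteria comes from
the description above.

```python
from collections import namedtuple

# Hypothetical result record; real lookup results carry more than this.
Result = namedtuple("Result", ["name", "kind", "language", "distance"])

# Assumed ranking of result classes: Pokemon, then moves, then abilities.
KIND_RANK = {"pokemon": 0, "move": 1, "ability": 2}


def sort_key(result, current_language):
    """Build a sort key: language-weighted distance, then class, then name."""
    # Assumed weights: 1 for the user's language, 2 for any other language.
    language_weight = 1 if result.language == current_language else 2
    # Adding 1 keeps exact matches (distance 0) affected by the weight too.
    score = language_weight * (1 + result.distance)
    # Kinds missing from the table sort after all the known ones.
    kind_rank = KIND_RANK.get(result.kind, len(KIND_RANK))
    return (score, kind_rank, result.name)


def sort_results(results, current_language):
    return sorted(results, key=lambda r: sort_key(r, current_language))
```

With this key a local-language name always sorts ahead of a foreign name
at the same edit distance, whatever the alphabetical order is.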
This also updates our usage of the whoosh API, which was old and busted
as of 2.0 or so.
Language identifiers are stored and retrieved, rather than English
names.
Language weighting biases towards the current language, rather than to
English.
Language is no longer considered nullable to indicate English.
Duplicate names in other languages are no longer omitted from the index.
Previously, every single spline-pokedex request tacked another markdown
extension onto a global list in spline, making markdown processing just
a little bit slower over time. This is terrible.
Now we do something a little less crazy and a little more global. Wait,
is that less crazy or more?
All accessors now take a `root` arg, the root of the media tree.
Alternatively `root` can be a custom MediaFile subclass, which should allow
neat tricks like:
- Checking some kind of manifest to prevent stat() calls
- Custom properties of the file objects (e.g. for HTML <img> tags)
- Downloading the media on demand
Tests assume media is at pokedex/data/media, skip otherwise.
- the Session has a `pokedex_link_maker` property, whose `object_url`
method is used to make URLs in Markdown
- pokemon.names_table.name is now an ordinary Unicode column
- pokemon.name is a MarkdownString that is aware of the session and the
language the string is in
- pokemon.name_map is a dict-like association_proxy of the above
- move.effect works similarly, with transparent $effect_chance substitution
as before
- as_text() is now a function that takes the session as an argument
- likewise as_html(), which also takes URL makers and the language
- since there should be only one link extension, it is registered by
setting default_link_extension, not appending to markdown_extensions.
This only affects the __html__ attribute.
Sometimes, translations are incomplete. Handle this gracefully by allowing
fallback languages. If there are none, fall back to the identifier to get
at least some order.
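
Roughly, the fallback could look like the sketch below. pick_name and
its arguments are hypothetical names for illustration, not the real
accessors; `names` is assumed to be a plain dict keyed by language
identifier.

```python
def pick_name(names, languages, identifier):
    """Return the first available name, trying each language in order."""
    for language in languages:
        name = names.get(language)
        if name:
            return name
    # No translation at all: the identifier still gives a stable order.
    return identifier
```

For example, pick_name({"en": "Bulbasaur"}, ["fr", "en"], "bulbasaur")
skips the missing French name and returns the English one.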
A few tests of the accessors, along with a very dumb, long-running script
to ensure everything is in its proper place, and there's nothing but the
proper things.
For now it still finds some beta form cruft for Burmy, Pichu and Cherrim.
(Translations cannot be dumped properly because the source string hash
isn't in the database.)
By default, unofficial texts are only dumped for English, but that can
be configured if someone wants CSVs for different language(s).
Official texts (<thing>_names rows for official languages) are always
dumped.
There are now (well, have been for a while) multiple ways to evolve
a Pokémon from its unique parent, so the current schema wasn't working.
The parent Pokémon has moved back to the main pokemon table, and
pokemon_evolution has grown an artificial primary key.
New evolution methods for Milotic, Leafeon, Glaceon, Magnezone, and
Probopass have been added.
English and Japanese. Woo!
The text dump contained a bunch of duplicate location names (possibly
for the Entralink?). I've merged them in the locations table, but
location_game_indices still has the duplicates—that is, a location can
now have multiple game_index values in one generation (necessitating a
small schema change).
As per http://bugs.veekun.com/projects/pokedex/wiki/Identifiers?version=3.
- The following tables were handled in commit "2090e34 Move English
texts to language-specific tables": berry_firmness, item_categories,
move_battle_styles, move_damage_classes, move_effect_categories,
pokeathlon_stats, pokemon_colors, pokemon_habitats, regions, types,
versions.
- These tables are skipped, pending further discussion:
generations, growth_rates, move_targets, stats.
- Deviations from the wiki:
- egg_groups: 'no-eggs' is not changed to 'noeggs'
- encounter_terrains: the 'old-rod' alternative is used.
- types: 'unknown' is not changed to '???'
- pokemon_move_methods:
- 'level-up' is not changed to 'level'
- 'colosseum-purification' and 'xd-purification' are left alone,
because colosseum and xd have not yet been added as versions.
- 'xd-shadow' is left alone for consistency with 'xd-purification'.
Importing pokedex can take several seconds due to its rather large
dependencies—in particular, sqlalchemy, whoosh, and pkg_resources seem
to be the largest offenders. Normally, it would be possible to import
only the submodules one needs (pokedex.db, say), but pokedex.__init__
brings in all the submodules, for use by the command-line interface.
The fix is rather obvious:
- Move the command-line stuff into pokedex.main.
Note: because the submodules are no longer imported by default, any
script which expects `import pokedex` to be useful will likely break.
Note: the `pokedex` command will not work until you re-run `python
setup.py develop`, to update entry_points.txt.
- Don't import pkg_resources until necessary.
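
One way to defer an expensive import is sketched below. lazy_module is a
hypothetical helper, not something in pokedex; it takes the module name
as a parameter only so the sketch can be tried with any module.

```python
import importlib


def lazy_module(name="pkg_resources"):
    """Import a module on first use rather than at package import time."""
    # Nothing is imported until this function is called; later calls are
    # cheap because Python caches loaded modules in sys.modules.
    return importlib.import_module(name)
```

Callers that never touch the slow code path never pay for the import.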
Add English as a language
Add columns:
identifier: same as iso639 except 'roomaji' for Roomaji
order: English first, then Japanese and Roomaji, others undefined
official: True for all the languages so far
- Helper base class: Named
Subclasses: OfficiallyNamed, UnofficiallyNamed
for these, a 'name' column is created in the appropriate text table
also, they get automatic __str__/__repr__/__unicode__
- Faux columns: ProseColumn, TextColumn
these become columns in the appropriate text tables
these text tables (*_text, *_prose) are auto-generated at the end
the main table gets one property (singular name) that gets the English text
and one (plural name) with dict of texts keyed by language
- Every named table gets 'identifier'
- Languages compare & hash equal to their identifiers
- Existing foreign-name tables replaced by the autogenerated ones
- order_by: names replaced by identifiers
- New function: all_tables(), yields all tables
- Markdown move properties removed for now
- Schema test suite
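
A sketch of what "compare & hash equal to their identifiers" means in
practice. This Language class is a bare stand-in for illustration, not
the mapped class from tables.py.

```python
class Language:
    """Stand-in for a language row; only the identifier matters here."""

    def __init__(self, identifier):
        self.identifier = identifier

    def __eq__(self, other):
        if isinstance(other, Language):
            return self.identifier == other.identifier
        # Comparing against a plain string compares the identifier.
        return self.identifier == other

    def __hash__(self):
        # Same hash as the identifier, so dict lookups work either way.
        return hash(self.identifier)
```

This lets a dict of texts keyed by language be indexed with either a
Language object or a plain identifier string such as "en".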
As far as evolution is concerned, it's always day or night--there is no
in-between period when Eevee will not evolve into Espeon or Umbreon, and
there are no Pokémon that only evolve during the early part of the day.
Having 'morning' as a separate value is thus misleading, albeit not
terribly misleading since it never appeared outside of tables.py.
Chimecho does not have gender differences; its Platinum and HG/SS second-
frame female backsprites have one hand posed a little differently, but
no actual design differences.
Torchic does count even though the only difference is a single-pixel dark
speck on the male's rear; the speck was carried over into B/W even though
the backsprites were entirely redone, so I'm guessing it either was
deliberate or has ascended into canon via "let's throw it in anyway."
The French section of pokemonblackwhite.com updated and fixed the Tail
Slap issue! \o/
"Coeur de Coq" and "Eclair Fou" are spelled like that to match the game
names—no "œ"; no accents on capital letters. They'd be preferable, but
I'd rather keep our names consistent until I can get them for
*everything*.
Nothing but Pokémon yet because a) I don't feel like it right now;
b) the French section on pokemonblackwhite.com calls Tail Slap two
different things.
I sort of liked having it there as a bit of a joke, but it looks sort of
weird to have a not-really-there type included all over, especially now
that it really is... not there.
I noticed that the Togepi line was also missing Magic Coat when I went to
double-check Togetic's Twister. I checked Togepi and Togekiss and added
that, too.
Also fix shikijika, mebukijika, and genosekuto's alt-form icons, even
though they aren't used anywhere, and fix their cropped sprites.
... Also fix basurao's entries in the pokemon_form_sprites table.
Apparently pokemon_form_sprites isn't for form sprites, but for what
Eevee calls """sprite forms""". So the form names are form names, and
not just for image filenames. So the space versus hyphen matters for
flavour page links.
There's now a hole in the items table: there's no item 667. There are
two records for the Live Caster in B/W, and I couldn't figure out why,
or see any difference between them, and they were causing problems, so
I deleted the second one.
Primo is the dude in the Violet City Pokémon Center who used to host the
Teachy TV programs and now sits around asking passersby what they think
of him or whatever. If you tell him the right phrases for your trainer
ID, he'll give you an egg. See: http://www.filb.de/games/tools/aikotoba
The step counts we had weren't even good estimates. To hatch an egg
uninterrupted takes (counter + 1) * 255 steps in gen IV; what we had
was counter * 256.
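
The two formulas side by side, as a quick sketch. The function names are
made up here; the arithmetic is exactly what is stated above.

```python
def steps_to_hatch(counter):
    """Steps to hatch an egg uninterrupted in gen IV."""
    return (counter + 1) * 255


def old_steps_estimate(counter):
    """The figure we used to store, kept only for comparison."""
    return counter * 256
```

For a counter of 20, the old figure was 5120 steps and the corrected one
is 5355.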
Phione and Manaphy have different counters, as do Croagunk and Toxicroak
for some reason, so they're associated with individual Pokémon now,
rather than entire evolution chains. Double-checked with Pearl,
Platinum, and SoulSilver; there were no differences between the three,
aside from the alternate forms introduced in Platinum.
Taken from a SoulSilver text dump. No other errors.
Not so obvious: Bayleef had a hiragana "be" instead of a katakana "be".
Must have missed it when we noticed herugaa et al had hiragana "he"
instead of katakana.
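
A check along these lines would catch this kind of slip. It is only a
sketch, not the script that was actually used; it flags any character in
the hiragana letter range U+3041 to U+3096.

```python
def hiragana_positions(name):
    """Return the indexes of hiragana letters in a name."""
    return [i for i, ch in enumerate(name) if "\u3041" <= ch <= "\u3096"]
```

Any non-empty result for a name that should be pure katakana is worth a
second look.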
Level up, TM, and tutor moves have already been ripped, so this should
be the last.
There are no changes (from what we had before) to Crystal, and only a
few additions to Gold/Silver.
Also, just to be safe, I checked that the egg moves in Silver are the
same as in Gold.
Thanks once again to UPC--it's easier to find something when you know
what you're looking for.
Most names as ripped from HeartGold or SoulSilver. Gen-III-only names
ripped from Emerald and de-allcapsed; for French, I also judged where
accents belong on newly-lowercase letters. A couple of them might have
mistakes.
- Gen I has them all mixed around.
- Gen II has no surprises, but I figured it's good to be thorough.
- Gen III has the first 251 in order, then a big break, then the
third-gen Pokémon mixed around, though families are usually together.
- Gen IV has the 493 in order and then alternate forms after Arceus,
which will be useful to have once Gen V comes and we have to bump
the alt forms in the pokemon table forward.
The Gen III data didn't have any errors, and I assume our Gen IV data is
much more recent and trustworthy and isn't worth checking. Crystal
tutor compatibility is stored right after HMs, so it was easy; I don't
know about any other tutors.
Gen III and IV only seem to shy-hyphenate compound words; I determined
whether or not to use a shy hyphen by looking at other instances of the
word. If it's consistently not hyphenated or just hyphenated on a line
break, I figure they mean for it to be a compound word, e.g.
"kindhearted" rather than "kind-hearted".
"Supereffective" is weird, but they seem to consistently spell it as all
one word when it's an attributive adjective, only ever hyphenating it on
a line break and only spacing it as a predicative adjective. So I
counted it as a compound word in the flavour text for Filter and Solid
Rock.
"Fire-\nand Ice-type" should be displayed "Fire- and Ice-type", but the
flavour text rendering can't tell that it's not "Fire-and". Added zero-
width spaces to invisibly separate these hyphens from the newlines,
preventing them from being interpreted as hyphenated words split over
two lines.
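
To show why the zero-width space helps, here is a sketch of a renderer
that joins lines. The joining rules are an assumption about how the
flavour text rendering behaves, not a copy of it.

```python
SOFT_HYPHEN = "\u00ad"
ZERO_WIDTH_SPACE = "\u200b"


def render_flavour(text):
    """Join the lines of a flavour text entry into one string."""
    # A soft hyphen before a newline only marks a layout break: drop both.
    text = text.replace(SOFT_HYPHEN + "\n", "")
    # A real hyphen directly before a newline is taken to be a hyphenated
    # word split across lines: keep the hyphen, join without a space.
    text = text.replace("-\n", "-")
    # Every remaining newline is an ordinary break between words.
    return text.replace("\n", " ")
```

With a zero-width space between the hyphen and the newline, the second
rule no longer matches, so "Fire-" and "and" stay separate words.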
Items with the same name are considered the same. So, for example,
Storage Key is all one item, even though there are multiple storage keys
named "Storage Key" across the generations. As far as I know, this only
ever affects miscellaneous keys.
The Itemfinder is considered the same item as the Dowsing MCHN. They
have the same Japanese name and do the same thing; as far as I'm
concerned, the name change is just another data change.
I wrote effects for the newly-added items very quickly. They aren't
very good. I'm leaving it up to whoever takes care of issue #247 to
write good ones.
I meant to include this in the last commit. Whoops.
Rotom's description is *really long*, so I needed to bump the length up
to fit it. Also changed it to an RstTextColumn.
They now use our modified reST to link a few things like "Gracidea",
mention HG/SS where applicable, and are much more correct in general.
I might have missed some odd thing, and there are still a couple of
stylistic issues. Rotom's description is really long, for example, and
I'm not sure what to do about that; all of it seems fairly important.
This adds Japanese, French, German, Spanish, and Italian names, as
ripped from SoulSilver (Japanese) or Platinum (everything else).
This also fixes a couple of backrefs.
Gen II move flavour sometimes has shy hyphens; these, like in the
Pokémon flavour text, are represented by U+00AD SOFT HYPHEN even though
the Unicode standard specifies that it be used to mark where a shy
hyphen *could* go rather than where one was placed. (Supposedly, at
least; I haven't read it for myself.)
I compared with a rip from a Mystery Dungeon game. These are the only
two that didn't match, ignoring accents on capital letters. I need to
find an official list of names that includes accents on capital
letters....
We had D/P flavour text in the abilities table already, but I didn't
entirely trust it, so I reripped it along with the rest when I moved
flavour text into its own table. And we didn't actually use the D/P
text anywhere, so I'm just going to pretend that it is entirely new.
Page breaks are represented by form feeds and soft hyphens are
represented by soft hyphens, even though the Unicode standard's idea of
a soft hyphen is different from what we mean here.
My ripping scripts are at http://github.com/Zhorken/pokemon-flavour
- Everything now accepts -i, -e, -q, and -v.
- Plumbing commands now announce what database/index they're using and
where they got them from.
- New command status, which does nothing but still does the announcing.
- New command reindex, which recreates only the whoosh index.
- encounter_type_id -> encounter_terrain_id
- Added a version_id column. Previous rates were from Diamond and
HeartGold; these have been copied to Pearl & Platinum and SoulSilver,
respectively, which I assume is accurate. RBY rates need to be added.
Based on a Platinum text dump; I'm pretty sure Conversion2 was all one
word at some point.
Interestingly, the use messages for U-turn all read "___________ used
U-Turn!", but it's "U-turn" as the actual move name.
- Wobbles are based on WHICH number is greater than some pivot, not how
many. This was making everything totally wrong, especially 0 wobbles.
- HG/SS balls all modify capture rate, rather than ball bonus.
- Everything really is integer math; even the sqrts. Bonuses are
relative to 10, not 1. HP is now treated as integer math, too.
- Implemented a minor game bug with very hard to catch Pokémon.
Now state is held within an object, rather than passed back to the
caller who must then pass it in again. That was a poor design and I don't
know why I ever did it.
Code is much cleaner now.
With apologies to anyone running annotate.
Language codes are ISO 639-1; country codes are ISO 3166-1 alpha-2.
The country codes are important to keep for flags and stuff, I guess,
but reporting the language code as a short form for the language is
more correct.
Gonna see if I can do that, I guess. I added the language codes mostly
just because I was adding languages.
The only differences from Platinum are that Shuckle holds a Berry
Juice, Sky Shaymin holds a Lum Berry, and the *rizers are only held by
the final forms, only 5% of the time.
Every flavor page should work with no missing sprites. Save perhaps for
Unown, because I honestly don't have them.
Every sprite exists as ###-form.png. There is also still a ###.png,
containing a reasonable default form, so people who don't give a crap
about this mess can just use the numbered sprites. Beta forms should
now all be ###-beta.png.
Form groups now have a notion of "in-battle", which is used to hide
overworld sprites when appropriate.
Form sprites have a first-class sense of being a default or not, too.
Deoxys is... well, let's not talk about Deoxys. Deoxys is fixed.
Taken from http://www.pokepedia.fr/ (Liste des Pokémon dans l'ordre du
Pokédex National). They apparently took them from the French Mystery
Dungeon games (Poképédia:Conventions de Style).
This also corrects some typos.
This had been done before, but some of the changes were lost when I
re-ripped Diamond and Pearl.
Also, Turnback Cave has been collapsed into seven sections rather than
four. The previous change in particular ignored that the encounter
rates for the first three areas were lower than elsewhere. I'm
conjecturing wildly, but I believe those first three are the actual
pillar rooms, and the following four identical groups are the groups of
rooms between the pillars.
Conditions are now condition values; condition groups are conditions.
Types are now terrain. Slots are first-class things.
Encounters' condition values and slots' conditions have been broken off
into their own tables, as HG/SS has several slots affected by multiple
conditions.
This also fixes an absolute TON of errors with evolved Pokémon learning
a move both at level 1 and the pre-evolution's level, as well as
miscellaneous other problems.
Only the version group a forme actually exists in now has any moves for
that forme.
In addition, Deoxys formes were not showing any gen 3 moves at all
previously, because they were marked as only existing in gen 4. This
has been fixed.
Also fixed roomaji conversion to not die spectacularly when given
hiragana. For some reason I let it know about hiragana sokuon and
youon, but nothing else, so it gets totally confused.
This gives the correct ordering to level-up moves that have the same
level.
It also fixes move errors with Wartortle, Blastoise, Persian, Golduck,
Rapidash, Kabutops, Croconaw, Feraligatr, Noctowl, Sharpedo, Piplup's
family, Shinx's family, and Yanmega. Yikes.
Wrote a little add() function to clean up the duplication of
add_document().
Delete the index directory if it exists and we're being forced to
recreate it.
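
A sketch of the directory handling, using only the standard library.
prepare_index_directory is a hypothetical name; creating the whoosh
index itself is left out.

```python
import os
import shutil


def prepare_index_directory(directory, recreate=False):
    """Return True if a fresh index needs to be built in directory."""
    # Forced recreation: throw away whatever is there.
    if recreate and os.path.exists(directory):
        shutil.rmtree(directory)
    if not os.path.exists(directory):
        os.mkdir(directory)
        return True
    # The directory already exists, so an existing index can be opened.
    return False
```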
English fuzzy matches are preferred, followed by Roomaji and then
everything else.
The return tuple from lookup() now has a `name` field for the actual
name that was matched.
The setup command loads the default data into a default location, then
creates a whoosh index in a default location.
get_index is now open_index and can be made to explicitly recreate the
index. It also actually opens the index if it already existed, even
across processes, now that FileStorage is working.
The lookup command takes no switches for aiming at a different database;
it only uses the default data stores.
csvimport is now load; csvexport is now dump.
Both take an optional -e switch to specify an engine, but will happily
use a default SQLite database in the pokedex package directory.
Additionally, the CSV directory is now controlled by the optional -d
switch, and defaults to Doing The Right Thing.
So `pokedex load` now does exactly what you'd expect: loads the data
from the right files into a consistently-located database.
Good news: This no longer relies on InnoDB's default row order.
Bad news: InnoDB in MySQL 5.0 has a bug where it will sort rows
physically according to a secondary index, if there's a composite
primary key and a single-column index and the phase of the moon is
right. So a couple tables have been, once again, reordered -- but
correctly this time.
Good news: This bug will no longer fuck me up!
Whoosh's spelling module unfortunately ignores any "words" that don't
look like words, even though the algorithm works fine with arbitrary
input.
I had to clone some code from whoosh.spelling, but avoiding the
isalpha() check solved a bunch of problems. Now the index happily
compares against anything I feed into it.
Curse's type_id was 0, which is bogus; this has been fixed by creating a
real ????? type.
Fourth-gen moves all had zero as a contest effect id, which was also
bogus.
Pokémon 494 and 495 were junk and have been scrapped entirely.
pokemon_form_groups's description column was too short.
pokedex's connect() now takes kwargs passed to sessionmaker().
A more major change: some tables, like pokemon, are self-referential and
contain rows that refer to rows later in the table (for example, Pikachu
evolves from Pichu, which has a higher id). At the moment such a row is
loaded, the foreign key is thus bogus. I solved this by turning on
autocommit and wrapping add() in a try block, then attempting to readd
every failed row again after the rest of the table is finished. Slows
the import down a bit, but makes it work perfectly with foreign key
checks on.
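
The retry idea, sketched without SQLAlchemy. `insert` is a hypothetical
callable standing in for add() plus commit, assumed to raise ValueError
when a foreign key points at a row that is not loaded yet. Unlike the
single retry pass described above, this version keeps retrying until a
pass makes no progress.

```python
def load_rows(rows, insert):
    """Insert rows, retrying the failed ones after the rest are loaded."""
    pending = list(rows)
    while pending:
        failed = []
        for row in pending:
            try:
                insert(row)
            except ValueError:
                # Probably refers to a row later in the table; try again later.
                failed.append(row)
        if len(failed) == len(pending):
            # A whole pass loaded nothing, so these references are really broken.
            raise ValueError("unresolvable rows: %r" % (failed,))
        pending = failed
```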
Types, abilities, egg groups, and stats for Shaymin, Giratina, and Rotom forms.
Updated height and weight for Shaymin and Giratina forms.
Added Giratina's form descriptions and updated Shaymin's to mention link
battles and freezing.
All Solaceon Ruins rooms are identical, so there is no reason to have
them duplicated.
All the Old Chateau rooms are similarly identical, EXCEPT for the lone
room that can spawn Gengar. I also left the Rotom room in, for when I
get around to adding event encounters.
Great Marsh is now Great Marsh instead of Safari Zone.
Ruin Maniac Tunnel has been consolidated into one location, and the
areas are actually named informatively.
Turnback Cave has more appropriate area names.
Routes with two parts now mention the cardinal direction in the area
names. Also, several town names have been fixed.
Lake Verity's area names now mention WHAT they are before/after.
Apparently the secret property on a singleton hidden in the guts of
SQLAlchemy has been made private recently, so what I wanted to do (get a
list of all ORM classes) is now impossible. I gave up on trying to find
a real solution and just slapped together something using dir().
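
Something in the spirit of that dir() workaround is sketched below. It
is not the actual code; it assumes every mapped class defines
__tablename__, which holds for declarative classes but is still an
assumption about the tables module.

```python
def mapped_classes(module):
    """Yield the classes in module that look like ORM classes."""
    for name in dir(module):
        value = getattr(module, name)
        # Keep only classes that declare a table name.
        if isinstance(value, type) and hasattr(value, "__tablename__"):
            yield value
```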
This does NOT actually change the data at all! These tables were
apparently created with no key defined, so the rows were in arbitrary
order -- but when I created and populated the tables in MySQL on
nyarumaa, the keys were defined correctly, and InnoDB ordered them by
key. This is about what should happen anyway and the discrepancy adds
clutter when dumping corrections, so I'm just committing the new order.
It used to abruptly abort if a csv file were missing, which wasn't very
nice when I'd just added a new table definition and was trying to reload
everything else.
Now it prints a status per table while loading, and will declare missing
tables to be... missing.
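
In outline, the loader now behaves like the sketch below. load_tables
and the `load_csv` callable are hypothetical names, the status wording
is invented, and one `<table>.csv` file per table is assumed.

```python
import os


def load_tables(table_names, directory, load_csv):
    """Load each table's CSV file, printing one status line per table."""
    statuses = {}
    for name in table_names:
        path = os.path.join(directory, name + ".csv")
        if os.path.exists(path):
            load_csv(path)
            statuses[name] = "loaded"
        else:
            # A missing file is reported and skipped rather than aborting.
            statuses[name] = "missing"
        print("%-30s %s" % (name, statuses[name]))
    return statuses
```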
Finally! Location order is the same as from the old dex, which was
something like the game but ultimately arbitrary, so it's not any better
now.
This takes a very different approach to storage, rather than copying the
game exactly and trying to fix everything in code. Comments coming
shortly so other people can actually make use of this.