Adventures with sources

Note: I’ve been working on one particular part of this project for nearly a week and have material for numerous posts, but holiday activities have left me no time to write them. So for now I’ll just provide the background for some future posts, to be written when I catch up.

My project involves building up a large corpus of Spanish food items and descriptions (aka preparations, recipes) from menus of actual restaurants in Spain, and then deducing from that corpus (ideally with some AI-ish code) a Spanish-to-English “vocabulary” of food terms/phrases/concepts. The goal is not Spanish in general but Iberian Spanish specifically, thus avoiding the many terms found only in the Western Hemisphere or, worse, terms that mean something different in Spain than elsewhere (e.g. tortilla).

So I was doing my virtual walk along the Camino de Santiago, finding restaurants via POIs on Google Maps and then grabbing those that had websites and online menus to extract for my corpus. It is a tedious process, but one that produces good “raw” data and gives me a chance to actually internalize some knowledge of food in Spain (vs. the pure AI approach that Google uses). In doing this I went off on two levels of distraction. First, I noticed a grocery store (supermercado) on the map that had a website and, as it turns out, online ordering. This was a great source since it had photos of the available products and thus the opportunity to get equivalent English words for the Spanish names of food items, PLUS it would be oriented to Spain, not general Spanish. Second, in doing this I stumbled onto another business that sells food online, Gallina Blanca, which I learned is a rather large multinational supplier to restaurants and home cooks. Their website has a large number of recipes (recetas) which, if I can get decent translations, would be a large source for my corpus: not just ingredients, but also preparation techniques.

But then I discovered an interesting item at the bottom of the webpage, a link to a diccionario. I thought this would be a real bonanza of food terms but quickly learned two things: a) it is a real pain to extract information due to the mechanics of how the webpage is built (i.e. mostly JavaScript, not HTML, so there is nothing to “copy” with mouse selection despite seeing it on the screen), and b) as I tediously began to extract some information and work out a process, I realized ‘dictionary’, in this case, meant something different than I thought. I’m used to finding “translation dictionaries” online and referring to these just as ‘dictionary’. But Gallina Blanca’s site has the classic notion of a dictionary, i.e. a word (or term) and its definition, in Spanish. What I was expecting, naively, was the English equivalent for Spanish food terms, and it turns out I have to do a lot more work to get that. So, for example,

AGUACATE Árbol originario de América, cultivado por su fruto, de pulpa espesa y perfumada. Muy usado para ensaladas, salsas y sopas. Native tree of America, cultivated by its fruit, of thick and fragrant pulp. Widely used for salads, sauces and soups.

Note: This is a good example of how there are enough cognates in the Spanish definition that it is possible, plus knowing only a few rules, to assign the word-by-word correspondence between Spanish and English for the corpus even without any significant knowledge of Spanish.
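A rough sketch of how that cognate matching might be automated, in Python. This is only an illustration under stated assumptions: the function names, the use of plain string similarity as a stand-in for "cognate", and the 0.7 threshold are all made up for this example and are not part of any process described in this post.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def normalize(word):
    """Lowercase a word and strip accents so 'América' compares equal to 'America'."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    return "".join(c for c in decomposed if unicodedata.category(c) != "Mn")


def tokens(text):
    """Split text into runs of letters, dropping punctuation and digits."""
    return re.findall(r"[^\W\d_]+", text)


def cognate_pairs(spanish, english, threshold=0.7):
    """Pair each Spanish word with its most similar English word.

    The threshold is an arbitrary assumption; pairs below it are dropped
    because they are more likely coincidence than cognate.
    """
    english_words = tokens(english)
    pairs = []
    for es in tokens(spanish):
        best, best_score = None, 0.0
        for en in english_words:
            score = SequenceMatcher(None, normalize(es), normalize(en)).ratio()
            if score > best_score:
                best, best_score = en, score
        if best_score >= threshold:
            pairs.append((es, best))
    return pairs


pairs = cognate_pairs(
    "Muy usado para ensaladas, salsas y sopas.",
    "Widely used for salads, sauces and soups.",
)
print(pairs)
```

Spelling similarity only catches the easy cases (ensaladas/salads, sopas/soups). Words like salsas/sauces, which a human reader pairs up from position and context, fall below the threshold and still need the "few rules" mentioned above.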

The actual dictionary gives me the definition in Spanish of aguacate. I used spanishdict.com’s translate function to get the English from the Spanish definition. BUT this didn’t quite get me the same thing as finding that aguacate literally translates to avocado (which is itself a loanword in English), which gives me a much better notion of what aguacate really means (especially in the context of a restaurant menu) than the dictionary definition does.

But in other cases having the definition is handy, especially in comparison to the literal translations in some dictionaries. For instance,

ABRILLANTAR Dar brillo a cualquier preparado con jalea, gelatina, grasa, o pintando con huevo la superficie de un manjar antes de meter al horno o de presentarlo. Give shine to any preparation with jelly, jelly, fat, or by painting with egg the surface of a delicacy before putting it in the oven or presenting it.

Brighten any preparation with jelly, jelly, grease, or painting with egg the surface of a delicacy before putting it in the oven or presenting it.

For this word I got both the spanishdict.com translation (first one) and the Google translation (second one). Abrillantar is obviously a verb and literally means ‘to polish’ (which one might guess), but its meaning in the cooking sense is better explained by the definition provided on this website, which I’d probably translate simply as ‘glaze’ even though glasear is the Spanish verb for that.

So after some time I have steadily refined my process (and streamlined it a bit, actually learning a semi-hidden feature of MSWord to reduce the number of manual steps per entry) and begun to realize what I can really learn from the tedious process of crunching through a large number of terms (gracias, GallinaBlanca). And several of those entries will be the basis of some future posts.
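For what it’s worth, once entries are in plain text, the split into headword and definition could also be scripted. The following is a minimal sketch, assuming each entry sits on one line with the headword in capitals followed by the Spanish definition (the shape of the examples quoted in this post); the pattern and the function name are hypothetical, not the MSWord process I actually use.

```python
import re

# Assumption: a headword is one or more ALL-CAPS words (Spanish letters included),
# followed by whitespace and then the definition text.
ENTRY = re.compile(
    r"^(?P<headword>[A-ZÁÉÍÓÚÜÑ]+(?: [A-ZÁÉÍÓÚÜÑ]+)*)\s+(?P<definition>\S.*)$"
)


def parse_entry(line):
    """Return (headword, definition), or None if the line does not look like an entry."""
    match = ENTRY.match(line.strip())
    if match is None:
        return None
    return match.group("headword"), match.group("definition")


entry = parse_entry("AREPA Pan de maíz amasado con huevos y manteca.")
print(entry)
```

One known weakness of this pattern: a definition that begins with a one-letter capitalized word (say, “A ...”) would be swallowed into the headword, so results would still need a quick eyeball check.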

But I have been curious about the source and range of this dictionary. GallinaBlanca doesn’t say anything (that I can find) about where they obtained it. So I face a classic problem I had in earlier versions of this project: getting terms that really are for Spain, not terms from somewhere else in the Spanish-speaking world that might not be used or understood in a restaurant in Spain. Without any explanation of this dictionary this is a guessing process for me, but occasionally I get clues. For example,

AREPA Pan de maíz amasado con huevos y manteca. Corn bread kneaded with eggs and lard.

Wikipedia has a good article on arepa that makes it fairly clear this is something common in Colombia or Venezuela but doesn’t even mention Spain. This is just one clue (I’ve had a few others in my work thus far but none as clear as this one, yet), so I expect to find more as I work through the entries and thus, hopefully, determine whether this dictionary can fully apply to Spain. (Btw, I learned (I think) that it is best to refer to Spain’s Spanish as ‘Iberian’ or ‘peninsular’, and not Castilian or castellano, since that irritates some people.)

Also, given there is no explanation of the source of this dictionary at the website, I have questions about whether it is accurate. There is a clear error in the JavaScript: each page of the dictionary (by letter) is obtained by clicking on the letter in an A B … Z bar, except that starting at ‘O’ it’s off by one position (‘O’ gets you words starting with ‘N’ and so forth). IOW, a simple human error. And one error makes me question if there are others. So, for example,

ACIDELAR Poner zumo de limón o vinagre en el agua para cocinar huevos escalfados o verduras, para que no ennegrezcan. Put lemon juice or vinegar in the water to cook poached eggs or vegetables, so that they do not blackened.

Put lemon juice or vinegar in the water to cook poached eggs or vegetables, so they do not blacken.

I couldn’t find acidelar in any online dictionary but did find this

ACIDULAR (literally: acidulate, make sour) Rociar con un líquido ácido frutas, verduras u hortalizas, con el fin de que conserven su blancura o color. Sprinkle with an acidic liquid fruit, vegetables or vegetables so that they retain their whiteness or colour.

Spray fruit, vegetables or vegetables with an acidic liquid, in order to preserve their whiteness or color.

So, since the definitions seem mostly the same, is the ‘e’ really supposed to be a ‘u’, and did whoever composed this dictionary just make a typo?
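This kind of near-miss lookup could be scripted instead of done by hand. Here is a minimal sketch using Python’s standard difflib module; the word list is a tiny hypothetical stand-in for a trusted reference dictionary, and the 0.8 cutoff is an arbitrary assumption.

```python
from difflib import get_close_matches

# Hypothetical reference list; a real check would load a trusted Spanish word list.
KNOWN_WORDS = ["abrillantar", "acidular", "aguacate", "albaricoque", "arepa"]


def suggest_corrections(headword, known_words=KNOWN_WORDS, cutoff=0.8):
    """Return known words that are spelled almost like the given headword.

    An empty list means the headword is already known or has no close neighbor.
    """
    word = headword.lower()
    if word in known_words:
        return []
    return get_close_matches(word, known_words, n=3, cutoff=cutoff)


print(suggest_corrections("ACIDELAR"))  # ['acidular']
print(suggest_corrections("ALBARICO"))  # ['albaricoque']
```

A close spelling is only a candidate, not a verdict: the definitions still have to be compared by a human before deciding that one word is a typo for the other.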

But the most interesting mystery (thus far) along these lines was

ALBARICO Especie de Palma (Bractis setulosa). Species of palm (Bractis setulosa).

I couldn’t find albarico in any online dictionary BUT there is the very similar term (in the GallinaBlanca dictionary and elsewhere) which is

ALBARICOQUE (literally: apricot, probably the fruit, not the tree itself) Fruto de albaricoquero, de hueso liso y piel y carne amarillas. Albaricoquero. Apricot fruit, smooth bone and yellow skin and flesh. Apricot.

So while the descriptions are quite different I thought, perhaps, albarico was simply some short form of albaricoque. But this is wrong.

In some cases words in Spanish, especially for plants, are derived from the Latin scientific nomenclature, so I tried to look up Bractis setulosa, assuming (from the definition) that this was the Latin name. No results. So I tried bractis alone (found nothing that made sense) and then setulosa. A-ha! There was a valid entry that seemed to match the definition, for Bactris setulosa, AND this is a tree: according to Wikipedia, a “spiny palm which is found in Colombia, Venezuela, Ecuador, Peru, Trinidad and Tobago and Suriname” (which is also evidence this dictionary is not for Iberian Spanish). So that looks like a very plausible human typo: the ‘r’ was moved from the second syllable, where it belongs, to the first syllable, where it is wrong (in the sense that nothing can be found for that spelling). I felt this was a good piece of detective work on my part AND it represents fairly good proof there are mistakes in this dictionary. And where there is one there may be more.
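The “one letter in the wrong place” pattern is easy to test for mechanically. Below is a small Python sketch (the function name is made up for this example) that checks whether one spelling can be turned into another by lifting a single letter out and putting it back somewhere else.

```python
def is_single_letter_move(a, b):
    """True if b can be made from a by moving exactly one letter to a new position."""
    # Same spelling, different length, or different letters: not a displaced-letter typo.
    if a == b or len(a) != len(b) or sorted(a) != sorted(b):
        return False
    for i in range(len(a)):
        letter = a[i]
        rest = a[:i] + a[i + 1:]
        # Try re-inserting the removed letter at every possible position.
        for j in range(len(rest) + 1):
            if rest[:j] + letter + rest[j:] == b:
                return True
    return False


print(is_single_letter_move("bractis", "bactris"))    # True
print(is_single_letter_move("acidelar", "acidular"))  # False
```

The second call is False because acidelar/acidular differ by a substituted letter rather than a displaced one, so the two suspected typos in this dictionary are of different kinds.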

So I’m still learning all sorts of interesting things from my slow plodding through all these entries (and will do some more posts) but all this work shows the challenge of trying to get an accurate (and even harder, complete) translation dictionary for Iberian Spanish.
