I recently described my shift in focus to learning about Spanish comida terms from studying recipes (recetas) instead of menus (cartas: my original focus) and also focusing on Mexico instead of Spain. I’ve just begun this project but I have processed a single sample that I’ve analyzed to reflect on the difference.
My basic question is how one could compile the largest and most comprehensive corpus of words related to food, cooking, dining and gastronomy. A corpus that is as extensive and as accurate as possible can then be fed into a computer program (AI, or a fairly conventional algorithm) to generate a “translation” tool, not to translate in a literary sense but well enough for a diner to select what they want to eat from a menu.
There are many ways, but four main approaches, to ordering food in Spanish: 1) be completely fluent in Spanish, as well as the cuisine of the local area, 2) use a translation tool based on a corpus extracted just from menus for the desired cuisine, e.g. Spain, maybe even regions of Spain, or any other Spanish speaking country where it’s likely there may be many terms that are not used in Spain, 3) use a translation tool based on the broadest cooking, dining and gastronomy sources, and, 4) achieve sufficient fluency in Spanish to discuss food choices with the waiter, or chef, if needed, or perhaps with other people when seeking recommendations.
So, IOW, learn Spanish generally, but also including specialized terminology for food and cooking and gastronomy or just obtain (or in my case, build) a translation tool specialized on food terms. Which is easier and/or most effective? Which would accomplish my original goal?
So I’ve started with a good/fun/interesting site for getting lots of recipes in Spanish (and some with human English translations) for food in Mexico. Now I actually have about 10 cookbooks (we’re a bit of collectors of cookbooks), all in English. So I’m fairly familiar with Mexican cooking so I looked around, briefly, online to find a good Spanish source and found:
Mexico en mi cocina
I explored this site to get a feel for what content is there and settled on this recipe as my first test case: Tostadas de tinga de atún and a companion site in English Tuna tinga toasts.
Now right away we have an interesting word: tostadas. In the general sense of Spanish (and definitely in Spain) this would be ‘toast’ or ‘piece of toast’ and from any previous look at menus in Spain this is what this word appears to mean (or sometimes equivalent to crostini or bruschetta). But to anyone who has eaten in most any Mexican restaurant in USA (or presumably Mexico), it has the meaning my dictionary lists as applicable to Mexico as tortilla. Now in Mexico (and USA) tortilla is the familiar “maiz pancake” as the dictionary says, although often it may not be from maíz (corn) but also might be wheat flour, sometimes even whole wheat flour. In contrast, in Spain, tortilla is almost universally a kind of omelet (as dictionaries or Duolingo say, but it’s really closer to the Italian frittata than the French omelet; in fact, on some menus I studied in Spain what we norteamericanos think of as ‘omelet’ is called tortilla francesa). So right away I have a good example of how Spanish words are not universally understood the same way in different Spanish speaking countries.
All this said, however, a dish in a restaurante mexicano in the USA labelled as a tostada would not be just a tortilla, but a tortilla, usually fried and crisp, placed flat on a plate and piled with various additional ingredients. In fact, in the receta I’m using as an example, the corn tortilla has a thin spread of frijoles, then the tinga de atún (tuna in a red sauce), then shredded lettuce (lechuga) and then a dressing of Mexican crema (something similar to sour cream or crème fraîche). I think you can see a picture from this site (or use the main url to go to the page).
I may do some other post about some other interesting issues, on this page, about reading Spanish but now I just want to show a couple of statistics about the issue of knowing Spanish (generally) versus just looking at food/cooking related Spanish.
As of today, I’ve studied in Duolingo for 550 days. I’ve done 94 of their lessons (known as “skills”). According to their statistics I’m 59.1% complete, and have done 2623 lexemes (58.7%) out of 4466. Several of the skills have been focused on restaurants or grocery scenarios. My rough guess is I’ve spent about 1500 hours just on this study. In addition I’ve now completed about 30 hours of intensive “immersion” technique classroom study. I’ve come fairly close to completing the A2 (CEFR) standard level of Spanish, which means I’m getting close to intermediate level, although in terms of verbal proficiency I’m still back in early A1 level, IOW, just barely able to talk to a waiter, not hold an extensive culinary discussion. All this is certainly in the range of about one year of high school Spanish, maybe even a bit more.
Now what does that do to help me read the receta? Interestingly, it is fairly useful, although I have to say that having the pictures of the preparation of the dish also helped me puzzle through some words I didn’t know. And for the most part I could “parse” almost all of the sentences, as I’m basically familiar with most of the Spanish grammar needed to read this.
BUT, and a big but.
I just don’t have enough vocabulary to really read this. So that’s some of the data I’ve analyzed. This is a problem with learning a language. A small number of words are the most frequently used and thus quickly learned in general Spanish classes but then a vast number of words is required to really understand. IOW, you spend 10% of your time to learn 80% of the text (by count) and another 1000% of your time to learn the other 20% (by count). The, of, and, for are handy to know but have little information content.
I have written a couple of programs to help me: 1) a program (lexer) with a lot of options and special features to identify all the unique terms/words/lexemes (essentially the same thing in this context), and, 2) another program (flashcards) I use for my own types of drills, where I have coded all the words I’ve encountered in Duolingo (that 2623 number above) but that I expand with all the conjugations for the tenses I’ve learned and a few more variations, so my drill has about 4500 words in it. I then have an option to compare all the words from lexer with all the words in my flashcards to find “new” words.
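The comparison step can be sketched in a few lines of Python. This is only a minimal illustration, not the actual lexer or flashcards programs: the function names, the sample sentence and the tiny known-word set are all made up for the example. Like the counts in this post, it treats every inflected form (cebolla, cebollas) as its own word.

```python
import re
import unicodedata


def unique_words(text):
    """Return the set of distinct lowercase word forms found in text."""
    # Normalize so accented letters (e.g. the u in atun with its accent)
    # are single code points, then lowercase before matching.
    text = unicodedata.normalize("NFC", text).lower()
    # [^\W\d_]+ matches runs of letters, including accented ones.
    return set(re.findall(r"[^\W\d_]+", text))


def new_words(text, known):
    """Return the word forms in text that are missing from the known set."""
    return sorted(unique_words(text) - known)


# Hypothetical sample sentence and known-word list, for illustration only.
sample = "Cocina la cebolla. Agrega las cebollas y el atún."
known = {"cocina", "la", "las", "el", "y", "cebolla", "cebollas"}

print(len(unique_words(sample)))  # 9 distinct forms
print(new_words(sample, known))   # ['agrega', 'atún']
```

Using plain sets keeps the "what is new?" question down to one subtraction; a real tool would probably also want to group inflected forms under one headword.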
So the text of the webpage for this recipe, which includes some descriptive material, not just only the recipe, has 226 unique words (for instance, it has cebolla (onion) and cebollas which I count as two words, even though cebollas is just the plural of cebolla; or cocido, cocina, cocínalos, cocine which are different forms of the verb cocinar (to cook); or la, las, los (but not el), which are variants of the in Spanish, the most common words).
IOW, 226 “words” is not very many, but how big would my own vocabulary need to be to be likely to know most of these 226 words?
Well, we start with the statistic that 93 of these 226 words (41.2%) are ones I have not learned in 1.5 years of studying Spanish, which means I already know the other 133 (58.8%); by that measure I’ve got roughly another year to go. BUT, many of the missing words are specialized to cooking and thus not very likely to be learned in another year or two of general Spanish. So here are some of the words I have learned in 1.5 years, all fairly common:
aceite cebolla cebollas cena cocina comer comida comidas fresco frijoles fuego jugos latas menú mexicana mexicano pescado picante plato preparar queso sal saludable saludables suave taza tazas tomate tostada tostadas vegetal
Not bad, but try to figure out the recipe from just that vocabulary. BTW, fuego, which I’ve learned as ‘fire’, medio (media for me since it goes with hora, which is feminine), which I’ve learned as ‘half past’ (as in a time), and alto, which I’ve learned as ‘tall’, are used in the phrase fuego medio-alto, which, with my knowledge, I might just barely guess is ‘medium-high heat’. Did you get it, Dear Reader? So while I’ve “learned” these three words I’ve never had them in this combination, so ‘fire half-past tall’ is a pretty lousy translation.
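One way a food-specific tool could avoid the ‘fire half-past tall’ problem is to look up whole phrases before falling back to single words. Here is a minimal sketch of that idea in Python, assuming two tiny hand-made glossaries (both hypothetical, filled only with the glosses mentioned above); it is an illustration of the technique, not a description of any existing translator.

```python
# Hypothetical glossaries for illustration; a real tool would load a corpus.
WORD_GLOSSES = {"fuego": "fire", "medio": "half-past", "alto": "tall"}
PHRASE_GLOSSES = {("fuego", "medio", "alto"): "medium-high heat"}


def tokenize(text):
    """Lowercase the text and split on spaces, treating hyphens as spaces."""
    return text.lower().replace("-", " ").split()


def translate(text, use_phrases=True):
    """Gloss text, preferring the longest known phrase at each position."""
    tokens = tokenize(text)
    longest = max(len(phrase) for phrase in PHRASE_GLOSSES)
    out, i = [], 0
    while i < len(tokens):
        match = None
        if use_phrases:
            # Try the longest candidate phrase first, down to two words.
            for n in range(min(longest, len(tokens) - i), 1, -1):
                chunk = tuple(tokens[i:i + n])
                if chunk in PHRASE_GLOSSES:
                    match = (PHRASE_GLOSSES[chunk], n)
                    break
        if match is None:
            # Fall back to a single-word gloss, or keep the word unchanged.
            match = (WORD_GLOSSES.get(tokens[i], tokens[i]), 1)
        out.append(match[0])
        i += match[1]
    return " ".join(out)


print(translate("fuego medio-alto", use_phrases=False))  # fire half-past tall
print(translate("fuego medio-alto"))                      # medium-high heat
```

The word-by-word path reproduces the lousy translation, while the phrase-first path gets the cooking meaning, which is the whole argument for collecting multi-word terms in the corpus.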
For instance, here are the words (50) from the ingredients part of the recipe (including a few terms for measures), with the words I haven’t had in Duolingo marked in red:
aceite adobo ajo al atún blanca cada cebolla chile chipotle crema cucharada cucharadita de desmenuzado diente en enlatado finamente fresco frijoles grande gusto latas lechuga maíz mediano mexicana mexicano negros o onzas orégano picada picado pimienta pintos queso refritos sal tamaño taza tazas tomate tostadas una vegetal y
So, the words I haven’t learned in a general Spanish class are about half of the ingredients section AND most of the words that are really critical to this recipe are ones I have not learned (some, of course, I remember from studying menus in Spain). So for instance, one ingredient is:
1 diente de ajo grande finamente picado
Now this has an interesting tidbit. In Duolingo I learned diente as tooth, and ajo is a very common word most people would know to be garlic. So what is a garlic tooth? My favorite dictionary SpanishDict.Com doesn’t know, word-by-word, what this is, but here’s where 1.5 years of studying Spanish pays off (especially with frequent use of this dictionary and understanding how to look things up): the de is an important clue (general Spanish knowledge), which is ‘of’, but more importantly it means ajo is a qualifier of diente, so treating this as a single “term” we find ‘clove of garlic’. So either general study or specifically looking at cooking/food terms makes this understandable. Now grande isn’t hard and finamente can be deduced (due to general Spanish knowledge) as an adverb (-mente ending), and a guess (it’s a bit of a cognate) is that this is either finally or finely, and of course finely makes sense in a recipe. Again from general knowledge of Spanish, most words ending in -ado are past participles of -AR verbs, so I’d deduce the verb as being picar. I would not be very likely to guess its meaning, but by luck, in this blog, I’ve previously learned what para picar means on a menu in Spain. picar has multiple meanings and a somewhat unusual one, ‘to peck’ (like a bird), leads to ‘to nibble’ (for a person), so this somewhat common phrase essentially means ‘to snack on’, i.e. some kind of finger food placed on the table to be shared. But in this recipe its meaning ‘to chop’ applies and the past participle in English would be ‘chopped’, which of course is what it means. So Google Translate actually got this spot-on:
1 large garlic clove, finely minced
So if Google is getting this right, why do we need to either learn Spanish or use an automated tool just for cooking/food? And in fact the Google translation is very close to the human translation (just a couple of the usual GT mistakes) or my translation. So that would be fine as long as you have an internet connection in some tiny town in Mexico, but maybe you’d like to have an app on your phone that works offline.
So let’s consider the final statistic. Of the 93 words in the recipe that I have not encountered in 1.5 years of general study of Spanish, 59 (63.4%) are related to food or cooking. So a word like mariscos (generic ‘seafood’, sometimes just used for shellfish, esp. in Spain) is a common “food” word. espolvorea is a conjugated form of espolvorear (to sprinkle), which I call a cooking term (you might see this on a menu), and desmenuzado (past participle of desmenuzar (to crumble, among many definitions), so crumbled) is another cooking term. Note: Both of these verbs are in my unfinished COOKING VERBS page, so I guess I’ll need to finish that and possibly expand it as I crunch through recipes, as I note a verb in this recipe, ensamblar, that I don’t have in my list and it is a useful verb to include.
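For anyone who wants to check the percentages, here is the arithmetic as a small Python sketch. The three counts come from this post; the variable names and the helper function are just made up for the example.

```python
# Counts taken from the post; names are illustrative only.
total_unique = 226      # unique word forms on the recipe page
not_yet_learned = 93    # forms not encountered in general study
food_related = 59       # not-yet-learned forms tied to food or cooking


def pct(part, whole):
    """Return part as a percentage of whole, rounded to one decimal."""
    return round(100 * part / whole, 1)


print(pct(not_yet_learned, total_unique))  # 41.2
print(pct(food_related, not_yet_learned))  # 63.4
print(pct(food_related, total_unique))     # 26.1
```

The last figure is one not quoted above: roughly a quarter of all the distinct words on the page are food or cooking terms that general study had not covered.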
tamaño I had to look up (size) which is interesting as I’d learned talla (also size) in Duolingo but it only applies to clothing which is another interesting point – Spanish words have multiple translations into English (and vice-versa) and some of those only apply in certain contexts, so therefore even learning one of multiple meanings in a general Spanish course may not help, or even be confusing.
One thing I can say is that learning how verbs work in Spanish, and the various conjugations, especially all the common ones, makes it easier to figure things out and in some cases makes them more clear (for instance, -zando vs -zado is crumbling vs crumbled, and that would be handy to know).
So here are all the words (93) I haven’t had in 1.5 years of general Spanish, with the food/cooking words (59) in bold, and with the words that I can recall from my previous work (as part of my original purpose of this blog, i.e. translating menus in Spain) also marked.
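The ending-based reasoning (-mente for adverbs, -ado for past participles of -AR verbs, -ando for the “-ing” form) is regular enough to sketch as code. This is a deliberately naive Python illustration with a made-up function name, covering only -AR verbs; it guesses from the spelling alone and knows nothing about what the words mean.

```python
def guess_form(word):
    """Guess a (form, base) pair from a word's ending; -AR verbs only."""
    w = word.lower()
    if w.endswith("mente"):
        # Adverbs in -mente are built on an adjective, e.g. fina + mente.
        return ("adverb", w[:-len("mente")])
    if w.endswith("ando"):
        # Gerund of an -AR verb: desmenuzando -> desmenuzar.
        return ("gerund", w[:-len("ando")] + "ar")
    for ending in ("ados", "adas", "ado", "ada"):
        if w.endswith(ending):
            # Past participle of an -AR verb: picado -> picar.
            return ("past participle", w[:-len(ending)] + "ar")
    return ("unknown", w)


for word in ["finamente", "picado", "picada", "desmenuzado", "desmenuzando"]:
    print(word, guess_form(word))
```

A rule like this overgenerates: cucharada (tablespoon) would be labelled a participle of a verb “cucharar”, so any real tool would need to check its guesses against a dictionary rather than trust the ending alone.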
acerca activación adicional adobo agrega agregar ahumado ajo alacena aperitivos aprovecha aproximadamente así atún botana calidad calienta cantidad celebrar chile chiles chipotle cocción cocido cocínalos cocine coloca combina condimenta crear crema cubra cucharada cucharadita decisiones dejar delicia deliciosa deliciosas delicioso derretido desmenuzado elaboración enlatado ensamblar envasado espolvorea expresadas finamente fuente gotas haya incluir ingredientes lechuga liberado maíz mariscos mediano mitad oliva onzas opción opiniones orégano patrocinada picada picado pimienta pintos pizca podrás preparación propósito proteínas publicación raciones realmente receta refritos rico rocíe sabor sartén sea será soltado tamaño tinga total transparente usaremos virgen
And, of course, even in this small sample we see a difference between Spain and Mexico’s Spanish in that chipotle, pintos, refritos or tinga are unlikely to appear in Spain. And, as an exercise for the reader, in Spain this phrase: Si te gusta la comida un poco picante, would most likely have os instead of te.
So, what does this [over]analysis say? I would conclude that learning Spanish, even in a general way, is helpful, but, using a standard math/legal paradigm, it is: a) not necessary (although helpful), and, b) not sufficient.