Sometimes I do other things …

… like actually do some cooking, or in this case, baking (and find a new verb at the same time)

I've been grinding through season 6 of Pati's Mexican Table, fishing for verbs, and since that season focuses its recipes on Oaxaca there is a lot of good stuff there. Doing all this desk work makes me hungry.

For a couple of years, a long time ago, I went crazy baking bread (tons of posts in another blog about that), but I picked a bad time to do it, when I otherwise needed to do some serious weight loss. Bread, no matter what flour you use (sorry, whole grain has its advantages, but lower calories is not one of them), is high in calories and not a good companion for diets.

So I've been stuck dieting again and have resisted any baking, but one of Pati's recipes was different from any dough I'd tried, so I had to give it a try. I won't copy her recipe into this post since you can follow the link to it (Oaxacan Yolk Bread, Pan de yema). So here's my result:

It’s not as pretty as the photo on Pati’s site but it’s fairly equivalent.

The main thing that makes it different, and hence the name, is all the egg yolks, which, along with the butter and sugar, push this in the direction of brioche. It had a divine smell while baking. It also uses an unusual technique: you first create a poolish, which adds its own smell (super yeasty) to the kitchen.

I think mine puffed up too much (hence the cuts on the top are not as well defined as Pati's) because I didn't have any all-purpose flour and used bread flour instead. Combined with 12 minutes of machine kneading, this created a coarser texture and more gluten development to support the rise than would be ideal.

While I've been fishing for verbs in Pati's recipes, I'm simultaneously looking at another source (a Kindle book solely in Spanish, Recetario de Cocina Mexicana by Diana Baker) that has a very similar recipe under the name Pan Oaxaqueño but with an interesting ingredient not in Pati's recipe (receta): 1 cucharada de semillas de anís (1 tablespoon of anise seeds). It also uses whole eggs instead of just the yolks, which makes Pati's name for her bread the more appropriate one.

That recipe contains a verb I've not encountered in any of Pati's: barnizar, which both Google Translate and spanishdict.com think means 'to varnish' (SD has a secondary translation, 'to glaze', which is a bit closer). It is the verb used to describe brushing on the egg wash before baking.

Pati's recipe uses this language to describe the process: Haz 3 cortes encima de cada pan y unta con el huevo y el agua (literally, "make 3 cuts on top of each bun and spread with the egg and the water"), apparently because there is no direct name for egg wash. The English (which I believe is the original) says: Make 3 slashes on the top of each bun and brush with egg wash. (Notice how easy it is to line up the Spanish and the English in simple, direct sentences like the ones found in instructions.)

Apparently there is no very direct translation of 'egg wash', so I'll leave it as an exercise for you, Dear Reader, to explain why lavado de huevo (literally "egg washing"), which I found online, is terrible.

Fishing for verbs is a good way to study

I have a backlog of about 8 ideas for posts, but I’m going to skip ahead to my most recent idea.

For over a week I've switched gears on my project: I dropped getting more NYTimes articles (dual language, a chance to compare human translation to Google and to my own knowledge) and have been accumulating a big collection of cooking verbs the right way, not by mining lists, but by seeing the verbs used in high-quality online recipes published by good authors.

When I started this I immediately had to add more code to my app, which I originally created for conjugation drills but which has instead become a way to manage collections of verbs (someday I'll get around to other parts of speech). For example, I have a collection of 536 verbs from my Duolingo lessons, which I collected at a slow rate.

But lately I've found a way to mine a very good and large online source that is dual language. It is in English (even though the author is a native Spanish speaker, she does her TV shows in English, her website was originally all English, and her cookbooks are in English), and then someone (no idea who) translated it into Spanish and created a parallel Spanish-language version of the website. The two are not exactly the same: the translations and page layouts have enough "mistakes" (sorry for the criticism) to suggest this was an entirely human-driven process.

Anyway, after four intense days digging through 30 recipes with a complex process, plus gradually adding more code to my app to speed up the process, I saw a much better way, using the same source, to go fishing for verbs.

While I'm a long way from fluent in Spanish, and there is a ton of vocabulary I don't know, recipes are a simple enough kind of text that I can "parse" the steps even without knowing all the words. Since this site has English translations, I can just snag those and dispense with Google Translate. And since I'm almost entirely looking for verbs (I do notice some other things), I have sped up my process so much that in one day I got about half as much done as took me four days on the first batch of recipes I processed.

But this has an unexpected consequence, a good one. Unlike the NYTimes stories, which were longer, even more tedious to process, and written in more complex prose that I had a harder time parsing, these recipes I can zip through fairly quickly. And thus I'm getting practice.

Maybe some people can quickly learn new vocabulary, but for me (and I think for most people) a fair amount of repetition is required, in whatever language-learning process you're using, to retain most of what you've been exposed to. The NYTimes articles were about a different subject every time, so I got little repetition of any new words I encountered; IOW, I basically didn't learn any new vocabulary. I was well aware of this, and in fact part of the mechanical work I was doing (accumulating vocabulary and building my own dictionary I can use in my code) was meant to eventually feed yet another app that would use the stories I'd painfully accumulated for drills.

But I got so lost just getting “data” I spent little time actually studying.

Now, zipping through these recipes, I'm getting much more repetition, and in just the last five days I've had enough drill that a whole bunch of recipe vocabulary is sinking in. This is similar to what happened when I first started this project: crunching through multiple menus every day, I was actually learning something, not just accumulating data.

So, if you want to learn cooking and food vocabulary, just go to any high-quality online site with lots of recipes in Spanish, ideally with a parallel version in another language you know. Otherwise, do a bit of tedious work: scrape the Spanish text into a word processor and put the Google translation side-by-side with it (it never aligns exactly, but it's usually close enough). Then read every single word and "extract" whatever you like. Verbs are a big chunk of the text, and you also get to practice recognizing conjugations and other verb-ish things like adjectives derived from participles. And you get used to object pronoun suffixes, which are simple but hard to get used to since there is no English equivalent.
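If you take the scrape-and-translate route, lining the two texts up is the tedious part. Here is a minimal Python sketch of that step. It is my illustration, not the word-processor workflow described above. It assumes both texts were pasted with one instruction per line, and as noted that never quite holds, so the leftover rows still need a human eye.

```python
from itertools import zip_longest


def side_by_side(spanish: str, english: str, width: int = 40) -> list:
    """Pair Spanish and English text line by line for reading.

    Blank lines are dropped. If one side has fewer lines, its column is left
    empty, which is where the alignment has to be fixed by hand.
    """
    es_lines = [ln.strip() for ln in spanish.splitlines() if ln.strip()]
    en_lines = [ln.strip() for ln in english.splitlines() if ln.strip()]
    rows = []
    for es, en in zip_longest(es_lines, en_lines, fillvalue=""):
        rows.append(f"{es:<{width}} | {en}")
    return rows


if __name__ == "__main__":
    spanish = "Haz 3 cortes encima de cada pan.\nUnta con el huevo y el agua."
    english = "Make 3 slashes on the top of each bun.\nBrush with egg wash."
    for row in side_by_side(spanish, english):
        print(row)
```

A fixed-width left column keeps the pairs scannable in a plain-text window. A real document would need smarter sentence alignment than line-by-line pairing.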

Practice and more practice: for me, and I suspect for you, you just have to put in the time.

Finding verbs related to cooking -2

Well, after four intense days of crunching through recipes and writing new code, I'm finally done processing season 12 of Pati's Mexican Table. Naturally I'm not going to directly publish anything from her blog, since that is copyrighted, but I've done some derivative things, just for research to support my project and to present a few results to my readers. Please visit the source for the more interesting material.

I had planned to publish various preliminary results, but as of late today I have crunched through all 30 recipes (recetas) of season 12 (doce), looking for many things but right now focused on finding verbs. To do this I need both my manual effort and some code (especially a few new features) to find verbs and then convert them into a machine-readable format useful to my code (XML and internal objects). Doing all this, collecting raw data and converting it to internal data I can process for various kinds of analysis, lets me work on the question: which verbs are related to food and cooking?

So, the bottom line.

By extracting every verb used on Pati's site (plus a few from a Kindle book I have), I now have 275 verbs, some of which are very common and not directly related to food/cooking (but likely to be used in any kind of discussion about food and cooking). I'll comment more about those later (what they are, how much they are used), but first I have to write a bit more code, plus crunch a bit more of the corpus.

So, for now, I'll just address another issue, i.e., the frequency of various words in just the ingredients sections of all 30 recipes in Pati's season 12. Of course, this data (too large to include here entirely) is only a fraction of what one needs to know about the Spanish words used in the ingredients sections of recipes (or menus), but the most common words have probably shown up even in this relatively small sample.

I do a lot of work with each recipe, which means I don't just grab and process that kind of source material as fast as I can, and thus it will take a while to get a statistically sound sample of this type of information. I'll get there, but right now here are the results from this bit.

Remember, these words largely have a Mexican connection and thus may not apply to other Spanish-speaking (hispanohablante) regions.

Here is the answer, extracted from 30 different ingredients sections AND crunched a bit to consolidate related terms (which my code doesn't do automatically, at least for food, or even for all generic terms). Please see the comments below the data. The total (raw, not consolidated to lemmas, but eliminating some "glue" words of Spanish) is 312 words (many of which can be reduced to a lemma), an amazingly small set for 30 unique recipes. Since WordPress doesn't allow me to import these with formatting, I'll include an image of the list just below.

This list is mostly nouns and adjectives, with many of the other (not very food-related) words and/or verbs removed (a subject for a separate post). IOW, a recipe can be described with only about 10 words on average (312 ÷ 30 ≈ 10.4). As I look at more of Pati's site the aggregate number will increase, but, interestingly, I suspect the per-recipe number will drop slightly with more data.

Amazing, isn't it? You only need to know 10 words to read a recipe. Hard to believe.

But, of course, which words?

So, here's just a raw list that I consolidated by hand. I merged singulars and plurals for nouns and singulars, plurals, masculine, and feminine for adjectives, and I largely deleted verbs (even their participles, unless used as adjectives). Verbs, in any form, are relatively uncommon in the ingredients portion of a recipe. Each row gives the forms with their counts, followed by the combined total.

o (29) y (20) ni (4) u (1) = 55
taza (25) tazas (21) = 46
a (18) al (20) = 38
la (15) el (9) las (5) los (7) = 36
cucharadita (27) cucharaditas (7) = 34
de (30) del (3) = 33
sal = 29
sin (20) con (9) = 29
fresco (9) frescos (9) frescas (7) fresca (1) = 26
cortada (7) cortadas (7) cortados (7) cortado (4) = 25
g (9) kg (9) kilo (5) kilos (2) = 25
blanca (14) blanco (9) blancas (1) = 24
cucharadas (17) cucharada (7) = 24
gusto (19) guste (1) = 20
aceite = 19
cebolla (16) cebollas (2) = 18
habanero (14) habaneros (4) = 18
picada (14) picadas (4) = 18
chiles (9) chile (7) = 16
en = 16
para = 16
hojas (13) hoja (1) = 14
negra (12) negros (2) = 14
picados (8) picado (6) = 14
un (1) una (5) unas (4) uno (3) unos (1) = 14
vegetal = 14
maduro (7) maduro (6) = 13
más = 13
cilantro = 12
jugo (11) jugos (1) = 12
molida (2) molido (2) molidos (1) = 12
pimienta = 12
tallo (7) tallos (5) = 12
ajo = 11
limón = 11
pelada (3) peladas (3) pelados (3) pelado (2) = 11
dientes (1) diente (1) = 10
rebanadas (9) rebanado (1) = 10
agua = 9
aguacate (7) aguacates (1) = 8
frijol (4) frijoles (4) = 8
lima = 8
semillas (5) semilla (3) = 8
xcatic = 8
banana = 7
caldo = 7
güero (3) güeros (4) = 7
jitomates (5) jitomate (2) = 7
opcional = 7
pedazos = 7
pepitas (6) pepita (1) = 7
quebrada = 7
servir = 7
vinagre = 7
cm = 6
crudas (5) crudos (1) = 6
descongeladas (2) descongelados (4) = 6
finamente = 6
mitad = 6
naranja = 6
oliva = 6
pelar = 6
pollo = 6
queso = 6
salsa = 6
tortillas = 6
tostadas (5) tostados (1) = 6
como = 5
ensalada = 5
epazote = 5
gran (1) grandes (4) grande (2) = 5
harina = 5
muy = 5
pequeños (4) pequeña (1) = 5
rallado = 5
suficiente (4) suficientes (1) = 5
acelgas = 4
agria = 4
albahaca = 4
calientes = 4
cebollines = 4
chaya = 4
chícharos = 4
colado = 4
delgadas = 4
espinacas = 4
filete (3) filetes (1) = 4
huevos = 4
maíz = 4
mantequilla = 4
masa = 4
miel = 4
orégano = 4
ramitas = 4
seco (4) seca (1) = 4
si = 4
verduras = 4
adornar = 3
árbol = 3
arroz = 3
azúcar = 3
bastones = 3
cada = 3
crema = 3
dividida (1) divididas (2) = 3
dulce = 3
engrasar = 3
enjuagadas (2) enjuagados (1) = 3
fileteada = 3
finas (2) fino (1) = 3
gorda = 3
hacer = 3
jalapeño (1) jalapeños (2) = 3
laurel = 3
ligeramente = 3
manteca = 3
mezcla = 3
morada (3) moradas (1) = 3
parte (3) parte (2) = 3
plátano (2) plátanos (1) = 3
puedes = 3
separadas = 3
superior = 3
usar = 3
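The counting and hand consolidation behind a list like this could be sketched in Python as follows. This is an illustration under assumptions, not the code that produced the list. It assumes each count is the number of ingredients sections a form appears in, which would fit de topping out at 30 for 30 recipes. The stop-word set and the groups of related forms are supplied by hand, just as the consolidation above was done by hand.

```python
import re
from collections import Counter

# Lowercase Spanish letters, including accented vowels, ü and ñ.
WORD_RE = re.compile(r"[a-záéíóúüñ]+")


def recipe_frequency(sections, stopwords=frozenset()):
    """Count how many ingredients sections each word form appears in."""
    counts = Counter()
    for text in sections:
        forms = set(WORD_RE.findall(text.lower()))
        counts.update(forms - set(stopwords))
    return counts


def consolidate(counts, groups):
    """Merge hand-picked groups of related forms into one row with a total.

    Forms not listed in any group stay as single-form rows. Rows come back
    sorted by total, highest first.
    """
    rows, grouped = [], set()
    for forms in groups:
        present = [f for f in forms if f in counts]
        if not present:
            continue
        label = " ".join(f"{f} ({counts[f]})" for f in present)
        rows.append((label, sum(counts[f] for f in present)))
        grouped.update(present)
    rows.extend((f, n) for f, n in counts.items() if f not in grouped)
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


if __name__ == "__main__":
    sections = ["1 taza de arroz\nsal al gusto", "2 tazas de agua\nsal"]
    for label, total in consolidate(recipe_frequency(sections), [["taza", "tazas"]]):
        print(f"{label} = {total}")
```

Counting each form at most once per section keeps one long recipe from dominating. Grouping stays manual because deciding that picada and picados belong together is exactly the judgment the code doesn't make.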

Another way of looking at this list is: a) you should learn the common words unrelated to a specific dish, and, b) ingredients (i.e. mostly nouns) are relatively less common than adjectives that modify the nouns.

And this also poses a problem for language learning. The most common words, the ones you learn in almost any kind of language-learning approach, are relatively IRRELEVANT for actually understanding a recipe. Look at the first couple of entries in my table: critical words, but basically contentless.

So the first claim I'll make is: learn all the easy and common words in Spanish so you don't even have to think about them when looking at text. AND, when it comes to food/cooking, learn the few others that convey little information and/or are used commonly (but are relevant).

Now, 30 recipes from a single source is too small a sample for me to present any "conclusions", but as I do more work you can see the path I'm on.

And, again, learning the general bits of Spanish (for non-Spanish speakers), such as the common words I present on my page here at this blog, is critical so that you can ELIMINATE these words from what you have to think about while trying to read about food, cooking, or menus.

What is the point of the pages at this blog

You might wonder why I spend so much time on the lists that go into the pages (easier to find than the many posts). Well, it’s simple – like it or not, short of my magic (and non-existent) app on your phone, if you’re going to read a menu, in Spanish, anywhere, you’re going to have to know enough vocabulary (or spend hours doing dictionary lookups).

Over the years I've been engaged in this, I've tried just about everything people recommend for learning any language. I would say most people encourage a combination of immersion-style, conversation-oriented classes and just talking to Spanish speakers.

Perhaps. That can be a good way to learn, and maybe more fun than other ways, but, unless the class you’re taking and the people you’re talking to are involved in cooking and food you’re not going to learn very much to help you read menus.

I did a very good immersion class and did learn some conversational skills, but, at least in only 16 weeks of classes, most of what I actually learned was simple conversational stuff: useful, but not very helpful for reading menus.

And then there are the more conventional classes, either the stodgy ones in schools or the dumbed-down online stuff like Duolingo. These, IMHO, actually do a better job of teaching since (and, yes, I know it's a nasty word) they do find a way to introduce grammar instead of blindly parroting little phrases you hear (but don't understand) in immersion classes. For instance, when do you say cómo te llamas or cómo se llama (informal vs. formal "what's your name?"), and why? Using the wrong one will certainly label you as ignorant to native speakers, but in a few places it can even be insulting. And how stupid you'll look if you say yo estoy quince años, meaning "I am fifteen years old" (or should it be soy? Of course, it's neither ser nor estar; Spanish uses tener instead: tengo quince años).

Yes, this stuff matters.

And then there is what I've been doing for the past two years: lots of reading (esp. as I don't have access, at least in person, to Spanish speakers). Here you'll get the pitch for extensive reading (aka skimming: don't worry about words you don't know, just get the gist) vs. intensive reading. I've been doing both, and this exposes you to a much larger vocabulary and, with enough variety in your reading materials, also to grammar. For instance, the NYTimes Spanish section I read is really English articles translated by ??? (they don't say who; if a native speaker, then from where?), and even with my limited skill I can detect the difference between that Spanish and, say, a native writer from Spain, e.g. Javier Marías in Corazón tan blanco.

But when it comes to food, I’m sorry, there is no substitute. You just have to learn a fair amount of specialized vocabulary: the nouns for the ingredients, the adjectives as modifiers of those nouns, and, especially the verbs. And that is a large vocabulary and there are few good web resources (I know, I’ve searched for them, a lot, and mined them a lot, esp. in my early days of learning Spanish food terminology).

But there is one more layer, which in fact you'll need even in your native language, and that's culinary knowledge, esp. specialized terms that aren't even generic words. You don't think so? OK, fine: tell me what gazpacho is (or the much less well known salmorejo), or what paella is (or the much less known zarzuela), and then tell me whether you want to eat morcilla in Spain or menudo in Mexico.

Have fun crunching through your dictionary while the camarero (or is it camarera, or maybe mesero) waits for you to order. Or have fun asking them about the food choices with your phrasebook Spanish.

So, short of having my magic app, you’re going to have to do the work. And I can’t build that app, until I also know enough myself.

So, for the moment, the lists I create and try to organize on these pages are going to be your best help. Sure, there is stuff all over the net, BUT: a) the quality (aka accuracy) is often poor, b) the quantity is usually limited, and c) a list alone, without some actual learning tool (or a very smart app), doesn't do much for you.

So I'll continue doing the work (having now restarted and done an intensive burst in the last few days that will generate at least 8 more posts) to provide you with something. Meanwhile, learn Spanish to at least a B1 level and then get some cookbooks written in Spanish, ESPECIALLY for the cuisine you intend to eat. Remember, as I've already told you, tortilla is totally different in Spain and Mexico, so you can't even trust vocabulary if you don't know the regional differences in actual cooking.

OH, and did you notice that the cooking verbs page was just updated, for the first time in several years? If you scan it, you'll notice lots more is planned, and I really hope to get this thing done. Verbs help a lot, esp. as the adjectives are often derived very directly from them, via that pesky thing called grammar. I only know a fraction of the verbs I have on my page, but I can claim that, at my B1+ level, I can find the verb-y things in Spanish texts with about 97% accuracy.

Finding verbs related to cooking -1

I've been thinking about this post and a lot of preliminary information, but first I thought I should check what I've said before about the process (and my progress) of finding verbs. And I discovered that most of what I was going to say I've already said, here: Finding verbs related to cooking.

This is good since I’ve also spent a lot of the day actually working on that and so don’t have a lot of time for the post. So if, Dear Reader, you desire the background, go look at my post from nearly two years ago. And to myself, Dear Writer, good thing you checked instead of being so redundant.

In one of the other posts under the tag verbs, I also mention that I want to find recipes written for Spain, not just anything in Spanish, because the regional variation in food terms can be quite large. For instance, tortilla is very different in Spain than in Mexico (and the USA), and while chorizo is a sausage in both places, it is very different in Spain (which I've bought online) vs. Mexico (which I can buy in numerous mercados, i.e. markets, here in Omaha).

But now I'm going to violate that rule, for two reasons: 1) I have a large source of recipes with direct line-by-line equivalence in Spanish and English, and 2) I'm a big fan of Pati Jinich, who has numerous shows; the one with tons of recipes is Pati's Mexican Table. I watch Pati on TV a lot (her shows have become better and more professional, i.e. better funded, over the years), and both the content and Pati herself are quite entertaining as well as informative. But her language (English in the shows, though having been born in Mexico she's obviously fluent in Spanish) is much more Mexico-oriented than Spain-oriented.

This might have a big impact for some of the food vocabulary, but, for finding verbs, maybe not so much.

Now, as a brief digression, why do I care about this? 1) A very long time ago, early in the Internet days, I created a lexicon of Spanish food terms by just grabbing anything I could find. As I consolidated sources I discovered more and more disagreements (about the English meaning of a particular Spanish word) and eventually realized I was getting all these regional differences; IOW, I learned my lesson. Then, 2) because of the movie The Way, I actually thought I might walk the Camino, and so I wanted to be able to read menus and wisely pick my food at the small restaurants along the path. That started the project I've documented here.

But a source of dual-language recipes is hard to come by, and thus far I've never found one that originates in Spain and then has English translations. And I have enough experience (those silly 1134 NYTimes articles) comparing human and Google translations to know there are numerous issues with Google Translate, as I often pointed out in some of my earliest posts about reading menus with nothing but Google to help me.

So Pati’s site is too valuable despite its regional tilt.

And it's not too hard to mine. She has broken down the online recipes by season. I'd previously just, kinda, randomly looked at some recipes, but this time I extracted all 30 of the recipes from season 12 and worked out how to find their Spanish equivalents. That is not an entirely simple process: switching the website to its Spanish version gives the same content, but not directly lined up, so matching up the Spanish page for the 30 recipes of season 12 (las 30 recetas de la temporada 12) is not simple.

But after hours of work I did it and have 30 entries like this:

BTW: WordPress is now so obnoxious about not letting me format text the way I want I will mostly post screen clippings from my MSWord documents.

So I started crunching away on the first two recipes using a process I'd developed before: a 4-up table of the Spanish, the English, Google Translate of the Spanish, and Google Translate of the English, split between the Ingredients portion of the recipe and the Preparation instructions. After all the mechanical work of creating the tables in MSWord, I extract a lexicon.

and

In the lexicon, the terms in cyan and bold are those that are not found in the Duolingo Spanish lessons (at least as far as I got before Duo screwed up the Spanish course by completely reorganizing it, thus screwing up all their existing learners’ study plans). My nominal assumption (for myself) is that, therefore I “know” all the purple terms (mostly true, at least for this list).

But extracting words with my code (a text parser that generates a word list) doesn't find verbs. And that can be a bit tricky, especially if I expand my notion of 'verb' to include related words (like participles): to extract all the verb-ish words I have to do quite a bit of hand work. So one benefit of all my Spanish lessons is that, while I may not be able to read (meaning know the meaning of) every word in a text, I can mostly parse that text to find verbs. AND, I have the side-by-side human translations as well as the more literal Google translation if I get stuck (plus, in this exercise, I look up words in my dictionary every now and then).
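For readers curious what a "text parser that generates a word list" amounts to, here is a minimal Python sketch. It is not my C# code, and the exclusion set merely stands in for a hand-made list of non-Spanish words. It also shows why such a parser can't find verbs on its own: tapa comes out as just another word, whether it is the noun (a lid, or a bar snack) or a form of tapar.

```python
import re

# Spanish letters in either case, including accented vowels, ü and ñ.
TOKEN_RE = re.compile(r"[A-Za-zÁÉÍÓÚÜÑáéíóúüñ]+")


def word_list(text, exclude=()):
    """Return the sorted, unique, lowercased words in text, minus exclusions.

    Note: plain sorted() orders by code point, so accented letters are
    ordered after 'z' rather than alphabetically as in a Spanish dictionary.
    """
    words = {w.lower() for w in TOKEN_RE.findall(text)}
    return sorted(words - {e.lower() for e in exclude})


if __name__ == "__main__":
    print(word_list("Tapa la olla y cocina el arroz. Tápalo.", exclude={"Pati"}))
```

Nothing in the output marks tapa, cocina, or tápalo as verbs. Telling verbs apart is the hand work (or a later lookup against a verb dictionary) that the paragraph above describes.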

So for the first two recipes I tried (Sweet Lime Roasted Chicken with Broccolini/Pollo Rostizado con Lima y Brócoli; Honey Habanero Rice with Peas/Arroz con Miel, Habanero y Chícharos) I got this (consolidated from four separate parsing passes) list:

Once I had the raw list, I went back and did various edits: 1) struck through the object pronoun endings to get back to the bare conjugated form, 2) added infinitives for the verb-ish words in green (e.g. absorber is the infinitive behind the past participle absorbido, which is also an adjective), 3) inserted the infinitive for as many verb-ish forms as are present, e.g. tapar for its conjugated form tapa and for tápalo, the imperative with the object pronoun lo attached, and 4) deleted any words I had extracted as verb-ish that aren't.
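The first of those edits, stripping the object pronoun to get back to the conjugated form (tápalo → tapa), is mechanical enough to sketch. The Python below is a hypothetical helper, not part of my app. It peels one pronoun ending (or se plus one pronoun) off a word and removes the written accent that the ending forced. It only accepts the result if it is a verb form already known, because otherwise tallo would be "stripped" to tal. The suffix list is an assumption and covers common cases, not the full grammar. Combinations like melo are not handled.

```python
from typing import Optional

# Common enclitic pronoun endings, longest first so "selo" is tried before "lo".
ENCLITICS = ("selos", "selas", "selo", "sela", "nos", "los", "las", "les",
             "me", "te", "se", "lo", "la", "le", "os")

# Only acute accents are removed; ñ and ü are left alone.
UNACCENT = str.maketrans("áéíóú", "aeiou")


def strip_enclitics(word: str, known_forms: set) -> Optional[str]:
    """Return the bare verb form inside word, or None if none is recognized."""
    w = word.lower()
    if w in known_forms:
        return w
    for suffix in ENCLITICS:
        if w.endswith(suffix) and len(w) > len(suffix):
            base = w[: -len(suffix)]
            # Try the base as-is first, then without the accent that the
            # pronoun ending added (tápa|lo -> tapa).
            for candidate in (base, base.translate(UNACCENT)):
                if candidate in known_forms:
                    return candidate
    return None


if __name__ == "__main__":
    known = {"tapa", "cocer", "añade", "da"}
    for w in ["Tápalo", "cocerlo", "añádelo", "dáselo", "tallo"]:
        print(w, "->", strip_enclitics(w, known))
```

Requiring a known form trades recall for safety. A word it can't resolve comes back as None and goes to the hand-edit pile instead of being silently mangled.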

Then I reprocess the list and get this:

which is what I’m working on now.

A long time ago I wrote my own program to study Spanish verbs and do drills. I created a notation system for adding verbs into an XML file, then I read back and convert into C# classes: a base class verb and derived classes regverb and irregverb, which share some methods but, naturally, return different answers. I use spanishdict.com to get a complete definition (really translation to English) and I use conjugacion.es to get the conjugation, esp. when irregular. I have a semi-automated parser of the definition from spanishdict.com, and then have to hand-edit that to get what I use to create my XML, and I can parse the entire conjugation (when irregular) from conjugacion.es. My app looks something like this:
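The shape of that design, a base class with regular and irregular subclasses that answer the same question differently, can be sketched in Python. This is only an illustration: my real code is C#, and the XML element and attribute names below (regverb, irregverb, inf, en, present) are made up for the example, not my actual notation. Only the present tense is modeled.

```python
import xml.etree.ElementTree as ET

# Regular present-tense endings: yo, tú, él/ella/usted, nosotros, vosotros, ellos.
PRESENT_ENDINGS = {
    "ar": ["o", "as", "a", "amos", "áis", "an"],
    "er": ["o", "es", "e", "emos", "éis", "en"],
    "ir": ["o", "es", "e", "imos", "ís", "en"],
}


class Verb:
    """Base class: every verb has an infinitive and an English meaning."""

    def __init__(self, infinitive, meaning):
        self.infinitive = infinitive
        self.meaning = meaning

    def present(self):
        raise NotImplementedError


class RegVerb(Verb):
    """Regular verb: forms are computed from the stem plus standard endings."""

    def present(self):
        stem, ending = self.infinitive[:-2], self.infinitive[-2:]
        return [stem + e for e in PRESENT_ENDINGS[ending]]


class IrregVerb(Verb):
    """Irregular verb: forms are stored as given (e.g. copied from a conjugation site)."""

    def __init__(self, infinitive, meaning, present_forms):
        super().__init__(infinitive, meaning)
        self.present_forms = list(present_forms)

    def present(self):
        return list(self.present_forms)


def load_verbs(xml_text):
    """Read a (hypothetical) verb XML file into a dict of Verb objects."""
    verbs = {}
    for node in ET.fromstring(xml_text):
        inf, meaning = node.get("inf"), node.get("en", "")
        if node.tag == "irregverb":
            forms = node.findtext("present", "").split(",")
            verbs[inf] = IrregVerb(inf, meaning, [f.strip() for f in forms])
        else:
            verbs[inf] = RegVerb(inf, meaning)
    return verbs


SAMPLE_XML = """<verbs>
  <regverb inf="tapar" en="to cover"/>
  <irregverb inf="cocer" en="to cook; to boil">
    <present>cuezo, cueces, cuece, cocemos, cocéis, cuecen</present>
  </irregverb>
</verbs>"""

if __name__ == "__main__":
    for verb in load_verbs(SAMPLE_XML).values():
        print(verb.infinitive, verb.present())
```

Callers only ever ask a Verb for present(). Whether the answer is computed (tapar) or looked up (cocer) stays hidden in the subclass, which is the point of sharing a base class.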

So I choose Create a new dictionary, then Add regular or Add irregular (with the definition in the bottom text block and, for irregulars, the conjugation in the clipboard), then Save as, and thus get entries like this:

Here’s the definition in clearer text:

So back to this list (my work-in-progress verbs from the first two recipes)

I originally wrote this program (about four years ago) for an entirely different purpose. Over time my needs for it have evolved. Plus I discovered some flaws (mostly due to my lack of knowledge about Spanish, but also due to my source). So now I'm revisiting this. Just today, I started working on not-difficult but tedious code to extract conjugations from another source (and I'll probably need yet another, but I still haven't finished this one).

Now what is the point of this?

And, btw, I believe I'm within the fair use concept of copyright for the material I directly take from other websites (if not explicitly copyrighted, it certainly is implicitly, esp. for mass extractions), since I only publish snippets, like the definition of cocer, which looks like this at spanishdict.com:

I will use this program, esp. when I expand it to handle a few other parts of speech, e.g. nouns and adjectives (this just needs new classes, plus a new base class to give two levels of inheritance), to: 1) do drills so I actually learn these words, not just grind through a process of collecting them; 2a) find all occurrences (forms) of verbs in a text (infinitives, participles, conjugated forms, pronoun-suffixed forms); and 2b) check a lexicon extracted from a text (or a manually extracted verb list) to determine which verbs (or other POS) I have already put in my dictionary, so I can just work on anything new.
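Use 2b boils down to a lookup from every form back to its infinitive (lemma). Here is a minimal Python sketch. It assumes the dictionary can be flattened to lemma → list of known forms, and the toy entries are illustrations, not my data.

```python
def build_form_index(dictionary):
    """Map every form, and each lemma itself, back to its lemma."""
    index = {}
    for lemma, forms in dictionary.items():
        index[lemma] = lemma
        for form in forms:
            # If two lemmas share a form, keep the first one seen.
            index.setdefault(form, lemma)
    return index


def split_known(lexicon, form_index):
    """Split a word list into words already in the dictionary and new ones."""
    known, new = {}, []
    for word in lexicon:
        lemma = form_index.get(word.lower())
        if lemma is None:
            new.append(word)
        else:
            known[word] = lemma
    return known, new


if __name__ == "__main__":
    dictionary = {"tapar": ["tapa", "tapado", "tapada"], "cocer": ["cuece", "cocido"]}
    known, new = split_known(["tapa", "cocido", "hornea"], build_form_index(dictionary))
    print(known, new)
```

Building the index once makes each lookup a single dictionary access, so checking a whole recipe's lexicon stays cheap. The "new" list is what's left to add by hand.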

Now all the work I’ve done with the massive set of NYTimes articles has taught me I find all the common stuff pretty quickly and then the less common stuff over a very long period of time. I’m sure recipes are going to follow that pattern as well, so while today I was starting from scratch (and now have a backlog of verbs to add to my dictionary) in the future each new recipe will only provide a few more words.

BUT, I do have to do the work to get the answer. Which is, create the biggest and best list of Spanish verbs related to food, cooking, menus and restaurants the world has ever seen. AND, do the work to get it all installed here thus making this the best source on the Internet.

And then, maybe, someday, get back to my original project of writing a smartphone app to translate a menu, even hand-written from a blackboard, in Spain.

But, today, I have more code to write and more words to process.

And then I have some new material for yet more posts.

So stay tuned for more exciting stuff.

What have I been doing

It’s been about 6 months since my last post so I thought maybe I should do a post so WordPress won’t think I’ve died and delete my blog. Plus, to my surprise, in 2023, despite only doing a few posts I had the most visits, so either WordPress has changed the way they collect statistics or people are looking here. In my early days I worked very hard to try to write interesting and useful posts, for people learning Spanish and interested in eating in restaurants with Spanish menus. But I learned that’s a small group, either people know Spanish and can read the menus (and thus don’t need my help) or they just stumble through with their native language. But it’s still an interesting subject to me.

Also, in my earliest posts I was convinced I could create an app that could translate the menus of restaurants in Spain (and I focused on the Camino de Santiago, as that was where I wanted to go) even without learning the language. But along the way I actually decided to try to learn Spanish, first with Duolingo, later with some classes (converted to Zoom during Covid), and more recently by intensively reading the Spanish-language section of the New York Times, where I could compare human translations with Google translations. This activity turned into an interest in a different app (one using side-by-side Spanish and English versions of the same article) as a learning tool. Doing this required having my own machine-readable dictionary (translations) and some classes to install in the reading/drill app.

And somehow then that began to consume all my available time. And for various other reasons I also haven’t posted much.

So here’s what I’ve accomplished in almost three years.

I have 1144 stories from the NYTimes converted to my internal format (MSWord files I feed to my various code). I don't have the precise number, but these represent a corpus of about 2 million words. For each story I manually create a list of non-Spanish words (to exclude from my lexicon), mostly proper nouns or words from another language. Then, for each story, I generate a list of all the unique words. (Note: 'word' is a messy concept and doesn't always map cleanly; for example, multiple 'words' in Spanish can really mean just one word in English, e.g. sin embargo (however).) Also, a 'word' really is one 'form' (aka inflection) of a base word (or lemma, what you look up in a dictionary), so, for example, in Spanish there are (based on my stats) about 3.7 'forms' (or words, as I'm using that term here) per adjective, and about 11 'forms' (actually found in the text) per verb.
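The "forms per lemma" statistic is just the number of distinct forms found divided by the number of lemmas, per part of speech. Here is a minimal Python sketch with made-up toy entries. The real figures of about 3.7 and 11 come from my corpus, not from this code.

```python
from collections import defaultdict


def forms_per_lemma(entries):
    """Average number of distinct forms seen per lemma, by part of speech.

    entries: iterable of (lemma, part_of_speech, forms_found_in_text).
    """
    lemma_count = defaultdict(int)
    form_count = defaultdict(int)
    for lemma, pos, forms in entries:
        lemma_count[pos] += 1
        form_count[pos] += len(set(forms))  # repeated forms count once
    return {pos: form_count[pos] / lemma_count[pos] for pos in lemma_count}


if __name__ == "__main__":
    entries = [
        ("blanco", "adj", ["blanco", "blanca", "blancos", "blancas"]),
        ("fresco", "adj", ["fresco", "fresca"]),
        ("tapar", "verb", ["tapa", "tapar", "tápalo"]),
    ]
    print(forms_per_lemma(entries))
```

Only forms actually found in the text are counted, which is why a verb's number depends on the corpus rather than on the dozens of forms a full conjugation table would give.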

So, with all that as context, when I load all the unique words from each of the 1144 stories I get 711,420 total words (IOW, about 1/3rd of the raw word count), which represent 52,927 unique 'words'. I've struggled to find out how many words there might be in the Spanish language (say the "official" version, which excludes a lot from the western hemisphere). The official source has 99,000 lemmas in its dictionary, so my best guess is that this represents about 400,000 word forms, give or take 50,000 or so.
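The three numbers in that paragraph measure different things. The first is raw words. The second is the sum of each story's unique words, where a word repeated across stories is counted once per story. The third is unique words across the whole corpus. A small Python sketch makes the distinction concrete. It assumes each story is already tokenized into a list of words, and the toy stories are illustrations only.

```python
def corpus_counts(stories):
    """Return (raw words, sum of per-story unique words, corpus-wide unique words)."""
    per_story = [{w.lower() for w in story} for story in stories]
    raw = sum(len(story) for story in stories)
    summed_unique = sum(len(words) for words in per_story)
    overall_unique = len(set().union(*per_story))
    return raw, summed_unique, overall_unique


if __name__ == "__main__":
    stories = [["La", "sal", "la"], ["la", "olla"]]
    print(corpus_counts(stories))
```

With the figures above, 711,420 summed unique words out of roughly 2 million raw words is about 36%, in line with "about 1/3rd". The drop to 52,927 is the overlap between stories.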

Now I have created two dictionaries (and C# classes to provide a variety of dictionary methods). I use all sorts of things to pick which of those 52,927 words to put in my dictionaries (with my own notation that handles various forms of inflections in Spanish, so one of my dictionary entries might represent up to 10 ‘words’). That has been a lot of work and so I still have 11390 words (usually less frequently used) to add. Currently I find about 20 more words per day, from about the 1.4 articles/day that I extract from the NYTimes.
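My actual notation isn't shown in this post, so the following is purely hypothetical. It illustrates how one dictionary entry can stand for several 'words'. The made-up format is a stem, a bar, and comma-separated endings, and the Python sketch expands such an entry into its forms.

```python
def expand_entry(entry):
    """Expand a hypothetical 'stem|ending,ending,...' entry into its word forms.

    An entry without '|' stands for a single word.
    """
    if "|" not in entry:
        return [entry]
    stem, endings = entry.split("|", 1)
    return [stem + ending.strip() for ending in endings.split(",")]


if __name__ == "__main__":
    print(expand_entry("fresc|o,a,os,as"))
    print(expand_entry("sal"))
```

Storing the stem once and listing endings keeps an adjective's four forms (or a verb's many) on one line, which is how a few thousand entries can cover several times as many forms.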

I first “invented” my dictionary (its notation, plus some C# classes) when I was studying with Duolingo, so I could have my own drill (glorified flash card) system. The words introduced by Duolingo were mostly the most common words in Spanish, so that dictionary covers a large share of the running words in any corpus. But not very much of the Duolingo lexicon covers food. So that dictionary, with its 5641 entries (and 14165 forms), represents only a small fraction of the vocabulary needed for menus, or more broadly for recetas (recipes). Reading recetas in Spanish is a good way to then understand menus.

When I started doing the NYTimes articles, I expanded my dictionary notation system (I understood some parts of Spanish better, given my classes, such as object pronouns appended to verbs) and the C# code, so now my NYTimes dictionary is a bit more robust (I’ll eventually have some samples at the bottom of this post). So with considerably more effort I now have 20429 entries, representing 60069 forms.

While accumulating all this data, I did not mark which entries have even the slightest connection to food, cooking or menus, so it’s quite a chore to get any statistics on that. But that is the actual point of this post: how much of the several hours per day I put into all this actually gives me any information about food/cooking/restaurant vocabulary. And, of course, not much, especially as very few of the NYTimes articles are about this subject. A few are, and sometimes food gets a casual mention in an article on some completely unrelated subject.

So with a quick scan, mostly ignoring verbs, I’ve found 62 words, since January 2024, that have a fairly strong connection to food/cooking. That’s less than one per day AND, of course, just finding words and adding them to my dictionaries doesn’t mean I’ve actually memorized and learned these words and thus would recognize them on menus! But it’s a start.

So, once again, as I’ve commented before, does learning the language (through various techniques) actually help very much with reading menus and selecting food in a restaurant? Someone once argued to me that reading menus would be impossible without actually learning Spanish. But I was fairly convinced I could build my translation app just by extracting words and Google translations from sample menus. After about five years, I believe it’s more true than not that it is possible to build a menu translator without learning Spanish. As a corollary, learning Spanish in a general way (not specifically focused on food) is only a bit of help. Unless one achieves almost complete fluency (I’m just barely at the B1 level, which is way too little), learning the language itself is actually less effective than just studying menus. Better still is studying recetas, since ingredients and preparation are the real core of understanding menus.

Now, on the issue of verbs: I started a page at this blog with food/cooking/restaurant related verbs but unfortunately never really finished it. It contains 42 verbs with definitions, 297 ToDo verbs (likely food verbs, not defined on that page) and 15 “common” verbs that are likely to appear somehow in food-related text. Recrunching this a little yields 322 infinitives that (someday) I’ll have well defined in this blog. (Btw, I believe that will then be the largest list of these verbs anywhere on the Internet, at least that Google searches can find.)

So one thing my various programs (and accumulated data) can do is indicate how much of this list is covered by the NYTimes corpus I’ve accumulated. And the answer is: 130 of the verbs on my cooking list do NOT appear in the NYTimes corpus, and a quick eyeball scan tells me the missing verbs are very much cooking related. IOW, only about 60% (192 of 322) of useful cooking verbs could have been learned just by reading (and learning) from news articles.
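The check itself is mostly set arithmetic, with one wrinkle. A verb counts as “appearing” if any of its forms does, so the corpus words first have to be mapped back to infinitives. Here is a minimal sketch. It assumes a form-to-infinitive map (the hypothetical `lemma_of` below), presumably built from dictionary entries. With the real numbers, the coverage reduces to 1 - 130/322 = 192/322 ≈ 0.596.

```python
from collections.abc import Iterable, Mapping, Sequence

def missing_verbs(infinitives: Iterable[str], corpus_forms: Iterable[str],
                  lemma_of: Mapping[str, str]) -> list[str]:
    """Infinitives for which no conjugated form at all occurs in the corpus."""
    seen = {lemma_of[form] for form in corpus_forms if form in lemma_of}
    return sorted(set(infinitives) - seen)

def coverage(infinitives: Sequence[str], missing: Sequence[str]) -> float:
    """Fraction of the verb list that could have been learned from the corpus."""
    return 1 - len(missing) / len(infinitives)

# Hypothetical toy data; a real form-to-lemma map would come from the dictionary entries.
verbs = ["amasar", "estofar", "hornear"]
forms_in_corpus = {"amasa", "horneado", "casa"}
lemma_of = {"amasa": "amasar", "horneado": "hornear", "estofan": "estofar"}
missing = missing_verbs(verbs, forms_in_corpus, lemma_of)
print(missing)  # ['estofar']
```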

One benefit of actually learning Spanish, however, is understanding how verbs (a lot of what one has to study in learning any language) connect to other words. For instance, many past participles of verbs are also adjectives (with four forms, for gender and number). One example is horneado (baked; also horneada, horneados, horneadas), which is derived from hornear (to bake). Also, nouns are often closely related to one of the conjugated forms of verbs (horneado is also a noun, ‘baking’). So, while learning verbs is very tedious, especially those related to cooking, it is useful. The best way to find these verbs is crunching through lots of recetas, especially dual-language ones. My favorite site is Pati’s Mexican Table, where I’ve extracted many recipes in both Spanish and English (the translations don’t match line by line, however).
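The participle-to-adjective pattern is regular enough to generate mechanically for most verbs. Here is a minimal sketch, assuming regular -ar/-er/-ir verbs plus a tiny hypothetical exception table. `IRREGULAR` lists only a few well-known irregular participles; a real table would be much longer.

```python
# A few well-known irregular participles, stored as the stem before o/a/os/as.
# Hypothetical and deliberately tiny; a real table would be much longer.
IRREGULAR = {"hacer": "hech", "poner": "puest", "cubrir": "cubiert", "freír": "frit"}

def participle_adjectives(infinitive: str) -> list[str]:
    """Masculine/feminine, singular/plural forms of a past participle used as an adjective."""
    if infinitive in IRREGULAR:
        base = IRREGULAR[infinitive]
    else:
        stem, ending = infinitive[:-2], infinitive[-2:]
        if ending == "ar":
            base = stem + "ad"  # hornear -> hornead-
        elif ending in ("er", "ir", "ír"):
            # Written accent when the stem ends in a, e or o: leer -> leíd-, traer -> traíd-
            base = stem + ("íd" if stem[-1:] in ("a", "e", "o") else "id")
        else:
            raise ValueError(f"not an infinitive: {infinitive!r}")
    return [base + suffix for suffix in ("o", "a", "os", "as")]

print(participle_adjectives("hornear"))
# ['horneado', 'horneada', 'horneados', 'horneadas']
```

The written-accent rule (leído and traído, but construido) is the main trap for regular-looking verbs. Freír also has a regular participle, freído, alongside frito.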

So, after five years, here’s my answer to what one has to do to create the data and the code for a menu translation app: 1) read and analyze lots of menus, being careful about regional variations in the Spanish (e.g. tortilla is a very different thing in Spain than in Mexico), 2) get a working knowledge of Spanish (medium Duolingo level), and 3) then read as many cookbooks in Spanish as you can (with Google Translate, spanishdict.com and Wiktionary to help) to build a vocabulary.

So, wrapping up, here is the list of words I found (and got definitions for) in the NYTimes articles during the first 100 days of 2024. I will leave it as an exercise for you, Dear Reader, to deduce my notation system:

AFILAR (afilo, afilas, afila, afilamos, , afilan) to sharpen (present)
alita[s] fnoun: chicken wing
almazara[s] fnoun: oil mill, oil press; olive-oil mill
almeja[s] fnoun: clam
AMASAR (amaso, amasas, amasa, amasamos, , amasan) to knead, to amass (present)
arepa[s] fnoun: arepa, corncake <LatAm>
balancead{o|a}[s] adj: balanced
bebedero[s] mnoun: drinking trough; watering hole; water dispenser; water bowl; water fountain, drinking fountain; spout
brasa[s] fnoun: ember, hot coal
cabracho[s] mnoun: red scorpionfish
calcinad{o|a}[s] adj: charred, burned
canela[s] fnoun: cinnamon
cervecería[s] fnoun: brewery; bar, beer hall
ciruela[s] fnoun: plum
col[es] fnoun: cabbage
cucharadita[s] fnoun: teaspoon, teaspoonful
cucharilla[s] fnoun: teaspoon; spoon
curad{o|a}[s] adj: cured; tanned; drunk
diluid{o|a}[s] adj: diluted, dissolved; weak
edulcorante[s] mnoun: sweetener
EMPAPAR (empapo, empapas, empapa, empapamos, , empapan) to soak (present)
eneldo[s] mnoun: dill
enfriado[s] mnoun: cooling, chilling
ennegreci{endo|do|dos|da|das} pp: to blacken (ennegrecer)
equino[s] mnoun: horse; sea urchin
erizo[s] mnoun: porcupine fish; sea urchin; burr
espárrago[s] mnoun: asparagus; stud
ESTOFAR (estofo, estofas, estofa, estofamos, , estofan) to stew (present)
filete[s] mnoun: filet, steak; thread
frambuesa[s] fnoun: raspberry
ganadería[s] fnoun: stockbreeding, ranching, livestock farming; cattle, livestock; livestock farm, ranch
glasea{ndo|do|dos|da|das} pp: to glaze (glasear)
glasead{o|a}[s] adj: glazed; glacé
habanero[s] mnoun: habanero <Mex>
herbáce{o|a}[s] adj: herbaceous
hielera[s] fnoun: cooler; ice bucket <LatAm>
hogaza[s] fnoun: large loaf, large round loaf
hornead{o|a}[s] adj: baked
horneado[s] mnoun: cooking time, baking time; baking
horner{o|a}[s] mfnoun: baker
infusi{ón|ones} fnoun: infusion, tea
ingesti{ón|ones} fnoun: ingestion, consumption
levadura[s] fnoun: yeast, leaven
licorería[s] fnoun: liquor store, package store; distillery; liquor industry; liquor
MASTICAR (mastico, masticas, mastica, masticamos, , mastican) to chew (present)
mazorca[s] fnoun: ear; cob, corncob, ear of corn
molino[s] mnoun: grinder; mill
molusco[s] mnoun: mollusk
néctar[es] mnoun: nectar
nopal[es] mnoun: prickly pear
NUTRIR (nutro, nutres, nutre, nutrimos, , nutren) to nourish (present)
probadita[s] fnoun: little bite, little sip
quemarse se: to burn oneself
quinua[s] fnoun: quinoa <Andes; Argentina>
rábano[s] mnoun: radish
roble[s] mnoun: oak; oak tree
rollito[s] mnoun: roll <CenAm; Mexico; SoCone>
suizo[s] mnoun: sugared bun, brioche <ESP>
tajada[s] fnoun: slice, cut; share; slice of fried plantain
ternera[s] fnoun: veal, beef; calf, heifer
tilapia[s] fnoun: tilapia
toro[s] mnoun: bull
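Since the notation is left as an exercise, what follows is only my guess at it, inferred from the entries above:

- Braces pick exactly one alternative.
- Square brackets mark an optional suffix.
- Verb lines list the present-tense forms from yo to ellos, with the vosotros slot left empty.

Here is a hypothetical Python expander under those assumptions. The author’s real implementation is in C# and presumably handles more cases, and the names `expand_forms` and `parse_verb` are mine.

```python
import re
from itertools import product

# Guessed from the entries above (an assumption, not a documented spec):
#   {a|b|c} -> exactly one of the alternatives
#   [x]     -> optional suffix (absent or present)
#   text    -> literal
TOKEN = re.compile(r"\{([^}]*)\}|\[([^\]]*)\]|([^{\[]+)")

def expand_forms(pattern: str) -> list[str]:
    """Expand a dictionary pattern such as 'balancead{o|a}[s]' into word forms."""
    choices: list[list[str]] = []
    for m in TOKEN.finditer(pattern):
        if m.group(1) is not None:
            choices.append(m.group(1).split("|"))
        elif m.group(2) is not None:
            choices.append(["", m.group(2)])
        else:
            choices.append([m.group(3)])
    return ["".join(parts) for parts in product(*choices)]

# Verb lines list present-tense forms yo..ellos; an empty slot (vosotros) is skipped.
PERSONS = ("yo", "tú", "él/ella/usted", "nosotros", "vosotros", "ellos/ellas/ustedes")
VERB = re.compile(r"([A-ZÁÉÍÓÚÑ]+) ?\(([^)]*)\)")

def parse_verb(line: str) -> tuple[str, dict[str, str]]:
    """Split a line like 'AFILAR (afilo, ..., , afilan)' into infinitive and person->form."""
    m = VERB.match(line)
    if m is None:
        raise ValueError(f"not a verb entry: {line!r}")
    forms = [f.strip() for f in m.group(2).split(",")]
    return m.group(1).lower(), {p: f for p, f in zip(PERSONS, forms) if f}

print(expand_forms("balancead{o|a}[s]"))
# ['balanceado', 'balanceados', 'balanceada', 'balanceadas']
print(expand_forms("infusi{ón|ones}"))  # ['infusión', 'infusiones']
```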

What is esté? And it’s not a typo for este.

esta or está is bad enough, but what is this é thing in this case?

Needless to say, estar is one of the most heavily used verbs in cooking (or dining), so seeing it in many conjugations is likely. But for SSL (Spanish as a Second Language) speakers, dealing with moods, and thus the subjunctive, isn’t very obvious, since English has few examples of it.

So let’s look at a few examples, with human translations:

y esté algo firme | it is … and somewhat firm

Note: the sentence also contains no tenga grumos, which is translated as ‘it is smooth’ but literally means ‘it doesn’t have lumps’. Here tener ends up conjugated as tenga, another example of the subjunctive.

Cuando la masa esté lista | When the dough is ready
hasta que esté cocido | until completely cooked
Cuando esté caliente | Once hot

Plural Version

Cuando los frijoles estén completamente suave | Once the beans have completely softened
hasta que estén completamente molidos | until completely smooth

These examples are just pulled from a couple of recipes (in both English and Spanish) at Pati’s Mexican Table. According to spanishdict.com this is when the subjunctive is used:

The Spanish present subjunctive is used to talk about situations of doubt, desire, emotion, necessity, or uncertainty.

In this case we can probably say it is ‘uncertainty’ (not hot yet, but it will be sometime, and that’s what we’re waiting for). It hasn’t happened yet (soft, smooth, hot), and therefore we can’t use the indicative (a statement of fact or certainty).

Note also that the human translation is often less literal (Once hot instead of the literal when it is hot).

So recipes (recetas) are certainly (see, this one is indicative) a place to encounter the subjunctive.

And, in case you forgot: está is the third person singular (he/she/formal you/it) present indicative form of estar, and esta is ‘this’ for feminine singular.
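For a translator, at least, these look-alikes are easy to tell apart mechanically, since the accents are part of the spelling. Here is a minimal sketch using plain lookup tables; the tag strings and function names are hypothetical. It only covers the present tense of estar plus the demonstratives, and it says nothing about why the subjunctive is used.

```python
import re
from typing import Optional

# Present-tense forms of estar; esté serves both yo and él/ella/usted.
ESTAR_INDICATIVE = {"estoy", "estás", "está", "estamos", "estáis", "están"}
ESTAR_SUBJUNCTIVE = {"esté", "estés", "estemos", "estéis", "estén"}
DEMONSTRATIVES = {"este", "esta", "esto", "estos", "estas"}  # this / these

WORD = re.compile(r"[^\W\d_]+")  # letter runs, accents included

def classify(word: str) -> Optional[str]:
    """Tag one word as a form of estar or a demonstrative; None if neither."""
    w = word.lower()
    if w in ESTAR_SUBJUNCTIVE:
        return "estar: present subjunctive"
    if w in ESTAR_INDICATIVE:
        return "estar: present indicative"
    if w in DEMONSTRATIVES:
        return "demonstrative"
    return None

def tag_sentence(sentence: str) -> list[tuple[str, str]]:
    """Return (word, tag) pairs for the words classify() recognizes."""
    return [(w, tag) for w in WORD.findall(sentence) if (tag := classify(w))]

print(tag_sentence("Cuando la masa esté lista"))
# [('esté', 'estar: present subjunctive')]
```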

A little blog note

There is no way (I think) for you, Dear Reader, to notice that I sometimes add comments to my own posts, esp. the older ones. I started all this seven years ago and a lot has happened since then, including actually doing a lot of studying of the Spanish language. So now I can find mistakes in my earlier posts which indicates I’ve learned something in seven years.

But I’ve always seen this blog as primarily for me, to document my work. Thus returning to earlier posts, which I doubt any of you would ever do, is part of the process. This blog, unlike my other secret one, is almost entirely focused on one subject (instead of rambling about lots of things as I do in the other one). I’ve been advised that this generates more interest: potential readers can know in advance what the posts might be about.

It’s possible this is true, but at the same time the single focus of this blog (and my associated and never-ending project) is awfully limited in interest. Most people who are fluent in Spanish would find little they don’t already know, and might sometimes laugh at how simple my observations are. They might still learn something here, especially if they’re from the Western Hemisphere and not familiar with Spain. Conversely, people who don’t know Spanish probably don’t want to go through all the work; a simple cheat sheet (or just assuming everyone will speak English) is probably sufficient.

So for this topic readership is modest, but despite that it does serve my purpose. Believing someone might read the posts forces some discipline on me to write them a little better (better doesn’t mean briefly). So I’ll reread my old posts, even if no one else ever does.

And, btw, seven years later Google Translate is no better at menus in Spain than when I first started. I’ve also tried a couple of the new AI chatbots, which do fairly well at translating generic Spanish but not so well on food in the Spanish of Spain.

Exploring the new sources

As I mentioned in this previous post I actually found a lot more than I expected while looking for some new restaurant menus to examine, so now I’ll continue here with one of those.

Ready to spend some serious cash (muy mucho dinero)? Well, here’s your chance. For the mere price of

€300 per person, just food

you get food that looks like this (below). I’ll try to muddle through its menu to see what Spanish-to-English word pairs I can find. BTW, this is a menú, not a carta, which is what most estadounidenses (Americans) think “menu” means. A menú is a whole set of individual items already picked out for you.

Now actually I was a bit off on the price when I found them selling this gift card:

Of course I’d want the wine pairing (who wants a bad wine pairing)! And I’ve always got $1320 lying around (plus a plane ticket to Spain) ready to go. This place is some serious “good eats”. It even made the French Laundry in Napa look reasonable.

If you want a lot more beautiful pictures, go to Google Maps and first search for: Larrabetzu, Vizcaya, and then search for: Azurmendi and you should get something like:

and then scroll down to find this link for the menu. A little further down are just insane photos of this exquisite place. It’s not exactly the standard Camino fare, but being just east of Bilbao, it’s bound to draw the art crowd instead.

Fortunately this website has a well-translated multilingual menu, and this is another chance to compare a human translation into English with Google translations (and maybe dictionary lookups and searches).

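NOTE: To save space I use these abbreviations: GT (Google Translate), SD (spanishdict.com) and DLE (the official Spanish dictionary). I also (mostly) embedded comments (in this color) within my usual four-column side-by-side layout: the original Spanish, the website’s human English translation, GT’s English for the Spanish, and GT’s Spanish for the human English.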
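With four columns per row, the interesting rows are the ones where GT and the human translator disagree. Here is a minimal sketch for flagging them. It assumes the rows have already been extracted into a small record type; the `MenuRow` fields and function names are hypothetical. It only compares strings, so it can’t tell a real mistranslation from a harmless paraphrase.

```python
import re
from dataclasses import dataclass

@dataclass
class MenuRow:
    spanish: str   # original menu text
    human_en: str  # the website's own English
    gt_en: str     # Google Translate of the Spanish
    gt_es: str     # Google Translate of the human English, back to Spanish

def normalize(text: str) -> str:
    """Lowercase and drop punctuation/quotes so trivial differences don't count."""
    return " ".join(re.findall(r"[^\W\d_]+", text.lower()))

def disagreements(rows: list[MenuRow]) -> list[MenuRow]:
    """Rows where GT's English differs from the human English."""
    return [r for r in rows if normalize(r.human_en) != normalize(r.gt_en)]

rows = [
    MenuRow("Las ramas", "The branches", "The branches", "Las ramas"),
    MenuRow("Limón Grass", "“Lemon Grass”", "Lemon Grass", "“La hierba de limón”"),
    MenuRow("Brioche de salazones", "Smoked fish brioche", "salted brioche",
            "Brioche de pescado ahumado"),
]
for r in disagreements(rows):
    print(r.spanish, "|", r.human_en, "|", r.gt_en)
# Brioche de salazones | Smoked fish brioche | salted brioche
```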

Now the beautifully done website starts the menu like this, with all four versions:

El restaurante Azurmendi*** te propone una experiencia gastronómica única: | Azurmendi*** restaurant offers you an unique gastronomic experience: | The Azurmendi*** restaurant offers you a unique gastronomic experience: | Restaurante Azurmendi*** te ofrece una experiencia gastronómica única:

it took me a while to realize this is how they’re pointing out their three Michelin stars

El menú que se muestra a continuación puede estar sujeto a pequeñas modificaciones de última hora. | The menu shown below is subject to minor last-minute changes. | The menu shown below may be subject to small last-minute changes. | El menú que se muestra a continuación está sujeto a cambios menores de última hora.
Adarrak | Adarrak | adarrak | Adarrak

This word stumped me for quite a while until I found this chef’s restaurant in Lisbon with the explanation (see the previous post). This is Basque for “branches”, the name of his newer menu after Erroak (“Roots”).

Las ramas | The branches | The branches | Las ramas
Las últimas creaciones de Eneko Atxa adentran al comensal en una experiencia estacional para los cinco sentidos. | The latest plates of Eneko Atxa engage the diner into a seasonal experience for the five senses. | The latest creations by Eneko Atxa take the diner into a seasonal experience for all five senses. | Los últimos platos de Eneko Atxa sumergen al comensal en una experiencia de temporada para los cinco sentidos.

So, as I usually do, I started looking through the menu sequentially; this is the top part. It’s hard to count exactly how many courses there are in this tasting menu, but it’s a lot.

~ PICNIC DE BIENVENIDA ~ | ~WELCOME PICNIC~ | ~ WELCOME PICNIC ~ | ~PICNIC DE BIENVENIDA~
Limón Grass | “Lemon Grass” | Lemon Grass | “La hierba de limón”
Brioche de salazones | Smoked fish brioche | salted brioche | Brioche de pescado ahumado

SD has: salazones (sing.: salazón) as salted fish, just like GT thinks. DLE says:

2. f. Carne o pescado conservados en salazón (Salted meat or fish).

so GT is correct (first definition in DLE) but not the right choice here. DLE says:

1. Acción de salar un alimento, como carne o pescado, para su conservación (Action of salting a food, such as meat or fish, for its conservation).

Polvorón Joselito | Joselito “polvorón” | Polvoron Joselito | Joselito “polvorón”

There is no direct English for this, but a polvorón is a type of shortbread cookie, presumably made by whoever Joselito is.

Marianito | Marianito | Marianito | Marianito

No wonder there is no translation:
“Marianito cocktail is a vermouth based drink which is mainly served in Northern Spain, particularly in the Basque Country, La Rioja and Burgos.”

So I began to see my dilemma in studying this menu. The food is so exquisite and upscale that the menu is not going to contain many ordinary Spanish food or cooking terms. At the very least, it won’t contain the kinds of foods one will find in the usual mom-and-pop places along the Camino, whose Menú del Día is about 1/20th the price of this place.

So I went down a different path: generate a lexicon of all the terms in the menu and pick out some that look interesting. If you’d like to see the whole menu, I suggest going to the site, where you can also find this gorgeous video describing their food and dining experience.
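A lexer for this doesn’t need to be fancy: split the text into words, lowercase them, count, and sort by frequency. Here is a minimal sketch of the idea. It assumes the menu has been pasted in as plain text and that the “irrelevant words” are dropped via a hand-made ignore set. The `ignore` set, the sample menu line and the function name are all hypothetical, not the author’s actual code.

```python
import re
from collections import Counter

WORD = re.compile(r"[^\W\d_]+")  # letter runs, accents included

def menu_lexicon(menu_text: str, ignore: set[str]) -> list[tuple[str, int]]:
    """(term, count) pairs, most frequent first, ties broken alphabetically."""
    counts = Counter(
        w for w in (t.lower() for t in WORD.findall(menu_text)) if w not in ignore
    )
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

menu = "Trufa negra, jugo de trufa y esencia de trufa. Jugo de manzana."
print(menu_lexicon(menu, ignore={"de", "y"}))
# [('trufa', 3), ('jugo', 2), ('esencia', 1), ('manzana', 1), ('negra', 1)]
```

Python’s plain string sort compares code points, so a word starting with an accented letter (árbol) would land after z. A properly alphabetized Spanish list would need a collation step.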

I had planned two tables with some comments interspersed, but WordPress can’t handle tables this complicated (easy in MSWord). So I’ll only show the edited table, after discarding irrelevant words produced by the lexer. The Spanish term directly from the menu is on the left, the number is its frequency of occurrence in the menu, and the Google translation is on the right. Most of these are fairly ordinary Spanish food words. My comments are on the English side, in this color:

trufa | 3 | truffle
agua | 2 | water
esencia | 2 | essence
fermentada | 2 | fermented
ibérico | 2 | Iberian (while this could mean many things, here this one word refers to the famous ham. A lot of places might use the full name, Jamón Ibérico de Bellota, but in a place like this, what else could it be but the best?)
jugo | 2 | juice
trufado | 2 | truffled (amazingly, Spanish has the verb trufar = ‘to stuff with truffles’, so this is either the past participle or an adjective; probably won’t see this on the Camino)
albahaca | 1 | basil
algas | 1 | algae (GT is technically correct, but in a food context the correct translation is ‘seaweed’)
asado | 1 | roast
atún | 1 | tuna
bogavante | 1 | lobster
brioche | 1 | Brioche
brotes | 1 | buds (GT commonly says ‘buds’, but based on previous menus I’ve studied, plus the restaurant’s own English translation, it should be “sprouts”. I’d deduced this in prior menu teardowns, but it’s also handy to have a person fluent in both the language and the food confirm it)
café | 1 | coffee
caramelo | 1 | candy
castañeta | 1 | castañeta (GT didn’t know and I couldn’t find this anywhere. But castaña = chestnut, and that’s the human English translation, so this is a diminutive meaning either small or early)
cerdo | 1 | pig
descascarillado | 1 | shelling (peeled: again GT is not wrong, and SD is no help since it thinks it’s ‘chipped’, but again the human translation comes to the rescue with the better term)
erizo | 1 | sea urchin
esparrago | 1 | asparagus
estofado | 1 | stew
flor | 1 | flower
gel | 1 | gel
granizado | 1 | granita
guisante | 1 | pea
hidromiel | 1 | mead
hierbas | 1 | herbs
hojas | 1 | leaves
huevas | 1 | roe
huevo | 1 | egg
limón | 1 | lemon
manitas | 1 | handyman (I’ve encountered this in several previous menu studies, and it took some effort to figure out it should be trotters. That isn’t entirely obvious either, but it usually means whole pig’s feet, and the human translation confirms this)
manzana | 1 | apple
merengue | 1 | Meringue
néctar | 1 | nectar
ostra | 1 | oyster
pan | 1 | bread
picnic | 1 | picnic
pimientos | 1 | Peppers
polvorón | 1 | shortbread (while this is a reasonable “translation”, this is really like paella or gazpacho: these are just polvorones)
quisquillas | 1 | shrimps (according to SD, this word is used only in Spain; camarones is more common worldwide and, as I can personally attest, in the USA)
reducción | 1 | reduction
rocío | 1 | Dew
rosa | 1 | pink
salazones | 1 | salted (really salted fish)
taco | 1 | Taco
texturas | 1 | textures
toffe | 1 | toffee
vainas | 1 | pods (a bit unclear on this, and the human translation is the same, but for the singular SD thinks it’s a green bean)
vegetales | 1 | vegetables

So, while this menu is way too sophisticated for my general needs, a fancy restaurant like this also supplies a high-quality human translation. That adds a measure of confidence to my corpus for a few words.

I doubt I’ll ever go to this place, but the virtual visit was interesting.

A simple search led to a goldmine of translation sources

I started with a simple premise: do a search query (in Spanish, to get Google to give Spanish sources, not English) for restaurant reviews in Spain. Scrolling down a bit, I found this:

These are the 10 best luxury restaurants in Spain, according to Tripadvisor (GT of the original Éstos son los 10 mejores restaurantes de lujo de España, según Tripadvisor), printed in El Mundo, and so I immediately started looking at the first one (next post), which was:

Azurmendi (link is to the English version of the tasting menu, one of the multilingual menus at the site)

But, as usual, I like to get a geographic orientation, so here’s the satphoto (thanks Google Maps) of the immediate area of the restaurant

From the StreetView on that entrance road, you can see, on the right side of the satphoto:

Hint (after more study): Take the left fork (smaller road) and you get three stars and 4X the price.

Now a little bit of reading about Azurmendi reveals that its head chef is Eneko Atxa. His honors include the National Gastronomy Award 2015 for Best Head Chef, a place in the select group of European Young Leaders, the Best Chef Award 2019 in Europe by Madrid Fusion, the National Prize for Healthy Gastronomy to the Most Outstanding Personality 2018, and more (check out the history page). Notice, labelled in the satphoto, that there is an adjacent restaurant of the same name (a mere $$$, and it turns out only a single Michelin star), so what’s going on? And what about that Bodega?

Restaurante ENEKO (link is to the English version of the tasting menu, one of the multilingual menus at the site) is now my second source to study.

From my brief start on Azurmendi, I was struck by the term Adarrak. GT couldn’t translate it, and the website’s human translator didn’t translate it either. This led to a lot of searching and eventually an answer and, surprise, yet a third website and multilingual menu:

ENEKO Lisboa (link is to the English version of the tasting menu, one of the multilingual menus at the site) is now my third source to study.

And that finally answered the question of Adarrak:

Eneko Lisboa offers two tasting menus: Erroak and Adarrak. The first, Erroak (“Roots”) consists of iconic Eneko dishes that have remained in the menu due to their singularity. The second menu, Adarrak (“Branches”) is comprised of Chef Eneko Atxa’s newest creations.

So it must be a word, but in what language? Here’s a fragment of the English Azurmendi menu:

Now, his Lisbon restaurant has Portuguese as one of its languages, so I tried that and, nope, nada. I had previously thought, given the location and the chef’s name, that it might be Basque. Without a dictionary, all I could do was try GT. It came up with something, Horns, which at least is a translation. Erroak definitely translated to ‘roots’, as per the description, so that must be it.

Wow, fun little adventure, now a bunch of menus to study and a lot of posts to write.