Back to menus; a big project

My primary purpose for this blog is to record my progress in developing an application to translate menus in Spain. I worked diligently on this for about nine months but then got into some side-trips in other projects. But now I’m trying to get back to that primary objective.

For 78 days now I’ve also been trying to actually learn Spanish via the nice online application, Duolingo. While this diverted me from my primary task it has been useful. My sister always thought my idea was silly and that instead I should just learn the language. That’s not a bad idea but it looked harder (and more time consuming) than my primary limited work just to read menus, based on the assumption I’d soon be heading to Spain to tour along the route of the Camino de Santiago. Therefore I needed results sooner than I could learn the language.

To build my application I’d first need a large corpus of terms from menus with accurate English equivalents. To do that I’d import the text from websites into a working document and crunch through all the terms. Often that gave me some interesting observations that I was converting to posts, hopefully also interesting to my readers. Obviously there are going to be mistakes in manually collating data so my corpus needed to be carefully curated, with the terms and my “guesses” at translation with a “confidence” factor. Then via the large corpus I could extract the accurate equivalent Spanish to English translations I’d need for the application.

That’s a long slog so a couple of times I went ahead and created a minimally curated “glossary” which I have as a page here at this site. In my searches I found a number of glossaries, or even dictionaries in Spanish, covering food. Years ago when I first got interested in these I just extracted all the glossaries I could find and manually collated them into a single glossary. It was a mess!

The trouble is that food terms in Spanish (my searches) yield results that either don’t apply to Spain’s food dialect or were just wrong. After all any other person who compiles glossaries makes mistakes too. Or I’d make mistakes extracting and collating them. And my lack of any fluency in Spanish meant I often misinterpreted the raw material I was attempting to organize. That previous experience convinced me I needed to be very precise about collating material AND focused on Spain as the source of the raw material and so my idea about creating a corpus evolved.

But in nearly a year I still don’t have that corpus. And without it I can’t build my application. And in the meantime I needed to get some “drill” code done since I reached the point where I was forgetting more than I was learning. And while Duolingo is fairly good for learning Spanish it’s not as good for repeating previous lessons (and their vocabulary). And repetition is the key to learning a language. So I found myself forgetting vocabulary I’d once before acquired.

So I set out to build a drill application, which has some of the same elements I’d need in the translation application. And like compiling glossaries I’ve done this also, in the past – the first time for Italian food terms. So I’ve built drill programs before with only limited success.

The key to a drill program is to be efficient and force me to do repetitions of the vocabulary I know the least well. That’s harder than it sounds. Plus most of the types of drill I did (glorified flashcards, a common language learning technique) took so much time that as my vocabulary grew my repetition, of any particular word, got less and less frequent. Even with an hour a day I could only repeat a fraction of the vocabulary I’d acquired.

So I had some ideas how to improve this and make the drill more efficient. But I needed data even to do the programming. So I fairly quickly assembled the glossary I posted at this blog without being too concerned about its accuracy.

So with that lengthy background now I can describe what I’ve more recently done and the “big project” I’m now doing. I built my first version of the drill application centered around the Duolingo vocabulary. As I did each lesson I would fairly carefully assemble the “database” (a complex XML file) to feed the drill program. For my Duo vocabulary that now contains about 1100 “terms” and 1400 “forms” of those terms. By forms I mean the usual four spellings of adjectives (in Spanish both gender and number) and the first set of conjugations for verbs. Getting all that going for Duo vocabulary drills got me a fairly useful and efficient drill program which is helpful as a supplement to Duolingo.

So then using that code and crunching the glossary I’d assembled here I started on the food terms. And that was a bit of a mess because the glossary sucked.

So to fix this I went back to my 30 or so working documents of all the menus I’d processed. Rather than the more difficult chore of extracting material for a well curated corpus I just quickly (in a couple of days) extracted all the accumulated Spanish. That’s a tedious chore but it does reveal some of the problems of getting “raw” material from the websites. Naturally I found lots of spelling mistakes (easier for me to recognize now that I know a little Spanish) but also inconsistencies in gender and sometimes number. The menus are also very inconsistent in their use of accents. My Duolingo study also let me learn the rule that accents sometimes change (for real, not typos) in certain circumstances.

So once I’d compiled all my “words” from all menus I had about 10,000 “raw” bits that I was able to clean up, de-duplicate and consolidate (like all the forms of adjectives under a single “term”) and ended up with about 5500 lines.
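For the curious, here is a minimal sketch in Python of the kind of cleanup described above: grouping spellings that differ only in case or accents so they can be reviewed as one candidate term. It is not the actual code used for this project. The function names `fold` and `group_variants` are made up for illustration, and keeping ñ distinct from n is an assumption about what counts as a "different word".

```python
import unicodedata
from collections import defaultdict


def fold(word: str) -> str:
    """Lowercase a word and strip accents, keeping ñ distinct from n."""
    word = unicodedata.normalize("NFC", word.strip().lower())
    out = []
    for ch in word:
        if ch == "ñ":  # assumption: ñ is a separate letter, not an accented n
            out.append(ch)
            continue
        decomposed = unicodedata.normalize("NFD", ch)
        out.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    return "".join(out)


def group_variants(raw_words):
    """Map each folded spelling to the set of raw spellings seen for it."""
    groups = defaultdict(set)
    for raw in raw_words:
        raw = raw.strip()
        if raw:
            groups[fold(raw)].add(raw)
    return dict(groups)
```

Any group with more than one spelling is a candidate for manual review. The folding only finds the inconsistencies; deciding which spelling is correct still needs a dictionary or a human.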

Then in a separate process I took the latest (v3.3) copy of my glossary and then combined that with about six other glossaries. That was a chore and resulted in about 4000 entries.

So then I combined these, all the glossary “words” and all the menu “words”, and started going through all that by hand. I’m now done with everything through M (since I sort all 9000 or so lines into alphabetic order). I’ve done a few hundred “fixes” to my glossary and about 100 additions. But more importantly all those changes are in my XML “database” for the drill program. With a bit of code I can then extract from that XML to create text I can paste into the glossary page here.
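As an illustration of that last step, here is a rough Python sketch of pulling glossary text out of an XML file. The real XML layout isn't shown in this post, so the element and attribute names below (`vocabulary`, `term`, `form`, `es`, `en`) are purely hypothetical, as is the output format.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout: one <term> per word, with optional <form> children.
SAMPLE = """<vocabulary>
  <term es="postre" en="dessert">
    <form>postres</form>
  </term>
  <term es="casero" en="homemade">
    <form>casera</form>
    <form>caseros</form>
    <form>caseras</form>
  </term>
</vocabulary>"""


def glossary_lines(xml_text):
    """Return one 'spanish (forms): english' line per term, sorted."""
    root = ET.fromstring(xml_text)
    lines = []
    for term in root.iter("term"):
        es = term.get("es", "")
        en = term.get("en", "")
        forms = [f.text.strip() for f in term.findall("form") if f.text]
        extra = f" ({', '.join(forms)})" if forms else ""
        lines.append(f"{es}{extra}: {en}")
    return sorted(lines, key=str.lower)


print("\n".join(glossary_lines(SAMPLE)))
```

Keeping the XML as the single source and generating the glossary page from it means a fix only has to be made in one place.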

So when I’m finally done with all that tedious manual work I can update my glossary and it will be a big change so I’ll make that the v4.0 version which I believe will be quite a bit better than my current v3.3 but not as good as a curated corpus needs to be. And, really my glossary will then mostly contain words that exist in reference sources (several online dictionaries I use) and/or reconciliation with the other glossaries I found.

Please note, therefore, that my work product is fully derivative from many sources and my editorial work and thus constitutes “original” work. I’m quite conscious of never (almost never) posting anything in this blog that would violate copyright, i.e. the wholesale use of someone else’s glossary.

And now all my material is synchronized – my XML database for the drill program, my derived glossary with reconciliation to other glossaries or reference sources, and I’m only including terms in either place that I’ve found in menus so my product is more closely aligned with Spain dialect and I can exclude other Spanish food terms.

Now, while that isn’t done, I’m back into the code for my drill program. In the case of my Duolingo vocabulary I feed into the drill program I (mostly) know that vocabulary by memory. Duolingo is divided into lessons (aka skills) that require 40 actual drills (to pass the skill and unlock the next one) which means about 800 individual drills. At Duolingo I’ve now done 16,843 “XPs” over 31 skills. On average each skill introduces around 30 words (forms actually). So when I do my “refresh my memory” drills with that vocabulary I have relatively few words I ever mark as uncertain, or worse, “I’m wrong” or “I’m clueless” (really forgot). That means all the scoring I’ve done with that vocabulary has relatively few “errors” and my aggregate score on most terms is 100%.

In contrast I’m much worse on my new food vocabulary. As I worked on menus I’d “learn” many words, but I had almost no repetition of those (only the most common words appear on many menus, so that was my only repetition) and I’d done none of my own drill. Now that I have something to feed my drill program I’m getting a lot more “bad” scores. That’s good and bad. It’s bad because it means I don’t know those words very well, by memory. It’s good because now all the scoring of the drills I record in the XML has a lot more data than the drills on Duolingo vocabulary.

So that means back to programming. How do I consolidate tens of thousands of individual drills into some sort of metric that rates each word in the vocabulary as to how well I know it (and/or don’t confuse it with similar terms)? Because I want to drill myself on what I know the least. I don’t very much need to drill on carne or agua or cerveza or a few hundred other food words and I don’t want to waste the limited time I have for drills (even less than my free time because drill is tedious and I can only tolerate a certain amount each day). So those are now the algorithms I’m trying to develop so my drill program is even more efficient and therefore more useful.
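One possible shape for such a metric, sketched in Python. This is only an illustration, not the algorithm actually used in the drill program: the outcome names, the numeric weights and the `decay` factor are all assumptions, chosen so that recent drills count more than old ones and never-drilled words come up first.

```python
# Assumed scores per drill outcome (1.0 = knew it, 0.0 = completely forgot).
OUTCOME_SCORE = {"right": 1.0, "uncertain": 0.5, "wrong": 0.25, "clueless": 0.0}


def knowledge_score(outcomes, decay=0.8):
    """Recency-weighted average of drill outcomes (oldest first), 0.0 to 1.0.

    A term with no drills at all scores 0.0, so it gets drilled first.
    """
    if not outcomes:
        return 0.0
    total = 0.0
    weight_sum = 0.0
    weight = 1.0
    for outcome in reversed(outcomes):  # newest drill gets the largest weight
        total += weight * OUTCOME_SCORE[outcome]
        weight_sum += weight
        weight *= decay
    return total / weight_sum


def pick_weakest(history, count):
    """Return the `count` terms with the lowest scores (ties broken by name)."""
    ranked = sorted(history, key=lambda term: (knowledge_score(history[term]), term))
    return ranked[:count]


history = {
    "carne": ["right", "right", "right"],
    "lacón": ["wrong", "uncertain"],
    "cecina": [],
}
print(pick_weakest(history, 2))
```

With this sample history the well-known carne drops out of the drill list, while the never-drilled cecina and the shaky lacón are selected. A straight average would also work; the decay is there so that a word missed last week isn't hidden by a long run of old correct answers.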

So while I thought I’d be done with this by now I have probably another week to finish cleaning up my food vocabulary and enhancing my drill program. But once I’m done with that I can spend 15-30 minutes every day (or most days) so I get more of the food vocabulary into longer-term memory along with a growing Duolingo vocabulary. Thus I’d hope to have reasonable fluency within a few months so soon I may need to head to some Spanish speaking country to test myself.

Now, note, all this is “reading” (and less “writing”) Spanish. Hearing or speaking is an entirely different problem. But without mastery over much of the vocabulary actual conversation is pretty hopeless. I’d originally assumed I’d have no more audible Spanish than a few phrases and the rest I’d do through reading (plenty of time to study a menu, have to be fast to have conversation).

Now, finally, all this I’m just doing for myself, other than relating some hopefully “interesting” tidbits here in the blog. I’ve built many software products over my working life but this one is purely a personal project. But at least, as a derivative from this work, I do hope to end up with the best glossary for food terms in Spain here at this blog as my contribution to others who might need this.

 


A Camino not taken

Since I lived in California, in Palo Alto, not far from the major street El Camino Real I have known that ‘camino’ means way or route or road. What I’ve now learned is that it is derived from the verb caminar and the first person singular (yo) conjugation is camino; IOW, it also means “I walk”. I was born in the city of Amarillo Texas which I now (mostly) know how to pronounce correctly and that it is the masculine singular adjective ‘yellow’. Interesting how Spanish has been all around me.

But that’s not what this post is about. As I’ve mentioned I’ve gone off now on several digressions from my original project and subject of this blog – that is a virtual trek along the Camino de Santiago, decoding restaurant menus along the way so I can produce a food specific translation tool. I haven’t dropped that project and from time to time, given I’m still putting in miles on my treadmill which I convert to distance along the Camino route, I do check what restaurants I “encounter”, that is via Google Maps and Street View and their ratings and most importantly user submitted photos. I’ve seen thousands of typical mom-and-pop Spanish dishes but with the exception of a Spain oriented restaurant in Columbus Ohio I have yet to actually taste any comida española.

What I have been doing, after getting a new computer to use in my programming projects, is going back through 30,000 old digital photos and selecting those that either are visually interesting or interesting as reminders of my travels. So, in the absence of any other posts getting created, I thought I could just try adding a few of these photos. I’ve had a fondness for photos, otherwise fairly boring, of roads or, better, trails I do manage to walk. So I figured I’d post all my photos of trails (many more interesting than much of the Camino) and when I run out of those then some roads.

So I’ll start with this one:

Note: I still haven’t quite figured out how WordPress resizes photos so I have much higher resolution photos than I’m posting and I’ll have to figure out the trick to getting better quality.

I picked this shot because it fits my title – the footpath bifurcates into a more obvious trail and a lesser one. Naturally I hiked the lesser one (this was about four years ago).

The location is the Theodore Roosevelt National Park in North Dakota, USA. I camped there for about a week, often overrun by bison who decided the fresh spring grass in my campsite was what they wanted to eat. So I sometimes retreated to my car as a thin nylon tent is not much to stop a bison. This park has its name from the fact that President Teddy Roosevelt, as a young man, had various physical weaknesses and he chose to go to North Dakota (not then a National Park) and try his hand at ranching. Despite definitely being a “dude” from a rich East Coast family, tough and later rough-rider Teddy eventually impressed the locals with his tenacity and energy. So when the land was transferred to the US Park Service naturally it was named after him.

Now the area of the park is interesting because it appears almost out of nowhere in the middle of very flat plains of western North Dakota. While, at the bottom (where that trail is, the first photo), it can appear to be mountainous, it is actually canyons created by the Little Missouri River, which eventually flows into the Missouri River which is the border of my home state, about 10 miles away. So here’s a sample of the larger area from the top of the canyons:

Note: This photo still looks horribly fuzzy to me despite having 1920 pixel wide resolution – what is WordPress doing?

Since this is spring it’s quite green at this time of year but this is mostly prairie with some cedar trees. It was considered fairly inhospitable to any life and was known as “badlands”. But now there is actually a Badlands National Park in South Dakota, which I visited on the way to TRNP and it looks rather different.

So this is my first “camino” post with what looks like disappointing photos (in WordPress edit mode) and hopefully these images will look better in the finished post. Otherwise I’m clueless, now, how to get decent photos that I have (from a Nikon, not a cell phone) into posts.

I won’t do that many of these so it’s not just filler until I get back to Spain restaurant menus but I do enjoy (me gusta caminar) and I have some photos to prove it.

p.s. Since I mentioned bison (incorrectly called ‘buffalo’) I suppose I should include a picture of one of those sitting in my campsite near my fire pit.


This is a real wild (and very large) animal. I was amused by signs warning tourists not to bother the wildlife, but there was no sign explaining what to do if the wildlife was bothering us!

Something different

One problem with a virtual trek is that I don’t get a chance to take my own photos. I can’t post photos of other people so I can only talk about my “trek”. So photos to follow, but a little preface (scroll down if you’re impatient for the good stuff).

Well, actually I do go places. And I take photos. I very much enjoy the posts of loyal readers with fantastic photos, places I’d love to see, but at least I can experience through other people’s postings. So here’s a few to return the favor.

So, I recently got a new computer and I really wanted a new and fresh set of photos for my screen saver on my new large display. So I dug into my archive of over 40,000 photos to pick a few of the best. It was an adventure to look back over almost 20 years and a variety of digital cameras.

And while Spain, my current interest, is not much like Texas, there is some resemblance. When I first moved to Nebraska from the San Francisco Bay Area (Los Altos to be specific) I was really depressed. Within an hour of my old house I could find, even on foot, beautiful country. Within a few more hours I could either be cross-country skiing or sipping wine in Napa or riding my bike along the Pacific Coast. In contrast even 6-8 hours of driving from Omaha it’s still just cornfields. So I went crazy, also given it was winter in Nebraska, and I threw my backpacking gear in the car and headed south. Three days later I found myself in Big Bend National Park in Texas. Now I get to say anything I want about Texas because I was born there in a city called Amarillo, pronounced, needless to say, nowhere near the correct Spanish pronunciation of the adjective, ‘yellow’. Texas is a huge state, probably bigger than Spain, so I’d never been to Big Bend and it was a thrill to visit. Later I convinced my wife that visiting some place where I’d been sleeping in a tent on the ground was still a fun vacation.

Though many of my loyal Readers are not from the USA, you might still know that our insane president (pretender) wants to build a wall along the USA and Mexico border. Actually there is a “barrier” on most of the border except Texas. Folks in Texas hate “eminent domain” so even putting up fences has run into local opposition. But the real “barrier” is nature, fierce, but beautiful.

But far more important, a big chunk of the US/Mexican border is a fantastically beautiful place, either the National Park or the Texas State Park. Twice I’ve visited this area and the second time I had a digital camera so here are some photos to give you a feel for this beautiful place AND how impossible the terrain is for any sort of hordes crossing the border. I’m not sure I’ve seen any border that is LESS possible for easy crossing. And it would be horrible to spoil the beauty of this area with an utterly useless Wall just to make MAGAs in Michigan (who’ve never been anywhere near the border) happy.

So here are my photos, please ENJOY this beautiful place. And for once I can contribute something to see.

Ick. There is something I don’t understand about posting photos. These photos look like a blurry mess, but not what I have in my files (these are originally 15Mpixel files from a Nikon). I’m trying various things to make them look like I see them, not sure what WordPress needs.

Here are a couple of scenic vistas in the general vicinity of the border:

Actually this isn’t quite near the border, it’s the Chisos Basin in Big Bend National Park but that’s where I was for this fantastic sunrise (it is about 5AM and a long exposure). Chisos Basin is the only accommodation in the park and is surrounded by mountains on all sides. The air is incredibly clear, and, of course, dry (it is desert) so sunrises and sunsets are fantastic.

Did I mention you can see the sky here? My photos don’t even come close to the experience you can have, standing in the desert and seeing sky everywhere.

But now we come to the border.

From the US side this is looking north, in Texas State Park, with the Rio Grande behind us.

(Note: these photos look crummy to me, but they’re not all blurry like I see them as I make this post. I guess I don’t understand how to incorporate good photos in WordPress – click on the photo for a better one, but still much lower resolution than my original).

You can just barely see the river here, but this is a hint of surrounding country.

And here it is – the border, the Rio Grande – you can see the streams of immigrants flooding across. They come well equipped with climbing gear.

Again, does that look like the kind of river you’re going to see a migrant caravan of women and children rushing across? Good luck kids.

A few miles down the river, still rough country – great sightseeing on the highway on the US side, pretty rough country with miles of desert on the Mexican side.

Here the Rio Grande might be easy to cross, but

here, not so much. This is the St Elena Gorge, an awesome cleft with steep cliffs on both sides of the border. When I first went to Big Bend my parents, who were “snowbirds” (people in cold climates with RVs who head to warmer climes near the border) warned me about Mexicans stealing my car. When I saw this gorge my reaction was – GOOD LUCK. A huge expanse of fierce desert to get to this gorge and then technical rock climbing to get to the US side. Hey, anyone intrepid enough to make that journey can steal my car! Needless to say there were no car thieves, nor anyone except USA tourists, anywhere near this spot.

Maybe this crossing is a lot easier, but still seriously demanding of outdoor skills.

And in case these barriers are not discouraging here’s a few other things you would face.

 

Amazing, this guy, about the size of my hand was just sauntering across the highway. Supposedly they’re fairly gentle but I wouldn’t want to put that idea to the test.

And, just more fun

These are called “horse crippler” cactus, and for good reason. Anyone daring this part of the world needs serious boots (and a good eye not to step on these).

A few times in my life I just zipped through the southwestern deserts of the USA but when I finally visited, slowly, on foot, these areas I was stunned at their beauty, something you have to see close up and in sync with nature.

The idea of putting a 10m high wall across this country, despite its stupidity for all the other reasons, is a criminal offense against the sanctity of nature. Spain has its beautiful spots, which I still hope to see, but the USA has fantastic spots as well.

Now, these photos are yucky, so I’m going to see if I can make them look better, more like I see them (I do have a rather good Nikon camera to shoot this stuff, not some two-bit cellphone camera).

Lost a post

This is primarily a note to myself to punctuate the flow of this blog to reflect some history.

I’m disappointed that I managed to delete a post I had mostly composed. Sometimes, and usually for more difficult posts, I work offline in MSWord to compose my posts. The way WordPress works isn’t that helpful for posts that take a while to compose or when I need to do more research for the subject of the post. And in the case of my lost post I was doing a lot of background research.

Previously I’d complained (in other contexts) about losing posts or especially comments in WordPress. Since the text editor running in a browser can’t access the local file system no temporary saves can be made (for posts, but not comments, WordPress does some saves to its cloud). I’ve lost enough that way to have adopted working offline (which can do temp saves) but even that is subject to human error.

So, what the post was about was restaurants in Astorga (surprised to find so many) and a local dish that is quite popular and featured by many of those restaurants, cocido maragato. I had done a lot of research on all this which was contained in the incomplete post I lost (also the menu translations that were background source information for the post). So you can go look at Astorga yourself (via Google Maps) and start with this link for cocido maragato.

Then, poof, in too much of a hurry I deleted the entire post AND the multiple menus I’d extracted and analyzed. What happened was that in attempting to add a new menu strange stuff showed up when I pasted the content from the website into MSWord. So I thought I was reversing that paste but instead deleted all the text in the file. Later in the evening while doing shutdown of my computer I got the notification from MSWord whether I wanted to save or not and, stupidly, just said yes without checking what changes it was trying to save. Poof, now I have an entirely empty file that previously had been many pages long.

This was quite discouraging to lose hours of work so I’m doing this notification post to kind of purge my despair and thus get back to work again on menus. And then make some new real posts.

 

Blog note

After consolidating terms from numerous menus, plus the recent post about restaurant terms, I substantially updated the page under the tab RESTAURANT PHRASES. The main change was the addition of a list of phrases which I’ll include here for convenience. Enjoy!

 

In this list the notation {x|y} means this word occurs with either x or y in this position, usually this is gender in adjectives, so {a|o}. [x] means optional, most often [s].

a elegir: to choose [from]
a tu elección: at your choice
acompañad{a|o}[s]: accompanied
al centro: in the center (of table, i.e. for sharing)
al estilo X: in the style of X
al gusto: to taste (doneness), i.e. cooked to order
al peso: by weight
bebida[s]: drinks
carta: the a la carte menu
casa: literally house, from this restaurant
caser{a|o}: homemade
combinados: combinations
degustación: tasting/taste (often a separate menu)
del día: of the day
diario: daily (available item or open)
elaboración: preparation
eliges tú los ingredientes: you choose the ingredients
en temporada: in season
entrantes: starters (aka appetizers)
especialidad: specialties
horario: hours (as in when it is open)
incluid{a|o}[s]: included
ingredientes: ingredients
mesa: table (different from tabla)
para acabar: to finish (after main part of meal)
para comer: to eat (main part of menu)
para compartir: to share
para picar: to nibble on (aka snacks or appetizers)
por encargo: on request
postres: desserts
precio[s]: price
primeros [platos] (primer): first course
segundos [platos]: second course
selección/seleccionado: selection/selected
servido [con]: served [with]
surtido: assortment
tabla: board/plank or platter (usually an assortment, often of ham)
unidad: unit (abbreviation uds)
vari{e|a}d{a|o}[s]: assorted, varied, variety
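The {x|y} and [x] notation used in this list is regular enough to expand mechanically. Here is a small Python sketch that turns one compact entry into every spelling it stands for. The function name `expand` is made up for illustration, and the sketch assumes the braces and brackets are never nested.

```python
import itertools
import re

# {x|y} = exactly one of the alternatives, [x] = optional text.
TOKEN = re.compile(r"\{([^{}]*)\}|\[([^\[\]]*)\]")


def expand(pattern):
    """Return every spelling covered by a compact glossary entry."""
    parts = []
    pos = 0
    for match in TOKEN.finditer(pattern):
        if match.start() > pos:
            parts.append([pattern[pos:match.start()]])  # literal text
        if match.group(1) is not None:
            parts.append(match.group(1).split("|"))     # {x|y}
        else:
            parts.append(["", match.group(2)])          # [x]
        pos = match.end()
    if pos < len(pattern):
        parts.append([pattern[pos:]])
    # Join each combination and tidy any spaces left by skipped options.
    return [" ".join("".join(combo).split()) for combo in itertools.product(*parts)]


print(expand("acompañad{a|o}[s]"))
```

For the sample entry this gives acompañada, acompañadas, acompañado and acompañados. Note that the notation can over-generate: an entry like vari{e|a}d{a|o}[s] expands to some spellings that aren't real words, so the expansion is best treated as a list of candidates to match against menus, not as a list of valid Spanish.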

Too many menus, too little time

I’m only about five miles away from León (on my virtual trek, previously mentioned) where I’m bound to find a lot of online restaurant menus so I’ve been rushing to finish my list from the city of Palencia. I can work on the menus in bits and pieces, extracting and formatting the material into my source files and then analyzing the entries, doing lookups and searches on terms that machine translations handled badly. This isn’t easy and is sometimes beyond merely mechanical, but I can pick it up and put it down, thus squeezing this work into nooks and crannies of my day.

But the real work, actually generating a corpus and then, even more, creating the software to collate all this and actually create a Spain food translator that is far better than the extant machine translations requires a really concentrated effort and so I’ve essentially done none of this. I have to remember what it was like to work hard all day long on this kind of task, day after day, as I did when I was in a real job of software architect. But I find I can never get around to this for a “fun” project.

In between is writing these posts. I can’t do that in bits and pieces either. While a post is a shorter task I still require some concentration and focus, plus usually even more research. But that’s the good part. My quick cursory analysis of menus is sufficient to find specific translation issues for posts and thus, wanting to get it right in the posts, the need for more careful research and conclusions. And even though this may only be a few hours it’s hard to get that hunk of uninterrupted time. So my posts have really been infrequent.

I write the posts as part of a discipline to do this work more carefully. Knowing someone might notice my mistakes and then (and I’d love it if they did) comment as to my mistakes forces me to be more careful. Plus, sometimes, I try to tell more story than just the translations and that even enriches my data collection more.

So posts are great to do (and hopefully of some interest to you, Dear Reader) but it’s hard to get them done.

I have material for at least six posts about the menus from Palencia that I’ve studied. I really hope I can apply myself and get these posts done before I start digging in León menus.

So here are some restaurants you might find interesting. There were 159 restaurants in my starting list but I only looked at the ones with real websites (the Facebook sites are useless to my purpose and frankly, IMHO, worthless to a potential customer). Many of the websites then have little information and especially lack menus. Then often the menus are in two formats I just barely can use: 1) just images (i.e. no text to extract from browser so have to manually transcribe, hard to do accurately) or, 2) PDF’s. While I can usually (not always) get text from the PDF’s it: a) takes a lot of manual post-processing to organize, and, b) then it’s not easy to get Google translations (I have to build my own temporary webpage from the extracted and processed PDF information to let Google chomp on it), and, c) using Microsoft’s translation within MSWord is both a bit clumsier and overall somewhat inferior to Google (although in some cases it is better as well).
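The "temporary webpage" step for PDF menus can be quite small. Here is a Python sketch of one way to do it, assuming the PDF text has already been extracted and cleaned up into a list of lines. The function name `build_page` is hypothetical, and how the page is then put in front of a translator (opened in a browser, or uploaded somewhere) is outside this sketch.

```python
import html


def build_page(lines, title="menu extract"):
    """Wrap extracted menu lines in a minimal Spanish-language HTML page."""
    items = "\n".join(
        f"<p>{html.escape(line.strip())}</p>" for line in lines if line.strip()
    )
    return (
        "<!DOCTYPE html>\n"
        '<html lang="es">\n'
        "<head>\n"
        '<meta charset="utf-8">\n'
        f"<title>{html.escape(title)}</title>\n"
        "</head>\n"
        "<body>\n"
        f"{items}\n"
        "</body>\n"
        "</html>\n"
    )
```

The result can be written out with something like `pathlib.Path("menu.html").write_text(build_page(lines), encoding="utf-8")`. Declaring `lang="es"` and UTF-8 is there so accented characters survive and the page is recognized as Spanish; putting one menu item per paragraph keeps the translated output lined up with the source.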

So my criteria for looking at restaurants in the following list have little to do with any sense of their quality or interesting cuisine. BUT, that said, usually I’ve found what appear to be the better restaurants often also have the better websites. I encourage them (not that any of them will be listening) to put more work into them. Perhaps for local clientele websites are not very important but for tourists I believe they’re beginning to be critical. I have another post about how I was persuaded to recently visit, even going out of my way, a particular restaurant in Ohio solely on the grounds of its website, although later learning it was also “rated” as one of the best in Columbus. And while pretty pictures of the food and glowing descriptions are nice, online menus are far more important, again IMHO, for “selling” your restaurant to new customers.

So here’s the list I’ve processed, hopefully with stories to come when I can find the time for posts.

Bar Comedor El Garaje http://barelgaraje.es
Bar El Cobre https://barelcobrepalencia.es/
Casa Pepe’s http://casapepes.es/
Dominos (just wanted to compare to both US menus and local restaurants but some new vocabulary did appear) https://www.dominospizza.es/carta-de-pizzas
El Majuelo http://www.elmajuelopalencia.es
El Rincon de Istambul (interesting since they focus on Turkish food and so had non-Spanish items I had to look up) http://rincondeistambul.es
Gastrobar Donde Dani http://gastrobardondedani.es
Habana Cafeteria (interesting that a cafeteria has different selection which revealed some new terms) https://habanacafeteria.com
La Barra de Villoldo https://labarradevilloldo.com
Ponte Vecchio (interesting since they focus on Italian food and so had non-Spanish items I had to look up) http://www.pontevecchio.es
Restaurante – Cerveceria Las Hurdes http://cervecerialashurdes.com
Restaurante Asador Palencia La Encina http://www.asadorlaencina.com/es/palencia/
Restaurante El Brezo http://www.elbrezo.com
Restaurante La Cantara https://restaurantelacantara.com
Restaurante La Traserilla http://www.latraserilla.es/
Restaurante-Bar Mano http://barmaño.es
Restaurante-Cervecería Moesia https://moesia.es/

 

para picar and other restaurant phrases

Despite my lack of posts I have been continuing to study menus from restaurants in Spain, at the moment from a large list of restaurants in the city of Palencia. In that work I’ve thought of probably half a dozen posts I’d like to write. But posts are harder than study. I need concentrated time without interruptions and real focus. Study is mostly mechanical and I can do bits and pieces at a time, easily stopping and restarting later. I don’t know about you but I have to finish what I start, in one sitting, when it comes to posts. Of course 😉 if I did shorter posts maybe I wouldn’t have this problem. But, alas, I accumulate so much material it’s hard to neglect it all.

But there is a potentially relatively brief topic about some phrases one finds on many menus. The phrases are simple, but the literal output of the various machine translators isn’t very helpful. So let’s start with this one.

A menu was basically divided into three sections with these phrases (with Google translations):

para picar algo (to chop something)
para comer (to eat)
para acabar (to finish)

I doubt I’ll be doing any chopping while dining in a restaurant. Just para picar is more common than including the algo part, so what does this mean? para is just a preposition meaning ‘for’ or, somewhat more helpful in this context, ‘in order to’. picar has a host of meanings: to chop, mince, grind, cut, crush {to divide into pieces}; to sting, bite {by an animal}; to peck at {birds}; to break up (big pieces), chip (small pieces) {mining}; to punch; to needle {(colloquial) to antagonize}; to spur on {horse racing}; to goad, prod {bullfighting}; to play staccato {music}; to rot, corrode, rust; to key in {computing}; to eat, nibble on {(colloquial) to snack on}

Now we’re not bullfighting or mining or horse racing, so probably the sense related to eating best applies. While ‘to nibble on’ is the obvious dictionary definition to use, the sense ‘to snack on’ probably fits best here.

That then makes the following section, para comer (to eat), make more sense. After nibbling some snacks we’re ready for some serious eating. And para acabar precedes desserts, coffees and after-dinner drinks so that has an easy fit.

So let’s look at a few others which do translate reasonably well via machine literal translation:

a elegir (to choose)
para compartir (to share)
por encargo (on request)
a tu elección (at your choice)
eliges tú los ingredientes (you choose the ingredients)

Despite both a elegir and a tu elección having ‘choose’ or ‘choice’ they seem to have quite different purposes in menus. a elegir usually precedes a list where one may choose one item whereas a tu elección seems to allow one to “customize” an item.

And here are a few more

al peso
casero

al peso usually is in the pricing section, i.e. one can order an amount (by weight) of something and then the price will be determined by that weight. casero, or casera when it modifies a feminine noun, is quite common and best translates as ‘homemade’. The mechanical translations often just say ‘home’ or ‘house’ (even those that “claim” context sensitivity rather than word-by-word literal translation), which is understandable since casa is the stem of the word. While ‘homemade’ clearly means made in this establishment it doesn’t necessarily mean ‘made from scratch’; IOW, it may just be assembled from purchased elements.

And, even though this is another post, some menus like to use brand names as the simple label of the item, especially at one establishment for desserts. So I learned MAGNUM MOMENTS is not some strange loanword in Spanish, but just a European brand of ice cream in a particular portion, and COPA BRASIL or DELISS LATTE are the names of packaged ice cream treats. Literal translation (or no translation at all as Google stumbled on these) isn’t going to help you much in picking one.

There are other phrases I’ve encountered but these were just in a few of the menus from a couple of restaurants. Someday I’ll have to complete a full list.

Poor name choice of this blog

When I started this blog I just looked up a few words to pick what I thought would be an appropriate name. Did I mention (of course, just kidding) I don’t speak any Spanish? But after months of working on my project I’ve learned a few things as I’ve been analyzing thousands of machine translations by either Google or Microsoft.

Here’s an interesting “mistake” that led to some study and then a part of my point in this post:

brillante

Vino perfectamente límpido y transparente. Es un factor que tiene que ver con la juventud del vino. Atravesado por la luz parece brillar.

bright

It came perfectly limpid and transparent. It is a factor that has to do with the youth of wine. Pierced by the light seems to shine.

Translating vino as ‘it came’ is strange (but occurs frequently in the wine vocabulary I’m compiling now).  Recalling the familiar saying of Julius Caesar (veni vidi vici or I came, I saw, I conquered) is the clue. The Spanish for ‘to come’ is venir and the conjugation for third person singular past tense is, ta-da, vino! So, in fact, ‘it came’ is a completely reasonable (but wrong) translation. Obviously given I’m looking at lots of definitions of wine terms, vino in this context is, of course, wine. So much for context sensitivity in machine translations.

But the point which I learned by just some brief reading about syntax of Spanish is that most verbs are regular in their conjugations and so pronouns can be deduced unambiguously and thus are usually omitted.

So mistake #1. Yo is unneeded and I should have solely used traduzco (the first person singular present tense of traducir (to translate)).

Mistake #2 is a bit less obvious. Yes, comida can mean ‘food’ which was my intent. But in Spain it can also mean lunch which definitely is not my intent. And it can also mean ‘meal’ (the act of eating, not the food itself) which is a bit better. In fact the authoritative dictionary of Spanish, the Diccionario de la Lengua Española from the Real Academia Española, has five meanings for comida.

Now really my project is to construct a robust translation tool for menus in Spain. The Spanish menú is too close to English and thus wasn’t Spanish-y enough for me and so I picked comida instead (plus I wasn’t quite sure how to get the ú in the blog name at WordPress).

So traduzcomenú would be a better name but now it’s too late to change.  It’s probably fair that the name I chose doesn’t make sense and thus a clue to a true Spanish speaker how clueless I am. Not a good start if I’m claiming I’m going to build a really good translation tool.

Oh well, live and learn.

A blogging dilemma

I’m using this blog (partially) to “document” interesting tidbits I encounter while doing research for my anticipated smartphone app to translate menus in Spain. That app needs to have a comprehensive and accurate dataset to use in the translation, not just the equivalent English term (which doesn’t always exist) but also some description. For example, what is sobrasada? Yes, it’s ‘sausage’ but saying that (or even ‘spicy pork sausage’) doesn’t tell you very much.

So I’m using various sources to build up a “big data” corpus which will have translation errors and other errors. But algorithmically I can extract from that corpus what I’ll need to power the app. But I have to build that corpus manually, often exploring “puzzles” I find in trying to figure out a proper equivalent in English for some culinary item I find in Spain (btw, I am focusing on Iberian Spanish and trying to prevent terms only found (or used differently) in the New World from defocusing my corpus).

So I’m doing several things with these posts. First they are a kind of journal (or lab notebook) for various translation/description puzzles I try to solve. While I have many MSWord files with the raw work the blog posts highlight some interesting (at least to me) bits. Second by writing for potential readers I have to work a bit harder to try to have my posts accurate and at least somewhat coherent (instead of the real-time stream-of-consciousness in my raw material). This more careful writing makes the posts better but does have a real downside – it’s SLOW. It might not seem like it to you, Dear Reader, but I probably spend more time writing a post about something interesting in a menu than it took me to decipher the entire menu. So at some point the blogging gets in the way of my work.

But the real “dilemma” I have is that I just don’t get the posts done, at anywhere near the rate I’m discovering the tidbits I want to write about. And days later when I go back over my raw data I often can’t recreate my thoughts, or discover I forgot to include links or definitions or whatever, and don’t much feel like repeating my work.

My posts are fairly long which is good and bad. It’s good because I try to weave multiple points into a post, often with some background research. It’s bad, because the posts are probably too long for most readers’ attention spans and because I don’t get them done.

So every now and then I’m tempted to do short posts, literally for each situation I encounter, rather than trying to organize multiple examples into a single post.

For instance, I’ve started looking at a new source. Previously I’d used menus I could extract from restaurant websites along the course of the Camino de Santiago, and several online glossaries and dictionaries. But I’d also stumbled on many sites (focused on Spain and entirely in Spanish) for recetas (recipes). These are more tedious to process but often contain information I don’t find elsewhere and therefore can stuff in my corpus so potentially less frequently used (in menus) terms are still incorporated.

So I just started a small trial to look at this recipe site. Under its recetas tab it has 14 categories, and under Pasta y Arroz (pasta and rice) there are 15 webpages with about 12-16 recetas per page. IOW, this is a lot. And every receta is presented on the webpage as a caption (to a photo) where I can use Google Translate and then manually produce a side-by-side Spanish and English pair, such as:

Ñoquis de calabaza y boniato con salsa de gorgonzola
Pumpkin and sweet potato gnocchi with gorgonzola sauce

For this I’d extract for my corpus ñoquis (gnocchi), calabaza (pumpkin), boniato (sweet potato), salsa (sauce), and gorgonzola (gorgonzola). If I double check these term associations by looking in the Oxford dictionary or the DLE (more authoritative, but harder to use than Oxford) I could add these associations to my corpus with higher confidence levels. IOW, mistakes are bound to get into the corpus without a lot of checking, but I’m also hoping the “big data” type filtering will eliminate the spurious pairs.
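To make the bookkeeping a bit more concrete, here is a minimal Python sketch of the idea. Everything specific in it is an assumption for illustration only: the function names, the two confidence numbers (0.5 for a pair taken straight from a machine translation, 0.9 for one double checked in a dictionary), the 0.8 cutoff, and the extra ‘squash’ observation, which is a made-up competing pair.

```python
from collections import defaultdict

# Hypothetical confidence levels; the actual numbers are assumptions.
UNCHECKED = 0.5  # pair taken straight from a machine translation
CHECKED = 0.9    # pair double checked in a dictionary


def add_pair(corpus, spanish, english, confidence):
    """Record one observation of a Spanish term with an English equivalent."""
    corpus[spanish.lower()][english.lower()].append(confidence)


def best_translation(corpus, spanish, min_total=0.8):
    """Return the English term with the highest summed confidence,
    or None if no candidate has accumulated at least min_total."""
    candidates = corpus.get(spanish.lower(), {})
    if not candidates:
        return None
    english, scores = max(candidates.items(), key=lambda item: sum(item[1]))
    return english if sum(scores) >= min_total else None


# corpus maps: Spanish term -> English term -> list of confidence values
corpus = defaultdict(lambda: defaultdict(list))

add_pair(corpus, "ñoquis", "gnocchi", CHECKED)
add_pair(corpus, "calabaza", "pumpkin", CHECKED)
add_pair(corpus, "calabaza", "pumpkin", UNCHECKED)   # seen again in another caption
add_pair(corpus, "calabaza", "squash", UNCHECKED)    # made-up competing pair
add_pair(corpus, "boniato", "sweet potato", UNCHECKED)

print(best_translation(corpus, "calabaza"))
print(best_translation(corpus, "boniato"))
```

With these made-up numbers calabaza resolves to ‘pumpkin’ because that pair has been seen twice, while boniato returns nothing yet because a single unchecked observation doesn’t reach the cutoff. Summing confidences is only one possible way to do the filtering; it is simply the easiest to show.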

But what I just described as the process in this post took me quite a bit more time than it did for me to extract the side-by-side pair (still tedious but relatively quick) and do a quick visual parsing (really looking for any terms that require more research). Note that while I have no fluency in Spanish I do know a bit about the grammar and thus know how to spot parts-of-speech and change the word order used in Spanish to my normal English and thus find the term-by-term association. This entry was simple to do and the only (slightly) interesting part is that the original ‘gnocchi’ does have a different word in Spanish but ‘gorgonzola’ doesn’t (and as a somewhat interesting question, are these “Italian” words or are they now so incorporated in English, at least by foodies, that we should consider them English words, known linguistically as ‘loanwords’?).

So of the first webpage of pastas this was the most interesting puzzle:

Escudella con sopa de galets, el plato estrella de la Navidad catalana Escudella (in Oxford as -dilla, but some searches appeared with this spelling; is it a typo? here? and on web?) with soup of galets (is this short for galettas?), the star dish of Catalan Christmas

but Oxford has it with a definition (didn’t have translation) in which case it was a specific dish

no, galets appears to be a type of pasta (shells) https://www.tienda.com/products/galets-nadal-pasta-sandro-desii-su-40.html

This is my raw entry. Since escudella and galets appear in the Google Translate output as the same word in English (i.e. not translated, or perhaps there is no translation) this is the type of thing I look for to do more research. When I merely asked Oxford for the translation of escudella it said that was missing. What it does show (helpfully) is close matches, and in this case I tried its suggestion of escudilla (which is bowl and kinda seems to fit this recipe name). So you see the note I made to myself (in Oxford as -dilla, but some searches appeared with this spelling; is it a typo? here? and on web?) but that’s just a start. Since I’ve done this a lot I immediately used Oxford a different way; instead of asking for a translation I asked for the definition (of escudella) and it had this in Spanish (followed by Google’s English):

Plato que consiste en un caldo de carne y hortalizas, colado, en el que se cuece arroz, fideos u otro tipo de pasta; es un plato típico de Cataluña, comunidad autónoma de España.
Plate consisting of a broth of meat and vegetables, strained, in which rice, noodles or other type of pasta are cooked; It is a typical dish of Catalonia, autonomous community of Spain.

Now I could immediately point out that Google’s translation of plato as ‘plate’ is not correct as plato also means ‘dish’ which fits better but that’s the typical kind of digression I get into that just makes posts take even longer.

Now meanwhile I thought I recognized galets. I did a previous post about the menu from a store selling cookies (as a bit of diversity from just restaurant menus). So I double checked by asking Oxford for the Spanish translation of ‘cookie’ (which it also lists as biscuit in British English) and it has galletas (as I thought I recalled). So I thought this might be some colloquial term for cookie.

But now my “translation” ‘bowl with soup of cookies’ is pretty obvious nonsense and so no better than the untranslated correspondence. So, since this is a new source and I’d already discovered I could click on each receta and get a full page explanation (intro to the dish, ingredients, preparation) I began to see the flaws in my attempt to unravel this puzzle. As the recipe page itself is entirely in Spanish I have the same kind of puzzle, i.e. Google again botched some of the translation. But there is enough text, and importantly a picture, that I could try some searching and I found galets as an item I can buy online (I’ve often used this source in this project). These look (in both the recipe picture and the tienda picture) like fairly ordinary pasta shells (I don’t see what’s special about them) but pasta shells are pasta shells (except maybe tiny details) so now I’d know what I am getting if I’d picked this off a menu in a restaurant.

So finally I know both these words don’t have English translations so I’d want a different kind of entry in my corpus of a short description and then potentially a longer one. Thus a diner using my app could learn about this dish.
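As a rough sketch of what that “different kind of entry” might look like, here is a small Python example. The class name, the field names and the gloss function are all hypothetical, and the description text is just my paraphrase of what I found above, not a finished dataset.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    """One corpus entry; every field name here is an assumption."""
    term: str
    translation: Optional[str] = None  # None when English has no equivalent
    short_description: str = ""        # a few words to show next to the menu item
    long_description: str = ""         # fuller explanation for the curious diner


def gloss(entry):
    """Prefer a real translation, then the short description, then the raw term."""
    if entry.translation:
        return entry.translation
    return entry.short_description or entry.term


salsa = Entry(term="salsa", translation="sauce")
galets = Entry(
    term="galets",
    short_description="shell-shaped pasta",
    long_description="Pasta shells cooked in the soup served with escudella, "
                     "the star dish of Catalan Christmas.",
)
escudella = Entry(
    term="escudella",
    short_description="Catalan meat and vegetable broth with rice or pasta",
)

for entry in (salsa, galets, escudella):
    print(entry.term, "->", gloss(entry))
```

The point of the fallback order is that a term with a real translation behaves like any ordinary pair, while a term like galets still gives the diner something useful instead of just being echoed back untranslated.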

So there, you see what I mean. This post has taken me far longer than the original analysis. Yet it’s good (for my purposes, hopefully somewhat interesting to you, Dear Reader) to have this more complete explanation (I can re-read this post someday when I’ve completely forgotten this and have to resolve something in my app). But if I’d simply written this one item in the most brief form (to jog my memory later, plus at least some glue prose to make it read better than my raw notes) I would have gotten this done.

But it also means I’d probably have many more posts which is a mixed benefit as well. So, IOW, there really isn’t a great answer.

So I have a solution. I can use categories to distinguish the posts that are really minimal and that I create almost immediately after doing the work for the corpus. These will really be post “fragments” but at least I get more recorded.

For instance, I was looking at a menu on Friday and its Menu del Dia was for Mother’s Day so I had in mind a post to create on the 5th. But instead I spent most of the day cooking for our Cinco de Mayo feast (and drinking a few too many margaritas). So I never did that post and now the “joke” of it is gone as its timeliness is past.

So I’ll continue to struggle with this, fragmentary and terse posts, or (sometimes too long) complete posts.

Where did I go?

I was generating fairly regular posts but then dropped out of sight for almost two weeks – what happened? Well I’ve been out of town and thus mostly offline, south to Oklahoma. It’s not that Oklahoma doesn’t have the Net – I was just busy and my work on food terms in Spain is on a computer back home so I had nothing new to post.

Oklahoma is a long and not very interesting drive from Nebraska with most of the distance in Kansas. To most people the variation in scenery is so slight they’d say it all looks the same (and it has some of the same dusty and dry character of the part of Spain now along my virtual trek on the Camino). But to those of us starved for something to see there is a difference, even several regional variations (e.g. the Flint Hills) on the drive and it is easier to make that drive with brief excursions off the main route.

I was doing the trip to meet with a new attorney to finally start the process in Oklahoma to transfer my mother’s estate to her heirs. Her/our family has had a farm there for four generations. The farm isn’t much, as a farm. It served, many decades ago, as a subsistence farm for the family with most of its production for the family’s own consumption. Some cream and eggs got sold for cash to buy things. But industrialized agriculture, in the USA, has largely driven this type of farm out of operation. Today it serves just as grazing land for a tenant rancher. Much of the land in the immediate area is abandoned for agriculture.

But today the land grows something else – energy. On our 1/8th section (80 acres, sounds large but that is small in USA) there is one wind turbine from a fairly large wind farm (just like the turbines one sees along the Camino as Spain is more advanced in use of wind power than the USA). It was chugging away most of the time we were there (this is the windy and stormy time of year) and every revolution puts some cash in the pocket of landowners. Wind is new, oil and natural gas are old. The new and sometimes controversial technology of horizontal drilling and fracking has drastically increased production. So there is a new well, over a mile away on the surface, that has sent out its horizontal shafts under our land. And these horizontal wells, with a much larger collection area (than a vertical shaft), are also a nice income.

That is, if I can ever get the deeds settled. Back when the land was just for low value farming the legal standards of ownership records were lower. Today there is more at stake and so the standards are higher. Probably in any multigenerational ownership story, almost anywhere, there are gaps – some probate was never filed with the county clerk, some conveyance deed wasn’t properly signed or dated, or some change in marital status wasn’t recorded, or whatever. Everyone (local) knows Person X owns the land but, if challenged, these claims may not stand up. So therefore I will have substantial legal bills and years of chasing lost documents to ever establish ownership (by my mother) which no one challenges. What fun!

Meanwhile the drive, as I mentioned, is fairly boring so we try to spice it up a bit with geodashing. Once upon a time there was no GPS (at all, then for a while it was only massively expensive military technology). I happened to work next door to Trimble, who developed the first civilian GPS technology (later made more affordable), and so I learned of GPS before most people. So when commercial GPS was new and just barely available to the public it was a novelty and a number of “games” evolved using GPS. Geocaching is the best known. For a while everyone wanted to rush out to those spots on the globe, known as confluences, where the GPS would read XX.0000 and YY.0000.

There are only so many of those and all that could be found have been. So geodashing was developed to create artificial, purely random (and thus sustainable) locations to find, and to make a game out of the search. Why? For fun. What is there? Geocaching goes to some place that, for sure, another person has been (they left the cache there) but geodashing goes to a completely unknown (to outsiders, obviously locals know it) location. The game insists on not trespassing so often the location is not reachable (we must get within 100 meters). So each month when the new dashpoints are published we silly folks doing this game put them on maps and figure out whether they can be reached via public rights-of-way and then, more importantly, if there is any pattern that can allow reaching the most dashpoints with the least driving.
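For the curious, the 100 meter test is easy to sketch in Python with the standard haversine (great-circle) formula. This is only my illustration, not how the game itself scores a visit: the function names are hypothetical, the coordinates are made up, and treating the Earth as a sphere with a 6,371 km radius is an approximation that is plenty good at this scale.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (spherical approximation)


def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def reached(position, dashpoint, limit_m=100.0):
    """True if position is within limit_m meters of the dashpoint."""
    return distance_m(*position, *dashpoint) <= limit_m


# Made-up dashpoint and two made-up spots on a nearby road.
dashpoint = (37.0, -97.0)
close_spot = (37.0005, -97.0)  # roughly 56 meters north of the dashpoint
far_spot = (37.002, -97.0)     # roughly 220 meters north of the dashpoint

print(reached(close_spot, dashpoint))
print(reached(far_spot, dashpoint))
```

So a road that passes 56 meters from the point counts, and one that passes 220 meters away doesn’t, which is exactly the kind of thing we’re squinting at on the maps each month.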

OTOH, when we have a long drive we look for something to break up the monotony by locating nearby dashpoints along the route. The drive from Nebraska to Oklahoma can be done purely on freeways (really limited access multilane highways, as one part, the Kansas Turnpike, is definitely not “free”). It’s really boring to just see 550 miles of pavement. Tourists drive through the midwestern “fly-over” USA states, especially along I-80 in Iowa or Nebraska, hoping to get to the interesting tourist destinations further west, so I-80 looks really monotonous (and is).

But get off the main route, designed for speed, and even in a non-tourist part of the USA interesting things can be found. Before the Interstate highway system drivers were on two-lane roads that deliberately went into every town along the way. Frankly this is a lot like what I see on the Camino, a route that reaches a new small town every few miles. As in the USA there is a parallel high-speed highway to go between the major spots, e.g. Logroño to Burgos, that bypasses all these towns. But the Camino walking moves at a different pace and that is exactly the point.

And it is the same point with geodashing. There is no there-there at a random longitude and latitude (sometimes there actually is). It is the JOURNEY, not the destination. The slogan of geodashing is “getting there is all the fun” and that’s why we crazy people do this. There are surprises everywhere and interesting things one never even knew existed. Sure everyone knows about Yellowstone or Glacier or Grand Canyon or Yosemite but what is in Templeton Iowa or Arthur Nebraska? Scale is everything and that is part of the appeal, to me, of the Camino. When you zip by at 120kph in a car everything outside is a blur, but passing on foot at 5kph (and easy to stop and look around) the world is different. And driving on a farm road (which here look much like most of the Camino route) at 50kph and being able to stop anywhere since it might be hours before another car comes by is a very different way to see the world.

So the route from Nebraska to Oklahoma is really boring, unless you can get off the main road, if only for a bit, and see something you never expected by going to someplace entirely random. There may be huge historical differences between geodashing and a pilgrimage on the Camino but there is also a lot of similarity.