Back to menus; a big project

My primary purpose for this blog is to record my progress in developing an application to translate menus in Spain. I worked diligently on this for about nine months but then got into some side-trips in other projects. But now I’m trying to get back to that primary objective.

For 78 days now I’ve also been trying to actually learn Spanish via the nice online application, Duolingo. While this diverted me from my primary task it has been useful. My sister always thought my idea was silly and that instead I should just learn the language. That’s not a bad idea but it looked harder (and more time consuming) than my primary limited work just to read menus, based on the assumption I’d soon be heading to Spain to tour along the route of the Camino de Santiago. Therefore I needed results sooner than I could learn the language.

To build my application I’d first need a large corpus of terms from menus with accurate English equivalents. To do that I’d import the text from websites into a working document and crunch through all the terms. Often that gave me some interesting observations that I was converting to posts, hopefully also interesting to my readers. Obviously there are going to be mistakes in manually collating data so my corpus needed to be carefully curated, with the terms and my “guesses” at translation with a “confidence” factor. Then via the large corpus I could extract the accurate equivalent Spanish to English translations I’d need for the application.

That’s a long slog so a couple of times I went ahead and created a minimally curated “glossary” which I have as a page here at this site. In my searches I found a number of glossaries, or even dictionaries in Spanish, covering food. Years ago when I first got interested in these I just extracted all the glossaries I could find and manually collated them into a single glossary. It was a mess!

The trouble is that searches for food terms in Spanish yield results that either don’t apply to Spain’s food dialect or are just wrong. After all, any other person who compiles glossaries makes mistakes too. Or I’d make mistakes extracting and collating them. And my lack of any fluency in Spanish meant I often misinterpreted the raw material I was attempting to organize. That previous experience convinced me I needed to be very precise about collating material AND focused on Spain as the source of the raw material, and so my idea about creating a corpus evolved.

But in nearly a year I still don’t have that corpus. And without it I can’t build my application. And in the meantime I needed to get some “drill” code done since I reached the point where I was forgetting more than I was learning. And while Duolingo is fairly good for learning Spanish it’s not as good for repeating previous lessons (and their vocabulary). And repetition is the key to learning a language. So I found myself forgetting vocabulary I’d once before acquired.

So I set out to build a drill application, which has some of the same elements I’d need in the translation application. And like compiling glossaries I’ve done this also, in the past – the first time for Italian food terms. So I’ve built drill programs before with only limited success.

The key to a drill program is to be efficient and force me to do repetitions of the vocabulary I know the least well. That’s harder than it sounds. Plus most of the types of drill I did (glorified flashcards, a common language learning technique) took so much time that as my vocabulary grew my repetition, of any particular word, got less and less frequent. Even with an hour a day I could only repeat a fraction of the vocabulary I’d acquired.

So I had some ideas how to improve this and make the drill more efficient. But I needed data even to do the programming. So I fairly quickly assembled the glossary I posted at this blog without being too concerned about its accuracy.

So with that lengthy background now I can describe what I’ve more recently done and the “big project” I’m now doing. I built my first version of the drill application centered around the Duolingo vocabulary. As I’d do each lesson I would fairly carefully assemble the “database” (a complex XML) to feed the drill program. For my Duo vocabulary that now contains about 1100 “terms” and 1400 “forms” of those terms. By forms I mean the usual four spellings of adjectives (in Spanish both gender and number) and the first set of conjugations for verbs. Getting all that going for Duo vocabulary drills got me a fairly useful and efficient drill program which is helpful as a supplement to Duolingo.

So then using that code and crunching the glossary I’d assembled here I started on the food terms. And that was a bit of a mess because the glossary sucked.

So to fix this I went back to my 30 or so working documents of all the menus I’d processed. Rather than the more difficult chore of extracting material for a well curated corpus I just quickly (in a couple of days) extracted all the accumulated Spanish. That’s a tedious chore but it does reveal some of the problems of getting “raw” material from the websites. Naturally I found lots of spelling mistakes (easier for me to recognize now that I know a little Spanish) but also inconsistencies in gender and sometimes number. Also many words are used very inconsistently when it comes to accents. My Duolingo study also let me learn the rule that accents sometimes change (for real, not typos) in certain circumstances.

So once I’d compiled all my “words” from all menus I had about 10,000 “raw” bits that I was able to clean up, de-duplicate and consolidate (like all the forms of adjectives under a single “term”) and ended up with about 5500 lines.
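The clean-up step described above can be sketched roughly as follows. This is not the code used for this blog, just a minimal illustration under stated assumptions: the function names (`strip_accents`, `clean`, `accent_variants`) and the sample words are made up for the example. It lower-cases and de-duplicates the raw bits, then groups spellings that differ only by accents so they can be reviewed by hand. The letter ñ is kept distinct because it is a separate letter in Spanish, not an accented n.

```python
import unicodedata
from collections import defaultdict


def strip_accents(word):
    """Drop accents and diaeresis marks, but keep ñ as its own letter."""
    word = unicodedata.normalize("NFC", word)
    out = []
    for ch in word:
        if ch in "ñÑ":
            out.append(ch)
            continue
        decomposed = unicodedata.normalize("NFD", ch)
        out.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    return "".join(out)


def clean(raw_bits):
    """Lower-case, collapse whitespace, drop empties and exact duplicates."""
    seen = set()
    result = []
    for bit in raw_bits:
        norm = " ".join(unicodedata.normalize("NFC", bit).lower().split())
        if norm and norm not in seen:
            seen.add(norm)
            result.append(norm)
    return result


def accent_variants(words):
    """Group spellings that differ only by accents, for manual review."""
    groups = defaultdict(set)
    for w in words:
        groups[strip_accents(w)].add(w)
    return {key: sorted(group) for key, group in groups.items() if len(group) > 1}


# Made-up sample input, not taken from the actual menu files.
words = clean(["Jamón", "jamon ", "jamón", "  ", "piña"])
suspects = accent_variants(words)
```

The sketch only flags the accent variants; deciding which spelling is right (or whether both are legitimate) is still a manual job.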

Then in a separate process I took the latest (v3.3) copy of my glossary and then combined that with about six other glossaries. That was a chore and resulted in about 4000 entries.

So then I combined these, all the glossary “words” and all the menu “words”, and started going through all that by hand. I’m now done with everything through M (since I sort all 9000 or so lines into alphabetic order). I’ve done a few hundred “fixes” to my glossary and about 100 additions. But more importantly all those changes are in my XML “database” for the drill program. With a bit of code I can then extract from that XML to create text I can paste into the glossary page here.
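That XML-to-glossary extraction might look something like the sketch below. The layout of the real XML “database” is not shown in this post, so the element and attribute names here (`vocabulary`, `term`, `form`, `spanish`, `english`) are purely hypothetical, as is the output format of one line per term.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout; the real XML "database" is not shown in the post.
SAMPLE_XML = """
<vocabulary>
  <term spanish="rojo" english="red">
    <form>roja</form>
    <form>rojos</form>
    <form>rojas</form>
  </term>
  <term spanish="ajo" english="garlic"/>
</vocabulary>
"""


def glossary_lines(xml_text):
    """Return one 'spanish (forms): english' line per term, sorted."""
    root = ET.fromstring(xml_text)
    lines = []
    for term in root.iter("term"):
        spanish = term.get("spanish", "").strip()
        english = term.get("english", "").strip()
        if not spanish or not english:
            continue  # skip incomplete entries rather than guess
        forms = [f.text.strip() for f in term.findall("form")
                 if f.text and f.text.strip()]
        head = f"{spanish} ({', '.join(forms)})" if forms else spanish
        lines.append(f"{head}: {english}")
    return sorted(lines, key=str.casefold)


print("\n".join(glossary_lines(SAMPLE_XML)))
```

The sort here is a plain case-insensitive one; words that begin with an accented letter would need an accent-aware sort key to land in the expected place.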

So when I’m finally done with all that tedious manual work I can update my glossary and it will be a big change so I’ll make that the v4.0 version which I believe will be quite a bit better than my current v3.3 but not as good as a curated corpus needs to be. And, really my glossary will then mostly contain words that exist in reference sources (several online dictionaries I use) and/or reconciliation with the other glossaries I found.

Please note, therefore, that my work product is fully derivative from many sources plus my editorial work, and thus constitutes “original” work. I’m quite conscious of never (almost never) posting anything in this blog that would violate copyright, i.e. the wholesale use of someone else’s glossary.

And now all my material is synchronized – my XML database for the drill program, my derived glossary with reconciliation to other glossaries or reference sources, and I’m only including terms in either place that I’ve found in menus so my product is more closely aligned with Spain dialect and I can exclude other Spanish food terms.

Now, while that isn’t done, I’m back into the code for my drill program. In the case of the Duolingo vocabulary I feed into the drill program, I (mostly) know that vocabulary by memory. Duolingo is divided into lessons (aka skills) that require 40 actual drills (to pass the skill and unlock the next one) which means about 800 individual drills. At Duolingo I’ve now done 16,843 “XPs” over 31 skills. On average each skill introduces around 30 words (forms actually). So when I do my “refresh my memory” drills with that vocabulary I have relatively few words I ever mark as uncertain, or worse, “I’m wrong” or “I’m clueless” (really forgot). That means all the scoring I’ve done with that vocabulary has relatively few “errors” and my aggregate score on most terms is 100%.

In contrast I’m much worse on my new food vocabulary. As I’d work on menus I’d “learn” many words, but I had almost no repetition of those (the most common words appear on many menus so that was my only repetition) and I’d done none of my own drill. Now that I have something to feed my drill program I’m getting a lot more “bad” scores. That’s good and bad. It’s bad because it means I don’t know those words very well, by memory. It’s good because now the scoring of the drills I record in the XML has a lot more data on misses than the drills on Duolingo vocabulary.

So that means back to programming. How do I consolidate tens of thousands of individual drills into some sort of metric that rates each word in the vocabulary as to how well I know it (and/or don’t confuse similar terms)? Because I want to drill myself on what I know the least. I don’t very much need to drill on carne or agua or cerveza or a few hundred other food words and I don’t want to waste the limited time I have for drills (even less than my free time because drill is tedious and I can only tolerate a certain amount each day). So those are now the algorithms I’m trying to develop so my drill program is even more efficient and therefore more useful.
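One possible shape for such a metric, offered only as a sketch and not as the algorithm actually under development here: treat each word’s drill history as a list of outcomes and take an exponentially weighted average, so recent drills count more than old ones. The outcome labels borrow the categories mentioned earlier (right, uncertain, wrong, clueless), but the numeric values, the `decay` parameter and the function names are all assumptions made up for the example.

```python
# Assumed numeric value for each drill outcome (illustrative only).
OUTCOME_VALUE = {"right": 1.0, "uncertain": 0.5, "wrong": 0.2, "clueless": 0.0}


def knowledge_score(outcomes, decay=0.8):
    """Exponentially weighted average of outcomes, listed oldest first.

    The most recent outcome has weight 1, the one before it `decay`,
    then decay**2, and so on. A word never drilled scores 0.0.
    """
    if not outcomes:
        return 0.0
    weight, total, norm = 1.0, 0.0, 0.0
    for outcome in reversed(outcomes):
        total += weight * OUTCOME_VALUE[outcome]
        norm += weight
        weight *= decay
    return total / norm


def pick_for_drill(history, count):
    """Return the `count` least-known words, lowest score first."""
    ranked = sorted(history, key=lambda w: (knowledge_score(history[w]), w))
    return ranked[:count]


# Made-up histories for three words.
history = {
    "carne": ["right"] * 5,
    "sobrasada": ["wrong", "uncertain"],
    "percebes": [],
}
todo = pick_for_drill(history, 2)
```

With this made-up history, carne scores 1.0 and never makes the short list, which is the point: the limited drill time goes to the words with the weakest recent record.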

So while I thought I’d be done with this by now I have probably another week to finish cleaning up my food vocabulary and enhancing my drill program. But once I’m done with that I can spend 15-30 minutes every day (or most days) so I get more of the food vocabulary into longer-term memory along with a growing Duolingo vocabulary. Thus I’d hope to have reasonable fluency within a few months so soon I may need to head to some Spanish speaking country to test myself.

Now, note, all this is “reading” (and less “writing”) Spanish. Hearing or speaking is an entirely different problem. But without mastery over much of the vocabulary actual conversation is pretty hopeless. I’d originally assumed I’d have no more audible Spanish than a few phrases and the rest I’d do through reading (plenty of time to study a menu, have to be fast to have conversation).

Now, finally, other than relating some hopefully “interesting” tidbits here in the blog, all this I’m just doing for myself, even though I’ve built many software products over my working life. But at least, as a derivative from this work, I do hope to end up with the best glossary for food terms in Spain here at this blog as my contribution to others who might need this.

 


Something different

One problem with a virtual trek is that I don’t get a chance to take my own photos. I can’t post photos of other people so I can only talk about my “trek”. So photos to follow, but a little preface (scroll down if you’re impatient for the good stuff).

Well, actually I do go places. And I take photos. I very much enjoy the posts of loyal readers with fantastic photos, places I’d love to see, but at least I can experience through other people’s postings. So here’s a few to return the favor.

So, I recently got a new computer and I really wanted a new and fresh set of photos for my screen saver on my new large display. So I dug into my archive of over 40,000 photos to pick a few of the best. It was an adventure to look back over almost 20 years and a variety of digital cameras.

And while Spain, my current interest, is not much like Texas, there is some resemblance. When I first moved to Nebraska from the San Francisco Bay Area (Los Altos to be specific) I was really depressed. Within an hour of my old house I could find, even on foot, beautiful country. Within a few more hours I could either be cross-country skiing or sipping wine in Napa or riding my bike along the Pacific Coast. In contrast even 6-8 hours of driving from Omaha it’s still just cornfields. So I went crazy, also given it was winter in Nebraska, and I threw my backpacking gear in the car and headed south. Three days later I found myself in Big Bend National Park in Texas. Now I get to say anything I want about Texas because I was born there in a city called Amarillo, pronounced, needless to say, nowhere near the correct Spanish pronunciation of the adjective, ‘yellow’. Texas is a huge state, probably bigger than Spain, so I’d never been to Big Bend and it was a thrill to visit. Later I convinced my wife that visiting some place where I’d been sleeping in a tent on the ground was still a fun vacation.

Although many of my loyal Readers are not from the USA, you might still know that our insane president (pretender) wants to build a wall along the USA and Mexico border. Actually there is a “barrier” on most of the border except Texas. Folks in Texas hate “eminent domain” so even putting up fences has run into local opposition. But the real “barrier” is nature, fierce, but beautiful.

But far more important, a big chunk of the US/Mexican border is a fantastically beautiful place, either the National Park or the Texas State Park. Twice I’ve visited this area and the second time I had a digital camera so here are some photos to give you a feel for this beautiful place AND how impossible the terrain is for any sort of hordes crossing the border. I’m not sure I’ve seen any border that is LESS possible for easy crossing. And it would be horrible to spoil the beauty of this area with an utterly useless Wall just to make MAGAs in Michigan (who’ve never been anywhere near the border) happy.

So here are my photos, please ENJOY this beautiful place. And for once I can contribute something to see.

Ick. There is something I don’t understand about posting photos. These photos look like a blurry mess, but not what I have in my files (these are originally 15Mpixel files from a Nikon). I’m trying various things to make them look like I see them, not sure what WordPress needs.

Here are a couple of scenic vistas in the general vicinity of the border:

Actually this isn’t quite near the border, it’s the Chisos Basin in Big Bend National Park but that’s where I was for this fantastic sunrise (it is about 5AM and a long exposure). Chisos Basin is the only accommodation in the park and is surrounded by mountains on all sides. The air is incredibly clear, and, of course, dry (it is desert) so sunrises and sunsets are fantastic.

Did I mention you can see the sky here? My photos don’t even come close to the experience you can have, standing in the desert and seeing sky everywhere.

But now we come to the border.

From the US side this is looking north, in Texas State Park, with the Rio Grande behind us.

(Note: these photos look crummy to me, but they’re not all blurry like I see them as I make this post. I guess I don’t understand how to incorporate good photos in WordPress – click on the photo for a better one, but still much lower resolution than my original).

You can just barely see the river here, but this is a hint of surrounding country.

And here it is: the border, the Rio Grande – you can see the streams of immigrants flooding across. They come well equipped with climbing gear.

Again, does that look like the kind of river you’re going to see a migrant caravan of women and children rushing across? Good luck, kids.

A few miles down the river, still rough country – great sightseeing on the highway on the US side, pretty rough country with miles of desert on the Mexican side.

Here the Rio Grande might be easy to cross, but

here, not so much. This is the St Elena Gorge, an awesome cleft with steep cliffs on both sides of the border. When I first went to Big Bend my parents, who were “snowbirds” (people in cold climates with RVs who head to warmer climes near the border) warned me about Mexicans stealing my car. When I saw this gorge my reaction was – GOOD LUCK. A huge expanse of fierce desert to get to this gorge and then technical rock climbing to get to the US side. Hey, anyone intrepid enough to make that journey can steal my car! Needless to say there were no car thieves, nor anyone except USA tourists, anywhere near this spot.

Maybe this crossing is a lot easier, but still seriously demanding of outdoor skills.

And in case these barriers are not discouraging here’s a few other things you would face.

 

Amazing, this guy, about the size of my hand was just sauntering across the highway. Supposedly they’re fairly gentle but I wouldn’t want to put that idea to the test.

And, just more fun

These are called “horse crippler” cactus, and for good reason. Anyone daring this part of the world needs serious boots (and a good eye not to step on these).

A few times in my life I just zipped through the southwestern deserts of the USA but when I finally visited, slowly, on foot, these areas I was stunned at their beauty, something you have to see close up and in sync with nature.

The idea of putting a 10m high wall across this country, despite its stupidity for all the other reasons, is a criminal offense against the sanctity of nature. Spain has its beautiful spots, which I still hope to see, but the USA has fantastic spots as well.

Now, these photos are yucky, so I’m going to see if I can make them look better, more like I see them (I do have a rather good Nikon camera to shoot this stuff, not some two-bit cellphone camera).

Quiero hablar más español

It’s been quite a while since my last post. In addition to all the activities of the holidays I have continued, sporadically, to work on my project that is one of the subjects of this blog. So now I can report some progress.

As a reminder I am (slowly) working my way to develop a mobile application to translate restaurant menus in Spain. To accomplish this I am finding many menus from restaurants in Spain (only Spain to avoid Spanish terms from other Spanish-speaking lands). I translate these using machine translation (mostly Google Translate), then look for discrepancies in that translation and use either online dictionaries or Google searches to make better “guesses” about translation. Often terms on menus are not translated accurately (or at all) by machine translation.

Once I have accumulated enough raw data (a never ending process) I can create a corpus with Spanish terms and the best English translation I can produce with a “confidence” factor (expressed as a probability). Once the corpus is large enough I’ll write code to extract the best food related (and a few other terms) vocabulary with the highest confidence levels of the accuracy of the translation. Once the vocabulary is “complete” (again a never ending process) I can build my application and then test it on all the menus I’ve accumulated. I’ll judge how well I’ve done this by expecting my translation tool to work much better than other machine translations.
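The extraction step could be sketched as below. Everything specific is an assumption for illustration: the corpus is represented as (Spanish term, English guess, confidence) tuples, repeated sightings of the same guess are combined with a “noisy-OR” rule (which treats each sighting as independent evidence), and the 0.8 threshold is arbitrary. The sample rows and their confidence values are invented, not real corpus data.

```python
from collections import defaultdict


def combine(confidences):
    """Noisy-OR: probability that at least one independent guess is right."""
    remaining = 1.0
    for p in confidences:
        remaining *= (1.0 - p)
    return 1.0 - remaining


def best_translations(corpus, threshold=0.8):
    """Keep, per Spanish term, the best-supported English guess above threshold."""
    grouped = defaultdict(lambda: defaultdict(list))
    for spanish, english, p in corpus:
        grouped[spanish][english].append(p)

    vocab = {}
    for spanish, candidates in grouped.items():
        scored = ((english, combine(ps)) for english, ps in candidates.items())
        english, score = max(scored, key=lambda pair: pair[1])
        if score >= threshold:
            vocab[spanish] = (english, score)
    return vocab


# Invented sample rows: (Spanish term, English guess, confidence).
corpus = [
    ("pulpo", "octopus", 0.9),
    ("pulpo", "squid", 0.3),
    ("gambas", "prawns", 0.6),
    ("gambas", "prawns", 0.6),
    ("percebes", "barnacles", 0.4),
]
vocab = best_translations(corpus)
```

In this toy run two moderately confident sightings of gambas combine to 0.84 and make the cut, while the single low-confidence guess for percebes stays out of the vocabulary until more evidence turns up. The independence assumption is generous when several glossaries copy from each other, so a real version would probably need something more conservative.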

Fine, a useful exercise as someday I hope to actually need to do this while touring Spain, an indefinite “wish” for me. Being able to accurately translate menus, as well as having knowledge of Spain’s cuisine I’d be able to wisely select my choices.

But, my sister, who was quite dedicated to mastering Spanish, albeit focused more on Mexican cuisine, was critical of my approach. Instead of just building an application her strong suggestion was merely that I should just become fluent in Spanish. A fine idea, but one I find very challenging.

Several times in my past I’ve attempted (not very vigorously) to learn Spanish. Since I lived much of my life in California some fluency in Spanish is almost a necessity. I first tried, decades ago, using the best technology then available, i.e. cassette tapes and accompanying text. Ugh. That was a bust. Later as computer tutorials became more common I also tried those, initially using DVDs (as the sound source, later just online voice recordings). These attempts all failed for me.

Why? For one thing I’m not very good at foreign languages. While I studied both French and German in several years of school classes I never got very far with those. My first trip to Germany was a joke at how badly I could either speak or hear. My only real exposure to having to use French was in Québec, during the time when speaking French was a strong “political” issue. I had a bit more success with that partly because everyone, e.g. waiters in restaurants, insisted on French. My stumbling attempts were at least considered a sufficiently sensitive effort that I had some success.

But with Spanish I have a different problem. The sounds of the language are much more alien to my ear – I really can’t hear the words, especially since, it seems to me, native speakers speak very fast and to my ear the words are run together. And my attempts at speaking were even worse than my attempts to hear and understand. So this has been very discouraging and so I rejected my sister’s urging to just actually learn the language. Additionally I had the joke running through my head that, after her years of vigorous effort, several other people judged her pronunciation atrocious, barely intelligible to a native Spanish speaker. If she couldn’t do it how could I possibly succeed?

BUT, in my effort to translate menus I’ve also found a serious stumbling block. Even with English menus I often need to have some conversation with the server to really understand the menu. And as I translated more and more menus I found this was even more true in Spain. Certainly discussing food with a knowledgeable server adds to the enjoyment of food (another lesson I learned from my sister who was more skilled at cooking than me and through example demonstrated how dining was more pleasant after discussing menu items in some detail).

So I happened to stumble on a new possible learning method. Just happening on an article on the Net about the best apps for “your new smartphone” (naturally timed with the assumption of Christmas gifts) I discovered Duolingo. Previously I’d done the demos with several of the subscription or purchased online tools with little success. But at least: a) Duolingo was free, and, b) it was available for my phone and so I could do the exercises at any time, not just during some study time while on my computer.

So I downloaded the app (both to phone and multiple computers) and committed myself to really giving an earnest effort to learn at least some basic Spanish. Now, as best I know, traveling in Spain in the larger cities, especially those popular with tourists, probably doesn’t require speaking or hearing Spanish. When I visited Portugal I knew zero Portuguese but managed to get by OK (with some help from hotel staff making phone calls for me). And I managed to get by in both Japan and China, although with considerable help from the people I was visiting.

But my interest in visiting Spain is out in the countryside, initially focusing on the Camino de Santiago (the French route). Now I’m looking more at the Del Norte route since that part of Spain is more appealing to me than the dull plodding through country that looks a bit too much like the Great Plains or Central Valley of California. In such areas I would expect that at least some minimal conversational skill would be necessary. My hope would be: a) I could ask Spanish speakers to speak more slowly and thus hear each word, and, b) that my poor pronunciation wouldn’t prevent them from (mostly) understanding me.

So I’ve now worked as hard as I can on Duolingo. I strongly recommend this for anyone following my blog who might have the same need, especially as it is free (gracias to the community who create these lessons). I’ve made it through 12 days and 12 of the lessons. Duolingo requires a LOT of repetition and thus this forces me to work hard enough at estudio that I actually have made some progress.  Even the sentence I used as the title of this post would have been impossible for me prior to Duolingo.

In the first part of each exercise Duolingo introduces one to vocabulary (and without the more academic approach to grammar, i.e. simple conjugation of verbs). Then the exercises move more and more to responding to spoken phrases or sentences by: a) writing what was said in English, and, b) much harder, writing what was said in Spanish. Each exercise gets steadily harder, making it difficult to “guess” and thus requiring actually learning something, especially when one has to actually type the Spanish (from an utterance) and be picky about getting gender and verb conjugation right. The sheer repetition is working for me.

Despite my best progress ever attempting to learn Spanish I still find it difícil to “hear” the utterance spoken at full speed. I often either cannot hear the spaces between words or miss subtle bits (I really have trouble hearing una vs un). But since I must get every drill question right before I can proceed I muddle through. So thus far Duolingo reports I’ve now encountered 308 words (many useless for my purpose, also they count each version of a verb as a separate word). As far as verbs go I’m still only in the present tense and with the singular persons (figuring out that usted takes third person verb forms like él or ella was fun since Duolingo mostly uses the informal second person tú as ‘you’, which often would be rude for me to use in conversation).

While Duolingo focuses on conversation instead of the typical more “academic” language study (all the grammar details, especially conjugations) I’ve done more exploration with other tools (especially spanishdict.com and Wikipedia) to go beyond the Duolingo simple lessons. I’m accumulating some of my own “lessons” to supplement the Duolingo lessons.

Now another challenge for me is that I’ve also learned, in past language learning efforts, that I’m fairly good at short-term memory. So while I’m intensely involved I learn to recognize many words. Unfortunately weeks later I’ve forgotten most of those. So, with Duolingo I actually repeat finished exercises to continue the repetition, which is key.

BUT, repeating everything is time-consuming and not that helpful. The real repetition I need to do is the vocabulary (or sometimes grammar) that I do badly. So now I’m thinking about another bit of programming for my own learning tool.

Once before I built a fairly complex bit of code to extend my English vocabulary. Using something built into Kindle I would mark English words that I either didn’t know at all (like reading more “academic” texts that use more esoteric vocabulary) or that I wasn’t really sure about. Kindle had a drill application that accumulated the words I’d mark as I encountered them in some book. But the Kindle drill, like Duolingo, wasn’t very “smart” about focusing my drill time on the words that gave me the most trouble. So in my own app I developed a scoring system that adjusted my drill to the words I most often missed and also then made sure all but the easiest (for me) words were at least repeated some. I spent a lot of time tuning how that algorithm worked but never was completely satisfied with it.
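A scoring scheme along those lines might be sketched like this. It is only an illustration, not a reconstruction of that old app: the per-word record of (correct, attempts), the `floor` weight that keeps every word in rotation, and the `easy_cutoff` that retires the easiest words are all assumed names and values. Words are drawn at random in proportion to their weight.

```python
import random


def drill_weights(stats, floor=0.05, easy_cutoff=0.95, min_attempts=5):
    """Weight each word by its miss rate, plus a small floor.

    Words answered correctly at least `easy_cutoff` of the time over at
    least `min_attempts` drills get weight 0 and drop out of rotation.
    """
    weights = {}
    for word, (correct, attempts) in stats.items():
        accuracy = correct / attempts if attempts else 0.0
        if attempts >= min_attempts and accuracy >= easy_cutoff:
            weights[word] = 0.0
        else:
            weights[word] = (1.0 - accuracy) + floor
    return weights


def pick_session(stats, size, seed=None):
    """Draw `size` words for one session, with replacement, by weight."""
    weights = drill_weights(stats)
    words = list(weights)
    if not words or sum(weights.values()) == 0:
        return []  # nothing left that needs drilling
    rng = random.Random(seed)
    return rng.choices(words, weights=[weights[w] for w in words], k=size)


# Made-up stats: word -> (times correct, times attempted).
stats = {"sesquipedalian": (1, 5), "obdurate": (9, 10), "house": (20, 20)}
session = pick_session(stats, 50, seed=1)
```

Because the draw is with replacement, a troublesome word can come up several times in one session, which is arguably the desired effect; tuning `floor` and `easy_cutoff` is where the real time would go.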

So with Duolingo as a model (incomplete for what I need) and all my past efforts at learning languages I soon will begin to build my study app (a fancy version of the classic flashcards, especially for verbs and gender). I can move all my Duolingo vocabulary to that app, plus much of what I’ve accumulated from menu study, plus just grabbing more words not found in either source from either: a) various lists I’ve found of the “most common” Spanish words, or, b) from going through a couple of dictionaries, tourist phrase books and grammar books I’ve purchased for my Kindle.

Eventually I would expect my drill app to be sufficient to potentially get by in parts of Spain where I might not find any English speakers. One thing I have learned from my foreign travel is that travel itself (public transportation, getting directions) often requires speaking to people who don’t know English (say, unlike typical tourist destinations, i.e. city hotels, museums and restaurants).

But all this is just a start. I know, largely from my experience in Québec that “immersion” is the real way to learn a language. To be someplace where there is no English mandates that I at least stumble through some sort of conversation to get what I need. Mi esposa loved her weeks in Oaxaca and wants to go back (which I’ve resisted) so perhaps I’ll give in and make the trip she wants as preparation for Spain (just as Québec can be a shorter preparation trip for going to France).

So, I won’t belabor this point much more in posts since I’ve focused this blog on food in Spain and the Camino. My efforts to learn a language are probably even more boring to my readers. But I will supplement some of my posts purely about food terms with a bit more of the conversational stuff I pick up through this other study.

 

 

Lost a post

This is primarily a note to myself to punctuate the flow of this blog to reflect some history.

I’m disappointed that I managed to delete a post I had mostly composed. Sometimes, and usually for more difficult posts, I work offline in MSWord to compose my posts. The way WordPress works isn’t that helpful for posts that take a while to compose or when I need to do more research for the subject of the post. And in the case of my lost post I was doing a lot of background research.

Previously I’d complained (in other contexts) about losing posts or especially comments in WordPress. Since the text editor running in a browser can’t access the local file system no temporary saves can be made (for posts, but not comments, WordPress does some saves to its cloud). I’ve lost enough that way to have adopted working offline (which can do temp saves) but even that is subject to human error.

So, what the post was about was restaurants in Astorga (surprised to find so many) and a local dish that is quite popular and featured by many of those restaurants, cocido maragato. I had done a lot of research on all this which was contained in the incomplete post I lost (also the menu translations that were background source information for the post). So you can go look at Astorga yourself (via Google Maps) and start with this link for cocido maragato.

Then, poof, in too much of a hurry I deleted the entire post AND the multiple menus I’d extracted and analyzed. What happened was that in attempting to add a new menu strange stuff showed up when I pasted the content from the website into MSWord. So I thought I was reversing that paste but instead deleted all the text in the file. Later in the evening while doing shutdown of my computer I got the notification from MSWord whether I wanted to save or not and, stupidly, just said yes without checking what changes it was trying to save. Poof, now I have an entirely empty file that previously had been many pages long.

This was quite discouraging to lose hours of work so I’m doing this notification post to kind of purge my despair and thus get back to work again on menus. And then make some new real posts.

 

A blogging dilemma

I’m using this blog (partially) to “document” interesting tidbits I encounter while doing research for my anticipated smartphone app to translate menus in Spain. That app needs to have a comprehensive and accurate dataset to use in the translation, not just the equivalent English term (which doesn’t always exist) but also some description. For example, what is sobrasada? Yes, it’s ‘sausage’ but saying that (or even ‘spicy pork sausage’) doesn’t tell you very much.

So I’m using various sources to build up a “big data” corpus which will have translation errors and other errors. But algorithmically I can extract from that corpus what I’ll need to power the app. But I have to build that corpus manually, often exploring “puzzles” I find in trying to figure out a proper equivalent in English for some culinary item I find in Spain (btw, I am focusing on Iberian Spanish and trying to prevent terms only found (or used differently) in the New World from defocusing my corpus).

So I’m doing several things with these posts. First they are a kind of journal (or lab notebook) for various translation/description puzzles I try to solve. While I have many MSWord files with the raw work the blog posts highlight some interesting (at least to me) bits. Second by writing for potential readers I have to work a bit harder to try to have my posts accurate and at least somewhat coherent (instead of the real-time stream-of-consciousness in my raw material). This more careful writing makes the posts better but does have a real downside – it’s SLOW. It might not seem like it to you, Dear Reader, but I probably spend more time writing a post about something interesting in a menu than it took me to decipher the entire menu. So at some point the blogging gets in the way of my work.

But the real “dilemma” I have is that I just don’t get the posts done at anywhere near the rate I’m discovering the tidbits I want to write about. And days later when I go back over my raw data I often can’t recreate my thoughts, or discover I forgot to include links or definitions or whatever, and don’t much feel like repeating my work.

My posts are fairly long which is good and bad. It’s good because I try to weave multiple points into a post, often with some background research. It’s bad, because the posts are probably too long for most readers’ attention spans and because I don’t get them done.

So every now and then I’m tempted to do short posts, literally for each situation I encounter, rather than trying to organize multiple examples into a single post.

For instance, I’ve started looking at a new source. Previously I’d used menus I could extract from restaurant websites along the course of the Camino de Santiago, and several online glossaries and dictionaries. But I’d also stumbled on many sites (focused on Spain and entirely in Spanish) for recetas (recipes). These are more tedious to process but often contain information I don’t find elsewhere and therefore can stuff in my corpus so potentially less frequently used (in menus) terms are still incorporated.

So I just started a small trial to look at this recipe site. Under its recetas tab it has 14 categories, and under Pasta y Arroz (pasta and rice) there are 15 webpages with about 12-16 recetas per page. IOW, this is a lot. And every receta is presented on the webpage as a caption (to a photo) where I can use Google Translate and then manually produce a side-by-side Spanish and English pair, such as:

Ñoquis de calabaza y boniato con salsa de gorgonzola | Pumpkin and sweet potato gnocchi with gorgonzola sauce

For this I’d extract for my corpus ñoquis (gnocchi), calabaza (pumpkin), boniato (sweet potato), salsa (sauce), and gorgonzola (gorgonzola). If I double check these term associations by looking in the Oxford dictionary or the DLE (more authoritative, but harder to use than Oxford) I could add these associations to my corpus with higher confidence levels. IOW, mistakes are bound to get into the corpus without a lot of checking, but I’m also hoping the “big data” type filtering will eliminate the spurious pairs.
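To make the idea of confidence levels plus filtering a bit more concrete, here is a rough sketch in Python. Everything in it is an assumption for illustration only: the function names, the numeric confidence scale, the threshold, and the sample entries (including the competing guess of ‘gourd’) are hypothetical and not the actual format of my corpus.

```python
from collections import defaultdict


def new_corpus():
    # spanish term -> english candidate -> list of confidence scores
    return defaultdict(lambda: defaultdict(list))


def add_pair(corpus, spanish, english, confidence):
    """Record one observed Spanish/English pair with a confidence score."""
    corpus[spanish.strip().lower()][english.strip().lower()].append(confidence)


def filter_corpus(corpus, min_total=1.0):
    """Keep, for each Spanish term, the candidate with the highest summed
    confidence, and only if that sum reaches min_total."""
    kept = {}
    for spanish, candidates in corpus.items():
        totals = {english: sum(scores) for english, scores in candidates.items()}
        english, total = max(totals.items(), key=lambda item: item[1])
        if total >= min_total:
            kept[spanish] = english
    return kept


corpus = new_corpus()
add_pair(corpus, "calabaza", "pumpkin", 0.5)      # machine translation only
add_pair(corpus, "calabaza", "pumpkin", 1.0)      # checked in a dictionary
add_pair(corpus, "calabaza", "gourd", 0.5)        # hypothetical competing guess
add_pair(corpus, "boniato", "sweet potato", 1.0)  # checked in a dictionary
add_pair(corpus, "galets", "cookies", 0.25)       # low-confidence guess

best = filter_corpus(corpus)
print(best)
```

The point of summing scores is that a pair seen many times, or checked against a dictionary, outweighs a one-off guess, so a weak entry like the last one simply never reaches the threshold.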

But what I just described as the process in this post took me quite a bit more time than it did for me to extract the side-by-side pair (still tedious but relatively quick) and do a quick visual parsing (really looking for any terms that require more research). Note that while I have no fluency in Spanish I do know a bit about the grammar and thus know how to spot parts-of-speech and change the word order used in Spanish to my normal English and thus find the term-by-term association. This entry was simple to do and the only (slightly) interesting part is that the original ‘gnocchi’ does have a different word in Spanish but ‘gorgonzola’ doesn’t (and, as a somewhat interesting question, are these “Italian” words, or are they now so incorporated in English, at least by foodies, that we should consider them English words, known linguistically as ‘loanwords’?).

So of the first webpage of pastas this was the most interesting puzzle:

Escudella con sopa de galets, el plato estrella de la Navidad catalana | Escudella (in Oxford as -dilla, but some searches appeared with this spelling; is it a typo? here? and on web?) with soup of galets (is this short for galettas?), the star dish of Catalan Christmas

but Oxford has it with a definition (didn’t have translation) in which case it was a specific dish

no, galets appears to be a type of pasta (shells) https://www.tienda.com/products/galets-nadal-pasta-sandro-desii-su-40.html

This is my raw entry. Since escudella and galets appear in Google Translate as the same words in English (i.e. not translated or perhaps there is no translation) this is the type of thing I look for to do more research. When I merely asked Oxford for the translation of escudella it said that was missing. What it does show (helpfully) is close matches, and in this case I tried its suggestion of escudilla (which is bowl and kinda seems to fit this recipe name). So you see the note I made to myself (in Oxford as -dilla, but some searches appeared with this spelling; is it a typo? here? and on web?) but that’s just a start. Since I’ve done this a lot I immediately used the Oxford a different way; instead of asking for translation I asked for definition (of escudella) and it had this in Spanish (followed by Google’s English):

Plato que consiste en un caldo de carne y hortalizas, colado, en el que se cuece arroz, fideos u otro tipo de pasta; es un plato típico de Cataluña, comunidad autónoma de España. | Plate consisting of a broth of meat and vegetables, strained, in which rice, noodles or other type of pasta are cooked; It is a typical dish of Catalonia, autonomous community of Spain.

Now I could immediately point out that Google’s translation of plato as ‘plate’ is not correct as plato also means ‘dish’ which fits better but that’s the typical kind of digression I get into that just makes posts take even longer.

Now meanwhile I thought I recognized galets. I did a previous post about the menu from a store selling cookies (as a bit of diversity from just restaurant menus). So I double checked by asking Oxford for the Spanish translation of ‘cookie’ (which it lists also as biscuit in British English) and it has galletas (as I thought I recalled). So I thought this might be some colloquial term for cookie.

But now my “translation” ‘bowl with soup of cookies’ is pretty obvious nonsense and so no better than the untranslated correspondence. So, since this is a new source and I’d already discovered I could click on each receta and get a full page explanation (intro to the dish, ingredients, preparation) I began to see the flaws in my attempt to unravel this puzzle. As the recipe page itself is entirely in Spanish I have the same kind of puzzle, i.e. Google again botched some of the translation. But there is enough text and importantly a picture that I could try some searching and I found galets as an item I can buy online (I’ve often used this source in this project). These look (in both the recipe picture and the tienda picture) like fairly ordinary pasta shells (I don’t see what’s special about them) but pasta shells are pasta shells (except maybe tiny details) so now I’d know what I am getting if I’d picked this off a menu in a restaurant.

So finally I know both these words don’t have English translations so I’d want a different kind of entry in my corpus of a short description and then potentially a longer one. Thus a diner using my app could learn about this dish.

So there, you see what I mean. This post has taken me far longer than the original analysis. Yet it’s good (for my purposes, hopefully somewhat interesting to you, Dear Reader) to have this more complete explanation (I can re-read this post someday when I’ve completely forgotten this and have to resolve something in my app). But if I’d simply written this one item in the most brief form (to jog my memory later, plus at least some glue prose to make it read better than my raw notes) I would have gotten this done.

But it also means I’d probably have many more posts which is mixed benefit as well. So, IOW, there really isn’t a great answer.

So I have a solution. I can use categories to distinguish the posts that are really minimal and that I create almost immediately after doing the work for the corpus. These will really be post “fragments” but at least I get more recorded.

For instance, I was looking at a menu on Friday and its Menu del Dia was for Mother’s Day so I had in mind a post to create on the 5th. But instead I spent most of the day cooking for our Cinco de Mayo feast (and drinking a few too many margaritas). So I never did that post and now the “joke” of it is gone as its timeliness is past.

So I’ll continue to struggle with this, fragmentary and terse posts, or (sometimes too long) complete posts.

Where did I go?

I was generating fairly regular posts but then dropped out of sight for almost two weeks – what happened? Well I’ve been out of town and thus mostly offline, south to Oklahoma. It’s not that Oklahoma doesn’t have the Net – I was just busy and my work on food terms in Spain is on a computer back home so I had nothing new to post.

Oklahoma is a long and not very interesting drive from Nebraska with most of the distance in Kansas. To most people the variation in scenery is so slight they’d say it all looks the same (and it has some of the same dusty and dry character of the part of Spain now along my virtual trek on the Camino). But to those of us starved for something to see there is a difference, even several regional variations (e.g. the Flint Hills) on the drive and it is easier to make that drive with brief excursions off the main route.

I was doing the trip to meet with a new attorney to finally start the process in Oklahoma to transfer my mother’s estate to her heirs. Her/our family has had a farm there for four generations. The farm isn’t much, as a farm. It served, many decades ago, as a subsistence farm for the family with most of its production for the family’s own consumption. Some cream and eggs got sold for cash to buy things. But industrialized agriculture, in the USA, has largely driven this type of farm out of operation. Today it serves just as grazing land for a tenant rancher. Much of the land in the immediate area is abandoned for agriculture.

But today the land grows something else – energy. On our 1/8th section (80 acres, sounds large but that is small in USA) there is one wind turbine from a fairly large wind farm (just like the turbines one sees along the Camino as Spain is more advanced in use of wind power than the USA). It was chugging away most of the time we were there (this is the windy and stormy time of year) and every revolution puts some cash in the pocket of landowners. Wind is new, oil and natural gas are old. The new and sometimes controversial technology of horizontal drilling and fracking has drastically increased production. So there is a new well, over a mile away on the surface, that has sent out its horizontal shafts under our land. And these horizontal wells, with a much larger collection area (than a vertical shaft), are also a nice income.

That is, if I can ever get the deeds settled. Back when the land was just for low value farming the legal standards of ownership records were lower. Today there is more at stake and so the standards are higher. Probably in any multigenerational ownership story, almost anywhere, there are gaps – some probate was never filed with the county clerk, some conveyance deed wasn’t properly signed or dated, or some change in marital status wasn’t recorded, or whatever. Everyone (local) knows Person X owns the land but, if challenged, these claims may not stand up. So therefore I will have substantial legal bills and years of chasing lost documents to ever establish ownership (by my mother) which no one challenges. What fun!

Meanwhile the drive, as I mentioned, is fairly boring so we try to spice it up a bit with geodashing. Once upon a time there was no GPS (at all, then for a while it was only massively expensive military technology). I happened to work next door to Trimble, who developed the first civilian GPS technology (later made more affordable), and so learned of GPS before most people. So when commercial GPS was new and just barely available to the public it was a novelty and a number of “games” evolved using GPS. Geocaching is the best known. For a while everyone wanted to rush out to those spots on the globe, known as confluences, where the GPS would read XX.0000 and YY.0000.

There are only so many of those and all that could be found have been. So geodashing was developed to create artificial and thus sustainable purely random locations to find. And to make a game out of the search. Why? For fun. What is there? Geocaching goes to some place, for sure, that another person has been (they left the cache there) but geodashing goes to a completely unknown (to outsiders, obviously locals know it) location. The game insists on not trespassing so often the location is not reachable (we must get within 100 meters). So each month when the new dashpoints are published we silly folks doing this game put them on maps and figure out whether they can be reached via public right-of-ways and then, more importantly, if there is any pattern that can allow reaching the most dashpoints with the least driving.
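For the curious, the “within 100 meters” rule is easy to check from two GPS readings. The following is only a sketch using the standard haversine (great-circle) formula on a spherical Earth; the function names and the sample coordinates are made up for illustration and are not from the game’s own software.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; a spherical approximation


def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def reached(dashpoint, position, limit_m=100.0):
    """True if position is within limit_m meters of the dashpoint."""
    return distance_m(*dashpoint, *position) <= limit_m


# Hypothetical dashpoint and two hypothetical places where the car stopped.
dashpoint = (41.0, -96.0)
print(reached(dashpoint, (41.0005, -96.0)))  # roughly 56 m north of the point
print(reached(dashpoint, (41.002, -96.0)))   # roughly 222 m north of the point
```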

OTOH, when one has a long drive we look for something to break up the monotony by locating nearby dashpoints along the route. The drive from Nebraska to Oklahoma can be done purely on freeways (really limited access multilane highways as one part, the Kansas Turnpike, is definitely not “free”). It’s really boring to just see 550 miles of pavement. Tourists drive through the midwestern “fly-over” USA states, especially along I-80 in Iowa or Nebraska, hoping to get to the interesting tourist destinations further west, so I-80 looks really monotonous (and is).

But get off the main route, designed for speed, and even in a non-tourist part of the USA interesting things can be found. Before the Interstate highway system drivers were on two-lane roads that deliberately went into every town along the way. Frankly this is a lot like what I see on the Camino, a route that reaches a new small town every few miles. As in the USA there is some parallel high-speed highway to go between the major spots, e.g. Logroño to Burgos, that bypasses all these towns. But the Camino walking moves at a different pace and that is exactly the point.

And it is the same point with geodashing. There is no there-there at a random longitude and latitude (sometimes there actually is). It is the JOURNEY, not the destination. The slogan of geodashing is “getting there is all the fun” and that’s why we crazy people do this. There are surprises everywhere and interesting things one never even knew existed. Sure everyone knows about Yellowstone or Glacier or Grand Canyon or Yosemite but what is in Templeton Iowa or Arthur Nebraska? Scale is everything and that is part of the appeal, to me, of the Camino. When you zip by at 120kph in a car everything outside is a blur, but passing on foot at 5kph (and easy to stop and look around) the world is different. And driving on a farm road (which here look much like most of the Camino route) at 50kph and being able to stop anywhere since it might be hours before another car comes by is a very different way to see the world.

So the route from Nebraska to Oklahoma is really boring, unless you can get off the main road, if only for a bit, and see something you never expected by going to someplace entirely random. There may be huge historical differences between geodashing and a pilgrimage on the Camino but there is also a lot of similarity.

 

Post formatting problem found and fixed

I thought I was going to have to do a bunch of tedious work here at WordPress.com and then I discovered the problem.

I had a post (and thought there was more than one) where the entire body of the post had been converted to italics. This is bad since I make an effort to clearly mark Spanish (or other non-English) terms in italics with the English part of the post in non-italics. I noticed at least one post “screwed up”.

I tried various things to recreate the problem and failed to find what was causing this. The body of the post looks fine in the WordPress WYSIWYG editor but is wrong when viewed. I thought I’d have to repost a bunch of posts to correct this and that was going to mess up the history of this blog. But better to have the formatting of the posts correct.

So I started with the most recent post that had this problem. In one window I’d have the bad post open so I could copy and paste its text to a new post. A pain but the only way I thought I could fix it.

Then I saw the problem.

WordPress.com’s editor has a toolbar to select italics for some text in the body of the post. BUT that doesn’t work on the title. So being familiar with direct editing of HTML I used the <i>word</i> in the title, which works to get the Spanish word in italics in the title.

BUT, what if one forgets to include the </i> in the title?

Well, that messes up the HTML WordPress.com generates when they display the post and so italics is turned on but never off, so it applies to the entire post.

So mystery solved and a few posts are now repaired and in their proper (i.e. original) sequence in the blog.

Whew! Glad I spotted this and now know what I have to avoid.
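Since I type the HTML in titles by hand, a small check before publishing could catch this. Below is a minimal sketch in Python using the standard library’s html.parser; the class and function names are my own invention for illustration, and it has nothing to do with how WordPress.com itself handles titles.

```python
from html.parser import HTMLParser

# Inline formatting tags that would "leak" into the post body if left open.
INLINE_TAGS = ("i", "em", "b", "strong")


class OpenTagTracker(HTMLParser):
    """Tracks inline formatting tags that were opened but never closed."""

    def __init__(self):
        super().__init__()
        self.open_tags = []

    def handle_starttag(self, tag, attrs):
        if tag in INLINE_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Close the most recently opened tag of this kind, if there is one.
        for index in range(len(self.open_tags) - 1, -1, -1):
            if self.open_tags[index] == tag:
                del self.open_tags[index]
                break


def unclosed_tags(title):
    """Return the list of formatting tags left open in a title string."""
    tracker = OpenTagTracker()
    tracker.feed(title)
    tracker.close()
    return tracker.open_tags


print(unclosed_tags("What is <i>sobrasada</i>?"))  # properly closed
print(unclosed_tags("What is <i>sobrasada?"))      # forgot the closing tag
```

An empty list means the title is safe; anything else names the tag that would spill over into the whole post.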

Great posts from northern Spain

I just spent a while looking for every search term I could think of to find other people’s stories about Spain, especially northern Spain and Basque Country and my big interest FOOD. Many of the posts were really fantastic, great stories, wonderful photos, and lots for me to learn. I dream of doing some of the things the people making these posts are actually doing (and eating) so at least I get to share via their adventures. I’m doing my “virtual hike” (see older posts) but many people are out there on the Camino and seeing the real thing. At least I get to ride along.

I’ve always gotten a mixed message about the food. Some find it terrific and exciting, others (to say the least) not so much. I recognize that Italy and France have gotten all the attention (and their due) and when it comes to “Spanish” food the spicy and innovative dishes of the New World get lots of attention (and their due). But such a large peninsula with vast amounts of coastline, all sorts of climate zones, wonderfully interesting different languages and cultures within one country and long culinary traditions (as well as bastardized food of the modern era and over-popularized tourist temptation) is very exciting to me.

The stories of the people who have actually been there are just very tempting, especially in this age of sharing, virtually, with others, through photos and other media. I’m doing my own project here and it’s the best substitute (in snowy and freezing Nebraska) for actually being there, so thanks to all those who share their stories.

 

Do I care if anyone is reading this?

Of course, but not very much.

I’ve done several blogs before. Most have gotten more (essentially) random hits than this one which is a bit surprising. It’s not like I expected this to be a trending hashtag, destined for viral spread, but the response has been underwhelming. How can a bunch of political rants get more attention than this (perhaps because the destruction of the U.S. by Trump is more important than Spanish food terminology – you think)?

It’s a fair amount of work, in comparison to just keeping some notes and/or some thoughts about interesting things I’m discovering doing this project. But writing them up pushes me a bit. Even if you, Dear Reader, aren’t really out there you could be and I’d feel stupid to write something public that is badly done or especially wrong.

At the same time I’m just having fun with this project and using the observations I make in blog posts to both keep me motivated, keep me sharp, and looking for something interesting in what is otherwise a tedious process. Who knows if I’ll ever finish this and end up with some super app (Android only, sorry Apple makes it too difficult for app developers) to assist real people in real restaurants in real Spain.

But just thinking there are readers out there keeps me on my toes. Unlike our idiot president I can’t (won’t) just say any stupid thing I think about. I do, in some cases, think I’m being very clever about figuring stuff out. But then, unlike our insane president, I discover I’m wrong. Hey, being wrong is part of life and saying stuff in blog posts that isn’t quite correct is to be expected BUT the important point is to try – at least do some fact checking, at least try to be logical and consistent, at least try to be CORRECT.

I believe that nothing ever disappears in the digital world and like the real world every now and then works that attracted little attention in their day end up, eventually, having some value. Months into this project I’ve done a lot of searches to learn origins and/or definitions of terms, to disambiguate similar but not quite identical terms, AND, thus far, I’ve not found anyone else out there doing that. I have no doubt that someday an app will exist that will actually correctly and usefully translate menus, probably for most countries, not the literal and approximate (though still somewhat impressive) stuff existing software is doing – NO, something way better than that which is what I aspire to accomplish. But, as I mention in my About, I am not a likely candidate to solve this problem. Some real foodie, especially one from Spain’s traditions with Spanish fluency, could do this a lot better, but, thus far, I don’t see any such people stepping up to the plate.

So I’m on a virtual trek across Spain and I have (mostly) virtual readers of these posts. But that doesn’t mean the energy I use on a treadmill isn’t real, even though it’s not on the Camino, and it doesn’t mean that having some discipline to write posts carefully because I might have readers isn’t also real.

So therefore thank you virtual Dear Readers – I know you’re really out there and I will continue to find every interesting thing I can about food in Spain and how to interpret the descriptive language into something meaningful for English speakers with some amount of foodie knowledge.

A couple of interesting new sources

For the most part I started collecting my corpus of dual Spanish (Spain) / English words or phrases from menus I find online of restaurants that are definitely in Spain (so avoid other variations of Spanish in other parts of the world). It’s a tedious process to dig out the menus and create side-by-side tables in MSWord. But the slow and tedious process also allows me to learn (i.e. actual human intelligence vs Google’s AI approach) something that I’d miss with a more automated process.

And as I’ve mentioned my choice of restaurants to research comes from my virtual tour of the Camino de Santiago where I plot my cumulative mileage on a treadmill in my basement to actual waypoints along the trail. Given Google does a nice job of annotating various points of interest, esp. restaurants, I can find those that have menus online.

Fine, but recently I realized I can expand my sources for the corpus a bit more. Just out of curiosity I explored a link to a large grocery chain (BM SUPERMERCADOS) in Spain that happened to have an outlet in Estella. Exploring that website I found the Compra Online link (Google translates to ‘online shopping’). And that part of the website has a large list of products one can purchase online (usually with pictures; and in categories) so a side-by-side translation corpus can be created, but also some brand names can be learned to subtract out of other menus where the brand name doesn’t translate and it is therefore confusing what it means.

But then I found something even more interesting, again by accident. This is a real jewel, https://www.gallinablanca.es/recetas/. This is a large collection of recipes (recetas) which means lots of instruction of cooking terms plus lots about ingredients.  I’ve only just begun to explore this site but I also found it has a Diccionario (I think you can guess this as a cognate) truly a dictionary in that you click a word and a definition pops up, in Spanish (no English and Google Translate doesn’t work in these popups, so lots of fun to copy-and-paste the definition into a translation site). The website is produced by Gallina Blanca, which appears to be the maker (or brand) of various packaged food products which are also on sale at this site. There is a lot of food information here – too bad they don’t do an English version of the site so I’d get a better translation than Google. It’s a huge site as witnessed by its search results for ‘huevos’, 7,909 results!

And finally (and I’ll do a separate post on this) I found some food terminology that isn’t directly related to menus but can be used to supplement my corpus. Juice&World in Villatuerta is the manufacturer and distributor of various bottled drinks and they have their product list in both Spanish and English so I can obtain their translations (which, btw, doesn’t guarantee they do it any better than Google but hopefully they do). But you get things like this to cut up to put in the side-by-side corpus:

De la mezcla de zumo de lima, naranja y limón, con un toque de hierbabuena y menta, hemos creado esta bebida sin alcohol dando un estilo personal a la tradicional bebida cubana | We have created this non-alcoholic drink from a mixture of lime, orange and lemon juice with a touch of spearmint and mint to give a personal style to the traditional Cuban drink

Now even though I don’t know Spanish I’ve done enough fiddling to figure out how to associate bits of the Spanish with their connected bit of English, like (easy) lima (lime, obvious cognate), naranja (orange, I happen to remember that) and limón (lemon, obvious cognate). But less obvious is hierbabuena which translates to spearmint even though spanishdict.com merely has its translation as mint because the y menta is the clue to tie to and mint in the translation and thus deduce spearmint as the word before y.
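That last step, pinning down the one leftover word once everything around it is matched, is really just elimination. Here is a toy sketch of it in Python; the function name is hypothetical and the term lists are hand-picked from the sentence above, so this is an illustration of the reasoning rather than anything that actually parses the text.

```python
def deduce_by_elimination(spanish_terms, english_terms, known):
    """If exactly one Spanish term and one English term remain unmatched
    after applying the known pairs, assume they go together."""
    remaining_spanish = [term for term in spanish_terms if term not in known]
    used_english = {known[term] for term in spanish_terms if term in known}
    remaining_english = [term for term in english_terms if term not in used_english]
    if len(remaining_spanish) == 1 and len(remaining_english) == 1:
        return {remaining_spanish[0]: remaining_english[0]}
    return {}  # too ambiguous to guess


# Ingredient words pulled by hand from the drink description.
spanish_terms = ["lima", "naranja", "limón", "hierbabuena", "menta"]
english_terms = ["lime", "orange", "lemon", "spearmint", "mint"]

# Pairs already known from cognates, memory, or a dictionary lookup.
known = {"lima": "lime", "naranja": "orange", "limón": "lemon", "menta": "mint"}

deduced = deduce_by_elimination(spanish_terms, english_terms, known)
print(deduced)
```

When more than one word is left over on either side the function refuses to guess, which is exactly when I’d go do a manual lookup.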

Interestingly it took a little remembering that adjectives follow nouns (often) and thus non-alcoholic drink is bebida sin alcohol.

This muddling through pieces of text with some sort of translation and with lookups, plus at least short-term memory, is actually part of my learning experience. If I had the time to do this all day long (and I have tons of source material for that, already way behind on my inventory of links just from Estella alone and I really haven’t had the chance to do Pamplona, an even bigger list) I probably would know a lot of Spanish just from all the repetitive work that does help to burn words (plus a little structure of the language) into one’s brain.

Note: Added after original post. I was trying to locate the grocery I mentioned above on Google maps and instead ended up with this one, Dia, also in Estella. This gave me another interesting idea about confusing translations. Their online shopping is in categories so I was looking at pescado y marisco (fish and shellfish (or sometimes just generic term for any seafood)). And on that page there are images but also everything is either fish or some seafood, except tubo de pota which Google amusingly translated as ‘potato tube’. Since I’d just earlier been looking at potato options I wondered what a tube of potato might be (there is more to this story). In the image associated with this item it sure looks like the body of a squid and is labeled tubo de pota on the package. spanishdict.com fairly quickly resolves the silliness of Google’s translation by indicating pota is cuttlefish (the reverse lookup for ‘squid’ yields calamar, an obvious cognate to Italian but I have a hard time seeing any difference).

But there is this information, based on only a single source (Google translated):

They are selling a cephalopod of lesser gastronomic value than the squid that we appreciate,

The squid or giant squid , also known as luras in Galicia or cuttlefish in South America (although the cuttlefish is actually cuttlefish in our country, and is called choco when its size is like that of the palm of the hand), it constitutes several species , such as the common pota ( Todarodes sagittatus ), the flying squid ( Illex coindetii ), which is small in size, or the Argentine squid ( Illex argentinus ), which is granted greater quality.

Amusingly Google translated this article as “difference between squid and squid” given my query was ‘difference between pota and calamar’. It’s hard to say from a single source this is a correct distinction but it sounds good. Which then raises another issue – mislabeling of ingredients on menus. If one were concerned about this I suppose this is another reason to actually learn to speak and hear Spanish so one can query the server whether your menu item is the lesser cuttlefish or superior squid.

Note2: My other story was another stab at attempting to determine what patata fritas are (mentioned in earlier post). So, this grocery store has a convenient search so in went patata fritas and I got multiple pages of hits: mostly potato chips (including good old Lays) but also frozen potato wedges (kinda like steak fries, probably the closest to the literal translation) and also numerous frozen French fries (some with English on the packages, e.g. ‘frites’, ‘golden long’, and ‘wedges’). So this didn’t help any but it seems clear that if you want fries with your lunch you need to ask the server whether you’ll get chips or fries and I have no idea how to do that with minimal Spanish fluency.