Geocoding old placenames

Discussion in 'Family Tree Analyzer' started by Bryman, Nov 25, 2013.

  1. Alexander Bisset

    Alexander Bisset Administrator Staff Member

    I've just moved these posts and, on re-reading them, realised that you might be missing what I think is a fundamental point. The Google geocoding process is not deterministic, so you cannot just replace one bit of text with another and hope that it will get the right lat/long the next time round. That is a very messy solution and I'd actively avoid coding something like that. You want something that users can rely on; adding a Google lookup into the equation introduces randomness that would destroy the objective of having a reliable end result.

    What we need is a table that has

    Country, County, town/parish, lat, long, viewport

    ie: a fixed table, so that if a user has entered a county name in its Victorian format then it will be in the list and will map to lat/long coordinates. So the table Tim suggested would be:

    England, Lancashire,, y1, x1, vp1 (this first one is just the county name no town or parish)
    England, Lancashire, Barrow-In-Furness, y2, x2, vp2
    England, Lancashire, Prescot, y3, x3, vp3
    England, Lancashire, Ulverston, y4, x4, vp4
    England, Lancashire, Speke, y5, x5, vp5
    England, Lancashire, West Derby, y6, x6, vp6
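
    A fixed table like this could be held as a simple keyed lookup. The following is only a minimal sketch: the `Place` type, the `lookup` function and the coordinate values are hypothetical (the numbers are rough placeholders for illustration, not checked data), and this is not how FTAnalyzer stores its locations.

```python
from typing import NamedTuple, Optional


class Place(NamedTuple):
    lat: float
    lon: float
    viewport: tuple  # (south, west, north, east) in degrees


def _key(country, county, place=""):
    # Normalise spacing and case so "Ulverston" and "ulverston " match.
    return tuple(part.strip().casefold() for part in (country, county, place))


# Hypothetical entries; coordinates are rough placeholders for illustration.
HISTORIC_PLACES = {
    _key("England", "Lancashire"): Place(53.9, -2.6, (53.3, -3.3, 54.4, -2.0)),
    _key("England", "Lancashire", "Ulverston"): Place(
        54.20, -3.09, (54.18, -3.12, 54.21, -3.06)
    ),
}


def lookup(country, county, place="") -> Optional[Place]:
    """Return the fixed entry for a historic place, or None if not listed."""
    return HISTORIC_PLACES.get(_key(country, county, place))
```

    The same input always gives the same answer, and a place that is not in the table returns None rather than a guess.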

    Now why is this better? Well, think about it. It is very simple to generate a table that lists all the parishes in England, Wales, Ireland and Scotland with their old county names in the above format, excluding the lat, long and viewport columns. Various websites have those details.

    What do we do with such a file? Hmm, let's think. Well, we could simply create a very quick GEDCOM from it and load it directly into FTAnalyzer. eg: using Excel it is trivial to create a line of CSV for each parish name and attach a header row, eg: for a RESIdence record. Paste that into a GEDCOM file that has a dummy header and dummy individual and hey presto, you have a GEDCOM file where a dummy individual is resident in every parish in the UK.

    Then we would USE THE EXISTING CODE to geocode the locations in a fresh database, knowing that the database thus created is in exactly the format that FTAnalyzer needs. It can be a collaborative process, eg: farm out different counties to different people so each volunteer works on a specific county. The database of old locations thus grows and can be collated into a central repository. This is what I mean by crowdsourcing: spreading the load of checking the locations amongst lots of volunteers in exactly the same way FreeCEN did with census records.
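
    The dummy GEDCOM could equally be generated by a short script instead of Excel. This is a rough sketch under assumptions: the function name, the source identifier and the dummy individual's name are made up, the header is deliberately minimal (some GEDCOM readers may expect more fields), and the place is written smallest part first as is usual for GEDCOM PLAC values.

```python
def build_parish_gedcom(places):
    """Return GEDCOM text with one dummy individual resident in every place.

    `places` is an iterable of (country, county, parish) tuples; parish may
    be an empty string for a county-only entry.
    """
    lines = [
        "0 HEAD",
        "1 SOUR PARISH_LIST",  # hypothetical source identifier
        "1 GEDC",
        "2 VERS 5.5.1",
        "2 FORM LINEAGE-LINKED",
        "1 CHAR UTF-8",
        "0 @I1@ INDI",
        "1 NAME Dummy /Individual/",
    ]
    for country, county, parish in places:
        # Write the place smallest jurisdiction first, skipping empty parts.
        parts = [p for p in (parish, county, country) if p]
        lines.append("1 RESI")
        lines.append("2 PLAC " + ", ".join(parts))
    lines.append("0 TRLR")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = [
        ("England", "Lancashire", ""),
        ("England", "Lancashire", "Ulverston"),
    ]
    print(build_parish_gedcom(sample))
```

    One county's worth of rows per volunteer would give each person a small file to load and check.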
     
  2. Britjan

    Britjan LostCousins Star

    This is not necessarily about old place names but do you envisage that there could eventually be a link to the wonderful data in A Vision of Britain ? You mentioned the site in another part of the forum about six months ago and your comments appear to be under-appreciated.
    Could you also clarify the reasonable expectation for any latitude or longitude reference in terms of digits? On my "to do list" is a visit to the map department at a nearby university to view copies of maps which have parted company with a volume of biographies of world travelers in a particular field published well over 100 years ago. I am probably the only person on the planet interested but I'd like to have some sense of what I am looking for in terms of being able to transfer relevant information to a modern map.
     
  3. Bryman

    Bryman LostCousins Megastar

    Sorry to appear so stupid. I had noticed those lists but I had not fully appreciated their significance without the appropriate symbol being displayed.
    BTW, might it be possible to allow several changes to these filters at the same time rather than just one condition at a time?

    However, a little more explanation would still be appreciated for some of these possibilities.
    For instance, what is the difference/significance between Partial Match (Google) and Partial Match (Levels) and what should the user do to correct/improve the match in each case?
    What does Outside Country Area mean? I have seen this where Google matched with a location in North Carolina instead of London but I also have another location in Transvaal, South Africa which does not match at all.
     
  4. Bryman

    Bryman LostCousins Megastar

    Where does the Locations tab collect its data from? I have not paid much attention to this in the past as all of the entries are ticked, even though some are not countries. For instance two of the top level entries are for Balliol College and Jesus College. A third instance is Woolwich Arsenal. Do I need to find where these are specified within my tree and modify to provide complete addresses for each wherever they occur?
     
  5. Bryman

    Bryman LostCousins Megastar

    Sorry if I did not make myself clear. That is what I have already done to correct some entries in my table. Users can separately find the old location on one map and then move the pin on the modern map to the same location. It is a bit of a pain and this whole discussion is about trying to avoid the user having to go to such lengths.

    The difficulty that I am experiencing is in knowing just what format the data should be presented in and how variations can be identified.
    BTW, I don't understand your comment elsewhere saying that just changing the text of part of an address will not get a better match from the next geocoding run. Please can you explain why not? Isn't this what we are trying to achieve?
     
  6. Alexander Bisset

    Alexander Bisset Administrator Staff Member


    The simple yet complicated answer is "it varies"!!

    A degree of latitude is roughly 110 km everywhere, whereas the length of a degree of longitude varies according to how far north or south of the equator you are. At the equator 1 degree of longitude is roughly 110 km; at the pole it is zero km. At UK latitudes it is roughly 60 to 70 km. So if you are getting lat/longs to 1 decimal place (dp) you are only pinning latitude down to a step of about 11 km, at 2 dp about 1.1 km, and so on down.
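
    For anyone who wants to work the numbers out for a particular latitude, here is a small sketch. It assumes a spherical Earth with about 111.32 km per degree of latitude, so it is an approximation only, and the function names are just illustrative.

```python
import math

KM_PER_DEGREE_LAT = 111.32  # approximate, spherical-Earth assumption


def km_per_degree_longitude(latitude_deg):
    """Approximate length in km of one degree of longitude at a latitude."""
    return KM_PER_DEGREE_LAT * math.cos(math.radians(latitude_deg))


def latitude_step_km(decimal_places):
    """Approximate size in km of one unit in the last decimal place of a latitude."""
    return KM_PER_DEGREE_LAT * 10 ** -decimal_places


if __name__ == "__main__":
    for lat in (0, 51.5, 57.15, 90):
        print(f"{lat:>6} deg: {km_per_degree_longitude(lat):6.1f} km per degree of longitude")
    for dp in range(1, 6):
        print(f"{dp} dp: {latitude_step_km(dp) * 1000:10.1f} m")
```

    With these assumptions a degree of longitude comes out at about 69 km at 51.5 degrees north (London) and about 60 km at 57.15 degrees north (roughly Aberdeen).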
     
  7. Alexander Bisset

    Alexander Bisset Administrator Staff Member


    Displaying the symbol on the lists is a REALLY good idea I'll add that to the todo list.

    A partial match Google is when Google didn't get a match using the whole address but got a match using parts of the address (in whatever fashion Google thought was best). A partial match levels is when Google gave no results for the address but the program tried again, dropping the last part of the address, and got a match with the lesser detail. This could be as simple as dropping a house name from an address, eg: England, County, Town, Street, My house = no match but England, County, Town, Street worked. That's a match at a lower level. Often these are pretty good matches, but sometimes the lower level might be England, which is a pretty poor match.
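
    The "partial match levels" retry can be pictured roughly as below. This is a sketch and not FTAnalyzer's actual code: `geocode_by_levels` is a hypothetical name, the status strings are illustrative, and `geocode` stands in for whatever lookup is used (here any callable that returns None when nothing is found).

```python
def geocode_by_levels(address, geocode):
    """Try the full address, then keep dropping the last part until a match.

    `address` is written largest area first, eg "England, County, Town".
    `geocode` is any callable returning a result, or None for no match.
    Returns (matched_text, result, status).
    """
    parts = [p.strip() for p in address.split(",") if p.strip()]
    for depth in range(len(parts), 0, -1):
        candidate = ", ".join(parts[:depth])
        result = geocode(candidate)
        if result is not None:
            if depth == len(parts):
                status = "matched"
            else:
                status = "partial match (levels)"
            return candidate, result, status
    return None, None, "no match"


if __name__ == "__main__":
    # Stand-in for a real geocoder: only the street-level text is known.
    known = {"England, County, Town, Street": (1.0, 2.0)}
    print(geocode_by_levels("England, County, Town, Street, My house", known.get))
```

    The fewer parts that survive, the weaker the match, which is why a result that falls all the way back to the country is worth checking by hand.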

    To correct the match whilst on the Locations Geocode list simply right click and select from the list of options. For instance if the match text looks right but is subtly different you can mark it as verified. eg: Google says "London borough of Islington" and you have just "Islington" you can verify that as good enough. If it needs a tweak you can choose edit location and move the pin manually.

    Outside country area means Google found a match but the coordinates seem odd as they are outside the bounding boxes I drew around the various countries. So for instance it matches a location in a completely different country as per your example of North Carolina.
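
    As a rough illustration of the bounding box test, assuming each country has one rectangular box: the box values below are very approximate guesses for illustration only (not the ones used in the program), and `outside_country_area` is a hypothetical name.

```python
# Very rough illustrative boxes: (south, west, north, east) in degrees.
COUNTRY_BOXES = {
    "England": (49.8, -6.5, 55.9, 1.8),
    "Scotland": (54.6, -8.7, 60.9, -0.7),
}


def outside_country_area(country, lat, lon):
    """Return True if the point falls outside the country's bounding box."""
    box = COUNTRY_BOXES.get(country)
    if box is None:
        return False  # no box drawn for this country, so nothing to check
    south, west, north, east = box
    return not (south <= lat <= north and west <= lon <= east)


if __name__ == "__main__":
    print(outside_country_area("England", 51.5, -0.12))  # central London
    print(outside_country_area("England", 35.5, -79.0))  # North Carolina
```

    A rectangle is a crude shape for a country, so a check like this only catches results that are wildly wrong, such as a match on another continent.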
     
  8. Alexander Bisset

    Alexander Bisset Administrator Staff Member


    The locations tab takes its data from the locations in the facts in your file.

    If you have entries that aren't real countries then they won't be bold. A tick here indicates their Google matched status. To tidy up your data I would recommend double clicking the location on the locations tab to see who is at that location, then in your family tree program modifying the address to be more complete. So for instance "Jesus College" on its own is rather vague as it could be either Jesus College, Oxford, England or Jesus College, Cambridge, England.

    The idea is to encourage you to structure your location data. The new places map, which is in development and should be out in the next week or so, will show why. It will allow you to select a place, eg: England, and see everyone at that place, or zoom in and select England, Essex and see everyone in Essex. The idea is that you could see where people who aren't related but are both in your tree might be living close together, ie: giving you a geographical analysis view of your tree. However, this won't work if your place names are somewhat vague.

    You aren't wrong by having vague placenames. It's just that the program can give greater benefits if you eradicate the vagueness :)
     
  9. Bryman

    Bryman LostCousins Megastar

    Does this mean that an entry of just Balliol College would be treated as a country even though Google finds a match?
    How many levels are allowed/required in the specification of an address? What happens if more are provided?

    I have one location described in several ways . . .
    Scotland, Aberdeenshire, Aberdeen - this gets matched successfully, but is not very precise.

    Scotland, Aberdeenshire, Aberdeen, Kinellar - this gets partially matched (with ! symbol).
    Scotland, Aberdeenshire, Aberdeen, Kinellar, housename - this gets partially matched (with tick and ? symbol).
    The lat/long values are different for each case.

    For another location . . .
    New Zealand, Christchurch - this gets matched successfully, but is not very precise.
    New Zealand, Christchurch, Rolleston - this gets partially matched (with tick and ? symbol).
    New Zealand, Christchurch, Rolleston, no/street - this gets matched successfully.
    Once again, the lat/long values are different for each case.

    Very confusing.

    BTW, what is a viewport?
    Also, in Locations Geocoding Status Report, what is the difference (in Google Result Type) between Route and Street_address?
     
  10. Alexander Bisset

    Alexander Bisset Administrator Staff Member


    If it's just text then it's the randomness of a Google search again. It is dramatically better if a customised database of UK parishes and towns is specifically geocoded with the correct latitudes and longitudes of the historic placenames. Then there is nothing at all to search or correct; it's just right. These matches could then have a new status of Historic Database matched (or similar wording), ie: the user knows that the work of finding the co-ordinates has already been done.

    I'd implement this as an "import latest historic locations file" menu option that would simply merge the historic database file into the user's personal database, updating the data to this status (with a similar big green tick or other "this is good data" icon).
     
  11. Alexander Bisset

    Alexander Bisset Administrator Staff Member

    Yes it gets treated as a country. You can have 5 levels any more get treated as extra text in the 5th level.

    When you have a location such as Scotland, Aberdeenshire, Aberdeen, Kinellar, housename, then the program creates locations at each level in the file so that when it's plotting locations it can "drop down" to a lower level if the higher level isn't geocoded. How well Google manages to find an address is the fairly random element. You really need to double click on a location to view the precise pin location to see how close it is. If you like the location, save it (or right click on the list and select verify); the status will then be updated from a ! or a tick ? to a solid green tick.
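
    To make the five-level rule and the "location at each level" idea concrete, here is a small sketch. The function names are made up and this is not the program's own code; it simply assumes the address is comma separated with the largest area first.

```python
MAX_LEVELS = 5


def split_levels(location):
    """Split into at most five levels; extra text stays in the fifth level."""
    return [part.strip() for part in location.split(",", MAX_LEVELS - 1)]


def locations_at_each_level(location):
    """Return the location as it would appear at every level of detail."""
    levels = split_levels(location)
    return [", ".join(levels[:n]) for n in range(1, len(levels) + 1)]


if __name__ == "__main__":
    address = "Scotland, Aberdeenshire, Aberdeen, Kinellar, housename"
    for entry in locations_at_each_level(address):
        print(entry)
```

    Each of the entries produced gets its own geocode result, which is why the same address can show several different lat/long values depending on the level being plotted.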

    The reason for the different lat/longs is that it's the centre point of the location. So the centre of Scotland, Aberdeenshire, Aberdeen is different from the centre of the village of "Scotland, Aberdeenshire, Aberdeen, Kinellar" and the housename is unlikely to be in the exact centre of Kinellar so the coordinates of that point will be different again. They would only be the same if the house was at the exact centre of the village and the village was in the exact centre of Aberdeenshire.

    The viewport is the boundary of the viewing area. This is vital. Consider a farmhouse in the middle of the outback in the dead centre of Australia. If the viewport is such that you are zoomed right in on that farmhouse, you might assume that your point of interest is that farmhouse. If you are zoomed out a lot to see the whole of Australia, you can safely assume that the point refers to Australia and not to the specific farmhouse it happens to be on top of.

    Similarly, with a zoom level for the whole of London and a point slap in the middle of London, you can safely assume the point refers to London and not to the street or house the point just happens to be on top of. So the viewport, which gives the bounds of what you are viewing, provides vital context to the pin on the map.

    Google allocates various types of results to the data that is returned; there is a partial explanation here.

    PS. 80% of my locations are Aberdeenshire with lots in Kinellar.
     
  12. Bryman

    Bryman LostCousins Megastar

    Thanks for your help Alexander. I can now see where the vagueness is coming from.
    My FT program has separate fields for Study Place and Institution. The Gedcom that gets generated then contains . . .
    2 PLAC Jesus College
    2 ADDR Oxford University
    Do I need additional address information in the Jesus College field as the Oxford University detail does not seem to be picked up by FTA?

    Fortunately, Google seems to have defaulted to matching with the Oxford location.
     
  13. Alexander Bisset

    Alexander Bisset Administrator Staff Member

    The ADDR field in a GEDCOM is usually used for sources location ADDResses rather than fact PLACes. I'm not sure if I should be looking at ADDR fields or not, at present they are largely ignored unless the PLACe field is empty.

    The normal form of address I'd suggest would be Jesus College, Oxford, England. Although you could also put Jesus College, Turl Street, Oxford, England or if you want to really go to town Jesus College, Turl Street, Oxford, Oxfordshire, England
     
  14. Bryman

    Bryman LostCousins Megastar

    I have updated the PLAC information and Balliol College now looks fine. For some reason, Google has located Jesus College but then omitted the college from the location and just recorded it as Turl Street, Oxford with a Google Result Type of Route. Very strange.

    I have found a few entries for locations which had spelling mistakes and seem to get retained when the source record is updated and re-geocoded. How can I delete such entries from the table of coded locations or do I have to delete the whole table and recreate?
     
  15. Tim

    Tim Megastar and Moderator Staff Member

    Well I can now understand how your mind is working. Speaking for myself, Yes I was (and still am hopefully) under the belief that you could replace a word in the address string, with the modern equivalent.

    My thought process was along the lines of:
    If I look at the locations tab, at the places level, I can see data in Country, Region, Sub Region, Address and Place.
    If a word in Country, Region, Sub Region has a replacement word in the "new" table, then use the replacement word before passing to Google.
    I guess in reality that this means you need a 2nd locations table (probably not visible to us users?)? I.e. The table that we see today which has our gedcom addresses in, a hidden table that either has our addresses and/or the updated/replaced address segment, and it's this 2nd table addresses which are then passed to Google.


    Your proposal on how to build a table is very creative, and should work very well. How do you propose to use this new data? Is the user going to select one from the list or will you get FTA to do it?

    What is hard for us users to understand, is that quite often we only have to replace one word in the address string and then Google finds the correct place, e.g. replacing Cumberland with Cumbria.

    Your approach works fine if the address is only England, Lancashire, Ulverston, but what if the address is England, Lancashire, Ulverston, The Gill, 59 Back Sun Street? How do you plan to resolve that?
     
  16. Alexander Bisset

    Alexander Bisset Administrator Staff Member

    Tim, for that sort of thing you could use the FactLocationFixes.xml file that already exists and is user editable in the resources directory. It is what does a string replacement on load. Doing a further string replacement for Google seems like overkill? If you have a look at FactLocationFixes.xml you will see I've effectively already done what you suggest for Scotland as I know that country. It would be trivial to add that for England it just needs the entries.

    The point of it being an XML file rather than a database is that it is very flexible, user editable and doesn't require anything new in the program to work.

    I take it you are only thinking this would be needed at the county level?

    For information the format of "region typos" is:

    Code:
          <RegionTypo from="Peeblesshire"  to="Peebles" />
          <RegionTypo from="Forfarshire"    to="Angus" />
          <RegionTypo from="Argyllshire"    to="Argyll" />
          <RegionTypo from="Buteshire"      to="Bute" />
    ...
    
    That little lot gets rid of the horrid Anglicisation of Scottish counties with the wanton adding of -shire to everything. Note the idea was to standardise on old regions but I suppose it makes sense given the new mapping to standardise on modern regions. I've got some data in for England already eg:
    Code:
          <RegionTypo from="Beds"          to="Bedfordshire" />
          <RegionTypo from="Berks"        to="Berkshire" />
          <RegionTypo from="Bucks"        to="Buckinghamshire" />
          <RegionTypo from="Cambs"        to="Cambridgeshire" />
    ...
    
     
  17. Alexander Bisset

    Alexander Bisset Administrator Staff Member

    Hmmm, looking at it, a region typo won't work as it's a fix to force locations into what is required for census lookups on Ancestry and FMP.

    However, it does suggest that if it's a small number of translations, ie: replacing a county name is the only fix, then using the file and adding a GoogleRegionFix entry would work. So you would have ...

    Code:
    <GoogleRegionFix from="Cumberland" to="Cumbria" />
    ...
    The Google geocode could then apply a "Region fix" prior to searching which would search for Cumberland in a region and replace with Cumbria before asking Google to geocode.
     
  18. Alexander Bisset

    Alexander Bisset Administrator Staff Member

    V3.2.1.0-beta-test3 now supports FactLocationFixes of the form...
    Code:
      <GoogleGeocodes>
        <CountryFixes>
         
        </CountryFixes>
        <RegionFixes>
          <!-- England -->
          <RegionFix from="Cumberland" to="Cumbria" />
        </RegionFixes>
        <SubRegionFixes>
         
        </SubRegionFixes>
      </GoogleGeocodes>
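
    For anyone curious how entries like these could be applied before a lookup, here is a rough sketch. It is not the program's own code: the function names are hypothetical, the sample XML only mirrors the fragment above, and it assumes the region is the second part of a comma separated location written country first.

```python
import xml.etree.ElementTree as ET

# Sample fragment in the same shape as the FactLocationFixes section above.
SAMPLE_XML = """
<GoogleGeocodes>
  <CountryFixes />
  <RegionFixes>
    <RegionFix from="Cumberland" to="Cumbria" />
  </RegionFixes>
  <SubRegionFixes />
</GoogleGeocodes>
"""


def load_region_fixes(xml_text):
    """Read RegionFix entries into a {from: to} dictionary."""
    root = ET.fromstring(xml_text.strip())
    return {fix.get("from"): fix.get("to") for fix in root.iter("RegionFix")}


def apply_region_fix(location, fixes):
    """Replace the region (second part) if a fix exists for it."""
    parts = [p.strip() for p in location.split(",")]
    if len(parts) > 1:
        parts[1] = fixes.get(parts[1], parts[1])
    return ", ".join(parts)


if __name__ == "__main__":
    fixes = load_region_fixes(SAMPLE_XML)
    print(apply_region_fix("England, Cumberland, Carlisle", fixes))
```

    Only the text sent for geocoding changes in this sketch; the location as recorded in the tree would be left alone.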
    
     
  19. Alexander Bisset

    Alexander Bisset Administrator Staff Member


    I note there is a download of historic maps and boundaries at Vision of Britain. Assuming they are geo-referenced, they could be loaded into the current version of FTAnalyzer. The downside is that they appear to be only available to government or academic institutions :(
     
  20. Tim

    Tim Megastar and Moderator Staff Member

    I can see it in beta test3 now, thank you.

    Can I suggest adding some comments into the code so that users can read what this does, how to add a new line etc?
    What happens when you release a new version of FTA? How will you pick up the changes from the user's old version of the .xml?
     
