Geocoding old placenames

Discussion in 'Family Tree Analyzer' started by Bryman, Nov 25, 2013.

  1. Bryman

    Bryman LostCousins Megastar

    I have many locations for which the address is no longer the same as when the event was recorded. Google is now only able to achieve a partial match, and often makes the wrong guess. For instance, some churches no longer exist and county/city boundaries have changed.
    Is there any way that geocoding can be performed with data relevant to the period of the event rather than the modern day?
     
  2. Alexander Bisset

    Alexander Bisset Administrator Staff Member


    It isn't possible for Google's Geocoding to do that, as they'd have no commercial reason to set that up. However, there are free resources that could be set up to allow old location searches to be done. They would need a server to host them though.

    The problem is that very specific addresses need someone to physically say where that address is. Version 3.2.0.0 of FT Analyzer does allow you to edit locations. You do this in the Geocode Locations report (from the maps menu on the main form) by right clicking on the address and selecting edit location. You can then move the pin around on the map by right clicking on the blue pin and moving it to the right place, then saving that new location.

    The biggest issue for getting old locations is that there is no one source of exactly what latitude/longitude these places were. Thus there is nothing that can convert the names into locations. However, if the locations that users have already found were saved to a central server, the list could build up over time so that historic locations were identified. There are all sorts of issues with spelling of places etc., though, as names would need to be exact in order to match. It's not an easy problem.
     
  3. Tim

    Tim Megastar and Moderator Staff Member

    Having it on a server seems like a good, sound idea, but needing a server could also be what stops it going live.
    Now I think a small table of "county x is now county y" would not be a very big file.

    Or, even better, what about making it a user generated table? You could prefill the table with some obvious ones, and then let the users update/maintain the table?

    And then everywhere you see county x, you send county y to the geocode process?
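
A minimal sketch of the alias idea in this post, assuming locations are stored as comma-separated text. The table contents and the function name `apply_aliases` are illustrative only and are not part of FT Analyzer:

```python
# Hypothetical alias table: old county name -> name to send to the geocoder.
COUNTY_ALIASES = {
    "Cumberland": "Cumbria",
}


def apply_aliases(location: str, aliases: dict) -> str:
    """Swap any comma-separated part of the location that has an alias."""
    parts = [part.strip() for part in location.split(",")]
    return ", ".join(aliases.get(part, part) for part in parts)


# The recorded text stays as entered; only the geocoder sees the modern name.
print(apply_aliases("Carlisle, Cumberland, England", COUNTY_ALIASES))
```

Matching whole parts rather than substrings keeps a county alias from rewriting a street or parish that happens to contain the same word.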
     
  4. Alexander Bisset

    Alexander Bisset Administrator Staff Member

    It needs to be some form of crowd sourced data, yes, where multiple people can update. However you need to be sure the data is valid. With users able to update/maintain it, how do you prevent the well meaning but accident prone individual from corrupting the data with bad results?

    It's a double edged sword. If you don't allow lots of updates then it becomes a nightmare to maintain. If you do then the database could get out of hand with people making inconsistent updates. The best solution is a full crowd sourced solution where multiple updates are scored for quality and only those that pass a threshold are presented to end users as accurate. Low scoring (i.e. possibly less accurate) ones are presented as possible but needing confirmation. Voting to confirm the location improves its score. You then have a natural means of filtering the data.

    The problem is that this needs scale, i.e. hundreds/thousands of active users scoring the locations (think TripAdvisor and hotels), and there just isn't the scale at present to make a crowd sourced solution possible. Thus we are left with the issue of how to determine the accuracy of a manually edited location without restricting updates to a moderator type solution, where trusted individuals review any proposed changes to the database and so maintain its accuracy.
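
The scoring scheme described in this post could be sketched roughly as below. The threshold value, the class name `Suggestion` and the status strings are all assumptions made for illustration, and the coordinates are approximate:

```python
from dataclasses import dataclass

# Assumed number of confirming votes before a location is shown as accurate.
CONFIRM_THRESHOLD = 3


@dataclass
class Suggestion:
    """One user-contributed location with a running confirmation score."""
    place: str
    lat: float
    lon: float
    votes: int = 0

    def confirm(self) -> None:
        """Record one more user confirming this location."""
        self.votes += 1

    @property
    def status(self) -> str:
        """Below the threshold the location is only offered as a possibility."""
        if self.votes >= CONFIRM_THRESHOLD:
            return "accurate"
        return "needs confirmation"


suggestion = Suggestion("Preston, Lancashire, England", 53.76, -2.70)
print(suggestion.status)
for _ in range(CONFIRM_THRESHOLD):
    suggestion.confirm()
print(suggestion.status)
```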
     
  5. Tim

    Tim Megastar and Moderator Staff Member

    Yes, I agree with all your comments, which is why I believe the best approach is a small preseeded table of aliases/replacements that comes with FTA and from then on is managed by the user solely for themselves. I can update one location in the table and fix hundreds of errors where Google doesn't understand where Cumberland is.
     
  6. Alexander Bisset

    Alexander Bisset Administrator Staff Member

    I'm thinking of possible ways for users to "submit" their corrections, then averaging those to come up with a common list. However I'm not sure how best to achieve that at present.
     
  7. Tim

    Tim Megastar and Moderator Staff Member

    That just seems like a lot of effort and opening up a can of worms, due to many of the reasons you've already mentioned.

    Everybody's tree is virtually unique to themselves, and I'd only be interested in the name changes that affect my tree.
    A small user managed table, preseeded if you like, seems the quickest and simplest fix. A potentially huge server file that we all update somehow sounds like it will put a big overhead on the time to do the process.

    In my simple mind, the process would be like this:
    • I'm editing the locations on the maps and discover that Google has placed my address in Lancashire 200 miles away because that area of Lancashire has been renamed.
    • I press a button on the menu bar that opens up the "Alias Table"
    • I type in the old area name, and in the next column type in the new area name (new area name could be blank)
    • Save
    • Rerun geocoding on all part matches
    This might mean that you need to find all locations with the old name in?
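
The steps above might look something like this sketch, which finds the partial matches that contain the old name and prepares the text to resend. The record layout of `(location, status)` pairs, the status strings and both function names are hypothetical:

```python
from typing import Optional


def rewrite_location(location: str, old_name: str, new_name: str) -> Optional[str]:
    """Return the location with old_name swapped for new_name, or None if absent.

    A blank new_name drops that level from the address altogether.
    """
    parts = [part.strip() for part in location.split(",")]
    if old_name not in parts:
        return None
    swapped = [new_name if part == old_name else part for part in parts]
    return ", ".join(part for part in swapped if part)


def locations_to_retry(records, old_name, new_name):
    """Pick partial matches that mention old_name, paired with the rewritten text."""
    retry = []
    for location, status in records:
        if status != "partial":
            continue
        rewritten = rewrite_location(location, old_name, new_name)
        if rewritten is not None:
            retry.append((location, rewritten))
    return retry


records = [
    ("Keswick, Cumberland, England", "partial"),
    ("Preston, Lancashire, England", "matched"),
    ("Penrith, Cumberland, England", "partial"),
]
for original, rewritten in locations_to_retry(records, "Cumberland", "Cumbria"):
    print(f"{original} -> {rewritten}")
```

Locations that already matched are left alone, so saving one alias only triggers new geocoding requests for the records it can actually help.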
     
  8. Alexander Bisset

    Alexander Bisset Administrator Staff Member

    Yes, but to generate the list of aliases I was thinking that we could have a process by which the user clicks a button to "submit" their found locations. This would likely only be worth doing down to the 3rd level, i.e. that of parish/town, e.g. England, Lancashire, Preston. Any more detail beyond that is likely to be too specific and only match one or two people, whereas a town/parish level should help a lot.

    The user would have no interaction other than clicking a submit button.

    The database could then be available through a second option that would kick in when the Google search fails, or through a specific menu item to search user contributed locations. It would need some sort of algorithm to average the user contributed points, with some verification that all England points were inside the bounding box for England, all Lancashire points were inside a bounding box for Lancashire, etc.

    It would probably require the occasional review by someone to weed out any invalid data. eg: bad spellings, inaccurate points etc.
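
As a rough sketch of the averaging and bounding box check described in this post: the box below is an approximate, illustrative rectangle around England rather than an authoritative boundary, and the function names are assumptions:

```python
from statistics import mean

# Illustrative bounding box only: (south, west, north, east) in degrees.
ENGLAND_BOX = (49.8, -6.5, 55.9, 1.9)


def inside(box, lat, lon):
    """True when the point falls within the rectangular bounding box."""
    south, west, north, east = box
    return south <= lat <= north and west <= lon <= east


def average_point(points, box):
    """Average the submitted (lat, lon) points that pass the bounding box check."""
    valid = [(lat, lon) for lat, lon in points if inside(box, lat, lon)]
    if not valid:
        return None
    return (mean(lat for lat, _ in valid), mean(lon for _, lon in valid))


# Three submissions for the same place; the last is well outside England.
submissions = [(53.76, -2.70), (53.78, -2.72), (40.00, -2.70)]
print(average_point(submissions, ENGLAND_BOX))
```

A plain mean is the simplest choice here. A single bad point that still falls inside the box would skew it, which is one reason occasional manual review would still be needed.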
     
  9. Tim

    Tim Megastar and Moderator Staff Member

    Hmmm

    It can work like you describe, but then I'd like to also suggest some sort of weighting or hierarchy.
    So use the ones in my table before using the averaged weighted ones.

    I would hate to be able to resolve a known address for my tree but not be able to use it because 2 or 3 other people have resolved it slightly differently and theirs is the one uploaded into the table.

    Maybe have a 2 step process? 1st step is the user's own table. 2nd step is to use the public table?
    All the ones you fix from step 1 won't be touched in step 2.
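
The two step lookup suggested here could be sketched as follows, assuming both tables map a location string to a `(lat, lon)` pair. The table names and coordinates are made up for illustration:

```python
def resolve(location, user_table, public_table):
    """Try the user's own table first; only fall back to the public table if needed."""
    if location in user_table:
        return user_table[location], "user"
    if location in public_table:
        return public_table[location], "public"
    return None, "unresolved"


# Illustrative entries only.
user_table = {"Preston, Lancashire, England": (53.7590, -2.6990)}
public_table = {
    "Preston, Lancashire, England": (53.7632, -2.7031),
    "Keswick, Cumberland, England": (54.6013, -3.1347),
}

print(resolve("Preston, Lancashire, England", user_table, public_table))
print(resolve("Keswick, Cumberland, England", user_table, public_table))
```

Because the user's table is consulted first, an entry fixed in step 1 is never overwritten by a differently averaged public entry.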
     
  10. Bryman

    Bryman LostCousins Megastar

    For what it's worth, I am favouring Tim's approach. Let each person have and be responsible for their own alternative reference file to be used by FTA if the initial geocoding of a location is incomplete. That way there would be no requirement for anyone to monitor submissions centrally and each user would be responsible for appropriateness and spellings, etc. A suggested starter/seeding file could be available centrally, from which new copies would be created initially but then edited and maintained individually.

    It would not be necessary for such a file to correct all partial geocodings, just the majority, so would not need to be large. As Tim suggests, to search for Cumbria as an alternative to the original Cumberland could potentially improve the geocoding results enormously. Some variations may be more problematic and will require more detailed consideration, such as where churches/streets no longer exist and that level of the address should therefore be disregarded/blanked.

    Whether it might be necessary/advantageous to allow users to help one another with suggestions could be addressed later as experience grows. If anyone were to be unhappy/unable to maintain their own file then they would be no worse off than now.
     
  11. Alexander Bisset

    Alexander Bisset Administrator Staff Member

    Yes but how is that file created in the first place? My suggestion is a means of automatically feeding data into such a file and automatically providing the benefits to the users without the need for the end user to do anything special.
     
  12. Tim

    Tim Megastar and Moderator Staff Member

    Yes, I think we're all talking about the same thing, a starter seeded file. Well we could put a call out to forum members for known changes to counties or we could compare an 1851 county list with a current one. I'm thinking along the lines of the 80:20 rule. 80% of fixes to Google could be achieved by 20% of the corrections.
     
  13. Bryman

    Bryman LostCousins Megastar

    The Locations Geocoding Status Report shows a column for Location and another for Google Location, together with a symbol at the left indicating success or otherwise of the process for that location. Might it be possible to add another blank column (and to the file) for Alternate Reference (??) where such a column does not already exist?

    Then when Google fails to make a match (equivalent of some sort of cross symbol?) FTA could add a seeding suggestion, if known, to the Alternate Reference column and retry the geocoding of that location. If users are not satisfied with the added suggestion they should be able to edit that record in a similar way to now. That would cater for the redevelopment of locations such as replacement of a 19th century Workhouse with a modern housing estate.

    It may be possible to hold basic seeding suggestions within FTA, depending on the required format (**), but then further suggestions could be created from the existing location column with one less level of specification, eg "country, county, town" instead of "country, county, town, road". If a match is still not found and the user does nothing then the next attempt at geocoding would decrease the level of specification further to "country, county", etc. FTA might need to adjust the success indicator??

    (**) Initial seeding depends very much on the format needed but should include Cumberland -> Cumbria, parts of Middlesex/Surrey/Essex/Kent -> London or Greater London, introduction of Avon, etc. The actual requirements may be date related. I am hoping that Alexander can give more definite advice due to his involvement with the actual coding.

    The main requirement is that users should continue to enter location specifications as now, identical to that recorded on the census/BMD record. Any adjustment to fit in with modern Google matching would be provided by FTA, provided that is not horrendously difficult. When I first enquired, I had hoped that Google might use an external source of data for matching, which might allow an earlier version of the data to be selected based on the date of the event.
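
The idea in this post of retrying with one less level of specification each time can be sketched as a generator. It assumes the ordering used in the post (country first, most specific level last), and the function name and `min_levels` parameter are hypothetical:

```python
def fallback_candidates(location: str, min_levels: int = 2):
    """Yield the location, then progressively less specific versions of it."""
    parts = [part.strip() for part in location.split(",") if part.strip()]
    if not parts:
        return
    # Never ask for more levels than the location actually has.
    stop = min(min_levels, len(parts))
    for count in range(len(parts), stop - 1, -1):
        yield ", ".join(parts[:count])


for candidate in fallback_candidates("England, Lancashire, Preston, Fishergate"):
    print(candidate)
```

Each candidate would be tried in turn until the geocoder returns a match, stopping at "country, county" by default.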
     
  14. Tim

    Tim Megastar and Moderator Staff Member

    I had wondered about contacting Google and suggesting that they should really understand where these old places were.

    It would get people to use Google Maps more.
     
  15. Alexander Bisset

    Alexander Bisset Administrator Staff Member

    Interesting suggestion Bryman, but the issue is how does FTAnalyzer get this "seeding" data for an alternate reference? There is nowhere it can look to get an alternate name. Any database would need to be manually created, and creating such a database even at the town level is a massive undertaking, especially when users can't spell, e.g. I've got a test file from a user with a location of Kirby, Lnacashire. Not sure where that is but the name of the file mentioned Ireland in the title!! ;)

    Note that the geocoding process already "drops down" a level to try it without the detail, which is where you get partials at a different level.

    I think we all want some form of historic to modern gazetteer conversion process, but how to create the file is the issue. There is online documentation on the subject, e.g. Family Search, but they don't appear to have a downloadable database or an online API that the program could search.
     
  16. Tim

    Tim Megastar and Moderator Staff Member

    Is that why it's a test file? ;)

    (Runs off to check for any spelling errors in his file....)
     
  17. Bryman

    Bryman LostCousins Megastar

    That is why I favour the responsibility to be with each user as much as possible. We all make mistakes but it is not fair to make one person responsible for errors that may not even be of his/her own making.

    Note: I also hate arguing with myself - I always seem to lose!

    Is it worth trying to contact the Family Search site to see if they can suggest a way that we could progress? It certainly looks to be very similar to what I was originally hoping for. Well found. I suppose that in the worst case, users could access that site manually and update the FTA Locations table accordingly where Google fails to find an adequate match.
     
  18. Tim

    Tim Megastar and Moderator Staff Member

    How does something like this look?

    [Attached image: Clipboard01.jpg]


    Found it and corrected it :p

    Yes, would love to see an old to new lookup table.
     
  19. Alexander Bisset

    Alexander Bisset Administrator Staff Member

    On the Locations Geocoding status report the Filters drop down menu has complete lists of the various status flags.

    Note this is what the Locations tab - countries and regions is meant to be for. Glance through the list of countries to see if they are recognised countries or not; if not, then correct them if appropriate. Then do the same for regions, e.g. under England will be all the regions, and these should only be ones that are valid. If there are unknown or invalid ones then correct them and try again. Reload the GEDCOM, then rinse and repeat until your countries are all recognised and the regions make sense.

    For instance in your 29th October file - granted it is out of date - you have places like "Barrow" and "Barrow in Furness" as countries, but you have the same location with a proper full location string elsewhere. By correcting these entries your countries list reduces to a meaningful list and your data quality goes up, as will the matching on Google. By searching through and fixing your countries list you would have seen Lnacashire and places like Kkent, which is also present :)
     
  20. Alexander Bisset

    Alexander Bisset Administrator Staff Member


    The user already has a means of editing their locations. Manually looking up an online Gazetteer isn't going to enable them to edit anything; they still have to pick up a pin and move it. To be clear, TEXT is utterly useless for mapping: they must use precise (to at least 6 decimal places) lat/long values. So it is useless to know only that location X used to be called Y. The table needs to have the lat/long for the location.

    This is why I know that the gazetteer of old locations must be produced via an automated process, as no human is going to be accurate enough to manually produce such a thing.

    It really is incredibly simple. Users submit a datafile and that data gets merged into a central repository. Users can then request a download from the central repository. The end user does nothing fancy, just clicks some buttons. The program creates the files in exactly the right format required. The only issue is how to maintain the quality of the data.

    Note it HAS to be a graphical editing tool as I'd challenge anyone to be productive manually editing 6-8 decimal places of lat/long figures and maintain accuracy.
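
To give a feel for what a number of decimal places means on the ground, here is a small worked calculation. It assumes a spherical Earth with a mean radius of 6,371 km, so the figures are approximate, and the function name is made up for this example:

```python
import math

# Assumed mean Earth radius in metres (spherical approximation).
EARTH_RADIUS_M = 6_371_000


def metres_per_step(decimal_places: int) -> float:
    """Approximate north-south distance covered by one unit in the last decimal place."""
    metres_per_degree = 2 * math.pi * EARTH_RADIUS_M / 360
    return metres_per_degree * 10 ** -decimal_places


for places in (2, 4, 6):
    print(f"{places} decimal places: about {metres_per_step(places):.2f} m of latitude")
```

Under this approximation one degree of latitude is roughly 111 km, so the sixth decimal place corresponds to about 11 cm. East-west distances for the same change in longitude shrink further from the equator.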
     
