FTA exporting Lost Cousins data

Discussion in 'Family Tree Analyzer' started by peter, Apr 22, 2014.

  1. peter

    peter Administrator Staff Member

    I don't suppose that FTA would be capable of adding information, such as LostCousins facts, to a Gedcom file?
     
  2. Tim

    Tim Megastar and Moderator Staff Member

    I did try this approach with another member a few months back. They provided me an extract from LC and their gedcom, and I was able to match about 60%.
    The harder part came when I had to put the matched data into a format that I could then load into the gedcom.

    However, things have moved on now. If Alexander could produce (or modify an existing) report that had the census refs in the same format as LC, then matching would become a lot easier.
     
  3. Bryman

    Bryman LostCousins Megastar

    I must have been one of the weird ones because I tried and was only partially successful. I highlighted the names of individuals on my GenoPro charts for whom I had entered census information into LC but could not easily identify which censuses had been entered.
    My charts are quite colourful as I also indicate Blood Relatives, Cousins (with zero removes), Direct Ancestors.
     
  4. Bryman

    Bryman LostCousins Megastar

    I have sometimes wondered about this but isn't FTA better as a reporting tool rather than data creation? Creating reports in a particular format is one thing but actually inserting records into a gedcom file could open up a whole bag of worms, as touched on by Tim.

    I am currently having a lot of difficulty identifying the census references in my Referrals Report and think that I have just narrowed the problem down to the way that GenoPro saves and matches source data. Every FHS does non-standard things slightly differently and Alexander might have to give up his day job to manage the extra workload.
     
  5. Alexander Bisset

    Alexander Bisset Administrator Staff Member

    The problem, Peter, is that the GEDCOM file is only an export from the user's tree in whatever family history program they are using. It may be possible if there were an export from the website with sufficient information to uniquely identify the person in the user's tree. However, the user would need to export the data from the website, run some sort of matching routine in FTAnalyzer, and then have FTAnalyzer export the data back to their family history program. And, it's a very big however, getting the data into the user's family history program is not at all an easy task: there are a massive number of ways that the different programs import data and merge it into someone's tree. Supporting those options is a nightmare.

    Worse than that, however, is the issue of any form of automated updating of a person's tree. I have always taken the attitude that I can generate lists for people to then update their own trees, but it is up to the individual to take charge of their own data. I'd be extremely reluctant for the program to generate data to be imported, as there are then liability issues if something goes wrong.

    So adding data to a GEDCOM file doesn't actually help. It needs to be added to the user's family history program; otherwise, when they update their tree and export a new GEDCOM, the Lost Cousins data would be lost.

    Having the user manually add data to their tree is safer and means the user retains full control of the process. I suspect the best we can do is to create lists on the website to make it easier for users to update their tree. Having the census ref against the household as you recently said you added makes this a lot easier. Tim had done a bit of work on documenting this but I'm not sure if it needs updating in light of the recent website changes.
     
  6. Alexander Bisset

    Alexander Bisset Administrator Staff Member

    Yes, I think this approach is better. If we could match up a report from the website with a report from FTAnalyzer, the user could quickly see gaps.

    The big problem is that the two sources contain different information. The Lost Cousins website has no information about the individual IDs the family history program used; it only has census references. Similarly, FTAnalyzer only has the individual IDs and some of the census references. Unfortunately, very few people seem to add census references in a sufficiently consistent format to ensure they are all picked up, so it would be extra work to suggest users retrofit their files with standardised census references.

    So it's a bit of a catch-22: there is nothing specific on either the website or FTAnalyzer to definitely match up on, and sadly that makes writing the reports in the exact same order a bit hit or miss. See the recent addition of the referrals report for an example of a close but not quite match.

    Where we could speed things up would be for FTAnalyzer to generate a list of census entries you'd not entered onto Lost Cousins as it does at present, but in a format that could be loaded onto the website. If we were able to get that working then it could make it much easier for users to add new relatives if all they do is :

    1) update the census in their tree
    2) load GEDCOM into FTAnalyzer
    3) export missing Lost Cousins file
    4) load missing Lost Cousins file onto Lost Cousins website.
    5) edit new entries on website to add extra info such as maiden name, have certificates etc.
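
As a rough illustration of step 3 above (not FTAnalyzer's actual code), the "missing Lost Cousins file" boils down to a set difference on census year plus census reference. The `CensusEntry` type, the `normalise` helper and the reference strings are all assumptions made up for this sketch; the real upload format would have to be agreed with the website.

```python
from typing import Iterable, List, NamedTuple, Tuple


class CensusEntry(NamedTuple):
    """One census fact taken from the user's tree (hypothetical structure)."""
    individual_id: str  # GEDCOM individual id, e.g. "I2377"
    name: str
    census_year: int
    census_ref: str     # census reference as recorded in the tree


def normalise(ref: str) -> str:
    """Upper-case and collapse whitespace so trivially different refs compare equal."""
    return " ".join(ref.upper().split())


def missing_from_lost_cousins(
    tree_entries: Iterable[CensusEntry],
    entered_refs: Iterable[Tuple[int, str]],
) -> List[CensusEntry]:
    """Return tree entries whose (year, reference) is not already on the website."""
    entered = {(year, normalise(ref)) for year, ref in entered_refs}
    return [
        entry
        for entry in tree_entries
        if (entry.census_year, normalise(entry.census_ref)) not in entered
    ]
```

Because the comparison is on the reference only, nothing here depends on how names or ages were transcribed; entries without a usable reference would still need handling by hand.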

    The core issue here is creating a routine that presented the info in the correct format to the website and adding the extra code to the website to allow the import.

    Note that data uploaded in this format may need an extra step for the user to confirm what was uploaded before committing the changes to the database. The data might also be worth flagging as FTAnalyzer generated so it was easy to track in the database. That might be useful as my experience is users are more likely to get things right if they have manually checked things than relying on an upload.

    That said, being able to mass upload missing entries would deal with the complaint users most often give for not entering data, which is that they have already entered it. That excuse would vanish, as they would have a couple-of-clicks means of adding large numbers of new entries to the website, which could mean hundreds of new additions a week and many more matches.
     
  7. peter

    peter Administrator Staff Member

    I was thinking in particular of Family Historian, which I understand uses Gedcom as its native format. I would have thought that it would be possible to match well over 90% reliably (since people on censuses tend to be living with their families, and Gedcoms are family oriented).

    Of course, there will be a few that can't be matched, and for those the program could either produce an exception report for the user to handle manually, or interrogate the user in real time.

    It surely doesn't matter if everything goes pear-shaped because the user will still have their original tree? But when it works it will be a massive timesaver.
     
  8. Alexander Bisset

    Alexander Bisset Administrator Staff Member

    Sorry Peter, you have misunderstood. The chance of a match has absolutely no relation to what program is being used; it has everything to do with how the user has recorded census facts. The only reference data that is available on the Lost Cousins website is the census reference. If that reference hasn't been entered in the user's file then there is absolutely NOTHING I can do to auto match their records. Being a raw GEDCOM format won't improve the chances of a match if the user didn't record the actual census reference in the first place. A 90% success rate would require 90% of census entries to have census references.

    Family Historian is no better in this regard. It does indeed store data in a raw GEDCOM format and adheres to the standard very strictly; however, pushing data into the file still requires it to be correct and flawless, otherwise there is a risk of damaging the user's data. On top of that, any routine really needs to work for as many users as possible and not just for a specific program such as Family Historian.

    Also, you are forgetting that most individuals in a tree are members of two families, i.e. a family where they are a child and a family where they are a parent. These are two completely different GEDCOM families, and there is nothing at all on the Lost Cousins website that would say what the family reference number is from the GEDCOM. So the chance of a match is entirely dependent on the existence of a census ref in the user's file.

    All that FTAnalyzer could do would be to produce a file. The user's family tree program would still need to do the actual import and merge, and for many programs this is either impossible or a tediously manual process of verifying every record. I'm also not sure how someone would "have their original tree": they would have had to make absolutely sure they had a backup before the import, and it would depend on their family history program how easy it was to import and/or revert a change. You also have to consider that they might not notice a problem straight away, only a few weeks later when they've already added other things.

    Personally I value my tree far far far too much to allow ANY program to auto update my tree. Since I would never ever want to use such an auto merge feature myself I'd not want to risk inflicting it on other users.

    As I say, the only safe way of approaching this is to export a specially formatted file from FTAnalyzer into the website. The reason this is safer is that you and I would have 100 percent control over the process, you from your end and me from mine. No user intervention required. It also means that, since the interface would be FTAnalyzer to Lost Cousins, you eliminate vagaries in the way family history programs work.

    The export file that FTAnalyzer created would only be valid if the user already had all the data the website needed to create a record. It would just save the user a vast amount of typing on the website with zero risk to their file and NO complications due to different tree programs or versions.
     
  9. Tim

    Tim Megastar and Moderator Staff Member

    I think there are 2 approaches that need to be looked at and both warrant some further thought, analysis and discussion.

    This first one would avoid having to manually create Lost Cousin Facts in your FHS. This enables FTA to produce more meaningful reports for you.

    Importing data from Lost Cousins into an existing gedcom file.

    I do agree with Alexander that this could be quite dangerous and it's not something that FTA should just do. BUT if there was a process that people could follow that allowed them to merge pre-prepared data from LC into a copy of their gedcom, then people could happily open this new gedcom (it would have a different name) and verify they were happy with it before using it as their new master file.
    These are the steps I followed when doing my test.

    As has been mentioned, the gedcom requires data to be centred around an individual's ID number. LC, however, does not have IDs.
    So you have to match a person from a household and census year from LC with the corresponding individual's ID in the gedcom. This is now a whole lot easier as FTA produces a report (Lost Cousins Referral) that includes census refs and IDs.
    Once you have a name and a census year, you need to create an entry that can be loaded into the gedcom. This needs to be in this format:
    Code:
    0 @I2377@ INDI
    1 EVEN
    2 TYPE Lost Cousins
    2 DATE 1841
    We need to establish the location where this can be inserted into the gedcom.​
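
One possible answer to the "where" question, sketched in Python rather than taken from any existing tool: append the event at the end of the matching individual's record, i.e. just before the next level-0 line. The function names are hypothetical, and the sketch assumes the target program accepts level-1 events in any order within an INDI record. It builds a new list of lines, so the original gedcom is left untouched.

```python
from typing import List


def lost_cousins_event(census_year: int) -> List[str]:
    """Level-1 event lines in the format shown above (without the INDI header)."""
    return [
        "1 EVEN",
        "2 TYPE Lost Cousins",
        f"2 DATE {census_year}",
    ]


def insert_event(gedcom_lines: List[str], individual_id: str, census_year: int) -> List[str]:
    """Return a copy of the gedcom with the event appended to one individual's record."""
    header = f"0 @{individual_id}@ INDI"
    out: List[str] = []
    inserted = False
    i = 0
    while i < len(gedcom_lines):
        line = gedcom_lines[i]
        out.append(line)
        i += 1
        if line.strip() == header:
            # Copy the rest of this record, which ends at the next level-0 line.
            while i < len(gedcom_lines) and not gedcom_lines[i].startswith("0 "):
                out.append(gedcom_lines[i])
                i += 1
            out.extend(lost_cousins_event(census_year))
            inserted = True
    if not inserted:
        raise KeyError(f"individual {individual_id} not found")
    return out
```

Writing the result to a differently named file keeps the original as the fallback until the user has checked the new one.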



    Importing data from FTA into Lost Cousins.

    As Alexander has already mentioned, this would need to be in a strict controlled format and could load records directly into the database.​

     
  10. peter

    peter Administrator Staff Member

    I may be missing something here, but surely most of the time based on forename/surname/age there will only be one person in a tree who is a close match for a census entry - excluding ONS, of course. When you add in Ahnentafel numbers, maiden names, middle names, the relationship shown, and the fact that most people will be with other family members I'd have thought that the chance of matching them up is actually pretty high?

    I just sorted my My Ancestors page by name and looked down the list - at a quick glance I couldn't see any instances of two people with the same name and a similar birth year (except where it was the same person on two different censuses). Of course, I'm not under the impression that it would be easy to match in this way - but if anyone's capable of doing it, surely Alexander is?

    Whilst it would be great if LostCousins facts (and perhaps census facts and census references) could be reverse-fitted into a Gedcom file I do accept that there are risks - and there are surely limitations too for those programs which don't use Gedcom as their native format and may therefore only import/export a subset of their data in Gedcom format.

    Another approach would surely be for FTA to accept both sets of data - Gedcom and LostCousins - but rather than trying to create a new Gedcom file incorporating both sets of data, instead produce reports based on information from both. Perhaps for someone whose main objective is to identify relatives they haven't entered, this might be an acceptable approach given the amount of effort it would save them?
     
  11. Tim

    Tim Megastar and Moderator Staff Member

    I thought the same thing till I tried it. Trouble is that on LC we enter the data as it's been transcribed on the 1881 census. In family trees, people correct the DOB, the forenames, middle names and surnames. Hence I said that I was successful for 60% and the others I'd have to manually align/check to make sure they were the same people. Having the census refs now in both files should make this so much easier.
     
  12. Alexander Bisset

    Alexander Bisset Administrator Staff Member

    You are missing the fact, Peter, that lots of people have one-name studies. For instance, I have 287 William Bissets in my tree, and 8 of them are the same age on the 1881 census. Matching them automatically is not possible. Matching them against others in the family gets more complicated and time consuming.
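
To make the ambiguity concrete, here is a minimal Python sketch of matching on name and approximate birth year only. The `Person` type, the `unique_match` function and the two-year tolerance are all assumptions for illustration. The only safe rule is to accept a match when exactly one candidate survives, and with several same-named people of the same age that never happens.

```python
from typing import List, NamedTuple, Optional


class Person(NamedTuple):
    individual_id: str
    forename: str
    surname: str
    birth_year: int


def unique_match(
    tree: List[Person],
    forename: str,
    surname: str,
    birth_year: int,
    tolerance: int = 2,
) -> Optional[Person]:
    """Return the single candidate on name and birth year, or None if 0 or 2+ fit."""
    candidates = [
        p
        for p in tree
        if p.forename.lower() == forename.lower()
        and p.surname.lower() == surname.lower()
        and abs(p.birth_year - birth_year) <= tolerance
    ]
    return candidates[0] if len(candidates) == 1 else None
```

Note that this sketch also fails in the other direction: if the tree holds a corrected name or date and the website holds the transcription as written, the candidate list can come back empty.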

    However, the major problem here is the risk of a false positive, i.e. corruption of user data. That is far, far too high a risk to contemplate.

    You also identify the other issue which is that it is entirely down to the users program how easy or not it is to import and merge data from an external source. A lot of programs simply don't have a load external data and merge function.

    Loading data from Lost Cousins may be possible, but it still has the issue that it is not going to be a reliable match. Hence I've taken the approach of trying to make reports that, as closely as possible, allow the user to compare the two reports side by side, one from FTAnalyzer and one from the website. If the look and feel of the reports is as close as possible, it should allow the user to visually check the two, which is far easier than the computer trying to work out matches. Humans are dramatically better at pattern matching than computers.

    The point then becomes: how does the computer remember a match? Well, that's precisely the reason the extra Lost Cousins tag in the GEDCOM was born. By manually adding the tag, the user signals that they have entered that person on the website. This means the user is in full control of their file and isn't relying on a third party program making changes to their data. I think most people would be more comfortable being in full control of their data, especially since we cannot guarantee their program will even be able to import an automated file.

    Auto matching could be done, but with some errors and never as well as a user could do it themselves. I question the value of trying to match when there is no fixed data to match on, as the false positive rate is too high and users just think a program is broken if it gets things wrong.
     
  13. Alexander Bisset

    Alexander Bisset Administrator Staff Member

    Ah, and of course Tim's point is extremely valid too: most data in people's trees is cleaned up and not as Lost Cousins requires, i.e. the transcription, warts and all.

    So you're asking the computer to match warts-and-all data with cleaned data (changed names, dates, ages etc.) with no references to go on. A very high failure rate is likely, which just means people think the program isn't working correctly. They don't get that it's a near impossible task.
     
  14. Alexander Bisset

    Alexander Bisset Administrator Staff Member

    Three posts in a row sorry but... The one thing that would dramatically improve the matching would be if users had the opportunity to add an optional reference number to the individual they enter on Lost Cousins. This would then be a 100% match guaranteed.
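
A minimal Python sketch of what matching on such a reference would look like. The `member_ref` field name is purely hypothetical, since no such field exists on the website as described in this thread. The point is that each entry gets a plain yes/no lookup, and anything without a known reference drops out for manual handling rather than being guessed at.

```python
from typing import Dict, Iterable, List, Set, Tuple


def match_by_reference(
    lc_entries: Iterable[dict],
    tree_ids: Set[str],
) -> Tuple[Dict[str, dict], List[dict]]:
    """Split website entries into those whose reference is a known tree id and the rest."""
    matched: Dict[str, dict] = {}
    unmatched: List[dict] = []
    for entry in lc_entries:
        ref = entry.get("member_ref")  # hypothetical optional field
        if ref and ref in tree_ids:
            matched[ref] = entry       # exact match: no fuzzy comparison involved
        else:
            unmatched.append(entry)    # blank or unknown reference: leave to the user
    return matched, unmatched
```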
     
  15. peter

    peter Administrator Staff Member

    In my experience they also tend to correct the names on their My Ancestors page, so that's not going to be a problem for many users.
     
  16. peter

    peter Administrator Staff Member

    There are only 100 ONS that have been officially implemented at LostCousins (there may be a few others who didn't notify me first) - and many of them have used a separate account for their ONS.

    So it's not something that's going to cause problems for the vast majority of LostCousins members.
     
  17. Alexander Bisset

    Alexander Bisset Administrator Staff Member

    People will think the program isn't working properly if it gets matches wrong, which it undoubtedly will. Sorry, but without a reference field matching is guaranteed to be imprecise, which only lowers users' feeling towards the product. It's a lose-lose situation: a lot of time and effort towards a goal which turns users off the product.

    A matching routine without a reference field is a monumentally difficult task as there are dozens of variables to consider. I've just gone through this with v3.6 and implementing the duplicates report. You aren't even guaranteed that the data in the online tree will exist in the GEDCOM, e.g. the user has entered servant details on the website but not entered them in their tree.

    Regardless of instructions, users do odd things. The program would have to consider all these possibilities. It's an enormous task for not even zero gain; it's an enormous task that risks alienating users.

    So I went with reports from both sides that are as close as possible so the users can match their data and put in their own effort as the most effective approach.
     
  18. peter

    peter Administrator Staff Member

    Unfortunately it's a lot of extra work for users, which could discourage them from using LostCousins at all. This is why I asked whether it would be possible to add those tags automatically to Gedcom files.

    However, since it seems unlikely this would be possible, the next option is automated matching. If it is good enough and fast enough it won't matter if it has to be repeated; but if a lot of manual intervention is required then it would make sense to write a file that records the manual matches (so that they don't have to be repeated next time).
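
A file of remembered manual matches could be as simple as a JSON dictionary kept beside the gedcom. This is only a sketch under assumed names: the key layout (year, census reference, name as shown on the website) and the function names are hypothetical, not anything FTA does today.

```python
import json
from pathlib import Path
from typing import Dict


def match_key(census_year: int, census_ref: str, name: str) -> str:
    """Build a stable key for one website entry (hypothetical layout)."""
    return f"{census_year}|{census_ref}|{name}"


def load_confirmed(path: str) -> Dict[str, str]:
    """Load previously confirmed matches, or an empty dict on first run."""
    file = Path(path)
    if not file.exists():
        return {}
    return json.loads(file.read_text(encoding="utf-8"))


def save_confirmed(path: str, confirmed: Dict[str, str]) -> None:
    """Write the confirmed matches (website key -> GEDCOM individual id)."""
    Path(path).write_text(
        json.dumps(confirmed, indent=2, sort_keys=True), encoding="utf-8"
    )
```

On a later run, any entry whose key is already in the file can skip the matching step, so only new or changed entries need the user's attention.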

    An alternative would be for them to correct the names and birthdates - this should bring the matching close to 100%, and it's something that many people have already done.
     
  19. peter

    peter Administrator Staff Member

    But that would be obvious from the relationship shown.
     
  20. Alexander Bisset

    Alexander Bisset Administrator Staff Member

    NO, NO, NO, NO!!!!!!

    Sorry, but what part of "the match needs to be on a reference number" are you just not getting??? ANY system that relies on fuzzy matching of names or dates is NOT going to give reliable matches. I thought you'd written code before? There is a massive gulf between comparing two numbers, with a binary yes/no decision as to whether it's a match or not, and trying to match names and open-ended dates that may or may not be modified and may or may not exist.

    I'm not sure if you are trying to be deliberately difficult or not; it certainly feels that way. I'm going out of my way to write a program for FREE that makes your website easier to use, and all I get is obstacles and misdirection. It's as if you fundamentally don't understand the basics of how the program works.

    I WANT Lost Cousins to succeed. I've written a program that makes it unbelievably easy for users to find and add data, and all you talk about is introducing features that would undermine the program. Why don't you try actually using it and see how easy it is, rather than criticising it for something that would be more easily addressed from the Lost Cousins site side?

    Sorry Peter but sometimes your approach really antagonises people.
     
