Log Analysis for Digital Societies (LADS)

Frequently Asked Questions

FAQ

  1. In simple search, do boolean operators (and, or, not) have an effect on the search? Or is that only an option in advanced search (from the drop-down menus)?
  2. Does placing quotation marks around search words perform an exact phrase search (enforcing the strict order of words and the spaces between them)?
  3. Following on from the previous question: the query column in the database has some queries in quotation marks, others without, and others wrapped in several brackets. What is the significance of that? The following is a strange example of a query in the database: ((((((((("auditing ")))))))))
  4. Regarding the column of "nrrecords", does it represent the total number of results retrieved for a query across all collections, or does it only count the number of records in the selected collection?
  5. Is it at all possible to get the results that the users viewed in order to correlate those with queries (not talking about objurl which seems to be either external links or broken ones)?
  6. There are many other actions in the database that were not described in the PDF description file you provided on the website (e.g. service_..., options_..., etc.). Do you have any description for them? Most importantly, what is the "search" action and when is it logged (not search_sim, or any of the others described in the file, just "search")?
  7. Some language codes are unidentifiable, such as: tag, dee, ge. Do you have a list of which languages these represent? Moreover, the Czech language is stored under three different codes: cs, cz, cze. Is there a difference?
  8. Is it a must that we process the complete logs for any statistical analysis? Are we allowed to perform some data cleaning, for example for null/empty actions, null/empty/unidentified languages, null session ids, malformed/mal-coded queries?


Answers

  1. In simple search, do boolean operators (and, or, not) have an effect on the search? Or is that only an option in advanced search (from the drop-down menus)?

    No, they are treated as ordinary words, not as boolean operators. You have to use the advanced search if you want to use AND, OR and NOT.
  2. Does placing quotation marks around search words perform an exact phrase search (enforcing the strict order of words and the spaces between them)?

    No.
  3. Following on from the previous question: the query column in the database has some queries in quotation marks, others without, and others wrapped in several brackets. What is the significance of that? The following is a strange example of a query in the database: ((((((((("auditing ")))))))))

    These are errors in the translation of the query string into a common query language. We have left the data as they were recorded in the logs and did not do any cleaning.
  4. Regarding the column of "nrrecords", does it represent the total number of results retrieved for a query across all collections, or does it only count the number of records in the selected collection?

    "nrRecords: the number of records retrieved for the collection involved by the action of the user." (Quoted from the document that describes the logs.)
  5. Is it at all possible to get the results that the users viewed in order to correlate those with queries (not talking about objurl which seems to be either external links or broken ones)?

    If you mean the documents, no. Unfortunately, only a fake identifier was recorded in the logs, so it is not possible to know which catalogue was clicked.
  6. There are many other actions in the database that were not described in the PDF description file you provided on the website (e.g. service_..., options_..., etc.). Do you have any description for them? Most importantly, what is the "search" action and when is it logged (not search_sim, or any of the others described in the file, just "search")?

    You're right, I'm updating the document. In the meantime, here is the list of the remaining actions:

    option_save_session_favorite   Session favorite saved
    option_send_mail               Record sent by email
    options_save_reference         Record saved for reference manager use
    service_denmark                Full record service link used
    service_hungary                Full record service link used
    service_netherlands            Full record service link used
    service_uk                     Full record service link used
    service_all                    Full record service link used
    show_help_helpfilename         "Help" link clicked
  7. Some language codes are unidentifiable, such as: tag, dee, ge. Do you have a list of which languages these represent? Moreover, the Czech language is stored under three different codes: cs, cz, cze. Is there a difference?

    Language was recorded using ISO 639 language codes. Any code that is not in the ISO 639 table is an error in the logging system.
  8. Is it a must that we process the complete logs for any statistical analysis? Are we allowed to perform some data cleaning, for example for null/empty actions, null/empty/unidentified languages, null session ids, malformed/mal-coded queries?

    Yes, and yes.
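
A minimal Python sketch of the kind of cleaning permitted by answer 8 is given below, covering the bracketed queries from answer 3 and the unidentified language codes from answer 7. It is only an illustration under stated assumptions, not the procedure used by the LADS organisers. The row keys "action", "session_id", "lang" and "query" are hypothetical names (the real column names may differ), KNOWN_LANGUAGE_CODES is a small illustrative subset of ISO 639-1 rather than the full table, and folding "cze" (the ISO 639-2 code for Czech) into "cs" is an analysis choice. Since answer 2 says quotation marks do not trigger a phrase search, the sketch also drops a pair of surrounding quotation marks.

```python
# Assumption: a small illustrative subset of ISO 639-1 codes.
# A real analysis would load the complete ISO 639 table instead.
KNOWN_LANGUAGE_CODES = {"cs", "da", "de", "en", "es", "fr", "hu", "it", "nl"}

# Assumption: "cze" is the ISO 639-2 code for Czech; mapping it to the
# two-letter code "cs" is a choice made for this sketch.
LANGUAGE_ALIASES = {"cze": "cs"}


def outer_parens_match(text):
    """Return True if text starts with "(" and that bracket closes at the very end."""
    if not text.startswith("("):
        return False
    depth = 0
    for index, char in enumerate(text):
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
            if depth == 0:
                # The first bracket has closed; it must be the last character.
                return index == len(text) - 1
    return False  # unbalanced brackets: leave the query untouched


def normalize_query(raw):
    """Peel redundant outer brackets and one pair of surrounding double quotes."""
    if raw is None:
        return None
    query = raw.strip()
    while outer_parens_match(query):
        query = query[1:-1].strip()
    if len(query) >= 2 and query.startswith('"') and query.endswith('"'):
        query = query[1:-1].strip()
    return query or None  # empty queries become None


def normalize_language(code):
    """Return a known language code, or None for empty or unrecognised values."""
    if code is None:
        return None
    code = code.strip().lower()
    code = LANGUAGE_ALIASES.get(code, code)
    return code if code in KNOWN_LANGUAGE_CODES else None


def clean_row(row):
    """Return a cleaned copy of a log row (a dict of strings), or None to drop it."""
    action = (row.get("action") or "").strip()
    session_id = (row.get("session_id") or "").strip()
    if not action or not session_id:
        return None  # rows with a null/empty action or session id are dropped
    cleaned = dict(row)
    cleaned["action"] = action
    cleaned["session_id"] = session_id
    cleaned["lang"] = normalize_language(row.get("lang"))
    cleaned["query"] = normalize_query(row.get("query"))
    return cleaned
```

With these definitions, a call such as clean_row({"action": "search_sim", "session_id": "abc", "lang": "cze", "query": '(("auditing "))'}) keeps the row and reduces the query to the bare word, while a row with an empty action is dropped. Queries such as (a) or (b), where the first bracket does not enclose the whole string, are left unchanged. Whether unrecognised language codes should be set to None or the whole row discarded depends on the statistic being computed.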