Log Analysis for Digital Societies (LADS)
Frequently Asked Questions
- In simple search, do Boolean operators (and, or, not) have an effect on the search? Or is that just an option for advanced search (from the drop-down choice menus)?
- Does placing quotation marks around search words perform an exact phrase search (enforcing the strict order of the words and the spaces between them)?
- Following the previous question, the query column in the database has some queries with quotation marks, others without, and others with several brackets around them. What is the significance of that? The following line is a strange example of a query in the database: ((((((((("auditing ")))))))))
- Regarding the "nrrecords" column, does it represent the total number of results retrieved for a query across all collections, or does it only count the number of records in the selected collection?
- Is it at all possible to get the results that the users viewed in order to correlate those with queries (not talking about objurl which seems to be either external links or broken ones)?
- There are many other actions in the database that were not described in the PDF description file you provided on the website (e.g. service_..., options_..., etc.). Do you have any description for them? Most importantly, what is the "search" action and when is it logged (not search_sim, or any of the others described in the file, just "search")?
- Some language codes are unidentifiable, such as tag, dee and ge. Do you have a list of which languages these represent? Moreover, the Czech language is stored under three different codes: cs, cz and cze. Is there a difference?
- Must we process the complete logs for any statistical analysis? Are we allowed to perform some data cleaning, for example for null/empty actions, null/empty/unidentified languages, null session IDs, or malformed/mal-coded queries?
Answers
- In simple search, do Boolean operators (and, or, not) have an effect on the search? Or is that just an option for advanced search (from the drop-down choice menus)?
  No. In simple search they are treated as ordinary words, not as Boolean operators. To use AND, OR and NOT, you have to use the advanced search.
- Does placing quotation marks around search words perform an exact phrase search (enforcing the strict order of the words and the spaces between them)?
  No.
- Following the previous question, the query column in the database has some queries with quotation marks, others without, and others with several brackets around them. What is the significance of that? The following line is a strange example of a query in the database: ((((((((("auditing ")))))))))
  These are errors in the translation of the search string into a common query language. We have left the data as they were recorded in the logs and did not do any cleaning.
- Regarding the "nrrecords" column, does it represent the total number of results retrieved for a query across all collections, or does it only count the number of records in the selected collection?
  The document that describes the logs says: "nrRecords: the number of records retrieved for the collection involved by the action of the user." That is, the count refers to the collection involved in the user's action, not to all collections.
- Is it at all possible to get the results that the users viewed in order to correlate those with queries (not talking about objurl which seems to be either external links or broken ones)?
  If you mean the documents, no. Unfortunately, only a fake identifier was recorded in the logs, so it is not possible to know which catalogue was clicked.
- There are many other actions in the database that were not described in the PDF description file you provided on the website (e.g. service_..., options_..., etc.). Do you have any description for them? Most importantly, what is the "search" action and when is it logged (not search_sim, or any of the others described in the file, just "search")?
  You're right, I'm updating the document. In the meantime, here is the list of the remaining actions:
  option_save_session_favorite  Session favorite saved
  option_send_mail              Record sent by email
  options_save_reference        Record saved for reference manager use
  service_denmark               Full record service link used
  service_hungary               Full record service link used
  service_netherlands           Full record service link used
  service_uk                    Full record service link used
  service_all                   Full record service link used
  show_help_helpfilename        "help" link clicked
- Some language codes are unidentifiable, such as tag, dee and ge. Do you have a list of which languages these represent? Moreover, the Czech language is stored under three different codes: cs, cz and cze. Is there a difference?
  Languages were recorded using ISO 639 language codes. Any code that is not in the ISO 639 table is an error in the logging system.
- Must we process the complete logs for any statistical analysis? Are we allowed to perform some data cleaning, for example for null/empty actions, null/empty/unidentified languages, null session IDs, or malformed/mal-coded queries?
  Yes, and yes.
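
As an illustration of the kind of cleaning discussed in the answers above, here is a minimal Python sketch. It is not part of the LADS material and makes several assumptions: the column names (action, session_id, lang, query) are hypothetical, and the language table is only a tiny hand-picked subset of ISO 639 that would have to be replaced by the full table. Under those assumptions it strips redundant outer brackets from queries such as the "auditing" example, maps known language codes to a single form, and drops rows with an empty action or session id.

```python
# Assumption: a tiny illustrative subset of ISO 639 codes mapped to one
# canonical two-letter form. A real analysis should load the full table.
CANONICAL_LANGUAGE = {
    "cs": "cs", "cze": "cs", "ces": "cs",   # Czech
    "en": "en", "eng": "en",                # English
    "de": "de", "ger": "de", "deu": "de",   # German
}


def _wraps_whole(query):
    """True if the first '(' is closed only by the very last character."""
    depth = 0
    for i, ch in enumerate(query):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
            if depth == 0 and i < len(query) - 1:
                return False
    return depth == 0


def strip_outer_parentheses(query):
    """Remove bracket pairs that enclose the entire query.

    Limitation: brackets inside quoted text are not treated specially.
    """
    q = query.strip()
    while len(q) >= 2 and q[0] == "(" and q[-1] == ")" and _wraps_whole(q):
        q = q[1:-1].strip()
    return q


def normalize_language(code):
    """Return the canonical code, or None if the code is not in the table."""
    if code is None:
        return None
    return CANONICAL_LANGUAGE.get(code.strip().lower())


def clean_rows(rows):
    """Drop rows without an action or session id; normalize the rest."""
    cleaned = []
    for row in rows:
        action = (row.get("action") or "").strip()
        session_id = (row.get("session_id") or "").strip()
        if not action or not session_id:
            continue
        out = dict(row)
        out["query"] = strip_outer_parentheses(row.get("query") or "")
        out["lang"] = normalize_language(row.get("lang"))
        cleaned.append(out)
    return cleaned
```

Rows with an unrecognized language are kept here with lang set to None rather than dropped; whether to exclude them is a choice for each analysis. Quotation marks are left in place, since only the brackets are described as translation errors.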