PHP or JavaScript - interpreting a chat

Evening all!

I've hit a brick wall with this problem... can't understand why it's so hard either! I'm hoping that some of you guys will have some ideas...

I'm trying to write a script in either JavaScript or PHP (doesn't matter which) that lets you copy and paste in a conversation, either from an email or from an instant messaging program such as MSN or Adium. The idea is that the script would read the conversation and split it up into its messages, as well as extract the sender name and the time...

So for example if I had this conversation:
gavin holt
20:50
this is gavin's 1st message

rich
20:50
this is rich's 1st message

gavin holt
20:54
this is gavin's 2nd message

54:44
this is gavin's 3rd message

54:47
this is gavin's 4th message

rich
20:55
this is rich's 2nd message

56:46
this is rich's 3rd message

gavin holt
20:57
this is gavin's 5th message

The script would interpret it as this:
Gavin - 20:50 - this is gavin's 1st message
Rich - 20:50 - this is rich's 1st message
Gavin - 20:54 - this is gavin's 2nd message
Gavin - 20:54 - this is gavin's 3rd message
Gavin - 20:54 - this is gavin's 4th message
Rich - 20:55 - this is rich's 2nd message
Rich - 20:56 - this is rich's 3rd message
Gavin - 20:57 - this is gavin's 5th message

Now obviously emails and IM conversations aren't in the same format, so it'd have to cope with variable input. You could specify that the start of each message is in one of the following formats:
rich
%TIME%
...

or
Richard @ %DATETIME%
...

or
On %DATETIME% Richard wrote:
...

If anyone has any ideas about how to go about this or has seen this done before I'd really appreciate the input.

Thanks in advance!
 
Regular expressions are your friend! However, you say you would like to interpret both chat and e-mail. With regular expressions you would need to ensure that the format of the chat stays the same, unless you want to make things fairly complex. It would help if you posted real examples of the text you want to process.
 
Well that's the thing: the chat could come from any email client or any instant messaging program, so the format could change considerably. What it needs is 2 input boxes that let you specify the format of each message (one input box for each user).

For example:
box1 said:
On %DATETIME% Richard wrote:
...

box2 said:
On %DATETIME% Gavin wrote:
...

If it has that capability then it could work with any conversation.
 
Having a quick think about this, my initial path would be to split up the messages into a header and data, the header being the "On xxx Gavin wrote:" bit.

Then if you want a variable format, you're going to want to think of as many potential variables as possible and create a list in your code. You've got one there in '%DATETIME%'; you'd also want '%NAME%' and maybe others (if you want email then you're going to want 'to' and 'cc', I'd imagine).

Then if you have (as you suggested) an input box that reads something like:
"On %DATETIME% %NAME% wrote:"

You should be able to split that into its 4 parts, then match each part against your list of variables to build a record of which parts of the header you actually want to keep. Then when a message is pasted in, you know that from the 'header' (which you should be able to identify using some logic based on the variables, or on all 4 parts of the example header) you want words 2 and 3 (and what they 'mean'), and anything between that header and the next header can be stored as the data...
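A minimal JavaScript sketch of that splitting step, assuming the template is separated by single spaces. The placeholder list and the `describeTemplate` helper are made up for illustration, not taken from any library:

```javascript
// Hypothetical list of known variables; extend it (e.g. %TO%, %CC%) as needed.
const KNOWN_PLACEHOLDERS = ["%DATETIME%", "%TIME%", "%NAME%"];

// Split a header template on whitespace and label each part as either
// a known placeholder or a literal word that must appear as-is.
function describeTemplate(template) {
  return template.trim().split(/\s+/).map((word, index) => ({
    position: index + 1, // 1-based word position within the template
    text: word,
    kind: KNOWN_PLACEHOLDERS.includes(word) ? "placeholder" : "literal",
  }));
}

const parts = describeTemplate("On %DATETIME% %NAME% wrote:");

// Positions of the parts worth storing (words 2 and 3 in this template).
const wanted = parts.filter((p) => p.kind === "placeholder").map((p) => p.position);
console.log(wanted); // [ 2, 3 ]
```

One caveat with counting words: a real date/time value may itself contain spaces, so the word positions in a pasted header will not always line up with the positions in the template.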

Hopefully that makes sense, and is actually sensible, but that's where I'd start at least, probably throw it away as useless shortly afterwards but hey :p
 
Great concept.. thanks for your ideas!

Instead of "On %DATETIME% %NAME% wrote:" I'm thinking it could run off "On %DATETIME% Richard wrote:", as the word Richard won't change (just like On and wrote).

So what it needs to do is convert the above filter into a regular expression, and then also create one for the other user in the conversation.
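One way the filter-to-regex conversion might look in JavaScript. This is only a sketch: the placeholder patterns below are guesses at what a time or date/time looks like, and `templateToRegExp` is a made-up helper name. It relies on named capture groups, which need a reasonably modern JavaScript engine.

```javascript
// Escape characters that have a special meaning inside a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Assumed patterns for each placeholder; each one becomes a named capture group.
const PLACEHOLDER_PATTERNS = new Map([
  ["%TIME%", "(?<time>\\d{1,2}:\\d{2}(?::\\d{2})?)"],
  ["%DATETIME%", "(?<datetime>.+?)"], // deliberately loose: any text
  ["%NAME%", "(?<name>.+?)"],
]);

// Turn a filter such as "On %DATETIME% Richard wrote:" into a RegExp that
// matches a whole header line. Text outside placeholders is matched literally.
function templateToRegExp(template) {
  const source = template
    .trim()
    .split(/(%[A-Z]+%)/) // the capture group keeps placeholders as separate pieces
    .map((piece) => PLACEHOLDER_PATTERNS.get(piece) ?? escapeRegExp(piece))
    .join("");
  return new RegExp("^" + source + "$");
}

// One expression per user, as described above.
const richardHeader = templateToRegExp("On %DATETIME% Richard wrote:");
const gavinHeader = templateToRegExp("On %DATETIME% Gavin wrote:");
```

Each placeholder should appear at most once per filter here, since a regular expression cannot contain two capture groups with the same name.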

At that point I'll have 2 regular expressions for identifying the headers... now it's just figuring out the process for running these against the content to extract the messages!
 
The only issue with hardcoding the names like that is: what if the conversation is amongst 3 or more people? With %NAME% or similar it would at least work, depending on the code behind it, for any number of people in a conversation. But whether you need that or not...
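As a rough illustration of the %NAME% approach, a single expression can capture the sender instead of hardcoding one expression per person. This sketch assumes headers of the form "On <datetime> <name> wrote:" with a single-word name; the `HEADER` pattern and helper names are made up for the example.

```javascript
// Assumed header shape: "On <datetime> <name> wrote:" with a one-word name.
const HEADER = /^On (?<datetime>.+?) (?<name>\S+) wrote:$/;

// Return the sender and date/time if the line is a header, otherwise null.
function readHeader(line) {
  const match = HEADER.exec(line.trim());
  return match ? { name: match.groups.name, datetime: match.groups.datetime } : null;
}

// Collect every distinct sender, in order of first appearance.
function listParticipants(text) {
  const names = new Set();
  for (const line of text.split(/\r?\n/)) {
    const header = readHeader(line);
    if (header) names.add(header.name);
  }
  return [...names];
}
```

A name with spaces in it (such as "gavin holt") would need a different pattern, because the boundary between the date/time and the name is then ambiguous.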

Once you've got the regular expressions, or even (and this is what I'd do, given my limited regex knowledge) just code that parses a string, it's fairly simple to run that against the content. In both PHP and JavaScript you should get a block of text that you can read line by line, so read in a line and compare it with the regex/method.

If the line is a header (and we've only just started parsing) then any following lines are the message, until we reach either another header (at which point the last message is complete and can be 'saved' in whatever manner you choose) or the end of the text, at which point the last message is again complete, etc...
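The line-by-line loop described above could be sketched in JavaScript roughly like this. The two header patterns are hard-coded here purely for illustration (in practice they would come from the user's filters), and all function names are hypothetical:

```javascript
// Hypothetical header patterns, one per participant.
const HEADERS = [
  { sender: "Richard", pattern: /^On (?<datetime>.+?) Richard wrote:$/ },
  { sender: "Gavin", pattern: /^On (?<datetime>.+?) Gavin wrote:$/ },
];

// Return { sender, datetime } if the line matches any header, otherwise null.
function matchHeader(line, headers) {
  for (const { sender, pattern } of headers) {
    const match = pattern.exec(line);
    if (match) return { sender, datetime: match.groups.datetime };
  }
  return null;
}

// 'Save' a finished message by joining its collected lines into one body.
function finish(message) {
  return {
    sender: message.sender,
    datetime: message.datetime,
    body: message.lines.join("\n").trim(),
  };
}

function parseConversation(text, headers) {
  const messages = [];
  let current = null; // message being collected; null until the first header

  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    const header = matchHeader(line, headers);
    if (header) {
      // A new header means the previous message (if any) is complete.
      if (current) messages.push(finish(current));
      current = { sender: header.sender, datetime: header.datetime, lines: [] };
    } else if (current) {
      current.lines.push(line);
    }
    // Lines that appear before the first header are ignored.
  }
  // End of text: the last message is complete too.
  if (current) messages.push(finish(current));
  return messages;
}
```

Calling `parseConversation(pastedText, HEADERS)` returns an array of `{ sender, datetime, body }` objects, which can then be printed in whatever "Name - time - message" layout is wanted.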
 