Combining Database Records

Soldato
Joined
27 Dec 2005
Posts
17,316
Location
Bristol
I'll try and keep this short. I have a visitor log database of approx. 15,000 records. I've recently changed how it logs visitors from session ID to IP, as session ID was kicking up to many problems (visitors not being tracked correctly resulting in 10 different records for the same person when they visit 10 pages etc).

As a result I want to combine all the database's records based on IP, signifying the difference in days (if over 24h) of each record.

I think I've got something that *should* work (I have backed up the database :p) but I get an 500 Internal Server error when doing it through my browser, and PHPMyAdmin kicks up an error on the first line when doing a SQL Query directly through there (different syntax? no idea).

The code is as follows:

PHP:
$sql = mysql_query("SELECT * FROM sessionlog ORDER BY date"); // get all records
while($row_change = mysql_fetch_array($sql)){ // for each record
$firstid = $row_change['id']; // get fields
$ip = $row_change['ip'];
$history = $row_change['history']; // this is the browsing history
$lastdate = $row_change['date'];

$sql2 = mysql_query("SELECT * FROM sessionlog WHERE ip='$ip'"); // find all records with the same IP
while($row_change2 = mysql_fetch_array($sql2)){ // for each matching record
$curhist = $row_change2['history']; // get fields
$curid = $row_change2['id'];
$date = $row_change2['date'];

$datediff = $date - $lastdate; // subtract dates (all unix)
if($datediff > 86400){ // if result is 24 hours
$dategap = date('d-m-Y', $row_change2['date']); // convert the date to a legible format
$curhist = "$dategap -> $curhist"; // append the current history with the date of the visit
}
$history = "$history -> $curhist"; // add history of current IP record onto the existing record
$lastdate = $row_change2['date']; // reset the last visit date to current IP record's date

$query = "DELETE FROM sessionlog WHERE id=$id LIMIT 1"; // delete the current IP row
mysql_query($query);
}

$query = "UPDATE sessionlog SET history='$history' WHERE id='$firstid' LIMIT 1"; // update the history of the first record with the new history
mysql_query($query, $link);
}

I know it's a hella-lot of database querying/updating which I assume is why I'm getting a 500 through my browser. I assume it'd help to take the DELETE outside of the second while(), but I'm not experienced with arrays so not sure how to do this for all the found/matching IP records.

Any advice as to how to do this in a non-totally bodged way would be great :p.

Edit: Now get...

Fatal error: Allowed memory size of 94371840 bytes exhausted (tried to allocate 50488200 bytes) in /home/*****/*****.co.uk/*****/combineips.php on line 35
 
Last edited:
Could you post your DB structure and some sample data? (just export it from PHPMyAdmin)
 
Of course!

Code:
CREATE TABLE IF NOT EXISTS `sessionlog` (
  `id` int(7) NOT NULL auto_increment,
  `ip` varchar(15) NOT NULL,
  `history` longtext NOT NULL,
  `date` int(10) NOT NULL,
  `repeat` varchar(50) NOT NULL,
  `sample` varchar(100) NOT NULL,
  `ref` varchar(500) NOT NULL,
  PRIMARY KEY  (`id`)
) ENGINE=MyISAM  DEFAULT CHARSET=latin1 AUTO_INCREMENT=19257 ;

--
-- Dumping data for table `sessionlog`
--

INSERT INTO `sessionlog` (`id`, `ip`, `history`, `date`, `repeat`, `sample`, `ref`) VALUES
(19256, '78.86.242.58', 'home', 1267650434, '', '', ''),
(19255, '81.158.46.138', 'home', 1267649625, '', '', 'http://www.google.co.uk/search?hl=en&client=safari&rls=en&ei=UMyOS8mjCY_40wTW2dzkDA&sa=X&oi=spell&resnum=0&ct=result&cd=1&ved=0CAwQBSgA&q=award+winning+wedding+films&spell=1'),
(19254, '86.179.225.244', 'home', 1267649440, '', '', ''),
(19243, '67.195.115.232', 'dvd -> facebook -> youtube -> blog -> callback -> press -> facebook -> dvd -> facebook -> availability', 1267645343, '', '', ''),
(19253, '86.177.128.241', 'home', 1267644959, '', '', 'http://www.google.co.uk/url?sa=t&source=web&ct=res&cd=4&ved=0CCIQFDAD&url=http%3A%2F%2Fwww.idoweddingfilms.co.uk%2F&rct=j&q=wedding+videography&ei=6rmOS_-PCs-TjAe3vamiCw&usg=AFQjCNEJQLJLLwvboIJgbgHHX0zKQig0lQ'),
(19252, '94.10.21.57', 'home', 1267644598, '', '', 'http://uk.search.yahoo.com/search?rd=r1&p=+i+do+wedding&toggle=1&cop=mss&ei=UTF-8&fr=yfp-t-702'),
(19204, '66.249.68.174', 'blog -> home', 1267643701, '8401', '', ''),
(19251, '94.193.172.165', 'home', 1267642539, '', '', ''),
(19250, '90.213.45.165', 'home -> blog', 1267639019, '', '', ''),
(19249, '86.164.87.152', 'home -> equipment -> packages -> hdstills', 1267636839, '', '', ''),
(19248, '195.54.236.29', 'faq', 1267634139, '', '', 'http://www.google.co.uk/search?hl=en&client=firefox-a&hs=KhC&rls=org.mozilla:en-GB:official&channel=s&q=what+sort+of+videography+do+i+do+at+my+wedding&start=20&sa=N'),
(19247, '89.240.223.73', 'home -> packages -> equipment -> testimonials -> availability -> facebook', 1267633114, '', '', ''),
(19246, '78.145.221.129', 'home -> packages', 1267632787, '', '', 'http://www.google.co.uk/search?hl=en&source=hp&q=Wedding+Video+Bristol&btnG=Google+Search&meta=&aq=f&oq='),
(19245, '195.59.149.193', 'home', 1267632011, '', '', ''),
(19040, '84.12.97.61', 'contact -> home', 1267631654, '3947', '', 'http://www.google.co.uk/search?hl=en&q=wedding+videographers%2C+bristol&meta=&aq=f&oq='),
(19242, '62.49.5.39', 'home -> blog -> home -> youtube -> home -> about -> home -> about -> faq -> about -> hdstills -> home -> hdstills -> press -> home -> awards -> press -> home -> press -> faq -> press -> home -> press -> home', 1267629975, '', '', ''),
(19244, '86.26.192.152', 'home -> equipment -> packages -> availability', 1267629412, '', '', 'http://www.google.co.uk/search?hl=en&source=hp&q=i+do+wedding+films&btnG=Google+Search&meta=&aq=0&oq=i+do+wedding+film'),
(19240, '74.125.16.1', 'home', 1267568727, '', '', ''),
(18750, '82.32.12.118', 'about -> home', 1267560401, '10802', '', 'http://www.google.co.uk/search?hl=en&safe=off&client=firefox-a&rls=org.mozilla%3Aen-GB%3Aofficial&hs=LLo&q=adammillbank+wedding&btnG=Search&meta=&aq=f&oq='),
(8435, '66.235.124.58', 'home -> referral', 1267551683, '2748', '', ''),
(17891, '188.220.207.232', 'home -> packages -> equipment -> home -> hdstills -> packages -> equipment -> awards -> ', 1267533702, '', '', 'http://mail.google.com/mail/?ui=2&view=bsp&ver=1qygpcgurkovy');

That help?
 
The data you exported only has unique IPs, and I assume the 'real' data has duplicated IPs from several visits?

Also, using IPs means that people on a shared internet connection won't show up as separate entries in your log so you might get slightly odd results. You would also be better off having multiple entries in a separate table for their history rather than sticking it all into the history column.

The session thing should have worked unless it was generating new session IDs each time they changed pages for some reason.
 
The data you exported only has unique IPs, and I assume the 'real' data has duplicated IPs from several visits?

Yes, I just exported the first 20 rows, but you get the idea :).

Also, using IPs means that people on a shared internet connection won't show up as separate entries in your log so you might get slightly odd results. You would also be better off having multiple entries in a separate table for their history rather than sticking it all into the history column.

The session thing should have worked unless it was generating new session IDs each time they changed pages for some reason.

Not a huge issue about the shared internet - it's a small site anyway and it's mainly to track where people have come from that request samples and what they've been up to.

The sessions worked for some, not for others, so decided to just go with IPs in the end anyway.
 
Are you using Google Analytics at all? It might do most of what you need.

How about this.. I've marked my comments with 'Pho:' - there's better ways to do this but this might speed it up enough to get it to work for you:

PHP:
<?
$sql = mysql_query("SELECT * FROM sessionlog GROUP BY IP ORDER BY date"); // Pho: get records, unique by IP

$ids_to_delete = Array();	// Pho: store IDs to delete here

while($row_change = mysql_fetch_array($sql)) { // for each record
	$firstid = $row_change['id']; // get fields
	$ip = $row_change['ip'];
	$history = $row_change['history']; // this is the browsing history
	$lastdate = $row_change['date'];

	$sql2 = mysql_query("SELECT * FROM sessionlog WHERE ip='$ip' AND ID<>$firstid"); // Pho: find all records with the same IP, but not the one we've already read above
	
	while($row_change2 = mysql_fetch_array($sql2)) { // for each matching record
			
		$curhist = $row_change2['history']; // get fields
		$curid = $row_change2['id'];
		$date = $row_change2['date'];

		$datediff = $date - $lastdate; // subtract dates (all unix)

		if($datediff > 86400){ // if result is 24 hours
			$dategap = date('d-m-Y', $row_change2['date']); // convert the date to a legible format
			$curhist = "$dategap -> $curhist"; // append the current history with the date of the visit
		}
		
		$history .= " -> $curhist"; // add history of current IP record onto the existing record  Pho: using .= means append to history
		$lastdate = $row_change2['date']; // reset the last visit date to current IP record's date

		array_push($ids_to_delete, $curid);	// Pho: add ID to deletion list
	}

	// Pho: Only update if we need to, save some time
	if ( mysql_num_rows($sql2) > 0 ) {
		$query = "UPDATE sessionlog SET history='$history' WHERE id='$firstid' LIMIT 1"; // update the history of the first record with the new history
		mysql_query($query, $link);
	}
} 


// Pho: finally clear duplicated entries
if ( count($ids_to_delete) > 0 ) {
	$ids_to_delete_list = implode(",", $ids_to_delete);
	$query = "DELETE FROM `sessionlog` WHERE id IN ($ids_to_delete_list)";
	mysql_query($query);
}
?>

edit: moved the deletion outside of the loops to save a bit more time.
 
Last edited:
Are you using Google Analytics at all? It might do most of what you need.

How about this.. I've marked my comments with 'Pho:' - there's better ways to do this but this might speed it up enough to get it to work for you:

edit: moved the deletion outside of the loops to save a bit more time.

Thanks for your help, but I still get the following error:

Fatal error: Allowed memory size of 94371840 bytes exhausted (tried to allocate 62599094 bytes) in /home/****/****/combineips2.php on line 51

:(
 
It's a very hungry script :o.

You could perhaps do it in chunks?
SELECT * FROM sessionlog GROUP BY IP ORDER BY date LIMIT 0,1000
SELECT * FROM sessionlog GROUP BY IP ORDER BY date LIMIT 1001,2000

etc

failing that if you want to zip up the whole table of data and mail it me I can see how it handles on mine.
 
You could try setting the limit manually:

Code:
ini_set("memory_limit","15M");

Still kicks up the error until I make it massive (ie. 200M) and then it kicks up a 500 Internal Server Error again.

It's a very hungry script :o.

You could perhaps do it in chunks?
SELECT * FROM sessionlog GROUP BY IP ORDER BY date LIMIT 0,1000
SELECT * FROM sessionlog GROUP BY IP ORDER BY date LIMIT 1001,2000

etc

failing that if you want to zip up the whole table of data and mail it me I can see how it handles on mine.

Doing it in chunks probably would help but then that won't combine all the matching records will it?

Sending it isn't really an option - the database logs the email addresses of customers that have contacted us etc, and while I'm sure you're very trustworthy it's usually not a good idea to send such things to people you don't know! May well be a last resort though.

What was the "better" way of doing this that you referred to?
 
I'm not entirely sure off hand, I mainly use MSSQL which would allow you to do all of this inside the database without having to leverage PHP (I'm not sure if you can anything similar with MySQL).

How many unique IPs do you have?
SELECT COUNT(0) FROM sessionlog GROUP BY IP ORDER BY date

You could nuke the email out of the log before sending it (or just remove the whole referrer column if it's in there).
 
I'm not entirely sure off hand, I mainly use MSSQL which would allow you to do all of this inside the database without having to leverage PHP (I'm not sure if you can anything similar with MySQL).

How many unique IPs do you have?
SELECT COUNT(0) FROM sessionlog GROUP BY IP ORDER BY date

4,497 total

You could nuke the email out of the log before sending it (or just remove the whole referrer column if it's in there).

May do that! It's in the sample column.
 
Just exported the database for you and it's come in at 66MB :eek:. The last export (yesterday) was 1.3MB, so the script must've been doing something, though no idea what. I've deleted and replaced the table with the 1.3MB backup now anyway.
 
4,497 total

Ah ok, so it's going to be doing 4,497 queries within the page for the 'SELECT * FROM sessionlog WHERE ip='$ip' AND ID<>$firstid' part - quite a lot :D.

Just exported the database for you and it's come in at 66MB :eek:. The last export (yesterday) was 1.3MB, so the script must've been doing something, though no idea what. I've deleted and replaced the table with the 1.3MB backup now anyway.


Heh nice. Is that compressed? PHPMyAdmin lets you export it "zipped", "gzipped" or "bzipped" (or just compress the exported .sql separately with 7-Zip). As it's mostly text it'll compress down nicely.
 
Back
Top Bottom